This document is the web-based version of a presentation given through the University of Idaho library workshop series on September 12, 2017. The text is fairly sparse because this is primarily a reference based on workshop slides. However, it does provide plenty of examples that scaffold through ggplot2 complexity, and has a list of resources for learning more at the end. Good luck, and have fun!
ggplot2 is an R package for data visualization
The “gg” stands for “grammar of graphics”
Popular (well-supported, great community)
Open source (like all of R)
Easy to use (after a learning curve)
Aesthetically pleasing
Built for multi-variate data
Reproducible figures
You can make anything!
Teach you enough that you know how to teach yourself more!
Introduce “grammar of graphics” structure
Send you away with a list of resources (including these slides)
Grammar of graphics
ggplot(data = diamonds, aes(x = carat, y = price, color = clarity)) +
geom_point() +
facet_grid(color ~ cut)
Download scripts from the following site: https://is.gd/IYoXwA
Open RStudio.
In RStudio, open “install_packages.R”. Highlight the text and click “Run”.
Still in RStudio, open “workshop_script.R”. We’ll work from this for the rest of the presentation.
Load packages with the “library()” commands at the top of the script.
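The workshop script itself is not reproduced here, so the package list below is an assumption inferred from the functions used later in this document (for example, theme_few() comes from ggthemes and geom_density_ridges() from ggridges). A minimal sketch that checks which of these packages are installed and loads the ones that are:

```r
# Packages inferred from the functions used in this document (an assumption,
# not a copy of the workshop's install_packages.R)
pkgs <- c("ggplot2", "ggthemes", "ggridges", "viridis", "wesanderson",
          "reshape2", "knitr", "gapminder", "dplyr", "maps", "hexbin")

# TRUE for each package that is available on this machine
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}

# Load everything that is available
for (pkg in pkgs[installed]) {
  library(pkg, character.only = TRUE)
}
```

Anything reported as missing can be installed with install.packages() before continuing.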
First, we need data.
Let’s use the “diamonds” dataset, which ships with ggplot2.
carat | cut | color | clarity | depth | table | price | x | y | z |
---|---|---|---|---|---|---|---|---|---|
0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |
0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |
0.23 | Good | E | VS1 | 56.9 | 65 | 327 | 4.05 | 4.07 | 2.31 |
0.29 | Premium | I | VS2 | 62.4 | 58 | 334 | 4.20 | 4.23 | 2.63 |
0.31 | Good | J | SI2 | 63.3 | 58 | 335 | 4.34 | 4.35 | 2.75 |
0.24 | Very Good | J | VVS2 | 62.8 | 57 | 336 | 3.94 | 3.96 | 2.48 |
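Before plotting, it can help to check the size of the data and the type of each column. A short sketch using base R functions on the same dataset:

```r
library(ggplot2)  # the diamonds dataset is part of ggplot2

dim(diamonds)              # number of rows and columns
str(diamonds)              # type of each column
levels(diamonds$clarity)   # categories of the clarity variable
```

cut, color, and clarity are ordered factors, which is why ggplot2 treats them as discrete categories in the plots that follow.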
ggplot(data = diamonds)
ggplot(data = diamonds,
aes(x = carat, y = price))
ggplot(data = diamonds,
aes(x = carat, y = price)) +
geom_point()
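The three snippets above build one plot in stages: data only, then an aesthetic mapping, then a geometry. Here is a sketch of the same idea with the intermediate steps stored in objects (the names p_base and p_points are illustrative, not from the workshop script):

```r
library(ggplot2)

# Data plus aesthetic mapping: axes are drawn, but there are no layers yet
p_base <- ggplot(data = diamonds, aes(x = carat, y = price))

# Adding a geom creates the first layer
p_points <- p_base + geom_point()

length(p_base$layers)     # layers before adding a geom
length(p_points$layers)   # layers after adding geom_point()
```

Because each step returns a plot object, you can keep a base plot around and try different geoms on top of it.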
ggplot(data = diamonds,
aes(x = carat, y = price, color = clarity)) +
geom_point()
ggplot(data = diamonds,
aes(x = carat, y = price), color = clarity) +
geom_point()
ggplot(data = diamonds,
aes(x = carat, y = price)) +
geom_point(color = "magenta")
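The last three snippets differ only in where color is specified. Inside aes(), color is mapped to a column of the data. Outside aes(), inside a geom, it is set to one fixed value. Passed to ggplot() outside aes(), it is not part of the mapping at all, so the points are not colored by clarity. A rough sketch that counts the distinct colors each approach produces, using ggplot2's layer_data() helper (object names are illustrative):

```r
library(ggplot2)

# Mapping: color inside aes() varies with the clarity column
p_mapped <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = clarity))

# Setting: color outside aes() is a single fixed value
p_fixed <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(color = "magenta")

# Count the distinct colors actually computed for the point layer
n_mapped <- length(unique(layer_data(p_mapped)$colour))
n_fixed  <- length(unique(layer_data(p_fixed)$colour))

n_mapped   # one color per clarity level
n_fixed    # a single color
```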
(with a small difference:)
ggplot(data = diamonds,
aes(x = carat, y = price)) +
geom_point(aes(color = clarity))
ggplot(data = diamonds,
aes(x = carat, y = price)) +
geom_point(aes(color = clarity)) +
geom_smooth()
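Moving color from ggplot() into geom_point() matters once a second layer is added. Aesthetics given in ggplot() are inherited by every layer, so a smoother would be fit separately for each clarity level; aesthetics given inside one geom apply to that geom only. A sketch of the difference, assuming method = "lm" purely to keep the example fast (the workshop code uses the default smoother):

```r
library(ggplot2)

# color in ggplot(): inherited by geom_smooth(), one fit per clarity level
p_global <- ggplot(diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# color in geom_point() only: geom_smooth() sees all points as one group
p_layer <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = clarity)) +
  geom_smooth(method = "lm", formula = y ~ x)

# Layer 2 is the smoother; count how many groups it was fit to
n_global <- length(unique(layer_data(p_global, 2)$group))
n_layer  <- length(unique(layer_data(p_layer, 2)$group))
```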
ggplot(data = diamonds,
aes(x = carat, y = price)) +
geom_point(aes(color = clarity)) +
geom_smooth(color = "black", size = 0.8, linetype = 2)
ggplot(data = diamonds,
aes(x = carat, y = price)) +
geom_point(aes(color = clarity)) +
geom_smooth(color = "black", size = 0.8, linetype = 2) +
theme_few()
ggplot(data = diamonds,
aes(x = carat, y = price)) +
geom_point(aes(color = clarity)) +
geom_smooth(color = "black", size = 0.8, linetype = 2) +
facet_wrap(~cut)
ggplot(data = diamonds, aes(x = carat, y = price, color = clarity)) +
geom_point(size = 0.5) +
facet_grid(color~cut)
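The two faceting functions above split the data differently: facet_wrap(~cut) makes one panel per level of a single variable and wraps them into rows, while facet_grid(color ~ cut) lays out rows by the variable on the left of the formula and columns by the variable on the right. A small sketch that counts the resulting panels (object names are illustrative):

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(size = 0.5)

p_wrap <- p + facet_wrap(~cut)          # one panel per cut
p_grid <- p + facet_grid(color ~ cut)   # rows = color, columns = cut

# Each row of layer data records the panel it is drawn in
n_wrap <- length(unique(layer_data(p_wrap)$PANEL))
n_grid <- length(unique(layer_data(p_grid)$PANEL))

n_wrap   # number of cut levels
n_grid   # number of color levels times number of cut levels
```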
ColorBrewer is useful and popular:
ggplot(data = diamonds, aes(x = carat, y = price)) +
geom_point(aes(color = clarity)) +
theme_few() +
scale_color_brewer(type = "qual", palette = "Set2")
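When picking a ColorBrewer palette, it is worth checking that it has enough colors for your variable. A sketch using the RColorBrewer package, assuming it is installed (it normally arrives as a dependency of ggplot2):

```r
library(RColorBrewer)
library(ggplot2)

# Properties of the Set2 palette: maximum colors, type, colorblind-friendliness
brewer.pal.info["Set2", ]

set2_max  <- brewer.pal.info["Set2", "maxcolors"]
n_clarity <- nlevels(diamonds$clarity)

# TRUE if Set2 has enough colors for every clarity level
set2_max >= n_clarity

# The hex codes the scale would use
set2_colors <- brewer.pal(n_clarity, "Set2")
set2_colors
```

Running display.brewer.all() draws every available palette if you want to browse them.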
ggplot(data = diamonds,
aes(x = clarity, y = price)) +
geom_violin()
ggplot(data = diamonds,
aes(x = price, y = cut)) +
geom_density_ridges()
ggplot(data = diamonds,
aes(x = price, y = cut, color = cut, fill = cut)) +
geom_density_ridges(alpha = 0.8, scale = 5) +
scale_fill_viridis(option = "A", discrete = TRUE) +
scale_color_viridis(option = "A", discrete = TRUE) +
theme_few()
ggplot(data = diamonds,
aes(x = price, y = cut, fill = cut)) +
geom_density_ridges(alpha = 1, scale = 5) +
# Newer wesanderson releases name this palette "Darjeeling1"
scale_fill_manual(values = wes_palette("Darjeeling1")) +
theme_few()
kable(head(iris))
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
5.4 | 3.9 | 1.7 | 0.4 | setosa |
iris_long <- melt(iris, id.vars = "Species")
kable(head(iris_long))
Species | variable | value |
---|---|---|
setosa | Sepal.Length | 5.1 |
setosa | Sepal.Length | 4.9 |
setosa | Sepal.Length | 4.7 |
setosa | Sepal.Length | 4.6 |
setosa | Sepal.Length | 5.0 |
setosa | Sepal.Length | 5.4 |
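melt() reshapes the data from wide format (one column per measurement) to long format (one row per measurement), which is what ggplot2 needs to map the measurement name to fill. As an alternative sketch, assuming the tidyr package (version 1.0 or later) is installed, pivot_longer() does the same reshaping; the name iris_long2 is illustrative:

```r
library(tidyr)

# Every column except Species becomes a (variable, value) pair
iris_long2 <- pivot_longer(iris,
                           cols = -Species,
                           names_to = "variable",
                           values_to = "value")

dim(iris_long2)    # 150 flowers x 4 measurements = 600 rows
head(iris_long2)
```

The row order differs from melt(), and variable comes back as a character column rather than a factor, but the plots below work the same way with either version.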
ggplot(iris_long, aes(x = Species, y = value, fill = variable)) +
geom_bar(stat = 'identity', width = 1) +
theme_bw()
ggplot(iris_long,
aes(x = Species, y = value, color = variable, fill = variable)) +
geom_bar(stat = 'identity', width = 1) +
coord_polar(theta = 'x') +
theme_bw()
kable(head(gapminder))
country | continent | year | lifeExp | pop | gdpPercap |
---|---|---|---|---|---|
Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.4453 |
Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.8530 |
Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.1007 |
Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.1971 |
Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.9811 |
Afghanistan | Asia | 1977 | 38.438 | 14880372 | 786.1134 |
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point()
Who’s the outlier?
gapminder %>% filter(gdpPercap > 60000)
## # A tibble: 5 x 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Kuwait Asia 1952 55.565 160000 108382.35
## 2 Kuwait Asia 1957 58.033 212846 113523.13
## 3 Kuwait Asia 1962 60.470 358266 95458.11
## 4 Kuwait Asia 1967 64.624 575003 80894.88
## 5 Kuwait Asia 1972 67.712 841934 109347.87
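Once the outliers are in their own data frame, they can be drawn as an extra layer on top of the full scatterplot. A sketch under the assumption that highlighting and labeling them is useful here (the colors and label placement are arbitrary choices, not from the workshop):

```r
library(ggplot2)
library(dplyr)
library(gapminder)

# Same filter as above, stored for reuse
outliers <- gapminder %>% filter(gdpPercap > 60000)

p_outlier <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(color = "grey60") +
  # A layer can use its own data; x and y are inherited from ggplot()
  geom_point(data = outliers, color = "red") +
  geom_text(data = outliers, aes(label = paste(country, year)),
            hjust = 1.1, size = 3)

p_outlier
```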
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_hex()
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_density2d(aes(color = ..level..), bins = 20) +
scale_color_viridis()
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point(aes(color = continent, size = pop), alpha = 0.8) +
scale_x_continuous(trans = 'log') +
facet_wrap(~year) +
scale_color_brewer(type = "qual", palette = "Accent") +
theme_hc(bgc = 'darkunica') +
theme(text = element_text(size = 9))
p
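The plot above uses trans = 'log', which is the natural log, so the default axis breaks can land on awkward values. A possible alternative, shown here as a sketch rather than as part of the original workshop code, is scale_x_log10(), which puts breaks at powers of ten:

```r
library(ggplot2)
library(gapminder)

p_log10 <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) +
  scale_x_log10()   # base-10 log transform of the x axis

p_log10
```

The points sit in the same relative positions either way; only the axis breaks and labels change.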
Prepare data:
country_df <- map_data('world') %>%
rename("country" = "region")
country_df$country[country_df$country == "USA"] <- "United States"
#Take the mean across all years for each country:
gapminder_means <- gapminder %>%
group_by(country, continent) %>%
summarise(lifeExp = mean(lifeExp),
pop = mean(pop),
gdpPercap = mean(gdpPercap))
plot_dat <- left_join(gapminder_means, country_df, by = "country")
ggplot(plot_dat) +
geom_polygon(aes(x = long, y = lat, fill = lifeExp, group = group)) +
scale_fill_viridis(option = "A") +
coord_quickmap() +
theme_few()
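The join above only works for countries whose names are spelled identically in both datasets, and only "USA" is recoded by hand. Any other mismatch ends up without polygons on the map. A sketch for listing the gapminder countries that have no match in the map data, assuming the maps package is installed (the names map_names and unmatched are illustrative):

```r
library(ggplot2)
library(dplyr)
library(gapminder)

# Same preparation as above
country_df <- map_data("world") %>%
  rename(country = region)
country_df$country[country_df$country == "USA"] <- "United States"

# One row per country name in the map data
map_names <- distinct(country_df, country)

# gapminder countries with no matching name in the map data
unmatched <- gapminder %>%
  mutate(country = as.character(country)) %>%
  distinct(country) %>%
  anti_join(map_names, by = "country")

unmatched$country
```

Each name in the result can be recoded the same way "USA" was before running the join.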
What questions could we ask with this data?
How could we visually answer those questions?
Adrienne Marshall mars7850@vandals.uidaho.edu