forked from isi-avbulimen/R-cookbook
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path06-plotting.Rmd
474 lines (353 loc) · 20.9 KB
/
06-plotting.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
# Plotting and Data Visualisations {#plots}
This chapter provides some examples of how to visualise data in R.
For charts, we'll look at a few examples of how to create simple charts in base R but this chapter will focus mainly in using {ggplot2} to plot charts. For maps, we'll look at how to produce static choropleth maps using {ggplot2} and how to produce interactive maps using {leaflet}.
```{r include=FALSE}
# Every file must contain at least one R chunk due to the linting process.
library(readr)
library(ggplot2)
library(dplyr)
library(plotly)
library(leaflet)
```
## Plotting in base R
Before we look at using {ggplot2}, we'll look at how to plot simple charts using functions in base R. We'll use road accident data from 2005 to 2017 to demonstrate how to create line and bar charts.
```{r}
# read in road accident data
road_acc <- readr::read_csv(
file = "data/road_accidents_by_year_and_severity.csv")
head(road_acc)
```
### Line charts in base R ###
We want to plot a line chart of total road accidents against year. First we need to group our data by accident year and summarise to get a total sum of accidents for each year:
```{r}
road_acc_total <- road_acc %>%
dplyr::group_by(accident_year) %>%
dplyr::summarise(total = sum(total))
```
We'll now use the base R function 'plot', specifying the x and y axis and well as the plot type 'l' for line charts:
```{r}
plot(
x = road_acc_total$accident_year,
y = road_acc_total$total,
type = "l"
)
```
We can make our plot look better by specifying a colour, labelling the axes and giving our plot a title:
```{r}
plot(
x = road_acc_total$accident_year,
y = road_acc_total$total,
type = "l",
col = "red",
xlab = "Year",
ylab = "Total",
main = "Total road accidents per year, 2005 - 2017"
)
```
We can also force our plot to start at 0:
```{r}
plot(
x = road_acc_total$accident_year,
y = road_acc_total$total,
type = "l",
col = "red",
xlab = "Year",
ylab = "Total",
main = "Total road accidents per year, 2005 - 2017",
ylim=c(0, max(road_acc_total$total))
)
```
### Bar charts in base R ###
We want to plot a bar chart showing the total number of accidents of each severity in 2017. First let's filter the road accidents data for 2017:
```{r}
road_acc_2017 <- road_acc %>%
dplyr::filter(accident_year == 2017)
road_acc_2017
```
To create a bar chart, we use the base R function 'barplot' and then specify the variable we want to plot as the height argument:
```{r, echo=FALSE}
options(scipen=999)
```
```{r, message=FALSE}
barplot(height = road_acc_2017$total)
```
This isn't very useful as we can't see what the different severity types are. So we can specify these in the names.arg argument, as well as add a few extra features to make the plot look better:
```{r}
barplot(height = road_acc_2017$total,
names.arg = c("Fatal", "Serious", "Slight"),
width = 2, #width of the bars
col = "lightblue",
xlab = "Severity",
ylab = "Total accidents",
main = "Total accidents by severity, 2017")
```
## Plotting withh {ggplot2}
**What is {ggplot2}?**
The 'gg' in {ggplot2} stands for 'grammar of graphics'. This is a way of thinking about plotting as having grammar elements that can be applied in succession to create a plot. This is the idea that you can build every graph from the same few components: a dataset, geoms (marks representing data points), a co-ordinate system and some other things.
The ggplot() function from the {ggplot2} package is how you create these plots. You build up the graphical elements using the + symbol. Think about it as placing down a canvas and then adding layers on top.
**Why should I use {ggplot2} instead of the plot functions in base R?**
As with most things in R, there are multiple ways to do the same thing. This applies to creating visualisations as well. As will be demonstrated below we can replicate the graphs we have created above using {ggplot2} instead, pretty well.
One method is not necessarily better than the other, but at DfT we advocate using {ggplot2} when plotting charts. There is consistency in the way that {ggplot2} works which makes it easier to get to grips with for beginners. It is also part of the {tidyverse} which we have used earlier on in this cookbook so it shares the underlying design philosophy, grammar, and data structures.
### Line charts with {ggplot2}
When plotting with {ggplot2}, we start with the ggplot() function which creates a blank canvas, then the next layer we add is the plot type. For line charts, this would be ggplot() + geom_line(). Within these functions, we specify our data and our aesthetic mappings i.e. what variables to map to the x and y axes from the specified data. If we were create a simple line chart for total accidents against year, the code would read as follows:
```{r}
ggplot(data = road_acc_total) +
geom_line(mapping = aes(x = accident_year, y = total))
```
So a reusuable template for making graphs would be as below, with the bracketed sections in the code replaced with a dataset, a geom function and a collection of mappings:
ggplot(data = name_of_dataset) +
geom_function(mapping = aes())
You can find a range of different plot types available in {ggplot2}, as well as tips on how to use them in the {ggplot2} cheatsheet (https://www.rstudio.com/wp-content/uploads/2016/11/ggplot2-cheatsheet-2.1.pdf).
Let's create the same the line chart we created with base R, showing total road accidents against year:
```{r}
ggplot(data = road_acc_total) +
geom_line(aes(x = accident_year, y = total), col = "red") +
xlab("Year") +
ylab("Total") +
ggtitle("Total road accidents per year, 2005 - 2017") +
scale_x_continuous(breaks = seq(2000, 2017, 2)) +
expand_limits(y=0) +
theme_classic()
```
Here we have specified the colour of the line (notice this is outside of the aes() function). We have also labelled the x and y axes and given the plot a title.
We have used scale_x_continuous() function to specify the intervals in the year variable and have used a theme to set the style of our plot (more on this later).
### Aesthetic mappings
To get more insight into our data, we can add a third variable to our plot by mapping it to an aesthetic. This is a visual property of the objects in our plot such as the size, the shape or the colour of our points. For example, if we want to see the total number of road accidents by severity type against year, we can map the 'name' variable to the colour aesthetic:
```{r}
ggplot(data = road_acc) +
geom_line(mapping = aes(x = accident_year, y = total, colour = name)) +
scale_x_continuous(breaks = seq(2000, 2017, 2))
```
So unlike before, we specify the colour within the aes() function, rather than outside of it.
### Bar charts with {ggplot2}
When creating bar charts with {ggplot2}, we can use geom_bar() or geom_col(). geom_bar() makes the height of the bar proportional to the number of cases in each group while geom_col() enables the heights of the bars to represent values in the data. So if your data is not already grouped you can use geom_bar() like so:
```{r, message=FALSE}
messy_pokemon <- readr::read_csv(
file = "data/messy_pokemon_data.csv")
head(messy_pokemon)
ggplot(data = messy_pokemon) +
geom_bar(mapping = aes(x = weight_bin))
```
This has grouped our data by weight_bin with the height of the bars representing the number of pokemon in each weight bin.
Let's recreate the total accidents by severity chart, now using {ggplot2} instead:
```{r}
ggplot(data = road_acc_2017) +
geom_col(mapping = aes(x = name, y = total), fill = "lightblue", col = "black")+
xlab("Severity") +
ylab("Total accidents") +
ggtitle("Total accidents by severity, 2017")+
theme_classic()
```
Here we have used geom_col() instead but our data is already grouped so we simply want the height of the bars to represent the values in the data i.e. the number of accidents of each type of severity.
You could also use geom_bar() in this situation but you would need to add an extra argument: stat = "identity".
```{r}
ggplot(data = road_acc_2017) +
geom_bar(mapping = aes(x = name, y = total), fill = "lightblue", col = "black", stat = "identity")+
xlab("Severity") +
ylab("Total accidents") +
ggtitle("Total accidents by severity, 2017") +
theme_classic()
```
## DfT colours
So far we've used colours built into R and referred to them by name e.g. red, lightlue etc. In order to make charts using DfT colours, we can specify what colours we want using hexcodes. For example, for the previous bar chart we can set the levels of severity to different DfT colours.
```{r}
ggplot(data = road_acc_2017) +
geom_col(mapping = aes(x = name, y = total, fill = name))+
xlab("Severity") +
ylab("Total accidents") +
ggtitle("Total accidents by severity, 2017") +
theme_classic() +
scale_fill_manual(values = c("#C99212", "#D25F15", "#006853"))
```
Here we map the name variable to the fill argument within the aesthetic.
## DfT Theme
We mentioned themes earlier. Themes are used to set the style of your plot and can give your plots a consistent customized look. You can customise things such as titles, labels, fonts, background, gridlines, and legends. We have been using theme_classic() so far but {ggplot2} has a number of built-in themes that you can use for your plots available here: [https://ggplot2.tidyverse.org/reference/ggtheme.html](https://ggplot2.tidyverse.org/reference/ggtheme.html)
You can modify aspects of a theme using the theme() function. For example, for our previous accidents plot, we can remove the legend title and move the position of the plot title.
```{r}
ggplot(data = road_acc_2017) +
geom_col(mapping = aes(x = name, y = total, fill = name))+
xlab("Severity") +
ylab("Total accidents") +
ggtitle("Total accidents by severity, 2017") +
theme_classic() +
scale_fill_manual(values = c("#C99212", "#D25F15", "#006853"))+
theme(legend.title = element_blank(),
plot.title = element_text(hjust = 0.5))
```
### Custom DfT Theme
Instead of using the built-in themes, we can create our own theme to apply to our plots. Below I have created a 'DfT theme' for line charts which sets things such as the font types, sizes, axes lines and title positions to make the plot consistent with what we might put in a DfT publication.
You can adjust and tweak this theme and the colours to create plots in the correct style for your own publications.
```{r}
# set DfT colours
dft_colour <- c("#006853", # dark green
"#66A498", # light green
"#D25F15", # orange
"#E49F73", # light orange
"#C99212", # yellow
"#E9D3A0", # pale yellow
"#0099A9", # blue
"#99D6DD") # light blue
# create theme
# note: 'sans' in R refers to the font 'Arial' in Windows
theme_dft <- ggplot2::theme(axis.text = element_text(family = "sans", size = 10, colour = "black"), # axis text
axis.title.x = element_text(family = "sans", size = 13, colour = "black", # x axis title
margin = margin(t = 10)),
axis.title.y = element_blank(),
plot.title = element_text(size = 16, family = "sans", hjust = 0.5),
plot.subtitle = element_text(size = 13, colour = "black", hjust = -0.05,
margin = margin(t = 10)),
legend.key = element_blank(), # make the legend background blank
legend.position = "bottom", # legend at the bottom
legend.direction = "horizontal", # legend horizontal
legend.title = element_blank(), # remove legend title
legend.text = element_text(size = 9, family = "sans"),
axis.ticks = element_blank(), # remove tick marks
panel.grid = element_blank(), # remove grid lines
panel.background = element_blank(), # remove background
axis.line.x = element_line(size = 0.5, colour = "black"))
# use the DfT colours and the DfT theme for the accidents by severity line chart
ggplot(data = road_acc) +
geom_line(mapping = aes(x = accident_year, y = total, colour = name), size = 1.5) +
labs(title = "Accidents by severity, 2005 to 2017",
x = "Accident Year",
y = "")+
scale_x_continuous(breaks = seq(2005, 2017, 2))+
scale_colour_manual(values = dft_colour) + #here is where you apply the dft colours
theme_dft #here we specify our custom theme
```
## More {ggplot2} charts
This section provides more code examples for creating {ggplot2} themes, including the DfT theme code.
First, some more record-level road accident data is read in which can be used with the charts. Note that the data is being read in as a .rds object. A .rds object is an R data object which can be read in as a data frame:
```{r}
# read in road accident data
road_acc_data <- readr::read_rds("data/road_accidents_2017.RDS")
head(road_acc_data)
```
### Scatter plots
The scatter plot example will plot the number of accidents in 2017 for each police force.
To create a scatter plot use **geom_point** in the ggplot2 code.
Note that to draw out a colour for the points, the code specifies which colour in the list of DfT colours (see DfT colours sub-chapter 7.4) the points will be. In this case, the third colour in the list will be the colour of the points.
Labels are added to the scatter plot using **geom_text**.
```{r}
# First get total number of accidents for each police force
accident_pf <- road_acc_data %>%
dplyr::group_by(Police_Force) %>%
dplyr::tally()
# use the DfT colours and the DfT theme for the accidents by police force scatter chart
ggplot(data = accident_pf, aes(x = Police_Force, y = n)) +
geom_point(color = dft_colour[3], size = 1.5) +
labs(title = "Reported Road Accidents by Police Force",
x = "Police Force",
y = "Number of Accidents")+
scale_y_continuous(breaks = seq(0, 30000, 2000)) + # set y axis to go from 0 to 30,000
geom_text(aes(label=Police_Force), size=3, hjust = 0, vjust = 0) + # amend hjust and vjust to change position
theme_dft #here we specify our custom theme
```
The police forces are labelled with numbers, but the chart shows that police force 1 (Metropolitan Police) has the highest number of road accidents in 2017.
### Horizontal bar chart
The scatter plot showing the number of accidents by police force could also be shown in a horizontal bar chart.
Use **geom_col** plus **coord_flip** to create a horizontal bar chart.
For the horizontal bar chart, bars will be shown in descending order, with the police force with the largest value at the top of the chart. This is done by ensuring the data is arranged by the number of accidents ("n").
As this is categorical data, police force is made a factor, with each police force made a separate level.
```{r}
# Arrange police force data by size
accident_pf <- arrange(accident_pf, n)
# take a subset for charting purposes
accident_pf_small <- dplyr::filter(accident_pf, n < 600)
# Make police force a factor
accident_pf_small$Police_Force <-
factor(accident_pf_small$Police_Force, levels = accident_pf_small$Police_Force)
# use the DfT colours and the DfT theme for the accidents by police force scatter chart
ggplot(data = accident_pf_small) +
geom_col(mapping = aes(x = Police_Force, y = n, fill = Police_Force)) +
coord_flip() + # make bar chart horizontal
labs(title = "Reported Road Accidents by Police Force, 2017",
x = "Police Force",
y = "Number of Accidents")+
scale_y_continuous(breaks = seq(0, 500, 50)) + # set y axis running from 0 to 500
scale_fill_manual(values = dft_colour) +
theme_dft + #here we specify our custom theme
theme(legend.position = "none") # command to remove legends
```
The chart shows that police force 98 (Dumfries and Galloway) recorded the lowest number of accidents in 2017.
### Stacked bar chart
This example will create a stacked bar chart showing the percentage of road accidents in each accident severity category, for each year.
Creating a **percentage stacked bar** requires using the **geom_bar** command in ggplot2 and setting the position to **fill**.
```{r}
# Make accident year a factor so year on x-axis is labelled individually
#(otherwise axis will be continuous)
road_acc_year <- road_acc
# make year a factor
road_acc_year$accident_year <- as.factor(road_acc_year$accident_year)
# use the DfT colours and the DfT theme for the accidents by police force scatter chart
ggplot(data = road_acc_year, aes(fill = name, y=total, x= accident_year)) +
geom_bar(position="fill", stat="identity") + # geom bar with position fill makes stacked bar
labs(title = "Percentage Accident Severity, by Year",
x = "Accident Year",
y = "% Accident Severity")+
scale_fill_manual(values = dft_colour) +
theme_dft
```
If you want the stacked bar chart to show numbers instead of percentages use **position = "stack"** instead.
```{r}
# use the DfT colours and the DfT theme for the accidents by police force scatter chart
ggplot(data = road_acc_year, aes(fill = name, y=total, x= accident_year)) +
geom_bar(position="stack", stat="identity") +
labs(title = "Number of accidents by severity, by Year",
x = "Accident Year",
y = "% Accident Severity")+
scale_fill_manual(values = dft_colour) +
theme_dft
```
## Interactive charts with {plotly}
{plotly} is a graphing library which makes interactive html graphs. It uses the open source JavaScript graphing library plotly.js. It is great for building dashboards or allowing the user to interact with the data themselves.
{plotly} is not necessarily good for publications as the charts are html but can be useful for exploratory analysis or QA notes. It allows you to zoom into certain parts of the chart and toggle between different categories.
It can be used in two ways - either with {ggplot2} (easier option) or using the plot_ly() wrapper directly which gives you more control.
### {plotly} with {ggplot2}
Taking our previous accidents by severity plot, we can simply assign this to an object and use the ggplotly() function to make it interactive.
```{r, message=FALSE}
library(plotly)
road_acc_chart <- ggplot(data = road_acc) +
geom_line(mapping = aes(x = accident_year, y = total, colour = name), size = 1.5) +
labs(title = "Accidents by severity, 2005 to 2017",
x = "Accident Year",
y = "")+
scale_x_continuous(breaks = seq(2005, 2017, 2))+
scale_colour_manual(values = dft_colour) +
theme_dft
plotly::ggplotly(road_acc_chart)
```
Since we set our legend to be horizontal in our DfT theme, we get a warning message as plotly.js does not support horizontal legend items yet.
### plot_ly()
Using plot_ly() is similar to {ggplot2} in some ways as we build the plot in layers but we use the pipe operator rather than a plus sign, and the ~ sign before the variables in our data:
```{r}
plotly::plot_ly(road_acc, x = ~accident_year, y = ~total) %>% # specify the data and x and y axes
plotly::add_lines(color = ~name, colors = dft_colour[1:3]) %>% # colour lines by severity
plotly::layout(title = "Accidents by severity, 2005 to 2017",
xaxis = list(title = "Accident Year"),
yaxis = list(title = ""))
```
You can find more information about using {plotly} in R from the following websites:
* Graphing library with example code: [https://plot.ly/r/](https://plot.ly/r/)
* Cheat sheet: [https://images.plot.ly/plotly-documentation/images/r_cheat_sheet.pdf](https://images.plot.ly/plotly-documentation/images/r_cheat_sheet.pdf)
* E-book: [https://plotly-r.com/index.html](https://plotly-r.com/index.html)
## Mapping in R
There are a wide range of packages you can use to produce maps in R. For static choropleth maps, we're going to focus on using {ggplot2} which we are now familiar with.
### Mapping with {ggplot2}
To complete.
### Mapping with {leaflet}
The {leaflet} package can be used to create interactive maps in R. Similar to {ggplot2}, you start with a base map and then add layers (i.e. features). We're going to map some search and rescue helicopter data, using the longitude and latitude.
```{r}
library(leaflet)
sarh <- readr::read_csv(file = "data/SARH_spatial.csv")
head(sarh)
#plot the interactive map using leaflet
leaflet::leaflet(data = sarh) %>%
leaflet::addProviderTiles(provider = providers$Esri.NatGeoWorldMap) %>% #select the type of map you want to plot
leaflet::addMarkers(lng = ~longitude_new,
lat = ~ latitude_new,
popup = ~ htmltools::htmlEscape(Base), # what appears when you click on a data point
label = ~ htmltools::htmlEscape(Domain) # what appears when you hover over a data point
)
```
If you want to save this as a static map, you could simply export is an image. More information on using {leaflet} in R can be found here: [https://rstudio.github.io/leaflet/](https://rstudio.github.io/leaflet/)