forked from isi-avbulimen/R-cookbook
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathbookdown-demo.Rmd
596 lines (420 loc) · 32.1 KB
/
bookdown-demo.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
---
title: "DfT R Cookbook"
author: "Isi Avbulimen, Hannah Bougdah, Tamsin Forbes, Tim Taylor"
date: "`r Sys.Date()`"
site: bookdown::bookdown_site
output:
bookdown::gitbook:
df_print: kable
fig_width: 7
fig_height: 6
documentclass: book
bibliography: [book.bib, packages.bib]
biblio-style: apalike
link-citations: yes
github-repo: departmentfortransport/R-cookbook
description: "Guidance and code examples for R usage for DfT and beyond"
---
# {-}
Placeholder
## Coding standards
## Data
## Work in progress
<!--chapter:end:index.Rmd-->
# The basics {#basics}
Placeholder
## R family
## DfT R/RStudio - subject to change
## RStudio IDE
### Left (bottom left if you have scripts open)
### Top right; **Environment**, and other tabs
### Bottom right
### Other handy buttons in RStudio IDE
### RStudio menu bar a few pointers
## Projects
### Folders
### sessionInfo()
## R memory
## A note on rounding
## Assignment operators `<-` vs `=`
## Arithmetic operators
## Relational operators
## Logical operators
## Vectors
### Types {#vector-types}
### Conversion between atomic vector types
<!--chapter:end:01-basics.Rmd-->
# Data Importing/Exporting and interaction with other programmes {#data-import}
Placeholder
## Libraries
## Star functions
## Navigating folders
### Down
### Up
## .rds
## .csv
## .xlsx and .xls
### Single worksheet - single workbook
### Single worksheet - many workbooks
### Many worksheets - single workbook
### Many worksheets - many workbooks
### Non-rectangular data - single worksheet - single workbook
### Non-rectangular data - single worksheet - many workbooks
### Non-rectangular data - many worksheets - single workbook
### Non-rectangular data - many worksheets - many workbooks
## Exporting to .xlsx
## .sav
## SQL
## GCP
### BigQuery
## Copy data to clipboard
<!--chapter:end:02-data-import.Rmd-->
# Table/Data Frame manipulation {#tables}
Placeholder
## Pivot and reshape tables
## Dropping and selecting columns
## Rename variables
## Filtering data
## Group data
## Order data
## Get counts of data
## Combine tables
## Joining tables
## Select specific columns in a join
## Sum rows or columns
### Sum rows
### Sum columns
## Replace NAs or other values
## Reordering rows/columns
### Reordering rows
### Reordering columns
## Creating new variables
## Summarising data
## Look up tables
<!--chapter:end:03-table-manipulation.Rmd-->
# Working with dates and times {#dates-times}
Placeholder
## Working with dates
### Converting a character to a date
### Get year from date
### Get month from date
### Get day from date
### Get weekday from date
### Get quarter from date
### Find difference between two dates
### Convert month (integer to character)
### Convert month (character to integer)
### Merge separate date information into a date
## Working with date-times
### Converting a character to a date-time
### Extract date from date time variable
### Convert character to hms (time) variable
### Extract hour from time
### Extract minute from time
### Extract second from time
### Merge separate time information into one variable
### find the difference between two times
<!--chapter:end:04-dates.Rmd-->
# Working with factors {#factors}
Placeholder
## Common uses
### Tabulation of data
### Ordering of data for output
### Statistical models
## Other things to know about factors
### Renaming factor levels
### Combining factors does not result in a factor
### Numeric vectors that have been read as factors
## Helpful packages
<!--chapter:end:05-factors.Rmd-->
# Plotting and Data Visualisations {#plots}
Placeholder
## Plotting in base R
### Line charts in base R ###
### Bar charts in base R ###
## Plotting withh {ggplot2}
### Line charts with {ggplot2}
### Aesthetic mappings
### Bar charts with {ggplot2}
## DfT colours
## DfT Theme
### Custom DfT Theme
## More {ggplot2} charts
### Scatter plots
### Horizontal bar chart
### Stacked bar chart
## Interactive charts with {plotly}
### {plotly} with {ggplot2}
### plot_ly()
## Mapping in R
### Mapping with {ggplot2}
### Mapping with {leaflet}
<!--chapter:end:06-plotting.Rmd-->
# DfT RAP guidance {#RAP}
Placeholder
## What
## When
## Additional resources
<!--chapter:end:07-RAP.Rmd-->
# Spatial Analysis in R {#spatial}
```{r setup and read in zapmap, echo=FALSE, include=FALSE}
library(dplyr) #manipulating data
library(readr) #reading in data
library(tmap) ## mapping software
library(sf)
#library(rmapshaper)
knitr::opts_chunk$set(echo = FALSE)
Sys.setenv(TZ='UTC')
```
## What is Spatial Data?
For most work in R we work with flat data, i.e. data that is only 1 or 2 dimensions. A data frame, for example, has two dimensions (rows representing observations and columns representing variables), and no other information outside of that two dimensional array. Spatial data however needs some extra information for it to accurately represent actual locations on the Earth: the coordinates of the object; and a system of reference for how the coordinates relate to a physical location on Earth.
## Packages
To hold spatial data, we need to leverage packages which exist outside of Base R. We will be using the {sf} package for these example, but note that it is common to see spatial data held in the {sp}, {geojson}, and {raster} packages as well, all of which have their own advantages and disadvantages (it is not uncommon to have to switch from one to another to leverage these advantages - more on that later).
The advantage of using the {sf} package is that data is kept in as similar a format to a flat data frame as possible. This makes it easier to work with our data for two reasons. First, it lets us use {dplyr} data manipulations or join our spatial data with non-spatial data (some spatial data formats don't let you do this). Second, it's just simplier to wrap your head around.
## Reading in Spatial Data
Let's have a look at some data. We can read in our data as a spatial object using the **st_read** funciton from the {sf} package (if it is saved in an .shp format and has necessary metadata). We'll be working with some shapefiles which represent local authorities which are sourced from the ONS.
*Reading in Spatial Data*
```{r read in data, echo = TRUE, eval = TRUE, error = FALSE}
LAs <- sf::st_read("data/ONS Geography")
head(LAs)
```
Looking at the message we get when we load in our data, we can find out some basic information about our sf dataframe:
* It tells us that it is a "Simple feature collection with 391 features and 10 fields". In other words, this is telling us that it is an sf dataframe of 391 observations (features) and 10 variables (fields).
* The geometry type is "MULTIPOLYGON". To break this down, the "POLYGON" is telling us we're dealing with distinct shapes for each observation (as opposed to lines or dots), and the "MULTI" is telling us that one observation can be represented by multiple polygons. An example of this would be the Shetland Islands, a single local authority made up of many small polygons.
* The "bbox"" gives us the coordinates of a bounding box which will contain all the features of the sf data frame.
* The "epsg" and "proj4string" gives us the coordinate reference system. The most common code you will see is 4326 which refers to the "World Geodetic System 1984". This is the coordinate reference system you are probably most familiar with (without realising it), as it is used in most GPS and mapping software. Points in the WGS system are referred to by their "latitude" and "longitude". In this case, the epsg code is "NA", in which case we will have to refer to the documentation to discern the correct epsg code to apply to the data.
We can also ask R to give us all of this info with the **st_geometry** function.
*Inspect spatial data*
```{r echo = TRUE, eval = TRUE, error = FALSE}
sf::st_geometry(LAs)
```
## CRS Projections
[There's a lot that can be said](https://www.axismaps.com/guide/general/map-projections/) about the utlity of different Coordinate Reference Systems, but long-story-short they all have unique advantages and disadvantages as they distort shape, distance, or size in different ways. In our day-to-day work, you're most likely to encounter two coordinate reference systems: the World Geodatic System (WGS84 - EPSG code: 4326), or the Ordance Survey Great Britain 1936 system (OSGB36 - EPSG code: 27700).
The WGS84 system is best suited to global data - for example plotting international ports or a world map of countries. It is a geodatic system, which means the points it defines refer to a point on a 3d elipsoid which represents the earth.
The OSGB1936 system is best suited to mapping the UK or areas within it. It is a projected system, which means it projects the 3d curve of the earth into a 2d object. An advantage of this is that eastings (the x-axis of this projection) remains consistent at all northings (the y-axis). For example, a 1000 unit change in eastings is a 1km change *anywhere*, whereas a 1 unit change in longitude is a different distance depending on the latitude, as the earth gets narrower the further away you get from the equator.
Complexities aside, practically what this means is that you will always want *all* of you're spatial data to be in the same CRS, otherwise spatial joins and mapping functions will start throwing up errors. We can change the crs projection very easily with the {sf} function **st_transform**.
*Changing Coordinate Reference System*
```{r echo = TRUE, eval = TRUE, error = FALSE}
LAs <- sf::st_transform(LAs, crs = 27700)
sf::st_crs(LAs)
```
This changes the CRS projection from the WGS84 system to the OSGB1936 system, converting all of the coordinates to the new system. We can use the crs argument to change to any number of different projections.
Note that this only works if the spatial object has data on the coordinate reference system already. If your data is missing *all* of this information, you will have to set the crs manually (if you know what the CRS projection *should be*):
*Setting Coordinate Reference System*
```{r echo = TRUE, eval = FALSE, error = FALSE}
sf::st_crs(sf_object) <- 27700
```
## Manipulating Spatial Data
As mentiond above, one of the key advantages of working with spatial data in {sf} format is that we can use tidyverse functions to manipulate our data. Let's show this in action by creating a new variable.
The variable "lad19cd" has a unique code for each local authority. All codes begin with a letter, followed by a numeric series. The letter refers to the country ("E" for England, "W" for Wales, "S" for Scotland, and "N" for Northern Ireland). We can use this to create a new variable called "country" for each feature.
*Manipulating {sf} data with dplyr*
```{r, echo = TRUE, eval = TRUE, error = FALSE}
LAs <- LAs %>% dplyr::mutate(country = dplyr::case_when(stringr::str_detect(lad19cd, "W") ~ "Wales",
stringr::str_detect(lad19cd, "S") ~ "Scotland",
stringr::str_detect(lad19cd, "N") ~ "Northern Ireland",
stringr::str_detect(lad19cd, "E") ~ "England"))
LAs %>% dplyr::group_by(country) %>% dplyr::tally()
```
As we can see, we've manipulated our data in exactly the same way as we would have with a flat dataframe.We can also join spatial and non-spatial data together in the same way we normally would with dataframes.
Let's take some population data and join it to our Local Authorities.
*Joining Spatial and non-spatial data*
```{r, echo = FALSE, eval = TRUE, error = FALSE, warning = FALSE, message = FALSE}
LAs_pop <- read_csv("data/LAs_Populations.csv")
LAs_pop <- LAs_pop %>%
dplyr::rename(lad19cd = Code, population = `All ages`, LA_name = Name)
LAs_pop <- LAs_pop %>%
dplyr::select(lad19cd, population)
```
```{r, echo = TRUE, eval = TRUE, error = FALSE, message = FALSE}
summary(LAs_pop)
```
This dataframe of local authority populations has 2 variables, local authority code (lad19cd) and population. We can use a {dplyr} **left_join** to join this to our {sf} dataframe as we would with a flat dataframe.
```{r, echo = TRUE, eval = TRUE, error = FALSE}
names(LAs_pop)
names(LAs)
LAs <- dplyr::left_join(LAs, LAs_pop, by = "lad19cd")
```
```{r, echo = FALSE, eval = TRUE, error = FALSE}
LAs$population <- as.numeric(LAs$population)
```
## Spatial Analysis - Joins
A common task we might want to conduct is to manipulated a spatial object based on a relationship with another spatial object. To do this, we will use the **st_join** function from the {sf} package. The **st_join** allows us to join data on a number of different relationships, the simplest of which examines whether an object intersects another object - i.e. if there is any point where the geometries of both objects are identical. This could be where two lines cross, where two different spatial points objects have the same gemoetry, or it could be a point falling within the bounds of a polygon.
Let's look at some data representing the coordinates of road traffic accidents in Great Britain.
*Read in csv*
```{r, echo = FALSE, eval = TRUE, error = FALSE, warning = FALSE}
crash_data <- read.csv("data/road_traffic_accidents.csv")
crash_data <- crash_data %>% dplyr::select(-Longitude, -Latitude, -Local_Authority_.District., -Local_Authority_.Highway.)
crash_data$Date <- as.Date(crash_data$Date)
crash_data <- crash_data %>% dplyr::filter(Date >= as.Date("01/01/2018"))
```
```{r, echo = TRUE, eval = TRUE, error = FALSE, warning = FALSE}
class(crash_data)
names(crash_data)
```
Our crash data is in a flat csv format, BUT it has variables representing the easting and northing of where each acccident took place. We can use this to turn our crash data into an sf object with the function **st_as_sf**. We take our dataset, define which variables R should use to define coordinates in the "coords" argument, and define the "crs" that those coordinates are plotted in.
*Transforming flat data frames to spatial data*
```{r, echo = TRUE, eval = TRUE, error = FALSE, warning = FALSE}
crashes <- crash_data %>% dplyr::filter(!is.na(Location_Easting_OSGR) & !is.na(Location_Northing_OSGR)) %>%
sf::st_as_sf(coords = c("Location_Easting_OSGR","Location_Northing_OSGR"), crs = 27700)
sf::st_geometry(crashes)
```
The crashes sf dataframe holds information on all reported road traffic accidents in the UK since 2005. It has a number of variables for each accident, such as the accident severity, number of vehicles, number of casualties, time of day, and road and weather conditions.
Let's have a look at the most severe accidents (accidents with 5 or more casualties).
*Quick map of spatial data*
``` {r, echo = TRUE, eval = TRUE, error = FALSE, warning = FALSE}
crashes %>% dplyr::filter(Number_of_Casualties == 5) %>% tmap::qtm()
```
The **qtm** function is a useful function from {tmap} that let's us quickly create maps with a minimum number of arguments. This is often useful for sense-checking data and can help you spot errors in data manipulation when working with spatial data. Here, for example, we can see that our data largely makes sense, so our conversion to {sf} was likely correct. We can (roughly) make out the shape of Great Britain, and we see some clusters around major cities (London, Birmingham, Manchester etc.). We can also see that we don't appear to have data for Northern Ireland in this dataset (either that or Northern Irish drivers are much safer drivers).
Let's see can we work out which Local Authority has had the most road traffic casualties in this period.
We can use a **spatial join** to assign a local authority to each road traffic accident. This works similarly to a {dplyr} left_join, with all of the variables from the **y** table being added to each observation of the **x** table which match eachother by a defined variable. The difference here is that the two tables are joined based on a shared **geometry**, rather than a shared value of a variable. The shared geometry here being whether a point intersects with a polygon (in other words, whether the point falls within a polygon).
*Joining spatial data*
```{r, echo = TRUE, eval = TRUE, error = FALSE, warning = FALSE}
crashes_join <- sf::st_join(crashes, LAs, join = st_intersects) # essentially a left_join of LAs to crashes
crashes_join$lad19nm <- as.character(crashes_join$lad19nm)
LAs_casualties <- crashes_join %>% dplyr::group_by(lad19nm) %>% dplyr::summarise(casualties = sum(Number_of_Casualties)) # making an sf data frame called LAs_casualties which is the result of a simple group_by and summarise.
LAs_casualties %>% dplyr::arrange(desc(casualties))
```
We can see that Birmingham has had more casualties in this period than any other local authority.
The *LAs_casualties* dataset that we created above has the same local authorities names as our original *LAs* sf dataframe, meaning we can easily left_join this table to our original LAs_shapefile. But first, we will turn it into a flat dataframe (ie, with no geographic information) with the **st_set_geometry** function. If we did not get rid of the geometry before left_joining, we would create a "geometry_collection" object for each local authority, which would be a mess of polygons and points within those polygons for each feature. This is because our **group_by** function above also grouped together the geometries of the observations it was grouping together. If we look at the outout above, we can see that the geometry type of LAs_casualties is "MULTIPOINT", so the "Birmingham" observation has 3,532 points representing where each road accident happened, rather than outlining the polygon of Birmingham in any meaningful way.
```{r, echo = TRUE, eval = TRUE, error = FALSE, warning = FALSE}
LAs_casualties <- LAs_casualties %>% sf::st_set_geometry(NULL)
LAs <- dplyr::left_join(LAs, LAs_casualties, by = "lad19nm")
```
We now have our casualties data joined to our original local authorities shapefile. We can now use this to make graphical representations of our data.
## Plotting and simplifying
Let's have a look at one of our Local Authorities in isolation, Cornwall.
```{r, echo = FALSE, eval = TRUE, error = FALSE}
LAs %>% dplyr::filter(lad19nm == "Cornwall") %>% tmap::qtm()
```
We can see that our polygon is quite detailed. While this detail is desirable for when we conduct our spatial analysis, if we want to plot all of our local authorities at the same time, it can be better to plot simplified polygons. This is for 3 reasons. First, simplification means that R can create and output plots quicker; Second, outputs will have a smaller file size (particularly useful for interactive plots or when we share our outputs); and third, sometimes simplified polygons actually look better in plots as too-detailed borders can come out looking murky due to over-plotting, particularly in static plots.
To alleviate this, we can simplify our geometry features before plotting. For this, we will use the function **ms_simplify** from the package {rmapshaper}. This function simplifies spatial data while maintaining key topology. Simplifying functions in other packages can do funny things such leave gaps between polygons with internal borders when simplifying. **ms_simplify** doesn't do this.
*Simplifying Spatial Data*
```{r, echo = TRUE, eval = TRUE, error = FALSE}
#LAs_simp <- rmapshaper::ms_simplify(LAs, keep = 0.05, keep_shapes = TRUE)
```
This command creates a new sf data frame called "LAs_simp", which has all the same information as the original LAs sf dataframe, but has a simplified geometry with fewer points. The "keep" argument specifies how much geographic information we want to hold onto. Here we only keep 5% of the original points. The "keep_shapes" argument specifies whether or not we want to allow the function to delete some features entirely for simplification purposes. This can be useful for polygons with lots of small islands as individual entries (where you are indifferent to whether or not you keep all islands in your dataset), BUT use with caution, as if you simplify too much R might delete entire features (observations) to fulfill the simplification. Here we set "keep_shapes = TRUE"" so we don't lose any features.
Let's have a look at the difference between our original and our simplified polygon:
```{r, echo = FALSE, eval = TRUE}
Cornwall1 <- LAs %>% dplyr::filter(lad19nm == "Cornwall") %>% tmap::qtm(title = "Original")
#Cornwall2 <- LAs_simp %>% dplyr::filter(lad19nm == "Cornwall") %>% tmap::qtm(title = "5%")
#tmap::tmap_arrange(Cornwall1, Cornwall2)
```
Even with only 5% of the original points, we can see that we've held onto a lot of the information that we'd want if we were plotting these polygons.
However, we should always use simplification with caution, and only simplify our spatial objects *after* we've carried out spatial analysis. Note that we can also simplify spatial data which are lines, reducing the number of vertices in lines, but we cannot simplify spatial points data, as their geometries (a single point for each entry) is already as small as it can be.
## Mapping Spatial Data
There are a number of packages that you can use to make maps with spatial data in R, including {leaflet}, {ggmap}, {mapdeck}, and {ggplot2}. Here, we will be using the package {tmap}. {tmap} is a highly versatile package for mapping spatial data, with a number of advantages over other packages:
* {tmap} uses a similar grammar to {ggplot2}, where aesthetic layers are layered on top of each other;
* the same piece of {tmap} code can be used to make either static or interactive maps, and;
* {tmap} can work with spatial data in a number of different formats, including {sp}, {raster}, {geojson} and {sf}.
Similar to using {ggplot2}, the first layer of a tmap defines the spatial object that we wish to plot in the function **tm_shape**. We then add (+) an aesthetic element based on this spatial object Most often this aesthetic element will be one of type **tm_polygons**, **tm_line**, or **tm_dots**, depending on the spatial data type.
Let's see this in action, and map our local authorities based on how many road traffic casualties there were in each.
*Create static map in tmap*
```{r, echo = TRUE, eval = TRUE, error = FALSE, warning = FALSE}
#LAs <- rmapshaper::ms_simplify(LAs, keep = 0.05, keep_shapes = TRUE) # Simplifying polygons. Remember to only do this after you've finished your spatial analysis, or to save the simplified version as a different name.
tmap::tm_shape(LAs) + # defining what shapefile to use
tmap::tm_polygons(col = "casualties", # setting the variable to map the colour aesthetic to. tmap takes variable names in quotation marks
style = "quantile", # the style option lets us define how we want the variable split
n = 5, # number of quantiles
pal = "YlGnBu" # choose from a number of palettes available from rcolorbrewer, or define your own
) +
tmap::tm_borders(col = "#000000", # defining polygon border as black
lwd = 0.3 # setting border width
) +
tmap::tm_layout(legend.outside = TRUE # putting the legend outside the plot rather than inside
)
```
One problem with this plot is that we still have Northern Ireland in the plot, and we only have NAs for northern Ireland (as our casualties dataset didn't have accidents which happened there). To fix this, we can utilise a simple {dplyr} filter to remove features which have an *NA* value for casualties. We can then pipe a filtered sf dataframe into the **tm_shape** function.
```{r, echo = TRUE, eval = TRUE, error = FALSE, warning = FALSE}
LAs %>%
dplyr::filter(!is.na(casualties)) %>%
tm_shape() + # defining what shapefile to use
tm_polygons(col = "casualties", # setting the variable to map the colour aesthetic to. tmap takes variable names in quotation marks
style = "quantile", # the style option lets us define how we want the variable split
n = 5, # number of quantiles
pal = "YlGnBu" # choose from a number of palettes available from rcolorbrewer, or define your own
) +
tm_borders(col = "#000000", # defining polygon border as black
lwd = 0.3 # setting border width
) +
tm_layout(legend.outside = TRUE # putting the legend outside the plot rather than inside
)
```
Though this plot is useful, and gives us a sense of the distribution of accidents, it's difficult to make out values of smaller local authorities. What would make this easier is if we changed our map to an interactive map. We can do this very easily, using the same chunk of code, by simply setting the **tmap_mode** to "view".
*Create interactive map in tmap*
```{r, echo = TRUE, eval = TRUE, error = FALSE, warning = TRUE}
tmap_mode("view")
LAs %>%
filter(!is.na(casualties)) %>%
tm_shape() + # defining what shapefile to use
tm_polygons(col = "casualties", # setting the variable to map the colour aesthetic to. tmap takes variable names in quotation marks.
style = "quantile", # the style option lets us define how we want the variable split
n = 5, # number of quantiles
pal = "YlGnBu" # choose from a number of palettes available from rcolorbrewer, or define your own
) +
LAs %>%
filter(!is.na(casualties)) %>%
tm_shape() +
tm_borders(col = "#000000", # defining polygon border as black
lwd = 0.3 # setting border width
) +
tm_layout(legend.outside = TRUE # putting the legend outside the plot rather than inside
)
```
It's worth noting that some arguments in {tmap} are only availble in static plots, and some only available in interactive maps. However, R will just skip the arguments that don't apply, so it wont break your code to leave them in. Here, the "legend.outside" argument is meaningless in an interactive plot, so is skipped by R when creating this interactive plot.
A problem with this interact plot is that when you click on a local authority, you can't find out it's name. The default popup variable is the object id, because it's the first variable in the sf dataframe, which isn't very useful. We can manually set the popup variables instead.
We can also make the map look a bit nicer by increasing the transparency of the polygons with the "alpha" argument.
Finally we can pick from any number of basemaps. {tmap} leverages the {leaflet} package for basemaps, and you can find a list of available basemaps [here](https://leaflet-extras.github.io/leaflet-providers/preview/).
*Improving Interactive Map*
```{r, echo = TRUE, eval = TRUE, error = FALSE, warning = TRUE}
tm_basemap(server = "CartoDB.Positron") + # defining basemap
LAs %>%
filter(!is.na(casualties)) %>%
tm_shape() + # defining what shapefile to use
tm_polygons(col = "casualties", # setting the variable to map the colour aesthetic to. tmap takes variable names in quotation marks.
style = "quantile", # the style option lets us define how we want the variable split
n = 5, # number of quantiles
pal = "YlGnBu", # choose from a number of palettes available from rcolorbrewer, or define your own
alpha = 0.4, # setting transparency
popup.vars = c("Local Authority" = "lad19nm", "casualties"), # setting variables to be seen on click
id = "lad19nm" # Setting a variable to display when hovered over
)
```
There are myriad aesthetic possibilities in {tmap} and related packages. It's not uncommon for a map to be built first in {tmap}, before exported as {leaflet} map and further edited in the {leaflet} package, though we wont cover that here. It's also possible to plot more than one spatial object at the same time. For example, layering polygons, points and lines from various shapefiles. Simply define the new spatial object with **tm_shape** and add aesthetics as before.
Let's see this in action, adding a layer to our map with the worst car accidents in our dataset (accidents with over 10 casualties).
*Mapping multiple spatial objects*
```{r, echo = TRUE, eval = TRUE, error = FALSE, warning = TRUE}
tm_basemap(server = "CartoDB.Positron") + # defining basemap
LAs %>%
filter(!is.na(casualties)) %>%
tm_shape() + # defining what shapefile to use
tm_polygons(col = "casualties", # setting the variable to map the colour aesthetic to. tmap takes variable names in quotation marks.
style = "quantile", # the style option lets us define how we want the variable split
n = 5, # number of quantiles
pal = "YlGnBu", # choose from a number of palettes available from rcolorbrewer, or define your own
alpha = 0.4, # setting transparency
popup.vars = c("Local Authority" = "lad19nm", "casualties"),
id = "lad19nm") +
crashes %>% filter(Number_of_Casualties == 10) %>%
tm_shape() + # defining our second shapefile to use and plot in the same output as above
tm_dots(col = "red", # calling tm_dots on point objects
alpha = 0.7, # making the dots partially transparent
jitter = 0.05, # adding a small amount of random jittering to points so overlayed points are visible
id = "Number_of_Casualties"
)
```
## Other Spatial Types
We have worked only with the {sf} package here, but it's important to be aware of different types of spatial data formats that exist in R.
Most similar to the {sf} package is {sp}. sp objects hold the different spatial and non-spatial components of a dataset in different "slots", creating a hierarchical data structure, where an observation in one slot is linked to an observation in another slot. This works similarly to a relational database in SQL. For example, if our local authorities were in sp format, we would access names of local authorities by calling something like "LAs_sp@data$lad19nm". The "@" symbol specifies we want the dataframe "slot", and then we call the variable with the "$" as usual. Other slots in an sp object are the bounding box, the proj4string (a way of defining the projection, essentially a long-form crs), and the spatial data (points, polygons etc.). A drawback of holding data like this is that you have to use base R to manipulate data. {dplyr} will not work with sp objects, making sp objects more cumbersome to work with.
{sf} was developed to replace {sp}, but as {sf} is a relatively new package, there may be times when you encounter a function you need that only works on an {sp} object. When this happens, we can quickly convert objects back and forth between *sf* and *sp* with the following commands:
*Converting data from sf to sp objects*
```{r, echo = TRUE, eval = FALSE, error = FALSE, warning = TRUE}
object_sp <- as(object_sf, class = "SPATIAL") # sf to sp object
object_sf <- st_as_sf(object_sp) # sp to sf object
```
Another type of spatial data package to be aware of is the {raster} package. Raster objects are in a gridded data format: they define a rectangle, with a point of origin and an end point, and then define the number of rows and columns to split that rectangle into, making a grid of equal cell sizes. This format spares R from having to plot a polygon for every single grid square. It's often used for altitude or temperature data, but can be used for a wide range of data (eg emissions, or calculating distance from features of interest across a defined spatial area). Raster objects are held in the {raster} package, though plotting can be done with {tmap}, {leaflet}, and a range of mapping packages.
## Further Resources
For a range of shapefiles, the [ONS data portal](http://geoportal.statistics.gov.uk/) is a good place to look. It has local authorities with joinable data on average wages, populations, etc. as well as a number of other shapefiles for points/areas of interest such as national parks, hospitals, output areas of different sizes, and rural/urban areas.
For resources on using tmap and other mapping functions, see [this online workshop](http://www.seascapemodels.org/data/data-wrangling-spatial-course.html) which covers a range of topics. It should take about 4 hours to do everything in the workshop (but also good to just find individual bits of code). It mostly uses {ggplot2} but it finishes with some work with {raster} and {tmap}.
There's also [this very good blog post](https://www.jessesadler.com/post/gis-with-r-intro/), which does a good job of peaking under the hood of how sf and sp objects work without getting *too* technical.
For a more detailed overview of mapping in R, see [this amazing online book](https://geocompr.robinlovelace.net/) from Robin Lovelace. Of particular interest are Chapters 5 and 8, but the whole book is useful.
<!--chapter:end:08-spatial.Rmd-->