7 Spatial data

From displaying simple point data to examining collision density along routes or between areas, the geographic property of STATS19 data is one of its most useful attributes. Mapping is a hugely useful and powerful aspect of R and has many applications in road safety, both in understanding geographic trends and presenting insight to colleagues. This aspect of R is covered in detail in the book Geocomputation With R (Lovelace, Nowosad, and Muenchow 2019). By mapping collision data in R, you can add layers containing other geographic datasets to further understand the reasons for certain trends. This can lead to new opportunities for intervention and collaboration with other parties who may have mutually compatible solutions for reaching their goals. We will use the following packages in this section:

library(sf)        # load the sf package for working with spatial data
library(tidyverse) # load the tidyverse as before

7.1 sf objects

All road crashes happen somewhere and, in the UK at least, all collisions recorded by the police are given geographic coordinates. These can help in prioritising interventions to save lives by focusing on ‘crash hotspots.’ R has strong geographic data capabilities with the sf package, providing a generic class for spatial vector data. Points, lines and polygons are represented in sf as objects in a special ‘geometry column’, typically called ‘geom’ or ‘geometry’, extending the data frame class we’ve already seen in crashes, created in Section 5.1 (repeated here to consolidate data frame creation):

crashes = tibble(
  casualty_type = c("pedestrian", "cyclist", "cat"),
  casualty_age = seq(from = 20, to = 60, by = 20),
  vehicle_type = c("car", "bus", "tank"),
  dark = c(TRUE, FALSE, TRUE)
)

Create an sf data frame called crashes_sf that expands the crashes data frame to include a geometry column based on the crashes longitude and latitude data as follows:

crashes_sf = crashes # create copy of crashes dataset
crashes_sf$longitude = c(-1.3, -1.2, -1.1)
crashes_sf$latitude = c(50.7, 50.7, 50.68)
crashes_sf = st_as_sf(crashes_sf, coords = c("longitude", "latitude"), crs = 4326) # st_as_sf converts longitude and latitude coordinates into spatial objects using a specified Coordinate Reference System (see 7.6)
# plot(crashes_sf[1:4]) # basic plot
# mapview::mapview(crashes_sf) # for interactive map

Exercises:

  1. Plot only the geometry column of crashes_sf (Hint: the solution may contain $geometry). If the result is like that in Figure 7.1, congratulations, it worked!
  2. Plot crashes_sf, only showing the age variable.
  3. Plot the 2nd and 3rd crashes, showing which happened in the dark.
  4. Bonus: How far apart are the points? (Hint: sf functions begin with st_)
  5. Bonus: Near which settlement did the tank runover the cat?
The `crashes_sf` dataset shown in map form with the function `plot()`.The `crashes_sf` dataset shown in map form with the function `plot()`.The `crashes_sf` dataset shown in map form with the function `plot()`.

Figure 7.1: The crashes_sf dataset shown in map form with the function plot().

7.2 Reading and writing spatial data

You can read and write spatial data with read_sf() and write_sf(), as shown below (see ?read_sf):

write_sf(zones, "zones.geojson") # saves the spatial dataset called zones as geojson file type
write_sf(zones, "zmapinfo", driver = "MapInfo file") # saves the dataset as a MapInfo file
read_sf("zmapinfo") # reads-in mapinfo file

See Chapter 6 of Geocomputation with R for further information (Lovelace, Nowosad, and Muenchow 2019).

7.3 sf polygons

sf objects can also represent administrative zones. This is illustrated below with reference to zones, a spatial object representing the Isle of Wight, that we will download using the pct package (Note: the [1:9] appended to the function selects only the first 9 columns).

zones = pct::get_pct_zones("isle-of-wight")[1:9]

Exercises:

  1. What is the class of the zones object?
  2. What are its column names?
  3. Print its first 2 rows and columns 6:8 (the result is below).
## Simple feature collection with 2 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -1.301131 ymin: 50.69052 xmax: -1.28837 ymax: 50.70547
## geographic CRS: WGS 84
## # A tibble: 2 x 6
##   geo_code    all bicycle  foot car_driver                              geometry
##   <chr>     <dbl>   <dbl> <dbl>      <dbl>                    <MULTIPOLYGON [°]>
## 1 E01017326   698      23   285        286 (((-1.289993 50.69766, -1.290177 50.…
## 2 E01017327   720      25   225        374 (((-1.295712 50.69383, -1.29873 50.6…

7.4 Spatial subsetting and sf plotting

Like index and value subsetting, spatial subsetting can be done with the [] notation. We can identify the crashes (crashes_sf) that occur in the Isle of Wight (zones) by subsetting as follows (i.e. subset zones by whether it contains data in crashes_sf):

zones_containing_crashes = zones[crashes_sf, ]

To plot a new layer on top of an existing sf plot, use the add = TRUE argument, e.g. as follows:

plot(zones$geometry) # plot just the geometry of one layer
plot(zones_containing_crashes$geometry, col = "grey", add = TRUE)

Remember to plot only the geometry column of objects to avoid multiple maps. Colours can be set with the col argument.

Exercises:

  1. Plot the geometry of the zones, with the zones containing crashes overlaid on top in red (see Figure 7.2).
  2. Plot the zone containing the 2nd crash in blue (see Figure 7.2).
  3. Bonus: Plot all zones that intersect with a zone containing crashes, with the actual crash points plotted in black (see Figure 7.2).
Illustration of the results of spatial subsetting.Illustration of the results of spatial subsetting.Illustration of the results of spatial subsetting.

Figure 7.2: Illustration of the results of spatial subsetting.

7.5 Geographic joins

Geographic joins involve assigning values from one object to a new column in another, based on the geographic relationship between them. With sf objects, the data from the crashes_sf dataset is joined onto the ‘target’ zones dataset, to create a new object called zones_joined:

zones_joined = st_join(zones[1], crashes_sf)

The above code takes the geo_code column data from zones, matches it to the geometry column in crashes_sf and then joins it to the crashes that have occurred in those geo_codes. The matched, joined geo_code is a new column in the zone_joined dataset. We now know the administrative geo_code in which each crash occured.

Exercises:

  1. Plot the casualty_age variable of the new zones_joined object (see Figure 7.3, to verify the result).
  2. How many zones are returned in the previous command?
  3. Select only the geo_code column from the zones and the dark column from crashes_sf and use the left = FALSE argument to return only zones in which crashes occurred. Plot the result. (Hint: it should look like the map shown in Figure 7.3)

See Chapter 4 of Geocomputation with R (Lovelace, Nowosad, and Muenchow 2019) for further information on geographic joins.

Illustration of geographic joins.Illustration of geographic joins.

Figure 7.3: Illustration of geographic joins.

7.6 Coordinate Reference Systems

A Coordinate Reference Systems (CRS) is used for plotting data on maps. There are many systems in use but they can generally be classified into two groups; ‘projected’ and ‘geographic’. A projected system, such as Eastings/Northings, plots locations onto a flat 2D projection of the Earth’s surface. A geographic system, such as Longitude/Latitude, refers to locations on the 3D surface of the globe. Distance and direction calculations work differently between geographic and projected CRSs, so it is often necessary to convert from one to another. Fortunately, R makes this very easy, and every CRS has its own unique reference number. For example, 27700 for the British National Grid system.

CRSs define how two-dimensional points (such as longitude and latitude) are actually represented in the real world. A CRS value is needed to interpret and give true meaning to coordinates. You can get and set CRSs with the command st_crs(). Transform CRSs with the command st_transform(), as demonstrated in the code chunk below, which converts the ‘lon/lat’ geographic CRS of crashes_sf into the projected CRS of the British National Grid:

crashes_osgb = st_transform(crashes_sf, 27700)

Exercises:

  1. Try to subset the zones with the crashes_osgb. What does the error message say?
  2. Create zones_osgb by transforming the zones object.
  3. Bonus: Use st_crs() to find out the units measurement of the British National Grid.

For more information on CRSs see Chapter 6 of Geocompuation with R (Lovelace, Nowosad, and Muenchow 2019).

7.7 Buffers

Buffers are polygons surrounding geometries, usually with fixed distance. For example, in road safety research a 30m buffer can be created around crash locations to identify crashes that happened in close proximity to a particular junction or road segment.

Exercises:

  1. Find out and read the help page of sf’s buffer function.
  2. Create an object called crashes_1km_buffer representing the area within 1 km of the crashes_osgb object and plot the result using the command: plot(crashes_1km_buffer). As a fun bonus, try: mapview::mapview(crashes_1km_buffer).
  3. Bonus: Try creating buffers on the geographic version of the crashes_sf object. What happens?

7.8 Attribute operations on sf objects

We can do non-spatial operations on sf objects because they are data.frames. Try the following attribute operations on the zones data:

# load example dataset if it doesn't already exist
zones = pct::get_pct_zones("isle-of-wight")
sel = zones$all > 3000  # create a subsetting object
zones_large = zones[sel, ] # subset areas with a population over 3,000
zones_2 = zones[zones$geo_name == "Isle of Wight 002",] # subset based on 'equality' query
zones_first_and_third_column = zones[c(1, 3)]
zones_just_all = zones["all"]

Exercises:

  1. Practice the subsetting techniques you have learned on the sf data.frame object zones:
    • Create an object called zones_small, which contains only regions with less than 3000 people in the all column.
    • Create a selection object called sel_high_car which is TRUE for regions with above median numbers of people who travel by car and FALSE otherwise.
    • Create an object called zones_foot which contains only the foot attribute from zones.
    • Bonus 1: plot zones_foot using the function plot to show where walking is a popular mode of travel to work.
    • Bonus 2: building on your answers to previous questions, use filter() from the dplyr package to subset small regions where car use is high.
  2. Bonus: What is the population density of each region (Hint: you may need to use the functions st_area(), as.numeric() and use the ‘all’ column)?
  3. Bonus: Which zone has the highest percentage of people who cycle?

7.9 Mapping road crash data

So far we have used the plot() function to make maps. That’s fine for basic visualisation, but for publication-quality maps we recommend using tmap. See Chapter 8 of Geocomputation with R (Lovelace, Nowosad, and Muenchow 2019) for further explanation and alternatives. After installation, load the package as follows:

library(tmap)
tmap_mode("plot") # this sets the tmap mode to plotting as opposed to interactive
## tmap mode set to plotting

Exercises:

  1. Create the plots of the zones object using plot() and tm_shape() + tm_polygons() functions (see Figure 7.4).
  2. Create an interactive version of the tmap plot by setting tmap_mode("view") and re-running the plotting commands.
  3. Add an additional layer to the interactive map showing the location of crashes, using marker and dot symbols.
  4. Bonus: Change the default basemap (Hint: you may need to search in the package documentation or online for the solution).
Illustration of the plot and tmap approaches for creating maps.Illustration of the plot and tmap approaches for creating maps.

Figure 7.4: Illustration of the plot and tmap approaches for creating maps.

7.10 Analysing point data

Based on the saying, “Don’t run before you can walk,” we’ve learned the vital foundations of R before tackling a real dataset. Temporal and spatial attributes are key to road crash data, hence the emphasis on lubridate and sf. Visualisation is central to understanding data and influencing policy, which is where tmap comes in. With these solid foundations, plus knowledge of how to ask for help (we recommend reading R’s internal help, asking colleagues, searching the internet and creating new questions or comments on online forums such as StackOverflow or GitHub and we suggest you follow this order of resources to get help), you are ready to test the methods on some real data.

Before doing so, take a read of the stats19 vignette, which can be launched as follows:

vignette(package = "stats19") # view all vignettes available on stats19
vignette("stats19") # view the introductory vignette

This should now be sufficient to tackle the following exercises:

  1. Download and plot all crashes reported in Great Britain in 2018. (Hint: see the stats19 vignette)
  2. Find the function in the stats19 package that converts a data.frame object into an sf data frame. Use this function to convert the road crashes into an sf object, called crashes_sf, for example.
  3. Filter crashes that happened in the Isle of Wight based on attribute data. (Hint: the relevant column contains the word local)
  4. Filter crashes happened in the Isle of Wight using geographic subsetting. (Hint: remember st_crs()?)
  5. Bonus: Which type of subsetting yielded more results and why?
  6. Bonus: How many crashes happened in each zone?
  7. Create a new column called month in the crash data using the function lubridate::month() and the date column.
  8. Create an object called a_zones_may representing all the crashes that happened in the Isle of Wight in the month of May.
  9. Bonus: Calculate the average (mean) speed limit associated with each crash that happened in May across the zones of the Isle of Wight (the result is shown in the map).

Other mapping functions that can be powerful presentation tools are also explained in Geocomputation With R (Lovelace, Nowosad, and Muenchow 2019). These include leaflet maps, which have interactive properties and can be shared as .html files, and shiny apps that give even more interactivity as well as the possibility to embed on websites.

7.11 Analysing crash data on road networks

Road network data can be accessed from a range of sources, including OpenStreetMap (OSM) and Ordnance Survey. We will use some OSM data from the Isle of Wight, which can be loaded as follows:

u = "https://github.com/ropensci/stats19/releases/download/1.1.0/roads_key.Rds" 
roads_wgs = readRDS(url(u))
roads = roads_wgs %>% st_transform(crs = 27700)

You should already have road crashes for the Isle of Wight from the previous stage. If not, load crash data as follows:

u = "https://github.com/ropensci/stats19/releases/download/1.1.0/car_accidents_2017_iow.Rds"
crashes_iow = readRDS(url(u))
  1. Plot the roads with the crashes overlaid.
  2. Create a buffer around the roads with a distance of 200m.
  3. How many crashes fall outside the buffered roads?
  4. Bonus: Use the aggregate() function to identify how many crashes happened per segment and plot the result with tmap and plot the crashes that happened outside the road buffers on top, as shown in Figure 7.5 (hint: see ?aggregate.sf and take a read of Section 4.2.5 of Geocomputation with R (Lovelace, Nowosad, and Muenchow 2019)). Try undertaking the same steps for a different region of your choice. An example for Essex is shown in the code chunk, which results in Figure 7.6, (thanks to Will Cubbin).17
Maps of the Isle of Wight.Maps of the Isle of Wight.

Figure 7.5: Maps of the Isle of Wight.

remotes::install_github("itsleeds/osmextract") # install github package for osm data
library(osmextract)
region_name = "essex" # "essex" can be changed to another area name as required
osm_data = oe_get(region_name, extra_tags = c("maxspeed", "ref"))
table(osm_data$highway)
#filter osm_data to show only major roads
roads = osm_data %>%
  filter(str_detect(highway, pattern = "moto|prim|seco|tert|trunk"))
# transform geometry and save
roads = st_transform(roads, 27700) #converts to projected BNG system for later use
# plot(roads$geometry) # basic plot of roads
tm_shape(roads) + tm_lines("maxspeed", showNA = T, lwd = 2)
saveRDS(roads, file = "roads.Rds") # Saves road dataset for future use
Roads in Essex downloaded with the code shown above.

Figure 7.6: Roads in Essex downloaded with the code shown above.

Bonus exercises

Identify a region and zonal units of interest from http://geoportal.statistics.gov.uk/ or from the object police_boundaries in the stats19 package.

  1. Read them into R as an sf object.
  2. Create a map showing the number of crashes in each zone.
  3. Identify the average speed limit associated with crashes in each zone.
  4. Identify an interesting question you can ask to the data and use exploratory data analysis to find answers.
  5. Check another related project for further information on smoothing techniques of counts on a linear network.

  1. The code chunk downloads and filters road data to show only main roads, and saves as an .Rds file for later use in your R project. The geometry of the roads.Rds file from this example is a one dimensional line representing the centre of the road. Some functions require the road to be treated as 2D polygon with a width property. You can use the st_buffer() function to turn this vector into a polygon.↩︎