vignettes/pct.Rmd
Set eval=TRUE to run this code when knitting:
knitr::opts_chunk$set(eval = FALSE)
The goal of the pct package is to increase the accessibility and reproducibility of the outputs from the Propensity to Cycle Tool (PCT), a research project and web application hosted at www.pct.bike. The tool is one of just ~300 central government websites exempt from the requirement to transition to the .gov.uk domain name, and is a recommended source of evidence in the preparation of Local Cycling and Walking Infrastructure Plans (LCWIPs), as outlined in technical guidance [1] supporting the Cycling and Walking Investment Strategy (CWIS), established under the Infrastructure Act 2015. For an overview of the data provided by the PCT, visiting www.pct.bike and trying it out is a great place to start. An academic paper on the PCT provides detail on the motivations and methods underlying the project [2].
Since work on the package began in 2015, for example during the ODI Leeds Hack my Route hackathon (see an early prototype of the tool at https://twitter.com/robinlovelace/status/611979463803432960), the features of the PCT, and the demand for it, have evolved substantially. In early 2019 the School travel layer was added to the main PCT site to provide nationwide evidence on the potential benefits of cycling uptake scenarios, and on where safe routes to school should be prioritised [3]. In fact, a major aim of the PCT was to enable people to extend the tool [2]:
We envision stakeholders in local government modifying scenarios for their own purposes, and that academics in relevant fields may add new features and develop new use cases of the PCT.
Motivated by this vision of adaptable transport planning tools, this introductory vignette demonstrates how the package works with an example from the Isle of Wight, an island just off the southern coast of Britain, with a population of ~140,000 people. Before demonstrating some of the package’s key functions, it’s worth providing a little context.
The Propensity to Cycle Tool was commissioned by the UK’s Department for Transport to help planners and others prioritise investment and policies to get people cycling, as outlined in the Government report National propensity to cycle: full report with annexes [4]. However, the academic team leading the project had a wider aim: making transport evidence more accessible, encouraging evidence-based transport policies, and encouraging a more democratic transport planning process. That means open transport data and open source transport modelling tools [2].
The code base underlying the PCT is publicly available (see github.com/npct). However, the code hosted there is not easy to run or reproduce, which is where this package comes in: it provides quick access to the data underlying the PCT and enables some of the key results to be reproduced quickly. It was developed primarily for educational purposes (including for upcoming PCT training courses), but it may also be useful for people who want to build on the methods, for example to create a scenario of cycling uptake in their town, city or region.
In summary, if you want to know how the PCT works, reproduce some of its results, and build scenarios of cycling uptake to inform transport policies enabling cycling in cities worldwide, this package is for you!
You can install the development version of the package as follows:
remotes::install_github("ITSLeeds/pct")
Load the package as follows:
library(pct)
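We will also use the following packages in this tutorial:
library(dplyr)   # filter(), group_by(), summarise(), top_n() and the pipe
library(sf)      # st_length(), st_cast()
library(stplanr) # od2line(), od_oneway(), route(), overline()
library(leaflet) # interactive maps
library(ggplot2) # scatter plots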
User feedback suggests that access to the data is critical for decision making. One area where the package can therefore be useful is making the data easy to download and process.
To download the data within www.pct.bike, we have added a suite of functions:
get_pct()
get_pct_rnet()
get_pct_zones()
get_pct_lines()
get_pct_centroids()
get_pct_routes_fast()
get_pct_routes_quiet()
There are other get_() functions that get official data underlying the PCT, as we will see in a later section. For now, let’s see how the functions work. To get the centroids in the Isle of Wight at the lower-resolution (smaller files) MSOA level (LSOA-level data is returned by default, or when msoa is replaced with lsoa in the code below) you would run:
wight_centroids = get_pct_centroids(region = "isle-of-wight", geography = "msoa")
wight_zones = get_pct_zones(region = "isle-of-wight", geography = "msoa")
Let’s verify that the data gave us what we would expect to see:
The results are indeed as we would expect, with the centroid data showing points and the zone data showing zones. The zones with higher cycling levels are in the more densely populated south of the island, as we would expect. Likewise, the following command downloads the desire lines for the Isle of Wight:
wight_lines_pct = get_pct_lines(region = "isle-of-wight", geography = "msoa")
The rest of the get_pct_ functions are similar to the examples above and download data from www.pct.bike. However, the base of these functions is get_pct(), which takes the following arguments:
base_url = "https://github.com/npct/pct-outputs-regional-R/raw/master": in case you want to download the data from a similar server
purpose = "commute": soon there will be “schools” and maybe other purposes, but currently commute is the only option
geography = "msoa": MSOA or LSOA
region = NULL: one of the regions in pct::pct_regions
layer = NULL: one of z (zones), c (centroids), l (desire lines), rf (routes fast), rq (routes quiet) or rnet
extension = ".Rds": PCT data is available in various formats; for the purpose of this package the default is “Rds”
To compare the downloaded data with data in the PCT web app, we will take a subset of the wight_lines_pct dataset. The following code chunk takes the top 30 desire lines by number of commuters who cycle as their main mode. The reason for selecting the top 30 will become apparent (the wight_lines_30 object is provided in the pct package):
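The selection step can be sketched on toy data with base R. The `toy_lines` table and its values are made up for illustration; on the real data the same idea would presumably be applied to wight_lines_pct and its bicycle column.

```r
# Toy stand-in for a desire-lines table (values are made up for illustration)
set.seed(42)
toy_lines = data.frame(
  geo_code1 = sprintf("E%03d", 1:100),
  geo_code2 = sprintf("E%03d", 100:1),
  bicycle   = sample(0:200, 100, replace = TRUE)
)

# Order by number of cyclists (descending) and keep the first 30 rows
top_30 = head(toy_lines[order(toy_lines$bicycle, decreasing = TRUE), ], 30)

# Rows that were not selected, for comparison
rest = toy_lines[!rownames(toy_lines) %in% rownames(top_30), ]

nrow(top_30)
```

Note that this always returns exactly 30 rows, whereas dplyr's top_n(), used later in this vignette, keeps additional rows when values are tied at the cut-off.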
The resulting wight_lines_pct and wight_lines_30 datasets are available in the package. We’ll use the smaller one for speed. Note: these contain many variables, three of which (the number of people cycling, driving and walking along the desire lines from the 2011 Census) are shown below for the Isle of Wight:
lwd = wight_lines_30$all / mean(wight_lines_30$all) * 5
plot(wight_lines_30[c("bicycle", "car_driver", "foot")], lwd = lwd)
To provide another view of the data, focusing on cycling, let’s create a leaflet map:
pal = colorNumeric(palette = "RdYlBu", domain = wight_lines_30$bicycle)
leaflet(data = wight_lines_30) %>%
addTiles() %>%
addPolylines(weight = lwd,
color = ~ pal(bicycle)) %>%
addLegend(pal = pal, values = ~bicycle)
There was a reason for selecting the top 30 lines: it mirrors the view of the desire lines available from the PCT web application for the island, available at www.pct.bike/m/?r=isle-of-wight (note that Straight Lines is selected from the Cycling Flows dropdown menu in the image below, and by default shows the top 30 flows by number of bicycle trips).
The previous section showed that the get_pct*() functions download the results generated by the PCT. However, they do not reproduce those results from first principles using publicly available, official data. Underlying the PCT is origin-destination data from the 2011 Census. The MSOA-level data is open access, so we only provide access to this dataset. The following command gets the origin-destination data for the Isle of Wight:
wight_od_all = get_od(region = "wight")
summary(wight_od_all$geo_code1 %in% wight_centroids$geo_code)
summary(wight_od_all$geo_code2 %in% wight_centroids$geo_code)
Note that all the origin codes match the Isle of Wight centroid codes, but most of the destination zones do not. This is because many people on the island work outside the island. By default, get_od() returns only OD pairs in which the commute trips originate from the area entered.
To make the dataset smaller and simpler, let’s subset it so it only contains OD pairs in which the origin and destination are both on the island (the resulting wight_od data is provided in the package):
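A minimal sketch of this kind of subsetting, using base R on a toy OD table. The zone codes, trip counts and the `island_codes` vector are invented for illustration; on the real data the vector of island codes would presumably come from wight_centroids$geo_code.

```r
# Toy OD table: two island zones (W1, W2) and one mainland zone (M1)
toy_od_all = data.frame(
  geo_code1 = c("W1", "W1", "W2", "W2", "W1"),
  geo_code2 = c("W1", "W2", "W1", "M1", "M1"),
  all       = c(50, 20, 15, 5, 3),
  stringsAsFactors = FALSE
)
island_codes = c("W1", "W2")  # hypothetical stand-in for the island's zone codes

# Keep only rows whose origin AND destination are island zones
keep = toy_od_all$geo_code1 %in% island_codes &
  toy_od_all$geo_code2 %in% island_codes
toy_od = toy_od_all[keep, ]

toy_od
```

The two rows with a mainland destination are dropped, while the intra-zonal W1 to W1 row is retained.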
To convert the results to geographic desire lines, we can use the function od2line() from the stplanr package:
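Conceptually, a desire line is an OD pair with the coordinates of its origin and destination attached. The base R sketch below illustrates that join on toy data; it is not how od2line() is implemented, and all codes and coordinates are made up. On the real data the call would presumably take the form od2line(wight_od, wight_centroids), with the result stored as wight_lines, which is an assumption based on the object names used elsewhere in this vignette.

```r
# Toy zone centroids (coordinates are made up for illustration)
toy_centroids = data.frame(
  geo_code = c("W1", "W2", "W3"),
  x = c(-1.30, -1.17, -1.40),
  y = c(50.70, 50.65, 50.68)
)

# Toy OD pairs referring to those zones
toy_od = data.frame(
  geo_code1 = c("W1", "W2", "W3"),
  geo_code2 = c("W2", "W3", "W1"),
  all       = c(20, 15, 8)
)

# Look up the centroid row of each origin and each destination
o = match(toy_od$geo_code1, toy_centroids$geo_code)
d = match(toy_od$geo_code2, toy_centroids$geo_code)

# Attach start and end coordinates: each row now describes a straight line
toy_desire = cbind(
  toy_od,
  from_x = toy_centroids$x[o], from_y = toy_centroids$y[o],
  to_x   = toy_centroids$x[d], to_y   = toy_centroids$y[d]
)

toy_desire
```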
The previous code chunk downloads and processes 324 origin-destination pairs, representing inter-zonal commuting trips made by 42,139 people on the island (population: 140,000). By default, the function includes intra-zonal flows, but these can be omitted as follows (the argument omit_intrazonal in get_od() does the same thing):
wight_lines_census = wight_lines %>%
filter(geo_code1 != geo_code2)
nrow(wight_lines_census)
sum(wight_lines_census$all)
Another OD data processing step developed for the PCT combines the two directional lines between each pair of zones into a single line representing travel in both directions. This can be done as follows:
wight_lines_census1 = od_oneway(
wight_lines_census,
attrib = c("all", "bicycle")
)
nrow(wight_lines_census1) / nrow(wight_lines_census)
sum(wight_lines_census1$all) / sum(wight_lines_census$all)
Note that the resulting dataset contains 50% of the original number of lines but the same number of trips. This is because the 2 separate lines between each pair of zones have been converted into 1 line representing the combined number of trips in both directions. This step is not essential, but it has a couple of advantages: it was used in the PCT to make the routing more computationally efficient (avoiding computing the same route twice), and it makes visualising the lines and routes simpler.
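The idea behind this aggregation can be sketched in base R: build a key for each pair of zones that ignores direction, then sum the trip counts per key. This is an illustration on made-up data, not the implementation of od_oneway().

```r
# Toy directional lines (values are made up for illustration)
toy_lines = data.frame(
  geo_code1 = c("A", "B", "A", "C", "B"),
  geo_code2 = c("B", "A", "C", "A", "C"),
  all       = c(10, 6, 4, 2, 7),
  bicycle   = c(2, 1, 0, 1, 3),
  stringsAsFactors = FALSE
)

# Direction-independent key: alphabetically smaller code first
toy_lines$o = pmin(toy_lines$geo_code1, toy_lines$geo_code2)
toy_lines$d = pmax(toy_lines$geo_code1, toy_lines$geo_code2)

# Sum the attributes over both directions of each pair
toy_oneway = aggregate(cbind(all, bicycle) ~ o + d, data = toy_lines, FUN = sum)

nrow(toy_oneway) / nrow(toy_lines)        # share of lines remaining
sum(toy_oneway$all) / sum(toy_lines$all)  # trips are conserved
```

In this toy table the B to C pair exists in one direction only, so the number of lines falls to 60% rather than 50%; the total number of trips is unchanged either way.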
Now that the lines data contains data on 2 way trips between zones, we can estimate routes (note: the results on the PCT website contain estimated uptake levels from intrazonal flow). Visually, this involves converting the straight desire lines shown in the previous map into routes that can be cycled, as shown in the next code chunk. Note: this code does not run dynamically, because you need a CycleStreets.net API key for this, and it takes some time:
wight_routes_fast = route(
l = wight_lines_census1,
route_fun = cyclestreets::journey,
plan = "fastest")
You can download these routes as follows:
u = "https://github.com/ITSLeeds/pct/releases/download/0.5.0/wight_routes_fast.Rds"
wight_routes_fast = readRDS(url(u))
A sample of these is provided in the package as wight_routes_30_cs, which was generated as follows:
wight_routes_30_cs = wight_routes_fast %>%
group_by(geo_code1, geo_code2) %>%
summarise(
all = mean(all),
bicycle = mean(bicycle),
av_incline = weighted.mean(gradient_smooth, w = distances),
length = sum(distances),
time = sum(time)
) %>%
ungroup() %>%
top_n(30, bicycle)
A simple verification that we have the right desire lines matched to the routes involves plotting the Euclidean vs Route distance, e.g. as follows:
d = as.numeric(st_length(wight_lines_census_30)) / 1000
plot(d, wight_routes_30_cs$length / 1000, xlim = c(0, 10))
abline(a = 0, b = 1) # 1:1 reference line (intercept 0, slope 1)
How well does that match the route distance data downloaded from the PCT?
plot(wight_lines_30$rf_dist_km, wight_routes_30_cs$length)
Almost perfectly for most of the routes. Differences can be explained by changes in infrastructure since the PCT results were first generated (these will be updated in the Propensity to Cycle Tool on-line data later in 2019).
We now have everything needed to estimate cycling uptake for each desire line on the Isle of Wight (we’ll do the calculation on the top 30 by current cycling levels).
Functions named with uptake_*() estimate cycling uptake:
uptake_pct_godutch(): generates the “GoDutch” scenario level of cycling based on a particular route’s hilliness percentage and length.
uptake_pct_govtarget(): generates the UK government target scenario, again based on the hilliness and length parameters.
We will estimate cycling potential with uptake_pct_govtarget(), using the length and av_incline from the wight_routes_30_cs object.
pcycle_govtarget = uptake_pct_govtarget(
distance = wight_routes_30_cs$length,
gradient = wight_routes_30_cs$av_incline * 100
)
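To illustrate how a proportion of this kind feeds into a scenario, here is a minimal sketch with a hypothetical logistic function of distance and gradient. The function `toy_uptake()`, its functional form and its coefficients are placeholders invented for this example; they are not the model or the values used by uptake_pct_govtarget().

```r
# Hypothetical uptake function: coefficients are illustrative placeholders,
# NOT the values used by uptake_pct_govtarget()
toy_uptake = function(distance_km, gradient_pct) {
  logit = -4 - 0.6 * distance_km + 1.9 * sqrt(distance_km) - 0.27 * gradient_pct
  plogis(logit)  # inverse logit: maps any number to a proportion in (0, 1)
}

distance_km  = c(2, 5, 10)  # made-up route lengths
gradient_pct = c(1, 2, 4)   # made-up average gradients, in percent
p = toy_uptake(distance_km, gradient_pct)

# Scenario cycling = observed cyclists + uptake proportion * all commuters
all_trips = c(100, 80, 40)
bicycle   = c(5, 2, 1)
scenario  = bicycle + p * all_trips

round(cbind(p, scenario), 3)
```

The last step mirrors the arithmetic used for the govtarget column in this vignette: the estimated proportion is multiplied by the total number of commuters and added to the number already cycling.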
In terms of cycling uptake, the results are shown below:
wight_routes_30_cs$govtarget = wight_lines_census_30$bicycle +
pcycle_govtarget * wight_lines_census_30$all
wight_routes_30_cs$govtarget_pct = wight_lines_30$govtarget_slc
ggplot(wight_routes_30_cs) +
geom_point(aes(length, govtarget), colour = "red") +
geom_point(aes(length, govtarget_pct), colour = "blue")
cor(wight_routes_30_cs$govtarget, wight_routes_30_cs$govtarget_pct)
The final computational stage is also one of the most important from a policy perspective: estimating cycling potential down to the street level, to help prioritise investment where it is most needed. This work is done by the overline() function, as follows:
wight_routes_30_ls = sf::st_cast(wight_routes_30_cs, "LINESTRING")
rnet = overline(wight_routes_30_ls, "govtarget")
plot(rnet)
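The aggregation performed at this stage can be illustrated without any geometry: where several routes share a street segment, their values are summed on that segment. The base R sketch below represents each route as a sequence of made-up segment ids; it shows the idea only and is not how overline() works internally, since the real function operates on line geometries.

```r
# Each toy route is a sequence of street-segment ids (made up for illustration)
toy_routes = list(
  r1 = c("s1", "s2", "s3"),
  r2 = c("s2", "s3", "s4"),
  r3 = c("s3", "s5")
)
route_govtarget = c(10, 5, 2)  # scenario cyclists on r1, r2 and r3

# One row per (route, segment), carrying the route's value
long = data.frame(
  segment   = unlist(toy_routes, use.names = FALSE),
  govtarget = rep(route_govtarget, times = lengths(toy_routes))
)

# Sum the values of all routes that use each segment
toy_rnet = aggregate(govtarget ~ segment, data = long, FUN = sum)

toy_rnet
```

Segment s3 is used by all three routes, so it carries the sum of their values, which is why busy shared corridors stand out in the resulting route network.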
Running the same function for all routes in wight_routes_fast generates the packaged data object wight_rnet, which was created as follows:
wight_routes_fast_gt = wight_routes_fast %>%
group_by(geo_code1, geo_code2) %>%
mutate(
govtarget = uptake_pct_govtarget(sum(distances), mean(gradient_smooth)) *
(sum(all) + sum(bicycle))
)
wight_routes_fast_gt = sf::st_cast(wight_routes_fast_gt, "LINESTRING")
wight_rnet = overline(wight_routes_fast_gt, "govtarget")
pal = colorNumeric(palette = "RdYlBu", domain = wight_rnet$govtarget)
leaflet(data = wight_rnet) %>%
addTiles() %>%
addPolylines(color = ~ pal(govtarget)) %>%
addLegend(pal = pal, values = ~govtarget)