4 R packages
4.1 What are packages?
R has over 20,000 packages published on the official ‘CRAN’ site and many more published on code-sharing sites such as GitHub, as outlined in Chapter 1. Packages are effectively plugins for R that extend it in many ways. They are useful because they enhance the range of things you can do with R, providing additional functions, data and documentation that build on the core (known as ‘base’) R packages. They range from general-purpose packages, such as {tidyverse} and {sf}, to domain-specific packages, such as {stats19}.
This chapter demonstrates the package lifecycle with reference to {stats19} and provides a taster of two general-purpose packages: {ggplot2}, for visualisation, and {dplyr}, for data manipulation. The {stats19} package is particularly relevant for reproducible road safety research: its purpose is to download and clean road traffic collision data from the UK’s Department for Transport. Domain-specific packages, such as {stats19}, are often written by subject-matter experts, providing tried and tested solutions within a particular specialism. Packages submitted to CRAN are checked before being made available there.
Regardless of which packages you install and use, you will take the following steps:
- installing the package;
- loading the package;
- using the package; and
- updating the package.
Of these, the third stage takes by far the most time. Stages 1, 2 and 4 are equally important, however: you cannot use a package unless it has been properly installed and loaded and, to benefit from the latest improvements, updated when new versions are released. We will work through each of these stages of the package lifecycle using the {stats19} package.
4.2 The {stats19} R package
Like many packages, {stats19} was developed to meet a real-world need. STATS19 data is provided as a free and open resource by the Department for Transport, encouraging evidence-based and accountable road safety research and policy interventions. However, researchers at the University of Leeds found that repeatedly downloading and formatting open STATS19 data was time-consuming, taking time away from more valuable (and fun) aspects of the research process. Worse, manually recoding the data was error-prone. By packaging the code, we found that we could solve the problem in a free, open and reproducible way for everyone.
By abstracting the process to its fundamental steps (download, read, format), the {stats19} package makes it easy to get the data into appropriate formats (of classes tbl, data.frame and sf), ready for further processing and analysis. The package built upon previous work, with several important improvements, including the conversion of crash data into an {sf} data frame for geographic research. The function format_sf() creates geographic representations of crash data, geo-referenced to the correct coordinate reference system, in a single step. Part-funded by the RAC Foundation, the package should be of use to academic researchers and professional road safety data analysts working at local authority and national levels in the UK.
The following sections demonstrate how to install, load and use packages with reference to {stats19}. This information can be applied in relation to any package.
4.3 Installing packages
The {stats19} package is available on CRAN. This means that it has a web page on the CRAN website at cran.r-project.org with useful information, including who developed the package, what the latest version is, and when it was last updated (see cran.r-project.org/package=stats19). More importantly, being ‘on CRAN’ (which technically means ‘available on the Comprehensive R Archive Network’) means that it can be installed with the command install.packages() as follows:
install.packages("stats19")
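Rerunning install.packages() downloads and reinstalls a package every time, which is slow in scripts that are run repeatedly. A common pattern, sketched below, is to install a package only when it cannot be found. The helper name install_if_missing() is hypothetical (it is not part of {stats19} or base R); it relies on the base function requireNamespace(), which returns FALSE instead of throwing an error when a package is not installed.

```r
# Hypothetical helper (not part of {stats19} or base R):
# install a package only if it is not already installed
install_if_missing = function(pkg) {
  # requireNamespace() tries to load the package without attaching it and
  # returns FALSE, rather than an error, if the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # return (invisibly) whether the package is now available
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# usage: does nothing if {stats19} is already installed
# install_if_missing("stats19")
```

Note that requireNamespace() only checks that a package can be loaded; you still need library() to attach it, as described in the next section.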
You might think that, now the package has been installed, we can start using it, but that is not the case. This is illustrated in the code below, which tries and fails to run the find_file_name() function from the {stats19} package to find the file containing STATS19 casualty data for the year 2020. You can check that this function exists, even before the package is loaded, by opening its help page with ?stats19::find_file_name. Calling the function directly, however, fails:
find_file_name(years = 2020, type = "casualt")
Error in find_file_name(years = 2020, type = "casualt"): could not find function "find_file_name"
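The error occurs because R only finds functions in packages that are attached to the search path (or that are called with ::, as described below). The base functions search() and exists() let you check this yourself; is_attached() below is a hypothetical convenience wrapper for illustration, not a function from any package.

```r
# Hypothetical helper: is a package attached (e.g. with library())?
is_attached = function(pkg) {
  # search() lists attached packages as "package:<name>"
  paste0("package:", pkg) %in% search()
}

is_attached("stats19")   # FALSE until library(stats19) has been run
exists("find_file_name") # FALSE for the same reason
```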
4.4 Loading packages
After you have installed a package, the next step is to ‘load’ it. Load the {stats19} package, which was installed in the previous section, using the following code:
library(stats19)
What happened? Other than the message telling us about the package’s datasets (most packages load silently, so do not worry if nothing happens when you load a package), the command above made the functions and datasets in the package available to us. Now we can use functions from the package without an error message, as follows:
find_file_name(years = 2020, type = "casualt")
[1] "dft-road-casualty-statistics-casualty-2020.csv"
This raises the question: how do you know which functions are available in a particular package? You can find out using autocompletion, i.e. by pressing Tab after typing the package’s name followed by two colons. Try typing stats19:: and then hitting Tab, for example. You should see a list of function names appear, which you can scroll through using the Up and Down arrow keys.
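To get the full list in one go, or outside RStudio, you can ask R directly. The base function getNamespaceExports() returns the names of the objects a package makes available via ::, whether or not the package is attached (lazy-loaded datasets are not included). The sketch below uses {stats}, which ships with R, so it runs anywhere; replace "stats" with "stats19" once that package is installed.

```r
# Names of all objects exported by a package (the package must be installed)
stats_exports = sort(getNamespaceExports("stats"))
length(stats_exports) # how many objects are exported
head(stats_exports)   # the first few names, alphabetically

# Search the exported names for a pattern, here names starting with "t.test"
grep("^t\\.test", stats_exports, value = TRUE)
```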
The final thing to say about loading packages is that their functions can be used without loading them, by typing package::function(). We used this before in Section 2.8, where we imported csv data with the {readr} package via readr::read_csv(). So stats19::find_file_name(years = 2020, type = "casualt") works even if the package isn’t loaded.
You can test this by running the sf_extSoftVersion() function from the {sf} package, which reports the versions of key geographic libraries installed on your system. In the first attempt below, the command fails with an error. In the second and third attempts, which use :: and library() respectively, the command succeeds:
# try running a function without loading the sf package first
sf_extSoftVersion()
Error in sf_extSoftVersion(): could not find function "sf_extSoftVersion"
# run a function from a package's namespace without loading it but using ::
sf::sf_extSoftVersion()
GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H
"3.12.1" "3.8.4" "9.4.0" "true" "true"
PROJ
"9.4.0"
# run a function call after loading the package (the most common way)
library(sf)
sf_extSoftVersion()
GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H
"3.12.1" "3.8.4" "9.4.0" "true" "true"
PROJ
"9.4.0"
As a bonus, try running sf::sf_extSoftVersion without the brackets (). What does the output tell you about the package?
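Typing a function’s name without brackets prints its definition, and the last line of the output names the namespace (package) the function belongs to. The same information can be obtained programmatically with the base functions environment() and environmentName(). The sketch uses median() from {stats} so that it does not depend on {sf} being installed.

```r
# A package function's enclosing environment is its package's namespace
fun_env = environment(stats::median)
environmentName(fun_env) # "stats": the package median() belongs to
```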
4.5 Using packages
After loading a package, as described in the previous section, you can start using its functions. In the {stats19} package, that means the following get_stats19() command will now work:
crashes_2020 = get_stats19(year = 2020, type = "accidents")
nrow(crashes_2020)
[1] 91199
This command demonstrates the value of packages. It would have been possible to get the same dataset by manually downloading and cleaning the file from the STATS19 website on data.gov.uk. However, by using the package, the process has been achieved much faster and with fewer lines of code than would have been possible using general-purpose base R functions. The result of the nrow() function call shows that we have downloaded a decent amount of data: more than 90,000 road traffic crashes reported across Great Britain in 2020.
We will use other functions from the package in subsequent sections of this guide. If you would like to learn more about {stats19} and how it can be used for road safety research, check out its vignettes. The {stats19} vignette, for example, should appear in RStudio’s Help pane (bottom right by default) after running the following command:
vignette("stats19")
4.6 Updating packages
Packages can be updated with the command update.packages() or via ‘Tools > Check for Package Updates’ in RStudio. You only need to install a package once, but it can be updated many times. It is important to update packages regularly because updates offer bug fixes and other improvements. To update just one package, you can give the function a package name, e.g.:
update.packages(oldPkgs = "stats19")
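Before and after updating, it can be useful to check which version of a package is installed. The base function packageVersion() returns the installed version as an object that can be compared with a version string. The sketch below uses {utils}, which ships with R; the threshold "1.0.0" is an arbitrary example, not a requirement of any package.

```r
# Installed version of a package (here {utils}, which ships with R)
v = packageVersion("utils")
print(v)

# Version objects compare component by component, not as plain text
v >= "1.0.0"
package_version("1.10.0") > package_version("1.9.0") # TRUE, because 10 > 9
```

Replacing "utils" with "stats19" answers the second exercise below.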
Completing the following short exercises will ensure you’ve got a good understanding of packages and package versions.
- Take a look in the ‘Packages’ tab in RStudio (in the bottom-right pane by default).
- What version of the {stats19} package is installed on your computer?
- What happens the second time you run update.packages()? Why?
4.7 {ggplot2}
{ggplot2} is a generic plotting package that is part of the ‘tidyverse’ meta-package. The {tidyverse} is an ‘Opinionated collection of R packages designed for data science’. {ggplot2} is flexible, popular and has dozens of add-on packages which build on it, such as {gganimate}. To plot non-spatial data, it works as follows (the command should generate the image shown in Figure 4.1, a bar chart of the number of crashes over time):
library(ggplot2)
ggplot(crashes_2020) + geom_bar(aes(date), width = 1)

A key feature of the {ggplot2} package is the function ggplot(). This function initiates the creation of a plot, taking a data object as its main argument. One or more ‘geoms’, representing layers, are then added to it (in this case a bar chart, represented by the function geom_bar()). Another distinctive feature of {ggplot2} is the use of the + operator to add layers.
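To see how + builds a plot up step by step without downloading any data, the sketch below uses R’s built-in mtcars dataset as a stand-in for crashes_2020. It also uses layer_data(), a {ggplot2} function that returns the data computed for a layer, to show what geom_bar() counted.

```r
library(ggplot2)

# Start with a plot object that holds only the data (draws an empty panel)
p = ggplot(mtcars)

# Add a bar layer: one bar per number of cylinders, height = count of cars
p = p + geom_bar(aes(x = factor(cyl)))

# Other components are added in the same way, e.g. axis labels
p = p + xlab("Number of cylinders") + ylab("Number of cars")

# Inspect the counts ggplot2 computed for the first layer
layer_data(p)[, c("x", "count")]

print(p) # draw the plot
```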
The package is excellent for generating publication-quality figures. Starting from a basic idea, you can make incremental tweaks to a plot to get the output you want. Building on the figure above, we could, for example, colour the bars by crash severity and use either count (Figure 4.2) or proportion (Figure 4.3) on the y axis, as follows:
ggplot(crashes_2020) + geom_bar(aes(date, fill = accident_severity), width = 1)
ggplot(crashes_2020) +
geom_bar(aes(date, fill = accident_severity), width = 1, position = "fill") +
ylab("Proportion of crashes")


The package is huge and powerful, with support for a very wide range of plot types and themes, so it is worth taking time to read its documentation, starting with the online reference manual and moving on to the online version of the package’s official book. As a final example of {ggplot2} code in this section, create a faceted plot showing how the number of crashes per hour varies across the days of the week by typing the following into the Source Editor and running the chunk line by line (the meaning of the commands should become clear by the end of the next section):
library(tidyverse)
crashes_2020 |>
  mutate(hour = lubridate::hour(datetime)) |>
  mutate(day = lubridate::wday(date)) |>
  filter(!is.na(hour)) |>
  ggplot(aes(hour, fill = accident_severity)) +
  geom_bar(width = 1.01) +
  facet_wrap(~day)
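The two mutate() calls above derive the hour of the day and the day of the week from each crash’s date and time. The base R sketch below shows roughly what they compute, assuming lubridate’s default week start (Sunday), under which wday() returns 1 for Sunday and 7 for Saturday. The crash time is hypothetical, for illustration only.

```r
# A hypothetical crash time (5 January 2020 was a Sunday)
crash_time = as.POSIXlt("2020-01-05 14:30:00", tz = "UTC")

crash_time$hour      # 14, the equivalent of lubridate::hour()
crash_time$wday + 1  # 1 (Sunday): POSIXlt counts days from 0 = Sunday
weekdays(crash_time) # "Sunday" in an English locale
```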

Exercises:
1. Install a package that builds on {ggplot2} and whose name begins with gg. Hint: enter install.packages("gg") and hit Tab when your cursor is just after the second g.
2. Open a help page from the newly installed package with the ?package_name::function_name syntax.
3. Load the package.
4. Bonus: try using functionality from the new ‘gg’ package, building on the example above, to create plots like those shown below (hint: the right plot below uses the economist theme from the {ggthemes} package; try other themes).
4.8 {dplyr}
Another useful package in the tidyverse is {dplyr}, which stands for ‘data pliers’ and provides a handy syntax for data manipulation. {dplyr} has many functions for manipulating data frames, which work well with the pipe operator |>, built into R since version 4.1.0. The pipe operator puts the output of one command into the first argument of the next, as shown below (note that the results are the same):
library(dplyr)
class(crashes_2020)
[1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
crashes_2020 |> class()
[1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
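The same equivalence holds for any function: x |> f() is evaluated as f(x). The sketch below, which needs R 4.1.0 or later and uses only base R and the built-in mtcars dataset, compares a nested call with its piped version, which reads left to right instead of inside out.

```r
# Nested calls are read inside out...
nested = nrow(subset(mtcars, cyl == 4))

# ...while the pipe passes each result to the first argument of the next call
piped = mtcars |> subset(cyl == 4) |> nrow()

identical(nested, piped) # TRUE: both count the 11 four-cylinder cars

# Longer chains work the same way
c(1, 4, 9) |> sqrt() |> sum() # 6
```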
We will learn more about this package and its other functions in Chapter 5.