This is some text:
casualty_type = c("pedestrian", "cyclist", "cat")
casualty_age = seq(from = 20, to = 60, by = 20)
crashes = data.frame(casualty_type, casualty_age)
nrow(crashes)
Practical 1: Introduction to Transport Data Science
Agenda
- Lecture: an introduction to Transport Data Science (30 min)
- See the slides
- Q&A (15 min)
- Break and networking (15 min)
- Data science and a good research question (30 min)
- Data science foundations (guided): Project set-up and using RStudio or VS Code as an integrated development environment (30 min)
- Focussed work (1 hr)
What is transport data science? Thinking of a good research question
- Based on the contents of the lecture, come up with your own definition of data science
- How do you see yourself using data science over the next 5 years?
- Think of a question about a transport system you know well and how data science could help answer it, perhaps with reference to a sketch like that below
How to come up with a good research question
- Think about the data you have access to
- Think about the problems you want to solve
- Think about the methods you want to use and skills you want to learn
- Think about how the final report will look and hold together
How much potential is there for cycling across the transport network?
How can travel to schools be made safer?
How can hospitals encourage visitors to get there safely?
Where’s the best place to build electric car charging points?
See openstreetmap.org or search for other open access datasets for more ideas
1 Data Science foundations
Review of homework
Create a new folder (or R project with RStudio) called ‘practical1’
In it, create a file called foundations.qmd
Type the following
- Knit the document by pressing Ctrl+Shift+K in RStudio or VS Code, with the ‘Knit’ button in RStudio, or by typing
quarto render foundations.qmd
in the PowerShell or Terminal console. The result should look like this (see Figure 1):
![](images/rstudio-foundations.png)
The document you are reading is a Quarto document, so we can show the output of the code in foundations.qmd here:
casualty_type = c("pedestrian", "cyclist", "cat")
casualty_age = seq(from = 20, to = 60, by = 20)
crashes = data.frame(casualty_type, casualty_age)
nrow(crashes)
[1] 3
There are different ways to execute code. When you run the code by ‘compiling’ the document, the objects are created in a different session. For interactive data analysis, it is better to run the code in the console.
Make sure that you also execute the code in the console so that the objects are created in memory. Do that by placing the cursor in the code chunk and pressing Ctrl+Enter (or Cmd+Enter on a Mac) or by copying and pasting the code into the console (not recommended). We now have a data frame object stored in memory (technically in the global environment) that is used as the basis of the questions.
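As a quick check that the objects really are in memory, here is a minimal sketch. It assumes the code from foundations.qmd has been run in the console; the code is repeated so that the example is self-contained.

```r
# Recreate the objects from foundations.qmd
casualty_type = c("pedestrian", "cyclist", "cat")
casualty_age = seq(from = 20, to = 60, by = 20)
crashes = data.frame(casualty_type, casualty_age)

# List the objects in the current environment
# (the global environment when run in the console)
ls()

# Check that an object called crashes exists
exists("crashes")

# Inspect the structure of the data frame
str(crashes)
```

If `exists("crashes")` returns `FALSE`, the code has probably only been run while rendering the document, not in the console.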
Next, add to the data frame you created by adding the following code to the code chunk in the .qmd file:
vehicle_type = c("car", "bus", "tank")
crashes$vehicle_type = vehicle_type
What just happened?
We will explore this together, and then you can try the following data manipulation exercises:
1.1 Data object manipulation basics
- Use the $ operator to print the vehicle_type column of crashes.
In R, the $ symbol is used to refer to named elements of a list, and a data frame is a list of columns. So the answer is simply:
crashes$vehicle_type
[1] "car" "bus" "tank"
- Subset the crashes data frame with the [,] syntax
Try out different combinations on the data frame crashes to see what happens. For example, try:
crashes[1, ]
  casualty_type casualty_age vehicle_type
1    pedestrian           20          car
crashes[, 1]
[1] "pedestrian" "cyclist" "cat"
crashes[1, 1]
[1] "pedestrian"
- Subset the object with the [[ syntax.
The [[ operator is used to extract elements from a list. Try:
crashes[[1]]
[1] "pedestrian" "cyclist" "cat"
crashes[[2]]
[1] 20 40 60
- Bonus: what is the class() of the objects created by each of the previous exercises?
- Explore how many R classes you can find
- Bonus (advanced): reproduce the above with Python using the pandas or polars package
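One way to approach the class() bonus is sketched below. It rebuilds the crashes data frame so that it runs on its own, and it assumes R 4.0.0 or later, where data.frame() keeps strings as character vectors and does not convert them to factors.

```r
# Rebuild the example data frame used in this practical
casualty_type = c("pedestrian", "cyclist", "cat")
casualty_age = seq(from = 20, to = 60, by = 20)
crashes = data.frame(casualty_type, casualty_age)
crashes$vehicle_type = c("car", "bus", "tank")

# Selecting a row keeps the data frame structure
class(crashes[1, ])

# Selecting a single column with [, ] drops to a plain vector
class(crashes[, 1])

# Selecting a single cell also returns a plain vector (of length 1)
class(crashes[1, 1])

# [[ extracts one list element, here the numeric age column
class(crashes[[2]])

# Single brackets with a column name return a one-column data frame
class(crashes["casualty_age"])
```

The difference to notice is that some forms of subsetting return a smaller data frame, while others return the underlying vector.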
1.2 Data science on real data
To get some larger datasets, try the following (from Chapter 8 of Reproducible Road Safety Research with R, RSRR):
remotes::install_cran("stats19")
library(stats19)
ac = get_stats19(year = 2020, type = "collision")
ca = get_stats19(year = 2020, type = "cas")
ve = get_stats19(year = 2020, type = "veh")
# Percentage of the population (taken as 67 million) hurt by road traffic collisions in 2020:
(nrow(ca) / 67e6) * 100
[1] 0.1725134
Challenge: reproduce the above code in Python using the pystats19 package
# Install the package, e.g. with pip
!pip install pystats19
import pystats19
# See the documentation at https://github.com/Mayazure/py-stats19
Let’s go through these exercises together:
- Subset the casualty_age object using the inequality (<) so that only elements less than 50 are returned.
- Subset the crashes data frame so that only tanks are returned using the == operator.
- Bonus: assign the age of all tanks to 61.
- Try running the subsetting code on a larger dataset, e.g. the ac object created previously
- Coerce the vehicle_type column of crashes to the class character.
- Coerce the crashes object into a matrix. What happened to the values?
- Bonus: What is the difference between the output of summary() on character and factor variables?
- We’ll explore this together
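The exercises above can be approached in several ways. One possible sketch in base R follows. It rebuilds the small crashes data frame so that it runs on its own, and the name crashes_matrix is introduced here only for illustration.

```r
# Rebuild the example objects used in this practical
casualty_type = c("pedestrian", "cyclist", "cat")
casualty_age = seq(from = 20, to = 60, by = 20)
vehicle_type = c("car", "bus", "tank")
crashes = data.frame(casualty_type, casualty_age, vehicle_type)

# Elements of casualty_age that are less than 50
casualty_age[casualty_age < 50]

# Rows of crashes where the vehicle is a tank
crashes[crashes$vehicle_type == "tank", ]

# Bonus: set the age in all tank rows to 61
crashes$casualty_age[crashes$vehicle_type == "tank"] = 61

# Coerce the vehicle_type column to character
# (it is already character in R >= 4.0.0, so this changes nothing there)
crashes$vehicle_type = as.character(crashes$vehicle_type)

# Coerce the data frame to a matrix: a matrix holds a single type,
# so the numeric ages are converted to character strings
crashes_matrix = as.matrix(crashes)
typeof(crashes_matrix)

# Bonus: summary() on a character vector reports its length, class and mode,
# whereas on a factor it counts the observations in each level
summary(crashes$vehicle_type)
summary(factor(crashes$vehicle_type))
```

The same logical-vector subsetting should carry over to a larger data frame such as ac. Check its column names with names(ac) first; they are not assumed here.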
2 Self-study practical (1 hr)
Read and try to complete the exercises in Chapters 1 to 5 of the book Reproducible Road Safety Research with R.
It assumes that you have recently updated R and RStudio on your computer. For details on installing packages see here.
2.1 Bonus: data science and transport
- Work through Chapter 13 of the book Geocomputation with R, taking care to ask questions about any aspects that you don’t understand (your homework will be to complete and make notes on the chapter, including reproducible code).
3 Homework
Reproduce the code I wrote during this session, e.g. by copy-pasting this code into the console or source editor of RStudio: github.com/itsleeds/tds/blob/main/p1/p1project/foundations.qmd
- See the raw file at github.com/itsleeds/tds/raw/refs/heads/main/p1/p1project/foundations.qmd
- See the rendered result at itsleeds.github.io/tds/p1/p1project/foundations.html
Work through Chapter 13 of the Geocomputation with R book. Make notes in a .qmd file that you can bring to the class to show colleagues and the instructor next week.
Think of a research question that you could answer with data science, and write it down in a .qmd file. Include a sketch of the data you would need to answer the question.
Sign up to the Cadence platform as outlined at itsleeds.github.io/tds/p2/#the-cadence-platform