Practical 1: Introduction to Transport Data Science

Agenda

  1. Lecture: an introduction to Transport Data Science (30 min)
  2. Q&A (15 min)
  3. Break and networking (15 min)
  4. Data science and a good research question (30 min)
  5. Data science foundations (guided): Project set-up and using RStudio or VS Code as an integrated development environment (30 min)
  6. Focussed work (1 hr)
  • Work through the questions on processing OD data by running the code in Sections 13.1 to 13.4 of the Transport chapter of Geocomputation with R and answering the questions for the Bristol dataset

What is transport data science, and how to think of a good research question

  • Based on the contents of the lecture, come up with your own definition of data science
  • How do you see yourself using data science over the next 5 years?
  • Think of a question about a transport system you know well and how data science could help answer it, perhaps with reference to a sketch like that below

How to come up with a good research question

  • Think about the data you have access to
  • Think about the problems you want to solve
  • Think about the methods you want to use and skills you want to learn
  • Think about how the final report will look and hold together

How much potential is there for cycling across the transport network?

How can travel to schools be made safer?

How can hospitals encourage visitors to get there safely?

Where’s the best place to build electric car charging points?

See openstreetmap.org or search for other open access datasets for more ideas

1 Data Science foundations

Read and try to complete the exercises in Chapters 1 to 4 of the book Reproducible Road Safety Research with R. It assumes that you have recently updated R and RStudio on your computer. For details on installing packages see here: https://docs.ropensci.org/stats19/articles/stats19-training-setup.html

  • Create a new folder (or R project with RStudio) called ‘practical1’
  • In it, create a file called foundations.qmd
  • Type the following

  • Knit the document by pressing Ctrl+Shift+K, by clicking the ‘Knit’ button in RStudio (labelled ‘Render’ for .qmd files), or by typing quarto render foundations.qmd in the PowerShell or Terminal console. The result should look like this:

This is some text:

vehicle_type = c("car", "bus", "tank")
casualty_type = c("pedestrian", "cyclist", "cat")
casualty_age = seq(from = 20, to = 60, by = 20)
crashes = data.frame(vehicle_type, casualty_type, casualty_age)

We now have a data frame object stored in memory (technically in the global environment) that is used as the basis of the questions.
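Before moving on, it can help to check what the object contains. The sketch below recreates the data frame so that it runs on its own and then inspects it with a few base R functions. The vehicle_type column is included because the exercises further down refer to it; its values are an assumption based on the mention of tanks in those exercises.

```r
# Recreate the small example data frame so this block runs on its own
vehicle_type = c("car", "bus", "tank")  # assumed values: the exercises mention tanks
casualty_type = c("pedestrian", "cyclist", "cat")
casualty_age = seq(from = 20, to = 60, by = 20)
crashes = data.frame(vehicle_type, casualty_type, casualty_age)

class(crashes)  # the type of object that was created
dim(crashes)    # number of rows and number of columns
names(crashes)  # column names
str(crashes)    # compact summary of every column
```

Running ls() afterwards lists the objects that are currently in the global environment.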

To get some larger datasets, try the following (from Chapter 8 of Reproducible Road Safety Research with R, RSRR):

library(stats19)
Data provided under OGL v3.0. Cite the source and link to:
www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
ac = get_stats19(year = 2019, type = "collision")
Files identified: dft-road-casualty-statistics-collision-2019.csv
   https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-collision-2019.csv
Attempt downloading from: https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-collision-2019.csv
Data saved at /tmp/RtmpjdoYWT/dft-road-casualty-statistics-collision-2019.csv
Reading in: 
/tmp/RtmpjdoYWT/dft-road-casualty-statistics-collision-2019.csv
date and time columns present, creating formatted datetime column
Warning in format_stats19(x, type = "Accident"): NAs introduced by coercion
Warning in format_stats19(x, type = "Accident"): NAs introduced by coercion
Warning in format_stats19(x, type = "Accident"): NAs introduced by coercion
ca = get_stats19(year = 2019, type = "cas")
Files identified: dft-road-casualty-statistics-casualty-2019.csv
   https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2019.csv
Attempt downloading from: https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-casualty-2019.csv
Data saved at /tmp/RtmpjdoYWT/dft-road-casualty-statistics-casualty-2019.csv
Warning: The following named parsers don't match the column names:
accident_severity, carriageway_hazards, date, day_of_week,
did_police_officer_attend_scene_of_accident, first_road_class,
first_road_number, junction_control, junction_detail, Latitude,
light_conditions, local_authority_district, local_authority_highway,
local_authority_ons_district, location_easting_osgr, location_northing_osgr,
longitude, lsoa_of_accident_location, number_of_casualties, number_of_vehicles,
pedestrian_crossing_human_control, pedestrian_crossing_physical_facilities,
police_force, road_surface_conditions, road_type, second_road_class,
second_road_number, special_conditions_at_site, speed_limit, time,
trunk_road_flag, urban_or_rural_area, weather_conditions, vehicle_text,
vehicle_type, age_band_of_driver, age_of_driver, age_of_vehicle,
driver_home_area_type, driver_imd_decile, engine_capacity_cc,
first_point_of_impact, generic_make_model, hit_object_in_carriageway,
hit_object_off_carriageway, journey_purpose_of_driver, junction_location,
propulsion_code, sex_of_driver, skidding_and_overturning,
towing_and_articulation, vehicle_direction_from, vehicle_direction_to,
vehicle_leaving_carriageway, vehicle_left_hand_drive,
vehicle_location_restricted_lane, vehicle_manoeuvre
Warning in asMethod(object): NAs introduced by coercion
ve = get_stats19(year = 2019, type = "veh")
Files identified: dft-road-casualty-statistics-vehicle-2019.csv
   https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-vehicle-2019.csv
Attempt downloading from: https://data.dft.gov.uk/road-accidents-safety-data/dft-road-casualty-statistics-vehicle-2019.csv
Data saved at /tmp/RtmpjdoYWT/dft-road-casualty-statistics-vehicle-2019.csv
Warning: The following named parsers don't match the column names: accident_severity, carriageway_hazards, date, day_of_week, did_police_officer_attend_scene_of_accident, first_road_class, first_road_number, junction_control, junction_detail, Latitude, light_conditions, local_authority_district, local_authority_highway, local_authority_ons_district, location_easting_osgr, location_northing_osgr, longitude, lsoa_of_accident_location, number_of_casualties, number_of_vehicles, pedestrian_crossing_human_control, pedestrian_crossing_physical_facilities, police_force, road_surface_conditions, road_type, second_road_class, second_road_number, special_conditions_at_site, speed_limit, time, trunk_road_flag, urban_or_rural_area, weather_conditions, age_band_of_casualty, age_of_casualty, bus_or_coach_passenger, car_passenger, casualty_class, casualty_home_area_type, casualty_imd_decile, casualty_reference, casualty_severity, casualty_type, pedestrian_location, pedestrian_movement, pedestrian_road_maintenance_worker, sex_of_casualty, vehicle_text
NAs introduced by coercion
# pip install stats19
import stats19
ac = stats19.get_stats19(year = 2019, type = "collision")
ca = stats19.get_stats19(year = 2019, type = "cas")
ve = stats19.get_stats19(year = 2019, type = "veh")

2.3.1. Use the $ operator to print the vehicle_type column of crashes.

- In R, the `$` operator extracts named elements of a list, and a data frame is a list of columns. So the answer is simply `crashes$vehicle_type`
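A minimal sketch of the `$` operator in use is shown below. It assumes a crashes data frame with a vehicle_type column (the values are illustrative) and sets stringsAsFactors = FALSE explicitly so that the columns are character vectors in any version of R.

```r
# Illustrative data frame; the vehicle_type values are assumed
crashes = data.frame(
  vehicle_type = c("car", "bus", "tank"),
  casualty_type = c("pedestrian", "cyclist", "cat"),
  casualty_age = c(20, 40, 60),
  stringsAsFactors = FALSE
)

crashes$vehicle_type       # extract a column by name
crashes[["vehicle_type"]]  # equivalent, using a character string
crashes$no_such_column     # a name that does not exist returns NULL, not an error
```

The last line is worth remembering: a misspelled column name fails silently on a data frame, so an unexpected NULL usually means a typo.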

2.3.2. Subset the crashes data frame with the [,] syntax

- Try out different combinations on the dataframe
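Some combinations worth trying are sketched below: rows go before the comma and columns after it, and either side can be left empty to mean "all". The data frame is recreated with assumed vehicle_type values so that the block runs on its own.

```r
# Illustrative data frame; the vehicle_type values are assumed
crashes = data.frame(
  vehicle_type = c("car", "bus", "tank"),
  casualty_type = c("pedestrian", "cyclist", "cat"),
  casualty_age = c(20, 40, 60),
  stringsAsFactors = FALSE
)

crashes[1, ]                     # first row, all columns
crashes[, 1]                     # first column, returned as a vector
crashes[, 1, drop = FALSE]       # first column, kept as a data frame
crashes[2:3, c("casualty_type", "casualty_age")]  # rows 2 to 3, two named columns
crashes[c(TRUE, FALSE, TRUE), ]  # rows selected with a logical vector
```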

2.3.3. Bonus: what is the class() of the objects created by each of the previous exercises?

- Explore how many R classes you can find
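As a starting point, the sketch below calls class() on a few objects derived from an illustrative crashes data frame (the vehicle_type values are assumed). stringsAsFactors = FALSE is set explicitly so that the text columns are character vectors rather than factors.

```r
# Illustrative data frame; the vehicle_type values are assumed
crashes = data.frame(
  vehicle_type = c("car", "bus", "tank"),
  casualty_type = c("pedestrian", "cyclist", "cat"),
  casualty_age = c(20, 40, 60),
  stringsAsFactors = FALSE
)

class(crashes)                    # "data.frame"
class(crashes[1, ])               # "data.frame": a row subset is still a data frame
class(crashes$casualty_age)       # "numeric"
class(crashes$vehicle_type)       # "character"
class(crashes$casualty_age < 50)  # "logical"
```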

Let’s go through these exercises together:

  1. Subset the casualty_age object using the inequality (<) so that only elements less than 50 are returned.
  2. Subset the crashes data frame so that only tanks are returned using the == operator.
  3. Bonus: assign the age of all tanks to 61.
  • Try running the subsetting code on a larger dataset, e.g. the ac object created previously
  1. Coerce the vehicle_type column of crashes to the class character.
  2. Coerce the crashes object into a matrix. What happened to the values?
  3. Bonus: What is the difference between the output of summary() on character and factor variables?
  • We’ll explore this together
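One possible way through these exercises is sketched below; it is not the only solution. The data frame is recreated with assumed vehicle_type values (the exercises mention tanks), and the name crashes_matrix is introduced purely for illustration.

```r
# Illustrative objects; the vehicle_type values are assumed
casualty_age = seq(from = 20, to = 60, by = 20)
crashes = data.frame(
  vehicle_type = c("car", "bus", "tank"),
  casualty_type = c("pedestrian", "cyclist", "cat"),
  casualty_age = casualty_age,
  stringsAsFactors = FALSE
)

# Subsetting with logical conditions
casualty_age[casualty_age < 50]            # elements below 50
crashes[crashes$vehicle_type == "tank", ]  # rows where the vehicle is a tank
crashes$casualty_age[crashes$vehicle_type == "tank"] = 61  # bonus: set tank ages

# Coercion
crashes$vehicle_type = as.character(crashes$vehicle_type)
crashes_matrix = as.matrix(crashes)     # a matrix holds a single type only
class(crashes_matrix[, "casualty_age"])  # the ages are now character strings

# summary() on character and factor versions of the same column
summary(crashes$vehicle_type)             # length, class and mode
summary(as.factor(crashes$vehicle_type))  # a count for each level
```

The same logical subsetting pattern applies to larger data frames such as ac, provided the column used in the condition exists in that object.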

2 Data Science foundations

Work through Chapter 13 of the book Geocomputation with R, taking care to ask questions about any aspects that you don’t understand (your homework will be to complete and make notes on the chapter, including reproducible code).
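As a warm-up for the origin-destination (OD) processing in that chapter, the sketch below aggregates a small made-up OD table to destination zones using base R only. It is not the book's code: the zone codes, the column names (o, d, total, bicycle) and the counts are all hypothetical, and the chapter itself works with the Bristol dataset.

```r
# Hypothetical OD table: one row per origin-destination pair
od = data.frame(
  o = c("A", "A", "B", "C"),     # origin zone
  d = c("B", "C", "A", "A"),     # destination zone
  total = c(100, 50, 80, 20),    # trips by all modes
  bicycle = c(10, 2, 8, 1)       # trips by bicycle
)

# Sum trips arriving in each destination zone
zones_dest = aggregate(cbind(total, bicycle) ~ d, data = od, FUN = sum)

# Share of arriving trips made by bicycle
zones_dest$pcycle = zones_dest$bicycle / zones_dest$total
zones_dest
```

Swapping d for o in the formula gives the equivalent totals by origin zone.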

3 Homework

  • Finish working through Chapter 13 of the Geocomputation with R book. Make notes in a .qmd file that you can bring to class next week to show colleagues and the instructor.