Data Wrangling - Tasks

In this section, we will practice data wrangling using Python and the pandas package. Follow the steps below to manipulate and explore a dataset.

  1. Create a new Python script in your scripts folder and name it 2-data-wrangling-and-visualization.py.

  2. Import pandas by adding the following line at the top of your script:

    import pandas as pd

  3. Download the crashes file (here) and save it in your data folder.

  4. Read the crashes dataset into Python using the pd.read_csv() function and assign it to a variable named crashes:

    crashes = pd.read_csv("data/crashes.csv")

  5. Explore the dataset:

    • Use the head() method to view the first few rows of the dataset.
    • Use the info() method to understand the structure of the dataset.
    • Use the describe() method to get summary statistics for the numeric columns.

  6. Data Wrangling Tasks:

    • Create a new DataFrame named crashes_filtered that includes only crashes involving cyclists.
    • Create a new DataFrame named crashes_dark that includes only crashes that occurred in dark conditions.
    • Create a new DataFrame named crashes_dark_cyclist that includes only crashes that involved cyclists and occurred in dark conditions.
    • Create a summary table named crashes_by_type that shows the median age by casualty type.
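
The exploration and wrangling steps above can be sketched as follows. This is a minimal illustration, not the reference solution: it builds a small toy DataFrame in place of data/crashes.csv, and the column names casualty_type, casualty_age, and dark, along with every value in the table, are assumptions. Check the output of crashes.info() on the real file and adjust the names to match.

```python
import pandas as pd

# Toy stand-in for data/crashes.csv. The column names and all values
# below are assumptions made for illustration only.
crashes = pd.DataFrame(
    {
        "casualty_type": ["pedestrian", "cyclist", "cyclist", "pedestrian", "cyclist"],
        "casualty_age": [20, 30, 40, 50, 60],
        "dark": [True, False, True, False, True],
    }
)

# Step 5: explore the data
print(crashes.head())       # first rows
crashes.info()              # column names, dtypes, non-null counts
print(crashes.describe())   # summary statistics for numeric columns

# Step 6: build boolean masks once, then reuse them
is_cyclist = crashes["casualty_type"] == "cyclist"
is_dark = crashes["dark"]   # assumes dark is stored as True/False

crashes_filtered = crashes[is_cyclist]
crashes_dark = crashes[is_dark]
# Combine conditions with & (element-wise AND), not the keyword `and`
crashes_dark_cyclist = crashes[is_cyclist & is_dark]

# Median age per casualty type, returned as a two-column table
crashes_by_type = (
    crashes.groupby("casualty_type")["casualty_age"].median().reset_index()
)
print(crashes_by_type)
```

If the real dataset stores the lighting condition as text rather than as True/False, the is_dark mask would need a comparison (for example against the relevant label) instead of using the column directly.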

R version: The same exercises can be completed in R. Create a new R script (2-data-wrangling-and-visualization.R), load the tidyverse package, and use read_csv(), filter(), select(), group_by(), and summarise() to complete the tasks above.
