Lab 2: Subway Staffing

The goals for this tutorial are:

The assignment requirements are as follows:

  1. Create a dashboard in the index.md file that answers at least the first three questions posed in the lab below. Include written context for each answer, supported by visualizations.
  2. Use at least one transform in a plot you create. We've used bin and group, but there are more options, such as centroid, dodge, and hexbin. Try to get creative; you're limited only by what the data, scales, and marks allow.
  3. Use at least one annotation on a visualization to illustrate something of note. This could be a moment in time, or a statistical detail (mean, median), or a tip mark callout.
  4. Submit your deployed link as a comment on the lab 2 commons post.
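
For the annotation requirement, a statistical reference line needs a value to draw. The sketch below is one way to compute a mean and median in plain JavaScript; the helper name `summarize` and the sample numbers are made up for illustration, and it assumes a non-empty array of numbers.

```javascript
// Hypothetical helper: summary values for a reference-line annotation.
// Assumes `values` is a non-empty array of numbers.
function summarize(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Median: middle value, or the average of the two middle values.
  const median =
    sorted.length % 2 === 1
      ? sorted[mid]
      : (sorted[mid - 1] + sorted[mid]) / 2;
  const mean = sorted.reduce((sum, v) => sum + v, 0) / sorted.length;
  return { mean, median };
}

// Made-up response times in minutes, for illustration only.
const sampleTimes = [4, 6, 9, 12, 19];
const summary = summarize(sampleTimes);
console.log(summary);
```

If the dashboard is built with Observable Plot, as the transform names in the requirements suggest, a value like `summary.median` could be passed to a rule mark (for example `Plot.ruleY([summary.median])`) layered over the main chart.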

Some tips for this assignment:


Data Context

The NYC Transit Authority has collected operational data from their Manhattan subway stations and needs your help to make critical staffing decisions for next summer. They are planning for a busy event season in 2026 and want to ensure they have adequate staff at the right stations.

Some notes:

Specifically, your dashboard should answer:

  1. How did local events impact ridership in summer 2025? What effect did the July 15th fare increase have?
  2. How do the stations compare when it comes to response time? Which are the best, which are the worst?
  3. Which three stations need the most staffing help for next summer based on the 2026 event calendar?
  4. [BONUS] If you had to prioritize one station to get increased staffing, which would it be and why?

The data could yield additional interesting trends, but these questions are the most important. The data has been provided in multiple datasets, all in the /data folder.
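
One possible starting point for the first question is to flag each ridership row as an event day or a non-event day and compare averages. This is only a sketch: the rows are made up, the function name `compareEventDays` is hypothetical, and it assumes dates are kept as `YYYY-MM-DD` strings and that station names match exactly between the two files.

```javascript
// Made-up rows shaped like ridership.csv and local_events.csv.
const ridershipRows = [
  { date: "2025-06-01", station: "Station A", entrances: 5000, exits: 4800 },
  { date: "2025-06-02", station: "Station A", entrances: 9000, exits: 8700 },
  { date: "2025-06-03", station: "Station A", entrances: 5200, exits: 5100 },
];
const eventRows = [
  {
    date: "2025-06-02",
    event_name: "Concert",
    nearby_station: "Station A",
    estimated_attendance: 8000,
  },
];

// Hypothetical helper: mean entrances on event days vs. all other days.
function compareEventDays(ridership, events) {
  // Build a lookup of "date|station" keys that had an event.
  const eventKeys = new Set(
    events.map((e) => `${e.date}|${e.nearby_station}`)
  );
  const event = { sum: 0, n: 0 };
  const nonEvent = { sum: 0, n: 0 };
  for (const r of ridership) {
    const bucket = eventKeys.has(`${r.date}|${r.station}`) ? event : nonEvent;
    bucket.sum += r.entrances;
    bucket.n += 1;
  }
  return {
    eventMean: event.n ? event.sum / event.n : null,
    nonEventMean: nonEvent.n ? nonEvent.sum / nonEvent.n : null,
  };
}

const eventComparison = compareEventDays(ridershipRows, eventRows);
console.log(eventComparison);
```

A comparison like this pools all stations together; splitting it by station, or by attendance size, may show a clearer picture.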

NYC Subway Operations Datasets

local_events.csv (Summer 2025)

| Column Name | Type | Range/Values | Description |
| --- | --- | --- | --- |
| date | Date | 2025-06-01 to 2025-08-14 | Date of the event |
| event_name | Categorical | Various | Type of event (concerts, festivals, etc.) |
| nearby_station | Categorical | 25 stations | Subway station nearest to the event |
| estimated_attendance | Integer | 500-15,000 | Actual attendance at the event |

upcoming_events.csv (Summer 2026 - Planning)

| Column Name | Type | Range/Values | Description |
| --- | --- | --- | --- |
| date | Date | 2026-06-01 to 2026-08-14 | Date of the planned event |
| event_name | Categorical | Various | Type of event (concerts, festivals, etc.) |
| nearby_station | Categorical | 25 stations | Subway station nearest to the event |
| expected_attendance | Integer | 500-15,000 | Projected attendance at the event |
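
As a rough first pass at the 2026 staffing question, the planned events can be totaled per station and ranked. The sketch below uses made-up rows and a hypothetical function name, `topStationsByAttendance`; expected attendance alone is only one input, and a fuller answer would likely weigh it against response times and current staffing.

```javascript
// Made-up rows shaped like upcoming_events.csv.
const upcomingRows = [
  { date: "2026-06-05", event_name: "Festival", nearby_station: "Station A", expected_attendance: 12000 },
  { date: "2026-06-12", event_name: "Concert", nearby_station: "Station B", expected_attendance: 3000 },
  { date: "2026-07-04", event_name: "Parade", nearby_station: "Station A", expected_attendance: 4000 },
  { date: "2026-07-20", event_name: "Concert", nearby_station: "Station C", expected_attendance: 9000 },
  { date: "2026-08-01", event_name: "Market", nearby_station: "Station D", expected_attendance: 500 },
];

// Hypothetical helper: stations ranked by total expected attendance.
function topStationsByAttendance(events, count = 3) {
  const totals = new Map();
  for (const e of events) {
    const running = totals.get(e.nearby_station) ?? 0;
    totals.set(e.nearby_station, running + e.expected_attendance);
  }
  return [...totals]
    .map(([station, total]) => ({ station, total }))
    .sort((a, b) => b.total - a.total) // largest total first
    .slice(0, count);
}

const busiestStations = topStationsByAttendance(upcomingRows);
console.log(busiestStations);
```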

incidents.csv (10 Years of Historical Data)

| Column Name | Type | Range/Values | Description |
| --- | --- | --- | --- |
| date | Date | 2015-01-01 to 2025-08-14 | Date of the incident |
| station | Categorical | 25 stations | Subway station where incident occurred |
| severity | Categorical | low, medium, high | Severity level of the incident |
| staffing_count | Integer | 3-20 | Number of staff on duty at the time |
| response_time_minutes | Numeric | 2-25 minutes | Time to respond to the incident |
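
To compare stations on response time, one option is to average `response_time_minutes` per station and sort the result. The following is a minimal sketch on made-up rows; the function name `meanResponseByStation` is hypothetical, and a mean is just one possible summary (a median, or a breakdown by severity, may be more telling).

```javascript
// Made-up rows shaped like incidents.csv.
const incidentRows = [
  { date: "2024-03-01", station: "Station A", severity: "low", staffing_count: 10, response_time_minutes: 5 },
  { date: "2024-05-17", station: "Station A", severity: "high", staffing_count: 8, response_time_minutes: 7 },
  { date: "2024-06-02", station: "Station B", severity: "medium", staffing_count: 4, response_time_minutes: 15 },
  { date: "2024-09-23", station: "Station B", severity: "low", staffing_count: 5, response_time_minutes: 11 },
];

// Hypothetical helper: mean response time per station, fastest first.
function meanResponseByStation(incidents) {
  const byStation = new Map();
  for (const row of incidents) {
    const entry = byStation.get(row.station) ?? { sum: 0, n: 0 };
    entry.sum += row.response_time_minutes;
    entry.n += 1;
    byStation.set(row.station, entry);
  }
  return [...byStation]
    .map(([station, { sum, n }]) => ({
      station,
      meanMinutes: sum / n,
      incidents: n,
    }))
    .sort((a, b) => a.meanMinutes - b.meanMinutes);
}

const responseRanking = meanResponseByStation(incidentRows);
console.log(responseRanking);
```

If the charts are drawn with Observable Plot, its group transform with a mean reducer should be able to produce the same per-station averages inside the plot itself, which would also count toward the transform requirement.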

ridership.csv (Summer 2025)

| Column Name | Type | Range/Values | Description |
| --- | --- | --- | --- |
| date | Date | 2025-06-01 to 2025-08-14 | Date of ridership count |
| station | Categorical | 25 stations | Subway station |
| entrances | Integer | 2,000-27,000+ | Number of people entering the station |
| exits | Integer | 2,000-27,000+ | Number of people exiting the station |
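
For the fare-increase part of the first question, a simple before/after split of the ridership rows is one place to start. This sketch assumes dates are `YYYY-MM-DD` strings (so they compare correctly as text) and counts July 15 itself as "after"; the rows and the function name `meanEntrancesAroundFareChange` are made up for illustration.

```javascript
// Date of the fare increase, taken from the lab's first question.
const FARE_CHANGE_DATE = "2025-07-15";

// Made-up rows shaped like ridership.csv.
const fareSampleRows = [
  { date: "2025-07-13", station: "Station A", entrances: 6000, exits: 5900 },
  { date: "2025-07-14", station: "Station A", entrances: 7000, exits: 6800 },
  { date: "2025-07-15", station: "Station A", entrances: 5000, exits: 4900 },
  { date: "2025-07-16", station: "Station A", entrances: 5500, exits: 5300 },
];

// Hypothetical helper: mean entrances per row before and after a date.
function meanEntrancesAroundFareChange(ridership, changeDate = FARE_CHANGE_DATE) {
  const before = { sum: 0, n: 0 };
  const after = { sum: 0, n: 0 };
  for (const r of ridership) {
    // ISO date strings sort chronologically, so a string comparison works.
    const bucket = r.date < changeDate ? before : after;
    bucket.sum += r.entrances;
    bucket.n += 1;
  }
  return {
    beforeMean: before.n ? before.sum / before.n : null,
    afterMean: after.n ? after.sum / after.n : null,
  };
}

const fareEffect = meanEntrancesAroundFareChange(fareSampleRows);
console.log(fareEffect);
```

A raw before/after difference can be skewed by events that fall on either side of the date, so it may help to exclude event days or compare the same stations and weekdays.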