Using R to explore the relationship between San Francisco MUNI citations and the weather

Anybody who rides San Francisco’s MUNI regularly has had their Clipper Card checked by a fare inspector. Anecdotally, it also seemed to me that my Clipper Card was being checked more often when it rained. Using data obtained from the SFMTA, it appears that rainy days do in fact result in more citations – but determining why that’s the case is a more complicated question.

Based on citations issued from March 2017 to March 2018, which I obtained through a public records request, SFMTA issued an average of 156 citations on days when it rained, vs. 127 citations on days when it did not rain. Ridership, and as a result citations, vary a lot between weekends and weekdays. Looking just at weekdays, the pattern holds: an average of 171 citations were issued on rainy weekdays vs. an average of 148 on weekdays when it did not rain. Of the 362 days analyzed, 53 days were rainy.
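A comparison like this can be sketched in a few lines of base R. The data frame below is made up for illustration (it is not the SFMTA data), and the column names `date`, `citations` and `rained` are my own assumptions about how a per-day table might be laid out.

```r
# Hypothetical daily data: one row per day. Values are invented for illustration.
daily <- data.frame(
  date      = as.Date("2017-03-06") + 0:13,   # two full weeks starting on a Monday
  citations = c(150, 160, 170, 140, 155, 90, 80,
                180, 145, 150, 175, 150, 85, 110),
  rained    = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE,
                TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

# as.POSIXlt()$wday is 0 for Sunday through 6 for Saturday, regardless of locale.
wday <- as.POSIXlt(daily$date)$wday
daily$is_weekday <- wday >= 1 & wday <= 5

# Mean citations per day, split by whether it rained.
mean_all <- tapply(daily$citations, daily$rained, mean)

# The same comparison restricted to weekdays.
weekdays_only <- daily[daily$is_weekday, ]
mean_weekday <- tapply(weekdays_only$citations, weekdays_only$rained, mean)

print(mean_all)
print(mean_weekday)
```

Splitting out weekdays first keeps the large weekend drop in citations from being mixed into the rain comparison.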

There are a number of other factors that may influence the number of citations given out on any given day – below, we’ll see that none of them completely accounts for the increased rate on rainy days.

What about the day of the week? Does it rain more on high-ridership days?

The day of week presumably also impacts how many citations fare inspectors hand out. People generally ride public transit less on weekends, so there are fewer people in total who could possibly be riding without having paid. It’s also possible that it happened to rain more on days when people were taking public transit more often – looking at the one year of data, the 53 days of rain were not distributed evenly across the days of the week.
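One way to check how rainy days fall across the week is a simple cross-tabulation. This is a minimal sketch on invented dates and rain flags, not the real year of data; `dates` and `rained` are hypothetical names.

```r
# Hypothetical dates and rain flags. Values are invented for illustration.
dates  <- as.Date("2017-03-06") + 0:13
rained <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE,
            TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)

# Build locale-independent day names from the numeric weekday (0 = Sunday).
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
day_of_week <- factor(day_names[as.POSIXlt(dates)$wday + 1], levels = day_names)

# Rows are days of the week, columns are FALSE (dry) and TRUE (rainy).
rain_by_day <- table(day_of_week, rained)
print(rain_by_day)
```

If the TRUE column is lopsided toward weekdays, day of week needs to be controlled for before reading anything into the rain effect.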

A simple linear model can account for the day of the week, estimating the number of citations handed out based on (1) the day of week and (2) whether it rained or not. When accounting for day of the week, the presence of rain was still associated with an increase of about 18 citations on a given day. (See the end of this post for details of the regression model.)

Do people just ride MUNI more often on rainy days?

Another possible explanation for the greater number of citations on rainy days is that people simply take public transportation more when it rains – if more people take transit and a fixed percentage of them have not paid, then the number of citations would increase on rainy days. The SFMTA does not collect daily ridership numbers – I filed a public records request for daily numbers, and was informed that they base their ridership data on sampling.

Although there’s no San Francisco-specific data, existing research suggests that rain actually decreases ridership – a study of bus ridership in Pierce County, Washington notes “Rain negatively affected ridership in all four seasons,” while a study of the Chicago Transit system found that “Generally, good weather tends to increase ridership, while bad weather tends to reduce it.” As a result, it seems unlikely that the increase in citations on rainy days is due simply to an overall increase in ridership.

What about seasonality?

Another possible explanation is seasonality – if it’s more likely to rain during times of the year when people ride MUNI more, then it’s possible that greater ridership leads to the increase in citations. Could seasonality be a confounding variable, causing more rain, a greater number of riders, and therefore more citations? That doesn’t seem to be the case. Using data on average weekday boardings by month from MUNI, we can compare the monthly number of citations with overall ridership. The opposite actually appears to be true – the greater the ridership in a given month, the fewer the citations.
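A quick way to look at a monthly relationship like this is a correlation between the two series. The sketch below uses invented monthly figures, chosen only so the direction matches the pattern described above; the column names are hypothetical and the numbers are not MUNI’s.

```r
# Hypothetical monthly figures. Values are invented for illustration only.
monthly <- data.frame(
  month                 = month.abb[1:6],
  avg_weekday_boardings = c(700000, 710000, 720000, 730000, 740000, 750000),
  citations             = c(4300, 4200, 4250, 4000, 3900, 3850)
)

# Pearson correlation: a negative value means months with more boardings
# tended to have fewer citations.
r <- cor(monthly$avg_weekday_boardings, monthly$citations)
print(r)
```

With only twelve or thirteen monthly points in the real data, a correlation like this is a rough indication rather than firm evidence.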

Who gets citations on rainy days?

My original thinking behind this dataset was that people needed to
seek shelter when there is bad weather, and often the easiest shelter is
public transit. When more people who cannot afford MUNI board without
buying a ticket, those same people would be the ones most likely to
receive a citation. It’s hard to know for sure, but the data that MUNI
provided also don’t seem to support this idea.

In addition to providing the time and location of citations, the
dataset includes (1) the total amount of the fee that has been paid and
(2) the total amount of the fee that is still due. Note that fees are
not static; if someone doesn’t pay the citation by its due date, additional fees can be added, meaning a Fare Evasion fee of $125 can increase with every payment date that is missed.

While it’s impossible to know the financial situation of each person cited, whether or not someone paid their citation fine is a possible proxy – if a person doesn’t pay their fine, that person is more likely to not have the money to do so. If poorer residents are the ones who are ticketed more often on rainy days, then we would expect rainy days to have more unpaid citations. This doesn’t seem to be the case – citations given on rainy days have been paid at roughly the same rate (38%) as citations given on days when it did not rain (37%).

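| Weather | # of days | Citations issued | Citations paid | % of citations paid |
|---------|-----------|------------------|----------------|---------------------|
| No Rain | 309       | 39,369           | 14,703         | 37%                 |
| Rain    | 53        | 8,272            | 3,166          | 38%                 |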
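The percentages in the table, and the per-day averages quoted at the top of this post, follow directly from the counts. The snippet below simply redoes that arithmetic in R using the numbers from the table; the data frame and column names are my own.

```r
# Counts copied from the table above.
summary_tbl <- data.frame(
  weather = c("No Rain", "Rain"),
  days    = c(309, 53),
  issued  = c(39369, 8272),
  paid    = c(14703, 3166)
)

# Share of citations paid, as a whole-number percentage.
summary_tbl$pct_paid <- round(100 * summary_tbl$paid / summary_tbl$issued)

# Average citations issued per day, rounded to a whole number.
summary_tbl$issued_per_day <- round(summary_tbl$issued / summary_tbl$days)

print(summary_tbl)
```

Rounded to whole numbers, this gives 37% and 38% paid, and 127 and 156 citations per day for dry and rainy days respectively.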

Further reading

The topic of fare inspectors has been covered a few times recently – the SF Examiner wrote about quotas and inspectors, while Hoodline wrote an interesting analysis of where fare inspectors are most likely to hand out citations.

Based on the data and analysis above, it’s not entirely clear why
more citations occur on rainy days. Even though citations were paid at a
similar rate on rainy days vs. non-rainy days, it would be interesting
to explore in more detail the relationship between weather and who is
being fined. If low-income communities are more impacted by fines on
rainy days, the data here suggest that they are still able to pay those
fines – perhaps with the help of The City of San Francisco’s Financial Justice Project, which has been working on helping ensure that fines and fees are not overly harsh on low-income communities.

Data from the SFMTA

Data on individual citations was obtained via public records request from the SFMTA. Each row in this dataset represents a single citation, and includes a number of fields – the ones used here were:

The time of the violation

The location of the violation

The exact violation code – e.g. “7.2.101” refers to a general “Fare Evasion” citation, see Fiscal Year 2018 fines

The total amount due

The total amount already paid

The dataset provided by the SFMTA included one year and one month
(from March 2017 through March 2018, so 13 months) of citations – about
50,000 in total. The analysis done here only used exactly one year of
that data, or roughly 47,000 citations.

Data from the NOAA

To see the relationship between weather and citation data, I downloaded historical weather data from the NOAA. The NOAA makes daily summaries by location available for download; the raw data used in this analysis is available on GitHub. I requested it from the downtown San Francisco station – click “Add to Cart,” then select the date range and specific fields of interest. Despite being referred to as a “cart,” it’s free.

A day was classified as “rainy” if the PRCP field (“precipitation”) indicated there was any precipitation that day. In the 362 days analyzed, 53 days had some rain and were therefore classified as “rainy” for the purposes of this analysis.
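The classification rule, and the step of lining weather up with daily citation counts, can be sketched as follows. Everything here is illustrative: the rows are invented, the `DATE` and `violation_time` column names and the timestamp format are assumptions, and treating a missing `PRCP` value as “not rainy” is my own choice rather than something stated in the analysis.

```r
# Hypothetical extract of a NOAA daily-summaries file. Values are invented.
weather <- data.frame(
  DATE = as.Date(c("2017-03-01", "2017-03-02", "2017-03-03", "2017-03-04")),
  PRCP = c(0, 0.12, NA, 0.01)
)

# Any recorded precipitation makes the day "rainy".
# Assumption: a missing PRCP value is treated as not rainy.
weather$rainy <- !is.na(weather$PRCP) & weather$PRCP > 0

# Hypothetical citation rows, one per citation, with a timestamp stored as text.
citations <- data.frame(
  violation_time = c("2017-03-01 08:15:00", "2017-03-01 17:40:00",
                     "2017-03-02 09:05:00", "2017-03-04 12:30:00"),
  stringsAsFactors = FALSE
)

# Take the calendar date from the first 10 characters (assumed "YYYY-MM-DD ..." format).
citations$DATE <- as.Date(substr(citations$violation_time, 1, 10))
citations$n <- 1

# Count citations per day, then attach them to the weather table.
per_day <- aggregate(n ~ DATE, data = citations, FUN = sum)
daily <- merge(weather, per_day, by = "DATE", all.x = TRUE)

# Days with no citation rows come back as NA after the merge; count them as zero.
daily$n[is.na(daily$n)] <- 0

print(daily)
```

Keeping every weather day in the merge (`all.x = TRUE`) matters, because days with no citations would otherwise drop out and inflate the averages.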

Simple linear model

The output of a linear regression model is shown below; the model estimates the number of tickets as a function of (1) whether it rained or not and (2) the day of the week. As mentioned above, a rainy day was associated with ~18 more tickets than a day without rain.
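For readers who want to fit a model of this shape themselves, here is a minimal sketch using base R’s `lm()`. It runs on simulated data, not the SFMTA data: the simulation deliberately builds in a rain effect of 18 to mirror the figure quoted above, so the coefficient it recovers is not evidence of anything. The variable names and the weekday and weekend baselines are assumptions.

```r
set.seed(1)

# Simulated days. Nothing here comes from the real dataset.
n_days <- 140
dates  <- as.Date("2017-03-06") + 0:(n_days - 1)

day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
day_of_week <- factor(day_names[as.POSIXlt(dates)$wday + 1], levels = day_names)

# Roughly 15% of simulated days are rainy.
rained <- rbinom(n_days, 1, 0.15) == 1

# Simulated counts: an assumed baseline of 150 on weekdays and 90 on weekends,
# plus a built-in effect of 18 on rainy days, plus random noise.
is_weekend <- day_of_week %in% c("Saturday", "Sunday")
citations  <- ifelse(is_weekend, 90, 150) + 18 * rained + rnorm(n_days, sd = 10)

# Citations as a function of rain and day of the week.
fit <- lm(citations ~ rained + day_of_week)
print(summary(fit))

# The coefficient on rain is the estimated difference on rainy days,
# holding day of the week fixed.
rain_effect <- coef(fit)[["rainedTRUE"]]
print(rain_effect)
```

Because `day_of_week` is a factor, `lm()` fits a separate offset for each day relative to the first level (Sunday here), which is what lets the rain coefficient be read as a within-day-of-week difference.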
