The researchers noted that such disparities alone are not necessarily indicative of racial bias. However, by looking at the rate at which officers discovered contraband on searched drivers, they found evidence that minorities are held to a double standard and searched on the basis of less evidence.
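The comparison described above, between how often drivers are searched and how often those searches turn up contraband, can be sketched in a few lines of Python. This is only an illustration: the records, the group labels "A" and "B" and the function name `rates_by_group` are hypothetical and do not come from the project's dataset or code.

```python
from collections import defaultdict

# Hypothetical stop records: (driver_group, was_searched, contraband_found).
# These values are invented for illustration only.
stops = [
    ("A", True, True),
    ("A", True, False),
    ("A", False, False),
    ("A", False, False),
    ("B", True, True),
    ("B", True, False),
    ("B", True, False),
    ("B", True, False),
]


def rates_by_group(records):
    """Return the search rate and the contraband hit rate for each group."""
    counts = defaultdict(lambda: {"stops": 0, "searches": 0, "hits": 0})
    for group, searched, found in records:
        c = counts[group]
        c["stops"] += 1
        if searched:
            c["searches"] += 1
            if found:
                c["hits"] += 1

    result = {}
    for group, c in counts.items():
        # Share of stopped drivers who were searched.
        search_rate = c["searches"] / c["stops"]
        # Share of searches that found contraband (None if nobody was searched).
        hit_rate = c["hits"] / c["searches"] if c["searches"] else None
        result[group] = {"search_rate": search_rate, "hit_rate": hit_rate}
    return result


rates = rates_by_group(stops)
for group, r in sorted(rates.items()):
    print(group, r)
```

In this made-up example, group B is searched more often than group A, yet searches of group B find contraband less often. A gap of that kind is what would suggest that one group is being searched on the basis of less evidence.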

The Stanford Open Policing Project — which the Stanford Computational Journalism Lab is a part of — aims to help researchers, journalists and policymakers investigate and improve interactions between police and the public. Learn more at https://openpolicing.stanford.edu

These findings are based on a nationwide database – which the Stanford researchers created – of state patrol stops. The database contains key details from millions of records collected from 2011 to 2015 and is part of an effort to statistically analyze police practices.

Along with the findings they are sharing today, the researchers are releasing their entire dataset, complete with online tutorials, so that policymakers, journalists and citizens can do their own analyses through this new Stanford Open Policing Project.

“We have seen many anecdotal reports of bias but we need more than that to accurately show the experience of motorists who are minorities,” said Cheryl Phillips, a professor of journalism and part of the Stanford Computational Journalism Lab. “We need an open and transparent platform where we can identify systematic bias and that will enable policy change.”

In an academic paper being released today, the researchers explain how they assembled more than 60 million police reports from 20 states that collect enough information about traffic stops – including the race or ethnicity of drivers – to permit statistical inferences when looking at the data in aggregate.

“We’ve created a platform to help researchers and policymakers understand and improve policing,” said Sharad Goel, an assistant professor in the Department of Management Science & Engineering and leader of the Stanford Law, Order and Algorithms Project, which uses computational tools to study criminal justice issues.

The threshold test

The intellectual heart of the project involved the development of a more nuanced and statistically valid way to infer racial or ethnic discrimination after a person is pulled over for a traffic stop. The researchers call their approach the threshold test. It quantifies the following question: once a driver is pulled over, what level of suspicion must an officer have to conduct a search, and how does this threshold of suspicion relate to the race or ethnicity of the driver?
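To make the idea of a "threshold of suspicion" concrete, here is a toy Python sketch. It is not the researchers' statistical method, which this article does not spell out. It assumes, hypothetically, that an officer can put a number on the chance that each driver is carrying contraband and searches whenever that number meets a cutoff. The function name `search_outcomes` and all values are invented for illustration.

```python
def search_outcomes(suspicion_levels, threshold):
    """Search every driver whose estimated chance of carrying contraband
    is at least `threshold`. Return (search_rate, expected_hit_rate)."""
    searched = [p for p in suspicion_levels if p >= threshold]
    search_rate = len(searched) / len(suspicion_levels)
    # The expected hit rate is the average contraband probability among
    # the drivers who were searched (None if nobody was searched).
    expected_hit_rate = sum(searched) / len(searched) if searched else None
    return search_rate, expected_hit_rate


# Hypothetical suspicion levels for six stopped drivers.
suspicion = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

higher_bar = search_outcomes(suspicion, threshold=0.5)
lower_bar = search_outcomes(suspicion, threshold=0.3)

print("threshold 0.5:", higher_bar)
print("threshold 0.3:", lower_bar)
```

With the same set of drivers, lowering the cutoff means more people are searched and a smaller share of those searches is expected to find contraband. The threshold test works in the opposite direction: it starts from observed searches and their outcomes and infers what cutoff officers appear to be applying to each group.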

Last year, the computer scientists released their first scientific paper on the threshold test, applying their statistical measure to 4.5 million traffic stops from 100 cities in North Carolina. That study found evidence of a double standard: officers required less suspicion to search black or Hispanic drivers than white drivers.

Applying this test to their newly collected records, the researchers found that the same basic pattern holds across the country.

“This is not a regional phenomenon, it’s national,” Phillips said. “For journalists, being able to examine both the local stops and the bigger picture represents stories that need to be told.”

From zero to 100 million

The project began more than two years ago when Phillips and a group of journalism graduate students began requesting traffic stop data from state police agencies. Some jurisdictions didn’t offer data electronically, and among the states that made electronic records available, the data often weren’t collected in a uniform format, creating barriers to systematic analysis.

In 2015, Goel and his students brought their computational skills to the effort. The Stanford team won a grant from the nonprofit John S. and James L. Knight Foundation to collect, analyze and share this sizable data set.

The data negotiation process has become part of the educational experience in Phillips’ classes. Collecting the data, standardizing it and bringing it into one repository now allows academic researchers, policymakers and journalists to examine racial patterns in stops on a much broader scale.

“It took thousands of hours to corral the different state datasets into a consistent format for analysis,” Goel said, adding that, among other things, the researchers recommend uniform standards for collecting traffic stop data to facilitate public oversight.

Moving forward

The researchers plan to continue collecting data from other states. So far, the project has obtained at least some data from 31 states, though the analysis has focused on the 20 states with more detailed data. The project has also begun requesting similar data from cities. In addition, the researchers are creating tutorials that explain the subtle differences between concepts like “discrimination” and “disparate impact,” providing a shared vocabulary for understanding and improving police policies.

The team hopes to attract other researchers, policymakers and citizens to join their open effort. For her part, Phillips is focusing first on giving journalists both the data and a more sophisticated set of tools to support their role as public watchdogs. The project has already heard from dozens of journalists interested in using the data and tapping the expertise of the computer scientists.

“In a time of shrinking newsrooms and declining resources, we think this is a new model for helping journalists do their jobs,” she said.

Other members of the team are Emma Pierson and Sam Corbett-Davies, doctoral students in computer science; Camelia Simoiu and Jan Overgoor, doctoral students in management science and engineering; and Vignesh Ramachandran (Stanford Journalism MA ’12), a member of the Stanford Computational Journalism Lab.