Predictive Policing Reinforces Police Bias

Issues surrounding policing in the United States are at the forefront of our national attention. Among these is the use of “predictive policing,” which is the application of statistical or machine learning models to police data, with the goal of predicting where or by whom crime will be committed in the future. Today Significance magazine published an article on this topic that I co-authored with William Isaac. Significance has kindly made this article open access (free!) for all of October. In the article we demonstrate the mechanism by which the use of predictive policing software may amplify the biases that already pervade our criminal justice system. Below, we give a brief summary.

Police databases organize information about crimes recorded by police. But not every crime that is committed has an equal chance of being recorded by police. For example, crimes that are committed in areas that are heavily patrolled by police are more likely to be recorded than those that occur in areas where the police spend little time. Therefore, locations that are heavily patrolled by police are over-represented in the police data.

The goal of every machine learning algorithm is to learn patterns in the data that are fed into the software. When presented with police data, the algorithm will learn patterns in the police data. What the machine learns, in fact, is not patterns of crime, per se, but patterns of how police record crime. Because police record crime unevenly throughout a city, the patterns of recorded crime may differ substantially from the true patterns of crime. The pattern the algorithm is likely to learn is that most crimes occur in the over-represented locations. These may not actually be the locations with the highest level of crime.

Using the patterns it has learned, the algorithm can then make predictions about the future distribution of crime, and additional police will be dispatched to the locations with the highest predicted rate of crime. Because the algorithm has learned that the over-represented locations have the most crime, more police will be dispatched to those areas and even more crime will be observed in those locations. These newly observed crimes are then fed back into the algorithm, and the process repeats on subsequent days. This creates a vicious cycle: police are sent to certain locations because they believe those are the locations with the most crime, and they believe those locations have the most crime because those are the locations to which they were sent.
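The feedback loop described above can be illustrated with a small toy simulation. This is only a sketch under made-up assumptions, not the algorithm we used in the article: the function name, the two locations, the crime levels, and the recording rates are all hypothetical. Both locations have the same true level of crime, but one starts out over-represented in the records.

```python
def simulate_feedback(true_crimes, initial_records, days=30,
                      patrolled_rate=0.5, unpatrolled_rate=0.1):
    """Toy feedback loop: patrol wherever the most crime has been recorded.

    All parameters are hypothetical; the rates are the assumed fraction of
    true crimes that end up recorded with and without extra patrols.
    """
    records = list(initial_records)
    targets = []
    for _ in range(days):
        # "Prediction": the location with the most recorded crime so far.
        target = max(range(len(records)), key=lambda i: records[i])
        targets.append(target)
        for i, crimes in enumerate(true_crimes):
            # Crimes are more likely to be recorded where police patrol.
            rate = patrolled_rate if i == target else unpatrolled_rate
            records[i] += crimes * rate
    return records, targets


# Hypothetical inputs: equal true crime in both locations, but
# location 0 begins with more recorded crime than location 1.
true_crimes = [10, 10]
initial_records = [12, 8]

records, targets = simulate_feedback(true_crimes, initial_records)
share = records[0] / sum(records)
print(f"Share of recorded crime at location 0: {share:.2f}")
```

With these made-up numbers, police are sent to location 0 every day, and its share of recorded crime grows from 0.60 to 0.81 even though the true crime level is identical in both locations.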

In our article, we demonstrate this phenomenon using the Oakland Police Department’s recorded drug crimes. We apply a real predictive policing algorithm to assess where the algorithm would have suggested police look for additional drug crimes. We find that targeted policing would have been dispatched almost exclusively to lower-income, minority neighborhoods. This is because, in the records that predate the algorithm's predictions, the majority of the recorded drug crimes were in these same neighborhoods.

Using public health data on the demographics of drug users combined with high resolution US Census data, we estimate the number of drug users residing in each location throughout the city of Oakland. If we assume that drug users commit drug crimes where they live, then our estimates suggest that the kinds of drug crime recorded in a few specific neighborhoods in Oakland are, in fact, occurring in many neighborhoods throughout Oakland. While police data suggest that drug crimes primarily occur in a few locations in the city, public health-based estimates suggest that drug use is much more widespread.
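The general idea of this comparison can be sketched in a few lines. This is a simplified illustration, not the estimation procedure from our article: it assumes the estimate for a location is simply each demographic group's population multiplied by that group's rate of drug use, and every name and number below is hypothetical.

```python
def estimate_users(population, use_rate):
    """Estimated drug users per location: sum of group count * group use rate."""
    return {
        location: sum(count * use_rate[group] for group, count in groups.items())
        for location, groups in population.items()
    }


def shares(counts):
    """Convert counts per location into each location's share of the total."""
    total = sum(counts.values())
    return {location: value / total for location, value in counts.items()}


# Hypothetical use rates by demographic group (stand-in for public health data).
use_rate = {"group_a": 0.10, "group_b": 0.20}

# Hypothetical population counts by location and group (stand-in for Census data).
population = {
    "area_1": {"group_a": 1000, "group_b": 500},
    "area_2": {"group_a": 1500, "group_b": 500},
    "area_3": {"group_a": 500, "group_b": 750},
}

# Hypothetical recorded drug crimes by location (stand-in for police data).
recorded_drug_crimes = {"area_1": 20, "area_2": 10, "area_3": 170}

estimated_share = shares(estimate_users(population, use_rate))
recorded_share = shares(recorded_drug_crimes)

for location in population:
    print(f"{location}: estimated users {estimated_share[location]:.2f}, "
          f"recorded crimes {recorded_share[location]:.2f}")
```

In this made-up example, estimated drug use is spread fairly evenly across the three areas, while recorded drug crime is concentrated in one of them. That gap between the two distributions is the kind of discrepancy we describe above.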

If this algorithm had actually been used with the goal of predicting and preventing crime in Oakland, it would have failed. Instead of revealing insights into drug use that were previously unknown to police, it would simply have sent police back into the communities they were already over-policing.

Using machine learning to predict where crime will happen in the future might seem like it could help police departments deploy police more effectively. Unfortunately, machine learning based on partial, inadequate data means that the predictions recycle the bias embedded in existing police practice while obscuring the problems with that practice. Predictive policing makes policing even more unfair than it already is.