How to Fight Bias with Predictive Policing

Law enforcement’s use of predictive analytics recently came under fire again. Dartmouth researchers made waves by reporting that simple predictive models, as well as nonexpert humans, predict crime just as well as the leading proprietary analytics software. That the leading software achieves (only) human-level performance might not actually be a deadly blow, but a flurry of press from dozens of news outlets quickly followed. In any case, even as this disclosure raises questions about one software tool’s credibility, a more enduring, inherent quandary continues to plague predictive policing.

Crime-predicting models are caught in a quagmire doomed to controversy because, on their own, they cannot realize racial equity. It’s an intrinsically unsolvable problem. It turns out that, although such models succeed in flagging (assigning higher probabilities to) both black and white defendants with equal precision, as a result of doing so they also falsely flag black defendants more often than white ones. In this article I cover this seemingly paradoxical predicament and show how predictive policing (and, more generally, big data in law enforcement) can be turned around to make the legal system fairer in this unfair world.

Predictive policing introduces a quantitative element to weighty law enforcement decisions made by humans, such as whether to investigate or detain, how long to sentence and whether to parole. When making such decisions, judges and officers take into consideration the calculated probability that a suspect or defendant will be convicted of a crime in the future. Calculating predictive probabilities from data is the job of “predictive modeling” (aka machine learning) software. It automatically establishes patterns by combing historical conviction records, and in turn these patterns, which together make up a predictive model, serve to calculate the probability for an individual whose future is as yet unknown.

Although they are “colorblind,” crime-predicting models still treat races differently from one another. They don’t explicitly incorporate race, or any other protected class, into their calculations (although religion has been a consideration). Despite this, black defendants are flagged as higher risk more often than white ones.

This disparity is a direct consequence of the racially imbalanced world in which we live. For example, a defendant’s number of prior convictions is a standard input for predictive models, because defendants who have previously been convicted are more likely to re-offend (after release) than those who have not. Because more black defendants have prior convictions, predictive models flag (that is, assign higher probabilities to) black defendants more often than white ones. A black defendant isn’t flagged because of race but is more likely to be flagged nonetheless.
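A toy sketch in Python can make this proxy mechanism concrete. Everything in it is invented for illustration: the group labels, the prior-conviction counts and the cutoff of two priors are hypothetical and are not drawn from COMPAS or any real dataset. The rule never looks at group membership, yet because the two made-up groups have different distributions of prior convictions, it flags them at different rates.

```python
# Hypothetical prior-conviction counts for two invented groups of ten defendants.
# These numbers are made up for illustration; they do not come from any real data.
priors_by_group = {
    "group_a": [0, 0, 0, 1, 0, 2, 1, 0, 0, 3],
    "group_b": [0, 2, 1, 3, 0, 2, 4, 1, 2, 0],
}

# Hypothetical cutoff: flag anyone with at least this many prior convictions.
FLAG_THRESHOLD = 2


def is_flagged(prior_convictions: int) -> bool:
    """Group-blind rule: the only input is the number of prior convictions."""
    return prior_convictions >= FLAG_THRESHOLD


def flag_rate(priors: list[int]) -> float:
    """Share of defendants in the list whom the rule flags."""
    return sum(is_flagged(p) for p in priors) / len(priors)


for group, priors in priors_by_group.items():
    # The group name is used only for reporting, never as an input to is_flagged.
    print(group, flag_rate(priors))
```

With these made-up counts the rule flags 20 percent of group_a and 50 percent of group_b, even though group membership is never an input to the rule.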

Today’s heated dispute, however, isn’t about this higher rate of flagging; more specifically, it’s about a higher rate of false flagging. Predictive models incorrectly flag black defendants who will not re-offend more often than they do white defendants who will not re-offend. In what is the most widely cited piece on bias in predictive policing, ProPublica reports that, among defendants who did not re-offend, the nationally used COMPAS model (Correctional Offender Management Profiling for Alternative Sanctions) falsely flags white defendants at a rate of 23.5 percent and black defendants at 44.9 percent. In other words, black defendants who don’t deserve it are erroneously flagged almost twice as often as undeserving white defendants. To address this sort of disparity, researchers at Google propose an affirmative action–like policy whereby a disenfranchised group is held to a more lenient standard. (Their interactive demo depicts the case of flagging for loan defaults rather than future crime, but the same concept applies.)

In opposition, advocates of COMPAS counter that each flag is equally justified for both races. Responding to ProPublica, the creators of COMPAS point out that among those flagged as higher risk, the portion falsely flagged is similar for black and white defendants: 37 and 41 percent, respectively. In other words, among defendants who are flagged, the flag is erroneous about equally often for white and black defendants. Other data scientists agree that this meets the standard for exonerating the model as unbiased.

It appears, however, that each individual flag is racially equitable but the overall rates of false flagging are not. Although they may seem to contradict one another, these two things both hold true:

—If you’re flagged, the chances it was deserved are equal, regardless of race.

—If you don’t deserve to be flagged, you’re more likely to be erroneously flagged if you’re black.
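The two statements correspond to two different metrics computed from the same confusion matrix. The first is precision (among those flagged, the share who actually re-offend); the second is the false positive rate (among those who do not re-offend, the share who are flagged anyway). The Python sketch below uses invented counts for two hypothetical groups, not COMPAS figures, to show that equal precision and unequal false positive rates can hold at the same time when the groups’ underlying re-offense rates differ.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for one group. All numbers used below are hypothetical."""
    tp: int  # flagged and re-offended
    fp: int  # flagged but did not re-offend (a false flag)
    fn: int  # not flagged but re-offended
    tn: int  # not flagged and did not re-offend

    def base_rate(self) -> float:
        """Share of the whole group that re-offends."""
        return (self.tp + self.fn) / (self.tp + self.fp + self.fn + self.tn)

    def precision(self) -> float:
        """First statement: among the flagged, the share whose flag was deserved."""
        return self.tp / (self.tp + self.fp)

    def false_positive_rate(self) -> float:
        """Second statement: among those who don't re-offend, the share flagged anyway."""
        return self.fp / (self.fp + self.tn)


# Two invented groups of 500 people with different base rates (0.2 vs 0.5).
groups = {
    "group_a": Confusion(tp=60, fp=40, fn=40, tn=360),
    "group_b": Confusion(tp=150, fp=100, fn=100, tn=150),
}

for name, c in groups.items():
    print(
        f"{name}: base rate={c.base_rate():.2f} "
        f"precision={c.precision():.2f} FPR={c.false_positive_rate():.2f}"
    )
```

Both made-up groups get a precision of 0.60, so a flag is equally trustworthy in either group, yet non-re-offenders in group_b are falsely flagged at 0.40 versus 0.10 in group_a.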

Who’s right? These two views counter each other, and yet each appears valid on its own. On one hand, all flags seem to be equally well deserved. For defendants who are assigned higher probabilities, the rate of subsequent prosecutions is the same for both white and black defendants. On the other hand, among defendants who won’t re-offend, black individuals face a higher risk of being falsely flagged. A more nuanced position claims that to settle the matter we must agree on how fairness is defined.

But instead of crossing swords, the ultimate resolution would be to agree on measures to combat racial inequity. Debating whether the COMPAS model deserves the indictment “biased” distracts from the next course of action. Rather than only vetting a predictive model for whether it worsens racial injustice, let’s enhance predictive policing to actively help improve things. The key impetus to do so comes directly from the seeming paradox behind this dispute over “bias” that makes it so sticky to resolve. The oddity itself brings to light a normally hidden symptom of today’s racial inequity: If predictive flags are designed so that they indicate the same re-offense probability for both white and black defendants (that is, designed to be equally precise for both groups), then, given the higher overall rate of re-offense among black defendants, that group suffers a greater prevalence of false flags.
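The arithmetic behind this symptom can be written as a standard identity, offered here as a sketch rather than as the author’s own derivation. For one group, write N for its size, p for its re-offense (base) rate, PPV for precision (the share of flagged defendants who re-offend), TPR for the share of re-offenders who are flagged, and FPR for the share of non-re-offenders who are flagged. Substituting the definitions into one another gives:

```latex
\mathrm{FP} = \frac{1-\mathrm{PPV}}{\mathrm{PPV}}\,\mathrm{TP},
\qquad
\mathrm{TP} = \mathrm{TPR}\cdot pN,
\qquad
\mathrm{FPR} = \frac{\mathrm{FP}}{(1-p)N}
\quad\Longrightarrow\quad
\mathrm{FPR} = \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot\mathrm{TPR}
```

If PPV and TPR are held equal across groups, the factor p/(1-p) grows with the base rate, so the group with the higher base rate necessarily receives the higher false positive rate (in practice TPR can also differ between groups, which shifts the comparison further). With hypothetical values PPV = TPR = 0.7, for instance, a base rate of 0.3 yields an FPR of about 0.13, while a base rate of 0.5 yields 0.30.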

And what an astonishing inequity that is. For a defendant of any race, being flagged means enduring a substantial risk that the flag is false. This can result in additional years of incarceration, with no way of confirming whether it was warranted (because the jailed defendant loses the freedom to demonstrate a lack of future crimes). For the black population, enduring this risk more often than whites adds insult to injury: Not only are black people more likely to become defendants in the first place, black defendants are in turn more likely to be unjustly sentenced to additional years on the basis of a false prediction of future crime.

This inequity isn’t new. Even before predictive models, the common practice of considering a suspect’s conviction history would have contributed to the same kind of cyclic perpetuation for the African-American population. The difference now is that it’s been explicitly quantified and widely publicized. As awareness rises, the impetus to act will grow.

Given this revelation, predictive policing is in an ideal position to respond and do something about it. An undertaking to integrate technology that supports decision-making across law enforcement, predictive policing has built the ideal platform on which new practices for racial equity may be systematically and widely deployed. It’s an unprecedented opportunity for racial justice.

To that end, let’s educate and guide law enforcement decision makers on the observed inequity. Train judges, parole boards and officers to understand the pertinent caveats when they’re given the calculated probability that a black suspect, defendant or convict will re-offend. In so doing, empower these decision makers to incorporate these considerations in whatever manner they deem fit, just as they already do with the predictive probabilities in the first place.

There are three crucial considerations to reflect on when working with re-offense probabilities:

Via proxies, the defendant’s race has influenced the calculated probability you’re looking at. Although race is not a direct input into the formula, the COMPAS model may incorporate unchosen, involuntary factors that approximate race, such as family background; neighborhood (“Is there much crime in your neighborhood?”); education level (only partially chosen); and the behavior of family and friends. FICO credit scores have been similarly criticized for incorporating factors such as the “number of bank accounts kept, that could interact with culture—and hence race—in unfair ways.” Furthermore, the COMPAS model is sealed as a “black box,” so the ways in which it incorporates such factors are unknown to law enforcement, the defendant and the public. In fact, the model’s creators recently revealed that it incorporates only six of the 137 factors collected, but which six remains a proprietary secret. However, the founder of the company behind COMPAS has stated that if factors correlated with race, such as poverty and joblessness, “…are omitted from your risk assessment, accuracy goes down.”

Keeping the inner workings proprietary in this way is like having an expert witness testify without allowing the defense to cross-examine. It’s like enforcing a public policy whose details are confidential. There’s a movement to make such algorithms transparent in the name of accountability and due process, advanced in part by pertinent legislation in Wisconsin and in New York City, although the U.S. Supreme Court declined to take on a related case last year.

The calculated probabilities disfavor black defendants due to biased ground truth. Conventional wisdom and anecdotal evidence support the presumption that black individuals are investigated, arrested and therefore convicted more often than white individuals who have committed the same crime. As a result, the data analyzed to develop crime-predicting models include more cases of white “false negatives” (criminals who got away with it) than black ones. Because the prevalence of such cases is, by definition, not observed and not in the data, measures of model performance do not reveal the extent to which black defendants are unjustly flagged more often. After all, the model doesn’t predict crime per se; it predicts convictions, and you don’t know what you don’t know. The problem of biased ground truth is frequently covered, for example by The Washington Post and by data scientists.

The black population is ravaged by false flags. As a result of being flagged more often, undeserving black defendants and suspects are wrongly flagged almost twice as often as undeserving whites. Unlike the first two points above, this does not necessarily mean the flags themselves are unfairly influenced by race. Taking this systematic issue into consideration, however, contributes to the greater good. It is an opportunity to help compensate for past and present racial injustices and the cycles of disenfranchisement that ensue. This is where predictive policing can de-escalate such cyclic patterns rather than inadvertently magnify them. Just as we protect suspects by limiting the power to convict when evidence has been illegally obtained, we can choose to take protective measures on behalf of this disenfranchised group as well. This is a unique opportunity for law enforcement to be a part of the solution rather than a part of the problem.

If we make it so, predictive policing could turn out to be a sheep dressed in wolf’s clothing. Unearthing inequity, it looks threatening—but it presents an unprecedented opportunity to implement new measures to fight social injustice. Crime-predicting models themselves must remain colorblind by design, but the manner in which we contextualize and apply them cannot remain so. Reintroducing race in this way is the only means to progress from merely screening predictive models for racial bias to intentionally designing predictive policing to actively advance racial justice.