A Misdirected COMPAS

I’m generally a fan of actuarial tools in criminal justice. These tools estimate the risk of a given event (say, failure to appear in court or rearrest) based on factors that are statistically correlated with that event. Part of my support for actuarial tools comes from my belief that criminal justice is as susceptible to implicit bias as other human endeavors, if not more so, and that judges and prosecutors, who hold most of the power in the criminal justice system, will, without necessarily meaning to, evaluate people differently based on their race. Discretionary decisions are hard to review. Actuarial tools can guide or replace that discretion with something that, in theory, is more accurate, transparent, and fair. Actuarial tools also fit well into the broader move to justify practices with evidence.

I am therefore extremely troubled to learn (via Doug Berman) about ProPublica’s study of the COMPAS risk assessment tool (ProPublica has also published its methodology), which found that, as used in Broward County, Florida, the tool’s risk predictions were biased by race. COMPAS, which stands for Correctional Offender Management Profiling for Alternative Sanctions, is the intellectual property of a private corporation, Northpointe. Jurisdictions using COMPAS enter data about a defendant on a number of factors, and a proprietary, non-public algorithm converts those data points into a risk score. By comparing COMPAS risk scores with later criminal records, ProPublica’s reporters were able to assess how accurate the predictions were. The results? “In forecasting who would re-offend, the algorithm made mistakes with black and white defendants at roughly the same rate but in very different ways.” Compared with actual reoffending, the tool overestimated risk for black defendants: those who did not go on to reoffend were more likely than white defendants to be labeled high risk. It underestimated risk for white defendants: those who did reoffend were more likely than black defendants to be labeled low risk. In practical terms, this means that black defendants were unjustly denied favorable release terms and white defendants were unjustly granted them.

Because I am, in general, in favor of risk assessment tools, I want to take this opportunity to clarify that being in favor of evidence-based practices and risk assessment tools does not mean that all tools are equal, nor that they can be installed without modification or monitoring. It is extremely important to have good tools, good training, and good practices.

There are a few best practices that should always be followed.

Validation. Validation means checking how well the risk assessment tool actually predicted outcomes and changing the tool as necessary. It’s not just best practice to validate risk assessment tools; it’s unacceptable practice not to validate them. Judicial discretion is hard to review because we can’t read judges’ minds and know what went into a given decision. But actuarial tools can be checked, and we can see how good they are at predicting the future. Unless we check, we’re just putting blind faith in something because it has a number on it. That’s not evidence-based practice; that’s numerology. Even a tool that is valid in one jurisdiction is not necessarily valid in another, because of differences (often unobserved) between one place and another. If the tool doesn’t work, stop using it until you fix it. After you use it, keep checking. Institutions, not journalists, need to take on the responsibility of checking whether what the tool tells them is actually true and whether there are unexplained racial disparities.
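To make “checking” concrete, here is a minimal sketch of one basic validation test, assuming the tool sorts people into “low,” “medium,” and “high” bands and that the jurisdiction records whether each person was later rearrested. The band names, the records, and the helper functions are all hypothetical, invented for illustration; a real validation study would use the jurisdiction’s own cases and a defined follow-up period, and would go well beyond this single check.

```python
from collections import defaultdict

# Hypothetical records: (risk_band, reoffended) pairs, invented for illustration.
records = [
    ("low", False), ("low", False), ("low", True), ("low", False),
    ("medium", True), ("medium", False), ("medium", True), ("medium", False),
    ("high", True), ("high", True), ("high", False), ("high", True),
]

BAND_ORDER = ["low", "medium", "high"]  # hypothetical band labels, lowest risk first


def observed_rates(records):
    """Return the share of people in each risk band who actually reoffended."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for band, reoffended in records:
        totals[band] += 1
        hits[band] += int(reoffended)
    return {band: hits[band] / totals[band] for band in BAND_ORDER if totals[band]}


def bands_are_ordered(rates):
    """Basic sanity check: the observed rate should never fall as the band rises."""
    values = [rates[band] for band in BAND_ORDER if band in rates]
    return all(a <= b for a, b in zip(values, values[1:]))


rates = observed_rates(records)
print(rates)                     # {'low': 0.25, 'medium': 0.5, 'high': 0.75}
print(bands_are_ordered(rates))  # True
```

If the observed rates did not rise with the bands, the labels would not mean what they claim to mean, and that is the point at which to stop using the tool until it is fixed.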

Use open tools, open data, and open methods. Part of the problem with COMPAS is that the tool itself is proprietary. It’s difficult to check without knowing how the data is scored (and scoring, not the question list, is the key: the score is the verdict about risk that drives decisions). I think all tools should be open, all data collected should be auditable, and we should know how things are scored. All of this should also be open to challenge. This is just basic Daubert reasoning (the federal standard for admitting expert scientific testimony) for you lawyers out there.
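What “knowing how things are scored” could look like is sketched below under loudly stated assumptions: the factors, point values, and cutoffs are invented for illustration and are not COMPAS’s scoring rule, which is not public. The point is only that when the rule is published, anyone (a defendant, a defense lawyer, a judge) can recompute a score by hand and challenge a specific point value or cutoff.

```python
# Hypothetical published scoring rule. Factors, points, and cutoffs are
# invented for illustration and are NOT the (undisclosed) COMPAS rule.
POINTS = {
    "prior_failures_to_appear": 2,  # points per prior failure to appear
    "pending_charge": 3,            # points if a charge is currently pending
}
CUTOFFS = [(0, "low"), (4, "medium"), (7, "high")]  # minimum score for each band


def score(answers):
    """Convert intake answers into a total score using the published points table."""
    total = answers["prior_failures_to_appear"] * POINTS["prior_failures_to_appear"]
    if answers["pending_charge"]:
        total += POINTS["pending_charge"]
    return total


def band(total):
    """Return the highest band whose published minimum the score meets."""
    label = CUTOFFS[0][1]
    for minimum, name in CUTOFFS:
        if total >= minimum:
            label = name
    return label


answers = {"prior_failures_to_appear": 2, "pending_charge": True}
total = score(answers)
print(total, band(total))  # 7 high
```

Because every step is visible, a dispute over whether a pending charge should count for three points, or whether seven is the right cutoff for “high,” can be argued on the record rather than taken on faith.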

Test on subpopulations. Given the disparate impact the criminal justice system has had by race and class (among other lines), we should always do what ProPublica did: control for known risk factors and see whether there are disparate racial impacts. This just seems like a no-brainer. The integrity of the criminal justice system, and people’s faith in it, demand no less.
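The simplest version of a subpopulation test, and the pattern ProPublica reported (similar overall mistake rates, made “in very different ways”), can be sketched as a comparison of error rates by group. This is a hypothetical illustration: the group labels “A” and “B” and all counts are invented, and it does not control for other risk factors the way ProPublica’s regression analysis did; it only separates the two kinds of mistakes. A false positive here is someone labeled high risk who did not reoffend; a false negative is someone labeled low risk who did.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    group: str            # hypothetical group label
    predicted_high: bool  # tool labeled the person high risk
    reoffended: bool      # observed outcome during follow-up


def make_cases(group, fp, tn, tp, fn):
    """Build hypothetical cases from counts of each prediction/outcome cell."""
    return (
        [Case(group, True, False)] * fp
        + [Case(group, False, False)] * tn
        + [Case(group, True, True)] * tp
        + [Case(group, False, True)] * fn
    )


def _ratio(num, den):
    """Divide, returning NaN when a group has no cases in the denominator."""
    return num / den if den else float("nan")


def error_rates(cases):
    fp = sum(c.predicted_high and not c.reoffended for c in cases)
    tn = sum(not c.predicted_high and not c.reoffended for c in cases)
    tp = sum(c.predicted_high and c.reoffended for c in cases)
    fn = sum(not c.predicted_high and c.reoffended for c in cases)
    return {
        "overall_error": _ratio(fp + fn, len(cases)),
        "false_positive_rate": _ratio(fp, fp + tn),  # non-reoffenders labeled high risk
        "false_negative_rate": _ratio(fn, fn + tp),  # reoffenders labeled low risk
    }


def rates_by_group(cases):
    groups = sorted({c.group for c in cases})
    return {g: error_rates([c for c in cases if c.group == g]) for g in groups}


# Invented counts: both groups have the same overall error rate.
cases = make_cases("A", fp=2, tn=3, tp=4, fn=1) + make_cases("B", fp=1, tn=4, tp=3, fn=2)
for group, r in rates_by_group(cases).items():
    print(group, r)
# A {'overall_error': 0.3, 'false_positive_rate': 0.4, 'false_negative_rate': 0.2}
# B {'overall_error': 0.3, 'false_positive_rate': 0.2, 'false_negative_rate': 0.4}
```

Looking only at overall accuracy, the two groups appear to be treated alike; splitting the errors shows that group A bears more of the wrongful “high risk” labels and group B more of the wrongful “low risk” ones.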

I don’t think this example is fatal to the idea of risk assessment tools; it is a cautionary note that you shouldn’t check your skepticism at the door. Using the tools means accepting their limitations and knowing what they were designed to do, and what they can’t do. Don’t pound in a nail with the handle of a screwdriver. If you do use the tools, the benefits accrue only if you change how you do business, including collecting data and changing operations based on that data. The main benefit of such tools is that they can be checked much more easily than discretionary decisions made elsewhere in the criminal justice system. But to do that, we need more openness about them, and administrators who understand that risk assessment is an iterative process that requires management and reassessment.

David Ball is an Associate Professor at Santa Clara University School of Law. He writes and teaches primarily in the fields of criminal law and criminal procedure, with a special focus on sentencing and corrections. He also serves as the Co-Chair of the Corrections Committee of the American Bar Association.