Challenge The Bias

The Problem

Judges constantly make decisions about whether defendants should be released or detained while awaiting trial.

How fair are these rulings?

Human beings are easily biased, and studies have suggested that external factors (like a lunch break or a tough football loss) can sway their decisions. In the age of big data, it is tempting to imagine that using a computer to make rulings might help counteract our own biases. In fact, "risk scores" generated by algorithms are used nationwide [1][2]. When used as black boxes, however, algorithms are no better than us (and may be much worse) because of the biases they propagate. A recent study of defendants in Broward County, Florida showed that Black defendants are far more likely to be assigned a high-risk score [3].

The Users

Data contains valuable information, but we need to understand how to interpret it, use it, and recognize its consequences. We believe that taking the time to understand the effects of using these risk scores with different thresholds will allow judges, lawyers and policy-makers to use data-driven models to make less biased decisions in the criminal justice system.

Current Models

Currently, the proprietary COMPAS risk assessment score is widely used in court, but evidence shows that this score performs poorly for minorities and women. Moreover, there is no system for deciding how to act on a score: it is simply presented to the judge as a number.

Factors

Variables to consider when calculating a risk score

Gender

Age

Race

COMPAS recidivism score

COMPAS violent recidivism score

What is Fair?

“Fair” is a word we often throw around, but determining what is the most fair decision involves a lot of tricky tradeoffs to think about. We consider three types of fairness, and compare how models can be interpreted in each framework.

Equal Thresholds: Given an algorithmically-generated risk score, we say that any two people with the same risk score receive the same ruling. For example, we could decide that any defendant, regardless of race, gender, or other factor, will be detained if their risk score is above 0.6.

Equal Detention Rates: Given two populations (i.e. male and female, or black and white), we want to detain an equal rate of people from both populations. This necessarily means we want different thresholds for different populations.

Equal False Positive Rates: Given two populations, we want to choose thresholds per population such that we enforce equal false positive rates (FPR = the fraction of people who did not reoffend who were detained wrongfully).
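The three criteria can be made concrete as threshold rules. Below is a minimal sketch on synthetic data (all scores, group labels, and target numbers here are hypothetical illustrations, not the project's actual data or thresholds):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical risk scores and reoffense outcomes for two populations.
scores_a = rng.beta(2, 5, size=1000)        # group A
scores_b = rng.beta(3, 4, size=1000)        # group B
reoffend_a = rng.random(1000) < scores_a
reoffend_b = rng.random(1000) < scores_b

# 1. Equal thresholds: one cutoff for everyone.
t = 0.6
detain_a, detain_b = scores_a > t, scores_b > t

# 2. Equal detention rates: per-group cutoffs so that the same
#    fraction (say 30%) of each group is detained.
rate = 0.30
t_a = np.quantile(scores_a, 1 - rate)
t_b = np.quantile(scores_b, 1 - rate)

# 3. Equal false positive rates: per-group cutoffs so that the same
#    fraction of non-reoffenders is wrongfully detained.
def threshold_for_fpr(scores, reoffend, target_fpr):
    innocent = scores[~reoffend]   # people who did not reoffend
    return np.quantile(innocent, 1 - target_fpr)

t_a_fpr = threshold_for_fpr(scores_a, reoffend_a, 0.10)
t_b_fpr = threshold_for_fpr(scores_b, reoffend_b, 0.10)
```

Note that criteria 2 and 3 generally yield different cutoffs for the two groups, which is exactly the tradeoff the visualizations explore.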

Gender Risk

It is important that the risk scores judges use to make decisions accurately reflect true risk. The results shown here come from the COMPAS method, which does not account for gender: for a given risk score, males are more likely to recidivate than females. This is clearly unfair.

Accounting for Gender

A good risk score model will accurately estimate the risk of recidivism; this is an example of a model that accounts for gender and now accurately reflects true risk. However, are we comfortable including gender in a model? We want our data to help policy makers make these decisions.
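One way to check whether a score "accurately reflects true risk" is a calibration check: bin people by score and compare the observed reoffense rate per bin across genders. A sketch on synthetic data (the scores, offsets, and bin count are hypothetical, chosen only to mimic a model that ignores gender):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical scores; in this toy setup the same score corresponds to
# a higher true reoffense probability for males than for females.
scores_m = rng.uniform(0, 1, 2000)
scores_f = rng.uniform(0, 1, 2000)
reoffend_m = rng.random(2000) < np.clip(scores_m + 0.10, 0, 1)
reoffend_f = rng.random(2000) < np.clip(scores_f - 0.10, 0, 1)

def calibration_curve(scores, outcomes, bins=5):
    """Observed reoffense rate within each equal-width score bin."""
    edges = np.linspace(0, 1, bins + 1)
    idx = np.clip(np.digitize(scores, edges) - 1, 0, bins - 1)
    return np.array([outcomes[idx == b].mean() for b in range(bins)])

curve_m = calibration_curve(scores_m, reoffend_m)
curve_f = calibration_curve(scores_f, reoffend_f)

# A gender-calibrated score would make these curves coincide; here the
# male curve sits above the female curve at every score level.
```

Adding gender as a model input (or recalibrating per group) is what brings the two curves together, which is the choice the section above asks policy makers to weigh.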

We address the second and third notions of fairness, showing that adding certain variables to an algorithm can make it fairer, and demonstrating how to make optimal decisions under different concepts of fairness. By creating interpretable visualizations of these concepts, we hope to make fair, data-driven models easier to understand and adopt.