Data Mining on the Side of the Angels

HAMBURG – Here’s what Patrick Ball wants the technology world to know: Data, by itself, isn’t truth. Even big data. But data plus a little bit of science can get you close.

Ball is executive director of the Human Rights Data Analysis Group, an organization that uses data-analysis and statistical methods to shed light on notoriously murky cases of mass human-rights abuses. This year, his analysis and testimony helped produce a guilty verdict in a trial against Guatemalan General José Efraín Ríos Montt for genocide during his 1982 to 1983 presidency.

“One of the moral foundations of the international human-rights movement is to speak truth to power,” he said. “If that’s going to work, we have to speak the truth, we have to be right, we have to get the analysis right on. And that’s not always easy.”

The goal in Guatemala had been to determine whether the government’s widespread killing of indigenous people had in fact been genocide, which as a legal matter requires that the killing be directed at a particular group rather than carried out indiscriminately.

A considerable amount of data on the era’s mass murders had been collected by various groups. But Ball said it was unclear how fully the various groups’ lists overlapped, and how representative they were of what had happened overall.

“In order to have information on a killing, we have to be told about it, or observe it,” he said. “But many killings are hidden.”

Drawing from four groups’ counts of killings in a three-county area during the period, and using the 1981 census as a basis for comparison, his group developed estimates of the data’s reliability: how closely the records, necessarily incomplete, were likely to correspond to the actual events.
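To see why the overlap between lists matters, consider a deliberately simplified sketch. It uses the classic two-list capture-recapture (Lincoln-Petersen) estimate, which is an assumption made here for illustration: the article does not name the method Ball's group used, and their analysis drew on four lists, not two. Every count below is invented, and the function and variable names are hypothetical.

```python
def lincoln_petersen(n_a: int, n_b: int, n_both: int) -> float:
    """Estimate the true total from two overlapping lists.

    Assumes the two lists are independent samples of the same events,
    which real human-rights data may well violate.
    """
    if n_both <= 0:
        raise ValueError("the lists must overlap to support an estimate")
    return n_a * n_b / n_both


# Hypothetical counts, invented purely for illustration.
list_a = 400   # killings recorded by group A
list_b = 300   # killings recorded by group B
both = 100     # killings that appear on both lists

estimated_total = lincoln_petersen(list_a, list_b, both)
documented = list_a + list_b - both          # distinct killings on record
undocumented = estimated_total - documented  # estimated hidden killings

print(f"documented: {documented}, estimated total: {estimated_total:.0f}")
```

The intuition: the smaller the overlap between two lists, the more events both of them are likely to have missed. A large overlap suggests the lists are close to complete.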

Comparing indigenous to non-indigenous deaths in the time period under investigation, they found that indigenous people had been almost eight times more likely to be killed under Ríos Montt than were non-indigenous citizens. Both the absolute number of people killed and the comparative risk for indigenous populations had been vastly higher under Ríos Montt than in previous or later periods.
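A comparison like "eight times more likely" is a ratio of two death rates, each of which needs a population denominator, presumably the role the census played. The sketch below shows the arithmetic only. The figures are invented to produce a round ratio and are not the group's data, and the function name is hypothetical.

```python
def risk_ratio(deaths_a: int, pop_a: int, deaths_b: int, pop_b: int) -> float:
    """Ratio of group A's death rate to group B's death rate.

    Cross-multiplies, computing (deaths_a * pop_b) / (deaths_b * pop_a),
    to avoid rounding error from dividing two small rates.
    """
    if deaths_b == 0 or pop_a == 0:
        raise ValueError("cannot compute a ratio with a zero denominator")
    return (deaths_a * pop_b) / (deaths_b * pop_a)


# Hypothetical figures, invented purely for illustration.
indigenous_deaths, indigenous_pop = 800, 10_000
other_deaths, other_pop = 100, 10_000

ratio = risk_ratio(indigenous_deaths, indigenous_pop, other_deaths, other_pop)
print(f"relative risk: {ratio:.1f}")
```

Raw death counts alone could not support such a claim. Without the size of each population, a larger count might simply reflect a larger group.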

“That shows tremendous planning and coherence, I believe,” Ball said. “When I showed this proof to the judge in the Ríos trial, I saw a light go on in her eyes.”

Was this genocide, then? Not yet. While it was certainly evidence consistent with genocide, the actual assessment was still a question for the courts. And indeed, Ríos Montt was declared guilty of genocide in May of 2013, the first time that a former head of state had been convicted of the crime against his own people. However, the country’s Constitutional Court overturned the verdict shortly afterward, and a new trial is scheduled.

Ball will be back at the next trial, with the same or similar evidence, he said. “I look forward to testifying again.”

He also appealed to his audience of technologists and hackers to draw a broader lesson from the story. The technology world is falling in love with the idea of big data, he said. But the human-rights world in particular shows that data alone is always flawed, marred by deliberate deceptions and by unavoidable imperfections in the collection process.

“It’s tempting to think you have a lot of data, and so you have an answer. And in an industry context you might even be right,” he said. “But in human rights, violence is hidden. We don’t know if what we don’t know is systematically different from what we do know.”

Getting to truth – certainly in understanding mass killings, but in any other data set too – requires an ability to understand patterns, he said. Without a scientific analysis of knowns and possible unknowns, this is impossible.

With it, he said, stories that have been without endings for decades can be finally concluded.

“Justice brings closure to families that never know when to start talking about a loved one in the past tense,” he said.