Can the UNODC's Murder Statistics be Trusted?

My parents came to visit me in Malawi back in December, and this did wonders for my mom’s level of concern about my welfare. She was able to see that Malawi at least looks relatively safe. We got to discussing safety and violence after the horrific murders of 20 kindergarteners that month. I made the off-hand claim that I am physically safer here than in the US. I’ve heard about awful crimes in both places, but I’m convinced in particular that my chances of being murdered are much lower here.

When we looked it up, Wikipedia’s list of countries by intentional homicide rate, which cites the UNODC, put Malawi at 36 murders per 100,000 people per year, several times the US rate. I cannot possibly square a murder rate that high with my experience here. I collected the data for my survey in Traditional Authority Mwambo, a rural area that conveniently has about 100,000 people in it. I was there for about 4 months, and during that time I befriended all of the local authorities, especially the police. In managing my research team, I was very cognizant of crime and our personal security, and pursued any and all rumors with my friends at the Jali and Kachulu police stations and at the local road traffic police as well. For their part, they were very open about the cases they were dealing with, and at one point the Jali police actually helped us find a different, more-secure place to stay out there. If Mwambo matched the national average, you’d expect 12 murders there over the course of four months. Even if the cities in Malawi had murder rates of 150 per 100,000, nearly triple the rate of the US city with the most murders per person, we would still expect to see 7.5 murders a year in Mwambo, and at least 1 or 2 over the course of 4 months. I heard about zero. I discussed a wide range of crimes, including some shootings, with local authorities there, but no homicides whatsoever.
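The arithmetic behind those expected counts is simple enough to sketch in a few lines of Python (the 20% urban share is my own rough assumption, used only to make the city/rural decomposition explicit):

```python
# Expected murders in TA Mwambo (pop ~100,000) under different assumed rates.
national_rate = 36.0   # UNODC figure, murders per 100,000 per year
pop = 100_000

murders_per_year = national_rate * pop / 100_000   # 36 per year
murders_in_4_months = murders_per_year * 4 / 12    # 12 over four months

# Even if cities bore most of the violence: suppose ~20% of Malawians are
# urban (my assumption) with a city rate of 150 per 100,000. The rural rate
# implied by a national average of 36 would then be:
urban_share = 0.20
city_rate = 150.0
rural_rate = (national_rate - urban_share * city_rate) / (1 - urban_share)

print(round(murders_in_4_months, 1))  # 12.0
print(round(rural_rate, 1))           # 7.5
```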

Why am I writing about statistics from Wikipedia in the middle of the night? Because the Internet is serious business.

Data nerds such as myself like to talk about using the “smell test” on their results, and frankly this number just stinks every way I sniff it. Another way it smells: nationwide, a rate of 36 murders per 100,000 people works out to about 100 murders per week. There are definitely murders reported in the Malawian press, but I would venture that I see about 1 or 2 per week, not 100. Alternatively, we can look at the distribution of all causes of death. Malawi has a death rate of 1350 per 100,000 people, so according to the UNODC murders cause 2.7% of all deaths in the country. That would mean that murder would rank above tuberculosis and ischemic heart disease in this ranking of the top ten causes of death in Malawi. Incidentally, it would also mean murder should itself be on that list, knocking off malnutrition.
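Both of those checks are one-liners (the 15 million population figure is my own approximation of Malawi’s population at the time):

```python
# Two more back-of-the-envelope checks on the 36-per-100,000 figure.
rate = 36.0               # murders per 100,000 per year (UNODC)
population = 15_000_000   # rough Malawi population (my approximation)
death_rate = 1350.0       # all-cause deaths per 100,000 per year

murders_per_year = rate * population / 100_000   # 5,400 per year
murders_per_week = murders_per_year / 52

share_of_deaths = rate / death_rate              # murders as share of deaths

print(round(murders_per_week))          # 104
print(round(share_of_deaths * 100, 1))  # 2.7
```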

The Wikipedia article has numerous caveats and hedges, including the suggestion that the data may include attempted murders as well as successful ones. However, it also has a link to the underlying table from the UN Office on Drugs and Crime. Annoyed by my inability to square the reported murder rate with other facts about Malawi, I decided to see where they were getting it from. In the footnotes, they attribute it to the World Health Organization Global Burden of Disease Mortality Estimates. After digging through the WHO website, I came to this page where one can download the datasets used for the Global Burden of Disease calculations. These are files that contain observations by year, country, gender, and disease, where disease is represented by an ICD code (there are different files for the ICD-7, ICD-8, ICD-9, and ICD-10 codes). If you know the ICD code you want you can look up total deaths as well as deaths by age bracket.

I didn’t get that far, though: none of the files have any entries for Malawi, and the data availability index doesn’t list Malawi data for any year. There is a country code for Malawi (1270) but it doesn’t actually appear to get used. I can’t say for certain where the claim of 36 murders per 100,000 people comes from, but I can tell you it’s definitely not from the WHO Mortality Database.

Now, any number of things could have gone wrong here. Maybe I took a wrong turn as I hunted for the WHO data the UNODC rely on, or overlooked something else obvious. It’s also possible that entries got miscoded, either in the UNODC or the WHO files, leading me astray. Or maybe there was private communication between those two UN offices, and the underlying data actually isn’t public.

Fortunately, there are tricks I can use even when I can’t get my hands on the actual data. Back in 1938, Frank Benford observed that many datasets have the property that the leading digits of numbers (the “7” in “743”, for example) are logarithmically distributed, and death rates were actually one of the examples he leaned on in demonstrating what we now call “Benford’s Law”. If the law holds exactly, we’d expect 30.1% of leading digits to be “1”s, 17.6% to be “2”s, and so on, with a known, predictable percentage for each digit. And we can run a statistical test to see if deviations from the expected pattern are large enough to be meaningful, or are just random fluctuations. Using the firstdigit package in Stata, I ran this test on the UNODC spreadsheet’s mortality rates from 2008, which is the most-populated year in the table. The result: there are more leading “1”s than we’d expect under Benford’s Law, and across all digits the deviation from Benford is statistically significant at the 5% level – the p-value is 0.011, just barely missing the p<0.01 cutoff that earns 3 stars in a journal article.
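For readers who want to try this themselves, here is a minimal self-contained sketch of the same kind of test in Python rather than Stata; the input series below is synthetic, not the UNODC data:

```python
import math
from collections import Counter

# Benford's expected leading-digit frequencies: P(d) = log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(x):
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def benford_chi2(values):
    """Chi-square goodness-of-fit statistic against Benford's Law (8 df)."""
    digits = Counter(leading_digit(v) for v in values if v > 0)
    n = sum(digits.values())
    return sum((digits.get(d, 0) - n * p) ** 2 / (n * p)
               for d, p in BENFORD.items())

# A synthetic series (NOT the UNODC data): geometric growth is a classic
# case where Benford's Law holds almost exactly.
series = [1.17 ** k for k in range(500)]
stat = benford_chi2(series)

# 15.51 is the 5% critical value for a chi-square with 8 degrees of freedom;
# a statistic below it means no significant deviation from Benford.
print(stat < 15.51)  # True
```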

It’s possible to delve deeper: what I’m really curious about is not all the statistics – it would be hard to get the ones for big countries like the US wrong – but specifically the figures attributed to the WHO Global Burden of Disease. If I break the data down into observations that list “WHO” as the source and everything else, only the WHO data looks suspicious (p=0.040), while everything else conforms reasonably well to Benford’s Law (p=0.214).* Or I can use the slightly-broader “PH” category for all public health-derived rates. Those look iffy (p=0.025) whereas the non-PH murder rates look alright (p=0.154). What’s more, these aren’t just cases of large samples helping me to find spurious “statistically significant” effects: there are just 61 values coded PH in the data, and 187 overall.
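The split-by-source version of the test follows the same pattern; here is a self-contained sketch where the source labels mirror the UNODC table but the rows themselves are invented placeholders:

```python
import math
from collections import Counter

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(x):
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def benford_chi2(values):
    """Chi-square statistic of leading digits against Benford's Law."""
    digits = Counter(leading_digit(v) for v in values if v > 0)
    n = sum(digits.values())
    return sum((digits.get(d, 0) - n * p) ** 2 / (n * p)
               for d, p in BENFORD.items())

# Placeholder rows in the rough shape of the UNODC table: (source, rate).
# These values are invented for illustration, not the real spreadsheet.
records = [("WHO", 36.0), ("WHO", 21.0), ("CTS", 4.7), ("CTS", 1.2)]

by_source = {}
for source, rate in records:
    by_source.setdefault(source, []).append(rate)

# With real data you'd compare the statistic (or its p-value) across groups.
for source, rates in sorted(by_source.items()):
    print(source, round(benford_chi2(rates), 2))
```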

The takeaway is that not just the Malawi murder rate but all the UNODC data supposedly derived from public health sources is questionable. I’m not trying to claim that these statistics were necessarily faked intentionally. I can imagine a number of ways they could have been screwed up by mistake. There might even be some reason why Benford’s Law would hold for some of these murder rates and not for others. Even if there was intent, I have no idea who might have been responsible. What I am trying to claim is that they shouldn’t be taken seriously, or relied on for anything of importance, until someone can verify their source. And I do think this matters. People rely on these numbers, and draw judgments based on them. A glance at the top-ranking countries on Wikipedia’s list would, for example, neatly confirm someone’s preconceived notions about Africa being a violent place. The top three African countries on that list are Zambia, Uganda, and Malawi – all have their statistics attributed to the WHO, and none actually appear in the WHO mortality data.

EDIT: I changed the Wikipedia article to remove the entries that I tried to trace down but could not find, until the source of the UNODC numbers is located or they are replaced with something better (Nameless has a suggestion in this post’s comments).

* I looked at all this a while ago but was just sitting on it until a recent Andrew Gelman post that cites the UNODC statistics prompted me to do something with it. I know Gelman wouldn’t like the fact that I’m leaning on p-values for the Benford’s law analysis, but I just don’t have any intuitive grasp of chi-square values.

10 thoughts on “Can the UNODC's Murder Statistics be Trusted?”

Very clever. Another possible angle to consider: WSJ, “In Medical Triumph, Homicides Fall Despite Soaring Gun Violence” http://online.wsj.com/article/SB10001424127887324712504578131360684277812.html It could be that the data isn’t conforming to Benford’s law because the homicide rate isn’t capturing the same phenomenon globally. In nations with less access to medical care, the homicide rate and the attempted homicide rate are probably extremely close; in more developed nations, they may be far apart.

This seems plausible, but to figure out whether it’s true we’d need to know why Benford’s law seems to hold for most (legitimate) mortality rates. My best understanding from reading about it is that in most cases we really don’t know why it works – in most applications it’s an empirical regularity rather than anything founded on basic mathematics. One case where its applicability _has_ been proven is when the data come from a mixture of different probability distributions (see this paper: http://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=1042&context=rgp_rsr)
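As a quick illustration of that mixture result, here is a toy simulation; the choice of exponential distributions with log-uniform scales is mine, purely for demonstration:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def leading_digit(x):
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

# Mix many exponential distributions whose scales are spread across several
# orders of magnitude, a toy stand-in for rates generated by a variety of
# underlying processes.
samples = []
for _ in range(20_000):
    scale = 10 ** random.uniform(-2, 3)  # random order of magnitude
    samples.append(random.expovariate(1 / scale))

# Under Benford's Law the leading digit is "1" about 30.1% of the time.
freq_1 = sum(1 for s in samples if s > 0 and leading_digit(s) == 1) / len(samples)
print(freq_1)  # close to Benford's 0.301
```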

But that would seem to work in the opposite direction – imagine that homicides come from underlying attempted murder distributions with varying lethality. So you have a variety of distributions of attempted murder cases where a person is shot in the temple, of ones where they are gut-shot, etc. The overall distribution is the combination of all of these. Advances in trauma care could be thought of as throwing out all but the most-severe distribution(s), so actual deaths/100,000 are only drawn from that one. That would make the overall homicide rate data _less_ Benford-like, though.

Or perhaps there’s another line of reasoning you’re going for, that I’m not seeing.

At this point I’m much more dubious about the seemingly-untraceable source of the UNODC’s figures than about the numbers themselves. Maybe they just did some weighted average of other African countries’ rates to get the figures for Malawi, Zambia, and Uganda, which might be fairly accurate – but if so, that should be documented somewhere.

If I’ve found the right documentation, then for Malawi the cause-specific mortality rates are forward projections from the GBD 2004 cause-specific rates to match overall 2008 mortality, with some adjustments to specific causes (but not to violence, which is evidently the category counted as murders). That document doesn’t link to the 2004 GBD Update, but I think this is the report: http://www.who.int/healthinfo/global_burden_disease/GBD_report_2004update_full.pdf. In that document, Malawi is one of the countries that does not have vital registry data used – hence my inability to find any entries for it in the underlying datasets.

Malawi’s cause-specific mortality rates, along with those for 43 of the 46 listed African countries, were calculated by using an average of cause-specific mortality data observed in a range of African countries, including multiple sources from South Africa, and then adjusting those to account for all measured deaths in a country (as well as doing an income adjustment, but I can’t find the exact adjustment they did). Cause-specific mortality was observed in Madagascar, South Africa, and Zimbabwe, as well as a range of sites from this report: http://www.scielosp.org/scielo.php?script=sci_arttext&pid=S0042-96862006000300012. None of the sites are in Malawi.
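As I understand the procedure, it amounts to averaging donor countries’ cause-specific death shares and rescaling them to the target country’s all-cause mortality; here is a hedged sketch of that idea, with entirely invented shares:

```python
# Hedged sketch of the projection method as I understand it: average donor
# countries' cause-specific death shares, then scale by the target country's
# measured all-cause mortality. All shares below are invented for illustration.
donor_shares = {
    "Madagascar": {"violence": 0.02, "malaria": 0.18, "other": 0.80},
    "Zimbabwe":   {"violence": 0.03, "malaria": 0.10, "other": 0.87},
}

causes = ["violence", "malaria", "other"]
avg_share = {c: sum(d[c] for d in donor_shares.values()) / len(donor_shares)
             for c in causes}

target_all_cause_rate = 1350.0  # deaths per 100,000 (Malawi, from the post)
estimated = {c: avg_share[c] * target_all_cause_rate for c in causes}

print(round(estimated["violence"], 2))  # 33.75 (with these invented shares)
```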

Thanks for the help in figuring out where these rates are coming from, this is really interesting. There’s no question that I might have missed something – let me know if I did.

I used to live in Mexico City, and would chuckle whenever I saw stats showing that Chicago had a murder rate similar to the Mexican capital’s, and Detroit a significantly higher one.

Mexican authorities, notoriously corrupt, have had no incentive to collect accurate crime data. So the numbers that the UN provides for Mexico have, I suspect, always undercounted homicides, rapes, and so on.

I bet that there are just as many instances of under-reporting as there are of over-reporting around the world, making cross-country comparisons difficult at best.

Africa has a lower perception of crime because most African nations are small, with smaller populations than the U.S., so you don’t hear about as many murders as in the U.S. press. I’m a little skeptical of the UNODC’s statistics on Africa because some of the countries listed have militias present that are causing problems, and I think that should be counted as war violence rather than homicide. It also seems implausible that Somalia would have a lower murder rate than France, considering that piracy and poverty run rampant there, not to mention there are highway robberies still going on in Somalia. Iraq was also corrected in the talk section, where Iraq’s murder rate was confirmed at 21 per 100,000 in 2008. Things have changed since 2008, but they haven’t changed that much, and you can’t use 2006 and 2007’s rate of 101 per 100,000 as evidence of drastic change, because there was literally a sectarian war going on among the population in those two years. Even though the murder rate is probably lower than it was in 2008, I can’t imagine Iraq going from a crime-infested society to a crimeless society like France in only four years.

I trust the UNODC on the U.S. and Europe, but the Asian and African numbers have flaws in them. I’ve heard that in Egypt, for example, camel riders will just swoop by and kidnap travelers, yet the murder rate is only 1.4 per 100,000, lower than France’s. That kind of thing would never happen on some road in France, yet Egypt has a lower rate than France. Maybe the UNODC is just using the statistics the Arab and Communist countries give them, which are completely false. Yemen and Afghanistan are notoriously lawless; tribal fighting, riots, highway robberies, and general homicides are rampant in those two countries, yet they both have a lower murder rate than America?

South American statistics also seem flawed. Brazil, for example, was shown with a murder rate of 21 per 100,000 in 2012 on Wikipedia, but when I checked the UNODC rates in the past, they showed the exact same murder rate for Brazil in 2003. Brazil in 2003 had way more crime than Brazil does now, so does that mean the UNODC is recycling old statistics and hoping we won’t notice? Meanwhile, in Colombia, the UNODC isn’t differentiating between war deaths and homicides. The FARC is a guerrilla insurgency, not just a criminal organization. Their goal is to overthrow the Colombian government, and they happen to use the drug trade as a means of funding their operations. To add those deaths to the Colombian murder rate would be like adding the deaths from the Chechen wars to the Russian murder rate because the rebels use Afghan heroin to fund their operations.

Even Central America has glaring flaws. Mexico in 2012 had 24,000 people killed in the drug war alone, not to mention approximately 10,000 disappearances that year. The statistics show 27,000 people murdered in Mexico, though; I guess we can’t count the people who have disappeared, or the people murdered in Mexico whose deaths had nothing to do with the drug war. I’m guessing they aren’t counting the drug war because it had a + sign next to it linking to the drug war. In this case, I do think the Mexican drug war is just a criminalized war: the cartels aren’t trying to overthrow the government, they aren’t doing anything for political purposes, they just want to make money. If you put the drug war, the disappearances, and the non-drug-related homicides all together, you have 61,000 homicides, giving Mexico a murder rate of 50.8 per 100,000, over twice as high as what they provide.

Basically, outside of the U.S. and Europe these stats seem pretty flawed.