“Questioning The Lancet, PLOS, And Other Surveys On Iraqi Deaths, An Interview With Univ. of London Professor Michael Spagat”

Mike Spagat points to this interview, which, he writes, covers themes that are discussed on the blog, such as wrong ideas that don’t die, peer review, and the statistics of conflict deaths.

I agree. It’s good stuff. Here are some of the things that Spagat says (he’s being interviewed by Joel Wing):

In fact, the standard excess-deaths concept leads to an interesting conundrum when combined with a fact exposed in the next-to-latest Human Security Report: in most countries child mortality rates decline during armed conflict (chapter 6). So if you believe the usual excess-death causality story then you’re forced to conclude that many conflicts actually save the lives of many children. Of course, the idea of wars saving lives is pretty hard to swallow. A much more sensible understanding is that a variety of factors determine child deaths, and that in many cases the factors that save the lives of children are stronger than the negative effects that conflict has on child mortality. . . .

We say that if the war is causing non-violent death rates to increase then you would expect non-violent deaths to increase more in the violent parts of Iraq than they do in the non-violent parts of Iraq. To the contrary, we find this just isn’t so. At least in our preliminary analysis, there seems to be very little correlation between violence levels and changes in non-violent death rates. This should make us wonder whether there is any reality behind the excess deaths claims that have been based on this Iraq survey. In fact, we should question the conventional excess-deaths idea in general.

Then information on some particular surveys, lots of details that are worth reading, fascinating stuff.

Then some general points that arose because some of the stuff being criticized appeared in high-profile scientific journals. Here’s Spagat again:

First of all, saying that something has to be right or is probably right because it has been peer reviewed is quite a weak defense. Peer review is a good thing, and it is a strength of scientific journals that there is that level of scrutiny, but if you look at the list of scientific claims that have turned out to be wrong and that have been published in peer reviewed journals … well, the list just goes on and on and on. Publishing in a peer reviewed journal is no guarantee that something is right. Some of the people who do the referee reports are more conscientious than others. In almost no cases does refereeing ever include an element of replication. Often referees don’t even know enough about the literature cited to judge whether claims about the current state of knowledge are accurate or otherwise. Mostly people just assume that what they’re being told by the authors of the paper is correct and valid. Peer review is better than no peer review, but it hardly guarantees that something is going to be correct. . . .

Journal peer review is just the beginning of a long peer review process. Thinking that journal peer review is the end of this process is a serious misunderstanding. Peer review is an ongoing thing. It is not something that ends with publication. Everything in science is potentially up for grabs, and people are always free to question. Anyone might come up with valid criticisms.

If you look at Burnham et al. there have been a number of peer reviewed articles that have critiqued it, and said it is wrong. So if you think peer review has to always be correct then you’re immediately in a logical conundrum because you’ve got peer reviewed articles saying opposite things. What do you do now?

Spagat concludes:

I’m happy to give people credit for doing difficult research in war zones. And I’m happy to admire the courage of people who do dangerous field work. But doing courageous field work doesn’t make your findings correct and we shouldn’t accept false claims just because someone had the guts to go out in the field and gather data. Science is a ruthless process. We have to seek the truth. Courage is not an adequate rebuttal to being wrong.

P.S. I was going through my old emails from several years ago and saw this exchange:

Someone asked me: Have you followed the debate on the Iraq death estimates obtained through survey methods? Reference: “Mortality after the 2003 invasion of Iraq: a cross-sectional cluster sample survey,” by Gilbert Burnham, Riyadh Lafta, Shannon Doocy, and Les Roberts, The Lancet, Oct 13, 2006.

I replied: The study looked reasonable to me. And I pointed to this blog post from 2006, where I wrote some pretty general comments about cluster sampling. There was lively discussion in the comments section (at the time, these Iraq surveys were politically loaded, with people on the left grabbing on to evidence suggesting bad things were happening over there, and people on the right looking to discredit such claims), in particular involving the reluctance of the researchers to describe exactly what they were doing. I wrote:

Burnham et al. provide lots of detail on the first stage of the sampling (the choice of provinces) but much less detail later on. For example, they should be able to compute the probability of selection of each household (based on the selection of province, administrative unit, street, and household). Then they can see how these probabilities vary and adjust if necessary.
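To make that concrete, here’s a minimal sketch of the calculation. The stage counts below are made up for illustration, not the actual Burnham et al. design: in a multistage sample, a household’s inclusion probability is the product of the selection probabilities at each stage, and its survey weight is the inverse of that product.

```python
def inclusion_prob(stages):
    """stages: list of (n_selected, n_available) pairs, one per sampling stage."""
    p = 1.0
    for chosen, available in stages:
        p *= chosen / available
    return p

# Hypothetical household: its administrative unit was 1 of 4 chosen in the
# province, its street 1 of 10 in the unit, and 40 of 2000 households on the
# street were interviewed.
p = inclusion_prob([(1, 4), (1, 10), (40, 2000)])
weight = 1 / p  # inverse-probability weight for estimation
```

Once these probabilities are computed for every household, one can check how much they vary across the sample and reweight the estimates accordingly.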

Unfortunately, lack of detail on exact procedures is a common problem in research reports in general: it’s surprisingly difficult for people to simply describe exactly what they did. (I’m always telling this to students when they write up their own research: Just say exactly what you did, and you’ll be mostly there.) This is a little bit frustrating but unfortunately is not unique to this study.

Unfortunately, I’d still have to go with this general position: it’s common to not share data or methods (indeed, as anyone knows who’s ever tried to write a report on anything, it can be surprisingly effortful to write up exactly what you did), so that alone is not evidence of a serious flaw in the research. However, given that serious flaws have been demonstrated in other ways (as discussed by Spagat), it becomes more relevant that the researchers can’t tell us what they did. At some point it’s up to them to defend their numbers. As Spagat says, it’s not enough just to point to publication in a top journal of various summary statistics.

I spent three years in the Sunni Triangle during the height of the war, much of it spent doing intel and related activities. I have direct experience with “surveys” there, and the Marines in particular conducted an actual census in their AOs (areas of operations; I don’t think anyone else did anything that extensive, but maybe the Army did in places) in order to understand the tribal demographics in street-level detail.

I intended to regale everyone with funny war stories about “surveys” in highly tribal, low-literacy, extremely low-trust societies in the middle of a sectarian and a counterinsurgency war, in which minor conversations could, and regularly did, get people killed, but I’ll cut it all short by saying: it’s very unlikely that data is accurate enough to support some fancy “excess death” type analysis.

Right, that was my guess back in 2006: the Lancet survey was bogus because the Iraqi interviewers they hired would have had to have a death wish to carry out the random door-to-door methodology the Western researchers prescribed in an ethnic war zone. Not wanting to get holes drilled in their heads for asking the wrong people the wrong questions, the field researchers probably made up results and/or conducted convenience samples of people who had been vetted by local warlords.

I was asked in private to expand on my experiences. As it happens, I had one notable success in this area, which I may write up one day as a blog post. Instead I’ll make a few random notes:

(1) The refugee factor: perhaps as much as 10% of the population left the country, with another 10% becoming internal refugees. This really screws things up since people usually left from the areas hardest hit. For example, there was a town in my last AO outside of Baghdad which had been half Shia. After the Sunnis slaughtered them wholesale, the remaining Shia left town. The Sunnis then took over their houses and were hoping no one would return (although a handful were trying to get their houses back when I was there).

So if you did a poll in that town asking about deaths in the family you’d get a low number despite the fact that a mini genocide had occurred there.

(2) Anbar was more tribal than the rest of the country, but it’s hard to imagine getting real results in most areas without buying off the local Sheikh, which I never heard of any survey groups doing.

(3) The sustained level of violence in Anbar (population 1.5m) during 2006 was about as bad as the worst peak month in all of Afghanistan (pop 35m) at any point since 2001. There is no way someone was randomly sampling Anbar’s population during that time. I don’t even believe they were driving anywhere. This was a time when the Iraqi government was putting out public service announcements telling everyone to never stop at a police checkpoint unless Americans were there, because doing so was liable to be a death sentence otherwise.

(4) For a time in Anbar, foreign Arabs were being regularly killed by the locals because of suspected Al-Qaida ties. When polling agencies hire locals, they’re typically not actual “locals”; they’re usually Arabs who at best speak the Iraqi dialect and often not even that. So they did not have anything like the freedom of movement that a Westerner would imagine.

(5) I think it VERY likely the locals hired fudged the results: everything from never leaving their house and making up data, to asking only friends and family. There’s a good chance this would happen even if there were no danger. Their attitude toward government make-work projects like this is significantly more cavalier than Westerners are used to.

(6) We had assets available that could ask “poll” type questions of locals, and which were able to operate once things cooled down a bit. We were even able to submit questions to these assets, which they would then collect answers to. Even if we carefully crafted these questions using survey design best practices and worked closely with extremely good translators, the interviewers were usually unable to ask them directly. They had to use indirect methods, which differed from interview to interview, to get answers. It wasn’t possible to ask a well-crafted survey question consistently.

(7) Some of these interviews had to be cut short because the person being questioned would get violently angry at the interviewer for no reason. Basically, the subject matter would get them wound up and they’d take their anger out on the pollster because they were the nearest person (and a free target, since they weren’t really local).

(8) Iraqis believe in an order of magnitude more conspiracy theories than Americans, all of which are vastly more bizarre than “Obama’s a Muslim born outside the US” or “9/11 was a CIA operation”. If a stranger came up to them and started asking questions, it would often set off God-only-knows-what conspiracy theories in their heads. I seriously wouldn’t be surprised if the median interviewee thought they were talking to the CIA. Actually, I’d be shocked if that weren’t true.

(9) I don’t think the average Iraqi “gets” polls the way the average Westerner does. As far as I know they had almost no prior experience with them.

(10) To say that Iraqis have a different relationship with the Truth than Americans would be an … uh … understatement. Iraqis being polled are liable to lie for all kinds of reasons that a seasoned Western pollster wouldn’t be attuned to.

(11) It’s hard to describe just how little societal trust there is. I wasn’t at all surprised that al-Qaida operatives believed American reports rather than reports coming from their own al-Qaida chain of command. Iraq may have been a relatively trusting society before 1980, but all that is long gone (same for Afghanistan).

(12) The one real success I had in this area involved a little bit of survey data combined with a massive amount of prior information. I would be far more trusting of any analysis exploiting that prior information (in whatever way), than any kind of classical statistical analysis of polling data in this case.

Spagat: “Peer review is better than no peer review, but it hardly guarantees that something is going to be correct.”

I fundamentally disagree. This implicitly assumes that peer review (r) is all about catching errors (e), and, to stretch the math, the problem is that de/dr is very low. But peer review is much more than that. In some ways it is also an opportunity for scientists/journal editors to defend their turf, vent their frustrations, and otherwise maximize their utility — which may have very little to do with discovering errors.

One need not push this line of thought too far to realize that:

1. If peer review does a poor job discovering errors, and guaranteeing the quality of a piece of research, and;
2. If peer review introduces criteria that may undermine quality/truth;

then it follows that peer review need not be better than no peer review. For that claim to have some validity, we need peer review to be good at the error-catching in point 1 and to keep the distortions in point 2 to a minimum. In my limited experience, and with rare exceptions, I don’t think we are anywhere close.

I believe a system where journals accept manuscripts automatically based on a set of checklists might display better operating characteristics.

I think another case in which peer review can be detrimental is that it gives the illusion of quality control without actually providing it.

It’s immediately evident to anyone who has been on a review panel for grants or editorial board for a journal that (a) agreement among reviewers is low, (b) some reviewers are more careful than others to actually read the submissions, and (c) some reviewers are very biased (in a “defending their turf” sense, which I’ve seen work both for and against someone on the same turf as the reviewer — it’s easier to be critical of something you understand well because you know the prior art and can follow the arguments more easily). As a result, the process is incredibly noisy.

First of all, child mortality rates have been generally declining all over the world. This is quite a big success story in international development, known as “the revolution in child survival”. There are powerful forces at work here, described on page 116 of the pdf cited by Dan Wright (e.g., vaccination programs, oral rehydration therapy).

Second, in most places where there are armed conflicts, even quite big ones such as Iraq, there are still large swaths of peaceful territory. In these places the revolution in child survival just continues apace with the conflict being more or less irrelevant. Moreover, in places that are directly conflict-affected there might even be special initiatives to address child health issues.

In short, there can be localized negative effects of a conflict on child health but at the national level these can easily be swamped by powerful positive factors.

Of course, armed conflict does not cause declines in child mortality. The point is that if you accept the concept of causality that underpins the excess-deaths idea then you are forced to conclude, absurdly, that armed conflict causes child mortality to decline.

Andrew,

You say, reasonably enough, that failure to explain oneself and share data does not automatically invalidate a study but starts to become a big problem when serious flaws start to appear in one’s work.

Serious flaws have emerged in the Burnham et al. work. To sample the evidence you could read:

Plus there are three other surveys that all give dramatically lower violent death estimates than the Burnham et al. one does (the Iraq Living Conditions Survey, the Iraq Family Health Survey, and the Hagopian et al. survey just published recently).

If chemotherapy actually slows down growth without stopping it then this would be a very close analogy.

I agree that once you realize what’s going on this isn’t hugely surprising. The only real content is just that the effect of war relative to other factors tends to be weaker than a lot of people think it is.

Interestingly, though, some people get really angry when confronted with this underwhelming fact and try to deny it. One possible reason for such an irrational reaction is that some academics are so committed to the weak concept of excess deaths that I criticize in the interview that they think the only possible interpretation of a decline in child mortality rates during a war is that the war is causing child mortality rates to decline.

Rahul – shame on you for recommending that all children receive chemotherapy to help them grow! Similarly, shame on the people who recommend war to help children live longer. OK, now I will go to sleep so I can cause the sun to come up.

Indeed. I was secretly wishing that war was actually causing child mortality to decline. A causal relation like that would throw up really delicious moral dilemmas. It’s not so bizarre if war in a neglected nation brings medicines, doctors, aid money, etc.

See also Mark van der Laan’s discussions and writings on the original Lancet reports: critique of the sampling scheme and calculation of confidence intervals, and lack of details on the survey protocols. Some interesting statistical methodology came out of these considerations, in particular the calculation of robust confidence intervals based on Bernstein inequalities.

Thanks for the link. I agree with most of Mark’s points, but I think he should’ve pushed back a bit against the interviewer’s statement, “Both Lancet reports used a very small sample size. The 1st Lancet went to 33 clusters of 30 households each with each house representing roughly 739,000 people. The 2nd Lancet included 47 clusters of 40 households, for an average of 577,000 each. What can happen if only a few people are polled in a rather large population?”

33 and 47 are not necessarily such small samples! It depends a lot on the level of variation between clusters. In some settings, 33 clusters would be fine. And the bit about the “rather large population” is really irrelevant.
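A quick simulation can show why the number of clusters alone doesn’t determine precision. The variance numbers below are invented for illustration, not drawn from the Iraq data: with the same 33 clusters of 30 households, the standard error of the overall mean is driven almost entirely by how much the clusters differ from each other.

```python
import random

def cluster_mean_se(between_sd, within_sd, n_clusters=33, m=30, reps=1000):
    """Monte Carlo standard error of a cluster-sample grand mean."""
    means = []
    for _ in range(reps):
        total = 0.0
        for _ in range(n_clusters):
            mu = random.gauss(0, between_sd)  # cluster-level effect
            total += sum(random.gauss(mu, within_sd) for _ in range(m))
        means.append(total / (n_clusters * m))
    avg = sum(means) / reps
    return (sum((x - avg) ** 2 for x in means) / reps) ** 0.5

random.seed(1)
low = cluster_mean_se(between_sd=0.1, within_sd=1.0)   # homogeneous clusters
high = cluster_mean_se(between_sd=2.0, within_sd=1.0)  # heterogeneous clusters
print(low, high)  # same 33 clusters, very different precision
```

Note that the size of the underlying population never enters the calculation, which is the sense in which “each house representing roughly 739,000 people” is beside the point.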

But that’s a small point. Overall, Mark van der Laan’s comments regarding the big standard errors of the surveys are complementary to Mike Spagat’s comments on the surveys’ biases.

It’s interesting how people intuitively misunderstand sampling. I saw this in field inspections in forensic engineering. Engineers are very distrustful of small samples. Perhaps this is because of a natural tendency to be interested in extreme events, but it was hard for engineers to accept that, among say 250 houses, if I had seen 20 randomly selected ones I could pretty well estimate the cost to repair the entire development.
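A toy simulation of the engineer’s point, with made-up repair costs: as long as the costs aren’t wildly skewed, scaling up the mean of 20 randomly chosen houses gives a serviceable estimate of the total for all 250.

```python
import random

random.seed(0)
# Hypothetical per-house repair costs (moderately skewed lognormal)
costs = [random.lognormvariate(9, 0.5) for _ in range(250)]
true_total = sum(costs)

errors = []
for _ in range(1000):
    sample = random.sample(costs, 20)          # 20 houses chosen at random
    est_total = 250 * sum(sample) / 20         # scale sample mean to population
    errors.append(abs(est_total - true_total) / true_total)

typical_error = sum(errors) / len(errors)
print(f"typical relative error: {typical_error:.1%}")
```

The typical relative error here lands around ten percent, which is often plenty for budgeting an entire development; the next comment describes exactly the situations in which this logic breaks down.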

One context, though, in which this kind of distrust was quite justified was when the sampling was “haphazard” rather than random: for example, inspectors simply wandered around and looked at whatever 20 locations seemed likely to be interesting, weren’t too hard to reach, didn’t have dogs in the yard, weren’t at the top of hills, or whatever other implicit and unknown biases there were. I suspect surveys in war zones are a lot more haphazard than random. The underlying biases are unknown but usually quite large.

Dunno. I’ve never noticed this general sampling-distrust among engineers I’ve met, no more than in any other profession. As an engineer it’s almost impossible to avoid sampling.

What’s not so obvious is that sometimes things like non-linearity may trip up a naive sampler even if he’s not haphazard. Say there was prior reason to believe there’s only one house in every 250 that has an exponentially high cost of repair (say, due to asbestos or radon), the tender / RFP / quote demands 3% accuracy, the financial penalty of a bad estimate is huge, and the cost per assessment is anyway relatively low.

Domain knowledge often leads people to act in ways that may seem stupid to an outsider.

The Iraq data analyzed for the presentation had many areas that experienced no violence at all and a small number of areas suffering a lot of violence. So the situation was similar to the one that Rahul describes. In such cases samples of 40 or 50 clusters can easily lead to estimates that are way off in either direction (details in the presentation). Non-violent deaths are spread much more evenly across space, so they can be estimated fairly precisely in small samples.
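A sketch of this effect, with invented numbers rather than the actual Iraq data: when all the violent deaths sit in a small fraction of areas, a 47-cluster sample usually misses the violent areas entirely or over-represents them, so the scaled-up estimate swings wildly; evenly spread non-violent deaths are estimated stably.

```python
import random

random.seed(2)
N = 1000  # areas in the hypothetical country
violent = [50 if i < 20 else 0 for i in range(N)]  # all deaths in 2% of areas
nonviolent = [1] * N                               # deaths spread evenly
# Both patterns have the same true total: 1000 deaths.

def estimates(deaths_by_area, n_clusters=47, reps=5000):
    """Repeatedly draw a cluster sample and scale it up to a national total."""
    return [N * sum(random.sample(deaths_by_area, n_clusters)) / n_clusters
            for _ in range(reps)]

v = estimates(violent)
nv = estimates(nonviolent)
print(min(v), max(v))    # swings from zero to several times the truth
print(min(nv), max(nv))  # exact every time, by construction here
```

In this extreme setup the non-violent estimate is identical in every replication only because every area has exactly one death; the qualitative point is that concentration, not rarity per se, is what destroys the cluster estimate.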

Also, since violent deaths are fairly rare events, measurement error has an asymmetric effect that can push the estimates way too high. You can have a low rate of false positives (finding deaths that haven’t actually happened) and a high rate of false negatives (failing to detect deaths that did occur) and still end up overestimating by a wide margin.
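A back-of-the-envelope illustration of that asymmetry, with invented error rates: because the pool of households with no violent death is so much larger than the pool with one, even a small false-positive rate applied to the big pool can outweigh a large false-negative rate applied to the small one.

```python
true_rate = 0.01   # 1% of households actually had a violent death
false_neg = 0.30   # 30% of real deaths go unreported
false_pos = 0.02   # 2% of the other households report one anyway

# Fraction of households that *report* a violent death:
measured = true_rate * (1 - false_neg) + (1 - true_rate) * false_pos
print(measured / true_rate)  # ~2.7x overestimate despite missing 30% of real deaths
```

The 2% false-positive rate contributes about 0.0198 to the measured rate, nearly three times the 0.007 surviving true signal, which is the whole mechanism in two lines of arithmetic.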

There have been a large number of studies, by different groups.
As was pointed out by Entsophy (perhaps accidentally), there are large factors biasing downward, such as the desire of, ah, ‘new’ inhabitants not to speak of the old ones, and the fact that many of the harder-hit groups would have fled. BTW, this was discussed somewhere; there’s lots of history on the sort of death rates which cause mass exodus, and they’re surprisingly high.

I am a year late, but Entsophy’s remarks are a strong argument for believing that news accounts are going to be a huge underestimate of the true death toll in Iraq. Obviously reporters have no capability to count any bodies they didn’t see for themselves, and they couldn’t go around attempting to be one-man death counters, so much of what they report would be what someone told them. And who could go into neighborhoods where near-genocide was occurring? There is also the case of Dexter Filkins, who was a reporter in the second assault on Fallujah and saw no insurgent bodies until the Marines took him, at his request, to see one. So you can be a reporter in the aftermath of a battle and not have any personal knowledge of any Iraqi dying, unless someone shows you.

I would take these criticisms of the larger death-toll estimates seriously if people were just as skeptical about the obvious weaknesses in trusting media counts: the same problems of people having the incentive to lie and of not being able to see for themselves would be, if anything, an even stronger reason for believing that IBC’s numbers are too small.