Air war in Iraq

by John Quiggin on October 15, 2006

Not surprisingly, the publication by the Lancet of new estimates suggesting that over 600 000 people have died (mostly violently) in Iraq, relative to what would have been expected based on death rates in the year before the war, has provoked violent controversy. A lot of the questions raised about the earlier survey, which estimated 100 000 excess deaths in the first year or so, appear to have been resolved. In particular, the lower bound estimate is now around 400 000, so unless the survey is rejected completely, there can be no doubt about catastrophic casualties.

One striking number that hasn’t attracted much attention is the estimated death toll from air strikes: 13 per cent of the total, or between 50 000 and 100 000 people. Around half of these deaths occurred in the last year covered by the survey, from June 2005 to June 2006. That’s at least 25 000 deaths, or close to 70 per day.

Yet reports of such deaths are very rare. If you relied on media reports you could easily conclude that total deaths from air strikes were only a few thousand for the entire war. The gap between the numbers implied by the Lancet study and the reports that shape the “gut perception” that the Lancet must have got it wrong is nowhere greater than here. So are the numbers plausible?

I recall seeing only a handful of mentions of air strikes in the mainstream press. In checking my perceptions on this, I found this piece by Norman Solomon (linked by Dahr Jamil) who notes that a search for “air war” produces zero results for the NYT, Washington Post and Times. Solomon refers to the earlier New Yorker article by Seymour Hersh, who makes the same point.

The best source turns out to be the US Air Force Command itself. For October and November 2005, the US Air Force recorded 120 or more air strikes, and this number was on an increasing trend. Most of the strikes appear to be in or near urban areas, and the recorded examples include Hellfire missiles fired by Predators, an F-16 firing a thousand 20mm cannon rounds and an F-15 reported to have fired three GBU-38s, the new satellite-guided 500-pound bomb designed for support of ground troops in close combat. Typical reports of air strikes involve the destruction of buildings in which suspected insurgents are seen taking shelter, or from which fire has been reported. Obviously there is no opportunity to check whether such buildings are occupied by civilians.

An average of 10 fatalities for each air strike seems plausible. If we assume an average of 150 US plane and missile strikes per month for the year as a whole, that’s 18 000 fatalities for 2005-06. Taking into account strikes by British and other allied forces, and by attack helicopters (which seem to be used a lot, but are also rarely reported), it seems likely that Coalition air strikes killed more than 20 000 people in 2005-06.
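As a sanity check, the arithmetic behind that estimate can be written out explicitly. Every input here is one of the post's stated assumptions (strike rate, fatalities per strike), not an official figure:

```python
# Back-of-envelope check of the air-strike fatality estimate.
# All inputs are the post's assumptions, not official figures.
strikes_per_month = 150   # assumed average US plane/missile strikes per month
deaths_per_strike = 10    # assumed average fatalities per strike
months = 12               # June 2005 to June 2006

us_air_strike_deaths = strikes_per_month * deaths_per_strike * months
print(us_air_strike_deaths)  # 18000, matching the figure in the text
```

Adding an allowance for allied and helicopter strikes on top of this is what pushes the rough total past 20 000.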

That’s below the Lancet range of estimates, but in the same ballpark. To explain the gap, I’d suggest that it’s likely that the cause of death has been reported wrongly (or at least, inconsistently with official US accounts) in some cases. I’ve seen quite a few cases where Iraqis have blamed US air strikes for deaths, while the US authorities have denied that there were any strikes in the area and have blamed the deaths on insurgent mortar attacks. That seems to suggest that deaths attributed to air strikes may actually have been caused by artillery on one side or the other.

Based on the survey, and allowing for some misclassification, it seems likely that Coalition air and ground forces have killed between 100 000 and 200 000 people since the war began. The majority of these are military age males, most of whom would have been targeted as suspected insurgents, although we have no real idea how many actually were insurgents and how many were just in the wrong place at the wrong time. Around 70 per cent of all violent deaths in the Lancet survey were of military age males, and presumably the proportion would be higher for the Coalition since they are at least trying to avoid civilian casualties. But even if 80 per cent of those killed were insurgents, that would leave somewhere between 20 000 and 40 000 innocent civilians killed by Coalition forces so far. And of course, the figure also implies that even after 80 000 to 160 000 suspected insurgents have been killed, the situation is going backwards.

What are your thoughts on the change in non-violent/violent deaths distribution between Lancet 1 and Lancet 2 (see Lambert and DD’s “Death Certificates” post below for contributions from Mike H specifically)?

I’m no statistician (no, really I’m not) but Mike H’s observations have been out there now for well over 3 days and I’m not seeing any credible rebuttals. Which, of course, doesn’t mean there are none.

At least for the current survey, it’s impossible to say whether non-violent deaths have risen or fallen. Unless the confidence intervals for non-violent deaths in the previous survey were very tight, I imagine the same was true in that case. Here are the point estimates and confidence intervals (non-violent deaths per 1000 people per year) for the current survey.

Pre-invasion: 5.4 (4.1–6.8)
2003–04: 4.5 (3.2–5.8)
2004–05: 5.0 (3.8–6.3)
2005–06: 6.9 (5.1–9.5)
Post-invasion: 6.0 (4.8–7.5); p value 0.523

None of the changes here would meet standard tests of statistical significance. On the other hand, both violent deaths and all deaths have increased significantly.
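For readers wondering how rates like these translate into body counts: excess deaths are roughly (rate change / 1000) × population × years. The population and period figures below are rough assumptions for illustration, not figures the survey reports in this form:

```python
# Converting a mortality-rate change into an excess-death count.
# Population size and period length are rough illustrative assumptions.
population = 26_000_000   # assumed mid-period population of Iraq
years = 3.25              # roughly March 2003 to June 2006

rate_pre = 5.4            # non-violent deaths per 1000 per year, pre-invasion
rate_post = 6.0           # post-invasion point estimate

excess_nonviolent = (rate_post - rate_pre) / 1000 * population * years
print(round(excess_nonviolent))  # ~50700 at the point estimates
```

At the point estimates this suggests tens of thousands of excess non-violent deaths, but with a p value of 0.523 the underlying rate change is statistically indistinguishable from zero, which is exactly the point.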

So, it doesn’t seem to me there’s a story here (unless checking back on the original survey produces something surprising).

Skeptics use the gut-check argument from incredulity a lot. The combination of foreign-language problems, the difficulties of collecting information in war time, and (perhaps most important) successful message control by the military (and self-censorship by the media) altogether mean that American intuitions about Iraq are pretty much worthless. (The idea that the gut is a usable source of information is a dubious one in any case, but especially here.)

A compounding factor is that many Bush loyalists and super-hawks would be happy enough if every single Iraqi were killed. They just don’t want the biased liberal media to report things like that.

I thought an “air war” was when, say, the USAF and the Royal Air Force were fighting the Luftwaffe – and those Eye-raki fellas don’t even have any anti-aircraft machine guns, let alone fighter jets. Isn’t it more like air slaughter?

If I’ve understood the complaint correctly, it’s this apparent drop in non-violent deaths from 76% of the count to 46% of the count that’s causing difficulty. The two studies used somewhat different methodologies; the sample size and period were different. Death certificates were asked for only in some of the Lancet 1 interviews. Given this, and given the specialism of the maths, I think it would take a statistician with access to population data to interpret these figures meaningfully (pace John Quiggin above; I was busy writing this in the meantime). And my gut feeling is that I don’t see any special reason to assume that non-violent deaths would scale with violent deaths. Perhaps infectious disease epidemics caused in part by degraded sanitation would do it: I don’t see any reason to expect an increase in heart disease, which was said to be the leading cause of death pre-war. And if news reports are to be believed, sectarian gun killings seem to be up; is there any reason for these to lead to a concomitant increase in non-violent deaths?

OK, enough of my speculation. Contrary to what some commenters are suggesting, Lancet 1 had this to say about the non-violent death rate:

It is surprising that beyond the elevation in infant mortality and the rate of violent death, mortality in Iraq seems otherwise to be similar to the period preceding the invasion. This similarity could be a reflection of the skill and function of the Iraqi health system or the capacity of the population to adapt to conditions of insecurity.

And Lancet 2 had this to say:

… families might have misclassified information about the circumstances of death. Deaths could have been over or under-attributed to coalition forces on a consistent basis. The numbers of non-violent deaths were low, thus, estimation of trends with confidence was difficult.

These studies have known limitations which the researchers freely admit to, but that doesn’t render them worthless. If you want to discern specific mortality trends at a higher level of accuracy, I think you need to lobby for a better survey. And if you don’t get that from your government, I think you should assume that the news from Iraq is bad.

What astonishes me is this: with 800-900 attacks a week, and with two airstrikes a day, why would anyone think that a lot of people were not dying?

Surely, if you were a fiscal conservative, you ought to be livid with the administration if they were telling the truth: a huge amount of money being spent on ammunition and on expensive air strikes every day, and yet nobody is dead?

Doesn’t it just seem more logical to agree that lots of people are being killed? Precisely as intended when we drop bombs on them.

Mr. Quiggin, I think the concern that brownie has raised is that the previous study had a vastly different proportion of deaths due to violence. I think the argument is that in the first study a very large proportion of excess deaths were due to non-violent causes, while the new study estimates post-invasion excess deaths to be almost entirely due to violent causes. In response to Brownie, I think the explanation is very simple. The original study had a very wide confidence interval, and in the cause-of-death data (which divides the main data set into smaller groups of counts) the confidence intervals would be even wider. The confidence interval around the number of non-violent deaths in the previous study is probably so wide that it cannot be considered statistically significantly different from the figures observed in the new study – i.e. while the point estimates look radically different, the imprecision of the old study prevents any conclusions from being drawn about this.

(I have read the critique of this aspect of the study, which was presented somewhere in comments to Tim Lambert’s blog I think, and this critique was constructed based on the point estimates of deaths only. One needs to consider the errors as well when comparing the old and new studies, and the old study had very wide confidence intervals).

The only way to get around this problem would be to do a really big study – say, on the scale of the sorts of national surveys which occur in Australia, I think about 5000 households – so that all subgroup analyses (women vs. men, type of death categories by year, etc.) can be performed with narrow confidence intervals. If the governments of the coalition of the willing really cared about Iraqi lives they would have commissioned such a survey using these very techniques several years ago…

I expect that the administration is perfectly well aware that the Lancet numbers are closer to reality than their own. The most obvious explanation of the disparity between the estimates is not some statistical finesse but mere duplicity on the part of people who have been lying relentlessly for the past five years about a great many things.

Over the centuries civil wars have caused hideous carnage. 600,000 Americans died in the US Civil War – more than American deaths in World War 1 or World War 2.

Little Bosnia, with its population of 5 million, managed to slaughter 100,000 in its three year war.

England’s Civil War killed 10% of the populace at the time.

The Spanish Civil War, spookily like Iraq in many ways, killed maybe half a million.

Etc etc.

Given that all other major civil wars have indeed caused deaths in the hundreds of thousands, and killed up to 10% of the populace or more, why should Iraq be different?

Why should Iraq, unique among civil wars, have a death rate of only 0.2% of the population, 50,000 people, rather than the normal 2-10%? Why should Iraq get away with only 50,000 dead, when, looking at history, you should expect 500,000? What is so specially peaceful about Iraq that makes it such a fortunate place?

Is it the lack of widespread violence? Is it because all sides are showing admirable restraint? Is it because the Americans are brilliantly precise in their bombing?

I repeat, we expect 2-10% of a population to die in a civil war, that’s what happens. Yet amazingly enough, according to the war mongers, Iraq has got away with an astonishing 0.2%!!

Probably the main reason why the statisticians aren’t rushing to respond to Mike H is that this is really difficult to explain without getting into sampling theory. Robert made a good stab at it in comments at Tim Lambert’s:

The basic issue is that small pieces of larger aggregates are often less well-estimated than the aggregate as a whole. We can pretty much estimate how many deaths occur on the nation’s highways each month, but there will be more variation in how many will occur on any particular day, or on a particular road, or by single-car accident vs. multi-car accident. Day-to-day variation in those counts doesn’t invalidate an overall guess at the total number for all causes for all locations.
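A toy simulation (my own sketch, not the survey's actual design) makes Robert's point concrete: estimate a total and a small subgroup from the same simulated sample many times, and the subgroup estimate bounces around proportionally much more.

```python
import random
import statistics

random.seed(1)

def one_survey(n_people=5000, death_rate=0.08, subgroup_share=0.1):
    """Simulate one survey: total deaths, and deaths from one small cause."""
    deaths = sum(random.random() < death_rate for _ in range(n_people))
    # A small cause of death accounting for ~10% of the total
    sub = sum(random.random() < subgroup_share for _ in range(deaths))
    return deaths, sub

# Repeat the survey many times and compare the relative spread of the
# aggregate estimate with that of the small-cause estimate.
totals, subs = zip(*(one_survey() for _ in range(1000)))

def rel_spread(xs):
    return statistics.stdev(xs) / statistics.mean(xs)

print(f"relative spread, all deaths:      {rel_spread(totals):.3f}")
print(f"relative spread, one small cause: {rel_spread(subs):.3f}")
```

The aggregate is pinned down to within a few per cent, while the small-cause estimate varies roughly three times as much in relative terms, which is why cause-specific comparisons between the two Lancet surveys are so much noisier than the headline totals.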

I’m willing to grant that people who are expert in statistics are going to want to vet this survey. But I think that they are a minor or vanishing factor in the debate we’re seeing right now.

The survey adds a little more detail to what we’ve known all along: Iraq has become a hellhole. A precise number with precise confidence intervals and rigorous methodology supposedly gives the report more credibility, but not really, because the topic has now been diverted to technical questions about the report rather than the substantive data (WAY TOO MANY PEOPLE KILLED).

The right’s compulsive-obsessive tics are closely tied to their kneejerk politics, and completely disconnected from any understanding of statistics, or of the facts on the ground.

So I’ll settle for 100,000 — 600,000 extra deaths, both numbers being well within the FAR TOO MANY range for this particular war.

12: Any chance of seeing some working for those figures? Perhaps a year by year breakdown, because after all, Lancet 1 covers a shorter time frame than Lancet 2. I’m not sure where you’re getting the ‘erases’ and ‘replaces’ from; they sound a bit rhetorical to me.

Lancet 1 does cover a shorter time frame than Lancet 2, but there isn’t any similar study to compare Lancet 2 with for the period after Sep 04. On the other hand, we now have 3 studies (Lancet 1 and 2 and the UNDP survey) to evaluate estimates for the first 18 months.

“The more you want to know about precise cause, the larger your sample has to be.”

And the largest mortality study done thus far is the UNDP effort. We know how its numbers stack up against Lancet 2 for violent (or “war-related”) deaths, at least for year 1.

As I’ve said over at Deltoid, I can see allowing some latitude for discrepancy from study to study when it comes to the various subsets of non-violent and violent, but not the two main categories of death themselves. In order to be considered reasonably accurate, I think the studies have to be reasonably consistent in terms of violent and non-violent death.

Otherwise, this seems an awful lot like resurrecting tens of thousands of dead from study 1, only so you can kill them off again in study 2, but in a markedly different way, in order to balance their bottom lines.

John E: I was thinking about your question, and my theory is that the Lancet study serves as a convenient outlet for the pro-war side. For the anti-war side, you’re right that the study only carries a tiny increment of information, so it’s hard to get that worked up about it. For the pro-war side, the news has been so relentlessly bad, that the Lancet study serves as an outlet for frustration. Taking someone’s boxing metaphor of the other day (was it yours?) they’re like boxers who are dead on their feet and don’t know how to stop hitting, but the Lancet study is the best thing they’ve found to punch in nearly two years.

“It is surprising that beyond the elevation in infant mortality and the rate of violent death, mortality in Iraq seems otherwise to be similar to the period preceding the invasion. This similarity could be a reflection of the skill and function of the Iraqi health system or the capacity of the population to adapt to conditions of insecurity.”

Charlie, I meant to address this earlier, but forgot.

Point taken. I’d forgotten about that statement from the first study.

That doesn’t change the fact that the study authors and defenders of the study attributed the increase in non-violent deaths over the baseline extrapolation (whether by infant mortality, accidents, slight increases in heart disease deaths, etc) to the effects of invasion. I spent many hours arguing with learned defenders of the study, their position being that this increase was entirely expected, given there was a war going on and all.

Now, Lancet 2 tells us that Iraqis were actually less likely to die from natural causes and accidents for the first 2 years of the war.

Recall that in the 2004 survey, the headline-grabbing 100,000 excess death figure owed much of its punch to the 40,000-plus excess deaths attributed to non-violent causes. The 2006 survey tells a strikingly different tale of mortality in Iraq during the first 18 months of the war. Not only is all of the excess death toll in the 2006 survey the result of violence, it’s actually greater than the entire 112,000 excess death figure, because the death rate from non-violent causes comes out below the baseline non-violent death rate. My rough math indicates an extrapolated violent death toll of more than 130,000 for the second survey, for the same time frame covered by Lancet 1.
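That rough math can be reconstructed as follows. The population figure and period length are loose assumptions of mine, and the non-violent rates are the second survey's point estimates:

```python
# Reconstructing the "more than 130,000 violent deaths" rough math.
# Population and period length are loose illustrative assumptions.
population = 25_000_000
years = 1.5                    # roughly the window covered by Lancet 1
total_excess = 112_000         # Lancet 2's estimate for that window

# The non-violent rate fell from 5.4 to 4.5 per 1000 per year, i.e. a
# non-violent *deficit* that violent deaths must more than make up for.
nonviolent_deficit = (5.4 - 4.5) / 1000 * population * years
violent_excess = total_excess + nonviolent_deficit
print(round(violent_excess))   # comfortably above 130,000
```

The same caveat as elsewhere in this thread applies: these are point estimates of a small subgroup, so the true uncertainty around each input is wide.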

Later in the same thread you quote Lancet 2:

Application of the mortality rates reported here to the period of the 2004 survey gives an estimate of 112,000 (69,000–155,000) excess deaths in Iraq in that period. Thus, the data presented here validates our 2004 study, which conservatively estimated an excess mortality rate of nearly 100,000 as of September 2004 …

I’m not seeing an explicit breakdown of this 112,000 figure in Lancet 2. It seems to be a point estimate total of all deaths for the period of the Lancet 1 study. The authors do, however, provide a table of mortality rates by category and by year on page 4. I think what you’re getting at is that the non-violent death rate there drops from 5.4 pre-war to 4.5 in 2003-4, and then rises again to around 6.9 for 2005-6 (I can’t find equivalent figures in my copy of Lancet 1 for comparison).

Are you saying that this means that the 112,000 figure in Lancet 2 consists entirely of violent deaths?

Bob McManus, apparently unaware that the Hague conventions were signed before the airplane was invented, writes:

I thought aerial bombing of civilian areas by an occupying power was a war crime under the Hague Conventions.

Anyway, the closest that international law comes to banning these types of attacks is in article 51 of Geneva Protocol 1, which bans “indiscriminate attacks…not directed at a specific military objective” or “which may be expected to cause incidental loss of civilian life…which would be excessive in relation to the…military advantage anticipated.”

So I don’t think that these kinds of aerial attacks are war crimes, since they’re not indiscriminate.

John Quiggin, on the basis of an article that discusses air attacks at a rate of 35 to 120 per month, concludes that there have been an average of 150 attacks per month. What am I missing?

I also doubt the 10 dead per attack figure. It wouldn’t surprise me if most air attacks killed no one at all. Most military attacks simply miss. The US has shot several billion bullets in Iraq, but they haven’t killed a billion people.

On the other hand, the explanation that most of the Lancet-reported air deaths come from helicopters or artillery makes a lot of sense to me. The US likes to fight at night, and I find it hard to believe that someone who was sleeping when an attack occurred has any idea if their loved ones were killed by planes, helicopters, mortars, or what.

“The US has shot several billion bullets in Iraq, but they haven’t killed a billion people.”

Well, yes, but it’s hard to imagine that you can drop three 500 pound bombs in a “close combat” situation in an urban area, and not kill anyone. As for the number of attacks, everything I’ve read suggests that the increasing trend to use air attacks, noted in the report, continued through 2005-06.

As regards Mike H’s comments, I think my response at #2 is perfectly adequate. This may be a bit hard to see if you don’t understand sampling theory, but the position is clear to anyone who does.

“Are you saying that this means that the 112,000 figure in Lancet 2 consists entirely of violent deaths?”

To reiterate what Brownie says, yes. The authors also explicitly state that all of the 112,000 excess deaths extrapolated in Lancet 2 for that period were violent. I can’t recall off the top of my head if the statement is in the study itself or the companion paper.

“As regards Mike H’s comments, I think my response at #2 is perfectly adequate. This may be a bit hard to see if you don’t understand sampling theory, but the position is clear to anyone who does.”

I think it’s anything but adequate, John. While I respect your statistical kung fu, I’m having trouble taking it seriously when applying it to the real world, in the context of comparing the two Lancet studies.

Perhaps if we were dealing with a much larger population, and a much larger per annum baseline mortality number, the variances would be easier to dismiss as blips.

But we’re not dealing with a baseline mortality estimate of a million people, we’re dealing with one in the range of 120,000 – 132,000. Juxtapose the swings from study to study between violent and non-violent deaths and we’re looking at numbers that are huge, half the baseline number (although admittedly for an 18 month period).

I’d also be interested in any thoughts you might have on my post to Kevin (17), in relation to the UNDP study, which was conducted using similar methodology to the Lancet studies, but with a sample size many times larger.

29: While my grasp of the maths involved is minimal, reading around this subject suggests that the p-value quoted for non-violent death rates in Lancet 2 needs to be taken into account. It’s cited as 0.523. I think this may mean that there’s a high chance that the actual figure, if measured, would turn out to be significantly different than the sample.

I guesstimated roughly the same figures for air strike deaths — less than the Lancet number, but still significant.

The more I’ve thought about how you could carry out a study in Iraq under today’s violent conditions, the more skeptical I’ve become of how random the sampling was. It would have been foolhardy for researchers to pick neighborhoods out at random and start nosing around.

What I would have done if I was one of the Iraqi researchers assigned to do random surveys is call around and find neighborhoods where the Man-in-Charge (tribal leader, ayatollah, gang chieftain, whatever) wanted my presence. I’d tell the Americans we did it all randomly, but I’d actually only go where I was invited.

“I think this may mean that there’s a high chance that the actual figure, if measured, would turn out to be significantly different than the sample.”

There’s one sure fire way to find out, Charlie. We could ask the authors themselves what their data reveals as the most likely estimate for violent and non-violent deaths in the second study, covering the same time frame as the first.

I’m sure they have these numbers at hand, and it would be interesting to have them on the record.

I’m going to have another go at stating the significance of the p-value because I’m not happy with my last effort. A p-value of 0.52 means that if you used the same methods to sample a (hypothetical) series of identical populations, there’s a 52% chance that you’d observe a difference at least as large as the one your study showed, and a 48% chance that you’d observe a smaller one. So in the case of Lancet 2, the authors are saying we may as well assume that all the years for which deaths were reported were identical in terms of the non-violent death rate.

Compare this with the Lancet 2 p-value for the change in the rate of violent deaths: 0.0001. This says that if you used the same methods on a (hypothetical) series of identical populations, a difference as large as the one observed would arise by chance only 0.01% of the time.
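That reading of a p-value can be checked with a small Monte Carlo sketch. The per-survey noise level here is a normal approximation I've chosen purely for illustration, not the study's actual variance model:

```python
import random

random.seed(0)

# Simulate pairs of surveys of two IDENTICAL populations (the null
# hypothesis) and count how often chance alone produces a difference at
# least as large as the observed 0.6 (the 5.4 -> 6.0 shift in the
# non-violent rate). The per-survey standard error of 0.65 is an
# illustrative assumption, not a figure from the study.
def sampled_rate(true_rate=5.5, std_error=0.65):
    return random.gauss(true_rate, std_error)

observed_diff = 0.6
trials = 20_000
at_least_as_large = sum(
    abs(sampled_rate() - sampled_rate()) >= observed_diff
    for _ in range(trials)
)
print(f"simulated p ~ {at_least_as_large / trials:.2f}")
```

Under these assumptions the simulated value comes out in the neighbourhood of the reported 0.52: a shift of 0.6 in the non-violent rate is exactly the sort of thing sampling noise produces on its own.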

In the recent Israel-Lebanon War, “The IAF flew some 15,500 sorties, including some 10,000 fighter sorties, and attacked a total of around 7,000 targets.” Since Israel killed about 1,000 people by all means, I conclude that on average it takes 10 air attacks to kill 1 person, and that John Quiggin’s figures are inflated by a factor of 100.

Quiggin may find it “hard to imagine that you can drop three 500 pound bombs in a ‘close combat’ situation in an urban area, and not kill anyone,” but that’s exactly what happened most of the time in Lebanon.

On these Lancet discussions, the point has been made repeatedly that “gut feelings” are a poor way to form judgements about things outside our usual experience. Hence, I suggest that people hesitate a little before relying on what their gut tells them about the effects of airpower.

There are many potential biases in the study but how can you tell which way they point?

If Steve is right, the surveyors might either have gone to where someone had a particularly large collection of bleeding stumps making the results a high estimate or they might have missed out the most chaotic and dangerous areas altogether making the results a low estimate.

On the violent/non-violent death split issue, as I understand it the confidence intervals of the two surveys overlap, so for some value in the middle neither estimate looks unlikely. Constructing a confidence interval along similar lines provides a lower bound of 400,000 deaths above the pre-invasion baseline. It seems odd to treat the former numbers as cause for concern while declining to conclude that there have likely been 10,000 excess deaths a month since the invasion. The confidence intervals are there precisely to address issues like this.

Asking for the data on the exact period doesn’t seem likely to do much beyond the figures John Q quotes, and it is, unfortunately, not standard practice to provide all possible breakouts of survey data.

If anyone was better positioned to do a better job of counting the dead, it would be the US government, and they have signally failed to provide any meaningful statistics on the matter.

Ragout, I think you have the burden of proof utterly reversed. We have *data* indicating a large proportion of casualties from airstrikes – although this probably includes helicopters and artillery. Is this possible? Well, yes – that’s John’s point. Lebanon is not a good template for several obvious reasons. Iraq is heavily urbanized, while the warfare in Lebanon was concentrated in the southern villages – places from which, I’d add, Israel strongly encouraged civilians to flee. Bombs in cities kill more people than bombs in mostly empty villages do…

Ragout,
your estimate of deaths in the Israel-Lebanon War may be low by a factor of two, and the sorties are not of the same kind. The Israeli attacks were supposed to be based on excellent intelligence, aimed at destroying infrastructure mostly in evacuated areas, and the count includes the massive cluster bombings of evacuated areas in the last days. By contrast, the US strikes are not in evacuated areas, are not (with the best will in the world) mostly based on excellent intelligence, and are mostly flown in a live combat-support role in urban areas.

Still, I will be sure to take arguments from gut feelings skeptically.

There is a difference between “sorties” and “attacks” or strikes. In many sorties no attacks are carried out. CENTAF news releases apparently speak of “weapons released” to refer to sorties where ordnance is fired. The news article referenced by John referred to “military strikes”. It’s unclear whether this summarizes all air missions involving weapons release.

Whether the Israeli air force uses a similar terminology I don’t know. The Lebanon war is probably not comparable to Iraq. The Israeli air force did bomb civilian infrastructure and towns, but the bulk of their missions were directed at suppressing rocket fire from Hizbullah and attempting to eliminate their deeply entrenched bunkers. They would repeatedly bomb hillsides trying to eliminate fire. There has been no comparable air barrage in Iraq except at the beginning of the war.

That said, I agree that John had very little basis to hypothesize that a “military strike” would cause on average 10 deaths. Nevertheless he is correct to observe that there has been an increase in the air war in the recent year, which one assumes will have resulted in more casualties, intended and otherwise. Whether the Lancet study accurately captures this, I don’t know. Although in general the methodology is sound.

There is no factor of 10 error in my figures. Four people take me to task for not distinguishing between sorties and attacks. But the sentence (!) I quoted made exactly this distinction: 15,500 sorties attacking 7,000 targets.

Yes, there are differences between Iraq and Lebanon, but I doubt that the differences were as big as you say. We’re talking orders of magnitude here. Even if US air strikes were 10 times more deadly than Israeli ones, Quiggin is still off by a factor of 10.

Let me use my imagination about how a typical air attack goes down. First, US forces and Iraqi insurgents get in a firefight. When the airstrike called in by US forces arrives, the Iraqis have all taken cover in holes they’ve dug in the ground. These “foxholes” prove a very effective defense, and no Iraqis are killed. However, Iraqi fire is suppressed, as the Americans intended.

In contrast to air strikes, here’s a passage about artillery from Rick Atkinson’s book about the conventional war in 2003 that’s stuck with me:

To avoid hitting American soldiers, an artillery tube deliberately fired a single round six hundred meters beyond the target. A spotter in the city radioed instructions to walk the artillery back in hundred-meter increments.

… [The] earlier suggestion of additional air strikes now seemed more appealing. “What we ought to do is put a precision munition on him instead of walking artillery all over town.”

Ragout,
the Israelis were doing different things. Most of the Israeli targets were supposed to have been evacuated while most attacks in Iraq are supposed to be support for troops in combat, a very different proposition. You are also lowballing the death toll in Lebanon.

Ragout, the only 10 to 1 ratio in your original post was fighter sorties:deaths, so it was a bit hard to figure out that you actually meant targets:your unstated estimate of air strike deaths.

The fact that the USAF figures show a ratio of 10 missions to an air strike, whereas the Israeli figures are 2 sorties to 1 target reinforces the point that these figures simply aren’t comparable.

As others have stated, the Israelis were trying to destroy hillside rocket emplacements and infrastructure, and, in the latter case to avoid casualties as far as possible. The US use of airstrikes is specifically to kill suspected insurgents in urban areas, and the force protection doctrine clearly allows the destruction of buildings in which they are sheltering.

“Mike H, before commenting any further you need to read a text on sampling theory, particularly in relation to confidence intervals and hypothesis testing. Charlie gets it exactly right at #35.”

John, the study authors extrapolated 40,000 excess non-violent deaths for the 18 month period covered by Lancet 1. That figure represents a 22% increase over the baseline mortality figure for the 18 months.

If a figure that large isn’t “statistically significant,” then perhaps the real issue here is whether statistical surveys are being given undue credence when measuring mortality in an environment like Iraq.

“As I said, Mike, before commenting any further you need to read a text on sampling theory, particularly in relation to confidence intervals and hypothesis testing.”

I disagree John, and this is looking more and more like a game of dodgeball. And you still haven’t answered my question posed in post 31. It’s a very straightforward request.

As I mentioned at Deltoid earlier tonight, in 2004 I debated knowledgeable defenders of Lancet 1, who argued that the 40,000 excess non-violent deaths made sense, and were to be expected as the natural consequence of the country being placed into a war environment. Specific defences were provided to support increases in infant mortality, accidents, etc.

Now Lancet 2 erases these deaths, and replaces them with a much different cause of death, violence. I’m now being told by some of the same previously mentioned defenders of the study that the deaths they lobbied for in 2004 are no longer statistically relevant. Forget about them, I’m told. Go read a book on statistics, I’m told.

Well I think what I really need to do is get to my doctor, real quick.

I’ve had so much smoke blown up my ass over the last 3 days, my lungs probably look like those of a two-pack-a-day man.

Mike H, the significance of an effect size is just not the same thing _at all_ as the statistical significance of an effect. A study can, for example, indicate a very large effect, but in a way that gives us little reason to think that it wasn’t just random noise — big size but little statistical significance. And a study can give us very, very, very good reason to think that there’s a real effect, but that the effect in question is subtle to the point of near-nonexistence — small size but strong statistical significance.

This is a very common layperson’s error in mucking about with statistics: stats people give the word “significant” a special, technical meaning, and you can’t use your ordinary language intuitions about it in trying to suss out what should or shouldn’t count as significant.

Here’s another way to scale Israel in Lebanon. 6 weeks of air assault killed approximately 1,000 people. The Iraq war contains about 28 6-week intervals, so on a pure scaled up basis that would be 28,000 dead, which is in the ballpark of John’s estimate.
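That scaling can be sketched in a couple of lines. The inputs here are this comment's rough figures (1,000 deaths in six weeks, roughly 28 such intervals of war), not official tallies:

```python
# Back-of-envelope scaling of the Lebanon air campaign to the length of the
# Iraq war. Inputs are this comment's rough figures, not official tallies.
lebanon_air_deaths = 1_000   # ~6 weeks of air assault in Lebanon
six_week_intervals = 28      # the Iraq war to date spans about 28 such intervals

scaled_total = lebanon_air_deaths * six_week_intervals
print(scaled_total)          # 28000: the ballpark of the estimate in the post
```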

Do you reject the assertion that the huge variance between violent and non-violent deaths has little statistical significance? If so, why? I’d also be interested in what your take is on the cause of the variance.

Which study is giving us the truest picture of mortality in Iraq? Is either? What would you base your choice on?

In Lancet 1, 40% of the 98,000 ci mid-point deaths were attributed to non-violent causes. In Lancet 2, it’s 0%.

A p-value of 0.52 means that if the non-violent death rate were in fact unchanged, and you used the same methods to sample a (hypothetical) series of identical populations, you’d observe a difference at least as large as the one your study showed about 52% of the time, and a smaller one about 48% of the time. So in the case of Lancet 2, the authors are saying we may as well assume that all years for which deaths were reported were identical in terms of the non-violent death rate.
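That interpretation can be illustrated with a toy simulation. All the numbers below are made up for illustration; none come from the Lancet data. Two periods are drawn from identical populations, and we count how often sampling noise alone produces a difference at least as large as an "observed" one:

```python
import random

# Toy illustration of a middling p-value: two periods drawn from *identical*
# populations, counting how often chance alone produces a difference at
# least as large as a hypothetical observed difference. All numbers are
# invented for illustration; none come from the Lancet data.
random.seed(0)
n = 200                 # observations per sample
true_mean, spread = 5.0, 5.0   # identical in both periods
observed_diff = 0.34    # the difference our hypothetical study found

def sample_mean():
    return sum(random.gauss(true_mean, spread) for _ in range(n)) / n

trials = 2000
extreme = sum(abs(sample_mean() - sample_mean()) >= observed_diff
              for _ in range(trials))
print(round(extreme / trials, 2))   # roughly 0.5: noise does this half the time
```

With these inputs, chance alone produces a difference that large about half the time, which is what a p-value near 0.5 is saying.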

Thanks, Charlie. Are there any conclusions we can draw about the probability that the number of non-violent deaths would fall by 40%? Does a p-value of around .5 not just mean that the figure has an equal chance of being higher or lower? In which case, the non-violent deaths could be even further into the negative than they already are (extrapolating from the pre-invasion baseline)?

At the very least – and even allowing for sampling theory explanations for the discrepancy – are the authors justified in claiming mutual corroboration between the first and second studies if the percentage of deaths that can be attributed to non-violent causes has fallen more than 40% according to their own figures, however ‘safe’ they might be based on p-values?

JohnQ,

The UNDP ILCS report was a survey of over 21,000 households and not, as you say, based on media reports.

The interesting analysis above is based on number of missions/strikes/sorties. I think there is a very complementary analysis to be done that is based on amount of bombs dropped, measured by weight.

I think there is reason to conclude that we have dropped more on Iraq than what was dropped on Germany in WWII. I think there are further comparisons which suggest that we could easily have killed 300,000 people just in airstrikes.

Give me a break. You’re quibbling over factors of two (air attacks vs. sorties, or estimates of Lebanese deaths), when the figures suggest that Quiggin is off by a factor of 100. And I find your argument that there’s a vast difference between airstrikes targeting rockets in Lebanon and airstrikes targeting mortars in Iraq to be unconvincing.

In any event, Quiggin’s *only* justification for his estimate is that he “can’t imagine” dropping bombs that don’t kill anyone. As I’ve shown, that’s what happened 90% of the time in Lebanon.

By the way, this article puts the number of airstrikes in Iraq at about 300 a year in both 2004 and 2005. That’s a factor of 6 lower than Quiggin’s estimate.

Oh, and another thing. John Quiggin argues that because the US ratio of sorties to airstrikes is a lot higher than Israel’s, that means the figures for airstrikes aren’t comparable. But these figures just mean that the USAF flies a lot more non-combat missions than the IAF (transporting cargo, buzzing crowds in a “show of force,” intelligence-gathering). It certainly does not imply that US combat airstrikes are any more deadly than Israel’s, as Quiggin claims.

U.S. Airstrikes Take Toll on Civilians … Eyewitnesses Cite Scores Killed in Marine Offensive in Western Iraq … U.S. Marine airstrikes targeting insurgents sheltering in Iraqi residential neighborhoods are killing civilians as well as guerrillas … The number of airstrikes carried out each month by U.S. aircraft rose almost fivefold this year, from roughly 25 in January to 120 in November, according to a tally provided by the military.

I was surprised that the leading cause of violent death changes significantly between the 2004 Lancet study and the current one. Air strikes or artillery were the main cause of violent death in the post-invasion period in Lancet 1. In Lancet 2, gunfire is the leading violent cause of death, including in the March 2003-April 2004 period covered by Lancet 1. Since coalition-attributed fatalities in Lancet 2 account for 31% of violent deaths, air strikes account for 13% of violent deaths, and insurgents don’t have planes, attributed airstrikes alone account for 42% of coalition-inflicted fatalities.
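The 42% figure follows directly from the two quoted shares:

```python
# Air strikes as a share of coalition-inflicted deaths, from the shares
# quoted out of Lancet 2: air strikes are 13% of violent deaths, and
# coalition-attributed deaths are 31% of violent deaths. Since insurgents
# have no aircraft, all air-strike deaths fall in the coalition column.
airstrike_share = 0.13
coalition_share = 0.31

print(f"{airstrike_share / coalition_share:.0%}")  # 42%
```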

Quiggin’s point about the air war coverage is a bit off the mark. Air strikes account for 12-14% of violent deaths in all three post-invasion periods (see Table 4 in the study). If media reports of the causes of violent fatalities were proportionate to the surveyed causes, we should see no change in media reports about air strike fatalities from March 2003 to the present day. Car-bombing fatalities as a percentage of all fatalities increase monotonically across the three time periods, so we would expect to see more attention given to them. Also, reports tend to focus on mass-fatalities rather than individual engagements. Indeed, most reports about gun-fire combat fatalities arise from reports of US fatalities or discovered bodies. Other engagements are not reported.

Even if “other explosion/ordnance” and unknown causes from table 4 are added to airstrikes (which presumably would cover all not-vehicular IEDs), it is the May 2004-May 2005 period that sees 39% of fatalities due to possible airstrikes. The last period would have possible airstrike account for 26% of all fatalities in that period. Gunfire and car-bombs are still the main story, accounting for 71% of all deaths in the most recent period, and 69% overall.

“do you reject the assertion that the huge variance between violent and non-violent deaths has statistical significance?”

Yes, I am rejecting exactly that; without something confidence intervals, the point-values that you are focusing on cannot be meaningfully compared, no matter how large their difference. (since the confidence intervals might be bigger still).

I’m aware of the site you and John linked to. I think you misunderstood my request of John and Kevin when you typed this:

“There are some big differences in the surveys about how things are classified, what was included, etc. But that link states that the results are consistent.”

Iraq Mortality.Org claims consistency between Lancet 1 and the UNDP. I’m not asking John and Kevin to explain the variances between Lancet 1 and the UNDP study. I’m asking this in relation to Lancet 2.

Iraq Body Count has come out with its own analysis of the latest Lancet paper. They think it’s rubbish. They’re polite about it, though.
I thought they made a number of very good points, but won’t rehash them here.

I’d rather rehash the one I’ve been making at Deltoid–

The UNDP survey found that the number of war-related deaths (supposedly excluding criminal murders, from what I’ve read) was 19,000 to 28,000 in the first 13 months. The latest Lancet paper happens to have the violent mortality rate for exactly the same stretch of time in Table 3. The 95 percent confidence interval is 1.8 to 4.9 per thousand per year. Assuming a population of 25 million, that’s 45,000 deaths per year at the bottom of the confidence interval, or about 50,000 for 13 months.

To get the UNDP estimate to agree with the lowest end of the confidence interval, you have to tack on 20-30,000 criminal murders. To hit the midrange figure (3.2 per 1000 per year, or about 80,000 deaths per year or 90,000 in the 13 months), you’d have to tack on about 60-70,000 murders.
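The conversion from rates to counts is easy to check. The 25-million population and the 13-month window are this comment's working assumptions, not figures from the study itself:

```python
# Convert Lancet 2's violent mortality rates (per 1,000 people per year)
# into death counts over 13 months, assuming a population of 25 million.
# Both assumptions are this comment's, not the study's own figures.
population = 25_000_000
months = 13

def deaths(rate_per_1000_per_year):
    return rate_per_1000_per_year / 1000 * population * (months / 12)

print(round(deaths(1.8)))   # 48750 -> "about 50,000" at the bottom of the CI
print(round(deaths(3.2)))   # 86667 -> "about 90,000" at the midpoint
```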

I don’t think the UNDP and the second Lancet report are in agreement, and it’s not a question of me taking a subset of the data and overinterpreting it–the Lancet authors did the confidence interval for that time period themselves.

Iraq Body Count makes other arguments. I hope people look at them calmly and think about them–there’s been a bit too much talk about how the only reason one could have for doubting this 400-900,000 estimate is rightwing buffoonery or sheer ignorance, and I don’t think that’s quite right.

BTW, John Emerson, there’s an enormous difference between the war as portrayed by Iraq Body Count statistics taken at face value and what one sees in the Lancet report. If you ever bother to read IBC’s analysis of their own data, in most months of the war the number of civilian deaths that the media attributes to American action is extraordinarily low. In most months it is a few dozen people. In the third year of the occupation IBC could clearly attribute 370 civilian deaths to occupation forces. If one takes those numbers at face value, the US troops are doing an extraordinary job avoiding civilian casualties, especially when you couple such (wildly implausible) numbers with the number of insurgents Michael O’Hanlon claims we are killing–he said in late 2005 it was 760-1000 per month.

OTOH, if you take the Lancet numbers as gospel, Iraq is a Vietnam-style free fire zone.

I think the truth is somewhere in-between, but I don’t know where exactly. I think the media is clearly underreporting killing by our side, but don’t know by how much. But it seems to me that American citizens should very much want to know the order of magnitude of the killing our own troops are committing. Tens of thousands of civilians killed per year by our troops in reckless free-fire zone style killing is qualitatively different from a few hundred per year tragically killed by accident.

The difference between the UNDP figure of 24,000 war-related deaths (for one year) and the 130,000 plus excess violent deaths I believe Lancet 2 extrapolates for 18 months is rather large, don’t you agree? Given that the size of the UNDP study seems to provide it with a decided advantage over either of the Lancet studies in terms of precision, how would you describe its impact on the credibility of Lancet 2’s violent death figure?

I’d like to harken back to your original comment in this thread, because I missed something important in my initial reading of it:

“At least for the current survey, it’s impossible to say whether non-violent deaths have risen or fallen. Unless the confidence intervals for non-violent deaths in the previous survey were very tight, I imagine the same was true in that case.”

If you look at pg 7 of the companion article, under “Non-violent death rates,” the study authors make this statement:

“Immediately post-invasion, the death rate due to non-violent causes dropped slightly, then stayed level for the next period, but began to rise in the period from June 2005 until June 2006.”

According to the authors’ own words, there was a decrease (albeit slight) in the rate of non-violent death for the 18 month period immediately after invasion. You’re right that it’s impossible to say if the non-violent rate has gone up or down over the entire length of the study, and the authors confirm that in the same paragraph I cite, but we’re talking about a 40 month period as opposed to an 18 month one.

The point estimate for non-violent excess deaths for the entire 40 months in Lancet 2 was 54,000. However, the non-violent excess death estimate from Lancet 1 was 40,000, over only 18 months. While the authors conclude the former number wasn’t large enough to assert an increase in non-violent mortality, the latter number evidently was big enough, and over a shorter period of time, to constitute an increase in the post-invasion mortality rate, and an increase in the excess death estimate.

BTW, so as not to mislead, IBC attributed about 10,000 civilian deaths to US forces in the first two years. But nearly 7000 of those died in the first two months (during the invasion) and roughly 2000 more died in the two separate attacks on Fallujah. What’s left over are months and months and months where it’s literally dozens of deaths per month caused by coalition forces, in comparison to many hundreds per month (in the first two years remember) caused by criminals or insurgents.

IBC also admits that in the third year there were thousands of deaths it couldn’t attribute to any one actor. But when you stick to what they can attribute, the overwhelming majority of violent deaths after April 2003 are caused by Iraqis (or the foreign jihadists).

Again, that’s a completely different war from the one depicted in the Lancet paper. It makes a difference if you’re an American citizen–it’s not just that the Iraq War was wrong, but it’s also a question of how many civilians our own forces are killing directly.

“Yes, I am rejecting exactly that; without something confidence intervals, the point-values that you are focusing on cannot be meaningfully compared, no matter how large their difference. (since the confidence intervals might be bigger still).”

Philosopher, I’m not sure what you mean in regard to ” without something confidence intervals…..” Did you miss a word?

I don’t want to come off like a snarkasaurus when I say this, philosopher, but I’m a little tired of the cherry-picking and goal post moving that seems to be going on with defenders of the study when it comes to point estimates and confidence intervals. Whenever a point estimate starts taking serious flak, someone pulls the pin on a confidence interval smoke grenade. When the point estimates are holding up well in a debate, defenders of the studies tout them as gospel.

You can’t have it both ways. The authors have gone on the record in both studies with numbers they cite as the most likely values, for all the categories that are of the most consequence. Why don’t we try to stick with them whenever possible?

Any response to my question in relation to which study you think is most accurate, and why?

On the number of strikes, the report I cite, direct from USAF gives more than 300 strikes in the three months Sep-Nov 2005, so there seem to be some big contradictions in reports from the same source.

Looking over the CENTAF reports, they cite 50 or so “close air support” operations per day fairly consistently – this includes reconnaissance and similar, but not transport operations, and there are always several cases of combat support reported. The reported operations in Iraq seem to include a lot less “expenditure of munitions” than those for Afghanistan where the total number of operations is smaller.

So, it seems pretty clear that, if the reporting is correct, air strikes are being used very differently in Iraq than in Afghanistan and certainly than in Lebanon. The descriptions I’ve seen suggest a higher degree of lethality.

An average daily air-strike death toll of 65 people seems a bit steep, but if near daily fixed-wing strikes and less frequent helicopter strikes are combined with more intense bursts of violence like the initial invasion and the Fallujah battles, then the Lancet figures seem more plausible.

David Kane, every time you raise your head in these forums it is to say that you have “asked for the data” and it wasn’t forthcoming, with the attendant insinuation of fraud or bloody-mindedness. Anyone who has worked with these large survey samples knows that the organisations which collect them don’t just give up the data on a whim to anyone who comes along wanting to “replicate it”. They have serious ethics requirements which prevent them from doing that easily. You like to imply you are familiar with statistical analysis and data auditing, in which case you know this is true for almost any survey of this type, and the way you present their reluctance to share the data as evidence of suspicious dealing is quite disingenuous.

If, on the other hand, you have an actual criticism of the methods of the paper – and by “actual” I don’t mean “it disagrees with Dubya” or “that can’t be right!” or “it doesn’t concur with another study which I happen to believe is a fraudulent crock of shit” – then feel free to write a letter to the editor of the Lancet explaining your position. I’m sure they’ll be willing to publish whatever analysis you bring to bear on the substantive elements of the paper – i.e. the sample design, the sample weighting or the statistical comparisons. If you don’t have any argument with these elements of the paper then accept the figures presented and get over it.

Mike H, this issue of confidence intervals is not a smoke grenade or some other dodge, it’s an essential part of the consideration of the results. Given what you and Brownie have presented of the authors’ statements in the conclusion one could assume that they have drawn too strong a conclusion about the comparability of the studies, but it is quite common for scientists to overstate the results of their own papers, and such an act in the discussion in no way invalidates the results of the paper. Choosing to disagree with their conclusion that the papers are comparable, and instead concluding that statistical error in the first paper prevents comparison, in no way invalidates the second paper (or the first).

Is it also not the case that in Lancet 1 they excluded the Fallujah cluster? Had they included it the confidence intervals for the violent deaths estimates would have been narrower, and the point estimate more like the point estimate from Lancet 2. In a way this whole discussion is driven by imprecision which the authors themselves introduced in order to make the paper less biased. I think they should get some credit for that, regardless of whether you think their conclusions about comparability were too strong.

Given what you and Brownie have presented of the authors’ statements in the conclusion one could assume that they have drawn too strong a conclusion about the comparability of the studies,
I don’t think it’s fair to assume that. As Robert pointed out more than once, you can get a significant agreement between studies about the total number of deaths without getting a significant agreement about the total number of non-violent deaths in particular, and the authors never claim the latter.

I would suggest that, given the statistical insignificance of the change in non-violent deaths, one of the authors shouldn’t have later listed the breakdown of excess deaths by cause the way he did after Lancet 1, since apparently the error bars on each of those numbers was so huge as to make them totally uninformative. However, the published study didn’t make that claim.

Is it also not the case that in Lancet 1 they excluded the Fallujah cluster? Had they included it the confidence intervals for the violent deaths estimates would have been narrower, and the point estimate more like the point estimate from Lancet 2.

No, actually, if it’s included the confidence intervals become much larger and the point estimate’s well above Lancet 2. Lancet 2’s between the two estimates you get from Lancet 1, including vs. excluding Falluja.

As I said on another thread, that makes sense to me–Falluja did happen, and really should raise the death count, but the actual Falluja cluster data they got in Lancet 1 is insanely high and (as the authors themselves point out) may not accurately represent the entire 3% of Iraq’s population that cluster was supposed to cover.

Mind you, I have 1 class’ worth of stats knowledge, so whether it makes sense to me or not isn’t particularly important. I’m just saying it doesn’t seem like there’s an intuitively obvious discrepancy so huge we can accuse the authors of incompetence and/or fraud.

Mike H., confidence intervals are not “smoke grenades” or “kung fu” or any other form of mysticism. They are a basic statistical concept, taught at the advanced high school / college intro course level, that you actually need to understand if you are trying to interpret results from random samples. The fact that you don’t seem to is not evidence that you are a no-bullshit, sensible, down-home sort of guy. It just means that your attempt to cherry-pick hopelessly imprecise numbers from subsamples and say that they are implausible is straight-up propaganda.

On the other hand, the stuff on the discrepancy between the UNDP results and the Lancet ones is important. Somebody needs to unpack the exact survey questions and sampling methods here. Steve Sailer’s point at 33 is also an interesting one, although I don’t know how one would follow up on it. Perhaps detailed questions to the surveyors? The authors did try to address this point somewhat in their writeup. Also, the bias that surveying “safer neighborhoods” would introduce into the study is totally unclear — it would seem to bias things toward lower, not higher casualty counts? Finally, Lancet 1 in particular was done before Iraq lapsed into total civil war.

It’s sensible, from a non-warhawk source, and does raise many reasons to doubt the scale of the Lancet 2 results. They take no position on what if anything might have biased the study sample.

One thing for sure at this point: the initial (Lancet 1) estimate of 100,000 excess civilian deaths is looking not just plausible but actually pretty low. Even the IBC’s passive count based completely on press reports of violent deaths alone is approaching 50,000. And passive counts are almost guaranteed to be underestimates. I suppose a few years down the line the Lancet people will come out with an estimate of 1 million and we’ll all be arguing vociferously over whether it’s only 400,000.

There are all kinds of caveats to attach to this, but there is as yet no historical basis to claim a civilian death rate due to bombing higher than 0.2 deaths per sortie. Assuming American intelligence were accurate enough to kill four times as many insurgents as civilians (a big assumption), your estimate would still be off by a factor of 10.

“Philosopher, I’m not sure what you mean in regard to ” without something confidence intervals…..” Did you miss a word?” Yeah, sorry, that was supposed to be “something like confidence intervals”, since there are other mathematical ways of getting at significance than the intervals per se. (There’s already been discussion on this thread about p-values, for example.) But nothing in this particular conversation is turning on that.

mq is right about how completely fundamental the distinction between the point-estimate and the confidence intervals is. I’m sure some defenders out there are sloppy about this, too — like I said, it’s a very common misunderstanding that people without any stats training make all the time, regardless of political persuasion. But all the people who seem to know anything about stats are putting a lot of emphasis on the CI, which is as it should be. The point-estimate is of course part of that — the CI is an interval around the point-estimate — and it is also of course easier to report, talk about, and throw around, but it is only part of the story, and a meaningless part without some indication of statistical significance.

Sometimes an extreme hypothetical case can help illustrate matters. Suppose someone did a truly awful survey of Iraqi mortality, in which they only check on 50 people; and with those 50 people, there had been 1 pre-invasion death and 4 post-invasion deaths. Even for the lower estimates, this is an entirely plausible kind of result for such a small sample — small samples are more hostage to the luck of the draw than large samples, which is why you want large samples when you do this right. Now, if we don’t check at all for significance, then we have something like a 20 per 1,000 pre-invasion death rate and an 80 per 1,000 post-invasion death rate. Multiply that difference by the population of Iraq, and you get something like 1.6 million excess deaths. Such a huge point estimate! But I hope that I have set this up so it is clear why the size of the point-estimate is simply no indicator whatsoever of whether you’ve actually found something statistically significant. In this case, the sample is so small that the CI would be absurdly huge. Indeed, I suspect that if one quickly crunched the numbers, this wouldn’t even be a big enough difference, for samples so small, to indicate any statistically significant difference between pre- and post- at all, let alone one of such a magnitude. (It’d be pretty hard to guarantee representativeness in a sample that small, too, but that’s another matter.)

Moral of the story: the bigness of point-estimates or their differences is just no indicator whatsoever of any significance of the effect.
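The hypothetical above can be put through a quick two-proportion z-test. This is a normal approximation (with counts this small a Fisher exact test would be more appropriate), but it confirms the suspicion that the difference is not statistically significant:

```python
import math

# Two-proportion z-test for the toy survey above: 1 death among 50 people
# pre-invasion versus 4 among 50 post-invasion. Normal approximation; with
# counts this small an exact test would be preferable, but the qualitative
# conclusion is the same.
n = 50
p_pre, p_post = 1 / n, 4 / n
pooled = (1 + 4) / (2 * n)                       # pooled death proportion

se = math.sqrt(pooled * (1 - pooled) * (2 / n))  # standard error of the diff
z = (p_post - p_pre) / se
p_value = math.erfc(abs(z) / math.sqrt(2))       # two-sided p-value

print(round(z, 2))        # ~1.38
print(round(p_value, 2))  # ~0.17: not significant at the conventional 5% level
```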

* * * * *

I’ve just looked back at the earlier _Lancet_ paper, and if I didn’t miss anything (and, hey, it’s 3:30am where I’m at, so it might have slipped past me), then there’s something very telling in it, for the particular line of argument that Mike H. is running: _the authors never make any claim to statistical significance for an increase in nonviolent deaths_. On p. 1861, there is a specific claim about a “58-fold” increase in _violent_ deaths, as well as the increases in the _overall_ death rate — but I’m just not finding anything reported in there about specifically nonviolent deaths that has a CI attached to it. And it looks like Lancet II is the same in this regard. The closest they come to making any such claim is in the following from p. 5, but if you look closely you’ll note that no claims to significance are made there: “Excess mortality is attributed mainly to an increase in the violent death rate; however, an increase in the non-violent death rate was noted in the later part of the post-invasion period (2005–06). The post-invasion non-violent excess mortality rate was 0·7 per 1000 people per year (–1·2 to 3·0).” See? The reported interval spans zero, and on p.7 they explicitly say that this was not significant. (Did someone already say this part up-thread somewhere? My apologies for being redundant, if so.)

So, Lancet I makes no claims about nonviolent deaths increasing; and Lancet II likewise finds nothing of statistical interest to report about nonviolent deaths. So in this regard they are totally consonant with each other.

You have a reasonable point, Bruce, but air support seems to be used very differently in Iraq – a much smaller proportion of close air support missions use weapons, but they seem to be directed much more at killing insurgents at close quarters, rather than against dug-in positions. And most of this is taking place in or near urban areas, whereas that was true of only a small proportion of the attacks in the Gulf War, Kosovo and Afghanistan.

I concede that 10 is probably too high, but, given the differences, the value of 1 implied by your analysis (since the 1 to 4 ratio of clearly civilian to putative insurgent appears about right) is too low.

The data on missions vs strikes leads me to think misclassification is probably more important than I first thought. With 50 close air support missions per day by fixed-wing aircraft alone, it’s likely that there will be lots of cases where people are killed by artillery or even small arms but where it seems plausible to those on the ground to blame airstrikes.

I appreciate your efforts to explain, but even for us non-statisticians the concept of a confidence interval is pretty well understood.

When Kaplan talked about a “dartboard” with respect to Lancet 1, he was ridiculed by supporters of the study who rushed to explain why 8,000 was not as likely a figure as 98,000 for projected excess mortality in Iraq. The net must be literally bursting with examples of comments and posts from sites like CT where the fundamentals were trotted out for the benefit of people like Kaplan. Similarly, it’s not difficult to find quotes from Horton or any of the Hopkins authors in which the 98,000 (more commonly referenced as 100,000) is mentioned as the best estimate we have for excess mortality.

Two years on and when some observers compare the figures in Lancet 1 with Lancet 2, we’re told that we ignore the confidence interval at our peril. Well, okay, let’s not ignore it, but let’s at least be consistent with the emphasis.

the authors never make any claim to statistical significance for an increase in nonviolent deaths.

It really isn’t the argument of those raising questions that the authors make this claim. The fact remains, whilst non-violent deaths account for 40% of the 98,000 point estimate for Lancet 1, they’ve dipped below zero (in that the rate has gone negative from the pre-invasion baseline) for Lancet 2’s 112,000 (or 130,000 extrapolated). Over 40,000 non-violent deaths have been replaced by an additional 60-odd thousand violent deaths.

Provide 99 statistics-based reasons how such a discrepancy may have come about, but then please go on to explain to me why the authors are still within their rights to say the studies are mutually corroborative?

The only sense in which the studies can be said to be mutually corroborative on the specific question of the composition of excess deaths is that the confidence intervals in both overlap, which, as any fule no, is no corroboration at all.

This is hardly an unimportant point, given mutual corroboration is one of the strongest indicators that we can have faith in the results produced by similar studies. Which is probably why the authors bother to claim corroboration at all.

Regarding air operations in Iraq: Close air support sorties are those flown in direct combat support of troops. It’s in the nature of CAS that the aircraft will only fire (or indeed do anything rather than circling) if they get a call from the forward air controller with the troops, unless somebody has a Canadians-in-Afghanistan speed-freak brain storm, so the ratio of strikes to sorties will be low. I’d also point out that more of this will be going on in Iraq than Lebanon, as there are five and a half coalition divisions spread out over a wide area, more than twice the maximum number of troops the Israelis ever sent into Lebanon.

Also, the Israelis launched a quasi-strategic air offensive, which implies they picked a list of targets off the map and set out to bomb them – so, once discounted for the inevitable aircraft that go technical, find the target weathered in, fail to meet the refuelling tanker etc, you’d expect the two numbers to be similar.

Second, the figure of 40,000 claimed as the number of deaths recorded by the MoH in 2002 is false. No specific citation is offered by the Lancet authors for this figure other than a vague attribution to “informed sources in Iraq”. But official Iraqi figures for 2002, forwarded to IBC courtesy of the Los Angeles Times, show that the Ministry registered 84,025 deaths from all causes in that year. This excluded deaths in the Kurdish-administered regions, which contain 12% or more of the population.

Thus, the actual MoH figure for 2002, even while excluding Kurdistan, stands at 70% of the estimate of 120,000 that, per the Lancet authors, “should have been recorded” nation-wide in 2002. It may (or may not, given its post-2004 casualty monitoring system) be true that the “ministry’s numbers are not likely to be more complete or accurate today”.

And by the way, just in case anyone needs convincing that IBC are not ‘warhawks’, as an earlier poster put it, try reading the last paragraph of their response to Lancet 2:

On 9/11 3,000 people were violently killed in attacks on the USA. Those events etched themselves into the soul of every American, and reverberated around the world. In December 2005 President George Bush acknowledged 30,000 known Iraqi violent deaths in a country one tenth the size of the USA. That is already a death toll 100 times greater in its impact on the Iraqi nation than 9/11 was on the USA. That there are more deaths that have not yet come to light is certain, but if a change in policy is needed, the catastrophic roll-call of the already known dead is more than ample justification for that change.

You can believe 655,000 Iraqis have died.
But only if you believe it was paradise under Saddam.

The Lancet this week published a report claiming that 655,000 people have been killed in Iraq.
It is based on flawed research that would shame any publication, except one that had previously been responsible for giving Andrew Wakefield’s MMR “research” the oxygen of publicity.

The research is based on the pre-war death rates under Saddam Hussein’s regime before the 2003 invasion, assessing those afterwards, and labelling the difference as war-related deaths.

Unfortunately the researchers have made two huge errors.
Their figures only stand up if adult Iraqis living under the Ba’athist regime hardly ever died, either of natural or violent causes.

Secondly they included the data to undermine their conclusions in their own report.

The researchers claim the death rate in pre-war Iraq was 5.5 per thousand, and backed up this claim, in a country of 25 million people, with an analysis of 82 deaths. That’s a death rate half that of the EU, but perfectly possible in a “young” country with a high birth rate, low median age and good life expectancy.

Unbelievably the same research claims infant mortality was 10%.

5.5 deaths per thousand means that 137,500 Iraqis died each year.
10% infant mortality in a country with 34 births per thousand and 25 million people means that 85,000 of these deaths were under one year old.

Leaving a death rate for Iraqis over one year old of 2.1 per thousand.

Less than half that of the lowest country in the world.

Paradise.
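The comment’s arithmetic is easy to reproduce. A quick sketch, using the commenter’s own figures (25 million population, 5.5/1000 death rate, 34/1000 birth rate, 10% infant mortality) – these are the comment’s assumptions, not verified demographic data:

```python
# Reproducing the comment's arithmetic.  All rates are the comment's
# assumptions about pre-war Iraq, not verified demographic data.
population = 25_000_000
crude_death_rate = 5.5 / 1000        # claimed pre-war deaths per person per year
birth_rate = 34 / 1000               # claimed births per person per year
infant_mortality = 0.10              # claimed share of infants dying before age one

total_deaths = population * crude_death_rate                 # 137,500 per year
infant_deaths = population * birth_rate * infant_mortality   # 85,000 per year
over_one_deaths = total_deaths - infant_deaths               # 52,500 per year
over_one_rate = over_one_deaths / population * 1000          # ~2.1 per thousand

print(round(total_deaths), round(infant_deaths), round(over_one_rate, 1))
```

The numbers do follow from the stated assumptions; whether those assumptions are fair to the study is exactly what the later comments dispute.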

And on the subject of violent deaths pre invasion?


They drew the following conclusion: “As there were few violent deaths in the survey population prior to the invasion, all violent deaths can be considered ‘violent excess deaths’.”
82 deaths analysed.

Read that again. No violent deaths under Saddam.
Not even domestic violence. Prison works.

This may tell you something about the motivation and rigour of the analysis.

Or a desire for headlines.

The World Health Organisation gives a figure of 9.03 deaths per thousand in Iraq pre invasion.

A figure which, put through the Lancet prism, would have reduced its headline death toll by hundreds of thousands.

Don Johnson, 71: That point is a reasonable one, and some of the statistics and sampling questions are reasonable too, but all reasonable points on these questions are being swamped, and that was inevitable. And the statistical arguing makes swamping easier, not harder.

That the Lancet attempt at precision was, in the context of this particular debate, doomed. (If they cut a few corners to make “the best case”, it was even more doomed, of course.) Any report they made, whatever it said, would have received enough flak that it wouldn’t be regarded as credible by the media. And while I fully support the political purpose of the study and think that their release date was perfectly timed, I still think that their effort was futile.

The population able to understand statistics and willing to have their mind changed by a careful statistical argument on a highly contentious issue is small, and there’s so much junk statistics out on various issues that the good statistics are automatically devalued (just because so few people really understand stats, and also because so many don’t trust anyone at all).

My opposition to the war is mostly based on its fraudulent justifications, strategic stupidity, and failure, rather than on the brutality or not of the American troops. Based on what you’ve said, I’d expect to see lots of war supporters say that we need to “take the gloves off”, since we’re killing too few Iraqis.

It really isn’t the argument of those raising questions that the authors make this claim. The fact remains, whilst non-violent deaths account for 40% of the 98,000 point estimate for Lancet 1, they’ve dipped below zero (in that the rate has gone negative from the pre-invasion baseline) for Lancet 2’s 112,000 (or 130,000 extrapolated). Over 40,000 non-violent deaths have been replaced by an additional 60-odd thousand violent deaths.

I think the problem is that you can’t meaningfully do arithmetic with point estimates.

I think the problem is that you can’t meaningfully do arithmetic with point estimates.

Except, apparently, when you’re looking to use one point estimate in one study to corroborate another in another.

Are you saying that there is no statistical merit in the claims by the authors that the studies are mutually corroborative because the point estimates are in the same ballpark? If so, I’ll settle for that.

The second it becomes legitimate to do the like-for-like comparison of point estimates and use this as corroboration, we surely are entitled to point out the glaring contradictions in the constitution of each?

Apple pies and steak pies are both pies, but I’d only put custard on the former.

You’re right, John Emerson, I’d expect a lot of rightwingers to want to “take the gloves off”. A number have already expressed that sentiment. Leaving aside the pro-genocide forces, the one sort of encouraging thing I’ve seen in this debate is that even some rightwingers (such as the editors at the Washington Times) think the death toll is at least 200,000. A week ago I would have thought it impossible to get anyone outside the far left to believe a number that high.

I’m a little perturbed at the near-invisibility of the issue I raised regarding the UNDP/ICLS confidence interval and the corresponding confidence interval in Table 3 of the latest Lancet paper. Only mq in response 82 seems to think it matters. If this latest Lancet paper is plausible, then some sort of explanation for the apparent discrepancy is needed.

103: Are you saying that there is no statistical merit in the claims by the authors that the studies are mutually corroborative because the point estimates are in the same ballpark? If so, I’ll settle for that.

I’m not competent to judge the study’s authors in that way, but it could be that their statement was incautious, yes.

It’s still the case that researchers have (twice now) been to Iraq and asked some of the people there how many in their households had been killed since the invasion. And they came back with some numbers. The problem we are all faced with is finding an explanation for those numbers. The choices here are, essentially:

– The death rate from violence is significantly higher than passive reporting suggests
– The respondents were lying
– The researchers committed a major error in their sampling methodology

And it’s possible that we could combine these explanations in some way.

There is no ‘jury out’ situation here. A criticism that doesn’t amount to more than a criticism of some of the rhetoric of the presentation does little to get you closer to understanding why the numbers are as they are; nor, crucially, does it make the numbers go away. In my view, without a plausible explanation for these numbers, concern is the appropriate response and hand washing should wait.

Let us get down to limiting cases. Baghdad makes up something over 20 per cent of the population, and 40-50 dead shot people are found daily. If the same pattern is national, that would be 160-200 sectarian war victims a day, before the usual 10-20 victims of dramatic guerrilla violence. Three years and seven months is 1,291 days, which at those rates gives 251,745 deaths – and that’s without any allowance for coalition activity or mass-casualty bombings, or for places like Fallujah, Ramadi or Diyala province.

And the pattern is national. Very much so. If you tot up Baghdad, Basra, Mosul, Kirkuk, Kerbala and Najaf out of the top 10 cities, all of which have severe violence, that gives you more than half the population – and we still haven’t included Samarra, Baqubah, Fallujah, Ramadi or Diyala.
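Alex’s limiting-case extrapolation above can be sketched directly; the daily rates are his rough estimates, not official counts:

```python
# Alex's limiting-case extrapolation; daily figures are his rough
# estimates, not official counts.
days = 1291                                 # roughly 3 years 7 months of war
sectarian = (160, 200)                      # Baghdad's 40-50/day scaled nationally
guerrilla = (10, 20)                        # "usual" daily guerrilla-violence victims

low = (sectarian[0] + guerrilla[0]) * days
high = (sectarian[1] + guerrilla[1]) * days
midpoint = (low + high) // 2                # 251,745, matching the comment

print(low, high, midpoint)
```

Even the low end of this crude range sits well above the passively reported totals, which is the point of the limiting case.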

There is no ‘jury out’ situation here. A criticism that doesn’t amount to more than a criticism of some of the rhetoric of the presentation does little to get you closer to understanding why the numbers are as they are; nor, crucially, does it make the numbers go away. In my view, without a plausible explanation for these numbers, concern is the appropriate response and hand washing should wait.

It’s a little deeper than that, Charlie. One of the strongest indicators we could have that Hopkins has things about right is if Lancet 2 validates and corroborates Lancet 1. I suggest that’s why the authors have claimed this to be the case.

You say it could be that such claims are “incautious” yet don’t believe you are qualified to judge. Well, if the basis for claiming corroboration was a mystery beyond the comprehension of all but statistical sciences PhDs, then you might have a point, but the basis for such a claim is a 98,000 point estimate in Lancet 1 and 112,000 in Lancet 2 for the same period. They’re in the same ballpark, aren’t they? The point is that the composition of each figure is entirely contradictory, invalidating any claim of mutual corroboration. Over 40% of deaths from one cause in Lancet 1 have been replaced by the only other possible cause in Lancet 2.

The importance of non-corroboration cannot be overstated, as it immediately invites the conclusion that both cannot be right, which in turn means one must be wrong.

The point is that the composition of each figure is entirely contradictory, invalidating any claim of mutual corroboration.

This is wrong. If, in sample A, mortality has increased and the main cause is traffic accidents, while in sample B mortality has increased and the main cause is heart disease, the conclusion that mortality has increased is indeed reinforced. As to the main cause in the population as a whole, “further research is required” – a conclusion which appears in all too many studies.
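This can be illustrated numerically: with small cause-specific subsamples, the intervals around the breakdowns are far wider than those around the totals, so agreement on the total can survive apparent disagreement on composition. A toy sketch – the standard errors here are invented for illustration, not taken from either study:

```python
# Toy illustration (all standard errors invented): two surveys can agree
# on total excess mortality while their cause-of-death splits look very
# different, because the cause-specific confidence intervals are much wider.

def ci(estimate, se, z=1.96):
    """Normal-approximation 95% confidence interval."""
    return (estimate - z * se, estimate + z * se)

def overlap(a, b):
    """True if intervals a and b share any points."""
    return a[0] <= b[1] and b[0] <= a[1]

# Totals: relatively tight, clearly compatible.
total_1 = ci(98_000, se=24_000)    # Lancet-1-like total (se invented)
total_2 = ci(112_000, se=22_000)   # Lancet-2-like total, same period (se invented)

# Cause breakdowns: small subsamples, huge standard errors (invented).
violent_1 = ci(57_000, se=30_000)
violent_2 = ci(112_000, se=40_000)

print(overlap(total_1, total_2))     # True: totals corroborate
print(overlap(violent_1, violent_2)) # True: breakdowns don't contradict either
```

On numbers like these, neither the totals nor the cause splits are statistically incompatible; the splits are simply too imprecise to settle anything either way.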

The point of contradiction is not that Lancet 1 said the greater number of deaths was 30-40 year old shoemakers from Baghdad, whereas Lancet 2 claims it is 20-something plumbers from Mosul. The deaths have one of two causes: violence or non-violence. Even allowing for double weight of misattribution, to go from 40%+ non-violent deaths in Lancet 1 to sub-zero in Lancet 2 takes some doing if all bias has been avoided.

Moreover, that mortality has increased is not in dispute. If the authors used the findings of Lancet 2 and Lancet 1 to claim mutual corroboration for an increase in mortality and nothing more, you’ll get little argument from me or anyone else, I suspect. But they specifically went further than this. They highlighted a point estimate of 112,000 deaths for the first 18 months of the conflict and compared this with the 98,000 estimate from Lancet 1. They are explicitly claiming mutual corroboration not just for a rise in mortality, but in the estimation of that rise.

Given that the causes of death number two (violent and non-violent) and not 22, the composition of both estimates is significant, and where they are found to be contradictory – as they are – it is evidence against mutual corroboration on the specific question of point estimates.

If the authors want to row back on the corroboration claims about the point estimates for the first 18 months of the conflict, then I’ll accept that the composition of those point estimates becomes less significant.

107: The importance of non-corroboration cannot be overstated, as it immediately invites the conclusion that both cannot be right, which in turn means one must be wrong. You can fill in the rest.

The corollary of ‘the studies don’t explicitly support each other’ is not ‘the studies disagree’. The degree of confidence you’d need to claim contradiction is unavailable, as we’ve discussed.

So while logical opposition between the studies, if it existed, would be good for your argumentative drive, I think the best you can claim is that one or more of the studies is incomplete. So more data is needed.

John: “The best source turns out to be the US Air Force Command itself.”

Yes, but I think you need to pay less attention to what they say about “airstrikes,” and more attention to what they say about tons of bombs dropped.

As far as I can tell, the above analysis is largely focused on the number of airstrikes/sorties, and relies on statistics released by the US. Trouble is, guns don’t kill people; bullets kill people. Airstrikes don’t kill people; bombs kill people. (I realize airstrikes deliver munitions aside from bombs, such as bullets, but I’m more concerned about the bombs.)

While the US does release certain vague figures regarding “airstrikes” (a term which I haven’t seen defined very clearly anywhere), it seems very, very reluctant to tell us the total tonnage of bombs dropped on Iraq.

Information can be found in daily “CENTAF airpower” summaries such as this, but I think the information is purposely vague, and never reveals tonnage, as far as I can tell (we see terminology such as “airstrikes,” “missions” and “sorties”). John cites Jamail’s article which takes a close look at those summaries. By the way, I’m puzzled by this statement of Jamail’s:

Carrier-based Navy and Marine aircraft flew over 21,000 hours of missions and dropped over 26 tons of ordnance in Fallujah alone during the November 2004 siege of that city.

I can’t figure out where he got that from (“26 tons”). I also think the idea that Fallujah received only 26 tons “during the November 2004 siege of that city” is dramatically understated.

Anyway, as far as I can tell, the US has released such information (tonnage dropped on Iraq) exactly once, here:

So far for OIF II, 3rd MAW has dropped more than 500,000 tons of ordnance … but this number is likely to be much higher by the end of operations

I believe this statement has been used as the basis for reporting by Hersh, here:

Since the beginning of the war, the press release said, the 3rd Marine Aircraft Wing alone had dropped more than five hundred thousand tons of ordnance

(John, you point to Solomon, who points to Hersh, who points to the US press release. However, Solomon and Hersh leave out URLs, which I’ve provided.)

As far as I can tell, the Marines press release has otherwise been overlooked, in general, even though I think it is credible and unintentionally revealing. When that statement is analyzed (please follow the links cited above), I think it provides strong support for the hypothesis that any analysis based on number of “airstrikes” dramatically understates the situation.

The cited quote indicates that the 3rd MAW was dropping bombs at a rate of 2,000 tons per day.

If the 200 aircraft of the 3rd MAW fly to Fallujah one morning and drop, in aggregate, 2,000 tons of explosive, is that what the US calls one “airstrike?” I think the data shows that the answer is probably yes. For obvious reasons, this is a big problem, and I think it’s a problem that’s being completely overlooked.

Consider that this one hypothetical “airstrike” has delivered about 10 pounds of explosive for each person in the town. Therefore this one “airstrike,” if perfectly efficient, could kill 350,000 people.

By the way, it’s been reported that 20% of the buildings in Fallujah have been totally destroyed. The coverage area of the 3rd MAW includes Fallujah.

Detail supporting the above can be found by following the links cited in #64. This includes a reasonability check showing that the 3rd MAW is indeed capable of dropping 2,000 tons per day. I think when a lot of people see Hersh talking about 500,000 tons, they figure it must be a typo. I think there’s very good reason to understand that it’s not. By the way, a reasonability check based on cost per bomb (and comparing this to total expenses for the war) also checks out nicely.
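The rates claimed above can be sanity-checked in a few lines; all the inputs (500,000 tons, roughly nine months in country, a Fallujah population of about 350,000) are the comment’s assumptions rather than confirmed military or census figures:

```python
# Sanity check on the tonnage claims above.  All inputs are the comment's
# assumptions (500,000 tons, ~9 months in country, Fallujah population
# ~350,000), not confirmed military or census data.
tons_dropped = 500_000
days_in_country = 9 * 30                         # roughly Feb-Nov 2004
tons_per_day = tons_dropped / days_in_country    # ~1,850, i.e. "about 2,000"

pounds_in_a_day = 2_000 * 2_000                  # one day's 2,000 tons, in pounds
fallujah_pop = 350_000
pounds_per_person = pounds_in_a_day / fallujah_pop   # ~11 lb per person

print(round(tons_per_day), round(pounds_per_person, 1))
```

So the “2,000 tons per day” and “about 10 pounds per person” figures do follow arithmetically from the press release as read; whether the 500,000-ton figure itself is credible is the real question.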

“An average of 10 fatalities for each air strike seems plausible”

I think a plausible range of per-airstrike fatalities could be 0-300,000. I think the US is deliberately using a term that is utterly vague and therefore meaningless. That’s why I think we need to pay attention to tonnage, not “airstrikes.”

#76: “the problem is that we don’t have precise data on US strikes”

Exactly. That’s why we need to look beyond counting “airstrikes.”

By the way, Hersh makes sort of a factual error that has the effect of understating his case. He says this:

Since the beginning of the war, the press release said, the 3rd Marine Aircraft Wing alone had dropped more than five hundred thousand tons of ordnance

Strictly speaking, that is true. However, it implies that 3rd MAW started dropping bombs at “the beginning of the war.” They didn’t. They weren’t in Iraq until 2/04.

In other words, someone reading Hersh could logically conclude that 3rd MAW dropped 500,000 tons in a period of 20 months. That’s incorrect. The actual period was 9 months.

John, re your 86, you may be interested in the matching stats for weapons dropped, which is a more useful stat than “targets” or “strikes” for this analysis. (A target can be prosecuted by one plane or many, instantaneously or over time, and may be a topographical point or area; ordnance drops by comparison are relatively discrete.) With the increasing use of precision ordnance, and the wide variety of bombing platforms (from strike aircraft to B-52s) it’s also a more useful stat than tonnage.

The increase is largely due to the increase in precision aiming, which makes every strike more deadly. But it’s hard to see the Air Force in Iraq achieving mean deaths-per-weapon numbers that are so very far off the curve.

Alex wrote: “Let us get down to limiting cases. Baghdad makes up something over 20 per cent of the population, and 40-50 dead shot people are found daily. If the same pattern is national, that would be 160-200 sectarian war victims a day”

1. Has that been the situation in Baghdad every day since the start?
2. It’s not national. Large areas of the Shia south and Kurdish north are relatively calm.

On the specific question of composition of point estimates, the studies are indeed contradictory. I agree this may or may not be significant. As ever, context is everything. So here is the context direct from the pages of Lancet 2:

Since the 2006 survey included the period of time contained in the 2004 survey, we could compare these two results for the time frame from January 2002 through August 2004. In 2004 we estimated that somewhere in excess of 100,000 deaths had occurred from the time of the invasion until August 2004. Using data from the 2006 survey to look at the time included in the 2004 survey, we estimate that the number of excess deaths during that time were about 112,000.

That these two surveys were carried out in different locations and two years apart from each other yet yielded results that were very similar to each other, is strong validation of both surveys.

This is an explicit claim of mutual corroboration based on similarity of point estimates that is simply unsustainable given the contradictory composition of each. I’m not sure what data you feel we are lacking, given that the cause of death is a binary “violent” or “non-violent” and both studies produce figures for each. Rather, I can imagine some ways in which cause could be misattributed, based on how each family defined ‘cause’, but not to the degree of difference evident in the reports.

“Lancet 2 validates Lancet 1” is an extremely powerful message. To make such a claim absent any justification is a serious enough matter even without considering what this means about the integrity of the data in either Lancet 1, Lancet 2 or both.

Remove whatever level of “rhetorical punchlining” you feel I’ve added, but you’re still left with the authors’ claim of mutual corroboration that borders on the mendacious.

They are explicitly claiming mutual corroboration not just for a rise in mortality, but in the estimation of that rise.

I don’t see what you are getting at here. On my reading, they are saying that both studies show increased mortality and the magnitude of the increase is about the same. Putting it another way, if the national figures corresponded exactly to the point estimates in the 2004 study, it wouldn’t be at all surprising to get a sample like the 2006 sample. And vice versa.

To anyone who says we shouldn’t pay too much attention to point estimates, I nod in agreement. Daniel Davies made that point in his CIF post and he also said that about the 2004 study way back then. As he says, the right way to look at these things is to ask: if the mortality situation really wasn’t all that bad, what would be the chances of getting a sample like this?

I see Mike H complaining that defenders of the first study pushed the point estimates too aggressively. For my part I plead not guilty, but I concede that some people did. When Les Roberts got away from the Lancet referees he got a bit excited at times. That’s understandable but so is the fact that IBC are annoyed with him.

The UN estimates that around 35,000 homes and businesses were destroyed in Lebanon during the war from all causes (air, artillery). Other estimates I’ve seen speak of 15,000 homes destroyed in South Lebanon. The Lebanese death toll is estimated at 1200.

This strongly suggests that very few of these destroyed buildings were occupied at the time they were hit.

On my reading, they are saying that both studies show increased mortality and the magnitude of the increase is about the same.

My issue is with the “magnitude” component. The magnitude is composed one way in Lancet 1 (>40% of 98,000 deaths have a non-violent cause) and another in Lancet 2 (all 112,000 deaths from violence).

Where is the legitimacy of a claim that “the magnitude of the increase is about the same” if the respective compositions of the increases are contradictory? Again, if respondents were having to perm one from 50 when attributing cause, differences in composition would lose significance. Where the choice is “violent” or “non-violent”, nothing explains such a swing in attribution from study to study absent bias of some sort.

At the very least, such differences disallow claims of mutual corroboration, if they don’t threaten statistical validity directly.

I wonder if the change in the proportion of casualties between the two studies can be explained by changing perceptions of the war and its effects. Iraqis who were glad of the war/occupation after 18 months were willing, either honestly or dishonestly, to claim their losses were not due to the war; but a couple of years later, with the situation having deteriorated or at least not improved in many places, they blame the death of a relative a few years ago on the occupiers, either directly or indirectly. Of course, to some extent this depends on whether the cause of death was asked as an interview question, or derived from (or at least confirmed by) death certificates.

I wonder also about the death certificates said to entitle people to benefits of some sort. In a largely broken economy, if any meaningful monetary gain can be had from death certificates, it’s not unreasonable to wonder whether there is fraud going on, in which case some of the results of the survey might be open to question. If in 2004 you could get benefits for any cause of death, but in 2006 you get more cash if your loved ones have been killed in violence, that might explain why the larger proportion of deaths is attributed to violence.

While it wouldn’t surprise me to find the overall scale of deaths in the war to be of the level predicted by this study, there does seem to be a possibility of systematic fraud going on, at least for someone uninformed of the way in which death certificates are issued, certified, checked, and what benefits might be gained from them.

The UN estimates that around 35,000 homes and businesses were destroyed in Lebanon during the war from all causes (air, artillery). Other estimates I’ve seen speak of 15,000 homes destroyed in South Lebanon. The Lebanese death toll is estimated at 1200.

This strongly suggests that very few of these destroyed buildings were occupied at the time they were hit.

You know, I wish I’d been told at the time that the Israelis were being so discriminating and humane. It wasn’t the impression that I got from following anti-war websites and newspapers, but there you go.

Where is the legitimacy of a claim that “the magnitude of the increase is about the same” if the respective compositions of the increases are contradictory?

Because…the magnitude of the increase is about the same?

You keep saying that if the two studies’ excess death estimates have very different breakdowns by cause, their agreement in overall magnitude is automatically no longer significant. Can you say why? Not in terms of intuition but with an actual statistical argument?

Second, the figure of 40,000 claimed as the number of deaths recorded by the MoH in 2002 is false. No specific citation is offered by the Lancet authors for this figure other than a vague attribution to “informed sources in Iraq”. But official Iraqi figures for 2002, forwarded to IBC courtesy of the Los Angeles Times, show that the Ministry registered 84,025 deaths from all causes in that year. This excluded deaths in the Kurdish-administered regions, which contain 12% or more of the population.

Now that’s a pretty significant disagreement. I wonder if we can get more detail on either the Lancet authors’ “informed sources” or the IBC’s “official figures,” and whether or not both were based off counting death certificates?

Over a thousand people were killed in Lebanon by the airstrikes in 4 weeks – that’s those who died on the spot.

What the excess mortality resulting from the event will be – considering all the injuries, a million refugees, hundreds of thousands of cluster bombs lying around, etc. – who knows. Might be 5 thousand, or might be 10 thousand. Probably not that many though, because there is a functioning society in Lebanon.

“I appreciate your efforts to explain, but even for we non-statisticians the concept of a confidence interval is pretty well understood.” I’m afraid that this claim is patently falsified by your own discussion on this thread! Or, if you get the general concept of it, you seem unable to apply it to the particular case at hand.

“When Kaplan talked about a “dartboard” with respect to Lancet 1, he was ridiculed by supporters of the study who rushed to explain why 8,000 was not as likely a figure as 98,000 for projected excess mortality in Iraq.” But that is exactly something that _does_ follow from understanding the CI! In a study like this, not every value in the CI is equally likely, and indeed based on the statistics the point-estimate is the most likely value. 98,000 _is_ a more likely result than 8,000; an appropriate comparison to the 8,000 would have been the other extreme end of the interval, 194,000. They got statistically significant results in Lancet I, and those results put that 98,000 figure in the middle of the CI, and there’s nothing wrong with their reporting that. And they got statistically significant results in Lancet II, and for the corresponding period of time they had a CI of 69,000 – 155,000, with the peak at 112,000. One could do some more statistics on this to see just how likely either result is given the other, but that the latter study’s CI for this value is completely contained within the former’s is, indeed, rather corroborative.

“My issue is with the “magnitude” component. The magnitude is composed one way in Lancet 1 (>40% of 98,000 deaths have a non-violent cause) and another in Lancet 2 (all 112,000 deaths from violence).” This kind of line that Brownie and Mike H. are running only makes sense if that claim about the magnitudes was itself something that the statistics showed. And, as I argued in my earlier comment, it wasn’t, and the authors never claimed that it was. Their data do not support putting any particular weight on _those_ point-estimates, because the confidence intervals around them are too large – they are like my hypothetical case with the 50 Iraqis. They are, given the data available, statistically unsupported claims. They have statistical significance for the overall mortality claims; and for the claim that there was an increase in the number of deaths attributable to violence; but not for the part of the data that you want to put so much weight upon. And this is the case with both studies, in that they found nothing statistically significant with regard to the nonviolent deaths.

Basically, everything that Lancet I found with significance, Lancet II did as well, and in a way that corroborated Lancet I; and in this particular area where Lancet I found nothing of statistical significance, so too did Lancet II not find anything of statistical significance. That’s pretty darn good.

Anton writes to brownie: “You keep saying that if the two studies’ excess death estimates have very different breakdowns by cause, their agreement in overall magnitude is automatically no longer significant. Can you say why? Not in terms of intuition but with an actual statistical argument?” I would say that, if the two studies had found statistically significant claims to make about the breakdowns by cause, and those claims were substantially divergent from each other, then Brownie, Mike H. et al. would have a point — there would be some claims in the two studies that somewhat disconfirmed each other. I don’t think that it would be the end of the story, but it would be the case that the authors could not so easily claim that their results support each other. However, because the “if” part of Anton’s question is not supported by the statistics, there is not actually such a worry here. (Which is not to say that further analysis and/or more data might not generate such a worry.)

Brownie and others: Lancet 2 does corroborate Lancet 1. This is based on estimates of the *total number of deaths* in periods where the two studies are both close to each other and precisely estimated. Of course you are going to be able to find a subsample which is imprecisely estimated where the point estimates from the two studies are far apart, but the variance is so high that the difference doesn’t mean anything. Again, confidence intervals *are not optional*, you need to *actually understand them* and not just pay lip service if you want to understand sampling results. I’m sorry that someone once criticized you for taking the lowest level of the Lancet 1 confidence interval as the most likely real-world result, but that has no bearing here.

Tim at 93: your comment is inaccurate enough that one suspects political motivation. There is no estimate of infant mortality in the Lancet studies; their sample is not large enough to draw conclusions about infants alone. You cannot subtract infant mortality rates from some other study from the overall Lancet mortality rate, because the two are produced using completely different methods. This is especially likely to be true for infant mortality, by the way. Also, as people have said over and over again here, the Lancet study methodology is based on comparing the difference in pre- and post-invasion mortality from *answers on the very same survey*, which is the right way to do it: errors from the study methodology should cancel out in the differencing. Fishing around in old UN stats to cherry-pick a number that disagrees with the Lancet figures is just motivated by a desire to smear the study. It is also true (confidence intervals again) that the Lancet estimate of pre-invasion mortality is easily compatible with mortality rates above 6; indeed, the top of the CI is over 7.
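A toy sketch of why a constant reporting bias drops out when the same survey measures both periods: a fixed multiplicative undercount cancels in the pre/post ratio (though note it would not cancel in the absolute difference). All numbers here are invented:

```python
# Illustrative rates only (deaths per 1,000 per year); the bias factor is invented
pre_true, post_true = 5.5, 13.0
bias = 0.8  # suppose the survey captures only 80% of deaths in BOTH periods

pre_obs, post_obs = pre_true * bias, post_true * bias

# The observed levels are biased low, but the relative rise is untouched:
assert abs(post_obs / pre_obs - post_true / pre_true) < 1e-9
print(post_obs / pre_obs)  # identical to the true ratio
```

This is only the multiplicative case; the general claim that “errors cancel in the differencing” depends on the bias affecting both periods alike.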

I wish someone unbiased would actually comment on the IBC press release, instead of having to wade through ignorant right-wing propaganda all the time. But here’s a quote from the blog “Healing Iraq” that I got from Steve Sailer’s web site. It certainly gives some sense of how bloody things actually are in the country.

“I have personally witnessed dozens of people killed in my neighbourhood over the last few months (15 people in the nearby vicinity of our house alone, over 4 months), and virtually none of them were mentioned in any media report while I was there. And that was in Baghdad where there is the highest density of journalists and media agencies. Don’t you think this is a common situation all over the country?”

It’s obvious enough that John Emerson is right: the study details may be distracting us from the simple fact that things are very bad. This is a humanitarian disaster; even if the current Lancet numbers are off, excess deaths are probably well into the six figures.

Second sentence should read: “This is based on estimates of the total number of deaths in periods where the two studies overlap in time, and figures are both close to each other and precisely estimated”.

114: On the specific question of composition of point estimates, the studies are indeed contradictory.

But I’m really not sure that the point estimates have a composition in the arithmetical sense; i.e. that you can take the percentages from the sample and apply those same percentages to the estimate. I think your argument would be better if it were framed around the sample data. Are there questions to be asked of Lancet 1? Maybe the Fallujah issue (in or out; either looks odd) or maybe the infant mortality issue (many of the reported non-violent deaths were neo-natal).

That these two surveys were carried out in different locations and two years apart, yet yielded very similar results, is strong validation of both surveys.

Well, as I said, this may have been an incautious claim, but you really shouldn’t look to the likes of me for validation or otherwise of the statistical methods. My comment is on the language attached to the numbers; a somewhat different thing.

And you’re arguing with someone who accepts the general finding – that there have been significant excess deaths in Iraq since 2003, and that the rate may still be rising. Realistically, I don’t see what other situation would result in Iraqi households reporting deaths in those numbers.

‘Borders on the mendacious’ is pretty strong. Do you suspect the authors of the studies to be acting in bad faith? To the extent that they would fabricate the data?

mq #127: ‘There is no estimate of infant mortality in the Lancet studies, their sample is not large enough to make conclusions about infants alone. You cannot compare subtract infant mortality rates from some other study to the overall Lancet mortality rate because they are done using completely different methods. This is especially likely to be true for infant mortality, by the way.’

I hope you have read Appendix E of the study’s companion document. It’s true that the mortality figures they give there only go up to 1998 on the year axis, but nothing they say in that Appendix would lead you to think that they thought the situation had improved markedly by 2002.

Since the same methodology was used to estimate deaths in Darfur and the Congo, I wonder if the Lancet critics want to go through them as well. The recent Lancet article on the DR Congo claimed that there were 1250 excess deaths per day. The method used was the same spatial sampling survey technique. The estimates of mortality in Darfur also rely on similar sampling techniques.

Now, I wonder why nobody is going through those Darfur and Congo reports with a fine-toothed comb. The sudden interest in statistical method in Iraq somehow hasn’t been extended to the whole system of estimating mortality that has driven the “moral interventionists” in the first place. They are whacking away at the very source of the last bit of their own shaky moral legitimacy.

David Kane, every time you raise your head in these forums it is to say that you have “asked for the data” and it wasn’t forthcoming, with the attendant insinuation of fraud or bloodymindedness. Anyone who has worked with these large survey samples knows that the organisations which collect them don’t just give up the data on a whim to anyone who comes along wanting to “replicate it”. They have serious ethics requirements which prevent them from doing that easily. You like to imply you are familiar with statistical analysis and data auditing, in which case you know this is true for almost any survey of this type, and the way you present their reluctance to share the data as evidence of suspicious dealing is quite disingenuous.

1) I haven’t commented on this thread.

2) I don’t know if the only choices are “fraud or bloodymindedness” but I get suspicious when scientists do not allow outsiders to replicate their work. Some scientists are better about this than others. If you can’t see the raw data and the peer reviewers (as far as you know) did not see the raw data, then what is your trust based on?

3) It is true that data often has confidentiality restrictions. The Lancet authors mentioned this with regard to the first paper, but then dropped all communication. They refused to reply to my (very polite) e-mails. I was willing to sign whatever restrictions they had. I did not need to see any data that would put any individual in danger. But they just left the conversation. Now, I am just another idiot fellow at Harvard, so perhaps they were correct to tell me, in essence, to STFU, but call me crazy for being suspicious of that behavior, now being repeated with the latest paper.

4) It is standard for academics to share data with one another, especially after the first paper has been published and everyone agrees to sign the appropriate restrictions. The refusal of the Lancet authors to share was, in my experience, unusual.

I hate having to play catch up in a busy thread, but sleep and everyday life make that inevitable.

To get back up to speed, I’ll have to be brief to some commenters who have directed posts to me. I’ll split it up between posts, in the interest of coherence.

Anyway, Brownie, I’d be remiss if I didn’t acknowledge you’re doing an excellent job pressing home my argument. Far more eloquently and concisely than I have, I might add.

Philosopher, MQ:

As Brownie notes in #89, we statistician neophytes have developed some understanding of the role and importance of CIs in these types of surveys. But you’re still missing my point. There is a tendency for supporters of the study (particularly the first Lancet) to downplay or play up point estimates and CIs interchangeably, whichever a particular discussion calls for at the time.

As Kevin has noted with his comment about Roberts, it isn’t only non-stats types who have been guilty of this. Horton, the editor-in-chief of the Lancet, was also guilty of citing the 100,000 figure as gold (although Roberts, to his credit, later corrected him). I’ve personally encountered many knowledgeable defenders of the study defending point estimates, not just on the total excess death toll, but on many subsets below the two main subsets of violent and non-violent.

“It just means that your attempt to cherry-pick hopelessly imprecise numbers from subsamples and say that they are implausible is straight-up propaganda.”

MQ, I’m working exclusively with the two largest subsamples, violent and non-violent, and if these numbers are “hopelessly imprecise,” then what does that say about the total point estimate? But thanks for agreeing with me.

“So, Lancet I makes no claims about nonviolent deaths increasing; and Lancet II likewise finds nothing of statistical interest to report about nonviolent deaths. So in this regard they are totally consonant with each other.”

Let me see if I have this right, Philosopher. Lancet 1 “makes no claims about non-violent deaths increasing,” but adds 40,000 excess non-violent deaths to its excess death point estimate, without which the study could not achieve its headline-grabbing 100,000 figure. These 40,000 excess non-violent deaths are only statistically relevant because they can piggy-back onto the excess violent ones, even though the 40,000, by themselves, represent a 22% increase over the baseline mortality estimate for the 18-month period.

From a non-statistician’s perspective, that’s a pretty neat trick, even if it is “statistically sound.”

The other point I’d be interested in seeing someone tackle is the assumption that we can have several hundred thousand Iraqis die violently in slightly more than 3 years, and yet see no statistically significant rise in deaths from non-violent causes.

The Confidence Interval guys might want to sit this one out. The lower bound of the CI still comes in at 425,000-plus violent deaths.

“This is wrong. If, in sample A, mortality has increased and the main cause is traffic accidents, while in sample B mortality has increased and the main cause is heart disease, the conclusion that mortality has increased is indeed reinforced. As to the main cause in the population as a whole, “further research is required” – a conclusion which appears in all too many studies.”

That still leaves you with unreliable estimates for heart disease and traffic accidents, because the studies conflict. Would you be able to cite with any confidence that heart disease had truly gone up, and traffic accident deaths had declined?

In any event, violent death is unique. We don’t cluster sample it to obtain numbers in modern western countries, so far as I know. We count it. As I mentioned at Deltoid, I can see an Iraqi interviewee mistakenly telling a Lancet surveyor that a family member died from a heart attack, when they had really died in the final stages of cancer. That has no effect on the violent/non-violent distribution of a study: both are non-violent causes of death.

But I don’t see how an interviewee would mistakenly believe a family member had drowned, when they were found in the street with a bullet in their head.

In my view, the distinction between non-violent and violent deaths is too fundamental to allow statistical methodology to excuse huge variances from study to study. I’m running out of ways to express that, and I apologize for the repetitiveness of the premise.

There’s an awful lot of “you simply don’t understand” hogwash being produced here by people yet to display their own statistical credentials. And philosopher/mq, it would help if you actually read my contributions: the reference to Kaplan’s ridicule was in response to philosopher’s earlier comment about placing too much emphasis on the point estimate and not considering the ci as a whole. I was contrasting this alternative take on the figures with the rigid focus on 98,000 in the months following Lancet 1. I was emphatically *not* endorsing Kaplan’s view and I really don’t need lectures on what it means to be at the extreme of the ci rather than the mid-point. Seriously.

I’m not a professional statistician and have never claimed to be, but I have my Maths A level, I trained as a Certified Accountant for 5 years and have worked in the analytics industry for another 7, so whilst I may not be an expert on sample theory, neither am I numerically illiterate. You’re going to have to do better than “don’t worry your pretty little head about it, Brownie” if you want to avoid looking like you’re stuck for an answer.

It seems you would like to decouple the composite figures from the consolidated excess mortality rates they produce. I keep hearing that there is no statistical significance for the violent/non-violent figures that comprise the totals, so here’s where you geniuses may be able to help me out. The figures of 98,000 and 112,000 were not calculated independently of all other data and then found to comprise a combination of violent and non-violent deaths. The entire confidence interval is a function of the violent and non-violent combinations: the ci is a sum of its parts and its parts are estimated rates of excess violent and non-violent deaths. If two studies produce two sets of contradictory constituent data, the fact that when you add two numbers together in both cases you get totals that are similar, is far from significant; it is statistically incidental.

I’m asked to reconcile a bank account showing a +100 balance at the start of the accounting period and +200 at the end. I find 200 pounds of deposits and 100 pounds of expenditure. My colleague double-checks my figures and finds 900 pounds of deposits and 800 pounds of expenditure. The fact we both reconcile the closing balance would not get us a pass from the senior partner, given there’s obviously something seriously amiss with the calculations and/or records.

Philosopher, you say:

Their data do not support putting any particular weight on those point-estimates, because the confidence intervals around them are too large

Yet this is exactly what the Lancet authors are doing when they base the claim for mutual corroboration not on overlapping cis, but on the 98,000/112,000 point estimate comparison *specifically*. CharlieW hinted earlier that there may be too much focus on the point estimates and you would seem to agree, but then I didn’t start it; the authors did.

You go on to say:

They have statistical significance for the overall mortality claims; and for the claim that there was an increase in the number of deaths attributable to violence; but not for the part of the data that you want to put so much weight upon.

Again, they put weight on the point estimates using these to claim cross-validation of studies. And if we focus on violent attribution alone, we find 57,000 deaths in Lancet 1 and at least 112,000 in Lancet 2.

Finally, on the “mendacious” thing. Do I think JHU fabricated data? Emphatically “no”. Are the authors misusing the data when they claim mutual corroboration using “similar” point estimates produced by Lancet 1 and Lancet 2? Yes, undeniably.

We’re now up to post 142, and I still can’t get any of the critics of the Lancet studies to address the fact that we’ve dropped at least 500,000 tons of explosives on Iraq, given that this alone is very likely to have created somewhere in the vicinity of 100,000 or more deaths.

That ratio works out to 5 tons per person. Anyone claiming that we have killed fewer than 100,000 people (that is, people killed directly by us, not by the “insurgency” or by “sectarian violence”) is claiming that our bombs are sufficiently inefficient that more than 5 tons of them are required to kill one person. This seems implausible, by perhaps one or two orders of magnitude.

And keep in mind the 500,000 tons are for one nine-month period, and for just one region of the country. The total for the whole war is undoubtedly much higher.

I’ve asked for comment on this more than once, and the studious avoidance of same has become the elephant in the thread, so to speak.

We’re now up to post 142, and I still can’t get any of the critics of the Lancet studies to address the fact that we’ve dropped at least 500,000 tons of explosives on Iraq, given that this alone is very likely to have created somewhere in the vicinity of 100,000 or more deaths.

Sounds like one of those “gut feel” conclusions we should be studiously avoiding.

Lopahkin: are you deliberately trying to miss my point? The stuff in Appendix E of the study is drawn from completely different sources and is not necessarily comparable with the survey results. You can’t add and subtract numbers collected from different sources that way.

Mike: Jesus, people keep pointing out basic stats to you and you keep replying with this weird rhetorical stuff about some subsample being important or fundamental or whatever. “Important” or “fundamental” are nothing but handwaving and speechifying from you. What is on the table here is actual, real, sample data. If you want to say something, say something about what conclusions can be drawn from the actual sample, or demonstrate why you think the sample is biased. What you are saying about the trend in non-violent deaths verges on being meaningless. The number of cases of non-violent death reported by respondents is small enough that the range of possible values for the difference is very large, especially over small time periods of a few months. The number of cases of total deaths is high enough that statistical theory allows us to bound the difference well away from zero. This is very elementary indeed. Yes, it is true that we are extrapolating from a small number of reported cases to the population of Iraq. But many good U.S. election surveys extrapolate from a smaller number of reported cases to a larger overall population. Either you believe in statistical methods or you don’t. I’m starting to get the feeling you really don’t, in the fashion that people don’t believe in what they can’t understand. The “confidence intervals” *are* the result of a random sample.

As for violent death: it is not unique, it’s like any other event in a population. The reason we don’t cluster sample it here is that we have good administrative mechanisms for tracking and reporting it. They don’t in Iraq. This is well covered in the study.

It looks to me like the UNDP results are compatible with estimates of around half of the Lancet ones. Which is still massive carnage. Still don’t understand the source of the difference. The Lancet people say it’s because their survey focused more on deaths, but I’m not sure why that alone should make such a difference. Deaths in the household aren’t hard to recall.

What’s at stake here? Take the lower bound, cut it in half, and that’s still horrible enough. These quantitative comparisons are pretty much irrelevant to any of the big questions actually on the table for us. We’re not coffin salesmen or designers of Islamic cemeteries around here.

We’re now up to post 142, and I still can’t get any of the critics of the Lancet studies to address the fact that we’ve dropped at least 500,000 tons of explosives on Iraq, given that this alone is very likely to have created somewhere in the vicinity of 100,000 or more deaths.

I see plenty of “takers” for your offer, anon. Much of this thread involves a discussion of how much ordnance has been dropped, and what effect it has had. Both sides seem well represented to me.

You’re being obtuse. For the last time, I am not “…replying with weird rhetorical stuff about some subsample being important or fundamental or whatever.”

I am dealing solely with the two major subsets, violent and non-violent deaths. These are not just “some subsample(s).” You know that, which means that “handwaving and speechifying” best describes your disingenuous depiction of my argument.

“What is on the table here is actual, real, sample data.”

Wow, that was profound. I had no idea. Tell me MQ, where do you think the authors derived their point estimates from, the estimates I’m taking issue with?

“The number of cases of total deaths is high enough that statistical theory allows us to bound the difference well away from zero.”

My God, what are you on about? When have I or anyone else here suggested the overall mortality rate hasn’t risen post-invasion?

I’m not going to waste any more time responding to the rest of your post, MQ. There are some very intelligent, knowledgeable folks taking issue with my argument. You’re trying real hard to pass yourself off as one of them. You aren’t.

But that is exactly something that does follow from understanding the CI! In a study like this, not every value in the CI is equally likely, and indeed, based on the statistics, the point-estimate is the most likely value. 98,000 is a more likely result than 8,000.

It seems you would like to decouple the composite figures from the consolidated excess mortality rates they produce. I keep hearing that there is no statistical significance for the violent/non-violent figures that comprise the totals, so here’s where you geniuses may be able to help me out. The figures of 98,000 and 112,000 were not calculated independently of all other data and then found to comprise a combination of violent and non-violent deaths. The entire confidence interval is a function of the violent and non-violent combinations: the ci is a sum of its parts and its parts are estimated rates of excess violent and non-violent deaths. If two studies produce two sets of contradictory constituent data, the fact that when you add two numbers together in both cases you get totals that are similar, is far from significant; it is statistically incidental.

I have a feeling that the Crooked Timberites (I’m not one) are going to get this framed.

Anyway, you’ve given us pies, custard, sandwiches and bank accounts, but here’s the analogy I have in mind. You have a jar with a number of roughly nut-sized things in it. That’s your population. You reach in without looking and pull out a handful. That’s your sample. On counting, most of the sample turns out to be pistachios. The rest are peanut M&Ms. So now you can make a statement about the likely proportion of peanut M&Ms v. pistachios in the jar.

But looking closer at the M&Ms, you find that three of them are blue. You can now make a statement about the likely proportion of blue M&Ms v. pistachios in the jar.

My suggestion here is that your estimate of blue M&Ms is going to be a fair bit less confident than your estimate of M&Ms of all colours.

And consider your first estimate. Was it ‘made up’ by adding together a series of separate estimates of M&Ms of various colours? No.
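The jar analogy above can be simulated directly. A sketch with an invented jar mix, just to show how much noisier a rare category is across repeated handfuls:

```python
import random

random.seed(1)

# An invented jar: 600 pistachios, 360 non-blue M&Ms, 40 blue M&Ms
jar = ["pistachio"] * 600 + ["mm_other"] * 360 + ["mm_blue"] * 40

def relative_spread(counts):
    """Crude noisiness measure: range of the estimates over their mean."""
    return (max(counts) - min(counts)) / (sum(counts) / len(counts))

mm_counts, blue_counts = [], []
for _ in range(2000):                 # 2000 repeated handfuls of 50
    handful = random.sample(jar, 50)
    mm_counts.append(sum(x.startswith("mm") for x in handful))
    blue_counts.append(sum(x == "mm_blue" for x in handful))

# The rarer category is far noisier in relative terms
print(relative_spread(mm_counts) < relative_spread(blue_counts))  # True
```

The all-M&M count bounces around its mean by a modest fraction, while the blue count routinely doubles or drops to zero from handful to handful.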

Anon in 142: I can’t attest to your 500KT figure — love to see the source — but some historical comparisons would be the First Gulf War, where 85KT were dropped, Kosovo (17) or Afghanistan (9). In fact, a meatball estimate based on my own knowledge of the equipment involved, sortie rates, etc. would put only about 20KT onto Iraq in the first six weeks of war, and something under 10KT since, even if I use John Quiggin’s estimate of 5 attack sorties per day, which I honestly think is a little inflated and unsupported by his citation. But I’m confident you’re off there by at least a factor of 10, regardless.

That’s still a lot of bombs, though: in Afghanistan the civilian fatality rate per KT was 140-150, which was something of a historic high. The same fatality rates in Iraq would put Iraqi civilian deaths due to airstrike alone in the vicinity of 5,000. Add in 20,000 or so insurgents also killed and it’s a significant number. It is not, however, anywhere near the number given in this Lancet study.

PS: Anyone who wants to challenge any of the above, please avoid citing the famous 3MAW press release during Fallujah, which was a pretty clear instance of mixing up tons and pounds. Cheers.
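BruceR’s back-of-envelope arithmetic, using his own figures (his estimates, not official data), works out as follows:

```python
# BruceR's figures: his estimates, not official data
kt_dropped_iraq = 30       # ~20 KT in the first six weeks, under 10 KT since
fatalities_per_kt = 145    # midpoint of the 140-150 Afghanistan range he cites

civilian_deaths = kt_dropped_iraq * fatalities_per_kt
print(civilian_deaths)     # 4350, i.e. "in the vicinity of 5,000"
```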

Seriously, thank you. I understand where you are coming from. Stick with me on this. Believe it or not, I am prepared to be convinced.

The day after Mike H’s first post on this issue some contributors chipped in with the sub-sample thing. I understand the statistical principle, but as Mike H and I have pointed out, we are dealing with a variable in Hopkins that has only two possible values: “violent” or “non-violent”.

My suggestion here is that your estimate of blue M&Ms is going to be a fair bit less confident than your estimate of M&Ms of all colours.

Yes it is. Because the M&Ms in the jar will be a mixture of blue, red, green, brown, yellow, black, orange, etc. The second we know there is only one other colour of M&M in the jar besides blue, our estimate of blue M&Ms gains confidence. Yes, it will never be as confident as an estimate of the number of M&Ms versus pistachios, but the confidence with which we can estimate the number of M&Ms of a particular colour is a function of the number of colours of M&Ms in the jar. Isn’t it?

This is the point with the violent/non-violent attribution. It can only be one or the other. Further up the thread I referred to a fictional 30-40-year old Baghdad shoemaker demographic as being responsible for the upsurge in violent death. Yes, I was being a facetious prick, but I was attempting to make it clear that I understand that we lose confidence with every drop into sub-sampling.

Where am I going wrong with your analogy when I say the jar of nut-shaped objects is our collection of deaths, and for pistachios read violent and for M&Ms read non-violent? Can you explain why the differences in violent/non-violent attribution from study to study can be dismissed, given this is the highest level of sub-sample for the dead?

Bruce R, as stated above, I think the use in Iraq is likely to be more lethal than the previous cases, many of which were high-altitude strategic bombing campaigns. On your calculations, to get into the Lancet range the Iraq use of tactical bombing in largely urban guerilla warfare needs to be twice as lethal as for Afghanistan, which doesn’t seem implausible to me.

Can you explain why the differences in violent/non-violent attribution from study to study can be dismissed given this is the highest level of sub-sample for the dead?

They can’t be dismissed, but they should be looked at more cautiously than the total. There are two reasons. First, you’re asking people to remember the colour of the M&Ms even though they picked the handful a while ago; there may be error both in the number of M&Ms and in their composition. In the Iraq case, while you can ask for death certificates to check on the total number of deaths, there may not have been a corresponding check on the cause of death. (BTW, the sample of roughly 1800 households experienced around 600+ deaths during the period from Jan 2002 to July 2006, so at least 1200 households experienced no death at all during the period.) Second, the number of blue M&Ms and the number of not-blue M&Ms are not independent — basically, you only have one degree of freedom in determining blues and not-blues once you have the total. Both of these mean that the estimates of cause of death are probably less robust than the estimate of total all-cause deaths.

I think it’s the size of any given subsample relative to the population that’s key, not the number of other subsamples, nor its conceptual ‘level’. The smaller the sample, the harder it is to estimate with confidence, and below a certain size you can’t say anything meaningful. But analogy will only take you so far: at some point the maths kicks in, and I don’t have any experience with that.

I can see how the Lancet 1 sample could be troubling – conceptually – because, when Fallujah is excluded, the number of deaths attributed by respondents to violence is small: 21. But I think you’d need to work some maths to get insight into the significance of any estimate made on the basis of this number.
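On that last point, here is a very crude calculation of the uncertainty around 21 observed deaths: a normal approximation to a Poisson count, which ignores the cluster design effect (that would widen the interval further):

```python
import math

# 21 violent deaths reported in the Lancet 1 sample, excluding Fallujah.
# Crude normal-on-Poisson approximation; ignores the cluster design effect.
deaths = 21
se = math.sqrt(deaths)

lo, hi = deaths - 1.96 * se, deaths + 1.96 * se
print(round(lo, 1), round(hi, 1))  # roughly 12.0 to 30.0: about a +/-43% band
```

So even before worrying about clustering, any estimate scaled up from those 21 deaths carries an uncertainty of very roughly plus or minus 40%.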

Soru said, “This graph was used to correct me a few posts back: http://anonymous.coward.free.fr/misc/roberts-iraq-bootstrap.png Unless I’m missing something, looks like it can be used to correct you too….” Um, maybe you can explain what you had in mind here, but it sure looks to me like 8,000 is a lot less likely than 98,000 on both versions…?

Brownie writes, quoting me:

“Their data do not support putting any particular weight on those point-estimates, because the confidence intervals around them are too large.”

Yet this is exactly what the Lancet authors are doing when they base the claim for mutual corroboration not on overlapping cis, but on the 98,000/112,000 point estimate comparison specifically. CharlieW hinted earlier that there may be too much focus on the point estimates and you would seem to agree, but then I didn’t start it; the authors did.

Since the CIs are reported elsewhere in both studies, there’s just nothing wrong with this. It’s not like every time you want to talk about the point-estimate you have to explicitly remind the reader what the CIs, p-values, etc. were. What would be wrong would be putting weight on the point-estimates that the CIs can’t substantiate, or especially putting weight on the point-estimates where there’s no statistical significance in them at all (which is unfortunately what you seem to keep trying to do with the nonviolent deaths).

“They have statistical significance for the overall mortality claims; and for the claim that there was an increase in the number of deaths attributable to violence; but not for the part of the data that you want to put so much weight upon.”

Again, they put weight on the point estimates using these to claim cross-validation of studies. And if we focus on violent attribution alone, we find 57,000 deaths in Lancet 1 and at least 112,000 in Lancet 2.

As I said, both studies found a statistically significant difference in the rate of violent deaths; neither found a statistically significant difference in the rate of nonviolent deaths. So it’s fine for them to make (appropriately limited) appeal to the violent death numbers, but it would’ve been a mistake for them to make a similar appeal to the nonviolent death numbers.

Brownie writes at 140:
“The entire confidence interval is a function of the violent and non-violent combinations: the ci is a sum of its parts and its parts are estimated rates of excess violent and non-violent deaths.”
Charlie is definitely going in the right direction, in explaining how this claim of brownie’s is off the mark — that even if samples are composed in a certain way, the statistics are not necessarily themselves composed in anything like the same way. I’ll take a stab at following up, in response to brownie’s 152. Brownie writes there:
“Yes, it will never be as confident as an estimate of number of MMs versus pistachios, but the confidence with which we can estimate a number of MMs of a particular colour is a function of the number of colours of MMs in the jar.” Your basic point here is correct, but doesn’t get you as much as you want. It’s correct, in that it will be easier to get statistical significance for either of violent or non-violent than it would be for, say, a much more finely broken-down set of categories. But what it doesn’t mean is that you get significance for either category for free, just because you’ve got significance for all deaths.

Suppose we are comparing the m&m/nut ratio of two different jars, and how they differ from some baseline. Even if you know that there are just blue and brown m&ms, there may not have been enough of a difference in either color to warrant a claim of significance by color, even if there was enough of a difference in the overall m&m count to warrant a claim of significance by snack type. Or, if we want to make the analogy closer to the Lancet studies: there was a clear m&m effect, and a clear brown m&m effect, but the difference between the number of blue m&ms in the two jars was not enough to license a claim of a blue m&m effect. If there were a whole heapload more brown m&ms in one jar than the other, it might be very easy for there to be both a significant m&m effect and a significant brown m&m effect without there being any discernible difference in the number of blue m&ms between the two jars, or between either jar and the baseline.

(Oops, I see that I flipped whether m&ms or nuts were corresponding to violent or nonviolent. Sorry about that.)
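To make the composition point concrete, here is a quick back-of-the-envelope check with purely hypothetical counts (nothing to do with the actual survey data), using a rough normal approximation for Poisson counts against a baseline:

```python
import math

def poisson_z(observed, expected):
    """Rough z-score for a Poisson count against its expected value."""
    return (observed - expected) / math.sqrt(expected)

# Hypothetical jar counts against a baseline expectation of 100 of each colour.
brown, blue = 160, 112

z_brown = poisson_z(brown, 100)         # (160-100)/10 = 6.0  -> clearly significant
z_blue = poisson_z(blue, 100)           # (112-100)/10 = 1.2  -> not significant
z_total = poisson_z(brown + blue, 200)  # 72/14.14 ~= 5.1     -> clearly significant

print(round(z_brown, 1), round(z_blue, 1), round(z_total, 1))
```

The total and the brown count are both far outside chance variation, while the blue count on its own is not: significance for the whole does not hand you significance for each part.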

Okay, Mike, I get it now. I looked at your points upthread, as well as Brownie’s at 140. Your problem is apparently not with the statistically insignificant drop in non-violent deaths right after the invasion found in Lancet II (this criticism would make you pretty much completely innumerate, and is what I was reacting to). Instead, you have an issue with the difference in the fraction of first year post-invasion deaths that were non-violent in Lancet I vs. the fraction that were non-violent in Lancet II. Your beef seems to be that although the overall death rates in Lancet I vs. Lancet II are similar for the relevant period, the two component death rates (violent and non-violent) that make up the total have switched around somewhat. Is that it?

You shouldn’t be surprised that that criticism has opened you up to charges of fishing for something to discredit the study. To take an analogy: I take two separate polls of 10 people about support for President Bush. In the first poll, I get 2 men and 1 woman saying they support him, with the rest not supporting him. In the second poll, I get 2 women and 1 man supporting him, with the rest not. Do the results from the two polls corroborate each other? Yes. Do the results from the two polls on the percent of his supporters who are men corroborate each other? No, one would need a larger survey to get more accuracy here. The results are inconclusive on the percentage of his supporters who are men. Likewise, the results here are inconclusive on the percent of first year post-invasion deaths that were violent vs. non-violent.

Should the fact that a subsample result is significantly different between two polls on the same topic, while the overall result is the same, make us suspicious? This actually is a pretty complex question. The answer is: not necessarily. Just randomly, one would expect some of the subsample results to differ at a statistically significant level. But one wouldn’t expect all of them to differ. (At a 10% significance level, you’d expect one out of 10 measurements from different samples taken from the same population to differ.) If every possible subsample that could be taken differed, but the overall result was the same, then that would be suspicious. One needs a way of conceptualizing the “population” of subsamples that make up the whole to formalize this. Saying that some subsample you’ve found a problem with feels major or important really is just rhetoric. My expertise is more in regression analysis than sampling theory, but I suspect someone somewhere has done this. I gotta run so I’m not going to check out JASA at the moment, but I wonder if anyone else knows who it is.

(Obviously there is extra complexity because the subsample measurement is not independent of the overall measurement — but it is not perfectly correlated, and thus each subsample is to a certain degree an independent measure, but with a higher variance).
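The “one in ten” expectation above is easy to tabulate, if we pretend (unrealistically, per the caveat about non-independence) that the subsample comparisons are independent:

```python
# If k subsample comparisons were independent, each has a 10% chance of a
# spurious "significant" difference at the 10% level.  Print, for each k:
# the expected number of false hits, and the chance of seeing at least one.
alpha = 0.10
for k in (2, 5, 10):
    print(k, round(alpha * k, 1), round(1 - (1 - alpha) ** k, 3))
```

So with ten subsample comparisons you expect one spurious “significant” difference, and the chance of at least one is about 65%.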

“here are two reasons: first, you’re asking people to remember the color of the M&M’s even though they picked the handful a while ago in the past.”

Robert, I don’t believe a single interviewee would forget whether their family member died violently, or from natural causes. I leave open the possibility that some may have lied, as D Squared hypothesizes.

“I think it’s the size of any given subsample relative to the population that’s key, not the number of other subsamples, nor its conceptual ‘level’. The smaller the sample, the harder it is to estimate with confidence, and below a certain size you can’t say anything meaningful.”

I agree Charlie, and to a large extent, that’s what Brownie and I have been arguing. There are no bigger subsets than the two Brownie and I are subjecting to scrutiny. Every death fits into one or the other first, before it trickles down any further. The issue of accuracy among the smallest death subsets was the source of much bickering back in 2004, especially over the attribution of deaths to coalition air strikes. There was some eventual quasi-consensus at Deltoid that the more specific death subsets were too small to reliably extrapolate.

#144, brownie: “Sounds like one of those ‘gut feel’ conclusions we should be studiously avoiding.”

I claimed that 500,000 tons of explosives are very likely to have killed at least 100,000 people.

If you had lifted a finger to follow the various things I’ve been pointing to, you would understand that this claim is based on more than “gut feel.” What follows is material I’ve already pointed to.

2.7 million tons of bombs were dropped on Germany in WWII. Civilian deaths in Germany were 3.6 million (I have seen lower figures for this, but only from sources I consider less authoritative).

Here’s that ratio in simpler terms: one dead civilian is correlated with .75 tons of bombs dropped.

Now, obviously many things killed civilians in Germany besides Allied bombs. But we are also killing Iraqis in a variety of ways aside from dropping bombs on them, quite apart from the other sources of violence that are killing Iraqis. The bottom line is that I think the ratio is meaningful, especially when we find that a similar ratio holds when we do the same analysis regarding Vietnam (details available via various links I’ve already cited).

Here’s what we get when we apply that ratio to Iraq, regarding the 500,000 tons that we know about: 375,000 dead civilians. Obviously that’s an eyepopping number. But there’s reason to believe the number should be even higher.

As brucer pointed out (#112), modern weapons are more precise, “which makes every strike more deadly.” This suggests that it no longer takes .75 tons to kill one person. We now have an enhanced ability to make sure the first shot flattens the building where the people are hiding, instead of landing in the empty lot across the street.

The other staggering fact that needs to be taken into account is that the 500,000 tons reported is just for one nine-month period, and just for one region. The total for the entire war, and all of Iraq, is undoubtedly much higher.

500,000 tons in nine months is roughly 2,000 tons per day. We’ve been there 42 months. 2,000 tons per day over 42 months equals 2.5 million tons. This is almost exactly the total amount dropped on Germany. If we are achieving the same kill-rate achieved in Germany, we should have 1.9 million dead civilians. As I said, there is reason to think our modern kill-rate is greatly enhanced.
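For what it’s worth, the tonnage arithmetic in that paragraph checks out mechanically (whether the 500,000-ton input is right is a separate question, contested elsewhere in this thread):

```python
# Re-running the comment's arithmetic; the 500,000-ton figure itself is disputed.
tons_reported = 500_000                   # claimed for one nine-month period
tons_per_day = tons_reported / (9 * 30)   # ~1,852, rounded up to 2,000 above
projected_total = 2_000 * 42 * 30         # 42 months at 2,000 tons per day
print(round(tons_per_day), projected_total)
```

That gives about 1,852 tons per day (generously rounded to 2,000) and a projected 2,520,000 tons over 42 months.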

I’m not suggesting we’ve killed 2 million people. I’m suggesting it’s worth taking a close look at these numbers. I’m also suggesting that when one does so, it’s hard to imagine that our bombs have killed fewer than 100,000 people. A number lower than that implies that since WWII we have improved, by a factor of 20, our bombs’ ability to recognize and spare civilians. But as I’ve said, I think our bombs have become much better at locating people to kill, not worse.

Not to be a bore (well, too late), but one thing I’m not getting: is there a consensus among the statistics experts around here on whether the conflict with the ICLS survey is a problem? Robert, in another thread, thought maybe there was a problem with the ICLS survey. Anyone else have an opinion? (I’ve given mine, or rather, I’ve explained upthread somewhere why I think there is a conflict. But I’m not an expert.)

Charlie/mq/philosopher/roger/etc., regardless of where we’ve actually got to, I want to thank you for going the extra mile explaining some of the principles to me. I won’t pretend I haven’t learned anything: which is a convoluted way of saying I’ve benefited from all your contributions.

I still don’t understand why figures at the violent/non-violent level need to be treated with such caution given the binary nature of cause of death, but I do at least understand the principle of inherently less confidence when moving down the levels of classification.

Philosopher, you say:

It’s correct, in that it will be easier to get statistical significance for either of violent or non-violent deaths than it would be for, say, a much more finely broken-down set of categories. But it doesn’t mean that you get significance for either category for free, just because you’ve got significance for all deaths.

Okay, so what determines whether the estimates for violent/non-violent deaths achieve statistical significance? More specifically, why are we precluded from drawing a conclusion about a jump in violent death attribution from 56% to 100% from Lancet 1 to Lancet 2? What is the statistical explanation which means we don’t need to worry about what appears to a layman to be a numerical anomaly in violent death attribution from study to study? This is where I’m currently stuck.

I’m more puzzled by comments that the violence isn’t any worse than any American city. Really? In which American city do 60 bullet-riddled bodies turn up on a given day? In which city do the headless bodies of ordinary citizens turn up every single day? In which city would it not be news if neighborhood school children were blown up? In which neighborhood would you look the other way if gunmen came into restaurants and shot dead the customers?

This doesn’t prove anything about statistics, obviously. But people are at work discrediting this too. The “skeptics” are doubting the “negative reports” in the newspapers, and the actual reporters are telling us that what’s printed in the newspaper (after the editors have finished with it) is much milder than the reality.

Brownie, statistical significance of the subgroups is calculated the same way as for the main groups, i.e. the count of events in each subgroup is extrapolated to the whole population from the sample weights. Because the events are drawn from a Poisson distribution, the count of events is the sole determinant of their variability in the sample (with additional effects from the sample weights when we extrapolate). Smaller numbers = wider confidence intervals.
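To put rough numbers on sg’s point, here is a sketch using the usual normal approximation to the Poisson (a simplification; the study’s actual design effects would widen these intervals further):

```python
import math

# Normal approximation: a Poisson count n has a 95% CI of roughly
# n +/- 1.96*sqrt(n), so the half-width *relative to the count* shrinks
# as 1/sqrt(n).  Small subgroup counts give proportionally wide intervals.
for n in (10, 100, 1000):
    half = 1.96 * math.sqrt(n)
    print(n, round(half, 1), f"{half / n:.0%}")
```

A subgroup with 10 events has a relative half-width around 62%; one with 1,000 events, around 6%.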

David Kane, you’re doing the same thing again. Anyone who has tried to get access to survey sample data from major organisations doing these things knows that it is standard practice for them to refuse to let you have it until they have at least had time to deidentify it, and even then they may have ethical requirements which only allow them to share the data with collaborators. The latter situation is very common. You claim to audit data, so you must know this, and your continual implications that you should have received it straight away upon request are disingenuous. Also, while researchers may share data with one another all the time, this is not the case in public health research at all. It is dishonest to suggest so. Generally epidemiologists have to sign extremely rigorous ethics approvals which prevent them from disclosing their data to just anyone who pops up and asks for it. You may think this is “unusual” in your experience, in which case I assume your experience doesn’t include the analysis of survey samples of this type.

I presume you know how to do the cluster analysis they describe, but why should we assume that you can do a better job than they did? Why should we even assume you need to? If you don’t trust their statistical analysis, why should we or anyone else trust yours? And if your concern is the data itself, you again are insinuating fraud (and how, in any case, could you tell if this data was entirely made up?)

Finally, have you, as someone else asked on this thread, attempted to audit the Sudan death rate estimates? Do you routinely audit US National Household Surveys? Did you audit the ICLS survey? Why not? Because you don’t like the number this one gives you? As I said, if you can’t fault the actual method, don’t pretend there is anything wrong with it just because the number seems a bit off to you. Especially when a lot of people have presented reasons for why the number might be right, in an environment (Iraq) where none of us has the ability to tell what is going on.

Robert, I don’t believe a single interviewee would forget whether their family member died violently, or from natural causes.

Perhaps, but that’s a belief. My point was merely that there are sources of error both in the totals and also in the causes, and that the death certificate check helps you with the former but may not with the latter.

Regarding the actual original topic of this post, I read in the Daily Yomiuri today (hard copy, I don’t know if it is online) that the number of journalists embedded with US soldiers has dropped from 600 at the time of the war to under two dozen today. Is it any wonder that we don’t have a clear picture of what is happening in the air war?

This article also observed that all the major news agencies have significantly reduced their presence in Iraq, and that it is very difficult for new journalists to even get into Iraq (one doesn’t just buy a plane ticket…). This reduction occurs at a time when Iraq Body Count is receiving more casualty reports than ever before. If casualty reports are going up while journalist numbers are going down, this can only mean that the rate of killings in Iraq is escalating rapidly, right? Unless we suppose that the journalists have greater freedom to move around Iraq now than 2 years ago – an unlikely supposition given they need armoured cars and an armed escort just to get to their hotel from the airport.

Yet more evidence that we don’t know what is happening in Iraq and our “gut” responses just don’t have any value.

“This reduction occurs at a time when Iraq Body Count is receiving more casualty reports than ever before. If casualty reports are going up while journalist numbers are going down, this can only mean that the rate of killings in Iraq is escalating rapidly, right?”

I don’t disagree that the rate of killings has been escalating, SG, but I’m not sure your reasoning is proof of that.

I read (and save) the daily summaries of violence coming out of Iraq from AP, Reuters, and AFP. I haven’t missed very many days since I started about a year and a half ago. From my reading of these reports, I get the impression that much of the violence, particularly outside Baghdad, isn’t coming to us via AP, Reuters, and AFP directly by way of field reports from their journalists. In many cases, they’re simply regurgitating what they’re receiving from Iraqi government and US military press briefings.

That’s what makes me skeptical of the claim that Lancet 2 looks right because so much of the isolated, small stuff around the country isn’t making it into the daily media accounts, and therefore not into passive counts like IBC.

Well that just isn’t true, not from what I’ve seen. I want to qualify this, in that I don’t know if all the violence resulting in deaths in places of lesser prominence is getting reported. I think it’s safe to say that it isn’t. That doesn’t change the fact that I regularly see in these daily summaries, accounts of attacks where single victims, or two’s and three’s of victims are reported.

There have been many days when the wire services’ round-up for the day amounts to less than 50 deaths, and they invariably describe the “where,” the “how,” and the “how many” for all the attacks involved. On days like that, the amount of violent death going unreported must be staggering, if Lancet 2’s monthly average of 15,000 violent deaths is accurate.

David Kane, you’re doing the same thing again. Anyone who has tried to get access to survey sample data from major organisations doing these things knows that it is standard practice for them to refuse to let you have it until they have at least had time to deidentify it, and even then they may have ethical requirements which only allow them to share the data with collaborators.

Yes. Of course. Roberts could have said to me (and others):

“Just give me a couple months so we can remove identifying information and you can see the data.”

or

“You need to sign this non-disclosure agreement (and have it signed by a Harvard attorney).”

or

“There is an IRB document that prevents us from sharing any data with anyone and here is a copy of that document.”

But he (and his co-authors) said none of those things. They hedged a couple times and then stopped replying to my e-mails. Others tried to get data from them and were similarly unsuccessful. Draw your own conclusions.

Also, while researchers may share data with one another all the time, this is not the case in public health research at all. It is dishonest to suggest so.

My claims are about academia in general and not public health in particular. If your description of the standards in public health research is correct (I have no reason to doubt it, but I thought that places like the CDC were pretty good about this stuff), then that is another reason to take the results less seriously. I do not think you would get away with such amateurish behavior in, say, JASA.

“Okay, so what determines whether the estimates for violent/non-violent deaths achieve statistical significance?”
I think sg answered this right — it’s all about how much you find from how many people. (How much from how many is enough? That is not something that can be answered without the math.) Neither study happened to find enough of an increase in nonviolent deaths, given the size of the sample, to justify any claim of a statistically significant difference in nonviolent deaths pre- and post-invasion.

“More specifically, why are we precluded from drawing a conclusion about a jump in violent death attribution from 56% to 100% from Lancet 1 to Lancet 2?” That “56%” is not a number that is based on anything that was found to have statistical significance in Lancet I. mq’s example at 157 is a good illustration of this; it is just very unlikely that one could claim, on the basis of that sample, that the two polls contradicted each other, in that one showed 66% of the president’s support coming from men and the other 33% from men.
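If anyone wants to see mq’s poll example tested formally, here is a hand-rolled Fisher’s exact test on those two tiny supporter samples (the data are the hypothetical figures from the analogy, obviously, not anything from the surveys):

```python
from math import comb

# Fisher's exact test, by hand, on the analogy's two polls: supporters were
# 2 men / 1 woman in poll 1 and 1 man / 2 women in poll 2.
def table_prob(k, row1=3, row2=3, col1=3):
    """Hypergeometric probability of k 'men' among the first poll's supporters."""
    return comb(row1, k) * comb(row2, col1 - k) / comb(row1 + row2, col1)

observed = table_prob(2)
# Two-sided p: sum the probabilities of all tables no more likely than observed.
p = sum(table_prob(k) for k in range(4) if table_prob(k) <= observed + 1e-12)
print(round(observed, 2), round(p, 2))   # 0.45 1.0 -- no evidence of a difference
```

With samples this small the two-sided p-value is 1.0: the 66%-men and 33%-men results are entirely compatible with chance.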

Anon in #160, you really need to back up your 500KT with a cite. As I said, it’s certainly off by a factor of at least ten.

Aggregating the Kosovo and Afghanistan conflicts, the average CAS sortie dropped 1.3T of ordnance on its target. (26KT, 19,500 sorties). If you’re talking John’s 5 sorties average a day, that’s 6.5T dropped on Iraq per day during the insurgency period. Three years of conflict = 2.3KT. Add the 20KT dropped in the first six weeks of the war. I doubled it above to be generous and I still couldn’t reach 50KT.

As I also said above, the Afghan conflict was recognized at the time for having an abnormally high number of *civilian* casualties per KT dropped, at least compared to other recent wars: 140 to 150. Your estimate of 1,300 civilian fatalities per KT for WW2 and even higher numbers for the current war would not be supported by any serious scholar I’ve read. The generally accepted figures for that war are 3,000KT of ordnance dropped between USAAF and RAF, resulting in 600,000 civilian fatalities, or approximately 200 civilian deaths per KT (see Neillands, et al).

In my scratch analysis above, I was assuming 700-800 *aggregated* deaths per KT x 30-35 KT to get an upper bound of 25,000 *aggregated* fatalities from air strikes in Iraq to date. I believe the factor-of-two error in your destructiveness estimate, multiplied by the order-of-magnitude error in kilotonnage, compounded by your math error in para 7 of your last post, is contributing to some unsustainable conclusions on your part.

I think that David is right about the practices in the social sciences in general, though what sg says about public health is plausible enough. But I would not be surprised if, given all the attention that the study has received — much of it hostile, and much of it from people with agendas — the authors just don’t want to mess with what would happen if they started sharing their data with strangers just now. I certainly can’t blame them for wanting to keep a few things close to their chests for the time being. So, wait ’til the hullaballoo calms down, and see if they are more forthcoming then.

I find it interesting that I cannot find this document on any official US site. According to google cache, it was once hosted at usmc.mil, and also at mcnews.info (both official sites).

It’s odd that this story can no longer be found at usmc.mil, since it’s not hard to find other stories there written by the same person, at around the same time (example, example, example).

Even though the USAF issues daily reports (example), as far as I can tell none of them ever mention tonnage. As far as I can tell, the press release I’ve cited is the only example of the US ever indicating this sort of revealing tonnage information (regarding Iraq). I suspect the release of this information might have been an accident. I think it’s also suspicious that the press release is now hosted only at non-official sites.

“please avoid citing the famous 3MAW press release during Fallujah, which was a pretty clear instance of mixing up tons and pounds”

Of course that’s exactly the press release I’m citing. But it would be great if you could help me with a couple of things. I wonder why you call it “famous,” since I have a really hard time finding anyone who noticed it besides Hersh. It would also be great if you could find an example of someone beside yourself making the claim that the passage “500,000 tons” was meant to say “500,000 pounds.” I can’t find an example of anyone making that claim.

In a two-week span, multinational force aircraft, comprised mostly of jets and helicopters from 3rd Marine Aircraft Wing, dropped or launched more than 500 precision-guided munitions against terrorist targets in the city.

This is a reference to Fallujah II (aka “Operation Al Fajr,” 11/8/04-11/22/04). “Precision-guided munitions” is probably a reference to JDAMs, which basically come in three sizes: 500, 1000 and 2000 pounds. For the moment let’s assume that an average “precision-guided munition” is 500 pounds, the smallest JDAM. That means that during Fallujah II, 3MAW dropped 250,000 pounds in a 2 week period.

If you are correct, then in the prior nine months 3MAW dropped only twice that, even though “the wing saw major combat action over Fallujah in April 2004, in An Najaf in August 2004.”

Here’s another way to look at it. If 3MAW dropped only 500,000 pounds in nine months, that works out to only 10 pounds per aircraft per day. (3MAW has 200 aircraft, supported by 13,000 people.) That sounds low to me (although I realize that not all 200 aircraft drop bombs, or are constantly deployed).
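That per-aircraft figure is easy to verify, using the comment’s own assumptions of 200 aircraft and a nine-month period:

```python
# Pounds per aircraft per day if 3MAW dropped 500,000 pounds over nine months
# (~270 days) with 200 aircraft -- the figures assumed in the comment above.
per_aircraft_per_day = 500_000 / 200 / (9 * 30)
print(round(per_aircraft_per_day, 1))   # ~9.3 pounds per aircraft per day
```

Roughly 9.3 pounds, consistent with the “only 10 pounds” claim.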

I realize that my analysis calls for 10 tons per aircraft per day. I realize that sounds high, but I think it might be plausible. An F/A-18 has a total payload of 8.5 tons.

“some historical comparisons would be the First Gulf War, where 85KT were dropped”

I’ve seen that figure, but I’ve also seen lower figures. A useful table is here, showing tonnage for WWII, Korea, Vietnam and GW1. If you calculate monthly rates, it’s interesting to notice that 3 of the 4 show remarkably similar rates, of 40-48 Ktons/month. Korea is an outlier, with only 12 Ktons/month. If the current war has the same relatively low rate as Korea, that would work out to 500,000 tons (12 Ktons per month times 42 months) for the entire war so far. Of course that’s the same figure in the press release I cited.

Your estimate (30KT for the whole war so far) works out to .7 Ktons/month, which is only about 6% of the rate achieved in Korea, which is itself by far the lowest rate of the wars listed in the table I cited. Therefore that seems low to me.

By the way, I realize that more accurate weapons means we have less need to drop a lot of them (as measured by weight). Maybe that’s what I’m missing.

Anyway, I realize there is good reason to suspect that 500,000 tons sounds too high, but I also think there’s good reason to suspect that 500,000 pounds sounds too low. So I’m interested in any further assistance with the puzzle.

PS: I now notice your #173. Thanks for this additional information, which I will think about.

“your math error in para 7 of your last post”

All I did in para 7 of #160 is claim that .75*500=375. I don’t see the math error.

OK, I have a few comments on the 2006 Lancet study. I have a stats background, and I’ve made a living conducting market and social research surveys for more than 25 years.

1) I am sympathetic to the arguments (Mike H and Brownie) about the discrepancy in the ratios of violent/non-violent deaths in the period where Lancet 1 and 2 overlap.

I also understand the counter-argument that the CI of the estimates in each survey overlap, so you can’t claim that the difference in proportions is significant. But I haven’t seen anyone show this (I’m not saying it hasn’t been posted, I just haven’t seen it).

Brownie says that the estimates of violent death (for the overlap) are:

Lancet 1 – 57,000
Lancet 2 – 112,000

So what are the confidence intervals for these estimates (I assume that these have been calculated by the critics)?

Assuming that the CIs do overlap (I would not be surprised by this), this is more an indictment of the absurd imprecision of the estimates than anything else. It is also a perversion of the 95% confidence standard when it’s used to prop up a claim of cross-study validation while ignoring wild fluctuations in the attribution of deaths that compose the estimate.
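Since nobody has actually posted the confidence intervals, here is a purely illustrative sketch (the standard errors below are invented for the example) of why overlap of two 95% CIs doesn’t by itself settle whether the difference is significant: the proper test uses the standard error of the difference, which is smaller than the sum of the two half-widths.

```python
import math

# Illustrative only: est1/est2 are the thread's figures, but se1/se2 are
# made-up standard errors, since the real ones haven't been posted here.
est1, se1 = 57_000, 15_000
est2, se2 = 112_000, 18_000

overlap = (est1 + 1.96 * se1) > (est2 - 1.96 * se2)   # upper CI 1 vs lower CI 2
z_diff = (est2 - est1) / math.sqrt(se1 ** 2 + se2 ** 2)
print(overlap, round(z_diff, 2))   # True 2.35 -- CIs overlap, yet the difference
                                   # is significant at the 5% level
```

So overlapping intervals alone neither validate nor refute the claimed discrepancy; one needs the joint calculation.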

If I saw these results in a commercial environment, I would be deeply concerned about the reliability of the methodology.

2) I’m very worried about aspects of the methodology. Cluster sampling is fine in a situation like this. However, the number of interviews per cluster (40) is quite large, and the fact that the dwellings are all adjacent is unusual. For those who like to compare the survey to political polling in the West: in the latter there are usually many more clusters with fewer interviews per cluster. There are also many checks and balances in political polling, particularly a typical 15 per cent validation (callback for confirmation).

In any case, very little political polling is done by face-to-face cluster sampling in the West (it’s difficult and expensive, most polling is done by phone). And the accuracy of polling is exaggerated. Election forecasts are generally made on the basis of comparisons to historical data, not extrapolation to raw numbers from data.

3) I’m also very worried about the fieldwork itself. I believe the reported refusal rate was 0.8% (I can’t find this in the report itself, so feel free to correct me). This is simply not believable. I have never conducted a survey with anything like a refusal rate that low, and before anyone talks about cultural differences, there are many non-cultural reasons for people to refuse to participate. If my survey were in a war zone, I would expect refusal rates to be higher than normal.

4) Interview rate. I understand that each of the clusters (40 interviews) was completed in one day (again, I don’t have a direct cite). The paper itself is ambiguous as to how many interviewers were deployed at each cluster, and whether they worked in pairs or solo. Assuming it means 4 interviewers each conducting 10 interviews, that’s a very impressive work-rate in the context of the difficulties. See Appendix B of the report, where they describe being held up for hours at roadblocks and the extreme suspicion of initial respondents, requiring lengthy explanations. Apparently these “lengthy explanations” migrated quickly from household to household.

If the interviewers worked in pairs (as one would have assumed – one male and one female), then the interview rates are unbelievable.

5) The report states that in 92% of cases where it was requested, a death certificate was produced. I also find this difficult to believe, although cultural practices might come into play.

I don’t attribute improper motives to the Johns Hopkins team. As I understand it, they remained in Jordan and relied on their intrepid field team. I find it difficult to believe that the field team did what they say they did.

#147, mike: “Much of this thread involves a discussion of how much ordnance has been dropped, and what effect it has had.”

No. Much of this thread involves a discussion using terms such as airstrikes, missions, attacks and sorties. These terms are quite vague and give us little or no handle on the fundamental question of “how much ordnance has been dropped.”

Aside from me, there’s exactly one person here (brucer) who has made an attempt to quantify “how much ordnance has been dropped,” either by units or by weight.

“I guess you haven’t been paying attention.”

Unless you can be more specific and point out something I missed, the attention deficit is all yours.

Anon, the math error is that if you apply your (unrealistic) WW2 ratio of deaths per kiloton to your (also unrealistic) estimate of ordnance tonnage dropped on Iraq, you’d get 666,000 dead Iraqi civilians, not 375,000. You’ve taken the inverse by mistake.

Your table for Desert Storm is for USAF only. The difference between that and the accepted figure I cited is the difference when including USN, USMC, and allied air forces.

If you reread your #175 above, you are essentially arguing that those five close-air sorties per day we’ve been using as a working figure for current Iraq have been dropping, on a daily basis, fully one quarter of the raw tonnage of bombs that the entire US Army Air Force managed to drop per day onto Germany in World War Two: so a full 20 F-18 sorties per day would presumably be able to deliver the same ordnance by weight per day as the thousands of medium and heavy bombers of the USAAF could back then. The thesis can’t withstand the gross error check.

With regard to 3 MAW, perhaps the best comparator is that formation’s own performance in Iraq in 1991, using very similar aircraft. In that war, the teeth of 3 MAW, the 7 F-18 squadrons of Marine Air Group 11, dropped 8.5KT of ordnance in 7,500 sorties. Most but not quite all of those sorties would have been CAS, so the actual tons dropped on a per sortie would likely have been very close to the 1.3T per sortie I mentioned as a generally accepted average for modern fast jets.

You’re right that in both Vietnam and WW2 the USAF/USAAF dropped about 1.6KT a day, compared to about 0.4 KT per day in Korea (add in the RAF, and that WW2 figure climbs to about 2.3KT). The consolidated figure for all air forces in the Gulf War comes in at about 2KT per day, but air wars since have seen a drastic drop, with Kosovo averaging 0.22KT per day and Afghanistan 0.12KT. Studies by Carl Conetta and others confirm your surmise that, while precision weapons do not affect *civilian* casualty rates as much as was once hoped, they do still dramatically drop one’s rate of expenditure of high explosive to achieve the same effect, as one might expect.

So what conclusions do you draw, Mr. Kane? You’ve asserted many times that you think the authors have been behaving dubiously. Why don’t you explain exactly why you think they are?

Also, please feel free to answer my other question: have you audited the ICLS, the US National Health Surveys, the recent studies of death rates in the Congo or Sudan? If not, why not?

Finally, I don’t think many statistical or epidemiological academics would agree with your assessment of the “amateurish” behaviours of these authors or the journal, or with the comparison to JASA. Perhaps you are becoming a little strident?

Mike H, I understand your reasoning with regard to the media reports in Iraq, but my guess is that you aren’t there (i.e. in Iraq) to check them. Sure, you can see the ones that do get reported, and get a sense that there are x many coming in and there has been no change in the rate, and so on. But unless you can wander about in Iraq checking to see how many events get press coverage, you don’t know whether those 500 reports you see a day are because 500 journalists are fanning out over 500 events, or 10 journalists are standing around in Baghdad watching 500 events and missing the remaining 50,000 in the country. The article I mentioned hints at the possibility of the latter, with the withdrawal of journalists making it more likely that over time fewer areas have been getting coverage, but more events have been occurring in those areas.

My point is that whatever the journalists are reporting, we aren’t there to check on what they aren’t reporting. So there could be an extra 60,000 attacks, sorties, bombardments, whatever you like to call them, and we would only have the military’s word for it. And if they said they used “precision” ordnance, we cannot know if that ordnance was witnessed on the ground hitting a house, because we don’t know if a journalist could have been there to see it. The Daily Yomiuri article implies that the journalists aren’t there to see it, in increasing quantities of not-thereness. Which in itself hints at increasing amounts of violence.

Perhaps we could track IBC’s daily or monthly death count against the number of western journalists in Iraq, and see how much one is increasing while the other is decreasing. That surely says something, doesn’t it?

“Unless you can be more specific and point out something I missed, the attention deficit is all yours.”

Ahhhhh, I see. So if someone wasn’t specifically talking in “KT” parlance, then the discussions involving numbers of civilians killed per sortie and numbers of sorties had more in common with a macrame chat room than what you wanted to talk about.

“Aside from me, there’s exactly one person here (brucer) who has made an attempt to quantify “how much ordnance has been dropped,” either by units or by weight.”

Well there you go. Looks like you have a “taker” after all, don’t you?

“Mike H, I understand your reasoning with regards to the media reports in Iraq, but my guess is that you aren`t there (i.e. in Iraq) to check them.”

You guessed right. I’m guessing you aren’t in Iraq either. I’ve no idea how that negatively affects my point, SG.

“Sure, you can see the ones that do get reported, and get a sense that there are x many coming in and there has been no change in the rate, and so on. But unless you can wander about in Iraq checking to see how many events get press coverage, you don`t know whether those 500 reports you see a day are because 500 journalists are fanning out over 500 events, or 10 journalists are standing around in baghdad watching 500 events, and missing the remaining 50,000 in the country.”

A couple things.

As I said in my earlier comment, I don’t think the wire service people are “see(ing) the ones that get reported,” in many cases. I don’t believe there is a lot of field work going on outside Baghdad and perhaps a few of the other large urban centers.

Second, in my view, your best argument is the one you haven’t made (well, you started to, but didn’t expand upon it).

A paucity of field reporting is irrelevant in the context of accuracy in violent death counts if the authorities are fully disclosing the deaths that do occur. While even the authorities won’t know about every death, it’s my belief the U.S. military and the Iraqi government, between them, haven’t missed very many.

The question then becomes, are they releasing details of them all? I can’t answer that. It’s apparent the Iraqi government doesn’t normally release deaths it attributes to criminal homicides unrelated to the insurgency or sectarian fighting. A point in their favour is what I mentioned earlier, that a lot of the deaths in ones, twos, and threes are getting reported in the wire service daily accounts of events in Iraq.

When you read enough of these, you get the impression that there are some really “slow news days,” by Iraq standards, where the wire services really need these isolated, small-scale attacks to keep the reader from assuming that some days, very few die violently in Iraq.

But a fair argument can be made that the authorities aren’t releasing all the deaths they know about, and in fact are being really crafty by releasing some of the small-scale, away-from-Baghdad-and-its-environs attacks to make it look like full disclosure. I can’t rule that out, and confess to having my suspicions that it is going on. However, I don’t believe any withholding of death reporting, if it is occurring, gets us anywhere near the Lancet’s 15,000 a month toll.

“And if they said they used `precision` ordnance we cannot know if that ordnance was witnessed on the ground hitting a house, because we don`t know if a journalist could have been there to see it.”

True, but we are getting wire service reports from time to time where family members (and at times the Iraq government itself) accuse the U.S. of killing non-combatants in air strikes. We aren’t seeing very many, but the ones I have read lead me to believe the Iraqis aren’t shy about accusing the Americans when their air strikes kill people they shouldn’t. If we weren’t seeing any such complaints, I think your point would have more weight.

The actual Lancet I study language only makes a claim about the *overall* death rate increasing to 7.9 per thousand over the baseline of 5 per thousand. It makes no claim about any increase in violent or non-violent death rates. Why? Because the sample sizes were too small.

If you look at the other chart you will see an apparently large increase in the death rate for violent death, and moderate increases for accidents and infectious diseases. In the aggregate these increased death rates permitted a statistical claim that the overall death rate had increased over the pre-invasion baseline in Lancet I. But the study could not claim that any of the individual death rates had increased.

Note the huge increase in accidental death in Lancet I. Let’s say that 6 of those 13 post-invasion accidental deaths were recategorized as violent deaths (e.g. a crash caused by gunfire).

You could reduce your non-violent excess deaths to 0 in Lancet I, which is what you claim is the case in Lancet II. Sampling is by clusters, and I believe each of those deaths is weighted by the cluster in which it is located. If those 6 recategorized deaths are located in high-population clusters, they should have a big weight.

So the death breakdowns by category in Lancet I are very sensitive to small changes in the underlying sample. That is why they are not statistically significant. And that is why any difference between Lancet I and Lancet II with respect to violent vs. non-violent death categories probably is of minimal importance.

“He does not claim that the category breakdowns are statistically significant.”

Chew, I’ve been over that ground several times, here and at Deltoid. It’s my argument that the increase in non-violent death measured by Lancet 1 is statistically relevant. Statisticians are telling me it isn’t. Probably, in terms of rigid statistical methodology, they’re right. In my view, if it isn’t “statistically significant,” it certainly is “significant,” and I doubt you’d find any reasonably intelligent, objective layperson who would say otherwise. As I’ve pointed out on several occasions, in 2004, knowledgeable defenders of Lancet 1 were certainly defending the integrity of the non-violent death subsets, and their overall contribution to the excess death point estimate.

I’m not sure why I’m bothering to respond to you at all Chew. Your post 179 insinuated that I was mistaken in relation to the 40,000 excess non-violent deaths from Lancet 1. Instead of coming back and acknowledging I was right, you veer off on another tack, one that I’ve been over repeatedly with other posters here and at Deltoid.

I’ve shown you that the 40,000 non-violent claim could possibly be reduced to 0 excess deaths by recategorizing a very few of the deaths in the raw data.

Lancet I estimated approx 100,000 excess deaths with a very wide confidence interval. Whether the non-violent component of that 100,000 was 40,000 or 0 we can’t say from the data provided. Change a few deaths and you get 0 non-violent excess deaths.

Garfield’s claim was more than the data could support and was not a claim of statistical significance.

“It is surprising that beyond the elevation in infant mortality and the rate of violent death, *mortality in Iraq seems otherwise to be similar to the period preceding the invasion*. This similarity could be a reflection of the skill and function of the Iraqi health system or the capacity of the population to adapt to conditions of insecurity.”

“Note the huge increase in accidental death in Lancet I. Lets say that 6 of those 13 post accident deaths were recategorized as violent deaths, (e.g. crash because of gunfire).”

Chewie, are you for real?

I have another way of describing your “recategorizing” of the deaths. It goes something like “changing what people told the interviewers back in 2004.” Which is another way of saying “changing the data itself, because Chewie doesn’t like it.”

This is an absolutely absurd argument, but if you get to play, then so do I. I think I’m gonna take, say 5 of the 9 coalition attributed violent deaths, and change them to heart attack deaths, just because I want to reinforce the statistical significance of that 40,000 non-violent excess death figure.

“Lancet I is claiming no significant increased non-violent death rate after the war. This is consistent with Lancet II.”

“Yawn.”

I’ve been over that with Philosopher, quite some time ago. Here was my final thought on the matter:

Let me see if I have this right, Philosopher. Lancet 1 “makes no claims about non-violent deaths increasing,” but adds 40,000 excess non-violent deaths to its excess death point estimate, without which the study could not achieve its headline-grabbing 100,000 figure. These 40,000 excess non-violent deaths are only statistically relevant because they can piggy-back onto the excess violent ones, even though the 40,000, by themselves, represent a 22% increase over the baseline mortality estimate for the 18-month period.

From a non-statistician’s perspective, that’s a pretty neat trick, even if it is “statistically sound.”

“Another reason not to rely on the Garfield slide.”

Whatever you say, Chewie. I mean, just who the hell is this Garfield fellow anyway? It isn’t like he’s one of the study authors, right? Oh wait….., hang on a bit……., I guess he is.

I apologize for not acknowledging your comment earlier. Thanks for the vote of confidence in Brownie and me. It’s a shame you weren’t here a little earlier. While some of what you touched on has been hashed out, I get the sense you have some additional insights which would have enhanced the discussion.

We’re approaching 200 comments on a topic that should have been closed by #2. If I’ve ever seen sample estimates that say “cannot reject the null hypothesis” more clearly, I can’t recall it. But we’re trying to explain this to people who don’t know what a null hypothesis is and refuse to find out.

Given the time Mike H and others are putting into this, a day or two with a good intro stats book would be a really good investment. It might not resolve all the questions, but it would make it possible to conduct a coherent debate, which can’t happen as long as one side is wilfully ignorant of the basics.

“I have another way of describing your ‘recategorizing’ of the deaths. It goes something like ‘changing what people told the interviewers back in 2004.’ Which is another way of saying ‘changing the data itself, because Chewie doesn’t like it.’”

No. Think of it as taking another survey (call it Lancet II) in which you had 6 less deaths from accidents and 6 more for violence. Or think of it as there was ambiguity about whether to classify a death as an accident or violent and you had a switch of 6 in the original or a second survey.

In either case a small change in the underlying sample data, one that could occur by chance over multiple samplings even though the underlying population was constant, would eliminate the supposed “40,000” non-violent deaths that you claim are so significant. That’s why the concept of statistical significance is important.
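The sensitivity being described here can be sketched numerically. The calculation below uses a normal approximation to the Poisson 95% interval, which is an assumption for illustration only, not the study's own method:

```python
import math

# With only 13 accidental deaths in the sample, recategorizing 6 of
# them as violent (leaving 7 accidental) stays well inside sampling
# noise. Normal approximation to the Poisson 95% CI for a count k:
k = 13
half = 1.96 * math.sqrt(k)
ci_low, ci_high = k - half, k + half   # roughly (5.9, 20.1)

# A count of 7 lies inside this interval, so the original and the
# recategorized versions of the data cannot be told apart statistically.
indistinguishable = ci_low < 7 < ci_high
```

In other words, counts this small carry confidence intervals wide enough that moving a handful of deaths between categories changes nothing that a significance test could detect.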

“This is an absolutely absurd argument, but if you get to play, then so do I. I think I’m gonna take, say 5 of the 9 coalition attributed violent deaths, and change them to heart attack deaths, just because I want to reinforce the statistical significance of that 40,000 non-violent excess death figure.”

If the confidence intervals for subgroups are so wide that you can’t make any serious statistical claims about them, then presumably that also goes for the subgroup of excess violent deaths in Lancet 1, and Lancet 2 for that matter. If you’re going around reclassifying, as you state above, then why not reclassify all the deaths recorded as being violent in Lancet 1 as in fact accidental, and thereby claim that the coalition killed no one? (Or, so as not to be silly, and factoring in the deaths that we *absolutely know* were violent, due to illness etc. – that 90,000 of them could have been accidental, and only 10,000 were killed by the coalition forces?)

I ask this question seriously, but I admit to not being a statistician. These ‘violent deaths’ subgroups are the largest subgroups, but the ‘excess violent deaths’ subgroup in Lancet 1 isn’t much bigger than the ‘excess non-violent deaths’ one, and my understanding is that you’re saying you can’t make any serious statistical claims about the latter.

brucer, thanks for your clear, helpful and well-informed answer. You’ve convinced me that the 3rd MAW didn’t drop 500KT in a 9-month period in 2004, even though that’s what the official press release says.

And thanks also for explaining my math error. I think that maybe I unconsciously saw that 500/.75 gave an answer too “good” (from the perspective of my argument) to be true, so I approached it the other way.

There are still a few unsolved mysteries that you could probably help unravel (although they are much less important than the issues you did already resolve, so it’s understandable if you decide you have something better to do).

I still find it odd that the press release, once hosted on at least two official sites, can no longer be found there. What I would expect is to find it there, with a simple correction.

I wonder how you knew about the press release, and why you called it “famous,” since I can hardly find anyone other than Hersh who noticed it.

I’m surprised that I can find no one (other than you) pointing out the error in the press release. I’m also surprised I find no one (like righty bloggers, for example), pointing out that Hersh failed to notice the error.

You had also said this: “much of this thread involves a discussion of how much ordnance has been dropped.”

It’s not a question of “parlance.” It’s a question of understanding that reporting a given number of sorties is not the equivalent of reporting a given amount of ordnance dropped, unless one is in a position to make a statement about amount of ordnance dropped per sortie.

You’re a deep thinker. You should have no trouble grasping such a simple concept.

One person on this thread raised that question (of trying to assess amount of ordnance dropped, rather than strictly focusing on number of sorties). That’s me. One person on this thread answered that question. That’s brucer. You, on the other hand, have been no help whatsoever.

#184: “I don’t believe any withholding of death reporting, if it is occurring, gets us anywhere near the Lancet’s 15,000 a month toll.”

As someone recently said around these parts: sounds like one of those “gut feel” conclusions we should be studiously avoiding.

“We’re approaching 200 comments on a topic that should have been closed by #2. If I’ve ever seen sample estimates that say ‘cannot reject the null hypothesis’ more clearly, I can’t recall it. But we’re trying to explain this to people who don’t know what a null hypothesis is and refuse to find out.”

Given there are about half-a-dozen contributors to this thread who claim to be statisticians, why did it take until comment #192 before someone mentioned the words “null hypothesis”?

I have a stats background, and I’ve made a living conducting market and social research surveys for more than 25 years. […] I’m also very worried about the fieldwork itself. I believe the reported refusal rate was 0.8% (I can’t find this in the report itself, so feel free to correct me). This is simply not believable. I have never conducted a survey with anything like a refusal rate that low, and before anyone talks about cultural differences, there are many non-cultural reasons for people to refuse to participate. If my survey was in a war-zone, I would expect refusal rates to be higher than normal.

“If you’re going around reclassifying, as you state above, then why not reclassify all the deaths recorded as being violent in Lancet 1 as in fact accidental, and thereby claim that the coalition killed noone?”

My “reclassifying” was to show that a few changes in the underlying data could eliminate the “40,000” non-violent excess deaths that Mike H is obsessing about.

As to your underlying question: in Lancet I you could claim a statistically significant increase in the overall death rate, but not in the subgroups. You can’t tell whether the overall rate increased because the violent rate increased, or the accident rate increased, or the infectious disease rate increased, or some combination thereof, although they had data showing increased deaths in each of those categories. So your claim that all deaths could be caused by increased accidents or heart disease cannot be excluded by their results. But is it plausible? Not very. If you want to say that that absolves the invaders from any responsibility then that’s up to you. I wouldn’t, since the occupying power under the Geneva Convention has the obligation to provide security.

In Lancet II, they found both a statistically significant increase in the overall death rate, and a significant increase in the *violent* rate also. So in that case you couldn’t claim it was all due to accidents.

This is consistent with our observation that violence and chaos have been increasing in Iraq.

“Given there are about half-a-dozen contributors to this thread who claim to be statisticians, why did it take until comment #192 before someone mentioned the words “null hypothesis”?”

Because, for those of us who know what we are talking about, “statistically significant difference” means “sufficient to reject the null hypothesis of no difference.” Since a reasonable number of non-statisticians have at least some idea of what “statistically significant” means, we’ve tried to make it easy on you and others by avoiding the complexities of classical inference theory, but to no avail. I repeat that you really need to understand this stuff if you’re going to engage in debate about sample surveys.
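For readers following along, the link between "statistically significant" and rejecting the null hypothesis can be shown with a toy two-rate comparison. Every count and person-year figure below is invented for illustration; none of them are from either Lancet survey:

```python
import math

# Toy comparison of two mortality rates. "Statistically significant
# difference" means the z statistic is large enough to reject the null
# hypothesis of no difference. Hypothetical counts and person-years:
deaths_pre, py_pre = 46, 8_342
deaths_post, py_post = 302, 20_580

rate_pre = deaths_pre / py_pre
rate_post = deaths_post / py_post

# Standard error of the rate difference (Poisson counts assumed)
se = math.sqrt(deaths_pre / py_pre**2 + deaths_post / py_post**2)
z = (rate_post - rate_pre) / se

# Two-sided test at the 5% level: |z| > 1.96 rejects the null
reject_null = abs(z) > 1.96
```

When `reject_null` is true, that is all "statistically significant" means; when it is false, the data cannot distinguish the two rates, however large the raw difference looks.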

I think you’ll find that claimed response rates in the 98% and 99% range are not uncommon.

In addition, the 2004 Iraq Living Conditions Survey reported a response rate of 98.5%.

So here’s an interesting lesson: James has 25 years of practical experience in the field of market and social research surveys, and declared the response rate in the Burnham study “simply not believable.” But his intuition, built up over those 25 years, appears to have been wrong. Doesn’t that suggest that our intuition based on experience in more developed countries may not transfer well to Iraq, and that arguments from incredulity are insufficient?

“Given the time Mike H and others are putting into this, a day or two with a good intro stats book would be a really good investment. It might not resolve all the questions, but it would make it possible to conduct a coherent debate, which can’t happen as long as one side is wilfully ignorant of the basics.”

That’s pretty rich John, coming from someone who believed the UNDP ILCS was a passive count rather than a massive cluster survey. You might want to give your friendly neighbourhood glass man a call.

While we’re on the subject of “coherent debate,” it’s rather difficult to have one when people like you avoid addressing those points being made by the other side which make you uncomfortable.

For the third time, I’ll ask you for an explanation for the discrepancy between the ILCS’ 24,000 “war-related” deaths in the first 12 months of invasion, and Lancet 2’s 90,000 excess violent deaths for the first 14 months.

I’ll also ask you once again for an explanation as to how a country the size of Iraq (or any country for that matter) can undergo a minimum of 425,000 violent deaths in little more than 3 years time, and yet not have the non-violent mortality rate rise in a “statistically significant manner.”

Chewie, I promised myself I wouldn’t bother responding to anything else you post, but you keep stretching the truth further and further with every new comment you toss up.

“My “reclassifying” was to show that a few changes in the underlying data could eliminate the “40,000” non-violent excess deaths that Mike H is obsessing about.”

You can’t change what the interviewees reported in 2004. That’s not an option. Even though it’s already been pointed out by Lopakhin and myself, your tunnel vision prevents you from seeing that “what ifs” are a two-way street. If you get them, so do I.

“One person on this thread raised that question (of trying to assess amount of ordnance dropped, rather than strictly focusing on number of sorties). That’s me. One person on this thread answered that question. That’s brucer. You, on the other hand, have been no help whatsoever.”

I didn’t realize it was mandatory that one fully dive into every tangential discussion that crops up in this thread. If that’s the case, you’d better pull up your socks in relation to the discussions I’ve been involved in. You’ve been no help whatsoever.

#184: “I don’t believe any withholding of death reporting, if it is occurring, gets us anywhere near the Lancet’s 15,000 a month toll.”

“As someone recently said around these parts: sounds like one of those “gut feel” conclusions we should be studiously avoiding.”

As I recently said around these parts, a year from now, you and the rest of the Lancet true believers can tell us why just under a million dead by violence isn’t a “gut feel” conclusion. It’s “statistically sound.”

Mike H, your second question first: this is purely dependent upon confidence intervals, which are determined by the size of the samples, not the number of deaths which occurred. Let me give you an example of two samples of 100 deaths, so that you can see just how crazy the effect of confidence intervals can be. In pre-war Iraq, 100 deaths would occur in probably 18000 individuals, or maybe 4000 households (these are guesses based on that 5.5/1000 death rate and 5 people per household). Now suppose that 40 of those deaths were non-violent. The confidence interval around these 40 deaths can be calculated (because they are assumed to be from a Poisson distribution) from tables to be 29 to 55.

Suppose now I conduct another survey of the same size, and I want to see if the rate of non-violent deaths has changed (up or down, for now). To do this I just compare the point estimate from the new sample with the confidence interval from the old one, and if the new result lies inside the old one, there is no statistically significant difference. So if my new sample has a number of non-violent deaths between 29 and 55, I cannot detect a change. But a change from 40 non-violent deaths to 55 non-violent deaths is huge, as a rate in real terms. To check for only an increase, I would probably need to see a change from 40 to 60 in order to find a statistically significant difference (I don’t know how to calculate these CIs quickly, sorry).

This is without considering the effect of survey sampling, in which households are assumed not to be drawn from a properly simple random sample. In this case even a moderate design effect will require that the number of non-violent deaths jump from 40 to 70 before you can conclude that there was a significant difference. And remember this is after sampling 4000 households, not 1800. So you can see that statistical conclusions about subgroups are difficult to make with anything but the most massive of surveys.

(This example is obviously much simpler than the Lancet survey, which included many adjustments for sample surveys, unequal variances between groups, etc. But I hope you see my point)
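The tabulated interval of 29 to 55 for a count of 40 mentioned above can be roughly reproduced with the normal approximation to the Poisson distribution. This is a sketch under that approximation, not the exact table-based method:

```python
import math

# Normal approximation to the 95% CI for a Poisson count k.
# Exact tables give about 29-55 for k = 40; the approximation is
# close, though symmetric and slightly narrower.
def poisson_ci_approx(k, z=1.96):
    half = z * math.sqrt(k)
    return k - half, k + half

lo, hi = poisson_ci_approx(40)
# lo is about 27.6 and hi about 52.4 -- the same ballpark as 29-55
```

Any new count falling inside that interval cannot be distinguished from the old one, which is exactly the point about subgroup estimates being statistically insignificant.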

As for the ILCS vs. the Lancet, as I understand it the ILCS survey asked a different question, so you are comparing subtly different things. We don’t know whether an Iraqi householder would consider a death “war-related” if it was a revenge killing because his wife was an interpreter for the Coalition. He might simply think it was murder. Similarly, sectarian violence between Shiite and Sunni groups might not be construed as war-related, depending on what “war-related” was taken to mean. For example, some respondents might have thought that only meant deaths caused by direct firefights or bombings. If your daughter is run over by an American tank doing a peace-time patrol in Najaf, is that war-related or not? What if your son blew himself up making an IED? Would you tell the UN interviewer that your son blew himself up preparing an IED to blow up the UN offices? Was blowing up the UN offices war-related? The Lancet study avoids this definitional problem by classifying deaths as violent or non-violent, defining what that means and checking it with a death certificate. The language of excess deaths makes it clear how much additional death is due to Iraq’s change of circumstances, regardless of whether it is “war-related.”

This is the most important aspect of the study: a certain number of verified deaths occurred, and can be extrapolated to the population by a well-researched and supported statistical method. Whatever the ILCS found, whether different or not, doesn`t change that. The reason for the discrepancy must lie elsewhere, i.e. in the way the question was asked, different sample frames, etc. Neither study is necessarily invalidated by this problem.

Robert, response rates of 98% or 99% are uncommon. The claimed Lancet non-refusal rate is 99.2%. This is in a situation where:

the teams faced suspicion… lengthy explanations were required – explaining that it would help the Iraqi people – were necessary to allay fears (p.15)

And despite these lengthy explanations, the field team managed 40 interviews per day, at 99.2% participation.

The whole description of the procedure in the report seems odd. Candidates for interview were the head of the household or their spouse. Are we to conclude that there were no households where the head/spouse were not at home?

BTW, I can find no indication that the 2004 Iraq Living Conditions Survey “reported a response rate of 98.5%.” Do you have a link?

The media these days tell what is fact but not the truth. The news, for example, only gives quick updates on the war in Iraq and how many Americans died, but they do not tell how many Iraqis died during the war. The reason behind that is that they do not want to show how cruel America actually is. The media has control of what America can see. I think it is really messed up that the number of deaths due to air strikes is so high. It is like one day you are alive, and in a matter of minutes an airplane flies above you, and the next thing you know your family or neighbor is no longer beside you.

It’s unbelievable that with so many airstrikes in Iraq we aren’t getting the big picture and all the information of what is really going on there. We get bits of information from the media, and we’re always told when an American is killed, but we hardly ever get news of Iraqis who die. And with so many airstrikes, shouldn’t people logically believe that people are dying? Planes don’t just randomly drop bombs on empty fields. I just think the media should focus on what’s going on and stick with the facts instead of always trying to make America look like the “good guy,” because a lot of people already think this war is completely unnecessary.

Mike, the point is not that you’ve made mistakes (as you observe, so have I) it’s that you refuse to acquire the knowledge necessary to correct them, or to keep silent about topics you admit you don’t understand properly.

It’s not that hard to find and read a good introductory stats text. If you did that, it might be possible to discuss the issues further.

What are the confidence intervals around the estimates of violent deaths in the overlap period of lancet 1 and lancet 2? As I’ve said before, I wouldn’t be surprised to see that they overlapped, but if you want to rely on them, you should post them.

The CI range for violent mortality rate in the second study was 1.8-4.9 for Mar 03-Apr 04. I can’t easily find a CI for the first study, but the point estimate was 58 000 violent deaths, which I make as approximately 2.2 per thousand, based on a population of 26 million. If this is right, the point estimate for the first study lies within the CI for the second, so obviously the hypothesis of no significant difference can’t be rejected.
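The back-of-envelope check above can be written out explicitly. The death count, population figure, and CI are taken from the comment itself; treating the figure as a simple per-thousand ratio is the same simplification the comment makes:

```python
# Checking the arithmetic: 58,000 violent deaths against a population
# of roughly 26 million, expressed per thousand.
violent_deaths = 58_000
population = 26_000_000
rate_per_thousand = violent_deaths / population * 1000   # about 2.2

# Lancet 2's CI for violent mortality, Mar 03-Apr 04, as quoted above:
ci_low, ci_high = 1.8, 4.9
point_estimate_inside_ci = ci_low <= rate_per_thousand <= ci_high
```

Since the Lancet 1 point estimate falls inside the Lancet 2 interval, the hypothesis of no difference between the two surveys over the overlap period cannot be rejected, as the comment concludes.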

The fifth, an old report from Thailand, reported a response rate of 94%. So I stand by my claim that response rates of 98 and 99 percent are not uncommon.

James continued:

This is in a situation where:

the teams faced suspicion… lengthy explanations were required – explaining that it would help the Iraqi people – were necessary to allay fears (p.15)

And despite these lengthy explanations, the field team managed 40 interviews per day, at 99.2% participation.

The whole description of the procedure in the report seems odd. Candidates for interview were the head of the household or their spouse. Are we to conclude that there were no households where the head/spouse were not at home?

So what you’re saying is, you are arguing from incredulity and not evidence? BTW, on p. 4 of the Burnham article, they write: “In 16 dwellings, residents were absent; 15 households refused to participate.” So while the refusal rate was about 0.8%, the total non-response was about 1.7%.

“I can find no indication that the 2004 Iraq Living Conditions Survey ‘reported a response rate of 98.5%.’ Do you have a link?”

Page 13 of volume 1 of the ILCS report says 21668 out of 22000 HH’s were interviewed. They don’t mention whether a non-interviewed HH was because of refusal or absence.

So, let’s review one more time: you have 25 years of experience in the field, and yet your intuition about response rates that are “simply not believable” does not appear to be supported by the evidence. Shouldn’t that be a cautionary tale? Doesn’t that suggest that “unbelievability” in and of itself isn’t a sufficient argument?

#204, mike: “I didn’t realize it was mandatory that one fully dive into every tangential discussion that crops up in this thread”

The position you’re now taking is essentially this (paraphrase): “I/we didn’t answer your question (regarding how much ordnance has been dropped) because we had better things to do, and we had no interest in diving into a question we considered tangential.”

That’s a shift. The position you took earlier was essentially this (paraphrase): “you shouldn’t be asking that question (regarding how much ordnance has been dropped) because it’s already been answered.” Here are the words you used to communicate that idea (#147): “much of this thread involves a discussion of how much ordnance has been dropped.”

Trouble is, until I raised the question of “how much ordnance” had been dropped (#64, #111), that’s not what was being discussed. What was being discussed was how many sorties had been flown. That’s not the same thing as “how much ordnance” had been dropped, unless one is in a position to assert an ordnance/sortie ratio. Which is essentially what brucer helpfully did (#112, #151, #173, #178), in response to my question.

In other words, your claim (“much of this thread involves a discussion of how much ordnance has been dropped”) was false. Prior to brucer’s #112, no one here (aside from me) had made a claim about “how much ordnance” had been dropped.

I can’t imagine why you would make a false claim and then, when challenged, do your best to act as if you never made it. Here’s one possibility: the earlier claim was made by some other mike h. Is that what happened?

“you’d better pull up your socks in relation to the discussions I’ve been involved in. You’ve been no help whatsoever.”

You’re determined to repeatedly demonstrate that you have trouble with simple distinctions. Here’s why there’s been no need for me to “help” with the “discussions” you’re talking about: lots of other people are asking good questions and providing good answers. Not so with the question I asked. Only one person asked that question (me), and only one person answered (brucer).

“a year from now, you and the rest of the Lancet true believers can tell us … “

I pointed out that you were relying on a “gut feel” conclusion. Your defense was this (paraphrase): “it’s OK for me to rely on a ‘gut feel’ conclusion because I think you’re going to do that in the future.” Very impressive.

Just to answer a troll way back upthread – yes, the figures for urban Iraq are representative because, as stated, the top 6 cities contain a majority of the population. Yes, places like Muthanna Province are quiet, but nobody lives there.

Anyway, the “500,000 tonnes” claim has been thoroughly dealt with here. To be right, it would have required an impossibly high sortie rate, one that could not have been sustained long enough to deliver that much ordnance.
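A back-of-envelope calculation shows why. The figures below are illustrative assumptions on my part (a roughly three-year window and a generous two tonnes of ordnance per strike sortie), not sourced numbers:

```python
# Back-of-envelope check of the "500,000 tonnes" claim.
# All inputs are illustrative assumptions, not sourced figures.
claimed_tonnes = 500_000
war_days = 3 * 365            # roughly three years of conflict
tonnes_per_sortie = 2         # generous payload assumption per strike sortie

required_tonnes_per_day = claimed_tonnes / war_days
required_sorties_per_day = required_tonnes_per_day / tonnes_per_sortie

print(f"{required_tonnes_per_day:.0f} tonnes/day")    # about 457
print(f"{required_sorties_per_day:.0f} sorties/day")  # about 228
```

Even on these generous assumptions, the claim implies hundreds of strike sorties per day, against the roughly two per day recorded by the Air Force in late 2005.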

Your quote from the Iraq Living Conditions Survey simply says that they had a target sample and an achieved sample. It says nothing about the refusal rate. Samples are almost always described like this: “we set out to get x and we got y”. Sometimes y is greater than x!

“Your quote from the Iraq Living Conditions Survey simply says that they had a target sample and an achieved sample.”

Okay. But the DHS surveys indicate a number for the target, a number for the households that were found, and a number for completed interviews. That sure doesn’t sound like they were hunting for households until they got their quota. And 98% and 99% response rates in the DHS don’t appear to be uncommon.

So it sounds to me like your intuition, based on 25 years of work in the field, that a response rate of 1849 out of 1880 households is “simply not believable” doesn’t transfer well to other countries. Doesn’t that suggest that “unbelievability”, in and of itself, isn’t a sufficient argument?

“Mike, the point is not that you’ve made mistakes (as you observe, so have I) it’s that you refuse to acquire the knowledge necessary to correct them, or to keep silent about topics you admit you don’t understand properly.”

John, I’ve got a short story to tell, in relation to your admonishment that non-statisticians like myself should “keep silent” about studies like the Lancet Iraqi effort, and just take your word for things.

Shortly after the first Lancet paper was released, Richard Garfield, one of the study’s authors, gave an interview (the Epic interview I linked to in an earlier thread here). During that interview, Garfield claimed the survey data extrapolated a point estimate (with the Falluja data removed) of 30,000 Iraqis killed by coalition air strikes. This was cited by defenders of the study at Deltoid, and it seemed to be an accurate statement, given that one of the most prominent and headline-making assertions of the paper was its front-page claim that “violence accounted for most of the excess deaths and air strikes from coalition forces accounted for most violent deaths.” (As you may recall, this statement was later found to be highly misleading at best.)

Again, this seemed consistent with Garfield’s figure for coalition-caused air strike deaths. After all, the extrapolated excess violent death point estimate was 57,000, and 30,000 is a majority of 57,000.

Well, it turns out, John, that Garfield got it wrong: the actual extrapolation for air strike deaths worked out to about 17,000, not 30,000, and it was me, innumerate Mike, who caught the mistake. And for quite a while at Deltoid, no one knew what to make of the discrepancy I had pointed out, because, after all, Garfield was one of the study leads, well versed in statistics; the discrepancy must be explainable. Eventually, Tim Lambert sent Garfield an e-mail, and lo and behold, he had been wrong, and I had been right. You can confirm all of this with Tim or D Squared.

I am not bringing this up to toot my own horn, John. I do so to illustrate that it is sheer arrogance to suggest that non-statisticians are incapable of critiquing studies such as these in meaningful ways.

While we’re on the subject of non-statisticians vs statisticians, John, there’s a question that I’ve asked before, and I’m going to ask you again, and I’d really like you to give me the courtesy of a reply, rather than an ignore. It relates directly to your last 2 posts to me, and your criticism that I’m unable to grasp the statistical proof that the variances in the non-violent death subset from study to study are not statistically significant.

If it is a sign of innumeracy to fail to acknowledge a statistical non-significance or no effect for variances in the tens of thousands from study to study in the main non-violent subset, then what do you make of the numerates at Deltoid and elsewhere who were arguing vociferously in 2004 and 2005 that increases in smaller subsets of non-violent death in Lancet 1 were statistically meaningful? I don’t know much about statistics. They do. It seems to me they were committing the same sin you accuse me of, only on a more egregious scale.

“The position you’re now taking is essentially this (paraphrase): “I/we didn’t answer your question (regarding how much ordnance has been dropped) because we had better things to do, and we had no interest in diving into a question we considered tangential.””

And you’re accusing me of a ” shift,” and “false statements?”

Get rid of the “we.” I was referring to myself, not the collective. I feel no requirement to take part in every tangential sidebar issue that arises in this thread. Very few people do in a busy thread, and I don’t see a single other commenter here who has, including you. So where did you get the “we” from?

“That’s not the same thing as “how much ordnance” had been dropped, unless one is in a position to assert an ordnance/sortie ratio. Which is essentially what brucer helpfully did (#112, #151, #173, #178), in response to my question.”

Yes, Bruce was helpful, wasn’t he? Now, about that post #112 of his… Tell me, anon, do you recall in which post you made your exasperated “any takers” challenge to everyone? I’ll help you out: it was #142.

I think you can figure out which number comes first.

“I pointed out that you were relying on a “gut feel” conclusion. Your defense was this (paraphrase): “it’s OK for me to rely on a ‘gut feel’ conclusion because I think you’re going to do that in the future.” Very impressive.”

I guess you missed the nuance, anon. I don’t believe any objective person would have a “gut feeling” that nearly a million Iraqis will have died violently by this time next year, assuming Sunnis and Shiites don’t start slaughtering entire villages and towns, Algeria-style, in the near future.

In relation to the first half of your post, I’m wondering if its relevance to my argument is affected at all by the fact that, pre-war, both studies found violence to be nearly non-existent for statistical purposes. The pre-war mortality rate in both studies is made up almost entirely of non-violent deaths.

“As for the ILCS vs. the Lancet, as I understand it the ILCS survey asked a different question, so you are comparing subtly different things.”

Yes, I concede that, SG. But we don’t know if the subtle differences between the two surveys, and their findings, all work in favour of closing the gap between the two surveys to assist Lancet 2’s credibility. I suspect there’s a push and pull effect going on with the UNDP ILCS, in that it moves toward the Lancet numbers for violence in some ways, then pulls back in others. The net effect is impossible to say.

I’ll give you some examples, to contrast with the ones you’ve used.

The Lancet study is far more likely to record homicide deaths than the ILCS. I agree with that. However, to trot out the example I’ve used elsewhere: if a member of Saddam’s security services had been killed shortly after Saddam was overthrown, by people to whose genitalia he had been applying electric current before the war, I expect that the thug’s widow would categorize this death as criminal homicide if asked by a Lancet interviewer. If she was interviewed by a UNDP interviewer, I think it’s highly likely she’d call this a “war-related death,” although obviously this is something I find intuitive, rather than something I can substantiate. In either case, the death becomes part of the respective study’s violent death data.

As I’ve also mentioned before, the ILCS study seems intended to capture the deaths of Iraqi soldiers killed during the conventional combat which ended the regime in March and April 2003. That would further dilute the number of civilian deaths that make up the ILCS’s point estimate of 24,000 violent deaths.

By contrast, Lancet 1 recorded no deaths it could categorize as regular military, and if memory serves, the questionnaire was structured in a manner that made it unlikely soldier deaths would be captured. Lancet 2 seems to be in the same boat, since it requires the decedent to have lived in the household continuously for 3 months before the event. As was discussed at Deltoid in 2004, with invasion pending, most of the Iraqi army had likely been in barracks for quite some time before hostilities began. Whether this is true or not, we don’t know, although it seems reasonable.

I agree there are some unknowns which make comparison between the ILCS and Lancet 2 difficult, SG. But in my view, we know enough, and over a very similar time frame, to ask why we have two very different point estimates for violent death. Further, it’s worth remembering that supporters of Lancet 1 proclaimed the war-related death figure from the ILCS to be a vindication of the Lancet survey and its conclusions. They can’t pick and choose which Lancet study gets to vindicate, and which one gets to be ignored.

Anyway, thanks again for another of your thoughtful comments. I enjoy our discussions. By the way, if you have the time, I’d be interested in any thoughts, brief or otherwise, that you might have concerning my reply to you in post # 184.

“So you assert that that the CI overlap, but you can’t be bothered calculating them?” – James

Talk about flypaper for innumerates! Hint: the point estimate is the central point of the CI.

Mike, if commentators at Deltoid got things wrong, go there and point it out. And if you managed to point out an error, good for you. But that doesn’t help in this case.

I made at #2 a simple, and conclusive, point based on elementary stats, showing that there was no contradiction in the data on non-violent deaths. The same is true for violent deaths and for all deaths. I can’t defend against someone who doesn’t know basic stats and declines to learn.

Again, I point out that the effort involved is not that great. A day or two with a good book is all you need. Given that you have apparently been involved in this topic for years, it would be a worthwhile investment.

…the effort involved is not that great. A day or two with a good book is all you need.

Would that that were true. I don’t want to discourage Mike, but he needs to know what he is letting himself in for. From the Introduction to Hogg & Craig, 4th edition:

A point is to be chosen in a haphazard fashion from the interior of a fixed circle. Assign a probability p that the point will be inside another circle, which has a radius of one-half the first circle and which lies entirely within the first circle.

Sure, it’s easy when you know how. But that’s on page 3 for crying out loud. They get to point estimation on page 200. Day or two my arse.
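For what it’s worth, the answer to that exercise follows from the area ratio: assuming “haphazard” means uniform over the disc, p = (1/2)² / 1² = 1/4, wherever the smaller circle sits inside the larger one. A quick Monte Carlo sketch (using a concentric inner circle for simplicity):

```python
import random

# Monte Carlo check of the Hogg & Craig exercise: a point chosen
# uniformly inside a circle of radius 1 lies inside an interior
# circle of radius 1/2 with probability equal to the area ratio,
# (1/2)^2 / 1^2 = 1/4. With a uniform distribution only the area
# matters, so a concentric inner circle gives the same answer.
random.seed(0)
trials = 200_000
hits = inside = 0
while inside < trials:
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    if x * x + y * y <= 1:          # uniform point in the unit circle
        inside += 1
        if x * x + y * y <= 0.25:   # also inside the half-radius circle
            hits += 1
print(hits / trials)  # close to 0.25
```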

“Mike, if commentators at Deltoid got things wrong, go there and point it out. And if you managed to point out an error, good for you. But that doesn’t help in this case.”

How about Deltoid’s host himself, Tim Lambert? If he “got things wrong,” would that make a difference, John? Tim’s statistical acumen is beyond reproach.

I’m not singling Tim out to embarrass him. He has nothing to be embarrassed about by being cited in this context, and I don’t necessarily think he has “got things wrong.” I’m using him as an example because it emphasizes just how wrong you are in suggesting that accomplished statisticians are in agreement that non-violent deaths did not rise or fall, pre-invasion compared to Lancet 1.

Here’s one of Tim’s posts from May 16th, 2005, titled “Lancet/ILCS roundup”:

“Jim Lindgren agrees with me that the ILCS supports the Lancet study. He also raises some concerns about some of the numbers in the Lancet study:”

Lindgren: ‘I find it somewhat odd that heart attack and stroke deaths are up 64% in the later period, and accidental deaths are up more than 3-fold. And live births are up 33% in the later (War & Post-War) period, even though post-War pregnancies would not lead to live births until 9 months had passed, so the rate of having children would likely have to have jumped substantially more than 33% in the last half of the later period. Further, household size jumps from 7.5 in the earlier period to 8.0 in the later period.’

Lambert: “None of these increases seem unlikely to me. While the number of births was 33% higher, the time period after the war was longer, so the birth rate only increased by 10%. For this to happen, there would only have to be a 20% increase in the last nine months, and it seems that the overthrow of Saddam might make people more optimistic about bringing another child into the world. The increases in heart attacks could be caused by an increase in stress because of the war and the decline in medical services. The increase in car accidents could be caused by the breakdown in law and order and fear of crime. (For example, driving through intersections at high speed to avoid ambush by robbers.) Finally, the increase in household size seems to be an inevitable consequence of the number of births and deaths recorded.

Of course, Lindgren’s suggestion that people are forgetting to mention some deaths that happened before the invasion may still be correct, indeed, the ILCS found evidence that infant deaths were being under-counted and went back to do some re-interviews. “
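Lambert’s arithmetic here checks out, on the assumption (mine, as an approximation of the Lancet 1 survey periods) of a pre-war window of roughly 14.6 months and a post-war window of roughly 17.8 months:

```python
# Checking Lambert's birth-rate arithmetic. The period lengths are
# assumptions: roughly 14.6 pre-war and 17.8 post-war months.
pre_months, post_months = 14.6, 17.8

# 33% more births, but over a longer window, so the *rate* rose less:
rate_ratio = 1.33 * (pre_months / post_months)
print(f"post/pre birth rate ratio: {rate_ratio:.2f}")  # about 1.09

# Post-war conceptions only show up as births 9 months later, so the
# first 9 post-war months stay at the old rate; solve for the rate r
# needed in the remaining months to average out to the ratio above.
flat = 9.0
r = (rate_ratio * post_months - flat) / (post_months - flat)
print(f"required rate in the final months: {r:.2f}")  # about 1.18, near Lambert's 20%
```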

This is not an isolated instance of Tim defending the integrity of various subsets of non-violent death. As you can see, but will undoubtedly fail to concede, this passage from Tim firmly and conclusively rebuts your assertion that only a non-statistician would interpret the increase in non-violent deaths as “statistically significant.”

Hi Mike H, sorry about my failure to respond to your points in #184, this thread has grown unmanageably large and I missed them.

I think we are arguing a little at cross-purposes, though. My post concerning the presence of journalists in Iraq was partly speculation, and was written entirely to make the point that many people’s “gut” feelings are based on reporting which shows some (note this qualified word) suspicion of being patchy, incomplete, imperfect, etc. I don’t claim that the post was aimed at any particular comment of yours (and I’m certainly not going to drag my heels through 226 posts looking for examples of who used “gut” reasoning!). There have been many critiques of the Lancet paper, and a lot of them are based on the idea that if 500 deaths were occurring every day then those numbers would be reported faithfully in the media.

So anyway, in response to your points, I certainly agree with your suspicion that the US and Iraqi governments might be selectively releasing figures. After all, the US govt doesn’t do body counts. Also, suppose a casualty count of, say, 30 dead and 45 injured on one day gets into the news – if five days later 35 of the injured die of their burns, this doesn’t hit the news at all. The media only count the deaths that occur at the moment. In this regard the IBC website is very accurately named. I doubt that the recovery rate in Iraqi hospitals is very good at the moment. So I would say that even if the media were present all across Iraq it would be very easy for the govt to minimise the death rate by only reporting deaths which occur at the time of the event, particularly if they are reporting those deaths on a monthly basis. It takes months or years to compile death rates from hospitals, after all, even in functioning states, and journalists can’t just waltz into hospitals demanding to know how many people died today and from what causes.

Given that the US govt has a huge interest in minimising deaths in Iraq, and the Iraqi govt is its puppet, I don’t think we should underestimate the risk that these things are happening.

As for the possibility that journalists are not in remote areas but hear about all the deaths there because of Iraqi activism, this seems to me to be unlikely (please forgive me if I misunderstand your post). I admit that this is an argument from incredulity, but we all know that modern media operate by waiting for press releases, and I can’t imagine that people in war-torn regions of rural Iraq are much able to get those press releases to the media; and even if they could, they would be dismissed by western journalists as “not credible”. IBC requires that all reports be verified by an independent source, right? What independent source can verify the death of 15 children in a school in woop-woop-stan? So IBC throws it in the bin, the US govt takes every opportunity to discredit it (easily done – obviously it can’t be confirmed) and the western media don’t care because they weren’t there. No matter how the kids were killed, it probably won’t be reported.

Just in case this post doesn’t reply to yours and just appears to be a pointless rant, I shall reiterate that I think these points don’t serve to discredit the Lancet study. The Lancet study can be assessed on its merits, which to me are considerable. The role of journalists in reporting unseen deaths when they aren’t there may help to assuage people’s concerns that the deaths are too high, but the study is right or wrong for other reasons.

Mike H, as a follow-up to that long-winded response, I thought I might add something regarding the reporting of air strikes. I know of a theory of war which posits three dimensions – moral, mental, physical – and some argue that against insurgents winning the moral campaign is more important than winning the physical one. These people also argue that the insurgents know this and the US army doesn’t. But if the US army has even an inkling of this philosophy behind its “plan” (or lack thereof) in Iraq, it is in its interests to discredit reports of air strikes. So when a precision bomb precisely kills 15 kids in woop-woop-stan, if that report gets to the media unconfirmed by US troops on the ground, or by “independent” journalists at the school, the US army will immediately declare it a lie. I think we have seen evidence of this where they blew up a wedding on the Syrian border and claimed it was an insurgent hideout, until al Jazeera produced the wedding video, or photos, or some such. This is common behaviour for a force which knows the moral war is important, but hasn’t figured out that it is the killing part, not the reporting part, which is the important bit.

So I would argue that we need journalists on the ground in order to accurately report casualties – relying on second-hand reports from areas where they aren’t present is insufficient. Whatever one thinks of al Jazeera’s methods, its reporting from remote areas has exposed a lot of spin and falsehood in US reports – remember the govt denied soldiers had been kidnapped until al Jazeera reported it? They denied post-invasion casualties until al Jazeera showed some; they denied a helicopter had been shot down until al Jazeera showed it; and they lied about the wedding. Therefore I argue that journalists on the ground are important, and there is evidence that they are not there as much as they used to be.

Mike H, re: post 222. Your examples are good, but they only serve to confirm my point about the different questions. Since we aren’t Iraqis, we don’t know what Iraqis consider to be war-related. For any one imagined incident we could probably come up with counter-arguments to declare it war-related or not. e.g. I could say, *after years of living in a country with violent police, the wife in your example considers that sort of behaviour to be normal, and considers the response to be normal, not war-related.*

Since most people posting here have been surprised by the response rate in Iraq (because we don’t understand Iraqi culture), it’s likely we would be surprised by the reasoning behind a definition of war-related. Therefore we should expect that there will be differences between the surveys. I think that the Lancet is better for its purpose because a) it confirmed 80% of the deaths through certificates; b) it ruled out deaths of soldiers (as opposed to insurgents); c) the question it asked (did someone die) is directly related to the research question (how many people died). These three points are markers of a good epidemiological study – rule out confounders (b), match the question to the aim of the study (a and c). The ILCS question (how many people died in war) is maybe only broadly related to the study aim (what are your living conditions like), since arguably a better question to ask about living conditions is *has anyone in your house died*. Also the question involves an inherent uncertainty, because it does not confirm deaths through death certificates, and does not clearly define the term. This isn’t to say the answer is incorrect, just that it is not so reliable for answering the survey’s key question (what are living conditions like in Iraq).

So again, I maintain that *did someone in this house die a war-related death* is only part of the question *how has the Iraqi death rate changed since the invasion*. Therefore, we cannot expect comparable rates between the surveys.

Disclaimer: I have not read the ILCS in any detail. For that matter, I have not pored over every detail of the Lancet studies – I have only examined the general method and results.

don’t waste your time reading a stats book. I don’t think anyone needs to know much about stats to legitimately question these or any figures, and if you end up looking obtuse because you don’t know the difference between a Borel set and a sigma-algebra, well, people don’t have to answer you.

Ok, I had to step away for a while to do some serious travelling, and I have no idea if this thread now counts as basically dead or not. But just in case:

“Let me see if I have this right, Philosopher. Lancet 1 “makes no claims about non-violent deaths increasing,” but adds 40,000 excess non-violent deaths to its excess death point estimate, without which the study could not achieve its headline-grabbing 100,000 figure. These 40,000 excess non-violent deaths are only statistically relevant because they can piggy-back onto the excess violent ones, even though the 40,000, by themselves, represent a 22% increase over the baseline mortality estimate for the 18-month period.”

The mistake here is to think that the 100,000 number is itself derived from a calculation of the 40,000 number for non-violent, and from the 60,000 number for violent. But that’s not what’s going on. The 100,000 number is calculated on its own, and was done from a sample large enough to get the CI that has received so much discussion. The 40,000 number for non-violent deaths is derived from a smaller sample, and was not (given the baseline number of nonviolent deaths) reflective of a statistically significant difference in the number of non-violent deaths. Remember, in the pre-war period, the vast majority of deaths in this country of twenty-something million were from non-violent causes, so a fluctuation of 40,000 against that large background number might just not be — indeed, apparently wasn’t — statistically meaningful. But the 60,000 increase in violent deaths (within its CI) was also calculated independently, and did turn out to be statistically meaningful. This is the same basic idea we keep coming back to: it can be the case that a result on a sample is sufficient for significance for a whole population, even while failing to give significance for any substantially smaller subpopulation. So when you write like this

“adds 40,000 excess non-violent deaths to its excess death point estimate, without which the study could not achieve its headline grabbing 100,000 figure”

you’re missing that key point. They didn’t calculate the violent deaths, and then the nonviolent deaths, and add the resulting numbers together. Each of the three estimates — total deaths, violent deaths, nonviolent deaths — was the result of a separate, independent calculation. Two of the numbers represent point-values within a significant CI; and one of them doesn’t.
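A toy calculation may make this concrete. The counts below are entirely invented (they come from neither survey), and the test is the crude normal approximation z = (post − pre)/√(pre + post) for a rise in counts; the point is only that the total can clear the 5% significance threshold while the non-violent subset, on its own, does not:

```python
import math

# Illustrative (made-up) death counts from a hypothetical cluster
# sample, mirroring the structure of the Lancet 1 argument: the rise
# in TOTAL deaths can be statistically significant even when the rise
# in the NON-VIOLENT subset, taken on its own, is not.
def z_for_rise(pre, post):
    """Normal-approximation z for a rise in Poisson-like counts."""
    return (post - pre) / math.sqrt(pre + post)

pre  = {"violent": 2,  "non_violent": 78}
post = {"violent": 35, "non_violent": 85}

total_z      = z_for_rise(sum(pre.values()), sum(post.values()))
violent_z    = z_for_rise(pre["violent"], post["violent"])
nonviolent_z = z_for_rise(pre["non_violent"], post["non_violent"])

print(f"total:       z = {total_z:.2f}")       # ~2.83, above 1.96
print(f"violent:     z = {violent_z:.2f}")     # ~5.43, above 1.96
print(f"non-violent: z = {nonviolent_z:.2f}")  # ~0.55, NOT significant
```

Each z is computed from its own counts, just as each of the three Lancet estimates is computed independently; nothing is obtained by adding the subset results together.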

I also never got around to responding to soru at #159: “On the second, the peak of the distribution is right next to the lower CI bound, not up in the middle as would be the case for an unadjusted normal distribution.” Well, you’ve got a funny notion of “right next to” — if you look at the different scales on the two graphs, it’s actually _much further_ from the lower bound, than the point-estimate is from the lower bound of the first graph! But perhaps your point is just that point-estimates aren’t always to be found by taking the mean of the upper and lower bounds? If so, then, yes, that’s right for certain sorts of distributions. But I don’t see how that point bears on anything I said earlier. What I said earlier was that the point-estimate is much more likely than either of the bounds. That is still very true in both graphs that you linked to.

I understand your argument, and I don’t disagree that the 40,000 non-violent deaths is less statistically significant and less statistically robust than the overall point estimate for excess deaths, and the excess violent death point estimate.

However, there still remains an increase in reported deaths for 4 of the 6 non-violent subsets (chronic disorders and “other” stayed the same). And as you can see from my post #227 to John, one of the most prominent and statistically knowledgeable defenders of Lancet 1 not only accepts that the study recorded an increase in non-violent deaths, he makes arguments to bolster the plausibility of these increases.

The way I see it, Philosopher, the non-violent excess death extrapolation is a bit of a conundrum for defenders of the first study. In a pure statistical sense, it may not constitute an effect. Yet at the same time, the increase has to be defended, because no non-statistician would accept the survey’s 100,000 excess death point estimate if they were told that nearly half of it was considered statistically insignificant. That’s completely counter-intuitive to any non-statistician.

You may not agree with my wording of my paragraph that you cite to start your post, but I think it quite accurately reflects this dilemma for the study authors and their supporters.

Of course, there was a solution to this dilemma. The authors could simply have cited the increase in violent death. Had they done so, you and I wouldn’t be having this exchange. I think the reason why they chose not to do so is obvious.

“…because no non-statistician would accept the survey’s 100,000 excess death point estimate if they were told that nearly half of it was considered statistically unsignificant.” The point is that it’s just plain wrong to say that “nearly half of it was considered statistically insignificant”. The 100,000 number is significant _on its own terms_. Again, that number is _not_ arrived at by calculating the increase in violent deaths, and then calculating the increase in non-violent deaths, and then adding them together. If that was how it was arrived at, it would indeed be a mistake to get the 100,000 number, for just the reasons you cite. But the 100,000 estimate stands or falls by itself, independently of any results for subpopulations.

Note that if your subtractive reasoning made sense, then we could presumably break that 100,000 number into a very large number of statistically-unmeasurable sub-sub-subpopulations (“persons aged 47 who died of a heart attack between 3pm and 4pm on a Thursday”); there would be nothing statistically significant to be said about any of those sub-sub-subpopulations, but every excess death would have been put into one of them, and so it would follow (by your reasoning) that there is nothing significant to be found about the whole! But I hope that it’s clear how such reasoning is going astray. Well, what doesn’t work for 100,000 subpopulations doesn’t necessarily work for 2, either. That subtractive reasoning may seem intuitively fine, but it is not correct, because going down to subpopulations may reduce the sample size such that the resulting statistics do not yield significance, even when the claims about the whole population are significant.

Someone tried much earlier to explain this in terms of an example of surveying Bush supporters, where you get 1 man and 2 women one time and 2 men and 1 woman the other time, and I think that the explanation there was pretty much on the money.

As for Tim’s earlier defense of the 40,000 figure, I think you’re misconstruing what was going on there. His arguments were only meant to show that such a number was basically _plausible_ — not that it was an iron-clad result of the study. (At least, that’s the case with the text you quoted.) And it seems like it would indeed have been an entirely plausible result — it was not a crazy possibility, after all. It just turns out not to have been the case. So, it was not a good critique of the Lancet I study to say, “40,000 more nonviolent deaths? Inconceivable!” (which is something that lots of people at the time were saying). There are many other values that would have been plausible as well; plausibility is, all things considered, a fairly low standard to meet, but some folks wanted to argue that Lancet I didn’t meet it, and they were incorrect in doing so.

I would also point out that this is perhaps not an entirely accurate thing for you to say at this point: “one of the most prominent and statistically knowledgable defenders of Lancet 1 not only accepts that the study recorded an increase in non-violent deaths, he makes arguments to bolster the plausibility of these increases.” The use of the present tense there is perhaps misleading, since I don’t think that Tim _now_ defends those numbers as the best estimates, in light of the greater evidence of Lancet II. (Certainly quoting text from 2005 does not show that he does so.)

“The point is that it’s just plain wrong to say that “nearly half of it was considered statistically insignificant”. The 100,000 number is significant on its own terms.”

Yes, the 100,000 number is significant on its own terms, Philosopher, but the composition of that figure is also significant, and the authors realize that, hence their efforts to apportion it into subsets.

Put yourself in my place, as a non-statistician. I dare say there are many more of me than there are of you. If you’re Les Roberts and associates, your goal is to change public perception and public opinion in relation to Iraq. Naturally, that means you have to appeal to the reasoning processes of non-statisticians, since we’re the overwhelming majority.

Well, you can’t convince us your study accurately reflects mortality in Iraq after regime change if you’re not willing to break things down a bit beyond a gross, best-estimate aggregate number of excess dead. This is, after all, a war-related study. If there wasn’t a war involved, Roberts and Co. probably wouldn’t have done a study. War generally means you have a significant number of dead from bullets and bombs. If a mortality study can’t reliably separate the “bullets and bombs” victims from those who died from peacetime causes, then non-statisticians like me tend to take a jaundiced view of the entire study.

“Note that if your subtractive reasoning made sense, then we could presumably break that 100,000 number into a very large number of statistically-unmeasurable sub-sub-subpopulations (“persons aged 47 who died of a heart attack between 3pm and 4pm on a Thursday”)…”

I think you’re getting a bit hyperbolic, Philosopher. In any event, others and I criticized the subsets of death for their volatility and unreliability for extrapolation back in 2004. That didn’t stop defenders of the study from arguing that the data at the subset level made sense, probably were reasonably accurate, and were suitable for extrapolation, which was then debated ad nauseam.

“….there would be nothing statistically significant to be said about any of those sub-sub-subpopulations, but every excess death would have been put into one of them, and so it would follow (by your reasoning) that there is nothing significant to be found about the whole!”

I don’t follow that logic, Philosopher. If you can break numbers down into minute subsets, you can reconstitute them upward again. Nothing changes. You aren’t changing the data used to break things down further, and vice versa.

“As for Tim’s earlier defense of the 40,000 figure, I think you’re misconstruing what was going on there. His arguments were only meant to show that such a number was basically plausible—not that it was an iron-clad result of the study. (At least, that’s the case with the text you quoted).”

I’m not misconstruing, and your parenthetical qualification kicks in here. I have many other examples of Tim arguing that the non-violent subsets reflect an increase in non-violent mortality, post-invasion.

“And it seems like it would indeed have been an entirely plausible result—it was not a crazy possibility, after all. It just turns out not to have been the case.”

Then Tim has it wrong, although I don’t believe he has, at least not when considering the implication of the first Lancet study’s data. It’s much easier for the second study to assert an unequivocal statistical no effect for non-violent excess mortality. It extrapolated a similar number of non-violent excess deaths, for a period more than twice as long as that covered by Lancet 1.

“The use of the present tense there is perhaps misleading, since I don’t think that Tim now defends those numbers as the best estimates, in light of the greater evidence of Lancet II. (Certainly quoting text from 2005 does not show that he does so.)”

That’s cheating, Philosopher. We don’t get to hop in a time capsule to revise earlier opinions and arguments, especially when Tim didn’t know if there would be a “Lancet 2” when he voiced these opinions and arguments. I doubt Tim was a less qualified statistician then than he is now. If your argument is as statistically sound and statistically obvious as you assert, I find it incomprehensible that Tim would behave in such an apostate fashion.

I think it is quite common for statisticians to get carried away with the validity of subgroups in a study, and to act as if the point estimates of the subgroups are just as accurate as the main conclusion, even though this is not really the case. It’s a common example of scientists taking an overly positive view of their own work.

That said, I don’t know exactly how the research team described these issues, so I won’t comment anymore on it. I would expect that in the first study, given the width of the confidence interval, the subgroups are non-significant. But bear in mind that this is purely a statistical artifact, once the overall figure is significant. To have a statistically significant point estimate of 100,000 excess deaths you must have some positive numbers in the subgroups of violent and non-violent deaths – you just can’t say with any precision what they are. I think the second study (2006) serves to improve this accuracy.
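A toy calculation makes this concrete. Every number below is invented for illustration (these are not the Lancet figures): under rough independence, the total’s standard error is the root-sum-square of the subgroup standard errors, so the aggregate interval can exclude zero while neither subgroup interval does.

```python
import math

# Invented point estimates and standard errors (thousands of excess
# deaths) for two subgroups; these are NOT the Lancet figures.
est = {"violent": 60.0, "non-violent": 40.0}
se = {"violent": 35.0, "non-violent": 25.0}

z = 1.96  # two-sided 95% normal critical value
for k in est:
    lo, hi = est[k] - z * se[k], est[k] + z * se[k]
    print(f"{k:12s}: {est[k]:6.1f}  95% CI ({lo:6.1f}, {hi:6.1f})")

# Treating the subgroups as roughly independent, the total's standard
# error is the root-sum-square of the parts.
total_est = sum(est.values())
total_se = math.sqrt(sum(s * s for s in se.values()))
lo, hi = total_est - z * total_se, total_est + z * total_se
print(f"{'total':12s}: {total_est:6.1f}  95% CI ({lo:6.1f}, {hi:6.1f})")
# Both subgroup intervals straddle zero; the total's does not.
```

With these made-up inputs, each subgroup interval straddles zero while the total’s excludes it – exactly the pattern described above: positive components that individually cannot be pinned down.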

But the biggest fallacy critics of the methodology of the study (as opposed to its value in decision making) have made in this regard is to say it is a dodgy study because it is imprecise. This is not a valid criticism, especially when those in govt who criticise the study refuse to fund a better one, and the only alternative study performed asks a much vaguer question. You’re right to say it may not convince non-statisticians, but this is probably why it was published in a medical journal and not a newspaper. And regardless of how convinced you are of the accuracy of the subgroups, does it matter whether they died from cholera or a bullet? Either way they’re dead because of our war, we just don’t know exactly how many we killed directly.

“you can’t convince us your study accurately reflects mortality in Iraq after regime change, if you’re not willing to break things down” Well, then I’m afraid we’re back to the point that Quiggin and others have gotten to repeatedly on this thread: y’all just need to learn some statistics. There’s just _nothing_ wrong with defending a claim about a population without any breakdown into subpopulations. Nothing at all. If you can’t see this at this point, with all the explanations and analogies that various people have offered here, then I don’t really know what more to say. In any field, there are always going to be claims that look intuitive to a non-expert that are simply wrong — if there weren’t, then there’d be no need of experts! — and this idea that a claim about a population _must_ be re-interpretable into a claim about subpopulations is wrong in just this way.

“Then Tim has it wrong, although I don’t believe he has, at least not when considering the implication of the first Lancet study’s data.” No, he doesn’t — because he was only defending the plausibility of the claim, not its truth. He’s just saying that the numbers were _sensible_ at the time, not that they’re the way things had to be according to the study. Again, it’s just meant as a defense against a line of “No way!” type objections, to indicate that the numbers are not implausible on their face. One can make such an argument without being committed to the for-all-time truth of the numbers themselves.

And my objection about your citing a 2005 quote in the present tense isn’t cheating at all — it’s directly relevant! The point is that he wasn’t defending those claims about the 40,000 as _true_, he was defending them as _plausible_; and it’s relevant to that point that he isn’t at all trying to bother with defending them now, when there is no dialectical need to.

“‘you can’t convince us your study accurately reflects mortality in Iraq after regime change, if you’re not willing to break things down’ Well, then I’m afraid we’re back to the point that Quiggin and others have gotten to repeatedly on this thread: y’all just need to learn some statistics. There’s just nothing wrong with defending a claim about a population without any breakdown into subpopulations.”

And we’re back to the point that me and Brownie have gotten to repeatedly in this thread; population samples that can’t be defended on a breakdown basis of the main subpopulations of violent and non-violent shouldn’t be considered reliable and accurate.

Beyond that, I agree precision can’t be expected without a huge sample size. But I expect a study to deliver reasonable precision on the violent versus non-violent split, where all interviewees can reasonably be expected to know whether a loved one died violently or not, and where a death that isn’t violent has no other option besides non-violence.

Philosopher, you’ve made the argument that the 100,000 figure is statistically sound, and the 60,000 figure similarly so. Whether you view it as “subtractive” reasoning or not, the 40,000 from the 100,000 have nowhere else to go. It’s worth mentioning, in my view, that while the researchers had some difficulty placing some deaths in the appropriate subsets below violent and non-violent, they experienced no such difficulty determining whether a death was violent or not.

“And my objection about your citing a 2005 quote in the present tense isn’t cheating at all—it’s directly relevant! The point is that he wasn’t defending those claims about the 40,000 as true, he was defending them as plausible; and it’s relevant to that point that he isn’t at all trying to bother with defending them now, when there is no dialectical need to.”

I disagree, Philosopher. First of all, you’re really splitting hairs between “plausible” and “true.”

Second, splitting hairs isn’t going to cut it anyway, as this second quote from Tim illustrates:

“Rummel then takes Iraqi health ministry figures showing 3,274 deaths in military and terrorist conflicts in six months and multiplies

(Rummel): ‘by 3 to get comparable time periods, which would mean about 9,822 civilians killed by comparison to lancet’s estimate of over 100,000; 38 percent due to the terrorists versus 4 percent for Lancet. Hmmmm.’

(Lambert): “This is, of course, comparing apples with oranges. The Lancet estimate of 100,000 is of excess deaths. As well as deaths in the conflict it includes the increase in murder, accidents and disease that followed the invasion. Furthermore, the health ministry numbers are guaranteed to be an underestimate, since not every death will be recorded by Iraqi hospitals.”

Tim Lambert – RJ Rummel vs. the Lancet study – Feb 13th 2005

I think that should settle this portion of our discussion, Philosopher. In addition, I’d like to remind you of my “apostate” reference to Tim in my earlier post. I don’t see why Tim would argue as he has on this issue if there was a well-known statistical principle he was violating by taking such a position, especially since not all of the critics he took on were innumerates like me.

So I’ll ask you, in keeping with this passage from Tim: is it fair for you to suggest to myself and others that “y’all just need to learn some statistics”?

“I think it is quite common for statisticians to get carried away with the validity of subgroups in a study, and to act as if the point estimates of the subgroups are just as accurate as the main conclusion, even though this is not really the case. It’s a common example of scientists taking an overly positive view of their own work.”

Thanks for that, SG. Naturally, I agree with you. I think you can understand then why I find it perplexing to be chided (albeit in a gentlemanly fashion) by Philosopher when I take on the subgroups myself.

“To have a statistically significant point estimate of 100,000 excess deaths you must have some positive numbers in the subgroups of violent and non-violent deaths – you just can’t say with any precision what they are.”

I agree again, on both counts. But there’s a difference between precision and rough numbers. If the study isn’t precise enough for even rough numbers for the two top subgroups, then the study should refrain from providing any figures for same. The study authors evidently decided Lancet 1 was capable of extrapolating rough estimates for the two main subgroups. On the other count, I’m not questioning the fact that Lancet 1 was absolutely correct in measuring an increase in the relative risk.

“But the biggest fallacy critics of the methodology of the study (as opposed to its value in decision making) have made in this regard is to say it is a dodgy study because it is imprecise.”

I don’t think it’s “dodgy” either, SG, and I know many critics have said it is, or worse. I don’t believe for a minute there was any fraud involved in the data-gathering process, and in the extremely remote chance that some of the field interviewers fabricated anything (again, I don’t see it), Roberts and the rest of his study leads would be unaware and blameless.

“And regardless of how convinced you are of the accuracy of the subgroups, does it matter whether they died from cholera or a bullet? Either way they’re dead because of our war, we just don’t know exactly how many we killed directly.”

No, it wouldn’t matter how they died, SG. “How many” matters to me even more, but I believe a study needs to convince me they’ve gotten the “how” reasonably close, at the top of the excess death chain, above bullets and cholera, before I can be convinced of the “how many.” This is a methodology that could easily contain 10,000 extrapolated excess deaths, derived solely from a single lightning strike into a farm field, killing 4 or 5 workers. As I’ve said many times before, going back to debates over Lancet 1, the methodology is statistically sound; my disagreement lies with the methodology’s ability to give us an accurate picture of both the “how” and the “how many” in a place like Iraq.

It isn’t enough for me to be told “they’re right that death has increased, so why won’t you accept the ‘how many’ they provide?”

I can agree with the former, without agreeing they’ve got the latter right.

“I don’t pretend to claim that this post refers to any comments of yours or not (and I’m certainly not going to drag my heels through 226 posts looking for examples of who used “gut” reasoning!)”

No? Well why not? Suck it up, man! :>)

“I certainly agree with your suspicion that the US and Iraqi governments might be selectively releasing figures. After all, the US govt doesn’t do body counts. Also suppose that a casualty count of, say, 30 dead and 45 injured on one day gets in the news – if 5 days later 35 injured die of their burns, this doesn’t hit the news at all. The media only count the deaths that occur at the moment.”

Good point on the count of those who die later from their injuries, although one of the Iraqi government-released monthly death figures is a combined morgue-hospital count, if I’m not mistaken. I think some of the deaths which occur days after would be captured by the hospital count, although I’ve no way of knowing for certain.

I think someone in the Pentagon let it slip very early in the conflict that the U.S. military was keeping a count of Iraqi deaths. Their numbers have never been made public, if in fact the Pentagon is counting.

“As for the possibility that journalists are not in remote areas but hear about all the deaths there because of Iraqi activism, this seems to me to be unlikely (please forgive me if I misunderstand your post). I admit that this is an argument from incredulity, but we all know that modern media operate by waiting for press releases, and I can’t imagine that people in war-torn regions of rural Iraq are very much able to get those press releases to the media; and even if they could, they would be dismissed by western journalists as ‘not credible’.”

I think the method of reporting is a function of the perceived danger posed to reporters, particularly western ones, although I’m not sure that it matters why the media there relies on briefings. I don’t think much of the western media likes George Bush or agrees with regime change. I’d be very surprised if the media would withhold reports of large scale violence in rural backwaters of Iraq, if they were occurring. I think many major media entities have an agenda, and that agenda is to undermine support for a continuing American presence in Iraq. Minimizing violence is counterproductive to that agenda.

I think it’s also important to consider scale when assessing how much violence is being missed outside Baghdad. Supporters of Lancet 2’s extremely high violent death estimate point to the 2,500–3,000 a month death toll that has been the norm in Baghdad since early 2006 as corroboration. Most seem to accept that the 2,500–3,000 is accurate, and helps press their case in relation to the overall huge Lancet 2 excess death estimate.

The final period covered by Lancet 2’s survey (June 05 – June 06) claims an average monthly toll from violence of about 22,000! Critics and supporters of the war alike seem to be in agreement that the worst of the violence is far and away in Baghdad, Anbar, and the Sunni triangle. You have to assume many thousands of dead every month in areas believed to be relatively quiet, to get anywhere near this 22,000 per month average.
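The arithmetic behind this scale argument is easy to check. Taking the round figures above (both approximate; the Baghdad number is the upper end of the commonly cited range), a sketch:

```python
# The round figures from the argument above, both approximate: the
# Baghdad number is the upper end of the commonly cited morgue range.
baghdad_per_month = 3_000
implied_per_month = 22_000  # the reading of the June 05 - June 06 period

# If both held, this many violent deaths per month would have to occur
# outside Baghdad, in areas generally reported as quieter.
outside_baghdad = implied_per_month - baghdad_per_month
print(f"implied deaths/month outside Baghdad: {outside_baghdad:,}")  # prints 19,000
```

That residual is the crux of the scale objection: the claim stands or falls on whether roughly 19,000 violent deaths a month outside Baghdad could plausibly go largely unreported.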

“These people also argue that the insurgents know this and the US army doesn’t. But if the US army has even an inkling of this philosophy behind their “plan” (or lack thereof) in Iraq, it is in their interests to discredit reports of air-strikes. So when a precision bomb precisely kills 15 kids in woop-woop-stan, if that report gets to the media unconfirmed by US troops on the ground, or by “independent” journalists at the school, the US army will immediately declare it to be a lie.”

As I mentioned in my earlier post, some of these incidents are getting reported, and not only by the decedents’ relatives. I’ve seen Iraqi officials bring claims of non-combatant deaths (from air strikes and otherwise) forward to the media as well. I agree we don’t know what percentage are getting into the media, but that goes back to the entire debate over all casualties and the accuracy of government and U.S. military released figures.

“I think we have seen evidence of this where they blew up a wedding on the Syrian border and claimed it was an insurgent hideout, until al Jazeera produced the wedding video, or photos, or some such.”

I also recall a foreign insurgent being interviewed early on in the conflict. He talked about getting into the country via the Syrian border, and how U.S. air strikes had been successful in partially interdicting the flow of foreign jihadists into Iraq from Syria. He freely admitted that whenever such an attack resulted in dead fighters, the insurgency would propagandize the deaths as non-combatants to any media willing to listen. I may still have the article saved on my hard drive. I’ll have a look for it.

“I could say, after years of living in a country with violent police, the wife of your example considers that sort of behaviour to be normal, and considers the response to be normal, not war-related.”

Yes, but we have to consider probabilities; not all scenarios are equally plausible. We also don’t know how many homicide deaths the study authors might be willing to assign to the same time frame as the UNDP study. Some of the overall extrapolated homicide deaths will fall into the 6 months after the UNDP study stops, so even if the UNDP study captured no deaths from homicide, the fact that the Lancet did is somewhat diluted by this difference.

“I think that the Lancet is better for its purpose because a) it confirmed 80% of the deaths through certificates; b) it ruled out deaths of soldiers (as opposed to insurgents); c) the question it asked (did someone die) is directly related to the research question (how many people died).”

I’m not that hung up on the death certificates issue, SG. I don’t think many interviewees would fabricate a death. But not capturing the deaths of soldiers makes it less likely to claim corroboration with the UNDP war related death figure, if the UNDP did capture soldier fatalities.

“The ICLS question (how many people died in war) is maybe only broadly related to the study aim…”

You’re right about that.

“So again, I maintain that ‘did someone in this house die a war-related death?’ is only part of the question ‘how has the Iraqi death rate changed since the invasion?’ Therefore, we cannot expect comparable rates between the surveys.”

Without regurgitating my entire argument, I disagree to some extent, SG. Obviously, “war-related” death is only a part of the overall mortality picture, since we can safely assume “war-related” will consist almost exclusively of violent deaths. So I agree with you there, that the death rate has other elements affecting it. Still, I think you can give the Lancet every possible concession (i.e., compare its lowest possible violent death extrapolated figure with the highest possible UNDP war-related number, exclude all homicide and soldier deaths from the UNDP figure, etc.), and there’s still a sizable gap between the bottom lines for both studies in terms of violent death.

Thanks for the advice concerning statistics studies. That’s two now, you and Kevin. I think you guys speak from experience, and I’d be wise to benefit from that.

Hope all of this makes sense, SG. If not, I blame some of it on the fact that it was a rush job. I’ve got to get myself off to work shortly.

“First of all, you’re really splitting hairs between “plausible” and “true.”” I’m sorry, but that’s just wrong. This of course happens to us philosophers all the time — a distinction that we think is important is pooh-poohed by others. But arguing that something is plausible is very, very different from arguing that it is, in fact, true. In particular, one can argue for the plausibility of a claim without being at all committed to the claim’s being actually true. And that’s just what’s going on with the bits of Tim’s text that you keep quoting. (For example, I’d be happy to defend the claim that it is _plausible_ that Iraq could have a stable democracy in 15 years; but I would certainly not be willing to argue right now for the claim that it is _true_ that Iraq will have such a government.)

I also don’t see anything in the second Lambert quote that is at all regrettable or now-looks-false-in-the-light-of-Lancet-II. What is supposed to be wrong about what he says there?

“Whether you view it as “subtractive” reasoning or not, the 40,000 from the 100,000 have nowhere else to go.”
Viewing the deaths as having to “go” somewhere is, again, to commit the kind of subtractive reasoning that I just criticized. To think that they have to “go” somewhere is to think that they already had to “be” somewhere in the original 100,000 number. But that’s just the mistake of subtractive reasoning. You can either defend the form of reasoning as cogent, or you can stop doing it and find a new argument. But please don’t just state it over and over again.

“It’s worth mentioning, in my view, that while the researchers had some difficulty placing some deaths in the appropriate subsets below violent and non-violent, they experienced no such difficulty determining whether a death was violent or not.” What they didn’t have any trouble categorizing was the deaths that they recorded. The issue about statistical significance has to do with the projection out from those recorded deaths. You could be completely certain about the ones you’ve observed, while not being sure how and to what extent they reflect the population on the whole. Again, the case of the male & female Bush supporters illustrates the relevant principle here nicely. (This also applies to this statement that you make: “But I expect a study to deliver reasonable precision on the former criterion, where all interviewees can reasonably be expected to know whether a loved one died violently or not, and where a death that isn’t violent has no other option besides non-violence.” Again, accuracy in coding the _sample_ does not entail a corresponding accuracy in the projections up to the _population_.)
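To make the sample-versus-population point concrete with invented numbers: suppose 21 of 46 recorded deaths are coded “violent” with complete certainty about every individual code. The projected population share still carries a wide 95% interval, and wider again under an assumed cluster-design effect (the counts and the design effect of 2.0 are both made up for illustration):

```python
import math

# Invented sample: 21 of 46 recorded deaths coded "violent", with
# complete certainty about every individual code.
n, violent = 46, 21
p = violent / n

# Simple-random-sample standard error for the projected share...
se_srs = math.sqrt(p * (1 - p) / n)
# ...inflated by an assumed design effect of 2.0 for cluster sampling
# (the design effect is made up for illustration).
deff = 2.0
se_cluster = se_srs * math.sqrt(deff)

for label, se in [("SRS", se_srs), ("clustered", se_cluster)]:
    lo, hi = p - 1.96 * se, p + 1.96 * se
    print(f"{label:9s}: share={p:.2f}  95% CI ({lo:.2f}, {hi:.2f})")
```

Perfect coding of the sample narrows nothing here: the uncertainty comes entirely from projecting 46 observations up to a population, and clustering only widens it.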

This exchange really crystallizes what’s been going on, even from the beginning of this whole thread:

me: “‘you can’t convince us your study accurately reflects mortality in Iraq after regime change, if you’re not willing to break things down’ Well, then I’m afraid we’re back to the point that Quiggin and others have gotten to repeatedly on this thread: y’all just need to learn some statistics. There’s just nothing wrong with defending a claim about a population without any breakdown into subpopulations.”

you: “And we’re back to the point that me and Brownie have gotten to repeatedly in this thread; population samples that can’t be defended on a breakdown basis of the main subpopulations of violent and non-violent shouldn’t be considered reliable and accurate.”

But continuing to stamp up and down on this point, even after we’ve tried so many times to explain to you why it’s wrong, does not really constitute much of an _argument_ on y’all’s part. Several different people have tried, in several different ways, to explain to you how statistics just doesn’t support the claim that you make in that passage.

Which is why, after a while, it’s hard for various of your interlocutors to avoid pulling out the “you need to learn some statistics” card. To which you responded, “So I’ll ask you, in keeping with this passage from Tim, is it fair for you to suggest to myself and others that ” y’all just need to learn some statistics?”” At some point, I’m afraid that it is. The problem is that you keep wanting this stuff to make sense in purely intuitive terms, without recourse to the actual substance of statistical reasoning. And we’ve tried with a great many different examples to make it as intuitive as we can. But at the end of the day, statistics is something that takes us beyond the intuitive, like any other specialized form of knowledge. And it is just not a legitimate demand on a specialist that he or she can make all his claims intuitive to you.

Suppose, for example, that someone didn’t know any scientific physics, and they said, “Hey, airplanes cannot possibly stay up in the air — metal is heavier than air, so it’ll fall right away.” And you can try to make the underlying physics, illustrations of the Bernoulli principle, etc. as intuitive as you can. But then suppose further that the person just said, “Well, that’s all very well, but you’re missing my point. The plane, see. It’s made of metal. And metal is heavier than air. So it’ll fall.” After enough cycles of this, you’ll want to say to the person, “hey, I’m sorry I can’t make you see how planes work, but _go and learn some physics_ and I think you’ll see for yourself.”

Note that this isn’t even the “hey, trust us, we know the statistics” argument (which is not always necessarily a bad argument, even if it is unavoidably condescending). The argument is “hey, you don’t even trust us — go out and learn the relevant statistics yourself, you can do it pretty quickly.”

“If the study isn’t precise enough for even rough numbers for the two top subgroups, then the study should refrain from providing any figures for same.” This is a case where it is _really_ important to keep in mind just where this was published. _The Lancet_ is going to presuppose a certain amount of statistical sophistication in its readership, as well as an understanding of the norms of scientific publication in this area. It is entirely standard for these sorts of studies in any other area to report things in just the manner that these particular studies have. If you find it misleading, then that is unfortunate, _but you are not part of that journal’s target readership_. As long as the authors are totally up front about what is or isn’t statistically significant, then there is absolutely nothing wrong in reporting the raw numbers. You’re trying to object to these particular studies, but in this particular argument you can only do so by attacking the scientific practices more generally.