This post contains some more notes on a reply to the badly flawed “Main Street Bias” paper.

In my previous post I showed that the MSB paper was wrong to claim that it was plausible that the unsampled region was 10 times as large as the sampled region. In this post I look at their model. Their model is wrong because it assumes that there is no main street bias within the sampled region, and because of this they massively overestimate any bias in the Lancet sampling.

Let’s start with a correct model of the situation. I’ve adopted their terminology where possible.

We have a population of size N divided into Ni people inside the survey space (Si) and No people outside the survey space (So). The death rate for people living in Si is Bi and Bo for people living in So. The overall death rate B is just the weighted average of Bi and Bo:

B = (Ni*Bi + No*Bo)/(Ni + No)

The bias introduced by using Bi instead of B is

R = Bi/B

If we let n = No/Ni and b = Bo/Bi, then writing R = Bi*(Ni + No)/(Ni*Bi + No*Bo) and dividing the top and bottom by Ni*Bi gives

R = (1+n) / (1+n*b)
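As a quick numerical check of the algebra, here is a minimal Python sketch that computes R two ways: directly as Bi/B from the weighted average, and from the closed form (1+n)/(1+n*b). The population sizes and death rates are arbitrary illustrative values, not figures from either paper.

```python
def overall_rate(Ni, No, Bi, Bo):
    """Population-weighted overall death rate B."""
    return (Ni * Bi + No * Bo) / (Ni + No)

def bias_direct(Ni, No, Bi, Bo):
    """R = Bi / B, computed from the raw populations and rates."""
    return Bi / overall_rate(Ni, No, Bi, Bo)

def bias_closed_form(n, b):
    """R = (1 + n) / (1 + n*b), with n = No/Ni and b = Bo/Bi."""
    return (1 + n) / (1 + n * b)

# Hypothetical inputs chosen only to exercise the formulas.
Ni, No = 1000, 2500
Bi, Bo = 0.02, 0.015
print(bias_direct(Ni, No, Bi, Bo), bias_closed_form(No / Ni, Bo / Bi))  # both roughly 1.217
```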

If we plot this function, we see that to get a significant bias
you must have both n significantly bigger than zero and b
significantly different from 1. Even in an extreme case where n and b
are both 2 (i.e. two-thirds of the houses are missed and the death rate
is twice as high in the unsampled region), the bias factor is only 0.6.
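In place of a plot, this sketch tabulates R = (1+n)/(1+n*b) over an arbitrary grid of n and b values. R is exactly 1 along the row n = 0 and the column b = 1, and the n = b = 2 entry reproduces the 0.6 quoted above.

```python
def bias(n, b):
    """R = (1 + n) / (1 + n*b)."""
    return (1 + n) / (1 + n * b)

ns = [0, 0.25, 0.5, 1, 2]    # n = No/Ni: unsampled population relative to sampled
bs = [0.5, 0.8, 1, 1.25, 2]  # b = Bo/Bi: relative death rate in the unsampled region

print("n \\ b  " + "".join(f"{b:>7}" for b in bs))
for n in ns:
    print(f"{n:>6} " + "".join(f"{bias(n, b):7.2f}" for b in bs))
```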

To get the large R=3 bias that the MSB authors propose, they need
implausibly extreme values for both n (10) and b (0.27). That is,
the unsampled region is ten times as big and its residents suffer only
about one quarter of the risk of violent death.
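Inverting the formula shows how extreme b has to be. Solving R = (1+n)/(1+n*b) for b gives b = ((1+n)/R - 1)/n; the sketch below evaluates it for the MSB target R = 3 with n = 10 and, for comparison, with n = 2.

```python
def required_b(R, n):
    """Solve R = (1 + n) / (1 + n*b) for b = Bo/Bi."""
    return ((1 + n) / R - 1) / n

print(round(required_b(3.0, 10), 3))  # 0.267: So residents at about a quarter of the risk
print(required_b(3.0, 2))             # 0.0: no violent deaths at all outside the sampled region
```

Since R can never exceed 1 + n (the b = 0 case), a bias factor of 3 needs n of at least 2, and at n = 2 it would need the unsampled region to have no violent deaths whatsoever.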

To get their implausible value for b, the MSB folks use their own
model. It has a parameter q, the risk of violent death while
you are in Si divided by the risk while you are in So, and
parameters fi, the fraction of time residents of Si spend in Si, and fo,
the fraction of time residents of So spend in So.

The formula they derive for R is equivalent to saying that

b = (q - (q-1)*fo) / (1 + (q-1)*fi)
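This formula can be reconstructed by weighting the relative risk by the time residents spend in each zone, which is exactly the uniform-risk assumption this post objects to. A resident of Si faces relative risk q for a fraction fi of the time and 1 otherwise, so Bi is proportional to 1 + (q-1)*fi; a resident of So is in Si for a fraction 1 - fo of the time, so Bo is proportional to q - (q-1)*fo. In the sketch, q = 5 and n = 10 come from the text; fi = fo = 15/16 is our own assumption, chosen because it reproduces an R of about 3.

```python
def b_msb(q, fi, fo):
    """Bo/Bi implied by time-weighting a uniform per-zone risk.

    Si resident: fraction fi in Si (relative risk q), 1 - fi in So (risk 1),
    so Bi is proportional to q*fi + (1 - fi) = 1 + (q - 1)*fi.
    So resident: fraction fo in So, 1 - fo in Si,
    so Bo is proportional to q*(1 - fo) + fo = q - (q - 1)*fo.
    """
    return (q - (q - 1) * fo) / (1 + (q - 1) * fi)

def bias(n, b):
    """R = (1 + n) / (1 + n*b)."""
    return (1 + n) / (1 + n * b)

# q and n from the text; fi = fo = 15/16 is an assumption for illustration.
q, n, fi, fo = 5, 10, 15 / 16, 15 / 16
b = b_msb(q, fi, fo)
print(round(b, 3), round(bias(n, b), 2))  # 0.263 3.03
```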

To see what’s wrong here, look at their argument for a high value of q:

It is likely that the streets that define the samplable region Si are
sufficiently broad and well-paved for military convoys and patrols to
pass, are highly suitable for street-markets and concentrations of
people and are, therefore, prime targets for improvised explosive
devices, car bombs, sniper attacks, abductions, and drive-by
shootings. Given the extent and frequency of such attacks, a value of
q=5 is plausible.

Where do they think that the people at street markets and those forming concentrations of people come from? The people in the unsampled region have to go to markets as well and there is no reason to suppose that they spend less time there than people from the sampled region. This means that attacks on markets and concentrations of people produce no main street bias.

Let’s make this concrete with an example that reflects the pattern of violence that the MSB authors think leads to main street bias and has exactly the parameters in their model that the authors claim are plausible.

We have 3,000 people in So and 300 people in Si, so n = 10. We have 30 violent deaths occurring in So and 15 in Si, so q = 5. There is a market in Si where folks from So spend 1/14 of their time, and people from Si spend 1/14 of their time in the market and 1/14 in So.
11 of the 15 violent deaths in Si happened in the market. This gives the “plausible” parameters used in the paper and their formula says that R=3.0.

Because the market draws people equally from Si and So (every resident spends the same 1/14 of their time there), the 11 market deaths split in proportion to population, 300 to 3,000: 1 of the deaths at the market was a resident of Si and the other 10 were from So. So residents of So suffered 40 violent deaths and Bo = 40/3000 = 1.3%. Residents of Si suffered 5 violent deaths (the 4 non-market deaths in Si plus 1 at the market), so Bi = 5/300 = 1.7% and b = Bo/Bi = 0.8. Plugging n=10 and b=0.8 into my formula gives R = 1.2. So in this example, despite a huge value of n and deaths tending to occur on main streets, the bias was negligible and the MSB model wrongly suggested that the bias was large.
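The bookkeeping in this example is easy to get wrong, so here is the same calculation as a small Python sketch using exact fractions. Like the text, it attributes all 30 deaths in So to So residents (ignoring the small amount of time Si residents spend in So) and splits the 11 market deaths in proportion to population.

```python
from fractions import Fraction

pop_si, pop_so = 300, 3000
deaths_in_so = 30    # all attributed to So residents, as in the text
deaths_in_si = 15
market_deaths = 11   # the market is inside Si

# Every resident spends the same 1/14 of their time at the market,
# so market deaths split by population: 1 to Si residents, 10 to So residents.
market_si = Fraction(market_deaths * pop_si, pop_si + pop_so)
market_so = market_deaths - market_si

deaths_si_residents = (deaths_in_si - market_deaths) + market_si  # 4 + 1
deaths_so_residents = deaths_in_so + market_so                    # 30 + 10

Bi = deaths_si_residents / pop_si
Bo = deaths_so_residents / pop_so
n = Fraction(pop_so, pop_si)
b = Bo / Bi
R = (1 + n) / (1 + n * b)
print(float(Bi), float(Bo), float(b), float(R))
```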

The reason their model gets it wrong is that it assumes that the risk is the same, on average, everywhere in Si. That means that even though Si residents only spend 1/14 of their time at the market, the model assumes that they are exposed to the risk from the market 24 hours a day.


Comments

Although it is only tangentially related to Tim’s post, here is something about the MSB paper which really pisses me off. Since I’m unlikely to get a more appropriate thread in which to sound off about it, I will do so now.

They never really define the survey space, Si, properly. Sure, it’s clear which households are in Si. But that’s not enough. We need to know which deaths took place in Si. If a bomb goes off outside the mosque it may kill people from Si, or from So, or both. But did they die in Si or in So? If in Si, why so?

They just don’t say what their criteria are for allocating deaths to the two zones. Until they address that, their “model” isn’t really a model at all. And it won’t do to take the usual line and say that Burnham should supply the answer to this question. Nobody – not even David Kane – contends that Burnham has, or ought to have, information about where deaths took place.

1) Thanks for posting your analysis. This is just the sort of way that science ought to make progress.

2) If one of the Johnson et al authors responds, would you be willing to post their response in the main post, as some sort of “The authors reply” option? Given the popularity of Deltoid, I think that they might respond if it were placed in the main post. I would not recommend that they bother with something in the comment thread, since only we true Lancet aficionados read the comments.

3) In previous discussions, you seemed to imply that there was something wrong with the math. I read this post as you saying: “The math is fine but the parameter values are wrong/misleading/implausible.” Is that a fair summary of your view?

Nice post, Tim. I think there is a “d” written where you are actually talking about the variable “b”. I hope I’ll have some more time to look at the details later.

3) In previous discussions, you seemed to imply that there was something wrong with the math. I read this post as you saying: “The math is fine but the parameter values are wrong/misleading/implausible.” Is that a fair summary of your view?

David, your attempts to trick people into comments that you will then repeat all over the internet are strange at best. Childish is actually a better description.

The lack of substance in your comments is starting to become (another) trademark of David Kane.

DK, Tim is saying the model is crap. Specifically, the part of the MSB model that defines what Tim calls b, the ratio of death rates in the unsampled to sampled area. Getting the math right is trivial. Anyone can do algebra. It is the creation of the original equations that is at question.

Tim, was this necessary? Everyone but everyone knows that this is how one combines separate rates to get an overall crude mortality ra….. ooops. Hmmm. Well, on second thought perhaps one can’t be too careful after all.

Kevin, regarding your point in #1: I’m not sure that this is a problem.

The location of death is important in that it gives us some insight into the relative danger of the surveyed space, and the unsurveyed space: in other words, it helps us quantify that mysterious “q” variable. In that case, it seems like we only need to know how many people died, and whether or not the location of the attack falls inside or outside the boundaries of the survey space. We don’t actually need to know whether the victims are from Si or So, do we?

The whole scheme still seems very arcane, but as far as I can tell, the lack of detail on allocating deaths isn’t a problem.

Bruce: it seems like we only need to know how many people died, and whether or not the location of the attack falls inside or outside the boundaries of the survey space. (My emphasis)

That’s what I’m getting at. The MSB model divides Iraq into two zones: Si and So. Now suppose we are having an argument about q, or fi, or fo. We’re agreed about n so that’s not a problem. In fact let’s suppose, for the hell of it, that we agree with Johnson et al. that n=10. We haven’t seen Tim’s critique of their maps. Clearly the values of the parameters q, fi and fo must depend to a considerable extent on where the mosques, markets, factories, offices and hospitals are. Maybe I argue that for the most part these things must be distributed in the same way the households are, with 91% of them in So, outside the survey space. But you say, no, a lot of these places will be located near main streets, which says to you they are in Si. We can’t possibly have a sensible discussion, can we? We’re not just contending for different values of the parameters. We’re using two completely different models. My Si is nothing like your Si. Sure, it has the same households in it, because we’re singing off the same hymn-sheet where the main roads are concerned. So we’ve coloured in every single residential building in Iraq. The MSB paper tells us how to do that much. But our map doesn’t say which zone all the other stuff belongs to. So there’s a huge gap in our understanding of the model.

When we go to Johnson et al. to see what guidance they give us on this, we find they don’t give us any at all. So and Si are not defined. All the paper says is that the surveyable households are in Si and all other households are in So. How do we determine where the other buildings belong? Unless the shootings and bombings take place in residential areas, we just don’t know which zone to allocate the deaths to.

It isn’t just a crappy model. It’s so poorly specified that it really doesn’t deserve to be called a model at all.

Apologies for dragging in my own pet gripe instead of addressing Tim’s post (which I need to think about). But what with the sock-puppets and all, it’s hard to keep discussions around here on topic anyway.

The death rate for people living in Si is Bi and Bo for people living in So.

Johnson et al (2008) does not define or use a death rate for everyone living in a particular area. They define death rates for people, whatever the location of their house of residence, when they are in a particular area. The paper makes this as clear as can be.

Probabilities of death for anyone present in Si or So are, respectively, qi and qo , regardless of the location of the households of these individuals.

So, your critique, as interesting as it may be, has nothing to do with the paper they published.

Now, you may feel that you have created a “correct model of the situation.” Good for you! Please write it up and submit it for publication. But you have not found a flaw in their model. You have created a model of your own. Nothing wrong with that, of course. Just don’t confuse yourself into thinking that “[t]heir formula is wrong.” It isn’t.

David, you don’t seem to have read my post very carefully. Johnson et al don’t define a death rate for people living in So. That’s why I defined it myself as Bo. They do implicitly use such a rate in their derivation of their formula and the derivation would be clearer if they had defined it and used it explicitly.

And if their formula is correct, how come it gives the wrong answer in the example I gave?

Without weighing in on the actual estimation issues, how so? T Lambert says

“The formula they derive for R is equivalent to saying that

b = (q - (q-1)*fo) / (1 + (q-1)*fi)”

I read the post as T Lambert saying “If you wanted to do the study correctly, you would do …; however in this study the authors choose to do … . This is implausible, because … is thought to be true. Now, here is an example which illustrates why I think this.” It’s obvious that T Lambert is making a claim concerning the work done in the paper. Surely, you see this?

The amusing thing about David Kane is he is soooooo…..superficial. He posts this

Johnson et al (2008) does not define or use a death rate for everyone living in a particular area. They define death rates for people, whatever the location of their house of residence, when they are in a particular area. The paper makes this as clear as can be.

Probabilities of death for anyone present in Si or So are, respectively, qi and qo , regardless of the location of the households of these individuals.

triumphantly, without realizing that it concedes the previous argument about the sex of victims. Remember for the ladies Kinder, Kueche, Kirche. For the guys Mosques, Markets and Main Streets.

OK: who goes to the market, who goes to the mosque, who goes to the main street to shop if it is dangerous? Guys.

DK, qi and qo are not death rates. TL extends the previous analysis by calculating a general bias factor given differing death rates for the Ni and No populations. The question then is what that difference in death rates is. This is where the MSB q’s and f’s come in. The MSB paper is essentially proposing a model for b(q, fi, fo). One could make up any sort of model for b, but R would still obey the same equation R(n,b). This is very clear in the description above.

Not to defend the MSB paper, but one could still imagine some sort of street bias–to the extent that death squads kill people in their homes, there might be a bias towards certain types of streets, though I don’t know which way that bias would work. That is, would death squads operate more on main streets or in remote side streets or would there be any particular pattern? And where do death squads pick up their victims –homes or streets? And there are cases of collateral damage, where people get killed in their homes because of fighting, though again I don’t know which way that possible bias would go.

And I don’t know how one would investigate any of this–you could try using IBC data, to the extent that it has info on such things, but then you have unknown forms of reporting bias to worry about (something that one of the sockpuppets dismissed in an earlier thread).

Something which still hasn’t been pointed out about these people’s work, to the best of my knowledge, is that they are not proving the existence of main street bias in the Lancet papers. To do this, they would need to run a simulation on a map of an Iraqi city (or some model thereof) with chosen values of Bi, Bo etc. and show that the sampling method used would have led to significantly different values of the parameters of interest.

They haven’t done this (to the best of my knowledge). All they are doing is rehashing a theory on effects of sampling bias from another paper and taking as the basis for the bias their unsubstantiated assertion about the differences.

Why this got past the peer reviewers is not exactly a mystery to me, but it is disappointing that such tediously derivative and uninformative work got published.

I see what Lambert is doing. His conclusions about his model are misleading – specifically over how he derives values for Bo and Bi. In his “concrete” example, he claims to use JPR parameter values (in an attempt to show that JPR exaggerates the bias), but he derives “b” from one of his own assumptions – namely that his hypothetical market “draws people equally from Si and So”.

In other words, he’s reincorporating some of the assumptions from his earlier Platonic market example, whilst removing “f” from the equation (which turns out to be convenient for this misleading exercise).

But in order to credibly derive b (in his model), one must still make realistic assumptions regarding the diffusion of people among zones. So Lambert misrepresents JPR when he asserts that his contrived market example “reflects the pattern of violence that the MSB authors think leads to main street bias”. Here is the offending/misleading bit from Lambert’s example:

There is a market in Si where folks from So spend 1/14 of their time, and people from Si spend 1/14 of their time in the market and 1/14 in So.

What is the problem with Lambert’s crucial assumption that “the market draws people equally from Si and So”? It’s pretty obvious, so I won’t spell it out for you.

Bottom line: Lambert’s “b” model is no more than a sleight-of-hand used to bypass the earlier criticism which demolished his hypothetical market assumption.

Part 1 – Lambert claims JPR uses “absurd” parameter values. He suggests an alternative value of 2/15 for fo, which implies that the average Iraqi (including women, children and the elderly) spends only 3 hours out of each 24-hr day in their own home/zone (presumably sleeping), and spends the other 21 hours outside their zone. Since this is clearly ludicrous, Tim has said he’s redefining “f”, but he hasn’t explained how anyone could take his redefinition and arrive at the value of 2/16.

Part 2 – Tim claims, by adding two further “main” streets to a map provided by the MSB authors, that “their map and the unsampled area is 0. In their model, that means n=0 and there is no main street bias.” But this is totally misleading. The unsampled area is not 0 in “their map” – it’s 0 in Tim’s tiny portion of their map (and it would be even tinier if we dismissed Tim’s second “main” street, which is half the width of his first main street). Tim hasn’t even bothered to state his guess of the value of n for “their map” (as a whole) taking into consideration his one credible additional main street.

Part 3 – Lambert presents an alternative “b” model which he claims demonstrates how the MSB model exaggerates the bias. But it’s just another misleading exercise in which he claims to plug “exactly” the MSB parameter-values (or equivalents) into his own model. He does no such thing. He derives Bi and Bo from his own dubious hypothetical market assumptions which “loads” the result in precisely the way he wants.

Robert Shone, you can’t make an argument that people don’t go to markets, so yes, it is likely that people from all areas go to the markets to buy stuff. You can make an argument that if it gets dangerous to go out, people will send the guys in the family and not the kids and women, and they will not linger, which makes their exposure smaller, i.e. decreases the so-called bias.

Now there is one exception, the people who live at the back of the store IN the market, but markets themselves are off main streets, not on them (look at the Google maps).

What is missing here is the fact that abductions will not happen on main streets as a general rule, that drive-bys and abductions are a lot easier on uncrowded streets, and that the actual killings take place in isolated areas. Add to that the fact that vast housing tracts in Iraq were controlled by sectarian militias that could easily grab their victims wherever and whenever they wanted, and the so-called MSB vanishes.

As Eli has been saying lo these many threads (sg is getting it), to demonstrate a bias related to proximity to main streets, you have to show that there actually is one with the available data (IBC for example), not merely throw the thought out. Otherwise you are engaged in academic onanism, a pleasant occupation, but damaging to your mental and social well-being.

Robert Shone, you can’t make an argument that people don’t go to markets…

That’s not the argument I was making. You’re starting to think in the right direction, but you don’t go far enough. What other factors make the assumptions behind Tim’s “concrete example” values for Bi and Bo unrealistic and unreasonable (and unrepresentative of the MSB authors’ parameter values, contrary to what Tim claimed)?

If there is such a bias, Robert, its magnitude and presence have to be confirmed before the debate over these equations becomes relevant. I suggest simulations. These haven’t been done, even though the authors of this paper are physicists well familiar with such methods. It hasn’t been done because it won’t prove what they want to prove.

If there is such a bias, Robert, its magnitude and presence have to be confirmed before the debate over these equations becomes relevant.

The presence of bias seems to be accepted by the Lancet authors, who claimed they used (unpublished) procedures “in an effort to reduce the selection bias that more busy streets would have”.

And presumably they wouldn’t have planned and used such (unpublished) procedures if they didn’t think the magnitude of the bias was significant enough to need reducing.

We might have a different situation if the L2 authors’ response to the MSB criticism had been: “Well, since no bias has been demonstrated, we didn’t include any procedures to reduce it”. But that wasn’t their response, because they’re not stupid.

Given the claim made by the L2 authors, there’s a burden of evidence upon them (to disclose the procedures) – regardless of speculations over the value of q in the case of Iraq.

Note also that it’s important to distinguish the relative risk from network-proximity to main streets (on the one hand) from the posited distortion present in the results of a survey (on the other). Both have been referred to as “bias”, which I think has led to some confused (and confusing) comments.

No Robert, there is no responsibility on the part of the original authors to run simulations to confirm the non-existence of a fantastical claim.

Spagat et al claim that there is main street bias in the published selection method of the L2 study. More than that, they claim a general phenomenon of “Main Street Bias”, and they present a (not really) new formula to describe its consequences. It is their responsibility to prove that this bias is present in someone else’s study, and its magnitude.

They could do this very easily with a set of simulations on either an existing or a theoretical map, in which they posit a range of values for the ratio of So to Si, and simulate, say, 1000 samples from the sampling method described by the L2 authors. They could even use a range of definitions of “main street”. Then they could give a range of values for q, b, etc. This would be a pretty trivial process for most physicists. They don’t.
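Here is roughly what such a simulation could look like, stripped to its bare bones. It is a toy sketch under loudly stated assumptions: there is no map at all, households are simply labelled Si or So, the death probabilities are made up, and the “survey” is uniform random sampling from Si, not the L2 selection procedure. The function name simulate_bias and all its parameters are hypothetical, and trials defaults to 200 rather than 1000 only to keep it fast.

```python
import random

def simulate_bias(n, b, base_rate=0.02, households_si=1000,
                  sample_size=400, trials=200, seed=1):
    """Toy Monte Carlo estimate of R = Bi / B for a survey that can only reach Si.

    Hypothetical setup (not the L2 selection procedure): households_si households in
    Si and n times as many in So. Each Si household has a death with probability
    base_rate, each So household with probability b * base_rate. Every trial draws a
    fresh population and then 'surveys' sample_size households, all from Si.
    """
    rng = random.Random(seed)
    households_so = round(n * households_si)
    estimates, true_rates = [], []
    for _ in range(trials):
        si = [rng.random() < base_rate for _ in range(households_si)]
        so = [rng.random() < b * base_rate for _ in range(households_so)]
        true_rates.append((sum(si) + sum(so)) / (households_si + households_so))
        surveyed = rng.sample(si, sample_size)  # households in So are never selected
        estimates.append(sum(surveyed) / sample_size)
    # Average survey estimate divided by average true rate; compare with (1 + n) / (1 + n*b).
    return (sum(estimates) / trials) / (sum(true_rates) / trials)

results = {}
for n, b in [(2, 2), (10, 0.8), (10, 0.27)]:
    results[(n, b)] = simulate_bias(n, b)
    print(f"n={n}, b={b}: simulated R = {results[(n, b)]:.2f}, "
          f"formula R = {(1 + n) / (1 + n * b):.2f}")
```

A real test would replace the Si/So labels with households placed on an actual street map, classify streets under several definitions of “main street”, and swap in the published selection rule; the comparison of survey estimate against true rate would stay the same.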

Until they do, they haven’t shown that their putative new form of bias exists in this or any other study, nor have they given the conditions under which it could occur.

But it’s easier for them and you to claim without proof that it exists, and dispute the precise wording of the paper, than to do the leg-work which would show that under all reasonable definitions of “main street” the bias just doesn’t exist. Tim Lambert did the outline of a single simulation in his previous post, and a bunch of mendacious pricks spent 100 comments arguing about whether an obviously main street was or wasn’t a main street. Most decent physicists with Spagat’s training could probably whack out a simulation in the time it took me to read that odious thread.

This shows the level at which you denialists are operating, i.e. you are lazy (won’t do the simulations, would rather do the name-calling) and you are nasty.

For those so interested, there is now a GIS map of Iraq available at geofabrik.de, with roads classified along the lines of trunk, primary, secondary, residential etc.

You can also download for free quantum GIS (QGIS), and from there the fun begins. It has taken me about two days in total to get a half arsed version of Si vs So and I’m not even a geographer, a compsci or an epidemiologist.

But it’s easier for them and you to claim without proof that it exists, and dispute the precise wording of the paper, than to do the leg-work which would show that under all reasonable definitions of “main street” the bias just doesn’t exist. Tim Lambert did the outline of a single simulation in his previous post, and a bunch of mendacious pricks spent 100 comments arguing about whether an obviously main street was or wasn’t a main street. Most decent physicists with Spagat’s training could probably whack out a simulation in the time it took me to read that odious thread.

You are completely correct. This is exactly what I was arguing with LancetStudy before (s)he went off in a huff. There is absolutely no reason to believe that the academic navel gazing of the JPR paper correlates with anything in the real world.

Unless and until the authors of that paper can be bothered to show – through simulation as you say, or calibration or matching patterns of actual deaths as recorded somewhere, or a complete analysis of certain maps or something – that their pretty equations actually correspond to something in the real world, then there’s no reason to take what they say seriously.

Although, some cynical part of me can’t help think that they have done the simulations and have shown that their bias does not exist… But that’s just me being overly suspicious.

A little off-topic, but I thought this question would fit better in a Lancet/Iraq thread than in the open one. Anyway, does anyone (preferably meaning professional demographers or people capable of making informed guesses on this) think that the estimate of 740,000 Iraqi widows lends support to the L2 estimate of excess deaths by 2006, or are there too many uncertainties? I saw the article (link below) and started making guesses about how many might have been widowed from pre-2003 causes and in the normal course of events, but I don’t really know enough–other laypeople have done similar things.

I mean I’ve seen at least one other person who argued that this number of Iraqi widows supports a very high Iraqi war death toll, and I thought it might myself, but maybe people with some background in the field would say “No, there are too many uncertainties” or even “No, if anything, it supports a much lower death toll.”

By which I mean, we don’t know where that estimate came from, or the ages of the widows, or how long they’ve been widows, or the number of surviving children and their ages, or a host of other data elements that could be used to evaluate the estimate.

Since Tim is clearly away and enjoying himself, Eli would like to point to yet another chapter in that long running soap opera “The Incoherence of Denial”, where Spagat steps on his own line and shows that Main Streets were not the place not to be in Iraq, or as we say at Rabett Run, Back Alley Bias.

Yes, Eli is blog whoring, but he thought Tim would have picked this up by now.

Nice one thanks Eli. With his name on a respectable paper now, I wonder whether Spagat will ever acknowledge any fault or embarrassment over his “main street bias” puffery? Surely not soon, probably never, but it would be good to see.