Thursday, 24 June 2010

Die another day

In a recent post, I asked why the number of 'preventable' deaths seems to add up to more than the number of actual deaths. The answer, in a nutshell, is that you can only die once but your death can be prevented many times. In the comments, Carl V. Phillips explained how it works. I repeat his comments here in case you missed them, and because I have a feeling I'll be referring to them again in the future.

While I certainly agree with you that some or all of the "blame the victim" body count estimates are high, be careful about this criticism -- it is not legitimate. Diseases and deaths always have multiple component causes, all of which can legitimately be called the cause (which is to say a necessary -- not sufficient -- cause of that death or disease at the particular time).

So an individual may well die from smoking AND obesity AND eating junk food, and it is perfectly legitimate to say that had any one of these conditions been eliminated the death would not have occurred so soon. Someone who was killed by a drunk driver because a medical error prevented him from being saved in the hospital is a death due to alcohol use, motorized transport, and medical errors, so the causes add up to 3 for the one death. Thus, there is no reason to expect they would add up to the total. Indeed, they should add up to well more than the total if you have a rich enough list of causes.
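Phillips's arithmetic is easy to check with a toy tally (the counts below are invented purely for illustration): if every death has several necessary component causes, counting "deaths caused by X" separately for each cause must exceed the number of deaths.

```python
# Toy illustration (invented counts): 100 deaths, each with three
# necessary component causes. Tallying deaths "caused by" each factor
# legitimately sums to more than the number of deaths.

deaths = [
    {"alcohol", "motor_transport", "medical_error"},  # the drunk-driving case
] * 40 + [
    {"smoking", "obesity", "junk_food"},
] * 60

total_deaths = len(deaths)
by_cause = {}
for causes in deaths:
    for c in causes:
        by_cause[c] = by_cause.get(c, 0) + 1

print(total_deaths)            # 100
print(sum(by_cause.values()))  # 300 -- three necessary causes per death
```

Each individual tally ("smoking caused 60 deaths") is correct on its own; it is only the expectation that the tallies should sum to the total that is mistaken.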

I agree that this may not be how the man on the street interprets it, though I suspect if pushed that man would not actually be able to clearly state what he thinks it means. That is one of the problems with reciting raw scientific information to people who do not understand the science. Most people do not understand a relative risk statistic, but are barraged with them. But even descriptive statistics -- which most people probably think they understand at first blush -- are subtle. Nothing causes a fraction of a death -- it either causes it or not (see below). There is no obvious way to assign fractions.

As for bringing a death forward by merely one day, that is a fundamentally different question. And, yes, you could argue that a death that is accelerated by just one day by a particular cause should not be attributed to that cause for purposes of assessing public health statistics.

Wouldn't the word "factor" be more appropriate than "cause"? No. Cause is exactly the right word. It is the right word in the science of epidemiology (which is the source of this information) and is also the right word based on the usual intuitive definition. The latter, which is technically translated in epidemiology and most other sciences, is that in the absence of the particular influence, the particular outcome would not have occurred.

The word "factor" is one of those that often gets used because someone does not want to admit that they are making causal claims, even though that is exactly what they are doing. It doe not really mean anything. E.g., the phrase "risk factor", as used, has at least three or four different very distinct meanings, and therefore is worse than useless.

Following the above, it is easy to see why everything has multiple causes. Every death was caused by not only some disease, but also by birth of the individual in question, the evolution of humanity, the big bang, etc. This is part of why assigning fractions would not make sense, as noted above.

Angry Exile said:

And if a smoker goes outside the pub for a fag in winter and as a result of the alcohol passes out and subsequently carks it of hypothermia who gets to put their statistics up by one? ASH, Alcohol Concern or the Met Office?

If the individual would not have gone outside absent his smoking habit, then, yes, smoking caused his death, as did the alcohol that caused him to pass out, the weather that caused those to result in hypothermia, as well as, perhaps, his failure to put on a coat, his companion's failure to look for him after he disappeared, cutbacks on foot patrols by the police, and any number of other things.

The summary point is very simple: Everything has an infinite multitude of causes. For a particular outcome (e.g., death) we typically identify a particular set of them as the causes we are interested in intervening on (e.g., drug use, diet, medical tech), but there is nothing magical about that list. There is no reason to expect that those causes will not overlap in many cases, and once that list is made rich enough, overlap is inevitable.

30 comments:

Anonymous said...

I have heard some rotten explanations in my time but this, from Carl V. Phillips, has to be one of the worst.

Yes, it's true that many diseases are multifactorial, but to then blithely state that all the factors concerned are the cause of someone's death is one hell of an assumption, and it's an assumption that conveniently covers a flaw in medical nanny statistics, which are frequently produced by computers after torturing the data until a suitable result emerges to justify whatever health scare is being called for.

When causes of death are claimed one usually expects that the claimant has evidence of a causal chain as opposed to some result produced by wishful thinking.

John Gray, Board Member, The International Coalition Against Prohibition (www.brusselsdeclaration.org)

I hope you don't mind the long post but I thought it might be of interest. From Eysenck's book 'How Many People Does Smoking Actually Kill?' (http://www.forces.org/Scientific_Portal/evidence_viewer.php?id=113): ...Essentially, it assumes that (1) all risk factors are independent of each other, and (2) risk factors interact multiplicatively to produce cancer.

Independence of risk factors

Let us take just a few well-established risk factors for cancer: smoking, genetic predisposition, drinking, poor diet, exposure to air pollution, stress. Imagine a person who is genetically predisposed to cancer, is stressed, smokes, drinks, has a poor diet, and is exposed to air pollution. If such a person were to die of lung cancer, his death would, on the Doll & Peto premise, be attributed to smoking to an extent following from the formula given above, i.e. a proportion (R-1)/R of the death of a smoker would be ascribed to exposure to a risk factor (smoking) with a relative risk of R for the disease in question. But if we started with stress, rather than smoking, his death would be attributed to stress, equally to the extent of R. When there are several risk factors, how can we make any one responsible for mortality? Doll and Peto recognize that, proceeding in this fashion, deaths due to smoking and deaths due to other factors, when added, may well exceed total deaths observed. Doll and Peto would justify their procedure as follows. Consider a single case where we have two risk factors, smoking and stress, and death only occurs in persons who are smokers and under stress. Under these circumstances, all the deaths would be due to smoking - in the sense that they would not have occurred had the people involved not smoked. But equally, all the deaths would be due to stress - in the sense that they would not have occurred had the people involved not been under stress. Thus cause X accounts for 100% of all deaths, and cause Y accounts for 100% of all deaths, and we have accounted for 200% of all deaths! This is Alice in Wonderland arithmetic, and is certainly not what most people would understand when told that all deaths are due to smoking!
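Eysenck's (R-1)/R formula can be sketched in a few lines (the relative-risk values below are invented for illustration, not taken from any study):

```python
# Doll & Peto-style attributable fraction for an exposure with relative
# risk R among the exposed: (R - 1) / R. The R values here are invented.
def attributable_fraction(relative_risk):
    return (relative_risk - 1) / relative_risk

print(attributable_fraction(2.0))   # 0.5 -- half of exposed cases attributed
print(attributable_fraction(20.0))  # 0.95

# In Eysenck's two-factor scenario (deaths occur only in stressed
# smokers), removing either factor prevents every death, so each
# factor's attributable fraction tends toward 1 -- and the two factors
# together "account for" up to 200% of the deaths.
```

Note that the fraction for any single factor can never exceed 100%; it is only the sum across factors that is unbounded, which is exactly the Rothman and Greenland point quoted below the Eysenck passage.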

I think Eysenck may have been a little over-generous to Doll and Peto in the light of this from the McTear vs ITL court case (http://www.scotcourts.gov.uk/opinions/2005CSOH69.html): '[5.801] Reference was made in this context to Rothman and Greenland 1998. At p.13 the authors stated: "There is a tendency to think that the sum of the fractions of disease attributable to each of the causes of the disease should be 100%. For example, in their widely cited work, The Causes of Cancer, Doll and Peto (1981; Table 20) created a table giving their estimates of the fraction of all cancers caused by various agents; the total for the fractions was nearly 100%. Although they acknowledged that any case could be caused by more than one agent (which would mean that the attributable fractions would not sum to 100%), they referred to this situation as a 'difficulty' and an 'anomaly.' It is, however, neither a difficulty nor an anomaly, but simply a consequence of allowing for the fact that no event has a single agent as the cause. The fraction of disease that can be attributed to each of the causes of disease in all the causal mechanisms has no upper limit: For cancer or any disease, the upper limit of the total of the fraction of disease attributable to all the component causes of all the causal mechanisms that produce it is not 100% but infinity. Only the fraction of disease attributable to a single component cause cannot exceed 100%."'

1. Epidemiology is as much a science as accounting, but with much harder-to-verify answers.

2. In response to Angry Exile's comment, he notes that *smoking* would be to blame, but not the existence of a smoking ban. By his own logic the existence of a smoking ban would be as much to blame as smoking.

No, Commons, not a joke rebuttal, for even taking account of Tony's well-adumbrated points, which I understand very well, so much rests on assumptions and not facts.

For example, even given that risk factors for cancer may have been well identified, it doesn't therefore follow that they must ALL, in actual fact as opposed to estimation, have played their part in someone's death.

Don't panic everyone. This is something that many doctoral students in epidemiology take a while to understand.

John Gray: There is no assumption involved. This follows immediately from the *definition*. I think I can help you understand if you think about what "cause" means and write down that definition. If you want to post it as a new comment I can go from there.

Robert: You are entirely right that the indoor smoking ban is also a cause of the frozen smoker's death. That is a good one to add to the list. As I said, there is an infinite multitude, so you have to pick which ones you are choosing to include. In the context in question, that is one that seems like it ought to be included (but that is art rather than science).

Tony's quote from ME2 (Rothman and Greenland) basically says most of the same thing I did in different words. The quote about Doll and Peto makes much the same point, but I like the observation that it may give them too much credit.

A few of the other points are not actually on-point. I will respond to the tangential claim that epidemiology is not a science by challenging you to offer a definition of science that includes most of the fields we normally call science but not epidemiology (yes, you can declare that science cannot study people or something like that, but stick to non-silly definitions). I think what you are trying to say is that epidemiology is quite often a badly done and even more badly interpreted science; I think if you review my writings you will find that I am one of the most in agreement with that point among people who have seriously studied epidemiology.

Closer to on-point, and thus somewhat confusing, is Tony's first bit of quote about how "risk factors" (a terrible term, btw) interact. The present discussion is not actually affected by any such assumptions, so it is a bit of a red herring.

John's point is based on a mistaken interpretation of something. No one is claiming that every "risk factor" (again, a terrible term -- I will interpret it in this case to mean "exposures that cause one or more cases of cancer in the population") is a cause of every cancer. Instead, multiple "risk factors" will be causes in every case. The confusion probably comes in based on what is hinted at in the last sentence: Since we can only estimate causes of cancer probabilistically (for an actual real person who smoked and got lung cancer, we estimate that the smoking caused it with ~95% probability) we can never actually know whether the smoking was a cause for this person. But the conversation was in terms of hypothetical cases where we know the causes. In real life we never know the causes of some outcomes. So for 100 such cases of cancer, smoking caused 95 of them and the evolution of life on Earth caused all 100 of them. We will never actually know which 95 it is (and, indeed, that is only the best estimate -- there will likely be some random error). So to understand causation, start with the hypothetical and try to forget the study estimates until it makes sense without them.

Speaking for myself, I think Dr. Phillips' comments have been delivered in the spirit of honest discussion, and he's shared knowledge based on his considerable expertise. Personally, I would like to take the opportunity of his posting here to learn something from him (for better or for worse), rather than unnecessarily chase him away with anger.

However, I do have some questions about Dr. Phillips' assertions -- perhaps they're good questions or perhaps not.

-Dr. Phillips explained how deaths are accounted for in epidemiology on a multi-factorial basis. However, I'm under the impression that when annual death statistics are compiled by groups like the CDC, they attribute each death to a single ICD-9 diagnosis (e.g., heart disease, malignant neoplasm, etc.). So, where's the difference between the epidemiology and the reporting? I'm frankly skeptical of the idea that doctors and/or epidemiologists are diligently recording and reporting all applicable contributing causes to the death of someone who has AIDS, diabetes, and alcoholic liver disease, but ultimately dies of a heart attack, while also indicating that this person smoked 10 cigarettes a day for 5 years in their twenties.

-Is there a common epidemiological standard in a quality study for "significant risk"? I saw a study reported in the media today regarding coffee that noted that a 40% increased risk was "significant". The degree of risk anyone is willing to accept is entirely subjective, of course, so is there a common epidemiological standard for significance (assuming the study is presumed to be completely scientifically sound in the first place)?

-There are commonly cited quotations regarding risk that state that a risk less than 3.0 or 4.0 isn't worth concern. Marcia Angell is attributed with one of these commonly cited quotes. However, my understanding is that the medical journal she edited at the time cited risk levels lower than 3.0 or 4.0. If we are being misled by these commonly cited quotations, how?

-I'll be honest and say that despite the many explanations I've read regarding "confidence intervals", I don't really know what this is. Is there a layman's explanation or analogy you can refer me to? I know that 95% is an "accepted standard", but is there a hard and fast standard? Is there a definitive "rule book" for epidemiologists?

Without going into great explanation, I've encountered situations in the writing world where the word "standard" is thrown about arbitrarily. Properly applied a "standard" is a rule, like "16 ounces makes a pound". "Commonly accepted" is not a standard, but a completely subjective interpretation of what's considered fashionable and/or acceptable.

"infinite multitude". Tautology and non-mathematical language (at least I hope it is!). Exactly what I'd expect from yer common or garden fake scientist today. Just what do they teach in those schools.

Don't shoot the messenger. Carl is explaining the subject and this is helpful to the many people, including myself, who want to better understand it. My objection is to the use these statements are put to. If the fractions sum to more than 1, then the claim that x causes y deaths is deliberately misleading, because the natural interpretation is that removing x would save y lives - or at least prolong them to around the average life expectancy.

Re death stat classification: This is one where they half deal with and half ignore the challenge of multiple causes. The restriction of causes of death to the specific physical condition that dispatched someone solves the problem of the death being caused by, say, smoking and heart attack -- smoking is not part of the list. However, they are still forced to gloss over the problem when a combination of diseases all contributed, such as COPD weakening someone so that he succumbed to pneumonia. Indeed, in a case where depression caused suicide which was carried out by crashing a car, it is not really clear how to categorize (which is part of the motive for putting each of those three into an "other" category). As you note, someone with AIDS, diabetes, and liver disease who dies of a heart attack very likely had their death caused by all four of those, but only one will be listed. Which one is chosen will mostly have to do with local culture, not science (i.e., in some settings the physician would always say AIDS, but in others he would not).

There is sometimes an attempt to list "secondary" causes (which might be diseases other than the one listed that also caused the death at that time). But you are quite right that this is fairly arbitrary in many cases, and thus the statistics are highly imperfect. And, in particular, almost all of them understate because some other category "stole" some of the deaths that they (also) caused.

Re: "significant risk" That particular phrase would generally be avoided by an epidemiologist because "significant" was long ago taken as technical jargon in the statistics that are used, so it is best to avoid its natural language meaning. As for whether there is a particular standard that defines whether something is ...let's say "substantial" (that is the word I tend to use), then no. Any such statement is meant to invoke common shared understanding of a vague concept, and is not scientifically defined.

-Re "risk less than X does not count" I have never traced them back, but I think most such claims might actually have their origin with someone who knows what they are talking about, and who was justifiably frustrated about the flaws in how epidemiology is generally done. The idea is that it is easy to intentionally or accidentally bias a true 1.0 association into a 1.5, but harder to make it a 3.0. The problem is that someone offers that up as a quip or maybe a very rough guideline and then others who want a recipe and a rule for doing science (physicians and people who think the are epidemiologists but really never learned science come to mind) -- which just does not work -- translate it into a simplistic rule. At that point it reaches a reporter or the public and becomes popular nonsense.

It is easy to see that this "rule" is absurd. Something that increases cardiovascular disease risk by about 2 (say, smoking) is enormously important and can be very well estimated (though not as well as is sometimes claimed). For that matter, something that increases all CV risk by 10% is potentially very important. If we are talking about a rare disease, however, a 1.1 or even a 2.0 is much less important (from a practical perspective) and may be much harder to measure (because there are few cases to study).
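The common-versus-rare-disease point is just arithmetic, which a quick sketch makes plain (the baseline risks below are invented round numbers, not real estimates):

```python
# Hypothetical baseline risks (invented for illustration): the same
# relative risk translates into wildly different absolute risk,
# depending on how common the disease is.
common_baseline = 0.30    # e.g. a common disease like CV disease
rare_baseline = 0.0001    # e.g. some rare disease

for rr in (1.1, 2.0):
    extra_common = common_baseline * (rr - 1)  # absolute risk added
    extra_rare = rare_baseline * (rr - 1)
    print(rr, extra_common, extra_rare)

# Doubling (RR 2.0) the common disease adds 30 percentage points of
# absolute risk; doubling the rare disease adds 0.01 percentage points.
```

This is why a bright-line cutoff on relative risk alone, with no reference to baseline frequency, cannot separate important findings from unimportant ones.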

The other problem with this rule of thumb is that it is entirely possible to cook results that are much bigger than 4.0. My inaugural post on my new blog (linked from Chris's post) is about just such a case.

...I have to break this up to get it to post -- I guess I should have started my own blog entry!...

Re confidence intervals: Sadly, if you asked that question to 100 authors who had just published an analysis with confidence intervals in a health science journal, you would be lucky if one of them could answer it. So you are in good company... well, at least in *a lot* of company. A handful of the 100 might get the technical definition right, but the technical definition is basically useless for practical understanding.

The best way to understand a confidence interval is to think of it as an analog measure of about how great the potential for random sampling error is. A wide confidence interval means that there is a lot of uncertainty from this source (basically that it would have been quite easy to randomly draw different study subjects and thus produce a very different estimate); a narrow confidence interval means the opposite. It is important to note that this reflects only the random error from sample selection (or perhaps one should include other random factors that look like sample selection, like random disease manifestation, depending on who you ask -- it gets a bit tricky if you push on it). That is, if there is measurement error, confounding, various forms of intentional bias, etc. then the confidence interval can be quite narrow, but the errors can be huge.

So what do the actual numbers at the edges of the CI mean? Not much of importance. Here is the technical definition: The upper bound of the confidence interval is the number (let's assume we are talking about an odds ratio (OR), just for concreteness) such that if the true odds ratio in the population were that, then the probability that a study would get an OR as low as the one actually reported or lower, due to random error alone, is 2.5%. The lower bound of the CI is defined similarly. Got that? I didn't think so. Most people have no intuition at all for what that means, including the rare ones who can accurately recite it.

We would love to know something like "the range that is 95% likely to include the true answer", but we can't know that using frequentist statistics (sorry -- that is the technical name for the statistical methods that dominate most of this research). People would so much like to know that, that they often mistakenly describe the 95% CI that way, which is totally wrong.
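The distinction can be seen in a quick simulation (a sketch with made-up parameters, assuming a normally distributed outcome with known spread): the 95% describes how often the interval-building procedure captures the fixed true value over many repeated studies, not the probability that any one published interval contains it.

```python
# Simulation (invented parameters): repeat a "study" many times and
# check how often the 95% interval covers the fixed true mean. The 95%
# is a property of the procedure, not of any single interval -- each
# individual interval either contains the truth or it doesn't.
import math
import random

random.seed(0)
true_mean, sigma, n, trials = 10.0, 2.0, 50, 2000
covered = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    m = sum(sample) / n
    se = sigma / math.sqrt(n)          # known-sigma case, for simplicity
    lo, hi = m - 1.96 * se, m + 1.96 * se
    covered += (lo <= true_mean <= hi)

print(covered / trials)   # roughly 0.95
```

Note also that the simulation only models random sampling error; as the comment above says, measurement error, confounding, and bias can make the true answer fall far outside a perfectly narrow interval.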

So what is the importance of that exact number that identifies the OR such that ...blah blah blah...2.5%...blah blah? Nothing really. It is arbitrary, as the "blah"s would tend to imply. There is no scientific rule. There is just an established practice that many people who do not understand it fetishize into having actual bright-line scientific meaning. The value of an accepted standard is simply that we are all using the same ruler. It does not matter how long a meter is, so long as we all use the same meter, and there is no magic about something being just over or just under a meter.

Anyone who suggests that there is some important difference between a point just inside the CI and a point just outside of it does not understand what they are talking about. The CI does what it does quite well -- provide an analog measure, on an arbitrary scale, of how much random error potential there is for an estimate -- and nothing more.

And to pick up the point from the later comment re the counts being "deliberately misleading": I am not sure that is entirely fair. What we can say is that the counts are perhaps inevitably misleading if they are casually thrown about, because their meanings are rather more subtle than people think. But there does not seem to be any better way to handle the challenge, so "deliberate" seems unfair as a blanket statement.

However, I will agree that in an area where people have political or similar motives, anything that is inherently potentially misleading will sometimes be used to intentionally mislead. Sometimes, though, I wonder if most people using these numbers in a misleading way are actually more clueless than manipulative. You often see statistics about comparative rankings ("the third leading cause of death") which are almost always completely absurd. The ranking depends not only on how the tricky problems we are talking about here are handled, but also on the aggregation choices which are clearly arbitrary. *Everything* is the second-leading cause of death if we simply define the category "all other causes of death" as the other entry on the list.
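The aggregation trick behind "leading cause" rankings can be sketched in a few lines (the death counts below are invented for illustration):

```python
# Toy illustration (invented counts): any cause becomes the "second
# leading cause of death" once the list is collapsed to that cause
# versus "all other causes of death".
deaths_by_cause = {"heart disease": 500, "cancer": 400,
                   "stroke": 100, "falls": 20}

pick = "falls"
collapsed = {
    pick: deaths_by_cause[pick],
    "all other causes": sum(v for k, v in deaths_by_cause.items()
                            if k != pick),
}
ranked = sorted(collapsed, key=collapsed.get, reverse=True)
print(ranked)   # ['all other causes', 'falls'] -- "falls" is now #2
```

The ranking says nothing about magnitude; it depends entirely on how finely or coarsely the competing categories happen to be sliced.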

Anyway, confidence intervals very frequently mislead people (e.g., into thinking they mean the true answer must be somewhere in the interval), and counts also frequently mislead people. Neither was created to do so, but there is typically not a lot of effort to keep them from doing so.

I'd like to thank Dr Phillips for his detailed and patient explanation. But just for completeness I thought I would point out that correlation is not the same as causation. Unfortunately it seems to me that this golden rule is often ignored by both epidemiologists and the media. To take a couple of absurd examples: people who never wear socks are thousands of times more likely to die of malaria, and there is a similar correlation between wearing a bra and breast cancer. Despite being vaguely plausible as causes of 99% of cases, no epidemiologist would dare to say so. And of course banning bras and making socks compulsory would make no difference to the rates of disease. However, epidemiologists (or at least the media reports of their work) exercise no such discretion when they find other correlations. I'm sorry to say that I regard epidemiology (or at least a large part of it) as pseudo-science. Not just because of the methodology but because when (genuinely scientific) intervention trials prove the claims wrong they still don't get withdrawn.

You asked me Carl to define “cause”. If you were expecting a simplistic response, then it would be impossible for me to deliver one for the question “what is cause” or “how do we define cause” is a philosophical question and it would require an essay of considerable length to deal with the matter satisfactorily.

However, no doubt your students might give you an answer along the lines of: “that’s easy, a cause is something which produces something or brings something about.” Having said that, nevertheless, much remains unanswered, for we need to know what characteristics C (cause) must have to bring about E (effect).

One factor that one expects in defining cause is that of temporal precedence, that C precedes E, but one must question whether or not it is true that whenever C precedes E it causes it. If I scratch my head and next thunder rolls in the distance, does the scratching of my head cause the thunder to roll? No, but in most cases the cause does precede the effect although there may not have to be a time lapse between C and E. I say “most cases” because if we consider something like gravity for example, when we throw a ball up in the air, it takes time for the ball to fall back to earth but it does not (as far as we know) take time for gravitation to be exerted upon it. If we want to be quite safe then, we do not say that the cause always precedes the effect but that the cause never comes after the effect.

Another factor is cause as necessary connection. The belief that when the cause C brings about the effect E, there is some necessary connection between C and E, thus when C occurs E MUST occur. This, of course, is not necessarily the case. Smoking may be a cause of lung cancer but smoking does not always bring lung cancer about.

Then there is the confusion of cause with logical necessity. But statements about causality are not logically necessary. “Friction causes heat” is not a logically necessary statement: it is logically possible that friction might have produced magnetic disturbances instead. It is only by empirical observation that we discover what causes what. We cannot, for example, get the answers we need as to what causes what by sitting in an armchair and working it out as we would with mathematics.

For the purposes of brevity, as I could write about the subject of cause at length, I shall consider the point about necessary cause that you made in your original post, Carl. I quote:

“Diseases and deaths always have multiple component causes, all of which can legitimately be called the cause (which is to say a necessary -- not sufficient -- cause of that death or disease at the particular time).”

What exactly, Carl, do you mean by “necessary cause”? Do you mean a necessary connection (outlined above), or, something I have not dealt with yet, namely, a “necessary condition”?

I shall first of all explain the difference between the two. When we say that C is a necessary condition for the occurrence of E we do not mean there is a “necessary connection” between C and E, although we may sometimes say: “In order for E to occur, C must occur.” What we mean (or should mean) is simply the empirical fact that in the absence of C, E never occurs. Thus in the absence of oxygen we never have fire. So, I shall have to be critical, Carl, and ask you to be more specific here about what you mean by a “necessary cause”, as your meaning is not clear. In addition, how does that square up with your view that “There is no assumption involved. This follows immediately from the *definition*”, for in the absence of C (smoking), E (cancer) may frequently occur. Therefore, Carl, if by necessary cause you mean necessary condition, a necessary condition is not a cause.

Lastly, when you stated above that “diseases and deaths always have multiple causes” you also went on to say that you were talking about a necessary and not sufficient cause. Isn’t this muddled? If a death is brought about by a number of factors and, on your own admission, you say that you are talking about hypothetical cases where you know the causes, then aren’t you really and precisely talking about a sufficient condition and not simply necessary factors without which something cannot happen?

I, too, would like to make some comments on Carl’s posts. Some issues raised are critically important. I apologize from the outset as to the length of the post. I have tried to be as brief as possible – sometimes too brief. However, unless the greater context in which epidemiology occurs is considered, it is impossible to understand why epidemiology is the way it is. Given its length, I have posted my comment on the following thread where it is out of the way. Hope it is of some use.

I glanced at Dr. Phillips' considerate and much appreciated response earlier, and thanked him for it, but didn't read it.

I've read it now. And it confirms what I've already believed about epidemiology.

If I and a hundred of my neighbors suffer from food poisoning symptoms, one can survey the victims to the point where it's reasonable to suspect that the food poisoning was caused by the egg salad at the village fair. That's a very useful application of epidemiology as clue-finding.

But even that more easily understood scenario often proves not to reveal a true culprit. As anyone who's watched a police drama knows, even if considerable circumstantial evidence points to a culprit, it's often revealed that the accused party is innocent.

This doesn't only apply to fictional police dramas, but to real life as well. In fact, several prisoners have been freed from death row due to DNA analysis which has proved their innocence, despite the fact that they were found guilty by a jury of their peers.

This is not to say that I'm entirely dismissive of epidemiology any more than I'm dismissive of evidence collected by police regarding a crime. If a man's wife is murdered, and he says that a one-armed man broke into the home and killed her, it's reasonable to suspect in the absence of contradictory evidence that the husband killed her. If for no other reason, because it's probably 10 times more likely that a husband would murder his wife than it is that a home invader would do so.

Such facts shouldn't be ignored, but they're mostly academic, meaning that they're useful for clue finding, but not definitive truth.

For some reason, I've known so many people, even young and otherwise healthy people, in my life with Crohn's disease, diverticulitis or other serious intestinal conditions that, from my perspective, there must be a nationwide epidemic of such diseases. I've known more people with such diseases than any kind of cancer or other disease, and these people aren't related to me and seem to have no racial or social traits in common, and live many miles apart. As far as I can tell, simply being acquainted with me makes other people much more prone to intestinal disorders. Otherwise, it's a completely random sample.

I think I'm trying to say that epidemiology is probably best applied when a cause and effect relationship is already established, rather than using epidemiology as a means to establish cause and effect when the mechanisms are otherwise unknown.

For instance, we know that viruses and bacteria can cause disease, and the biology of this has been observed and applied with great success. I suspect that we can even watch the effect of a virus attacking a cell on a microscope slide. So, we know.

But if I wanted to be a great pro baseball player, I probably wouldn't find it of great use to analyze the common characteristics of great baseball players. If I took those matters to heart, I'd have to conclude that I should be black (like Barry Bonds or Reggie Jackson), obese (probably Bonds, but definitely Babe Ruth and Cecil Fielder), an alcoholic (like Mickey Mantle), over 6 feet 6 inches tall (like Dave Winfield and Dave Kingman) or have an unusual batting stance (like Ty Cobb or Rod Carew and others).

From epidemiology, all I can conclude is that you're "more likely" to die an early death if you drink and smoke a lot, unless you're someone like Winston Churchill.

I can see how "more likely" is worth a word of warning, but it hardly seems to justify a quest to transform society. It certainly doesn't justify inserting purely academic findings into the public sphere as matters of policy.

Reading back my last reply, I know I'm not hitting at the heart of the point. I was essentially taking a long-winded approach to saying what everyone here already knows: "correlation does not equal causation".

I also know that the origins of particular diseases have, in fact, been discovered through epidemiological research when the correlations were rather high. So, perhaps it's unreasonable to suggest that a causative mechanism should be known first.

On the other hand, I've seen high correlations that I'm entirely skeptical of, like the one between childhood leukemia and foods containing nitrates, such as hot dogs. I've heard this finding criticized as "data dredging", meaning that it was found by digging through study results for any strong correlation without regard to methodology.

I apologize for another long-winded post here, but only to save you the effort of clarifying a point I wasn't expressing very well. Which was:

Given your earlier post about inexact methods and standards, where and how can the wheat be separated from the chaff? I plead "guilty" to having a poor understanding of epidemiology as an applied science beyond a few basics. From what you wrote, though, epidemiology seems to be more of a "social science" than an applied science.

I don't mean to be dismissive of "social sciences" at all. To the contrary, I find them quite interesting myself. What I intensely dislike though (and I think you too) is the constant placement of what seem to be purely academic findings into the mainstream as propaganda for political causes. I don't blame the media, because I believe in transparency and free expression. Instead, I blame the researchers themselves and/or their masters, because they seem to be so preoccupied with publicizing the implications of their findings that they've apparently forgotten to put any rigorous or ethical standards into place.

From there, one can rightly say that people have some duty to inform themselves. Like me. You're an expert on epidemiology, and when I asked you about "confidence intervals" you gave me an honest reply saying, essentially, that no one knows. Please don't misunderstand: I'm not critical of your forthright and welcome reply. To the contrary, I think we're on the same page. What I'm pointing out, though, is that even if I, as an interested member of the community, take an active interest in the academic findings that form public policy, I will be unable to evaluate that information unless I acquire the specialized knowledge of the academics themselves. Who, by the way, can't seem to agree on that specialized knowledge among themselves. This acts as a formidable barrier to anyone who wants to acquire the knowledge, because they'll believe that they simply don't get it instead of realizing that they haven't been let in on the joke.

I read something by Richard Feynman once where he detailed his apprehensions regarding "social sciences". I wasn't able to find the text online, but I found this short video clip.

http://www.youtube.com/watch?v=IaO69CF5mbY

I don't want to suggest that heuristic, speculative or clue-finding approaches to science aren't worthwhile, or that data shouldn't be gathered. To the contrary, there would be little scientific progress without such approaches.

However, inexact science should be understood as such. Otherwise, my earlier baseball analogy would apply, simply because physically and mentally fit people tend to be better athletes. If we took this idea to the extreme, though, we would eliminate some of the greatest players of all time. Intangibles matter greatly, and their value tends to be a package deal.

Hi, everyone who is still reading this. I got too busy with some deadlines, so I let this conversation drop. Rather than have it drag on forever on Chris's blog, I am going to post a summarized version on my blog when I get a chance (probably two or three days from now). I will try to address all the points that have been raised in the comments. If there are continuing points of confusion or disagreement, I will continue the conversation in the comments there.

I can see both sides in this argument. On the one hand, simple correlations, no matter how carefully strained and titrated and distilled, can never actually PROVE something is causal in the social or biological sciences, because there is always at least some chance of an overlooked confounder being the true, and invisible, cause.

Some examples:

1) It's pretty widely believed that a certain bacterium causes certain types of pneumonia (at least I hope it is). Why do we believe this? Simple: every time we find a person with that type of pneumonia, we find billions of those li'l fellers running around in their system. And when we test people without pneumonia, there are generally very few or none of them. Furthermore, we know that if we treat a patient with antibiotics that kill that particular organism, we usually cure the patient.

So: we conclude causality.

BUT... we may be wrong. It may eventually turn out that some kind of as yet unknown thing called a NanoVirus is the real cause, and that there's one particular type of NanoVirus that happens to live within just one particular type of bacteria, and that dies when that bacteria is wiped out. Is that likely? I don't think so: I think we've arrived at the point in our understanding of at least some disease processes that we can be pretty sure it's the pneumococcus itself causing the sickness.

But it MIGHT really be these little NanoViruses that no one's noticed yet!

A similar argument could be made about smoking and lung cancer. Heavy smoking correlates with much higher lung cancer rates, biological theories of carcinogenesis support the causation idea because of certain elements common and concentrated in tobacco smoke, and nonsmokers get lung cancer much less often than smokers.

BUT... it's always possible that some other correlating factor that no one has noticed yet is the true cause: perhaps smokers take more showers than nonsmokers because they worry about being stinky (Antismokers would love THAT part of the explanation), but as a side effect they breathe in far more droplets of asbestos-filled water vapor in the shower every day.

Aha! Someone does a study on showering and LC and discovers both the above AND the fact that "Showerers" have an even HIGHER chance of getting LC than smokers do after the smoking confounder is removed!

OR... Someone may notice that smokers tend to open their mouths to inhale their smoke, and thus get used to engaging in much more mouth-breathing than nonsmokers, thereby bypassing the nasal defense system. They do a study and find that the real causative correlation lies with mouth breathing rather than smoking! And even better, they find they're able to explain an increase due to secondary smoke by the same mechanism in nonsmokers who don't like the aroma!

Do I think the above things are true? No. But are they *POSSIBLE* ? Yes.

And that's why we like to see high RRs before claiming causality: because it's less likely that the "noise" of extraneous unknown correlated factors is the real cause without anyone ever having noticed them as a cross-correlated possibility. It's unlikely that smokers take 2,000% more showers than nonsmokers, but on the other hand it's quite possible that nonsmokers living or working with smokers might take 10% more showers than those who don't!
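To make the arithmetic behind a "high RR" concrete, here's a toy sketch in Python. All of the counts are invented for illustration and come from no real study; the point is simply that the relative risk is the risk in the exposed group divided by the risk in the unexposed group.

```python
# Toy 2x2 table with invented numbers: purely illustrative.
exposed_cases, exposed_total = 150, 1000      # e.g. "smokers"
unexposed_cases, unexposed_total = 10, 1000   # e.g. "nonsmokers"

risk_exposed = exposed_cases / exposed_total        # 0.15
risk_unexposed = unexposed_cases / unexposed_total  # 0.01

# Relative risk: how many times more common the outcome is
# among the exposed than among the unexposed.
relative_risk = risk_exposed / risk_unexposed
print(relative_risk)  # 15.0
```

A confounder can only fully account for a ratio like 15 if it is itself both strongly tied to the exposure and a strong cause of the disease, which is why very large RRs are much harder to argue away than RRs close to 1.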

I respect you tremendously, and I'm grateful you took the time to contribute, but I feel compelled to keep my point from being confused with your point. It isn't my intent to hold up an impossible standard for epidemiology, but to ask legitimate questions about methodology.

I, like you, would find the burden of proof for the examples you've provided to be over the top, and I certainly wouldn't hold legitimate inquiry to the impossible test of "proving a negative".

However, as I stated earlier, there has to be some kind of standard by which the wheat is separated from the chaff. If no standard exists for that, then it becomes a purely subjective and philosophical discussion as to what is "significant".

As I think you'll agree, something has been lost in this area, apparently for lack of rigor and well-principled and established standards. If even the concept of "confidence interval" isn't properly understood within the profession itself, then there is no way that the public can evaluate such findings when they become a matter of public policy. If we simply leave it to the experts without inquiring, then we live in a technocracy.

I know you're well aware of what I'm saying, but I'm articulating it to avoid confusion. I've written it with the intent of conveying a friendly and respectful spirit.

WS, I agree fully with you about the need for reasonable standards. I was not arguing that one should withhold judgement on something like smoking and lung cancer purely on the basis of the extremely small possibility of unknown correlated variables. The very high RR involved, and the seemingly very consistent and strongly significant findings in most studies, are persuasive enough for me.

And I've also been concerned about the lack of understanding regarding confidence intervals. I'm not very knowledgeable in the deeper waters of statistics, but I do understand and have tried to impress upon others the limitations of what the CI indicates.
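As a small illustration of what a confidence interval does and doesn't say, here is a sketch using the standard Katz log method for a relative risk CI. The 2x2 counts are invented for the example; the key caveat is that the interval quantifies sampling variability only, and says nothing about confounding, bias or measurement error.

```python
import math

# Invented 2x2 counts, purely for illustration.
a, n1 = 150, 1000   # cases and total in the exposed group
c, n0 = 10, 1000    # cases and total in the unexposed group

rr = (a / n1) / (c / n0)

# Standard error of log(RR), Katz log method.
se = math.sqrt(1/a - 1/n1 + 1/c - 1/n0)

# 95% CI computed on the log scale, then exponentiated back.
lower = math.exp(math.log(rr) - 1.96 * se)
upper = math.exp(math.log(rr) + 1.96 * se)
print(round(rr, 1), round(lower, 1), round(upper, 1))
```

Even a tight interval around a large RR only tells you that the association is unlikely to be a fluke of sampling; it cannot rule out the shower-taking or mouth-breathing stories sketched earlier.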

My intent wasn't really to move the discussion off where it was going but simply to add a related observation.

"My intent wasn't really to move the discussion off where it was going but simply to add a related observation."

Understood. It's always great to hear from you, Michael.

I'm honestly being a bit over-cautious regarding specifics here because I don't get the opportunity very often to ask someone with Dr. Phillips' credentials these questions, so I'm trying to maximize the opportunity while I have his attention.

About Me

Writer and researcher at the Institute of Economic Affairs. Blogging in a personal capacity.
Author of Selfishness, Greed and Capitalism (2015), The Art of Suppression (2011), The Spirit Level Delusion (2010) and Velvet Glove, Iron Fist (2009).
