The recent publication in EPIDEMIOLOGY of a graph about semen quality over time [1] - data that were somehow buried in a governmental report in Denmark - again raises the much-debated point of public access to data [2, 3, 4].

The mere fact of questioning a policy of public access to data seems like being ‘against motherhood and world peace’. Isn’t it true that “Science is about debates on findings,” “Science serves people, and people (taxpayers) paid for it,” and “Expensive research data should become available to others”? Yet the issues are more complex than the simple idea that ultimately we will all benefit from open access to data.

Firstly, what is meant by ‘data’? The original unprocessed MRI scans, blood, tissue, questionnaires? Or the processed data – determinations on blood, coded questionnaires? The cleaned data – with the possibility that the authors already have ‘massaged’ inconveniences? The analysis files – in which the authors have extensively repartitioned and recoded the data (another round of subjective choices)? Data should be without personal identifiers – of course – but in our digital age people can be identified by combinations of seemingly innocent bits of information. And, finally, should all discarded analyses, or discarded data, also become publicly available – to check what the authors ‘threw away’ and whether their action was ‘legitimate’?

Secondly, to what extent is the public, as the taxpayer, or any organization that pays for the research, really the full owner of the data? Data exist because of ideas about how to collect and organize them. There is intellectual content, contributed not just by the researchers but also by their research surroundings, their departments, universities, and governmental organizations that make research intellectually possible. Data in themselves are not science. Giving your data to someone else is not an act of scientific communication. Science consists in reducing data according to a vision – some of which may develop during data analysis. Should researchers not have a grace period for the data they collected, or perhaps two: first a period in which they are the sole analysts, and then a period in which they share data only under conditions?

Thirdly, how protective can a researcher remain about her data? Should a researcher have the right to deny access to her data to particular other parties? Richard Smith, the former editor of the BMJ, stated in his blog that denying access is the wrong strategy – why fear open debate, when it can only lead to better analyses? In his opinion, one should not deny data access even to the tobacco industry [5].

Reality is different: researchers know that when a party with huge financial interests wants access to data, there are three scenarios.

Scenario 1: they search and find some error somewhere in the data. This is always possible – no data are error-free. The financially interested party will start a huge spin-doctoring campaign, proclaiming loudly in the media that the data are terrible. Remember the discussions on the climate reports?

Scenario 2: another analyst is hired by the interested party, and comes to the opposite conclusion. This is published with a lot of brouhaha. The original researcher writes a polite letter to the editor, explaining why the reanalysis was wrong. The hired analyst retorts by stating that it is the original analysis which was in error. Soon, only the handful of people who really know the data can still follow the argument. That is the signal for a new wave of spin-doctoring, in which medical doctors give industry-paid lectures stating that “even the experts do not know any more; we poor consumers should use common sense; most likely, nothing is the matter”. I witnessed this scenario in a controversy on adverse effects of oral contraceptives. A class action suit was deemed unacceptable by a UK court because, in a meta-analysis in which two competing analyses of the same data were entered (!!), the relative risk was 1.7. This number fell short of the magical 2.0, which is wrongly held by many courts as proof that there is ‘more than 50% chance’ that the product caused the adverse effect [6]. Without studies and reanalyses directly sponsored by the industry, the overall relative risk was well over 2.0 [7]. This was money well spent by the companies!
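A brief aside on the arithmetic behind that threshold (my own gloss, using the standard attributable-fraction formula, not anything taken from the cited judgment): courts that invoke the 2.0 cut-off implicitly rely on

\[
\mathrm{AF}_{\mathrm{exposed}} \;=\; \frac{RR - 1}{RR},
\qquad RR = 2.0 \Rightarrow 0.50,
\qquad RR = 1.7 \Rightarrow \frac{0.7}{1.7} \approx 0.41,
\]

so diluting the pooled relative risk below 2.0 pushes the apparent ‘probability of causation’ under 50% – whatever one thinks of that logic.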

Scenarios 1 and 2 have a name: “Doubt is our product”, a phrase originally coined by the tobacco industry. It is not necessary to prove that the research incriminating your product is wrong – nor that the company is right – it suffices to sow doubt. [8]

Scenario 3 is that the financially interested party subpoenas the researcher to testify in court about every allegedly questionable aspect of the data. Detail upon detail is demanded. The researchers lose months (if not years) of research time, and their personal lives suffer. That scenario was played out against epidemiologists who did not find particular adverse effects of silicone breast implants [9]. It is now feared to be the tobacco industry’s next strategy in the UK [10].

Advocates of making data publicly available seem to live in an ideal dream world, in which for every Professor A whose PhD students always publish A, there exists a Professor B whose PhD students publish B. Such schools of thought combat each other scientifically with more or less equal weapons. Other scientists watch this contest and make up their minds as to who has the strongest arguments and data. This type of ‘normal science’ disappears when strong financial incentives exist. Then the weapons are no longer scientific publications, but public relations agents and lawyers. Of course, rivalries in ‘normal science’ can also be strong. Researchers sometimes do not want to share their complete data, or will share only part of the data under conditions. Often this is for the very simple reason that some sources of data, like blood samples, are finite.

Calls for making data publicly available need to take these scenarios into account. Some people hope that open information will, in the long run, deliver the ‘real’ truth. But on a shorter timescale, open information may also allow mischief by special interests with plentiful resources that are ruthless in their attempts to shape public policy. It seems difficult to ‘experiment’, i.e. to try open access to data for some time and then roll it back when the drawbacks prove too great.

An intermediate solution might be much easier to implement. Tim Lash and I, following ideas of others, have proposed making public registries of existing data [11]. This would make it possible to start negotiating with the owners of the data about possible re-use. Such a registry might also facilitate the use of data in ways that were not originally planned. If controversy and distrust complicate the picture, trusted third parties can be sought to organize a reanalysis, with public input possible – a strategy recently proposed by a medical device maker [12].

In short, public access to data is much more complex than the proclamation of some principles that look so wonderfully scientific that nobody can argue against them.

Commentaries on this topic are most welcome. They can be published as a full guest blog of about 450 words maximum. Please mail to Epidemiologyblog@gmail.com

Scientists often portray themselves as the noble but hapless victims of sensationalism and exaggeration in the popular media [1]. But are scientists in fact sometimes complicit in these abuses, hyping their work in media interviews, making claims that would not survive peer review in the published articles? If so, this constitutes an important ethical violation that deserves further scrutiny, since communication with the public is at least as socially consequential as communication between scientists. Public opinion plays a long-term role in funding levels for competing research programs, for example, which makes exaggeration in news stories a serious abuse of the power granted to the scientist by a credulous and trusting media and public.

Here’s one example I came across recently which may fit this description. Nature Genetics published a meta-analysis by Dara Torgerson and colleagues in their September issue. The authors pooled North American genome-wide association studies of asthma that together included over five thousand cases among individuals of European, African and Latino ancestry [2]. They reported a number of susceptibility loci, most of which showed similar associations across ethnic populations and had been previously described. But one variant was novel, and the association was described as being specific to individuals of African descent. Table 2 of the paper reported a SNP near the gene PYHIN1 on chromosome 1 with an odds ratio (OR) among African Americans and Afro-Caribbeans of 1.34 (95% CI: 1.19-1.49). In a replication data set, this association remained substantial (OR=1.23), although at a slightly different locus. For European Americans, the corresponding association for this SNP was reported as “NA”, which a footnote defined as “not available (the SNP was not polymorphic).” As noted by the authors, this finding is potentially interesting and important because of the substantial racial/ethnic disparity in asthma prevalence in the US (7.7% in European Americans versus 12.5% in African Americans).

Although the main text of the paper reports only the odds ratios and their confidence intervals, Table 1 on page 18 of the electronic supplement details the allele frequencies by group. Surprisingly, it is the minor allele, which was not observed in European Americans, that is associated with lower risk. The major allele had reported prevalences of 77.0% and 71.9% in African-origin cases and controls, respectively. There is no association in European Americans because 100% have the major allele. If this SNP is taken to be causal, therefore, the pattern for this variant would be opposite of the observed disease phenotype prevalences, with 100% of European Americans having the high risk variant. Under the more likely interpretation that the SNP is a marker in linkage disequilibrium with a causal variant in the gene PYHIN1, however, the data have nothing at all to say about PYHIN1 and asthma in European Americans. The authors would have a basis to consider the unknown variation in PYHIN1 as explaining some cases of asthma within the African-origin population, but no claim to this being relevant in any way to racial/ethnic disparities. European Americans might have more or less of the high risk version of this gene; the data are completely silent on this issue.
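To see this in the numbers (a back-of-the-envelope check of my own, using only the allele frequencies quoted above; it is not a calculation reported in the paper and ignores any covariate adjustment): the crude odds ratio for the major allele comes out close to the reported value, and its reciprocal – the minor-allele odds ratio – is below 1, i.e. protective.

\[
OR_{\mathrm{major}} = \frac{0.770/0.230}{0.719/0.281} \approx 1.31,
\qquad
OR_{\mathrm{minor}} = \frac{1}{OR_{\mathrm{major}}} \approx 0.76 .
\]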

It came as a surprise, therefore, to see the news reporting on this publication. For example, the Reuters story published on July 31st began "U.S. researchers have discovered a genetic mutation unique to African Americans that could help explain why blacks are so susceptible to asthma." [3] The story seemed to portray the SNP as the causal variant itself:

"But because the study was so large and ethnically diverse...it enabled the researchers to find this new gene variant that exists only in African Americans and African Caribbeans. This new variant, located in a gene called PYHIN1, is part of a family of genes linked with the body's response to viral infections, Ober said. "We were very excited when we realized it doesn't exist in Europe," she said."

How can one make sense of this text in relation to the published paper? If the reported SNP is by some great stroke of luck the causal variant itself, then it cannot explain the observed racial/ethnic disparity since it would lower risk in some blacks in relation to whites. If, on the other hand, the SNP is merely a marker for a causal variant somewhere nearby, presumably in PYHIN1, then it is nonsense to say of this unknown variant that it “doesn’t exist in Europe.” The data reveal nothing at all about the distribution of this variant in European Americans since no marker for this gene was found in that population. Either way, therefore, the news story did not seem to reflect the data that were reported in the article.

Thinking that this was an example of the press being irresponsibly sensationalistic, and misrepresenting the peer-reviewed article, I sent a letter on August 2nd to the Reuters science reporter and editor, signed by myself and about a dozen colleagues. We also sent a copy of the letter to the corresponding author of the article, the University of Chicago statistical geneticist Dan Nicolae.

The Reuters editor sent a detailed response without delay. She reviewed the statistical significance of the association measure and the proposed biological mechanism for how the PYHIN1 gene might affect asthma risk, and noted that the science reporter’s text was supported by interviews with two of the researchers as well as from a contact at the National Institutes of Health. To document this, she attached an e-mail from two of the authors, Dan Nicolae and Carole Ober, in which they affirmed their approval of the coverage their work had received. “First let us say that we think that the article is very well written and we have no major issues with it. We do not understand the issues raised in Dr. Kaufman's letter,” they wrote. They went on to note that perhaps the Reuters title might “slightly overstate the conclusion of our study”, but that it was a “subtle distinction” at best. “We thank you for helping us promote our science,” they concluded.

I then wrote to Dan Nicolae directly, asking him how the Reuters text could be construed to be consistent with the information in the paper. “I understand that race is a sensitive issue, subject to many debates,” he responded. “My research is on understanding molecular mechanisms of complex diseases, with the hope that this will lead to better treatments. It has nothing to do with this debate. On the Reuters news item, let me state that there are several scenarios where our data would fit with that headline. I will not discuss these scenarios here because I am convinced they will produce other discussion, and I prefer to use my time on my research projects.”

Apparently, Dr. Nicolae was comfortable that the Reuters reporting did not reflect the content of the paper because he believed that there were theories, not explored in the published article, which could make the news story valid. On the basis of his reply, I came to believe that this incident was not the result of a science reporter misunderstanding the published paper. Rather, it seemed to be the case of the scientist providing a speculative interpretation that was not vetted by the reviewers or the editors of the journal. Dr. Nicolae offered that my confusion may have arisen from ignorance, and recommended that I read up on tag SNPs, differences in linkage disequilibrium patterns between Europeans and Africans, and association signals produced by interactions. “These will lead you to these scenarios I am referring to,” he concluded.

While it is possible for a risk factor to operate in different directions across two populations, this entirely sidesteps my concern, which is that the reporting strayed from what could be said based on the content of the published article. There could be no evidence of effect measure modification presented for this variant, since there was no exposure variation in the European Americans, and therefore no association measure could be estimated in that group. Dr. Nicolae did not appear to disagree with me on this point, but seemed to view the media interview as an opportunity for presenting his research program as relevant to racial disparities in a way that could not be directly derived from the published data. This is surely a fine line, because journalists often want scientists to give their expert opinions on the broader interpretation of the published work. But how far should authors go in describing what they might speculate to be true, rather than what they actually found? The impetus for the news story was the publication of an article in a respected scientific journal. Are there really no constraints on how far authors can extend their interpretation while claiming to be referring to the article? Should they clearly indicate that they are speculating - and should they also present at the same time the potential contrary or skeptical view? With so much attention and funding riding on efforts to understand and reduce minority excess burden of disease, the authors’ speculation risks the appearance of being self-serving. If scientists sometimes disparage science reporters as the source of popular misinformation, the fair reply might therefore be “Cura te ipsum!”

If you would like to comment, email me - or in this case Dr Kaufman - directly at epidemiologyblog@gmail.com or submit your comment via the journal, which requires a password-protected login. Unfortunately, published comments are limited to 1000 characters.

Judging from the recent spate of articles about false-positive results in epidemiology, initiated by Ioannidis’s thought-provoking piece in our July issue [1], something seems wrong with our field. Still, the mainstay of epidemiologic reasoning is making comparisons. Hence the question: does our field have more problems than other fields of medical science?

Let’s start with randomized trials. And, just as critics of epidemiology focus on specific fields (say, nutrition and lifestyle), let’s also focus on a specific field – for example, antidepressants. John Ioannidis produced another thought-provoking paper, entitled “Effectiveness of antidepressants; an evidence based myth constructed from a thousand controlled trials.” [2] Right, that seems to settle the score with epidemiology.

Would this problem exist for antidepressant randomized trials only? The people who first signalled the selectiveness of the evidence construction on antidepressants were employees of the Swedish drug licensing authorities. They stated that in their experience the problem is not confined to this class of drugs. [3] Just read the extremely funny paper about a new class of antipsychotics, which showed that in head-to-head randomized trials Drug A was better than B, B better than C, and in turn C better than A – all for the same indications! [4] The outcome depended simply on the sponsor. All in all, would we be mistaken in concluding that the field of randomized trials has as many problems with its credibility as epidemiology?

Let’s continue with Genome Wide Association Studies (GWAS). An article in The Economist called the whole genomic scene into question under the title “The looming crisis in human genetics” and the subheading “They have simply not been delivering the goods”. [5] The crucial sentence: “Even when they do replicate, they never explain more than a tiny fraction of any interesting trait”.

Of course, we epidemiologists knew this all along. Let’s focus on cancer. In a 1988 (!) study of adopted children and their biological and adoptive parents in Scandinavia, [6] the relative risk of developing any malignancy among children whose biological parent had developed a malignancy before age 50, but who had been raised in adoptive families, was 1.2 – while the relative risk was 5-fold if the adoptive parent had developed cancer. A later editorial made clear that only rather weak concordance of cancer was found in twin studies. [7] Mere logic should have alerted us that it is impossible to find strong explanations of complex diseases by single somatic genes: these diseases come into being because of multiple pathways that go wrong, and each pathway can go wrong in multiple ways. Of course, there are rare families in which cancer is heritable – these have been detected, and that was it.

We pass over the fact that almost none of those great GWAS discoveries has yet resulted in anything meaningful for the clinic or for public health – in great contrast to the record of epidemiology. And, as The Economist wrote, the yield of classic genetics has been much higher than that of GWAS, and clinically much more important. My only (but rather successful) brush with genetics concerned a mutation that was quite prevalent and carried a high relative risk, and that was discovered by first elucidating the biochemical abnormality and thereafter reasoning backwards to the gene. [8] GWAS would never have found it. If Ioannidis calls antidepressant randomized trials a well-constructed myth, should we not call GWAS the same?

For almost any field of medical science, we can easily point out that its published record must contain a massive amount of irrelevance and error. This is nothing to worry about. It is normal science – it has always been like that and it will continue to be so. Again, John Ioannidis comes to the rescue and proves the point. A personal parenthesis first: for years, during lectures, I had been telling audiences that if you want to understand how science evolves you should go to the library and look at The Lancet or BMJ or NEJM or JAMA of 50 years ago, or better 100 years ago – most of the papers you cannot understand anymore, and the rest are either irrelevant or plainly wrong. I only did the thought experiment, and never even published it, but John Ioannidis and his collaborators gathered real data and confirmed my prejudices. In a paper entitled “Fifty-year fate and impact of general medical journals”, [9] they write: “Only 226 of the 5,223 papers published in 1959 were cited at least once in 2009 and only 13 of them received at least 5 citations in 2009.” They were mostly clinical papers describing syndromes.

Perhaps, with the latter publication, Ioannidis is biting his own tail. I am thinking of his 2005 paper “Why most published research findings are false”. [10] In that paper he seemed to single out observational epidemiology as the main culprit. Judging from his later verdict about myth creation by ‘a thousand RCTs’ and his recent judgement about the transiency of all science, the ultimate question becomes whether all scientific processes – not just epidemiology – can be improved so as to be less wasteful and to yield truth more often.

Having just returned from the 3rd North American Congress of Epidemiology in Montreal, I started wondering whether it is epidemiologists who are most acutely aware of the tentativeness of any scientific finding. At that congress you could go from one session to the other hearing about problems of data, analysis and inference and listen to plenary lectures about wrong turns in our science. Would the same happen at, say, a congress of cardiologists? Or geneticists?

In 1906 Sir William Osler delivered a Harveian oration about “The growth of truth” in which he wrote about the vagaries of truth, and the many detours and false alleys of scientific research: “Truth may suffer all the hazards incident to generation and gestation…”. [11] His views were echoed almost 100 years later in a Millennium essay by Stephen Jay Gould, who described how science progresses “in a fitful and meandering way”. [12] Perhaps the wastefulness of science is inevitable and might be compared to the zillions of meaningless mutations happening in biological systems – very few of which might carry any survival advantage.

One day, when I was discussing a paper with our PhD students, one of them asked in exasperation: “How can you ever be certain that a paper is true?” My spontaneous answer was: “Grow 25 years older – and even then…”.

If you would like to comment, email me directly at epidemiologyblog@gmail.com or submit your comment via the journal, which requires a password-protected login. Unfortunately, comments are limited to 1000 characters.

A health care initiative calling for comparative effectiveness research (CER), with US$ 1.1 billion in initial funding, was one of the most remarked-upon early actions of the newly elected Barack Obama in 2009 [1]. The measure has now come into law, and epidemiologists and methodologists are jumping on the bandwagon - eager to contribute to a new era in health care where decisions on the worth of treatments should be based rationally on numerical evaluations – and perhaps also with an eye on research funding. The series of papers in the May 2011 issue of EPIDEMIOLOGY attempts to jump-start a discussion about CER. The new ideals are a reincarnation of France’s 1830s movement of “Médecine d’Observation” [2] – but even more worth striving for enthusiastically in the early 21st century.

‘Haven’t we all always been CER researchers?’ – is the gist of Miguel Hernán’s commentary [3]. Yes, we have – but up to now epidemiologists have mostly covered the easy part: the adverse effects of medical treatments. In adverse-effects research, confounding by indication is mostly absent because adverse effects are usually different diseases (with different risk factors) from the one that is treated – and quite often unpredictable. Confounding by contra-indication [4], if present, can often be described in a few prescribing rules that may lead to successive restrictions during data-analysis [5]. Thus, in adverse-effects research, restrictions and a careful choice of comparators and (where necessary) ”new users” [6] lead to quite credible “expected exchangeability” of patient groups. Such research has the added advantage of being more generalizable than randomized trials, which are limited to selected populations [7].

Classic papers by methodologists as diverse as Rubin [8] and Miettinen [9] have outspoken messages: “confounding by indication” in medical research on the intended effects of treatments is tractable only by randomization. The whole Evidence-Based Medicine movement, as well as the Cochrane Collaboration, is built on this very idea. Both tried to revolutionize medicine at the end of the last century. If randomization is the only solution to confounding by indication, then the prospects of CER are severely crippled – CER would be limited to adverse-effect pharmacoepidemiology, which is indeed what we have always done.

However, the main aim of CER, as explicitly announced by Obama himself, is to compare the effectiveness of drugs in daily practice [10]. So it is no surprise that, in an earnest effort to join forces to change health care (and to bring the US closer to what is happening in Europe, e.g., in NICE [2, 11]), people from all sides are enthusiastically trying to nibble away at these classic notions. Admittedly, when confounders are few and can be measured precisely (as in the example of sequential CD4 counts and HIV treatment [12]), the classic papers have been proven wrong. However, in other instances, when judgments about the prognosis of patients are complex and may include hard-to-quantify characteristics like “degree of oedema” or “impression of frailty” [13], it has been shown repeatedly that confounding by indication remains “a most stubborn bias” [14, 15].

Should we give up in advance, or should we see how far we can get in attempting what was judged impossible: to evaluate the beneficial effects of treatments by non-randomized studies? I have strong sympathies with people who make the attempt. Epidemiology is an evolving discipline that makes progress. Think about our insights about confounding, and about case-control studies that were revolutionized in the late 1970s and early 1980s, and then again over the last decade. Still, it is likely that in most instances mere statistical adjustment for confounding will not suffice to replace randomization. We should explore techniques that promise to address unmeasured confounding by indication, such as instrumental variables or severe restrictions, which can help in particular circumstances that should be defined. However, severe restrictions may wreck another ideal of CER: to show what works in daily practice for a wide array of patients. So – we should explore how far we can push observational epidemiology, we should seek to develop new methods, but we should keep an open mind for the possibility of failure. Whatever one’s hopes or enthusiasms, the classic papers may still be right. Clinical trialists have already predicted that CER will lead to a lowering of standards of evidence because of “data mining.”[16] If, on the other hand, CER succeeds, Obama’s presidential legacy will include a change of epidemiologic theory.

If you would like to comment, email me directly at epidemiologyblog@gmail.com or submit your comment via the journal, which requires a password-protected login. Unfortunately, comments are limited to 1000 characters.

Once I asked a science writer with a lot of experience in organizing science exhibitions whether it would be possible to organize an exhibition about epidemiology for the general public. He admitted that it would be difficult, because there are no artefacts to show, nor compelling pictures – mainly graphs and tables. We left it at that. After all, the first professor of epidemiology at the London School of Hygiene and Tropical Medicine, Greenwood (1880-1949), once characterized himself as one who had loved his friends, wife and dog, and “… had pleasure in books and numbers” [1]. Having pleasure in ideas and numbers would still characterize many an epidemiologist today – but how can one make this into stuff that is exhilarating for a wider public?

Then a video was mailed to me by Allen Wilcox, who described it as "two centuries of social and public health progress boiled down to four graphic minutes.” The author and presenter is Hans Rosling, Professor of International Health at the Karolinska Institute, Sweden. The remarkable achievement of his video does not come out of the blue. Rosling has made countless videos on statistics of diverse international health topics, like HIV, rising economies, and maternal and child health. He has his own website, 'Gapminder', dedicated to teaching a “fact-based world view” (http://www.gapminder.org/). He is a frequent contributor to TED (Technology, Entertainment and Design), and was the subject of a BBC program on the 'Joy of Statistics'.

I am struck by the originality of Rosling’s techniques (and how he constantly improves them) in communicating complex numerical data. This culminates in his new video in which he shows the history of life expectancy and its determinants, for 200 countries over 200 years, with 120,000 data points. Awesome: a must-see.

If you would like to comment, email me directly at epidemiologyblog@gmail.com or submit your comment via the journal, which requires a password-protected login. Unfortunately, comments are limited to 1000 characters.

[1] Pemberton J. Will Pickles of Wensleydale: the life of a country doctor. 2nd ed. Royal College of General Practitioners, 1984 [first edition 1970]. Page 121: letter from Greenwood to Pickles.

The Netherlands has always prided itself on its custom of delivery by midwives – widely regarded by the Dutch as more humane and natural. On the 2nd of November 2010, BMJ published a paper showing that in an area of The Netherlands, perinatal mortality was twice as high in deliveries started under the supervision of midwives as in those started under the supervision of an obstetrician [1]. On the 14th of December the Minister of Health announced sweeping changes to the organisation of health care during pregnancy and delivery in the Netherlands [2].

The findings in the BMJ paper rocked Dutch obstetric care to its foundations. Delivery by a midwife, either at home or as an outpatient, has long been the Dutch default option. Midwives select which deliveries they expect to be high risk (multiple pregnancy, previous caesarean, known congenital and placental problems, growth retardation, preeclampsia etc). Those pregnant women are referred to obstetricians and have a fully ‘medicalized’ delivery in-hospital. The remaining pregnancies are regarded as sufficiently low risk for home or outpatient delivery under supervision by a midwife.

The BMJ paper was criticized in an avalanche of letters to the editor, as well as in sharp comments in Dutch newspapers. All kinds of methodologic arguments were raised against the findings: the mortality in the low risk group in the new study would be different from that found in other studies, the study could not correct for confounding, the study was not ‘prospective,’ and the analysis was not prespecified in a protocol (!). All arguments and counterarguments – including my own entry – can be found in the Rapid Responses to the paper in BMJ [3]. My personal view is that, when all the counterarguments are weighed, they do not detract from the results unless a gross calculation error had been made or there was some very strange play of chance.

The most interesting counterargument was that there was no control for confounding. Indeed, the researchers could not stratify the denominator according to delivery and perinatal risk factors. Still, it is obvious that any difference in risk indicators between the two groups would go against the results found: pregnant women with known higher risk are referred to obstetricians. If correction for baseline risk had been possible, the difference would have been even more dramatic.

The new data come against a particular background. Already in 1986, a paper had shown that The Netherlands had dropped in its rank order in European national perinatal statistics [4]. Where once The Netherlands was one of the countries with the lowest pregnancy-related baby deaths, by the early 1980s it was amongst the European countries with the highest mortality. As in 2010, the 1986 paper was met with a barrage of methodologic criticisms. The main argument, next to arguments about ‘fishing expeditions’, was that registration of perinatal deaths was different and more complete in the Netherlands. How this could account for a shifting rank order always remained a puzzle to me: either the Netherlands would have steadily improved its perinatal death registration (and was originally also amongst the countries with high death rates), or the registration of all other European countries would have become considerably sloppier. Anyway, nothing happened.

The poor rank order position of the Netherlands was confirmed in a pan-European survey (PERISTAT-I) in 2003, and swiftly dismissed by the Ministry of Health with the same old arguments. The position of the Netherlands was again confirmed in PERISTAT-II in 2008. By this time, The Netherlands had twice the mortality of leading European countries. In the wake of this confirmation, working groups were finally established. It appeared that about 50% of women who started labour for a first delivery under supervision of a midwife needed to be referred to an obstetrician during the delivery because of acute complications. It also appeared that these transitions were far from smooth. In the recent study the highest mortality and Neonatal Intensive Care admission rates were found in the babies of women who needed urgent referral during labour – despite the fact that these pregnancies had been regarded as low risk throughout the pregnancy and up to the initiation of labour.

These developments seem to conform to the law of ‘de remmende voorsprong’: the law of the ‘retarding lead.’ Coined by the Dutch historian Huizinga in 1937, this notion suggests that being ahead may slow you down in improving further. Decades ago, when medical interventions were crude, a solid force of midwives attended the country’s deliveries in a more even and resourceful way than may have happened in other countries. But as the medicalization of pregnancy and delivery became more refined, this advantage was lost.

Besides methodologic arguments, current proponents of midwife delivery argue that we should not become preoccupied with the statistics of perinatal deaths, which concern only very small numbers. They argue that generalized care of pregnancies by obstetricians will result in unwanted medicalization (such as having too many Caesareans) and that women will pay the price in subsequent pregnancies. The nightmarish example is, of course, the US: numbers are cited of 30% Caesareans, coupled with poor perinatal outcomes. So, they argue that ‘we should still be proud of home delivery’.

In January 2010, a committee of the Ministry of Health, installed in 2008, had issued a report, written by representatives of midwives (who are largely self-employed) and obstetricians (whose practices are affiliated with hospitals), urging the abolition of barriers between midwife and obstetric care. In the wake of the publication of the latest study in BMJ [1], the recommendations of this report have now been urgently accepted by the current Minister of Health in a letter to parliament, signed on the 14th of December 2010 [2]. The report argues for restructuring the process of care so that the central focus is the baby and mother rather than the type of care giver. The report asks for less than 15 minutes’ transit time between midwife and obstetrician ‘for necessary interventions during delivery’. Barriers to cooperation exist not just during the acute delivery phase, but also during pregnancy, with poor referral practices in both directions. While the debate is not yet completely closed, the methodological arguments will slowly be replaced by investigations of practical solutions. The recent data (which, rumour has it, will be confirmed in other Dutch provinces) have clearly increased the urgency of the recommendations. Still, the fact is that it has taken three decades for professional organisations to sit together to find solutions based on epidemiologic health care research.

If you would like to comment, email me directly at epidemiologyblog@gmail.com or submit your comment via the journal, which requires a password-protected login. Unfortunately, comments are limited to 1000 characters.

As I mentioned in the editorial announcing this blog, I am open to ideas, great and small, to blog about. This is one volunteered by Jay Kaufman:

“DataThief”

“DataThief III is a program to extract (reverse engineer) data points from a graph. Typically, you scan a graph from a publication, load it into DataThief, and save the resulting coordinates, so you can use them in calculations or graphs that include your own data.” [from the link]

What a nifty idea! For many teaching examples, we’d like to analyze data presented in graphs and figures. It can be a painstaking process to read the data points off the image, especially if the type is small or the resolution poor. We asked a colleague to do a validation run of test cases (where the original data were available). He reports that, while not completely user-friendly (getting the marker exactly on the data point isn't easy), the answers on a test case with known values were within rounding error. With some effort this could really come in handy.
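For readers curious about what such a tool does under the hood, here is a minimal sketch of the core arithmetic (my own illustration in Python, assuming linear axes; this is not DataThief's actual code or interface): two calibration points on each axis define a linear pixel-to-value mapping, and every clicked point is then converted by interpolation.

    # Minimal sketch of graph-digitizing arithmetic (illustrative only; not DataThief code).
    # Assumes linear x and y axes; logarithmic axes would need a log transform of the values.

    def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
        """Map a pixel coordinate to a data value, given two calibration points."""
        scale = (value_b - value_a) / (pixel_b - pixel_a)
        return lambda pixel: value_a + (pixel - pixel_a) * scale

    # Hypothetical calibration: x-axis pixel 100 = year 1950, pixel 700 = year 2000;
    # y-axis pixel 500 = 0 units, pixel 50 = 100 units (pixel y runs downward on screen).
    x_of = make_axis_mapper(100, 1950, 700, 2000)
    y_of = make_axis_mapper(500, 0, 50, 100)

    # Points clicked on the curve, recorded as (x_pixel, y_pixel):
    clicked = [(160, 410), (400, 275), (640, 140)]
    for px, py in clicked:
        print(f"{x_of(px):.0f}\t{y_of(py):.1f}")   # -> 1955 20.0, 1975 50.0, 1995 80.0

The fiddly part in practice is exactly what our colleague noted: getting the calibration clicks and the data-point clicks precisely on target.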

If you would like to comment, email me directly at epidemiologyblog@gmail.com or submit your comment via the journal, which requires a password-protected login. Unfortunately, comments are limited to 1000 characters.

Two recent debates have addressed the registration of observational research (discussed in the September issue of this journal [1]). One debate was at the August meeting of the International Conference on Pharmacoepidemiology (ICPE) at Brighton, UK, and the second was at the September meeting of the American College of Epidemiology (ACE) at San Francisco. [Full disclosure: I took part in both.]

The idea of registering observational research was launched at the end of 2009 in a meeting organized by a group representing the European chemical industry [2] – an industry that feels epidemiology is behaving irresponsibly. Thereafter, the registration idea was enthusiastically embraced by Lancet [3] and BMJ [4] with arguments that reveal the great confusion that prevails when observational research is discussed and pitted against RCTs.

BMJ editors consider observational research 'vulnerable to bias and selective reporting': researchers 'may … craft a paper that selectively emphasises certain results, often those that are statistically significant or provocative'. In the future, BMJ will demand 'a clear statement of whether the hypothesis arose before or after the inspection of the data' (if afterwards, the journal will demand extra explanations), and they will ask 'whether the study was registered, and if registered whether the protocol was registered before data acquisition or analysis began'. BMJ’s reason is that they are interested only in papers that have clear and immediate clinical relevance.

Are we allowed to have new ideas while exploring existing data? At ICPE, the debate was about multiplicity in pharmacoepidemiology. The case against multiple analyses of pharmacoepidemiologic data was made by Stan Young and Stuart Pocock, based on the same reasoning that makes subgroup analyses ‘not done’ in randomized trials (RCTs). On the other side, Ken Rothman and Sonia Hernandez-Diaz argued that multiple analyses are a hallmark of good science: good science investigates several aspects of a question and is not limited to a single prespecified question and analysis. Epidemiologists learn during data analysis, in particular in large complex databases; they behave like lab scientists who adapt their experiments and change their protocols after seeing the results of the previous experiment.

Consider real epidemiology practice. Of course, we always tell our PhD students to have prespecified research questions and a prespecified plan when ‘attacking’ a data set. The reason is not to make the results more believable. The reason is to avoid getting lost in your data analysis: to know what you are doing, why you are doing it and where you came from - just as lab scientists keep notes of their experiments in lab journals.

Almost all science starts with a preconceived idea, and a lot of science will have some protocol. Think of archeologists. They will start digging somewhere with an idea in mind – otherwise they would not get funded. Suppose that, while working at the site, they notice that the strange shape of the next hill is also promising. After a test dig, artefacts are found. Are they 'data dredgers' whose findings should be treated with suspicion?

At ACE in San Francisco, the debate session was about registration of observational research. The 'pro' position was defended by Douglas Weed, of DLW Consulting Services [5], who largely approved of the document of the chemical industry. In Weed’s view, true transparency was an obligation to society and meant making protocols available beforehand. On the other side, Richard Rothenberg (editor of the Annals of Epidemiology) felt that for journals to require registration would promote standardization and restrict an editor's mandate to foster innovation and creativity. I also spoke against registration, based on the premise that RCTs – which seem to be the guiding beacons – are, in fact, scientifically the 'odd man out'. RCTs try to avoid multiple and post hoc analyses at all costs. These safeguards are necessary for the credibility of the small number of RCTs that usually suffices for drug approval. Indeed, the whims of an investigator who sees something interesting in the data of a single trial should not bear on medical decisions that have consequences for millions of patients. Registration of RCTs was set up as a stringent measure to avoid selective reporting, and rightly so.

Recently, Mark Parascandola defined 'epistemic risk': “In drawing an inferential conclusion or accepting a hypothesis as true, one takes on an ‘epistemic risk’ – the risk of being wrong.” [6]. The RCT procedure can be seen as minimizing epistemic risk – that is, minimizing the risk of a wrong answer for the key question. However, minimizing type I error increases type II error, and hence prevents us from seeing new things. It is not clear which error (type I or II) is worse when we try to explain Nature. Much good can come from an idea that initially lacks strong support, or that seems at first ‘useless’, or that while wrong leads to new insights. Maximal avoidance of type I error is contrary to an important aim of science: to discover new explanations.

What seems to be happening is that the mantra of ‘type I error avoidance’ that serves RCTs so well is now indiscriminately carried over to observational research. When the BMJ editorial is followed to the letter, any new idea that occurs during data analysis should be registered first – and even then the researcher is cheating, since the idea occurred after seeing the data.

The support of Lancet and BMJ for registration rests on the premise that all sciences should behave like RCTs. Imagine telling a theoretical physicist, an evolutionary biologist, a molecular biologist or an astronomer that she should not publish any thought or finding other than the ones she had in mind several years earlier! Science requires publication of those insights that seem to carry us forward - not the whole history of all wrong ideas, mishaps and detours. The acceptance of your paper will come from others who explore the consequences of your ideas, and who look for alternative explanations (like bias and confounding). Often, this is a long process. When alternative explanations are ruled out in a credible way, observational data may lead to action – even regulation - as much as RCTs. Whether a particular hypothesis or analysis was prespecified plays no role in that process.

The debate on registration of observational research touches on the fundamentals of how scientific progress is made. No real surprise that this will be different for different sciences. That makes these debates interesting and exciting.

More debates are forthcoming. The next one that I know of is on 14 December 2010 at the Amsterdam Medical Center in the Netherlands, where the lecturer is Kay Dickersin, Director of the US Cochrane Center at Johns Hopkins. She has published extensively about selective publication, which may wreck meta-analyses of RCTs. Rumour has it that there are budding plans to bring up the topic at the 3rd North American Congress of Epidemiology in Montreal in 2011, as well.

If you would like to comment, email me directly at epidemiologyblog@gmail.com or submit your comment via the journal, which requires a password-protected login.

Sure, you are not misreading, and this is no printing error. Raj Bhopal, professor of epidemiology, chair of Public Health Sciences at Edinburgh, Scotland, and author of a popular textbook on epidemiology, has proposed this very idea. He mentions 7 sins of epidemiology, and as one of the remedies he proposes a World Council for Causality (http://www.ete-online.com/content/6/1/6). Two things the Council might do, he suggests, would be to:

“....[provide] authoritative statements on epidemiological evidence and [make] recommendations on when and how epidemiological data on associations are ready for application”.

Such a Council

“... could hasten advances, and counter the onslaught of undigested associations that bewilder us and will be multiplying as computerised data mining, data linkage, genetic epidemiology, and grand-scale epidemiology on millions of study participants become commonplace.”

The concerns that prompt Raj Bhopal are shared by many of us. On one hand, there is unwarranted criticism of epidemiology. How often have we heard 'this study is observational, so there is always a potential problem of bias and confounding'? Do people forget that all genetics and all research on infectious disease outbreaks is observational? On the other hand, we are dismayed by unwarranted credulity: 'researchers showed that men who continue leading active sex lives....'. The latter might be replaced by anything that you have to do regularly and that needs a healthy brain and/or body to continue doing it – like taking your statins regularly – which has led to a boom of papers about the associated health benefits of doing so. Papers about seemingly nonsensical associations are a source of criticism from outside of epidemiology, but also clearly a worry to Raj Bhopal. He revisits the problem in the second edition of his textbook 'Concepts of Epidemiology' (Oxford University Press 2008).

Should we bring some order? Could we help people to arrive at balanced judgements? From a completely different angle, an attempt at structuring our thoughts was made by Paul Rosenbaum, whose name will be forever linked to the Propensity Score. In the first chapter of his recent book 'Design of Observational Studies' (Springer, 2010), Rosenbaum lists 7 – seven again! – basic ingredients of epidemiologic studies. Each ingredient is discussed in the context of three types of studies: a randomized experiment, a better observational study and a poorer observational study. For example, under the heading 'How were treatments assigned', Rosenbaum writes that in the better observational study,

"...circumstances for the study were chosen so that treatment seems haphazard, or at least not obviously related to the outcomes subjects would exhibit under treatment or under control”.

In contrast, the poorer observational study gives

“little attention … to the process that made some people into treated subjects and others into controls”.

A troubling point is that Bhopal’s and Rosenbaum’s 7 points do not overlap, except for the question of whether treatment and control groups are comparable. Their outlooks are different, one writer being a public health person and the other a statistician writing about study design. By the way, checking the comparability of groups can become a recipe for disaster if applied to case-control studies, as when persons not familiar with this design demand that cases and controls be comparable in 'all respects' save disease. That is impossible: even in randomized trials the patients who develop the study outcome (in either treatment arm) will be different in many respects from those who do not.

I was involved in drafting STROBE, the guidelines on reporting observational studies (http://www.strobe-statement.org). It has often crossed my mind that we might need additional guidelines to help people think in a structured way about the credibility of an epidemiologic finding. Guidelines for reporting, like STROBE, are unfortunately often believed to say something about the validity and therefore the credibility of a study. If nothing else, having a separate set of guidelines to help people interpret observational studies might clarify the difference from guidelines for reporting. The greatest use of guidelines for interpretation might be by persons who are not professional epidemiologists and who are bewildered by all the arguments that can be used to either deride or bolster an epidemiologic finding. In my own work, I have not come much further than paraphrasing Rosenbaum’s treatment assignment rule, but I have brought in a component that neither Bhopal nor Rosenbaum clearly mentions: the independent existence of evidence about a hypothesis, possibly formalized as prior odds, strongly determines our belief in a finding. The strength of the prior may even explain why randomized trials look more credible than observational studies [1]. So, in my teaching, I am stuck with a mere 2 rules.
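To make the role of prior odds concrete (a textbook Bayesian sketch of my own, not something taken from either book): the credibility of a finding can be written as

\[
\text{posterior odds} \;=\; \text{prior odds} \times \text{likelihood ratio of the evidence},
\]

so a study whose data are, say, four times more likely under the hypothesis than under its alternatives still leaves a long-shot hypothesis with prior odds of 1:10 at posterior odds of only 0.4 – a probability of roughly 29%. The same study applied to a well-supported hypothesis yields a very different level of belief.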

Should we worry about the criticism that epidemiology publishes too many associations? Is it possible to educate people in thinking about the credibility of observational studies? Or is any judgment totally ad hoc, depending as much on subject-matter knowledge as on formal methods? A Dutch cartoon shows two people staring at a computer and commenting: “The probability that almost all professors of statistics agree... is, of course, very small.”

If you would like to comment, email me directly at epidemiologyblog@gmail.com or submit your comment via the journal, which requires a password-protected login.

The January 2010 issue of EPIDEMIOLOGY contained a spate of papers about our trade’s fixation with relative measures of risk – amongst others, the odds ratio in case-control studies. This continuing discussion remains a source of profound wonder. Our fixation with the odds ratio in case-control studies has its origin in Cornfield’s 1951 paper [1] in which he proposed the “rare disease assumption” to turn the odds ratio into a relative risk. For all intents and purposes, this approach should have been buried 30 years ago after the publication of Miettinen’s “Estimation and Estimability in Case-Referent Studies” [1].

The quantity that we calculate from case-control studies was not always known as the odds ratio; neither was the “rare disease assumption” inevitable. In the discussion section of their 1950 case-control study on smoking and lung cancer, Doll and Hill wrote [1] – a year before Cornfield:

“If it can be assumed that the patients without carcinoma of the lung who lived in Greater London at the time of their interview are typical of the inhabitants of Greater London in regard to their smoking habits, then the number of people in London smoking different amounts of tobacco can be estimated. Ratios can be obtained between the number of patients seen with carcinoma of the lung and the populations at risk who have smoked comparable amounts of tobacco.”

These ratios are not actual risks, they wrote, but are proportional to those risks. Upon dividing these ratios, Doll and Hill presented relative risks as a direct estimate, without the rare disease assumption. Just imagine if Cornfield had had his flash of insight a few years later: Doll and Hill’s approach might have become the landmark example. Then, to teach and explain case-control studies, we might have used “density sampling” as the most natural thing in the world. In “density sampling” the ratio of exposed vs. non-exposed persons in the control group stands for a ratio of exposed vs. unexposed person-years from which cases emerge (and not for the proportion of exposed or unexposed persons who do not become diseased at the end of a fixed follow-up). Epidemiology could have developed without the rare disease assumption. This is not a mere flight of fancy: the quantity that was described in words by Doll and Hill is called a “pseudo-rate” in the 2009 edition of the textbook by Rothman, Greenland and Lash, with a similar rate ratio calculation as the basis for understanding case-control studies (page 113).
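To make the arithmetic concrete, here is a minimal numerical sketch (my own illustration; all numbers are invented and nothing here is taken from the papers cited) of how density sampling recovers the rate ratio with no rare-disease assumption:

    # Illustrative sketch: under density sampling, the exposure odds ratio from a
    # case-control study equals the rate ratio, even when the disease is not rare.
    rate_unexposed = 0.02                        # cases per person-year, unexposed (assumed)
    true_rate_ratio = 3.0                        # true effect we hope to recover
    py_exposed, py_unexposed = 40_000, 60_000    # person-years at risk in each group

    # Cases emerge from person-time in proportion to the rates.
    cases_exposed = rate_unexposed * true_rate_ratio * py_exposed    # 2400
    cases_unexposed = rate_unexposed * py_unexposed                  # 1200

    # Density-sampled controls mirror the split of person-time, not of the non-diseased.
    controls = 5_000
    controls_exposed = controls * py_exposed / (py_exposed + py_unexposed)
    controls_unexposed = controls - controls_exposed

    odds_ratio = (cases_exposed / cases_unexposed) / (controls_exposed / controls_unexposed)
    print(f"exposure odds ratio = {odds_ratio:.2f}; true rate ratio = {true_rate_ratio}")
    # -> 3.00, although the exposed rate (6% per year) is anything but 'rare'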

Since the 1980s, the different meanings of the odds ratio in diverse sampling situations have been refined in numerous papers and textbooks. Yet this has failed to influence the practice of epidemiology. In a recent overview, we found that authors overwhelmingly prefer the term “odds ratio” in case-control studies, without any further interpretation; the few times that authors try to explain what their odds ratios mean, they make errors [2]. More surprisingly, a large number of textbooks still use Cornfield’s 1951 teaching of the rare-disease assumption as if nothing had happened over the past half century (list of textbooks and detailed results in [2]). In a personal discussion, one textbook author defended this practice, saying he felt that the rare-disease assumption was still the easiest way to explain case-control studies: any audience would immediately grasp it, without further background knowledge.

Granted, Cornfield’s 1951 paper was an enormous leap forward, and should be credited for making a first step in statistically formalizing case-control studies and making them credible. Doll and Hill never formalized what they did. Yet, why do we still use Cornfield’s teaching to explain case-control studies? A colleague who took the Distance Learning Programme at the London School of Hygiene and Tropical Medicine was told to use the long-antiquated reasoning to pass the basic course exam; the insights that have been developed over the past three decades would count as wrong answers. Granted again, comprehending why an odds ratio can be a rate ratio, a risk ratio or a prevalence odds ratio (or any other terminology that you fancy) necessitates some basic knowledge about open vs. closed cohorts (dynamic vs. fixed populations, if you prefer those terms).

My diagnosis is that we persevere with our odds-ratio fixation in case-control studies because of deficient teaching in our basic epidemiology courses. It must be possible to set up teaching modules that explain incidence rates in open populations without needing much more than high-school algebra. If such teaching exists in some “Epidemiology 101”, please tell the epidemiologic community, because it makes the bridge to “density sampling” immediate and natural. Density sampling in an open population is the basis of comprehension of case-control studies. Further refinements about the rarely applied nested or case-cohort designs in closed cohorts [2] can be left to advanced courses. Thus, we might once and for all replace the odds ratio and rare-disease assumption with knowledge that has already existed for more than 30 years!

If you would like to comment, email me directly at epidemiologyblog@gmail.com or submit your comment via the journal, which requires a password-protected login.