Wednesday, 21 November 2012

There’s no doubt that getting tipsy while pregnant is a seriously bad idea. Alcohol is a toxin that can pass through the placenta to the foetus and cause damage to the developing brain. For women who are regular heavy drinkers or binge drinkers, there is a risk that the child will develop foetal alcohol syndrome, a condition that affects physical development and is associated with learning difficulties.
But what of more moderate drinking? The advice is conflicting. Many doctors take the view that alcohol is never going to be good for the developing foetus and they recommend complete abstention during pregnancy as a precautionary measure. Others have argued, though, that this advice is too extreme, and that moderate drinking does not pose any risk to the child.

Last week a paper by Lewis et al was published in PLOS One providing evidence on this issue, and concluding that moderate drinking does pose a risk and should be avoided. The methodology of the paper was complex and it’s worth explaining in detail what was done.

The researchers used data from ALSPAC, a large study that followed the progress of several thousand British children from before birth. A great strength of this study is that information was gathered prospectively: in the case of maternal drinking, mothers completed questionnaires during pregnancy, at 18 and 32 weeks gestation. Obviously, the data won’t be perfect: you have to rely on women to report their intake honestly, but it’s hard to see how else to gather such data without being overly intrusive. When children were 8 years old, they were given a standard IQ test, and this was the dependent variable in the study.

One obvious thing to do with the data would be to see if there is any relationship between amount drank in pregnancy and the child’s IQ. Quite a few studies have done this and a recent systematic review concluded that, provided one excluded women who drank more than 12 g (1.5 UK units) per day or who were binge-drinkers, there was no impact on the child. Lewis et al pointed out, however, that this is not watertight, because drinking in pregnancy is associated with other confounding factors. Indeed, in their study, the lowest IQs were obtained by children of mothers who did not drink at all during pregnancy. However, these mothers were also likely to be younger and less socially-advantaged than mothers who drank, making it hard to disentangle causal influences.

So this is where the clever bit of the study design came in, in the shape of mendelian randomisation. The logic goes like this: there are genetic differences between people in how they metabolise alcohol. Some people can become extremely drunk, or indeed ill, after a single drink, whereas others can drink everyone else under the table. This relates to variation in a set of genes known as ADH genes, which are clustered together on chromosome 4. If a woman metabolises alcohol slowly, this could be particularly damaging to the foetus, because alcohol hangs around in the bloodstream longer. There are quite large racial differences in ADH genes, and for that reason the researchers restricted consideration just to those of White European background. For this group, they showed that variation in ADH genes is not related to social background. So they had a very specific prediction: for women who drank in pregnancy, there should be a relationship between their ADH genes and the child’s outcome. However, if the woman did not drink at all, then the ADH genotype should make no difference. This is the result they reported. It’s important to be clear that they did not directly estimate the impact of maternal drinking on the child’s IQ: rather, they inferred that if ADH genotype is associated with child’s IQ only in drinkers, then this is indirect evidence that drinking is having an impact. This is a neat way of showing that there is an effect of a risk factor (alcohol consumption) avoiding the complications of confounding by social class differences.

Several bloggers, however, were critical of the study. Skeptical Scalpel noted that the effect on IQ was relatively small and not of clinical significance. However, in common with some media reports, he seems to have misunderstood the study and assumed that the figure of 1.8 IQ points was an estimate of the difference between drinkers and abstainers – rather than the effect of ADH risk alleles in drinkers (see below). David Spiegelhalter pointed out that there was no direct estimate of the size of the effect of maternal alcohol intake. Indeed, when drinkers and non-drinkers were directly compared, IQs were actually slightly lower in non-drinkers. Carl Heneghan also commented on the small IQ effect size, but was particularly concerned about the statistical analysis, arguing that it did not adjust adequately for the large number of genetic variants that were considered.

Should we dismiss effects because they are small? I’m not entirely convinced by that argument. Yes, it’s true that IQ is not a precise measure: if an individual child has an IQ of 100, there is error of measurement around that estimate so that the 95% confidence interval is around 95-105 (wider still if a short form IQ is used, as was the case here). This measurement error is larger than the per-allele effects reported by Lewis et al., but they were reporting means from very large numbers of children. If there are reliable differences between these means, then this would indicate a genuine impact on cognition, potentially as large as 3.5 IQ points (for those with four rather than two risk alleles). Sure, we should not alarm people by implying that moderate drinking causes clinically significant learning difficulties, but I don’t think we should just dismiss such a result. Overall cognitive ability is influenced by a host of risk factors, most of which are small, but whose effects add together. For a child who already has other risks present, even a small downwards nudge to IQ could make a difference.

But what about Heneghan’s concern about the reliability of the results? This is something that also worried me when I scrutinised Table 1, which shows for each genetic locus the ‘per allele’ effect on IQ. I’ve plotted the data for child genotypes in Figure 1. Only one SNP (#10) seems to have a significant effect on child IQ. Yet when all loci were entered into a stepwise multiple regression analysis, no fewer than four child loci were identified as having a significant effect. The authors suggested that this could reflect interactions between genes that are on the same genetic pathway.

I had been warned about stepwise regression by those who taught me statistics many years ago. Wikipedia has a section on Criticisms, noting that results can be biased when many variables are included as predictors. But I found it hard to tell just how serious a problem this was. When in doubt, I find it helpful to simulate data, and so that is what I did in this case, using a function in R that generates multivariate normal data. So I made a dataset where there was no relationship between any of 11 variables – ten of which were designated as genetic loci, and one as IQ. I then ran backwards stepwise regression on the dataset. I repeated this exercise many times, and was surprised at just how often spurious associations of IQ with ‘genotypes’ was seen (as described here). I was concerned that this dataset was not a realistic simulation, because the genotype data from Lewis et al consisted of counts of how many uncommon alleles there were at a given locus (0, 1 or 2 – corresponding to aa, aA or AA, if you remember Mendel’s peas). So I also simulated that situation from the same dataset, but actually it made no difference to the findings. Nor did it make any difference if I allowed for correlations between the ‘genotypes’. Overall, I came away alarmed at just how often you can get spurious results from backwards stepwise regression – at least if you use the AIC criterion that is the default in the R package.

Lewis et al did one further analysis, generating an overall risk score based on the number of risk alleles (i.e. the version of the gene associated with lower IQ) for the four loci that were selected by the stepwise regression. This gave a significant association with child IQ, just in those who drunk in pregnancy: mean IQ was 104.0 (SD 15.8) for those with 4+ risk alleles, 105.4 (SD = 16.1) for those with 3 risk alleles and 107.5 (SD = 16.3) for those with 2 or less risk alleles. However, I was able to show very similar results from my analysis of random data: the problem here is that in a very large sample with many variables some associations will emerge as significant just by chance, and if you then select just those variables and add them up, you are capitalising on the chance effect.

One other thing intrigued me. The authors made a binary divide between those who reported drinking in pregnancy and those who did not. The category of drinker spanned quite a wide range from those who reported drinking less than 1 unit per week (either in the first 3 months or at 32 weeks of pregnancy) up to those who reported drinking up to 6 units per week. (Those drinking more than this were excluded, because the interest was in moderate drinkers). Now I’d have thought there would be interest in looking more quantitatively at the impact of moderate drinking, to see if there was a dose-response effect, with a larger effect of genotype on those who drank more. The authors mentioned a relevant analysis where the effect of genotype score on child IQ was greater after adjustment for amount drank at 32 weeks of pregnancy, but it is not clear whether this was a significant increase, or whether the same was seen for amount drank at 18 weeks. In particular, one cannot tell whether there is a safe amount to drink from the data reported in this paper. In a reply to my comment on the PLOS One paper, the first author states: “We have since re-run our analysis among the small group of women who reported drinking less than 1 unit throughout pregnancy and we found a similar effect to that which we reported in the paper.” But that suggests there is no dose-response effect for alcohol: I’m not an expert on alcohol effects, but I do find it surprising that less than one drink per week should have an effect on the foetal brain – though as the author points out, it’s possible that women under-reported their intake.

I’m also not a statistical expert and I hesitate to recommend an alternative approach to the analysis, though I am aware that there are multiple regression methods designed to avoid the pitfalls of stepwise regression. It will be interesting to see whether, as predicted by the authors, the genetic variants associated with lower IQ are those that predispose to slow alcohol metabolism. At the end of the day, the results will stand or fall according to whether they replicate in an independent sample.

Thursday, 15 November 2012

I just love the fact that the BBC have a Democracy Live channel where you can watch important government business. The Public Accounts Committee may sound incredibly dull, but I found this footage riveting. The committee grills executives from Starbucks, Amazon and Google about their tax arrangements. Quite apart from the content, it provides a wealth of material for anyone interested in how we interpret body language as a cue to a person's honesty. But for me it raised a serious issue about Starbucks. Is it run by aliens?

Tuesday, 13 November 2012

Early in October a weird story hit the media: a nation’s chocolate consumption is predictive of its number of Nobel prize-winners, after correcting for population size. This is the kind of kooky statistic that journalists love, and the story made a splash. But was it serious? Most academics initially assumed not. The source of the story was the New England Journal of Medicine, an august publication with stringent standards, which triages a high proportion of submissions that don’t get sent out for review. (And don't try asking for an explanation of why you’ve been triaged). It seemed unlikely that a journal with such exacting standards would give space to a lightweight piece on chocolate. So the first thought was that the piece had been published to make a point about the dangers of assuming causation from correlation, or the inaccuracies that can result when a geographical region is used as the unit of analysis. But reading the article more carefully gave one pause. It did have a somewhat jocular tone. Yet if this was intended as a cautionary tale, we might have expected it to be accompanied by some serious discussion of the methodological and interpretive problems with this kind of analysis. Instead, beneficial effects of dietary flavanols was presented as the most plausible explanation of the findings.

The author, cardiologist Franz Messerli, did discuss the possibility of a non-causal explanation for the findings, only to dismiss it. He stated “as to a third hypothesis, it is difficult to identify a plausible common denominator that could possibly drive both chocolate consumption and the number of Nobel laureates over many years. Differences in socioeconomic status from country to country and geographic and climatic factors may play some role, but they fall short of fully explaining the close correlation observed.” And how do we know “they fall short?” Well, because the author, Dr Messerli, says so.

As is often the case, the blogosphere did a better job of critiquing the paper than the journal editors and reviewers (see, for instance, here and here). The failure to consider seriously the role of a third explanatory variable was widely commented on, but, as far as I am aware, nobody actually did the analysis that Messerli should have done. I therefore thought I'd give it a go. Messerli explained where he’d got his data from – a chocolatier’s website and Wikipedia – so it was fairly straightforward to reproduce them (with some minor differences due to missing data from one chocolate website that's gone offline). Wikipedia helpfully also provided data on gross domestic product (GDP) per head for different nations, and it was easy to find another site with data on proportion of GDP spend on education (except China, which has figures here). So I re-ran the analysis, computing the partial correlation between chocolate consumption and Nobel prizes after adjusting for spend per head on education. When education spend was partialled out, the correlation dropped from .73 to .41, just falling short of statistical significance.

Since Nobel laureates typically are awarded their prizes only after a long period of achievement, a more convincing test of the association would be based on data on both chocolate consumption and education spend from a few decades ago. I’ve got better things to do than to dig out the figures, but I suggest that Dr Messerli might find this a useful exercise.

Another point to note is that the mechanism proposed by Dr Messerli involves an impact of improved cardiovascular fitness on cognitive function. The number of Nobel laureates is not the measure one would pick if setting out to test this hypothesis. The topic of national differences in ability is a contentious and murky one, but it seemed worth looking at such data as are available on the web to see what the chocolate association looks like when a more direct measure is used. For the same 22 countries, the correlation between chocolate consumption and estimated average cognitive ability is nonsignificant at .24, falling to .13 when education spend is partialled out.

I did write a letter to the New England Journal of Medicine reporting the first of my analyses (all there was room for: they allow you 175 words), but, as expected, they weren't interested. "I am sorry that we will not be able to print your recent letter to the editor
regarding the Messerli article of 18-Oct-2012." they wrote. "The space available for
correspondence is very limited, and we must use our judgment to present a
representative selection of the material received."

It took me all of 45 minutes to extract the data and run these analyses. So why didn’t Dr Messerli do this? And why did the NEJM editor allow him to get away with asserting that third variables “fall short” when it’s so easy to check it out? Could it be that in our celebrity-obsessed world, the journal editors think that there’s no such thing as bad publicity?