I wish no harm to the authors of “Fruit and vegetable consumption and all-cause, cancer and CVD mortality: analysis of Health Survey for England data”

J Epidemiol Community Health doi:10.1136/jech-2013-203500

whose recent publication has received approving coverage in the media. We are colleagues at the same godless institution, so they cannot be all bad. But (and you expect nothing less from me) I am not bowled over by their arguments about the benefits of vegetables. You will have caught the general tenor of my criticisms of this sort of work in “Diet is an IQ test” http://drjamesthompson.blogspot.co.uk/2013/11/diet-is-iq-test.html

Following Prof David Colquhoun, I quoted with approval a paper by John Ioannidis (BMJ 2013;347:f6698, doi: 10.1136/bmj.f6698, published 14 November 2013), in whose train I ranted thus: “Samples of about 70,000 followed until death (with a proper link to death registers) will be required to identify even a few general patterns in diet which might account for a 5-10% increase in risk. If the studies are to mean anything, IQ, personality, sociological and occupational variables will have to enter the mix, and participants will probably have to be paid to stick to the course, and put up with random visits of inspectors looking in the fridge and the medicine cabinet.”

So imagine my pleasure, or alarm, when this paper turns out to have followed 65,226 persons drawn from a nationally representative sample for 7.7 years and visited them at home to find out what they had eaten yesterday (thus remarkably improving accuracy of their recall) and then linking the respondents to death registers. Rather disarming, isn’t it? The authors seem to have got good data without paying participants or raiding their refrigerators. The authors admit that the main limitation is that measurement of fruit and vegetable intake occurred at only one point in time and relies on self-report. There may be social desirability bias and random error (forgetting) in the recall of fruit and vegetable consumption. However, while short of perfect monitoring, this is a big step forwards. All this is very good, and shows epidemiology at its best.

Undaunted, I moved to the second half of my diatribe “IQ, personality, sociological and occupational variables will have to enter the mix”. Here I have found some things to complain about. Although they included sociological and occupational variables, they did not measure IQ or personality. Frankly, I don’t expect that of epidemiologists, because those measures are often neglected by psychologists anyway.

That aside, the authors carry on doing good things by offering us some old fashioned means and standard deviations, with participants categorized by the number of portions of fruit and vegetables they consume. These are the sorts of simple statistics I can understand. For example, the English eat slightly over 2 portions of fruit, and 1.5 portions of vegetables a day. The propaganda about vegetables has left them relatively unmoved.

Table 1 shows that those who do eat vegetables tend to have non-manual occupations, and the more vegetables they eat, the more likely they are to be in middle class occupations. Do vegetables make you rise in social class? Do bananas telephone? Do efficient compasses misdirect? (Can you spot the origin of the last two questions?)

Similarly, 7-vegetables-a-day types are much more likely to have a university degree than vegetable refusers, who tend to be less educated folk. Also, they are less likely to smoke and are more likely to be physically active. On the other hand, they are just as fat and almost as boozy as everyone else. Those who consumed more fruit and vegetables were generally older, less likely to smoke and more likely to be women, in a non-manual household, with degree level education. Veggie Mummies, yah?

Finally, when it comes to deaths during the study period, here’s the crunch: overall, 6.7% of the sample died during the study period of 7.7 years. The sad fact is that if you are 57 years old you have a 6.7% chance of being dead by the time you are 65 years old. (Or would have been 65, for pedants). Those who eat no vegetables have an 8.2% chance of death, the One to Three vegs a day 7.9%, the Three to Five vegs a day 6.4%, the Five to Seven 5.3% and the Seven Plus vegetables only 4.1%. So, although your chance of dying is relatively low, you can make it even lower by feasting on vegetables.

At first glance, the avid vegetable eaters have half the death rate of the no vegetable eaters. It suggests that vegetables are the cause of the difference. However, it could be that vegetables have nothing to do with it.
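For readers who like to check the sums, here is a minimal sketch in Python of the crude comparison, using only the Table 1 percentages quoted above. The group labels and the function name are my own shorthand, not the paper's, and nothing here is adjusted for anything.

```python
# Crude percentage dying during follow-up, by daily portions,
# as quoted from Table 1 in the text above.
crude_death_pct = {
    "0": 8.2,
    "1-3": 7.9,
    "3-5": 6.4,
    "5-7": 5.3,
    "7+": 4.1,
}

baseline = crude_death_pct["0"]


def crude_risk_ratio(pct, reference=baseline):
    """Ratio of a group's crude death percentage to the reference group's."""
    return pct / reference


for group, pct in crude_death_pct.items():
    ratio = crude_risk_ratio(pct)
    fewer = baseline - pct  # absolute difference, in percentage points
    print(f"{group:>4} portions: {pct:.1f}% died, "
          f"ratio {ratio:.2f}, {fewer:.1f} points below the zero group")
```

The ratio column is the "half the death rate" reading; the last column is the absolute difference, which is the number you would actually feel.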

In table 2 they offer a “fully adjusted” Model 1: Adjusted for sex, age-group, cigarette smoking and social class; and the even more adjusted Model 2: Adjusted for sex, age-group, cigarette smoking, social class, BMI, education, physical activity and alcohol intake. Of course, as sharp eyed readers you will note that they do not offer a Model 0: adjusting for sex and age, the only things which are truly not controllable by individuals. That is a pity.

In table 2 they use hazard ratios, where eating no vegetables (the highest apparent risk category) is set to 1 and, of the other conditions, lots of vegetables rates as 0.69. This certainly shows the differences with increasing consumption of vegetables, but no longer reveals absolute risk. I prefer table 1. In fact, I would have liked to have seen a correlation matrix. I can read those. I concede that such a matrix would not reveal covariance, but it would allow me to begin to think about the associations between the variables. One or two plots of data would also have helped. In my usual ferreting mode I had a look at the supplementary data.
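To see why a hazard ratio on its own hides absolute risk, here is a rough sketch of turning one back into a percentage. It assumes proportional hazards over the whole follow-up, and it borrows the crude 8.2% for the no-vegetable group as the baseline, which mixes a crude figure with an adjusted ratio. Treat it as an illustration, not as the authors' calculation. The function name is hypothetical.

```python
def risk_from_hazard_ratio(baseline_risk, hazard_ratio):
    """Risk over the same period if the baseline hazard is multiplied
    by hazard_ratio throughout (proportional hazards assumption)."""
    baseline_survival = 1.0 - baseline_risk
    return 1.0 - baseline_survival ** hazard_ratio


baseline_risk = 0.082  # crude deaths in the no-vegetable group (Table 1, as quoted)
hazard_ratio = 0.69    # hazard ratio quoted above for lots of vegetables

implied_risk = risk_from_hazard_ratio(baseline_risk, hazard_ratio)
print(f"implied risk with the hazard ratio applied: {implied_risk:.1%}")
print(f"absolute reduction: {baseline_risk - implied_risk:.1%}")
```

Under those assumptions the implied risk comes out a little under 6%, noticeably higher than the crude 4.1% in Table 1. The gap is the part of the raw difference that the adjustments have already accounted for.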

At about 120 months the fruit effect dies out for some, probably artefactual, reason.

In both these adjusted models and in other variations the effect of vegetable consumption continues to be significant. They go into further detail about vegetables (good) and fruit (slightly less efficacious in keeping you alive) and note that canned fruit seems to slightly increase mortality, probably because of the sugary syrup in which they float.

The authors have bundled together factors that none of us can control like our age and sex, with factors we can control like how long we stay in education and the sort of work we do; with factors we can and probably ought to control like how much we eat and drink. All those different categories are “controlled for”. Some mistake, surely? I can understand the “control” for age. Older people are more likely to die in any time period than younger people. However, if I chose to become a university teacher, why “control” for that choice? I took up that occupation precisely because I thought it would be agreeable, if not well paid, and that I would be highly unlikely to suffer industrial accidents. My choice, plus my ability to get such undemanding light labour against, frankly, rather sparse competition, reveals something about me. It may explain my willingness to follow health advice, or it may simply be that I am a cautious man, minimising my risks in my personal and occupational life. A simple fearfulness of character could explain all the associations.

Consider the adjustments. These are based on the assumption that cigarette smoking, social class, BMI, education, physical activity and alcohol intake are not related to something which itself has an influence on health. They are seen as imposed external factors which can influence health, rather than a series of behaviours related to an intrinsic factor: system integrity. System integrity is a hypothesized intrinsic characteristic which gives you a good body and a good mind, such that you are healthy and intelligent. This may be related to your genetics and/or a favourable beginning in utero. The one give-away sign of system integrity is fast reaction times to simple stimuli. See the Edinburgh group under Ian Deary for all these findings.
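The argument can be made concrete with a toy simulation. This is a sketch with invented numbers, not a model of the survey: a hypothetical latent trait (call it integrity) raises the chance of eating vegetables and lowers the chance of dying, while vegetables themselves have no effect on death at all. All the probabilities below are made up for illustration.

```python
import random


def simulate(n=100_000, seed=1):
    """Return crude death rates for vegetable eaters (True) and refusers (False)."""
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # eats_veg -> [deaths, people]
    for _ in range(n):
        integrity = rng.gauss(0.0, 1.0)  # hypothetical latent trait
        high = integrity > 0
        # Higher integrity makes vegetable eating more likely...
        eats_veg = rng.random() < (0.7 if high else 0.3)
        # ...and death less likely. Vegetables play no part in this line.
        dies = rng.random() < (0.04 if high else 0.10)
        counts[eats_veg][0] += dies
        counts[eats_veg][1] += 1
    return {eats: deaths / people for eats, (deaths, people) in counts.items()}


rates = simulate()
print(f"death rate, vegetable eaters: {rates[True]:.1%}")
print(f"death rate, refusers:         {rates[False]:.1%}")
```

The eaters die less often than the refusers even though, by construction, the vegetables do nothing. Adjusting for measured behaviours would only remove this gap to the extent that those behaviours capture the latent trait.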

Seen this way, the intelligent live longer and healthier lives not because they are wise, but because they are lucky. They eat vegetables because it seems to be the clever thing to do from a health point of view, and perhaps because they can work out that the need for protein from meat is relatively small, so vegetables are more cost-effective. They may even like the taste of them. They also wear seatbelts, use condoms, brush their teeth, don’t smoke, go for walks, don’t eat or drink too much, study hard, strive to get good jobs and always save money.

The conclusion of this study is that we should eat our vegetables, and 7 portions rather than only 5. Perhaps so. It is still possible that bright people simply live longer, even when they are slightly plump and somewhat boozy. No, my gripe is about the way they have interpreted the findings, and the assumptions which underlie their calculations of hazard ratios. The authors make it clear that “This study has found a strong association, but not necessarily a causal relationship. There are additional unmeasured confounders not included in the analyses, including other aspects of diet.” However, they go on to mention other dietary factors, not the psychological ones.

Vegetables may be good for you. But I have been assured that scientists make a most delicious, nourishing, and wholesome food, whether stewed, roasted, baked, or boiled; and I make no doubt that it will equally serve in a fricassee or a ragout.

12 comments:

too bad they couldn't control for IQ - at least they used proxies like education & social class. too bad they couldn't control for genetics by using identical twins, & randomly assigning (condemning:) some to eat veggies... i agree with you & JayMan: the most consistent & parsimonious explanation is that smarter people live longer - & are willing to eat veggies (but they don't necessarily live longer b/c of the veggies:)

Dr Oyinlola Oyebode writes in to explain: "the reason the graph flattens out at 120 months is because we had people who were surveyed between 2001 and 2008 and followed them up until 2013 ie: the longest follow ups we had were 12 years = 120 months. I guess I should have cut the graph off at that point to save confusion but it was requested by a reviewer so I just took it as it was and stuck it in. Maybe in 5 years time we will continue to see the lines going down in parallel". Sometime she will do the age and sex basic control condition. Hat tip for the fastest ever author reply.

Thank you my man! You've basically said all the things I would have so now I don't have to.

I agree, getting IQ and personality data should at least be a bare minimum in these types of studies. But, we can't do that: we might find that all their supposed correlations go away, as they do whenever someone tries to control for these things when we actually have IQ.

It's worth pointing out that the work of Lars Penke has failed to find that rare variants are associated with low IQ (He has also found that paternal age is not associated with reduced IQ, once parental IQ is taken into account. This is quite unlike mental disorders like autism, schizophrenia, or bipolar disorder, where there is a clear paternal age effect). This casts doubt on the connection between genetic load and IQ (and, by extension, physical health).

So true, there's no need to force anyone into something they don't like but rather convince them that it's good anyways and they'll eat it if they like. I also give in to some unhealthy snacks but I feel it's right since I don't really eat unhealthy food all the time. :)

These points apart, there is sort of a statistical issue I feel here with this kind of research & much of social science; when you control for X you are essentially examining a different population to that which you started off studying. In the real world, of course, people do have different incomes, IQs etc, and there is a false sense of certainty given in pretending that they do not.

This is probably related to the widespread misuse of ANCOVA (analysis of covariance), where the assumption of independence of the covariate from the independent variable is routinely violated. In the mental health & educational psychology literature it is very common to study two naturally occurring groups, measure them for X at baseline, and then examine outcome Y controlling for X. But a fundamental assumption of ANCOVA is that the groups do not differ on X, and in non-randomized situations they usually will. I lose count of the number of non-randomized (or badly randomized with small sample sizes) educational intervention study reports I have read that do something like this.

Imagine a scenario where we have blue and red plants growing in a field. The blue plants are taller and also produce more maize. What many social scientists want to know is how much maize the blue plants would produce were they the same height as the red plants, but alas, this is an unanswerable question that neither ANCOVA nor anything else can sort out. Their height is a fundamental property of the blue plants, and altering that in your analyses alters the groups themselves. Your answers are sort of gibberish at this point.

Brian Everitt, who was much involved in the development of cluster analysis, always used to say to me that it was a lamentable shortcoming of statistics that the answers to straightforward questions had to be given in numbers. He would have preferred the answers to be "Yes" or "No". Late in the day I suppose that every researcher should write out their questions in plain English, and also their assumptions when doing their data analyses. Your Red Plant and Blue Plant examples are very helpful here. Nostalgic point: I was helped to understand statistics by Brian Everitt, AE Maxwell, Julian Peto, John Rust and another 4 or 5 people, all of whom either invented new statistics and/or wrote text books and/or had distinguished careers in which statistical techniques were center stage. Good training. All errors are my own.

yep, when we control for something - say enter it in first in a multiple regression or do an ANCOVA - we are one step further away from reality. when we pretend everybody is the same on IQ, then here's what we get...

BUT, it's more useful than it sounds (!) if you predict say (the criterion variable of) achievement by first entering in (the independent variable) IQ, then enter in (say dummy coded) group membership, then IQ x group membership interaction - if the interaction is "significant" your slopes are different (relationship between IQ & achievement is different depending what group someone's in), if interaction is not significant then group membership adds above & beyond IQ in predicting achievement, meaning slopes are equal but intercepts are different (parallel slopes) & in that case the common regression line overpredicts achievement for one group & underpredicts for other, etc.

controlling for variables has its uses, but should be augmented by reality, & different ways of looking at the data such as scatterplots, graphing, etc. where you & the data are back in the real world :)

PS - my favorite way of applying a Pearson r correlation coefficient is to say "as i go up 1 std deviation on X i go up "r" standard deviations on Y" :)

There was a great talk at last year's Society for Social Medicine conference on confounding and adjustment. The author demonstrated how you change the research question entirely depending on what you choose to adjust for. Unfortunately I don't think the abstract gives much away, but it is here in case you're interested. http://jech.bmj.com/content/67/Suppl_1/A45.1.abstract?sid=7694c1b4-f491-4ddd-b53c-f13708435180