Friday, 20 July 2018

Standing on the shoulders of giants, or slithering around on jellyfish: Why reviews need to be systematic

Yesterday I had the pleasure of hearing George Davey Smith (aka @mendel_random) talk. In the course of a wide-ranging lecture he recounted his experiences with conducting a systematic review. This caught my interest, as I’d recently considered the question of literature reviews when writing about fallibility in science. George’s talk confirmed my concerns that cherry-picking of evidence can be a massive problem for many fields of science.

Together with Mark Petticrew, George had reviewed the evidence on the impact of stress and social hierarchies on coronary artery disease in non-human primates. They found 14 studies on the topic, and revealed a striking mismatch between how the literature was cited and what it actually showed. Studies in this area are of interest to those attempting to explain the well-known socioeconomic gradient in health. It’s hard to unpack this in humans, because there are so many correlated characteristics that could potentially explain the association. The primate work has been cited to support psychosocial accounts of the link; i.e., the idea that socioeconomic influences on health operate primarily through psychological and social mechanisms. Demonstration of such an impact in primates is particularly convincing, because stress and social status can be experimentally manipulated in a way that is not feasible in humans.

The conclusion from the review was stark: ‘Overall, non-human primate studies present only limited evidence for an association between social status and coronary artery disease. Despite this, there is selective citation of individual non-human primate studies in reviews and commentaries relating to human disease aetiology’(p. e27937).

The relatively bland account in the written paper belies the stress that George and his colleague went through in doing this work. Before I tried doing one myself, I thought that a systematic review was a fairly easy and humdrum exercise. It could be if the literature were not so unruly. In practice, however, you not only have to find and synthesise the relevant evidence, but also to read and re-read papers to work out what exactly was done. Often, it’s not just a case of computing an effect size: finding the numbers that match the reported result can be challenging. One paper in the review that was particularly highly-cited in the epidemiology literature turned out to have data that were problematic: the raw data shown in scattergraphs are hard to reconcile with the adjusted means reported in a summary (see Figure below). Correspondence sent to the author apparently did not achieve a reply, let alone an explanation.

Even if there were no concerns about the discrepant means, the small sample size and influential outliers in this study should temper any conclusions. But those using this evidence to draw conclusions about human health focused on the ‘five-fold increase’ in coronary disease in dominant animals who became subordinate.

So what impact has the systematic review achieved? Well, the first point to note is that the authors had a great deal of difficulty getting it accepted for publication: it would be sent to reviewers who worked on stress in monkeys, and they would recommend rejection. This went on for some years: the abstract was first published in 2003, but the full paper did not appear until 2012.

The second, disappointing conclusion comes from looking at citations of the original studies reviewed by Petticrew and Davey Smith in the human health literature since their review appeared. The systematic review garnered 4 citations in the period 2013-2015 and just one during 2016-2018. The mean citations for the 14 articles covered in their meta-analysis was 2.36 for 2013-2015, and 3.00 for 2016-2018. The article that was the source of the Figure above had six citations in the human health literature in 2013-2015 and four in 2016-2018. These numbers aren’t sufficient for more than impressionistic interpretation, and I only did a superficial trawl through abstracts of citing papers, so I am not in a position to determine if all of these articles accepted the study authors’ conclusions. However, the pattern of citations fits with past experience in other fields showing that when cherry-picked facts fit a nice story, they will continue to be cited, without regard to subsequent corrections, criticism or even retraction.

The reason why this worries me is that the stark conclusion would appear to be that we can’t trust citations of the research literature unless they are based on well-conducted systematic reviews. Iain Chalmers has been saying this for years, and in his field of clinical trials these are more common than in other disciplines. But there are still many fields where it is seen as entirely appropriate to write an introduction to a paper that only cites supportive evidence and ignores a swathe of literature that shows null or opposite results. Most postgraduates have an initial thesis chapter that reviews the literature, but it's rare, at least in psychology, to see a systematic review - perhaps because this is so time-consuming and can be soul-destroying. But if we continue to cherry-pick evidence that suits us, then we are not so much standing on the shoulders of giants as slithering around on jellyfish, and science will not progress.

2 comments:

Not exactly on the area of systematic reviews but I have found, at least in non-peer-reviewed policy papers, a disturbing tendency to cite or quote from papers that may be only tangentially related to the issue or which tend to ignore dissenting views.

Actually, now that I think of it, a lit review for a policy paper is a systematic review in many ways.

It sometimes appears that authors are not cherry-picking as much as hoovering up anything that a PubMed or similar search tosses up.

Either way, cherry-picking or hovering gives us a seriously distorted view of the area of knowledge.

Thanks for this commentary, thought provoking and an interesting read.

The relationships between 'research and research', as in development of a field of human knowledge and 'research and policy', as in providing advocacy and an argument for policy makers to pick-up on (or even for practitioner implementation) are distinct areas of epistemology in several ways.

Yes - we need to be more precise with our science; and also we need to focus the resources (intellectual and time-based) required for systemic review carefully. What is the best guidance for us in that quest?