Category: Critical Reading

The articles in this section come from a series of articles that I wrote for Update magazine in 2005/6 and for Pulse in 2001 as well as a few other articles which are relevant when critically appraising the medical literature.

This free access 2014 article in BREATHE unpacks what goes into systematic reviews and gives pointers about how to appraise them. We use the example of the Cochrane systematic review comparing spacers with nebulisers to deliver salbutamol in acute asthma in adults and children.

Information Overload

The sheer volume of material published in medical journals each week is well beyond any of us to keep up with. To save us from drowning in information, the writers of systematic reviews aim to collect together and appraise all the evidence from appropriate studies addressing a focussed clinical question. The Cochrane Collaboration has been working at this task for the past twenty years and, in September 2016, there were 7038 completed reviews on the Cochrane Database of Systematic Reviews and a further 2520 protocols that will become reviews in the future.

The File-Drawer Problem

So what was wrong with the traditional narrative review from an expert in the field? The previous emphasis has been on understanding the mechanisms of disease and combining this with clinical experience to guide practice.(1) The main problem with this approach is that we all have our preferred way of doing things, and there is a natural tendency to take note of articles that fit in with our view. We may cut these out and keep them in our filing cabinet, whilst articles that do not agree are filed in the rubbish bin. This means that when asked to review a topic it is natural for an expert to go to the drawer and quote all the data that support their favoured approach.

What is a Systematic Review?

So how is a systematic review different? Let’s start with a definition:

Systematic review (synonym: systematic overview): A review of a clearly formulated question that uses systematic and explicit methods to identify, select and critically appraise relevant research, and to collect and analyse data from the studies that are included in the review. Statistical methods (meta-analysis) may or may not be used to analyse and summarise the results of the included studies.

The difference here is that the way the papers were found and analysed is clearly stated. The reader still needs to be satisfied that the search for papers was wide enough to obtain all the relevant data. Searching Medline alone is rarely enough, and if only English language papers are included this may leave out potentially important evidence.

All Cochrane reviews start as a published protocol; this states in advance how the review will be carried out (searching for data, appraising and combining study data). There is therefore some protection against the danger of post-hoc analysis, in which reviewers find that by dividing up the trials in a particular way spurious statistical significance can be generated in sub-groups of patients or treatment types.

Is the Question focussed?

But we have moved on to thinking about how the review was carried out before checking whether the question being addressed is an important one. The PICO structure set out in the first article in this series(2) can be used here to check that the Patient groups, Interventions used, Comparator treatment and Outcomes are sensible. Watch out in particular for surrogate outcomes that may not relate well to the outcome that matters to the patient. One example of this can be found in trials relating influenza vaccine to the prevention of asthma exacerbations. Some trials measure antibody levels to the flu-vaccine given, but what really matters is whether asthmatics have fewer exacerbations or admissions to hospital, and there is precious little data from randomised controlled trials about this (3).

What was the quality of the trials found?

A further issue to think about in Systematic Reviews is whether the type of included studies is appropriate to the question being asked. In a previous article in this series(4) the problem of bias was discussed. In general, in questions related to treatment, I would expect the review to focus on randomised controlled trials, as this will minimise the bias present in the included studies. Whilst meta-analysis can be used to combine the results of observational studies, this is unreliable because they may all suffer from the same bias, and this bias will be carried into the pooled result from all the studies.

When looking at randomised controlled trials the reviewers should report whether the allocation of patients to the treatment and control groups was adequately concealed (allocation concealment). Allocation is best decided remotely after the patient is entered into the trial; even opaque sealed envelopes can be held up to a bright light by trialists who want to check which treatment the next patient will receive. Poor allocation concealment, failure to blind and poor reporting quality in reviews have all been shown to be associated with overoptimistic results of randomised controlled trials.(5)

Publication bias remains a problem, in that studies that may happen to produce results that are statistically significant are more likely to be published than ones that do not, since editors of medical journals like to have a story to present. This will never be fully overcome until all trials are registered in advance and the publication of results becomes mandatory (whether they show significant differences or not).

Forest Plots

The results of a Systematic Review are often shown graphically as a Forest plot (6). An example from the 2013 update of a Cochrane Review comparing Spacers with Nebulisers for delivery of Beta-agonists(7) is shown below.

Figure 1 Forest Plot of Hospital Admissions for Adults and Children with Acute Asthma when treated with Beta-agonist delivered by Holding Chamber (Spacer) compared to Nebuliser (edited to include data in the 2013 update of the review).

The left hand column lists the included studies, which have been sub-grouped into those relating to adults and children. The columns listed ‘Holding Chamber’ and ‘Nebuliser’ list the proportion of patients in each group admitted to hospital and the Relative Risk of admission is shown next to them as a graphical display. Admission is undesirable so the squares and diamonds to the left of the vertical line favour the spacer group. The size of the blue square relates to the weight given to each study in the analysis; this is listed in the next column and generally increases for larger studies. The width of the horizontal line is the 95% confidence interval for each study and this is reported in text in the final column.

The pooled results from adults are shown in the top diamond, and those for children in the lower diamond. This shows that by combining all the studies in children we can be 95% sure that the true relative risk of admission when using a spacer lies between 0.47 and 1.08 in comparison with using a nebuliser. There is no significant difference between the two methods, because this interval includes a relative risk of 1 (no difference). The confidence interval suggests that, in children, nebulisers are at best no more than 8% better than spacers and may be up to 53% worse.
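The arithmetic behind that reading of the confidence interval can be sketched in a few lines of Python. This is only an illustration: the two limits are the ones quoted above, while the function name and the way the percentages are expressed are assumptions made for this example.

```python
def percent_change(relative_risk):
    """Percentage change in risk implied by a relative risk (negative means a reduction)."""
    return (relative_risk - 1.0) * 100.0


# Pooled 95% confidence limits for children, as quoted in the text
rr_low, rr_high = 0.47, 1.08

best_case = percent_change(rr_low)     # limit most favourable to the spacer
worst_case = percent_change(rr_high)   # limit least favourable to the spacer
includes_no_difference = rr_low < 1.0 < rr_high

print(f"Change in admission risk with a spacer: {best_case:.0f}% to {worst_case:+.0f}%")
print("Interval includes a relative risk of 1:", includes_no_difference)
```

An interval that includes 1 is what "no significant difference" means here; it does not show that the two devices are equivalent.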

So how can these results be translated into clinical practice? This question will be the focus of the next article in this series.

The Prescriber series on evidence-based medicine aims to provide the reader with an easy-to-follow guide to a complex topic. Using practical examples, the articles will help you apply evidence-based medicine to daily practice. In this final article, we look at how cost-effectiveness is calculated.

Previous articles in this series have described the statistical methods used to find out whether treatments are effective in clinical trials and, before embarking on cost-effectiveness analysis, it is wise to check first that there is good evidence that the treatment works. There are many extra levels of uncertainty when costs are considered, as this article demonstrates, and it is important to ensure that the foundational evidence of clinical benefit is in place before building a cost-effectiveness analysis that may rest on a treatment that has not reliably been shown to be better than placebo.

Let us take as an example the recent report on the benefit of ramipril (Tritace) in the secondary prevention of stroke from the Heart Outcomes Prevention Evaluation (HOPE) investigators.(1) This was a large study of 9297 high-risk patients over 55 who were treated with either 10mg ramipril daily or placebo for an average of 4.5 years. The study showed a highly statistically significant 32 per cent reduction in the risk of stroke – 95 per cent confidence interval (CI) of 16-44 per cent reduction. The risk of fatal stroke was reduced by 61 per cent (95 per cent CI of 33-78 per cent reduction) over the 4.5 years of the study.

So here we have convincing evidence that ramipril was better than placebo. Subsequent correspondence, however, has pointed out that the presentation of the results concentrates on relative rather than absolute benefits and there is no mention of the potential costs involved in preventing strokes with ramipril.(2) Various points are made in the letters, both about the way the results are presented and about the remaining uncertainty in relation to whether this effect is specific to ramipril (or ACE inhibitors in general) or is a general benefit of blood pressure reduction.

Cost-effectiveness analysis

In order to carry out a cost-effectiveness analysis, the consequences of a treatment must be measurable in suitable units (those that measure an important outcome),(3) so in this case the unit could be one stroke. By making the unit of analysis ‘one stroke prevented’, the costs of caring for stroke can be set on one side and the costs of different treatments can be calculated. There is debate about whether non-NHS costs should be included, but for simplicity we will restrict ourselves to the ingredient costs of the drugs used for stroke prevention and ignore the costs of blood tests for monitoring.

We find that because strokes only occurred in 4.9 per cent of the patients in the placebo group, the impressive 32 per cent relative risk reduction actually translates into an absolute risk reduction of 1.5 per cent and a number needed to treat (NNT) of 66 people (95 per cent CI of 49 to 128) for 4.5 years to prevent one stroke.
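As a rough check on this step, the sketch below works from the two rounded figures given in the text (a placebo-group risk of 4.9 per cent and an absolute risk reduction of 1.5 per cent). The function and variable names are illustrative only, and small differences from the published figures are to be expected because the inputs are rounded.

```python
def number_needed_to_treat(absolute_risk_reduction):
    """NNT is the reciprocal of the absolute risk reduction (given as a proportion)."""
    return 1.0 / absolute_risk_reduction


placebo_risk = 0.049             # stroke risk in the placebo group (from the text)
absolute_risk_reduction = 0.015  # absolute risk reduction (from the text)

# Risk in the treated group implied by the two figures above
treated_risk = placebo_risk - absolute_risk_reduction

# Relative risk reduction recovered from the rounded inputs
relative_risk_reduction = absolute_risk_reduction / placebo_risk

nnt = number_needed_to_treat(absolute_risk_reduction)

print(f"Implied risk on ramipril: {treated_risk:.1%}")
print(f"Relative risk reduction from rounded inputs: {relative_risk_reduction:.0%}")
print(f"NNT over 4.5 years: {nnt:.1f}")
```

The unrounded reciprocal of 1.5 per cent is a little above the figure of 66 quoted in the article, which shows how sensitive the NNT is to rounding of a small absolute difference.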

The cost of the drug for 66 people for this length of time is about £58 000 (£196 per year each), and the confidence intervals of the NNT translate into a range of £43 000 to £113 000 to prevent one stroke. The temptation to make direct cost comparisons with the results of other drugs in reducing stroke is strong but care needs to be exercised.
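The cost figures can be reproduced with simple multiplication. The sketch below uses only the numbers given in the text (annual drug cost, treatment duration, NNT and its confidence limits); the function name is an assumption made for this example.

```python
ANNUAL_DRUG_COST_GBP = 196   # cost per patient per year (from the text)
TREATMENT_YEARS = 4.5        # average duration of treatment (from the text)


def cost_per_stroke_prevented(nnt, annual_cost=ANNUAL_DRUG_COST_GBP, years=TREATMENT_YEARS):
    """Drug cost of treating `nnt` patients for `years` in order to prevent one stroke."""
    return nnt * annual_cost * years


for label, nnt in [("point estimate", 66), ("lower limit", 49), ("upper limit", 128)]:
    cost = cost_per_stroke_prevented(nnt)
    print(f"NNT {nnt:>3} ({label}): about £{cost:,.0f}")
```

Only drug costs are counted here, as in the article; monitoring costs and any savings from strokes avoided are left out.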

There is a recognised difficulty in comparing NNTs for different treatments that do not have the same duration,(4) and this can be overcome by looking at cost-effectiveness, because the duration of treatment is then taken into account. More events will be prevented with longer trial durations but the costs of treatment will go up in parallel, so the cost per event should stay roughly the same whatever the duration considered.
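A minimal sketch of why duration cancels out is given below. It rests on a simplifying assumption that is not in the article: that the benefit accrues evenly, so the 1.5 per cent absolute risk reduction over 4.5 years is spread as a constant annual reduction. The cohort size and the durations tried are hypothetical.

```python
import math

ANNUAL_COST_GBP = 196               # drug cost per patient per year (from the text)
ANNUAL_RISK_REDUCTION = 0.015 / 4.5  # assumption: benefit accrues evenly over time


def cost_per_event(years, patients=1000):
    """Cost per stroke prevented for a hypothetical cohort treated for `years`."""
    total_cost = patients * ANNUAL_COST_GBP * years
    events_prevented = patients * ANNUAL_RISK_REDUCTION * years
    return total_cost / events_prevented


costs = [cost_per_event(years) for years in (1, 2, 4.5, 10)]
for years, cost in zip((1, 2, 4.5, 10), costs):
    print(f"{years:>4} years: £{cost:,.0f} per stroke prevented")

# Duration appears in both numerator and denominator, so it cancels
assert all(math.isclose(cost, costs[0]) for cost in costs)
```

Under this assumption the cost per stroke prevented is the same for every duration, whereas the NNT would fall steadily as the duration increased. If the treatment effect changed over time, the cancellation would only be approximate.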

There is, however, a residual problem in relation to any kind of absolute treatment effect (including cost-effectiveness). The size of absolute benefit is closely related to the baseline risk of the patient being treated, so high-risk patients will tend to show lower NNTs and lower costs per event saved. This is because the relative risk reduction tends to be fairly consistent across different levels of baseline risk. This is demonstrated in the ramipril results, where the relative risk reduction is very similar for patients with high and normal blood pressure,(1) but those with higher blood pressure have higher absolute risks of stroke and therefore derive more benefit from treatment.
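The effect of baseline risk can be illustrated by holding the relative risk reduction constant and varying the baseline risk. In the sketch below the 32 per cent relative risk reduction and the drug cost come from the text, but the three baseline risks are hypothetical values chosen only to show the pattern.

```python
RELATIVE_RISK_REDUCTION = 0.32     # from the text, assumed constant across risk levels
COST_PER_PATIENT_GBP = 196 * 4.5   # drug cost per patient over 4.5 years (from the text)


def nnt_for_baseline(baseline_risk, rrr=RELATIVE_RISK_REDUCTION):
    """NNT when a constant relative risk reduction is applied to a given baseline risk."""
    absolute_risk_reduction = baseline_risk * rrr
    return 1.0 / absolute_risk_reduction


# Hypothetical baseline risks of stroke over the treatment period
hypothetical_baseline_risks = [0.02, 0.05, 0.10]

rows = []
for risk in hypothetical_baseline_risks:
    nnt = nnt_for_baseline(risk)
    rows.append((risk, nnt, nnt * COST_PER_PATIENT_GBP))

for risk, nnt, cost in rows:
    print(f"Baseline risk {risk:.0%}: NNT {nnt:.0f}, about £{cost:,.0f} per stroke prevented")
```

Doubling the baseline risk halves both the NNT and the cost per event prevented, which is why comparisons between trials are only fair when the control groups had similar risks.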

A further example of this relates to the cost of using statins. To prevent one cardiovascular event, fewer patients need to be given a statin when they are used for secondary prevention (where the baseline risk is high) in comparison with primary prevention (lower baseline risk). For this reason, before comparing costs between trials or meta-analyses of different treatments against placebo, it is important to check that the baseline risk of the patients in the placebo group is similar. In fact the patients included in the Heart Protection Study(5) did have similar baseline risks of stroke and a similar duration of treatment to those in the HOPE study.

Here it is reasonable to compare the costs of using a statin and this works out as more expensive, at around £100 000 to prevent one stroke – this allows for the fact that some placebo arm patients ended up on a statin and not all the active patients stayed on treatment. Aspirin is roughly two orders of magnitude cheaper at around £500 per stroke prevented, but hopefully most patients will be receiving this already.

Head-to-head comparisons of different interventions in a single trial can overcome the above difficulties, but in order to generate the power required to reliably detect small differences, prohibitively large numbers of patients need to be recruited. This in turn raises a further question about whether the costs of finding the answer outweigh the benefits of knowing it!

Cost minimisation

It is a mistake to think that economic analysis is only about minimising the costs of the treatment itself; if this were the only concern all asthmatics would be treated with oral steroids (the cheapest option). Clearly this ignores the known risks of long-term systemic treatment with oral steroids and would be entirely unethical.

In some situations, however, there is enough reliable information to persuade us that different treatments lead to similar outcomes, and in this instance a cost minimisation approach can be used. An example of this is the use of different delivery devices in asthma. A systematic search of the literature(6) found that there is little evidence for any of the devices producing superior outcomes in clinical trials, so a cost minimisation analysis was carried out in which the costs of the devices were directly compared. Since a metered-dose inhaler with spacer is the cheapest method available this is the preferred first-line delivery method to try, but of course this does not mean that some patients will not need dry powder devices or breath-activated inhalers.

Cost-utility analysis

In some cases treatments cannot be directly compared using one of the simpler methods above as the treatments alter quality and quantity of life. Many of the treatments used in cancer fall into this category and assessments have to be made that incorporate both mortality and quality of life (QoL).

One way of judging how much people value their current health status is by using a standard gamble technique. Patients are asked to consider the theoretical possibility of having a treatment for their condition that had a chance of leaving them in perfect health or causing death; the odds of each outcome are adjusted until they are unsure whether to accept the treatment or not, and this can be used to rate their current QoL. This information can then be turned into quality-adjusted-life-years (QALYs) to allow the results of treatments for different diseases to be compared.
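A minimal sketch of how a standard gamble result might feed into QALYs is shown below. It uses the usual convention that the probability of perfect health at the point of indifference is taken as the utility of the current health state, and that QALYs are utility multiplied by years lived in that state. All the numbers (utilities, years and treatment cost) are hypothetical and are not taken from any study.

```python
import math


def standard_gamble_utility(p_perfect_health_at_indifference):
    """Utility of the current state: the chance of perfect health at which the
    patient is indifferent between accepting the gamble and staying as they are."""
    if not 0.0 <= p_perfect_health_at_indifference <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return p_perfect_health_at_indifference


def qalys(utility, years):
    """Quality-adjusted life-years: years of life weighted by their utility."""
    return utility * years


# Hypothetical patient: utility 0.70 without treatment and 0.85 with it, over 10 years
utility_without = standard_gamble_utility(0.70)
utility_with = standard_gamble_utility(0.85)
qalys_gained = qalys(utility_with, 10) - qalys(utility_without, 10)

hypothetical_treatment_cost_gbp = 12_000
cost_per_qaly = hypothetical_treatment_cost_gbp / qalys_gained

print(f"QALYs gained: {qalys_gained:.2f}")
print(f"Cost per QALY: £{cost_per_qaly:,.0f}")
```

Because the result is expressed as cost per QALY rather than cost per stroke or per admission, treatments for quite different conditions can be set side by side.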

Sensitivity analysis

Since all economic analysis requires assumptions to be made about the cost of treatments and the value of outcomes, it is usual to carry out a sensitivity analysis to see how much the results of the analysis vary when the assumptions are altered. In particular, it may be necessary to predict what would happen beyond the timescale of the trials by using modelling techniques. If the results are very unstable when the assumptions are adjusted, this should be made clear and the reader will need to interpret the analysis with more caution.
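A simple one-way sensitivity analysis can be sketched using the ramipril example from earlier in this article. Each assumption is varied in turn between a low and a high value while the others are held at their base value. The NNT range is the confidence interval quoted above; the range for the annual drug cost (25 per cent either side of £196) is a hypothetical assumption made only for illustration.

```python
def cost_per_stroke(nnt, annual_cost, years=4.5):
    """Drug cost of preventing one stroke."""
    return nnt * annual_cost * years


base_case = {"nnt": 66, "annual_cost": 196}   # base values from the text
ranges = {
    "nnt": (49, 128),           # confidence limits of the NNT (from the text)
    "annual_cost": (147, 245),  # hypothetical: 25 per cent either side of £196
}


def one_way_sensitivity(base, value_ranges):
    """Vary one input at a time and record the lowest and highest result."""
    results = {}
    for name, (low, high) in value_ranges.items():
        outcomes = [cost_per_stroke(**dict(base, **{name: value})) for value in (low, high)]
        results[name] = (min(outcomes), max(outcomes))
    return results


sensitivity = one_way_sensitivity(base_case, ranges)
print(f"Base case: £{cost_per_stroke(**base_case):,.0f}")
for name, (lowest, highest) in sensitivity.items():
    print(f"Varying {name}: £{lowest:,.0f} to £{highest:,.0f}")
```

In this sketch the result moves far more when the NNT is varied than when the drug price is varied, so the uncertainty in the clinical effect dominates the uncertainty in the cost estimate.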

Decisions have to be made

In the real world medical needs will always exceed the ability of any healthcare system to meet them. Hard choices have to be made every day about how best to use the resources that are available to us. The best available evidence of treatment efficacy (usually from systematic review of the results of randomised controlled trials) has to be combined with an economic analysis before those choices are made.

These are the processes used by the National Institute for Clinical Excellence (NICE), and they should be as transparent as possible so that we can see how the decisions were reached, even if we do not agree with all of them.

Table 1. Glossary of terms

Cost-effectiveness analysis

A form of economic study design in which consequences of different interventions may vary but can be expressed in identical natural units; competing interventions are compared in terms of cost per unit of consequence

Cost-minimisation analysis

An economic study design in which the consequences of competing interventions are the same and in which only inputs are taken into consideration; the aim is to decide which is the cheapest way of achieving the same outcome

Cost-utility analysis

A form of economic study design in which interventions producing different consequences in both quality and quantity of life are expressed as utilities; the best known utility measure is the quality-adjusted-life-year or QALY; competing interventions can be compared in terms of cost per QALY

Sensitivity analysis

A technique that repeats the comparison between inputs and consequences, varying the assumptions underlying the estimates – in doing so, sensitivity analysis tests the robustness of the conclusions by varying the items around which there is uncertainty

Acknowledgement

I would like to thank Professor Miranda Mugford for permission to use the glossary of terms from Elementary Economic Evaluation in Health Care and for helpful comments on this article.

An introduction to Evidence Based Medicine

What do you mean by evidence-based medicine?

Whilst the term evidence-based medicine (EBM) is probably familiar to most readers, it is worth pausing initially to think about what we understand by the term. The claim that a position is “evidence based” can be used to try to silence any questions or argument. On the contrary, asking questions about the evidence for any suggested course of action is at the heart of EBM philosophy. I can do no better than to quote the introduction to one of my favourite books in this area, Follies and Fallacies in Medicine(1), in which the authors describe themselves as suffering from incurable “scepticaemia”.

The aim of our book is to reach inquisitive minds, particularly those who are still young and uncorrupted by dogma. We offer no solutions to the problems we raise because we do not pretend to know of any. Both of us have been thought to suffer from scepticaemia* but are happy to regard this affliction, paradoxically, as a health promoting state. Should we succeed in infecting others we will be well content.
*Scepticaemia: An uncommon generalised disorder of low infectivity. Medical school education is likely to confer life-long immunity.

The first step towards using EBM to inform our daily practice is to be prepared to question whether we always know the best course of action or have looked at the evidence that underpins the decisions that we make.

We are certainly influenced by our own past experience, what our colleagues do and what experts tell us. These often enlighten us and inform our practice, but we must also be aware that experiences are subject to chance variation, and that the person who is closest at hand may not give the best advice. For example, the experience of the last patient with a condition is not necessarily the best pointer for the next one. What we were taught in medical school may also now be out of date. We do well, however, to remember that our own experience and those of our patients are always important and worth exploring. How many times have you had the experience of suddenly understanding why a patient has presented with a longstanding headache when they let slip that a friend at work had been diagnosed as having a brain tumour?

What EBM is not

Whilst it is invaluable to know what the evidence is in relation to problems that we have to investigate and treat, you may be surprised to learn that the advocates of EBM would be the first to agree that evidence is only a small part of making clinical decisions (see box).

"First, evidence alone is never sufficient to make a clinical decision. Decision-makers must always trade the benefits and risks, inconvenience, and costs associated with alternative management strategies, and in doing so consider the patient's values."
Users Guides to the Medical Literature(2)

EBM is not a kind of cookbook medicine full of easy answers to difficult questions, and it can be quite time-consuming. In general as we dig into the evidence we find that there is much that is unknown, but tolerance of uncertainty is well known to us in primary care, and in my experience sharing this uncertainty carefully with patients is often surprisingly well received.

'For every complex problem there is a simple answer, and it's wrong.'
HL Mencken

Why is EBM important?

There is an ever-increasing quantity of medical literature published each week and keeping up to date is a huge challenge. It is simply not possible to read all the relevant literature (even in our areas of special interest), so how can we stay in touch with recent developments? If you have written a personal learning plan I wonder whether this is a recognised problem and how you plan to address it?

Increasingly we are put under pressure by patients who have read about a new treatment in the paper or found an article on the Internet, or by consultants who advocate particular referral or treatment pathways for patients with particular symptom presentations. So how are we to respond?

The medical literature is a powerful resource for us, but we have to recognise that it serves many different needs. Those who commission and carry out medical research need somewhere to publish the findings of their work. This may be of high or low quality, and it is not necessarily safe to assume that publication of a paper in a peer-reviewed journal means you can believe all that the authors say. Just look at the subsequent correspondence if you want to see what I mean!

The bottom line is whether this paper means that I should change what I am currently doing, and in order to assess this some basic skills are needed. Many of these, including some explanation of statistical concepts, will be covered in later articles in this series, but the first useful skill is being able to turn a vague concern into an answerable question.
We need to be able to pose a question that reliable research studies can answer. The structure of such a question in relation to treatment options will have four parts to it and can be summarised using the acronym PICO. We need to consider the Patient’s problem, the Intervention suggested, the possible Comparative treatments and the Outcomes that matter (see Box).

Thus “Does my child need antibiotics for this ear infection?” might be rephrased “In children with acute otitis media, how much difference do antibiotics make in comparison with paracetamol alone, in terms of duration of pain, deafness, recurrent infections and serious complications?”

Once we have determined the question that we want to ask, we can move on to decide what is the most valid evidence to answer the question and how to find it.

Archie Cochrane’s Challenge

I was impressed as a student by Archie Cochrane’s book ‘Effectiveness and Efficiency’ in which he pointed out that we could be as efficient as we like in providing medical care, but that if it is not effective care we are wasting our time(3). He set out a challenge in 1979 as follows(4):

It is surely a great criticism of our profession that we have not organised a critical summary, by specialty or subspecialty, updated periodically, of all relevant randomised controlled trials.

In response to this challenge the Cochrane Collaboration prepares and updates such summaries in the form of systematic reviews of the best evidence available, and there are now over 1,000 of these on the Cochrane Library. Whilst there will inevitably be gaps in this database for some time to come, increasing numbers of reviews do address issues related to primary care.
I would be the first to admit that Cochrane reviews are not light reading, but a later article in this series will address the subject of how to understand systematic reviews. Moreover part of the purpose of publications such as Clinical Evidence is to summarise the results of Cochrane reviews in a concise understandable format.

EBM in daily practice

If we want to practise better medicine we will need to keep up to date with new developments and decide how to integrate them into our practice. The concept of Clinical Governance challenges us to demonstrate whether we have been able to measure changes in our practice as a result. This can be challenging and exciting but we have to be realistic about how much can be achieved in the face of numerous demands made upon us and the volume of uncertainties that we face every day. We also need to avoid efficiently implementing treatments that are not effective!

There is little point wasting time looking for answers that probably do not exist, and in my experience the quickest place to start looking is in a synopsis of published research that has already been assessed for quality, such as Clinical Evidence or Best Evidence (an electronic summary of Evidence Based Medicine Journal and ACP Journal Club). Whilst searching Medline may be more familiar, the best data tend to be buried in a sea of other material. Again this will be dealt with in more depth in a future article.

So if all this sounds like hard work – it is! But it is worth it and it can be fun, so look out for the future topics in this series that may change the way you read journals and perhaps even how you practise in the future.

In an article on sub-group comparisons I warned about the danger of paying too much attention to results from patients in particular sub-groups of a trial, arguing that the overall treatment effect is usually the best measure for all the patients.

In the same way, when the results of all available clinical trials are combined in a Systematic Review (for example in a Cochrane review) care is still required in the interpretation of the results from each individual trial, and the main focus is on the pooled result giving the average from all the trials. The results are often displayed in a forest plot as demonstrated below. The result of each trial is represented by a rectangle (which is larger for the bigger trials) and the horizontal lines indicate the 95% confidence interval of each trial. The diamond at the bottom is the pooled result and its confidence interval is the width of the diamond.

As hospital admissions for acute asthma were rare in each trial (shown in the columns of data for Holding Chamber and nebuliser) the uncertainty of the individual trials is seen in wide confidence intervals, but when these are pooled together the uncertainty shrinks to a much narrower estimate. The pooled odds ratio of one indicates no difference shown between delivery methods for beta-agonists in acute asthma as far as admission rates are concerned, but the estimate is still imprecise and compatible with either a halving or a doubling of the odds of being admitted to hospital. So we have to say that we do not know whether there is a difference in the rate of admissions between the two delivery methods.
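To show why pooling narrows the confidence interval, here is a minimal sketch of one common approach: fixed-effect, inverse-variance pooling of log odds ratios. This is not necessarily the method used in the review, and the three trials below are invented purely for illustration. A correction of 0.5 is added to every cell so that trials with no events in one arm can still be included; that too is an assumption of this sketch.

```python
import math


def log_odds_ratio(events_a, total_a, events_b, total_b):
    """Log odds ratio and its variance for one trial (0.5 added to every cell)."""
    a = events_a + 0.5
    b = total_a - events_a + 0.5
    c = events_b + 0.5
    d = total_b - events_b + 0.5
    estimate = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return estimate, variance


def pool_fixed_effect(results):
    """Inverse-variance weighted average of the trial estimates, with its standard error."""
    weights = [1 / variance for _, variance in results]
    pooled = sum(w * est for w, (est, _) in zip(weights, results)) / sum(weights)
    standard_error = math.sqrt(1 / sum(weights))
    return pooled, standard_error


def ci95(estimate, standard_error):
    """95% confidence interval, converted back to the odds ratio scale."""
    return (math.exp(estimate - 1.96 * standard_error),
            math.exp(estimate + 1.96 * standard_error))


# Hypothetical trials: (admissions, patients) for spacer, then for nebuliser
hypothetical_trials = [(3, 50, 4, 52), (5, 80, 5, 78), (2, 40, 3, 41)]

results = [log_odds_ratio(*trial) for trial in hypothetical_trials]
for estimate, variance in results:
    low, high = ci95(estimate, math.sqrt(variance))
    print(f"Trial OR {math.exp(estimate):.2f} (95% CI {low:.2f} to {high:.2f})")

pooled_log_or, pooled_se = pool_fixed_effect(results)
pooled_ci = ci95(pooled_log_or, pooled_se)
print(f"Pooled OR {math.exp(pooled_log_or):.2f} "
      f"(95% CI {pooled_ci[0]:.2f} to {pooled_ci[1]:.2f})")
```

Larger trials have smaller variances and so receive more weight, which corresponds to the larger rectangles on the forest plot. The pooled standard error is always smaller than that of any single trial, which is why the diamond is narrower than the individual horizontal lines.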

Before all the results are combined it is wise to look for Publication Bias. There is evidence that positive results from clinical trials are more likely to be published in major journals, and in the English language, than similar trials that report negative results. When published studies are combined this leads to a tendency to overestimate the benefits of treatment. The easiest way to look for this is using a funnel plot of the results from the trials, where the result of each trial is plotted against the size of the study. Chance variations mean that small studies should show more random scatter in both directions around the pooled result. If all the small studies are showing positive results there is a suspicion that other small studies with negative results exist but were not published. The funnel plot shown below is taken from a Cochrane review of the use of Nicotine gum for smoking cessation and is reasonably symmetrical.

A further important check is to look for Heterogeneity. The individual trials will again show chance variation in their results and in a Systematic Review it is usual to test whether the differences are larger than those expected by chance alone. The Forest plot above shows that the Heterogeneity in this set of trials is quite low. However, if significant Heterogeneity is shown (in other words the results are more diverse than expected) it is recommended to explore the reasons why this may be. Although statistical adjustments can be made to incorporate such Heterogeneity (using a so-called Random Effects Model) this should not be accepted uncritically. It may be more sensible not to try to combine the trial results at all.
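Two widely used summaries of heterogeneity are Cochran's Q and the I-squared statistic, and a minimal sketch of both is given below. The article does not say which test was used in the reviews discussed, so this is offered only as an illustration, and both sets of trial results are hypothetical.

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted sum of squared deviations from the fixed-effect pooled estimate."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))


def i_squared(q, number_of_trials):
    """Percentage of the variation between trials beyond what chance alone would explain."""
    degrees_of_freedom = number_of_trials - 1
    if q <= degrees_of_freedom:
        return 0.0
    return (q - degrees_of_freedom) / q * 100.0


# Hypothetical log odds ratios with equal variances
consistent_effects = [0.0, 0.05, -0.05]   # trials that broadly agree
diverse_effects = [-1.0, 0.0, 1.0]        # trials that point in different directions
variances = [0.1, 0.1, 0.1]

for label, effects in [("consistent", consistent_effects), ("diverse", diverse_effects)]:
    q = cochran_q(effects, variances)
    print(f"{label}: Q = {q:.2f}, I-squared = {i_squared(q, len(effects)):.0f}%")
```

Q is compared with its degrees of freedom (the number of trials minus one); when Q is no larger than that, the spread of results is about what chance would produce. A high I-squared is a prompt to look at the clinical differences between the trials before deciding whether a single pooled result makes sense.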

An example of this can be found in the BMJ in October 1999 in which a group from Toronto published a meta-analysis of Helicobacter eradication (1). The statistical tests showed considerable Heterogeneity between the trials that was largely ignored by the authors. Inspection of the trials shows that there were two types; some with outcomes measured at six weeks using single treatments and others using triple therapy and measuring dyspepsia at one year. There is no good clinical reason to put these together and this may well explain the diversity of the results (2).

The message is to use your common sense when deciding whether the differences between the outcomes measured and the treatments used in each trial mean that it is safer not to calculate a single average result (not least because the average is not easy to interpret and apply to clinical practice).