Appraise the evidence

Appraisal of references typically has a number of stages. The ‘first appraisal’ is based on abstracts: it cuts down on ‘noise’ to home in on the relevant condition (e.g. chronic asthma, migraine) and on high-quality studies of the correct methodology (e.g. systematic reviews, RCTs, diagnostic studies).

Once a systematic search has been run, the titles/abstracts of retrieved papers need to be assessed using the criteria stated for that particular review/question (often this will be done by the person undertaking the search). If an abstract indicates that the study definitely does not match the criteria, the study is excluded. If the first appraiser cannot definitively exclude a study using the information in the title/abstract, the reference is added to the set earmarked for further consideration.
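The screening rule above can be sketched in code. This is only a minimal illustration, not a prescribed tool: the `Judgement` labels, the `first_appraisal` function, and the criterion names in the usage lines are all hypothetical.

```python
from enum import Enum


class Judgement(Enum):
    """Hypothetical labels for how a title/abstract fares against one criterion."""
    MEETS = "meets"
    FAILS = "fails"      # definitely does not match the criterion
    UNCLEAR = "unclear"  # cannot be judged from the title/abstract alone


def first_appraisal(judgements):
    """Screen one reference on its title/abstract.

    `judgements` maps a criterion name to a Judgement. The reference is
    excluded only when some criterion is definitely not met; anything
    else is earmarked for full-text consideration.
    """
    if any(j is Judgement.FAILS for j in judgements.values()):
        return "exclude"
    return "further consideration"


# Usage: an unclear study design is not enough to exclude at this stage.
print(first_appraisal({"condition": Judgement.MEETS, "design": Judgement.UNCLEAR}))
print(first_appraisal({"condition": Judgement.FAILS, "design": Judgement.MEETS}))
```

The point of the sketch is the asymmetry: uncertainty at the abstract stage leads to inclusion in the set for further consideration, never to exclusion.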

‘Second appraisal’ based on full papers

References earmarked for further consideration are passed on for full-text evaluation to decide which papers will be used and cited in the final content (often this will be done by the main author). If the author is undertaking a systematic review/overview, they will also justify each exclusion they make. They retain the inclusion/exclusion forms for use in the review, so that an excluded study list can be generated and the decisions about exclusions recorded.

‘Third appraisal’ (QA check) based on full papers

Completed systematic research reports are usually subjected to a further review of the selected material, validating the quality and relevance of the included studies as appropriate. This may be done independently by a co-author, or by an editor/final assessor, before the report is finalised.

Parallel appraisal

Usually, authors of systematic reviews will have at least two individuals who independently assess references at both abstract and full paper stages, discussing any differences in opinion and resolving them (using an additional assessor to act as final arbitrator if necessary), to come to a consensus on which studies should be included and excluded. If you wish to follow an evidence-based approach to study selection, it is generally recommended that more than one person is involved in appraising and choosing the references to include.
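As a rough sketch of how two independent sets of decisions might be reconciled, the function below accepts agreed decisions as they stand and refers only the disagreements onward. The function name `reconcile`, the reference IDs, and the `arbitrate` callback (standing in for discussion or a third assessor) are assumptions made for illustration.

```python
def reconcile(assessor_a, assessor_b, arbitrate):
    """Combine two independent sets of include/exclude decisions.

    `assessor_a` and `assessor_b` map a reference ID to "include" or
    "exclude". `arbitrate` is a callable that is consulted only for
    references on which the two assessors disagree.
    Returns (final_decisions, disagreements).
    """
    if assessor_a.keys() != assessor_b.keys():
        raise ValueError("both assessors must appraise the same references")

    final, disagreements = {}, []
    for ref in sorted(assessor_a):
        if assessor_a[ref] == assessor_b[ref]:
            final[ref] = assessor_a[ref]
        else:
            disagreements.append(ref)
            final[ref] = arbitrate(ref)
    return final, disagreements


# Usage with hypothetical reference IDs; the arbitrator here always includes.
a = {"ref1": "include", "ref2": "exclude", "ref3": "include"}
b = {"ref1": "include", "ref2": "include", "ref3": "include"}
final, disputed = reconcile(a, b, arbitrate=lambda ref: "include")
print(disputed)
print(final)
```

Keeping the list of disagreements alongside the final decisions makes it possible to record how each difference of opinion was resolved.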

Appraising the quality of study methods

No study is perfect. For practical purposes, it may help to consider three possible scenarios with regard to study methods:

If the methods were sound, we would include the study.

If the methods were suboptimal, we would include the study but would cite reservations and appropriate caveats with regard to interpreting the result.

If the methods were unsound, that is, there was a fatal flaw or a reasonable possibility of biased results, we would exclude the study.
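The three scenarios can be expressed as a small decision function. This is a sketch under stated assumptions: `selection_decision` and its two inputs are hypothetical, and judging whether a flaw is fatal remains a matter for the appraiser, not the code.

```python
def selection_decision(fatal_flaw, reservations):
    """Map an appraiser's judgement of study methods to a decision.

    `fatal_flaw` is True when the methods are unsound (a fatal flaw or a
    reasonable possibility of biased results). `reservations` is a list
    of caveats about suboptimal methods; an empty list means the methods
    were judged sound.
    Returns (decision, caveats_to_cite).
    """
    if fatal_flaw:
        return "exclude", []
    if reservations:
        return "include with caveats", list(reservations)
    return "include", []


# Usage with a hypothetical reservation.
print(selection_decision(False, []))
print(selection_decision(False, ["short follow-up"]))
print(selection_decision(True, ["short follow-up"]))
```

A fatal flaw takes precedence over any other reservation, which mirrors the order of the scenarios above.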

Studies are assessed against minimum quality criteria (that is, the minimum acceptable size, level of blinding [if blinding is possible], completeness and length of follow-up, etc). However, minimum quality criteria are just that: minimum criteria. A study that appears to meet them may still be excluded on closer reading. For example, a trial may describe itself as randomised, but on further reading it becomes apparent that treatments were allocated by the day of admission or by alternate allocation. We would then describe this trial as quasi-randomised and may exclude it on this basis.

Similarly, the quality of systematic reviews may vary widely with regard to the methods employed and the extent to which data are reported. Indeed, on occasion, it may be difficult to decide whether a review is systematic at all if the search methods are poorly reported. It is impossible to list every methodological issue that might arise, or to say in advance what their relative importance might be. A single markedly weak element may be enough to throw doubt on all the conclusions of a study (a ‘fatal flaw’).

Quality issues that you may consider when assessing a systematic review might include:

Are the questions and methods of the review clearly stated?

Are the search methods described, and are they comprehensive and reproducible?

Are explicit methods used to determine which studies are included in the review?

Was the methodological quality of primary studies assessed?

Was the selection and assessment of primary studies appropriate, reproducible, and free from possible bias?

Are differences in individual study results adequately explained?

Are the results of primary studies combined appropriately?

Are the reviewers’ conclusions supported by data cited?

Quality issues that you may consider when assessing an RCT might include:

Were the setting and study population clearly described?

Was assignment genuinely random, and was similarity between groups documented?

Was allocation to study groups adequately concealed from participants and investigators?

What was the level of blinding?

Were all clinically relevant outcomes reported?

Were over 80% of people who entered the study accounted for at its conclusion?

Did the RCT analyse people in the groups to which they were randomised (intention-to-treat analysis)?

Were both the statistical significance and the clinical importance of the statistical result considered?
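Of the questions above, the follow-up check is the one that reduces most directly to arithmetic. The sketch below assumes the 80% figure is applied as a strict threshold ("over 80%"); the function name and the participant counts in the usage lines are hypothetical.

```python
def follow_up_adequate(entered, accounted_for, threshold_percent=80):
    """Return True if more than `threshold_percent` of the people who
    entered the study were accounted for at its conclusion."""
    if entered <= 0:
        raise ValueError("entered must be a positive count")
    if not 0 <= accounted_for <= entered:
        raise ValueError("accounted_for must be between 0 and entered")
    # Integer comparison avoids floating-point rounding at exactly 80%.
    return accounted_for * 100 > entered * threshold_percent


# Usage: 170 of 200 is 85%; 160 of 200 is exactly 80%, which is not "over 80%".
print(follow_up_adequate(200, 170))
print(follow_up_adequate(200, 160))
```

Whether a study that sits exactly at the threshold should pass is a judgement call; the strict comparison here simply follows the wording of the question.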

Considering evidence on harm

Of all study types, well-conducted RCTs or systematic reviews of RCTs provide the best evidence of causality, that is, that one treatment causes an effect compared with another treatment. Usually, you would also report any data on adverse effects reported by included RCTs or systematic reviews of RCTs. However, RCTs are often underpowered to detect adverse effects, some of which may be serious but rare. Because of this, you may occasionally also need to include non-RCT data on adverse effects to enhance the practical and clinical relevance of your findings.
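A short calculation may help to show why a trial can miss a rare harm. The sketch assumes that each participant independently has the same probability of experiencing the adverse effect; the event rate and arm size in the usage line are hypothetical, not taken from any study.

```python
def prob_at_least_one_event(event_rate, participants):
    """Probability of observing one or more events among `participants`,
    assuming independent participants who share the same `event_rate`."""
    if not 0.0 <= event_rate <= 1.0:
        raise ValueError("event_rate must be between 0 and 1")
    if participants < 0:
        raise ValueError("participants must not be negative")
    return 1.0 - (1.0 - event_rate) ** participants


# Usage: a harm affecting 1 in 1000 people, in a treatment arm of 200.
p = prob_at_least_one_event(0.001, 200)
print(f"{p:.2f}")  # about 0.18
```

Under these assumptions, a treatment arm of 200 people would show no cases at all of a 1-in-1000 harm more than four times out of five, which is why larger observational data sets are sometimes needed.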

Observational data may be more subject to confounding or bias. Bias due to non-comparability of groups is more likely in cohort studies, and more likely still in case-control studies. Case series or case reports are the weakest forms of evidence, although case reports have sometimes provided the first indication that a given treatment is associated with a particular adverse effect, and such associations have often been subsequently confirmed.