This is the second in a series of Mental Elf blogs produced in partnership with the British Journal of Psychiatry. Each month we select a new paper from the BJPsych to be blogged by a young psychiatrist with research experience. We’re really excited about this new venture and look forward to highlighting important new evidence that has implications for practice. And now, over to Andrew Sommerlad for the second blog!

At least one in twenty elderly people have major depressive disorder and two to three times as many have sub-threshold depressive symptoms (Meeks T. et al., 2011). Older people are at particular risk of depression because of physical illness, bereavement and increasing social isolation.

As well as being deeply distressing for the person and their family and friends, depression in older people is associated with increased dementia risk (Ownby R. et al., 2006), worse day-to-day functioning, increased contact with healthcare services (Creed F. et al., 2002) and suicide (Iliffe S. et al., 2010).

The core features of depression in old age are the same as in younger populations:

Pervasive low mood

Loss of pleasure in activities

Feeling guilty or worthless

Marked change in appetite, weight or sleep

Observable slowing or agitation of movements

Tiredness and poor concentration

Thoughts of dying or suicide

A range of effective treatment options is available, including medication and psychological and social therapies, but a smaller proportion of older people than of young adults receive treatment (Rodda J. et al., 2011). This is because depression is more difficult to detect in older people, as symptoms are incorrectly attributed to physical illness or dementia (Wells C. et al., 1979), and because of therapeutic nihilism (Burroughs H. et al., 2006).

Though stopping short of recommending screening, NICE suggests that clinicians stay alert to people who may have depression (NICE, 2009).

The authors of a new systematic review (Tsoi et al, 2017) set out to determine whether people with depression can be identified by an extremely short screening test.

Two-question screening test

During the last month, have you often been bothered by feeling down, depressed or hopeless?

During the last month, have you often been bothered by having little interest or pleasure in doing things?

Answering yes to either of these questions is considered a positive test result, warranting further assessment.
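The scoring rule above is simple enough to write down directly. The following is a minimal sketch; the function and argument names are illustrative assumptions and do not come from the paper.

```python
def two_question_screen(low_mood: bool, loss_of_interest: bool) -> bool:
    """Return True (screen positive) if either question is answered yes."""
    return low_mood or loss_of_interest


# A "yes" to only the second question is enough for a positive screen.
print(two_question_screen(low_mood=False, loss_of_interest=True))   # True
print(two_question_screen(low_mood=False, loss_of_interest=False))  # False
```

A positive result here only flags someone for further assessment; it is not a diagnosis.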


Methods

This study is a systematic review and meta-analysis of studies evaluating how the two-question screen performed compared to other instruments. The authors scrutinised review articles to generate a list of depression screening instruments and then searched four scientific databases looking for studies assessing the accuracy of the instruments they found in identifying depression in people aged over 60 years, compared to gold-standard diagnosis using accepted criteria. The authors excluded studies not written in English.

Two researchers extracted information from the included studies about their publication year and location, participants, measurements and results. The authors adapted a rating scale for assessing study quality. Random effects meta-analysis combined the sensitivity, specificity and diagnostic odds ratios from different studies for each screening test. Sensitivity analyses investigated whether diagnostic performance differed in studies of major depressive disorder compared to less severe depression, and for people from clinical settings and nursing homes, compared to those recruited from general populations or primary care.
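To give a feel for what random effects pooling involves, here is a rough sketch of one common approach: a univariate DerSimonian-Laird model applied to logit-transformed sensitivities. This is an assumption for illustration only; the summary above does not say which model the authors fitted, and the study counts below are invented, not taken from the review.

```python
import math

# Hypothetical (true positives, false negatives) among people with
# depression in six imaginary studies. Not data from the review.
studies = [(45, 5), (30, 2), (80, 12), (22, 1), (60, 9), (38, 3)]


def logit_and_variance(tp, fn):
    # A 0.5 continuity correction keeps the logit finite if a cell is zero.
    a, b = tp + 0.5, fn + 0.5
    return math.log(a / b), 1 / a + 1 / b


def expit(x):
    return 1 / (1 + math.exp(-x))


def pool_random_effects(studies):
    ys, vs = zip(*(logit_and_variance(tp, fn) for tp, fn in studies))
    # Fixed-effect (inverse-variance) weights and mean.
    w = [1 / v for v in vs]
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    # Cochran's Q and the DerSimonian-Laird between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c)
    # Random-effects weights add tau^2 to each study's own variance.
    w_re = [1 / (v + tau2) for v in vs]
    pooled = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return (expit(pooled), expit(pooled - 1.96 * se),
            expit(pooled + 1.96 * se), tau2)


sens, lo, hi, tau2 = pool_random_effects(studies)
print(f"pooled sensitivity {sens:.3f} (95% CI {lo:.3f} to {hi:.3f}), "
      f"tau^2 = {tau2:.3f}")
```

The between-study variance (tau squared) is what distinguishes this from a fixed-effect analysis: the more the studies disagree, the more evenly the weights are spread and the wider the confidence interval becomes.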

Measuring a screening test’s performance

A new test’s performance is measured against a ‘gold-standard’: a more thorough and established test. The perfect new screening test would be brief and easily administered to many people, and would agree completely with the gold-standard, i.e. correctly identify everyone who has the illness and not misdiagnose anyone who is healthy. As this is rarely possible, a trade-off is needed. A test usually aims to pick up almost all unwell people, and it is acceptable if it flags as potentially unwell some people who eventually prove not to have the illness. These ‘false positives’ can be given a clean bill of health later when more thoroughly assessed.

The following measures of test performance are used in this study:

Sensitivity = the proportion of people with the illness, who are identified as unwell by the new test.

Specificity = the proportion of people who do not have the illness who are correctly identified as healthy.

Diagnostic odds ratio = the odds of a positive test result in people with the illness divided by the odds of a positive test result in people without it.

For comparison, liquid-based cytology used in the UK national cervical cancer screening programme has a sensitivity of around 90% and a specificity of 70% (Coste J. et al., 2003; Arbyn M. et al., 2008).
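The three measures defined above can all be read off a two-by-two table of screening result against gold-standard diagnosis. The sketch below uses made-up counts, chosen only so that sensitivity and specificity come out at 90% and 70%; the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # has the illness, screen positive
    fn: int  # has the illness, screen negative
    fp: int  # does not have the illness, screen positive
    tn: int  # does not have the illness, screen negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def diagnostic_odds_ratio(self) -> float:
        # Odds of a positive result in the ill, divided by
        # odds of a positive result in the well.
        return (self.tp / self.fn) / (self.fp / self.tn)


# Hypothetical counts: 100 people with the illness, 200 without.
table = TwoByTwo(tp=90, fn=10, fp=60, tn=140)
print(f"sensitivity = {table.sensitivity():.2f}")                      # 0.90
print(f"specificity = {table.specificity():.2f}")                      # 0.70
print(f"diagnostic odds ratio = {table.diagnostic_odds_ratio():.1f}")  # 21.0
```

A diagnostic odds ratio of 1 would mean the test does no better than chance; higher values mean better discrimination.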

Results

From the six studies evaluating the two-question screen, for which significant heterogeneity in results was found, the combined:

Sensitivity was 0.92 (95% CI, 0.85 to 0.96)

Specificity was 0.68 (0.58 to 0.76)

The diagnostic odds ratio (OR) was 23.6 (9.4 to 58.9)

This compared to other commonly used instruments:

| Instrument | Sensitivity (%) | Specificity (%) | Diagnostic OR |
|---|---|---|---|
| Two-question screen | 91.8 | 67.7 | 23.6 |
| Geriatric Depression Scale | 82.8 | 72.2 | 12.5 |
| Beck Depression Inventory | 85.7 | 73.5 | 16.7 |
| Patient Health Questionnaire-9 | 83.4 | 85.8 | 30.5 |
| Centre for Epidemiologic Studies Depression Scale-20 | 79.7 | 76.5 | 12.8 |
| Hamilton Rating Scale for Depression | 88.6 | 84.9 | 43.8 |
| One-question screen | 66.4 | 82.1 | 9.0 |
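As a consistency check on the figures above, the diagnostic odds ratio can be recomputed from each instrument's pooled sensitivity and specificity. This is a sketch using the rounded percentages as reported; it is not necessarily how the authors pooled the odds ratios, so small differences are to be expected.

```python
def dor_from_percent(sens_pct: float, spec_pct: float) -> float:
    """Diagnostic odds ratio implied by sensitivity and specificity (in %)."""
    sens, spec = sens_pct / 100, spec_pct / 100
    return (sens / (1 - sens)) / ((1 - spec) / spec)


# (sensitivity %, specificity %, reported diagnostic OR) from the comparison.
reported = {
    "Two-question screen": (91.8, 67.7, 23.6),
    "Geriatric Depression Scale": (82.8, 72.2, 12.5),
    "Beck Depression Inventory": (85.7, 73.5, 16.7),
    "Patient Health Questionnaire-9": (83.4, 85.8, 30.5),
    "CES-D 20": (79.7, 76.5, 12.8),
    "Hamilton Rating Scale for Depression": (88.6, 84.9, 43.8),
    "One-question screen": (66.4, 82.1, 9.0),
}

for name, (sens, spec, dor) in reported.items():
    recomputed = dor_from_percent(sens, spec)
    print(f"{name}: reported {dor}, recomputed {recomputed:.1f}")
```

The recomputed values land within a few tenths of the reported ones for every instrument, which also makes the pattern easier to see: the two-question screen earns its odds ratio mainly through high sensitivity, whereas the PHQ-9 and Hamilton scale do so through higher specificity.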

Results for the two-question screen held up when only studies of major depressive disorder were assessed (sensitivity = 89.8%, specificity = 66.2%). The instruments for which comparison between clinical and community settings was possible seemed to be more accurate in clinical settings.
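What do these figures mean for an individual who screens positive? That depends on how common depression is in the group being screened. The sketch below takes the pooled sensitivity (0.92) and specificity (0.68) and, as an assumption, the "one in twenty" prevalence quoted at the start of this blog. Prevalence in the clinical settings studied may well be higher, which would raise the positive predictive value.

```python
def predictive_values(sens: float, spec: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed prevalence of 5% (one in twenty); sensitivity and specificity
# are the pooled two-question screen estimates.
ppv, npv = predictive_values(sens=0.92, spec=0.68, prevalence=0.05)
print(f"positive predictive value = {ppv:.2f}")  # 0.13
print(f"negative predictive value = {npv:.2f}")  # 0.99
```

Under these assumptions, roughly one in eight people who screen positive would turn out to have major depression on full assessment, while a negative screen is very reassuring. This is the trade-off described earlier: few cases are missed, at the cost of a fair number of false positives who need a fuller assessment.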


Conclusions

The authors conclude that the two-question screen is comparable in accuracy to other instruments and therefore suggest that, given its brevity and ease of use, it should be favoured over other instruments when screening for depression in older people.

Strengths and limitations

Overall, this is an informative and well-conducted study which aids our understanding of screening instruments for depression in the elderly. The methodology was appropriate and the authors’ conclusions are accurate. However, the reporting of this study could have been more thorough and some caution is needed when interpreting the results.

Reporting of methods and results should be completely transparent. Some journals require authors to complete a PRISMA checklist (Moher D. et al., 2009), reporting adherence to gold-standard conduct, and encourage prospective registration of study protocols in a database such as PROSPERO. This paper does not include either of these, so some information about the study is lacking.

A search strategy should be sufficiently detailed to allow a reader to reproduce the search. However, we are not provided with the full search terms and, rather than a single search looking for studies evaluating screening tools of depression in older people, the authors used a two stage approach of generating a list of screening tools and then seeking studies for each instrument, using keywords of ‘depression’ and ‘elderly’. They also did not contact experts in the field, ask for unpublished data or include non-English language articles. These deviations from best practice may have resulted in eligible screening tools not being found.

Quality ratings for studies (online appendix DS1) are high; each tool scores a median of 7 or 8 out of 8, which raises questions about how critical these rating criteria were. The quality rating was not used in the analysis, and it would have been interesting to see a sensitivity analysis of only the highest quality studies.

We are not provided with the full information extracted from studies; instead summary information is presented for each screening instrument. Considering the high level of heterogeneity observed in the two-question screen results (heterogeneity statistics for other instruments are not presented), the interested reader could have looked for potential sources of heterogeneity in a detailed table, e.g. in an online appendix.

A potential source of heterogeneity is participant eligibility criteria differing between studies. While there is benefit in including a wide range of studies, as the authors here have done, this might have been better restricted (or examined with sensitivity analyses). Depression in dementia has different causative factors and can present with different symptoms to depression in non-demented older people (Korczyn A. et al., 2009), so it is unusual to analyse general depression screens together with a specialised screen such as the Cornell Scale for Depression in Dementia. Similarly, the six included studies for the two-question screen included such diverse participants as people with Parkinson’s disease, those receiving palliative care, people with coronary heart disease and patients on an acute medical ward. The nature of depressive symptoms in these diverse settings may well have differed and affected diagnostic performance.

Finally, for the main outcome regarding the two-question screen, the included studies were set only in Ireland, the US and the UK, so we can only really apply the main conclusion about the performance of this test to these and culturally similar settings. The meaning of these questions to non-English-speakers may well be different and further testing would be needed to evaluate their use in other populations.


Summary

This study summarises the published evidence and finds that the two-question screen is of equivalent quality to other brief instruments for detecting depression in older people, so can be easily used by clinicians to identify those who need more thorough assessment.

Though screening is not recommended, this instrument’s performance is comparable to that of tests used in national screening programmes.

Considering depression’s prevalence and the high number of at risk older people who are in contact with healthcare services, including these two simple questions in your assessment might help to make a difference.


Andrew studied medicine at University College London before undertaking core and old-age psychiatry training in North London. He completed an MSc in psychiatric research before beginning a Wellcome Trust funded PhD fellowship at UCL. He has a particular interest in social functioning in dementia, and is currently using large longitudinal research studies to look at changes in social engagement in relation to cognitive decline.