The modified Rankin Scale (mRS) is the most prevalent functional outcome measure in contemporary stroke research.1 A weakness of mRS grading is the potential for interobserver variability. Variability implies end point misclassification and can weaken statistical power.1 Various attempts to quantify mRS reliability have been reported.

The mRS is used in clinical trials worldwide2 and is often administered by research nurses and professions allied to medicine.2 A contemporary, systematic review of the international literature, including allied healthcare journals, was performed.

Methodology

Two clinical researchers independently reviewed the literature. Throughout the process, we adhered to the Meta-analysis of Observational Studies in Epidemiology (MOOSE) guidelines.3

Participants

Study populations included human stroke survivors only. We applied no restrictions on who performed the mRS assessment.

Study Methodology

Bias and Trial Quality

As a minimum data set to allow assessment of trial quality, we collected details on: patient selection; blinding of mRS interviewers (to others' scores); mRS methodology; number of patients/observers; and details of withdrawals and dropouts.

Outcomes

No restriction on the basis of mRS assessment methodology was applied. Studies using mRS derivatives (Rankin Scale and Oxford Handicap Scale) were reviewed for references but not included.

Search Strategy

A comprehensive battery of cross-discipline electronic databases was interrogated: AMED; British Nursing Index; CINAHL; Embase; Health and Psychosocial Instruments; Internurse.com; Medline; and PsycINFO (all inception to December 2008). Key words were formulated using MeSH headings and designed to be as inclusive as possible (Figure).

To identify studies not yet in print, proceedings of scientific meetings were hand-searched: International Stroke Conference; European Stroke Conference; and World Stroke Congress (January 2006 to November 2008). Bibliographies of retrieved articles were searched for further references and the process repeated until no new articles were found.

We retrieved full text of articles that either reviewer suspected may be relevant. Data were extracted according to prespecified criteria. Decisions on inclusion were by consensus. Where potentially relevant data were not published, electronic contact with original authors was attempted. For studies not published in English, professional translation services were used.

Statistics

To allow for comparison and where data permitted, we described results using κ, quadratic weighted κ, and percentage agreement. Based on previous work suggesting a beneficial effect of a structured interview approach,4 we performed a separate analysis comparing “structured” and “traditional” mRS. A one-group descriptive study using average absolute difference with a fixed effects model was performed using MIX software Version 1.7 (www.mix-for-meta-analysis.info).
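As an illustration of the three agreement measures named above, the following is a minimal Python sketch for a single pair of raters grading the same patients on mRS 0 to 5. The function name `agreement_stats`, its parameters, and the example grades are hypothetical and do not come from the included studies; the sketch uses the standard definitions of Cohen's κ and quadratic weighted κ and is not the computation performed in MIX.

```python
def agreement_stats(rater_a, rater_b, n_grades=6):
    """Return (percentage agreement, kappa, quadratic weighted kappa).

    rater_a, rater_b: paired integer grades (0 .. n_grades-1) given by two
    raters to the same patients. Illustrative helper, not from the source.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty score lists of equal length")
    if not all(0 <= g < n_grades for g in list(rater_a) + list(rater_b)):
        raise ValueError("grades must lie between 0 and n_grades - 1")

    n = len(rater_a)
    # Observed joint proportions: obs[i][j] = share of patients graded
    # i by rater A and j by rater B.
    obs = [[0.0] * n_grades for _ in range(n_grades)]
    for a, b in zip(rater_a, rater_b):
        obs[a][b] += 1.0 / n
    row = [sum(r) for r in obs]                     # rater A marginals
    col = [sum(obs[i][j] for i in range(n_grades))  # rater B marginals
           for j in range(n_grades)]

    def kappa(weight):
        # weight(i, j) is a disagreement weight (0 on the diagonal).
        cells = [(i, j) for i in range(n_grades) for j in range(n_grades)]
        observed = sum(weight(i, j) * obs[i][j] for i, j in cells)
        expected = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
        if expected == 0:
            return float("nan")  # undefined when both raters use one grade
        return 1.0 - observed / expected

    unweighted = kappa(lambda i, j: 0.0 if i == j else 1.0)
    quadratic = kappa(lambda i, j: ((i - j) / (n_grades - 1)) ** 2)
    percent = 100.0 * sum(a == b for a, b in zip(rater_a, rater_b)) / n
    return percent, unweighted, quadratic


# Hypothetical grades for eight patients, two raters.
scores_a = [0, 1, 2, 3, 4, 5, 2, 3]
scores_b = [0, 1, 2, 3, 4, 5, 3, 2]
percent, k, kw = agreement_stats(scores_a, scores_b)
print(f"agreement={percent:.1f}%  kappa={k:.3f}  weighted kappa={kw:.3f}")
```

Because quadratic weights penalize a one-grade disagreement far less than a disagreement of several grades, weighted κ is typically higher than unweighted κ when most disagreements fall between adjacent mRS grades.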

Results

The review profile is detailed in the Figure. Ten studies involving 587 patients were included in the final analysis (reference list available as supplemental data at http://stroke.ahajournals.org). Median number of included patients was 47; median number of researchers performing mRS was 2.

Interobserver reliability of mRS ranged from “near perfect” (weighted κ=0.95) to “poor” (κ=0.25). Overall, reliability was “moderate” for the 2 approaches to mRS (Tables 1 and 2). Three studies4–6 (162 patients) measured intraobserver variability of mRS. Overall reliability was very good, with combined weighted κ=0.94 and percentage agreement 84%.

In the included studies, diverse methodologies were used to administer and study mRS (Supplemental Table, available online at http://stroke.ahajournals.org). No study met our “minimum” criteria to allow assessment of quality. Deficiencies included: no description of patient selection (n=5); no data on blinding (n=5); inadequate description of mRS methodology (n=2); and no description of location/timing of mRS (n=2). As a result, we included all relevant studies regardless of methodological quality.

Discussion

We have demonstrated that overall reliability of mRS is moderate, but there remains potential for improvement. The effect of structured interview on reliability remains unproven; apparent benefits seen with combined κ analysis were lost when “weighted” κs were applied. We should be cautious in interpreting these data; only 4 studies purported to examine the structured approach, and almost two-thirds of the data came from studies performed by the authors of the original structured interview. The nonparametric nature of κ does not allow for comparative meta-analysis, and so the safest conclusion is that structuring mRS may partly improve mRS reliability, but effects have not been consistent.

It is interesting that studies with larger numbers of patients and observers reported poorer reliability. The importance of interobserver reliability in clinical trials becomes apparent when the number of potential end point assessors is considered. In the recent Stroke-Acute Ischemic NXY Treatment (SAINT) trials, >1000 assessors from 25 countries were trained in outcome assessment.7 We have shown that numbers of assessors included in mRS reliability studies are considerably smaller. The ideal methodology to assess mRS reliability would involve observers of differing backgrounds and from differing international centers. Only one study approaches this “ideal” and it reports a concerningly low reliability for standard mRS.4 We should note that all included studies measured reliability across mRS 0 to 5. For clinical trials, a grade of mRS 6 (death) is often added; addition of this objective end point may improve overall reliability.

Intraobserver studies suggested excellent reliability. However, 2 studies measured mRS at distinct periods in the patient’s recovery and thus are prone to recall bias and potential for functional improvement between assessments.4,5 Although theoretically interesting, intraobserver variability may be of less relevance to clinical trials, in which primary outcome assessment is performed once only.

There was heterogeneity between studies in several important aspects of methodology (Supplemental Table). As an example, only one study6 made use of the recognized mRS training resource.7 This study reported no beneficial effect of structured interview, perhaps suggesting that a structured approach is unnecessary if assessors are adequately trained. Collation of studies with differences in methodology potentially weakens our meta-analysis but is perhaps necessary; a recent review of mRS reported substantial heterogeneity in its application.1 Quality of studies varied and, for all those included, certain data were incomplete. Again, this weakens our analysis because we were unable to exclude potentially biased studies.

Heterogeneity was further evident in the statistical methods used in the studies: κ, weighted κ, intraclass correlation coefficient, and percentage agreement were all used. All are appropriate and there is no accepted optimal test. However, the resultant data were not readily interchangeable and offered limited potential for meta-analysis without access to original individual patient data. Our own use of statistics demands discussion. No universally accepted analysis method for multiple κ statistics has been described. Recognizing this limitation, we used a group analysis technique that made the fewest assumptions about the underlying data.
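To make concrete what combining study-level estimates under a fixed effects model involves, the following is a generic inverse-variance sketch in Python. It is an assumption-laden illustration and not a reconstruction of the MIX analysis used here: the function name `fixed_effect_pool` is hypothetical, each study is assumed to report an estimate with a standard error, and the numbers in the usage lines are invented for demonstration only.

```python
import math


def fixed_effect_pool(estimates, standard_errors):
    """Inverse-variance fixed-effect pooling of study-level estimates.

    Returns (pooled estimate, pooled standard error, 95% confidence interval).
    Illustrative helper; assumes every study supplies a positive standard error.
    """
    if len(estimates) != len(standard_errors) or len(estimates) == 0:
        raise ValueError("need non-empty lists of equal length")
    if any(se <= 0 for se in standard_errors):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the reciprocal of its variance, so more
    # precise studies contribute more to the pooled value.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Invented example values, not data from the included studies.
pooled, pooled_se, ci = fixed_effect_pool([0.2, 0.8], [0.1, 0.2])
print(f"pooled={pooled:.2f}  se={pooled_se:.3f}  95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

A fixed effects model of this kind assumes that all studies estimate one common underlying value, which is a strong assumption given the methodological heterogeneity described above.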

Accepting these limitations, our study does have certain strengths. Literature searching was comprehensive and systematic, considering reports from non-English and “nonmedical” sources and excluding studies that would not help describe mRS reliability in a clinical trial setting.

There remains uncertainty regarding mRS reliability. Available studies are likely underpowered and have design flaws that limit generalization. Studies closest in design to large-scale trials demonstrate potentially significant variability. Although we await definitive data on mRS reliability, we must acknowledge that a degree of interobserver variability is inherent in mRS.

Acknowledgments

We are grateful to the National Health Service library services of North Glasgow for document retrieval and Mr Aaron Ward for translation services. We acknowledge all those researchers who responded to our electronic enquiries.

Disclosures

K.R.L., M.W., J.D., and T.J.Q. have been awarded academic grant support from the Chief Scientist Office to continue work on developing stroke outcome assessments using mRS and all have published data in support of mRS as the optimal end point for acute stroke trials.