Results ‘Mildly ill’ according to the CGI approximately
corresponded to a BPRS total score of 31, ‘moderately ill’ to a
BPRS score of 41 and ‘markedly ill’ to a BPRS score of
53. ‘Minimally improved’ according to the CGI score was associated
with percentage BPRS reductions of 24, 27 and 30% at weeks 1, 2 and 4,
respectively. The corresponding numbers for a CGI rating of ‘much
improved’ were 44, 53 and 58%.

Conclusions The results provide a clearer understanding of how to
interpret BPRS total and percentage reduction scores in clinical trials with
patients acutely ill with schizophrenia who are experiencing positive
symptoms.

The Brief Psychiatric Rating Scale (BPRS;
Overall & Gorham, 1962) is
one of the most frequently used instruments for evaluating psychopathology in
patients with schizophrenia. Although its psychometric properties in terms of
reliability, validity and sensitivity have been extensively examined (for a
comprehensive review, see Hedlund &
Vieweg, 1980), the clinical implications of BPRS scores are not
always clear. For example, to our knowledge no study has analysed how ill
a patient with a BPRS total score of, say, 30, 50 or 90 actually is from a
clinical judgement point of view. Furthermore, in clinical studies a reduction
of at least 20% (e.g. Kane et al,
1988; Marder & Meibach,
1994), 30% (e.g. Arvanitis
et al, 1997; Small
et al, 1997), 40% (e.g.
Beasley et al, 1996)
or 50% (e.g. Peuskens & Link,
1997) of the initial BPRS score has been used as a cut-off to
define response, but what these cut-off levels mean clinically is again
unclear. The Clinical Global Impression scale (CGI;
Guy, 1976), another frequently
used instrument, is to some extent more informative in this regard: because it
describes a patient’s overall clinical state as a ‘global
impression’ by the rater, it provides results that (in contrast to BPRS
scores) can be understood intuitively by clinicians
(Nierenberg & DeCecco,
2002). The purpose of our study therefore was to find – with
statistical means – corresponding points for BPRS and CGI ratings within
a large sample of patients with schizophrenia participating in antipsychotic
drug trials. Knowing which BPRS score corresponds to a CGI – Severity
rating of, for example, ‘moderately ill’ or ‘severely
ill’, or which percentage BPRS reduction from baseline corresponds to a
CGI – Improvement rating of ‘minimally better’ or ‘much
better’, could increase our understanding of the clinical
implications of BPRS scores.

METHOD

Database

Original patient data from seven trials (baseline n=1979; 1361
men, 618 women; age 35.8 years, s.d.=10.6; weight 72.6 kg, s.d.=15.8; height
172 cm, s.d.=9) comparing amisulpride or olanzapine with other antipsychotics
or placebo, which used both the original BPRS
(Overall & Gorham, 1962)
and the CGI (Guy, 1976), were
pooled for this analysis (Table
1). All studies were randomised, and all but one
(Colonna et al, 2000)
were double-blind. Each trial included patients with schizophrenia or
schizophreniform disorder according to DSM–III–R or DSM–IV
(American Psychiatric Association,
1987,
1994). With one exception
(Carrière et al,
2000), all studies required various minimum scores as eligibility
criteria to assure that the patients had florid positive symptoms. Please note
that the criteria in Table 1
were eligibility criteria before the wash-out phases. Some patients had
already improved during the wash-out phases and had scores below the
eligibility criteria at baseline. The patients in the study without
scale-derived minimum scores
(Carrière et al,
2000) were all inpatients and had a mean BPRS score of 65 at
baseline, so that patients with severe symptoms were also involved in this
study. The mean BPRS total score at baseline in all studies was 58.9
(s.d.=12.2) and the mean CGI – Severity scale score was 5.2 (s.d.=0.8).
All studies used the 18-item version of the BPRS with its original anchors;
the items were not derived from the Positive and Negative Syndrome Scale
(PANSS; Kay & Fiszbein,
1987). The single items were rated on a seven-point scale (1, not
present; 2, very mild; 3, mild; 4, moderate; 5, moderately severe; 6, severe;
7, extremely severe). Thus, the range of possible BPRS total scores is from 18
to 126. The CGI – Severity (CGI–S) and the CGI – Global
Improvement (CGI–I) scales (Guy,
1976) were also available for all studies. The CGI–S
assesses the clinician’s impression of the patient’s current
illness state. The rater is asked to ‘consider his total clinical
experience with the given population’. As with the BPRS, the time span
considered is the week before the rating, and the following scores can be
given: 1, normal, not at all ill; 2, borderline mentally ill; 3, mildly ill;
4, moderately ill; 5, markedly ill; 6, severely ill; 7, among the most
extremely ill patients. The CGI–I assesses the patient’s
improvement or worsening since the start of the study using the following
scores: 1, very much improved; 2, much improved; 3, minimally improved; 4, no
change; 5, minimally worse; 6, much worse; 7, very much worse. A third item of
the CGI, which tries to relate therapeutic effects and side-effects –
the efficacy index – was not used for the analysis.
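
The scoring rules described above can be made concrete with a minimal sketch. It assumes that item ratings arrive as a list of 18 integers; the names `bprs_total` and `CGI_S_LABELS` are illustrative and do not come from any published scoring software.

```python
from typing import Sequence

N_ITEMS = 18                 # 18-item BPRS, as stated in the text
ITEM_MIN, ITEM_MAX = 1, 7    # seven-point item scale (1 = not present, 7 = extremely severe)

# CGI-S anchors exactly as listed in the text.
CGI_S_LABELS = {
    1: "normal, not at all ill",
    2: "borderline mentally ill",
    3: "mildly ill",
    4: "moderately ill",
    5: "markedly ill",
    6: "severely ill",
    7: "among the most extremely ill patients",
}


def bprs_total(items: Sequence[int]) -> int:
    """Return the BPRS total score after validating the 18 item ratings."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item ratings, got {len(items)}")
    for rating in items:
        if not ITEM_MIN <= rating <= ITEM_MAX:
            raise ValueError(f"item rating {rating} is outside {ITEM_MIN}..{ITEM_MAX}")
    return sum(items)
```

Because every item contributes at least 1 point, a patient with no symptoms at all still scores 18, which is why the possible range runs from 18 to 126 rather than starting at 0.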

Statistical analysis

An often-used, but nevertheless inadequate, method to compare scores would
have been to regress BPRS scores on CGI scores or vice versa. Both measures
showed only moderately high correlations (see Results) and, therefore, regression
equations would give different results depending on the direction of the
regression equation. Linear regression treats one scale as the independent
variable measured without error and the other as the dependent variable
measured with error. This is conceptually wrong, because both variables are
measured with random error. Within the psychometric literature the search for
corresponding points on different, but correlated, measurement devices is
referred to as ‘linking’
(Linn, 1993) or, in its most
strict sense, as ‘equating’
(Kolen & Brennan, 1995).
For this study we used equipercentile linking, a technique that identifies
those scores on both measures that have the same percentile rank. We used the
SAS program EQUIPERCENTILE (Price et
al, 2001), a realisation of the algorithms described by Kolen & Brennan (1995). In the
first step, percentile rank functions are calculated for both variables. Using
the percentile rank function of one variable and the inverse percentile rank
function of the other, one then finds for every score of one variable a score
on the other variable that has the same percentile rank. The exact formulae
are described in Chapter 2 of Kolen & Brennan
(1995). Given the size of our
database, no smoothing was applied, either to the cumulative
distribution functions or to the resulting linking functions. Only evaluations
at baseline and at weeks 1, 2 and 4 were analysed, because although the
duration of the studies ranged from 4 weeks to 51 weeks not all studies
provided data for other time points, so that trial effects could have biased
the results. For each linking task we included all patients with valid values
on both measures, because analysing the data only of those who completed the
studies would have implied a selection. However, approximately 20% of the
patients withdrew between baseline and week 4. In a sensitivity analysis we
therefore included only patients who were still in the studies at week 4, so
that a rating was available at each time point. The only notable difference
concerned the link between the CGI–I ratings ‘much worse’ and ‘very much
worse’ and the percentage BPRS worsening, which varied by up to 4–6
percentage points; otherwise the results were so similar that only those
of the primary analysis are shown.
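
The two steps of equipercentile linking described above (a percentile rank function for one measure, an inverse percentile rank function for the other) can be sketched as follows. This is a simplified illustration, not the SAS EQUIPERCENTILE program and not the exact formulae of Kolen & Brennan (1995): it assumes mid-percentile ranks and linear interpolation between the percentile ranks of observed scores, and all function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sample, x):
    """Mid-percentile rank of x: percentage below x plus half the percentage tied at x."""
    ordered = sorted(sample)
    below = bisect_left(ordered, x)
    tied = bisect_right(ordered, x) - below
    return 100.0 * (below + 0.5 * tied) / len(ordered)


def inverse_percentile_rank(sample, p):
    """Score whose percentile rank is p, interpolating between observed scores.

    Assumption: linear interpolation between the ranks of distinct observed
    values; ranks outside the observed range are clamped to the extremes.
    """
    values = sorted(set(sample))
    ranks = [percentile_rank(sample, v) for v in values]  # strictly increasing
    if p <= ranks[0]:
        return float(values[0])
    if p >= ranks[-1]:
        return float(values[-1])
    hi = bisect_right(ranks, p)          # first index with ranks[hi] > p
    lo = hi - 1
    frac = (p - ranks[lo]) / (ranks[hi] - ranks[lo])
    return values[lo] + frac * (values[hi] - values[lo])


def link(x, x_sample, y_sample):
    """Map score x on one measure to the score on the other with the same percentile rank."""
    return inverse_percentile_rank(y_sample, percentile_rank(x_sample, x))


# Toy data for illustration only (not trial data).
toy_cgi = [3, 4, 4, 5, 5, 5, 6, 6, 7]
toy_bprs = [30, 41, 44, 52, 55, 58, 66, 71, 88]
linked_for_cgi_5 = link(5, toy_cgi, toy_bprs)
```

Unlike a regression, this mapping treats both measures symmetrically: the linked score depends only on the two marginal distributions, so it does not change with the choice of a "dependent" variable.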

RESULTS

Linking of CGI–S score and BPRS total score

Figure 1 shows the results of
the linking between CGI–S rating and the BPRS total score at baseline
and at weeks 1, 2 and 4. These results suggest that being considered ‘mildly
ill’ on the CGI (CGI–S score 3) approximately corresponded to a
BPRS total score of 32 at baseline and at week 1 and a total score of 30 at
weeks 2 and 4. Being considered ‘moderately ill’ (CGI–S
score 4) corresponded to BPRS total scores of 44 at baseline and 40 at weeks
1, 2 and 4. ‘Markedly ill’ (CGI–S score 5) corresponded to
BPRS scores of 55 at baseline, 53 at weeks 1 and 2, and 52 at week 4.
‘Severely ill’ (CGI–S score 6) corresponded to BPRS scores
of 70 at baseline and 68, 67 and 65 at weeks 1, 2 and 4, respectively.
‘Extremely ill’ (CGI–S score 7) corresponded to BPRS scores of 85 at
baseline and 89, 84 and 88 at weeks 1, 2 and 4, respectively. Thus, the
results were relatively consistent over the four time points examined,
although there was a slight tendency for CGI ratings at a given BPRS score
to be somewhat less severe at baseline and to become more severe during the
course of the treatment. This effect, however, was neither large nor always
consistent.
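
For readers who want to apply these anchors, the values reported above can be collected in a small lookup table. The sketch below is illustrative only: the nearest-anchor rule in `nearest_cgi_s` is an assumption made for this example, not a procedure used in the analysis, and CGI–S scores 1 and 2 are omitted because no linked values are given for them in the text.

```python
# Linked BPRS total scores per CGI-S score, as reported in the text.
LINKED_BPRS = {
    "baseline": {3: 32, 4: 44, 5: 55, 6: 70, 7: 85},
    "week 1":   {3: 32, 4: 40, 5: 53, 6: 68, 7: 89},
    "week 2":   {3: 30, 4: 40, 5: 53, 6: 67, 7: 84},
    "week 4":   {3: 30, 4: 40, 5: 52, 6: 65, 7: 88},
}


def nearest_cgi_s(bprs_total: int, time_point: str = "baseline") -> int:
    """Return the CGI-S score whose linked BPRS anchor is closest to bprs_total.

    Assumption: a simple nearest-anchor rule; on ties the lower CGI-S score wins
    because the anchors are scanned in ascending order.
    """
    anchors = LINKED_BPRS[time_point]
    return min(anchors, key=lambda score: abs(anchors[score] - bprs_total))
```

For instance, a baseline BPRS total of 41 lies closer to the anchor for ‘moderately ill’ (44) than to the anchor for ‘mildly ill’ (32), so the function returns 4.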

Linking of CGI–I score and percentage BPRS change from
baseline

Figure 2 shows the linking
function between the CGI–I scale and the percentage BPRS change from
baseline at weeks 1, 2 and 4. Ratings of ‘minimally improved’
(CGI–I score 3) at weeks 1, 2 and 4 corresponded to percentage BPRS
reductions of 23, 27 and 30%, respectively. Ratings of ‘much
improved’ (CGI–I score 2) corresponded to percentage BPRS
reductions of 44, 53 and 58% at weeks 1, 2 and 4, respectively. Ratings of
‘very much improved’ (CGI–I score 1) corresponded to
percentage BPRS reductions of 71, 79 and 85% at weeks 1, 2 and 4,
respectively. Thus there was a consistent time effect indicating that a
smaller percentage change in BPRS total score was necessary for a patient to
be considered improved 1 week after the initiation of treatment than at later
time points. This effect is also seen for the ‘no change’ rating
according to the CGI–I (score 4), which was linked with a 5% BPRS score
reduction at weeks 1 and 2 and an 8% reduction at week 4.
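
The percentage BPRS change from baseline used in this linking can be written out as a short sketch. The text refers to a reduction ‘of the initial BPRS score’ and does not say whether the 18-point minimum is subtracted first, so the `floor` parameter below defaults to 0 as an assumption; the function names are hypothetical.

```python
def percent_bprs_reduction(baseline: float, follow_up: float, floor: float = 0.0) -> float:
    """Percentage reduction of the BPRS total score from baseline.

    floor=0 uses the raw total score (assumed here); floor=18 would first
    subtract the minimum possible score of the 18-item scale.
    """
    if baseline <= floor:
        raise ValueError("baseline must exceed the floor")
    return 100.0 * (baseline - follow_up) / (baseline - floor)


def meets_cutoff(baseline: float, follow_up: float, cutoff: float = 50.0) -> bool:
    """True if the percentage reduction reaches the chosen response cut-off."""
    return percent_bprs_reduction(baseline, follow_up) >= cutoff
```

The choice of `floor` matters in practice: a fall from 60 to 39 is a 35% reduction of the raw score but a 50% reduction once the 18-point minimum is subtracted, so the convention should be stated whenever a cut-off is reported.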

DISCUSSION

Although the BPRS is a frequently used and psychometrically sound
assessment device that explicitly captures certain aspects of psychotic
behaviour, the clinical meaning of a given scale value has not been anchored
to a global clinical judgement. In our study the psychometric procedure of
equipercentile linking was used to link the BPRS to a clinically meaningful
global rating. Applying this procedure in a large sample of acutely ill
patients across various multicentre studies resulted in a calibration or
anchoring of the rating instrument to the clinical judgement. The
functions linking BPRS scores to the CGI can provide a better understanding of
the BPRS and can help clinicians to interpret the results of clinical trials.
For example, the data indicate that trials in which the average BPRS total
score at baseline was 40 are unlikely to have examined a severely ill
population. Furthermore, frequently used cut-off points to define response in
treatment trials – a 20 or 50% reduction of the BPRS baseline scores –
seem to mean that on average the patients were ‘minimally
improved’ and ‘much improved’ respectively, according to the
raters’ clinical impression. In fact, the data suggest that somewhat
higher cut-off points than 20% (rather 25–30%) and 50% (rather 55%)
might be better indicators of ‘minimal improvement’ and
‘much improvement’.

These results are relevant not only for the readers of publications on
antipsychotic drugs, but also for the definition of response criteria of
future trials: considering that a 25% BPRS score reduction means that the
patient is just minimally better compared with baseline, this criterion might
be a useful cut-off for studying patients with treatment-refractory disease,
but not for the ‘average’ patient. In treatment-refractory cases
even a small improvement in symptoms might be clinically important. However,
in acutely ill patients with non-refractory conditions, a 50% criterion (i.e.
clinically much improved) would seem to be a more appropriate reflection of
clinically meaningful improvement, because such patients usually respond well
to antipsychotic drugs (Cole,
1964). Considering only a 25% reduction (i.e. only minimally
improved) of the overall symptoms as a ‘response’ would probably
not meet clinicians’ expectations of drug treatment and would be of
questionable clinical importance. In contrast to our findings, recent
antipsychotic drug trials in patients with acute exacerbations often used a 20
or 30% criterion to distinguish between responders and nonresponders
(Marder & Meibach, 1994;
Arvanitis et al, 1997;
Small et al, 1997).
Ironically, the 20% cut-off level was indeed initially used in a study of
patients with refractory disease (Kane
et al, 1988), but was subsequently widely applied in
studies of non-refractory cases.

The main strength of our analysis is the large number of patients, which
should make the results rather robust. However, a number of limitations of our
analysis must be considered. Despite the widespread use of the CGI in drug
trials, there have been only a few studies of its psychometric
characteristics, so the CGI is certainly not an ideal measure for
‘evaluating’ the BPRS. In 116 patients with panic disorder and
depression, Leon et al
(1993) found good concurrent
validity and sensitivity for change using the CGI. In two trials, Khan et
al (2002,
2004) showed that the
sensitivity of the CGI–S and CGI–I was similar to that of the
Montgomery–Åsberg Depression Rating Scale
(Montgomery & Åsberg,
1979) and the Hamilton Rating Scale for Depression
(Hamilton, 1960). However,
Beneke & Rasmus (1992)
criticised the CGI on semantic (e.g. asymmetric scaling), logical (e.g.
non-meaningful combinations of CGI–S and CGI–I ratings) and
statistical grounds (e.g. relatively low test–retest reliability in a
heterogeneous sample of patients with ‘schizophrenic, depressive and
anxiety disorders’).

Although the algorithms for linking and equating are the same, the terms
have different meanings. For example, equating two forms of a college
admission test is done to assure that both forms can be used interchangeably
and provide the same decision. In our application the meaning is far less
rigorous as the instruments differ, showing correlation coefficients for the
CGI–S v. BPRS total score comparison of 0.60–0.76 in
weeks 1 to 4 and of only 0.40–0.41 at the baseline measurement. Linking
is thus best understood here as a kind of anchoring that helps in
understanding the clinical meaning of a given scale score. The correlation at
baseline was especially low. This may in part be explained by the minimum
symptom levels that most studies required at baseline, which restricted the
variability of the scores.
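
The effect of a minimum-score eligibility criterion on a correlation can be illustrated with a small simulation. The sketch below uses purely synthetic, normally distributed scores, not trial data; the true correlation of 0.7 and the cut-off are arbitrary assumptions chosen only to show the direction of the effect.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_range_restriction(n=5000, rho=0.7, cutoff=0.5, seed=0):
    """Return (full-sample r, r among cases with x >= cutoff) for synthetic data."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # y shares a correlation of rho with x by construction.
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    full = pearson([p[0] for p in pairs], [p[1] for p in pairs])
    kept = [p for p in pairs if p[0] >= cutoff]   # "eligibility" on the first score
    restricted = pearson([p[0] for p in kept], [p[1] for p in kept])
    return full, restricted


full_r, restricted_r = simulate_range_restriction()
```

Selecting only cases above the cut-off narrows the spread of the first score, and the correlation in the selected group is lower than in the full sample even though the underlying relationship is unchanged.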

From a purely statistical point of view, correlating an implicit difference
rating (CGI–I rating) with an explicit, calculated ‘percentage
improvement’ score is problematic. It was nevertheless reassuring that
these two measures showed higher correlations than the severity scores
themselves, thus demonstrating that clinicians are able to give meaningful
differential global ratings reflecting something like a ‘relative amount
of change’. There was a time effect in the percentage BPRS reduction,
suggesting that a somewhat smaller ‘objective’ percentage change
as measured by the BPRS was necessary for patients to be considered improved
according to the CGI–I at 1 week after the initiation of treatment than
at later weeks. This result probably reflects physicians’ expectations,
which may be lower after short durations of treatment than at later stages.
Whereas the investigators received training in BPRS rating before the trials,
this was usually not the case for the CGI. Interrater reliabilities for the
BPRS between 0.87 and 0.97 have been reported
(Collegium Internationale Psychiatriae
Scalarum, 1996). A small study reported interrater reliabilities
for the CGI–S and the CGI–I of 0.66 and 0.51, respectively (37
physicians rating 12 patients with dementia;
Dahlke et al, 1992).
Recently a somewhat better-anchored CGI scale for patients with schizophrenia
has been developed (the Clinical Global Impression – Schizophrenia
scale) and its validity and reliability have been verified: the interrater
reliability was 0.75 (Haro et al,
2003). A replication with this new scale would be useful. Such
data could also show whether a more objective measure of clinical psychopathology
might be obtained from raters who are masked to the week of participation the
patient is in.

It is important to emphasise the nature of the patients involved, as the
results might not be the same when different patient populations are analysed.
We assembled a data-set composed of people suffering from acute exacerbations
of schizophrenia with positive symptoms. For example, in patients suffering
only from negative symptoms, the relationship between the BPRS and the CGI –
Severity scale might be very different. Such patients could be
considered severely ill according to the CGI, but would have relatively low
BPRS total scores owing to a lack of positive symptoms. Similarly, a 50% BPRS
reduction might have a different clinical meaning in patients with low
baseline BPRS scores. We therefore hasten to emphasise that our results relate
only to acutely ill patients with schizophrenia with positive symptoms similar
to those included in our database.

Despite these limitations, we consider that the results are an important
contribution to a better understanding of the clinical meaning of the BPRS
total score and percentage BPRS change in score in acutely ill patients with
schizophrenia. Future studies should examine other patient populations (e.g.
patients with residual schizophrenia and predominant primary negative
symptoms) and should use anchored versions of the CGI and specifically trained
raters. In addition, efforts are under way to develop criteria for
‘remission’ that could be applied to schizophrenia and used in
evaluating treatment effects in a more objective and consistent fashion
(Andreasen et al,
2005).

Clinical Implications and Limitations

CLINICAL IMPLICATIONS

The functions linking Brief Psychiatric Rating Scale (BPRS) total
scores to the Clinical Global Impression (CGI) severity ratings provide
certain anchors that may help in understanding the results of clinical
trials.

Studies in acutely ill, treatment-responsive patients with schizophrenia
and positive symptoms should use a 50% BPRS score reduction cut-off to define
response rather than lower thresholds.

Linking CGI improvement ratings with percentage BPRS reduction showed a
time effect indicating that a smaller percentage BPRS change was necessary for
a patient to be considered improved 1 week after the initiation of treatment
than at later time points and suggesting that expectation bias might play a
part in assessing improvement.

LIMITATIONS

The results are only generalisable to patients with schizophrenia and at
least moderate positive symptoms.

The psychometric properties of the CGI have not been well evaluated, and
the analysis should be repeated using better-anchored versions of this
measure.

Acknowledgments

We are indebted to Sanofi-Aventis and Eli Lilly for allowing us to analyse
individual patient data from their database. The study was supported by a
grant from the Zucker Hillside Hospital Intervention Research Center for
Schizophrenia (MH-60575).