Abstract

Background Psychiatric disorders are among the top causes worldwide
of disease burden and disability. A major criterion for validating diagnoses
is stability over time.

Aims To evaluate the long-term stability of the most prevalent
psychiatric diagnoses in a variety of clinical settings.

Method A total of 34 368 patients received psychiatric care in the
catchment area of one Spanish hospital (1992–2004). This study is based
on 10 025 adult patients who were assessed on at least ten occasions (360 899
psychiatric consultations) in three settings: in-patient unit, 2000–2004
(n=546); psychiatric emergency room, 2000–2004
(n=1408); and out-patient psychiatric facilities, 1992–2004
(n=10 016). Prospective consistency, retrospective consistency and
the proportion of patients who received each diagnosis in at least 75% of the
evaluations were calculated for each diagnosis in each setting and across
settings.

Results The temporal consistency of mental disorders was poor,
ranging from 29% for specific personality disorders to 70% for schizophrenia,
with stability greatest for in-patient diagnoses and least for out-patient
diagnoses.

Conclusions The findings are an indictment of our current
psychiatric diagnostic practice.

Diagnosis is essential in clinical practice, research, training and public
health. Definitions for psychiatric diagnoses are derived from expert opinion
rather than the biological basis of the disorder. The modest knowledge base
regarding the causation of disease has hindered the use of aetiological
factors in psychiatric classification systems. The current classifications
(World Health Organization,
1992; American Psychiatric
Association, 2000) were designed to achieve high interrater
reliability of diagnostic assessment. It is widely believed that if future
editions of the DSM and the ICD are to be a significant improvement on their
predecessors, the validity of the diagnostic concepts they include will have
to be enhanced (Kendell & Jablensky,
2003). Follow-up studies including evidence of diagnostic
stability and diagnostic consistency over time have traditionally been
proposed to test the validity of psychiatric diagnoses
(Robins & Guze, 1970;
Kendler, 1980;
Andreasen, 1995). However,
several authors have noted that as longitudinal data become available,
significant fluctuations in diagnostic stability and changes in clinical
presentation are seen (Krishnan,
2005).

The aim of our study was to evaluate the long-term stability of the most
prevalent chronic psychiatric diagnoses according to ICD–10 in a range
of clinical settings.

METHODS

Participants

In total 34 368 patients received psychiatric care in the catchment area of
Fundacion Jimenez Diaz General Hospital, Madrid, between 1 January 1992 and 31
December 2004. This hospital is part of the Spanish national health services
and provides free medical coverage to a catchment area of 280 000 people.
There were 449 317 psychiatric consultations in a variety of clinical
settings, including visits to out-patient psychiatric facilities (438 622),
emergency visits (9101) and admissions to the psychiatric brief
hospitalisation unit (1594). The current study is based on 10 025 patients
aged 18 years and over who were assessed on at least ten occasions during the
period studied. These patients had 360 899 psychiatric consultations,
including visits to out-patient psychiatric facilities (355 166), psychiatric
emergency visits (4628) and admissions to the psychiatric brief
hospitalisation unit (1105).

Individual service users are reliably identified in the database used for
our analyses because each patient is given an identifying number (a numeric
code is used to ensure patient anonymity), which remains the same throughout
all contacts with psychiatric services within the study area. To ensure that
no patient had been assigned more than one identifier, we reviewed all the
cases in the database and removed any duplicates we found. We defined
duplicates as `patients with identical first name, family name, gender and
year of birth'; `patients with identical first name, family name, gender and
street address', or `patients with identical first name, family name, gender
and hospital/ambulatory record number'. We deleted any cases with significant
suspicion of duplication.
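
The three duplicate rules above can be sketched in code. This is only an illustration and not the procedure the authors actually ran: the record layout, the field names and the function name are hypothetical, and names are assumed to be compared after trimming and lower-casing.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class PatientRecord(NamedTuple):
    # Hypothetical schema; the paper does not describe the database fields.
    patient_id: int
    first_name: str
    family_name: str
    gender: str
    birth_year: Optional[int]
    street_address: Optional[str]
    record_number: Optional[str]


def find_suspected_duplicates(records):
    """Return sets of patient_ids that match on any of the three rules."""
    groups = defaultdict(set)
    for r in records:
        base = (r.first_name.strip().lower(), r.family_name.strip().lower(), r.gender)
        # Rule 1: same name, gender and year of birth.
        # Rule 2: same name, gender and street address.
        # Rule 3: same name, gender and hospital/ambulatory record number.
        for rule, extra in (("birth_year", r.birth_year),
                            ("street_address", r.street_address),
                            ("record_number", r.record_number)):
            if extra is not None:  # a missing field cannot produce a match
                groups[(rule,) + base + (extra,)].add(r.patient_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Invented example records (not study data).
records = [
    PatientRecord(1, "Ana", "Lopez", "F", 1960, "Calle A 1", "H1"),
    PatientRecord(2, "ana ", "Lopez", "F", 1960, "Calle B 2", "H2"),
    PatientRecord(3, "Ana", "Lopez", "F", 1975, None, None),
]
suspected = find_suspected_duplicates(records)
```

In this invented example, records 1 and 2 are flagged together under the first rule, and record 3 is not flagged. Cases flagged in this way would then be reviewed and removed, as described above.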

Diagnostic procedures

Procedure during ambulatory visits

Since 1986 public mental health centres within the province of Madrid have
had to record all ambulatory visits in a regional registry, the Registro
Acumulativo de Casos de la Comunidad de Madrid. All diagnoses in this registry
must be coded according to the ICD–9
(World Health Organization,
1978). Since 1992 diagnoses have been assigned according to
ICD–10 (World Health Organization,
1992) criteria and recorded with the appropriate ICD–9
coding numbers; ICD–10 codes were converted to ICD–9 codes using
the guidelines published by the World Health Organization
(Organizacion Mundial de la Salud,
1993). The psychiatrists at each mental health centre recorded one
or two diagnoses per patient during each ambulatory visit. Diagnoses were
assigned after reviewing all available information, including data from
medical records and clinical interviews with the patient and relatives.

Procedure during emergency visits

The emergency diagnoses were taken from the emergency medical records.
Emergency diagnoses were assigned by clinical psychiatrists after reviewing
all available information, including data from clinical interviews with the
patient and relatives.

Procedure during admissions to the in-patient unit

Clinical diagnoses during admissions are the result of an intensive
diagnostic and treatment process by physicians with specialty training in
psychiatry, drawing on data from medical records, other research assessments
and clinical interviews. The psychiatrists who assigned the clinical diagnoses
were not aware of the study in process.

Diagnostic groups included in analysis

Among all chronic psychiatric diagnoses, we selected those disorders
assigned to more than 500 patients in our sample (prevalence higher than 5%).
According to data from naturalistic studies like ours, the frequency and use
of the ICD–10 two-digit, three-digit and four-digit diagnostic
categories show significant variations. Some categories are not used at all,
and others represent less than 0.1% of the samples studied
(Mussigbrodt et al,
2000). In that study of 33 857 treated cases
from 19 departments of psychiatry in ten different countries, `on a
four-character level (Fxx.x), the ten most often used diagnostic categories
represented 40% of all main diagnoses, and 70% on a three-character level
(Fxx.-)' (Mussigbrodt et al,
2000). The diagnoses analysed here (with ICD–10 codes)
are:

disorders of adult personality and behaviour (F60–69), including the
individual diagnoses of specific personality disorders (F60) and other
specific personality disorders (F60.8).

Data extraction and analysis

Diagnostic stability across all the evaluations was calculated according to
Schwartz et al
(2000). Three measures of
stability are presented for each diagnosis. The first, `prospective
consistency', is the proportion of individuals in a category at the first
evaluation who retain the same diagnosis at their last evaluation. This would
correspond to positive predictive value if the last diagnosis were the gold
standard. The second measure, retrospective consistency, is the proportion of
individuals with a diagnosis assigned at the last evaluation who had received
the same diagnosis at the first evaluation; this is conceptually similar to
sensitivity. The third measure is the proportion of patients who received the
same diagnosis in at least 75% of the evaluations. The agreement between
diagnoses at the first and the last evaluations was calculated using the kappa
coefficient, which measures agreement corrected for chance.
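
The three stability measures and the kappa coefficient defined above can be illustrated with a minimal sketch. It is not the authors' SPSS procedure. It assumes one diagnosis per evaluation (the study allowed one or two), it assumes that the denominator of the third measure is the set of patients who ever received the diagnosis, and it treats kappa as agreement on the presence or absence of a single diagnosis at first and last evaluation. All function names and the example histories are hypothetical.

```python
def _rate(numerator, denominator):
    # Returns None when no patient qualifies, instead of dividing by zero.
    return numerator / denominator if denominator else None


def prospective_consistency(histories, dx):
    """Share of patients with dx at the first evaluation who keep it at the last."""
    first = [h for h in histories if h[0] == dx]
    return _rate(sum(h[-1] == dx for h in first), len(first))


def retrospective_consistency(histories, dx):
    """Share of patients with dx at the last evaluation who had it at the first."""
    last = [h for h in histories if h[-1] == dx]
    return _rate(sum(h[0] == dx for h in last), len(last))


def proportion_stable(histories, dx, threshold=0.75):
    """Share of patients ever given dx who received it in >= threshold of evaluations.
    The choice of denominator is an assumption, not stated in the paper."""
    ever = [h for h in histories if dx in h]
    return _rate(sum(h.count(dx) / len(h) >= threshold for h in ever), len(ever))


def kappa_first_vs_last(histories, dx):
    """Cohen's kappa for presence/absence of dx at first versus last evaluation."""
    n = len(histories)
    if n == 0:
        return None
    a = sum(h[0] == dx and h[-1] == dx for h in histories)  # present at both
    b = sum(h[0] == dx and h[-1] != dx for h in histories)  # first only
    c = sum(h[0] != dx and h[-1] == dx for h in histories)  # last only
    d = n - a - b - c                                       # absent at both
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return _rate(observed - expected, 1 - expected)


# Invented histories (one list of ICD-10 codes per patient, in time order).
histories = [
    ["F20", "F20", "F20", "F20"],
    ["F20", "F31", "F31", "F31"],
    ["F34.1", "F20", "F20", "F20"],
    ["F31", "F31", "F20", "F31"],
]
```

With these invented histories, prospective consistency, retrospective consistency and the 75% measure for F20 are each 0.5, and kappa is 0, because the observed agreement between first and last evaluation equals the agreement expected by chance.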

Using the Statistical Package for the Social Sciences, version 13.0 for
Windows, we performed four different analyses: three separate analyses, one for
each clinical setting (psychiatric emergencies, out-patient visits and
hospitalisations) to control for influences of the setting on the stability of
diagnoses; and a fourth analysis of the combined data from the three clinical
settings to reflect the evolution of diagnoses through the clinical
process.
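
The four-analysis design can be sketched as follows, using prospective consistency as the example measure. This is an illustration under stated assumptions, not the authors' code: the tuple layout (patient, ISO date, setting, diagnosis), the setting labels and the function names are hypothetical, and each evaluation is assumed to carry a single diagnosis.

```python
from collections import defaultdict

SETTINGS = ("out-patient", "emergency", "in-patient")


def histories_by_patient(evaluations, setting=None):
    """Chronological diagnosis list per patient, optionally limited to one setting."""
    per_patient = defaultdict(list)
    # ISO-formatted dates sort correctly as plain strings.
    for patient_id, date, eval_setting, dx in sorted(evaluations, key=lambda e: e[1]):
        if setting is None or eval_setting == setting:
            per_patient[patient_id].append(dx)
    return dict(per_patient)


def prospective_consistency(histories, dx):
    """Share of patients with dx at the first evaluation who keep it at the last."""
    first = [h for h in histories.values() if h[0] == dx]
    return sum(h[-1] == dx for h in first) / len(first) if first else None


def four_analyses(evaluations, dx):
    """One analysis per setting plus one on the combined data."""
    results = {s: prospective_consistency(histories_by_patient(evaluations, s), dx)
               for s in SETTINGS}
    results["across settings"] = prospective_consistency(
        histories_by_patient(evaluations), dx)
    return results


# Invented evaluations (not study data).
evaluations = [
    (1, "2001-01-10", "emergency", "F20"),
    (1, "2001-02-01", "in-patient", "F20"),
    (1, "2003-05-01", "out-patient", "F31"),
    (1, "2004-01-01", "out-patient", "F31"),
    (2, "2000-03-01", "out-patient", "F20"),
    (2, "2002-03-01", "out-patient", "F20"),
]
results = four_analyses(evaluations, "F20")
```

In this invented example the diagnosis is fully consistent within each setting but only 50% consistent across settings, because one patient's diagnosis changes between settings. This is the kind of change the combined analysis is meant to capture.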

RESULTS

The socio-demographic characteristics of the sample are presented in
Table 1.

Stability of diagnoses

Data about the prospective and retrospective consistency of the diagnoses
across settings, in the out-patient setting, in the emergency setting and in
the in-patient setting are presented in Tables 2, 3, 4 and 5 and graphically in a data
supplement to the online version of this paper. The percentages of patients
who received the same diagnosis in at least 75% of their evaluations, across
settings, in the out-patient setting, in the emergency setting and in the
in-patient setting are presented in Table
6.

Table 6. Percentage of patients who received a diagnosis in at least 75% of the
evaluations across settings, in the out-patient setting, in the in-patient
setting and in the emergency setting

Across clinical settings

Prospective consistency ranged from 28.7% for other specific personality
disorders to 69.6% for schizophrenia
(Table 2). The prospective
consistency of the three most prevalent diagnoses at first evaluation was
44.7% for dysthymia, 69.6% for schizophrenia and 49.4% for bipolar affective
disorder (see Table 2).
Retrospective consistency at the last evaluation ranged from 23.4% for bipolar
affective disorder, current episode mild or moderate depression, to 58.0% for
eating disorders; it was 43.7% for dysthymia, 45.9% for schizophrenia and
38.1% for bipolar affective disorder (see
Table 2). The proportion of
patients who received the same diagnosis during at least 75% of their
evaluations ranged from 9.8% for other specific personality disorders to 47.1%
for schizophrenia, schizotypal and delusional disorders (see
Table 6).

Out-patient setting

Prospective consistency ranged from 29.4% for other specific personality
disorders to 69.1% for schizophrenia. The prospective consistency of the three
most prevalent specific diagnoses at the first evaluation was 45.7% for
dysthymia, 69.1% for schizophrenia and 50.6% for bipolar affective disorder
(see Table 3). Retrospective
consistency at the last evaluation ranged from 23.2% for bipolar affective
disorder, current episode mild or moderate depression, to 57.7% for eating
disorders; it was 43.6% for dysthymia, 46.0% for schizophrenia and 39.3% for
bipolar affective disorder (see Table
3). The proportion of patients who received the same diagnosis
during at least 75% of the evaluations ranged from 10.7% for other specific
personality disorders to 49.6% for schizophrenia, schizotypal and delusional
disorders (see Table 6).

Emergency department setting

Prospective consistency ranged from 44.4% for other specific personality
disorders to 81.1% for bipolar affective disorder. The prospective consistency
of the three most prevalent specific diagnoses at the first evaluation was
79.2% for schizophrenia, 81.1% for bipolar affective disorder and 62.5% for
dysthymia (see Table 4).
Retrospective consistency at the last evaluation ranged from 41.7% for
obsessive–compulsive disorder to 80.0% for recurrent depressive
disorder; it was 67.0% for schizophrenia, 70.6% for bipolar affective disorder
and 69.0% for dysthymia (see Table
4).

The proportion of patients who received the same diagnosis during at least
75% of the evaluations ranged from 19.5% for residual schizophrenia to 54.6%
for schizophrenia, schizotypal and delusional disorders (see
Table 6).

In-patient setting

Prospective consistency ranged from 66.7% for recurrent depressive disorder
to 100.0% for obsessive–compulsive disorder and eating disorders. The
prospective consistency of the three most prevalent specific diagnoses at the
first evaluation was 90.9% for schizophrenia, 91.5% for bipolar affective
disorder and 81.8% for dysthymia (see Table
5). Retrospective consistency at the last evaluation was between
63.1% for specific personality disorders and 100.0% for recurrent depressive
disorder and obsessive–compulsive disorder; it was 91.5% for
schizophrenia, 89.3% for bipolar affective disorder and 75.0% for dysthymia
(see Table 5).

The proportion of patients who received the same diagnosis during at least
75% of the evaluations ranged from 37.5% for bipolar affective disorder,
current episode mild or moderate depression, to 100.0% for
obsessive–compulsive disorder and other specific personality disorders
(see Table 6).

DISCUSSION

The main variable influencing diagnostic stability for the most prevalent
chronic psychiatric diagnoses was the clinical setting in which the patients
were assessed. The in-patient setting showed the highest diagnostic stability,
followed by the emergency and out-patient settings. The temporal consistency
of psychiatric disorders was lower than that found in other studies.

Strengths and weaknesses of the study

The main strengths of this study are the large, representative sample, the
length of follow-up (up to 12 years) and the large number of evaluations.
Moreover, although most previous studies focused on one psychiatric diagnosis
assessed in a single clinical setting, we assessed the stability of all
psychiatric diagnoses naturally presenting in clinical practice. Psychiatric
diagnoses were evaluated in three different clinical settings, using the same
diagnostic procedure that is used during regular clinical practice. Clinicians
who assigned the diagnoses were masked to the study process. Other work has
used semi-structured interviews and other diagnostic instruments not used
ordinarily in clinical practice. The results of our study may more accurately
reflect the real use of diagnostic classifications in psychiatric practice and
may be more useful in estimating the clinical utility of current psychiatric
classification systems.

Diagnostic changes over time may reflect the evolution of an illness, the
emergence of new information or unreliability of measurement
(Schwartz et al,
2000). Spitzer et al
(1978) divided the sources of
unreliability that lead to diagnostic disagreement among clinicians into
categories (sources of variance): subject variance, occasions variance (e.g.
different episodes of bipolar disorder), information variance (e.g. the
differences across settings and informants), observation variance (e.g.
differences among clinicians) and criterion variance. Our study has
limitations that may reflect the influence of these sources of unreliability.
The stability of bipolar disorder may be affected by the occasions variance,
particularly the diagnostic category of bipolar affective disorder, current
episode mild or moderate depression (ICD–10 F31.3). Information and
observation variances can be significantly reduced by training clinicians in
interviewing techniques and observational skills, and by the use of structured
or semi-structured clinical interviews. Because of the naturalistic nature of
our research, structured or semi-structured clinical interviews were not used
in the study. This might have increased the criterion variance. The clinicians
who assigned the diagnoses were not specifically trained to improve interrater
reliability, which might have influenced the consistency of the analysed
diagnoses. Psychiatrists used different diagnostic classifications to code the
diagnoses throughout the study period.

Other authors have reported rates of consistency that are much higher than
the ones found in our study (Tsuang et
al, 1981; Schwartz et
al, 2000; Veen et
al, 2004; Kessing,
2005b; Schimmelmann
et al, 2005). However, most studies that have evaluated
the stability of chronic psychiatric diagnoses have shorter follow-up periods
than in our study and have focused on a single clinical setting (mainly the
in-patient setting). Schwartz et al
(2000) reported that rates of
consistency of some diagnoses decreased as the follow-up period increased. For
example, the retrospective consistency of schizophrenia was 73.1% in a
comparison of 6-month and 24-month diagnoses, but fell to 55% (similar to the
figure of 45.9% obtained in our study across clinical settings) when baseline
and 24-month diagnoses were compared. However, the retrospective consistency
of bipolar disorder remained high: 84.8% (6-month and 24-month diagnoses) and
73% (baseline and 24-month diagnoses). Compared with the data from the study
by Schwartz et al
(2000), the retrospective
consistency of bipolar disorder across clinical settings in our study (38.1%)
is strikingly low. The third measure of stability that we calculated (the
percentage of patients who received the same diagnosis in at least 75% of the
evaluations) may more accurately reflect the diagnostic process through
different evaluations, and was also strikingly low in our study. Some examples
of low values are bipolar affective disorder (23.1%) and specific personality
disorders (12.7%), whereas schizophrenia (42.4%) and eating disorders (43.9%)
showed the highest rates of stability.

The very low consistency for the category `bipolar affective disorder,
current episode mild or moderate depression' may be explained by the fact that
this diagnosis is inherently expected to change, since it represents an
episode rather than a disorder. Perhaps the use of semi-structured interviews
would have enhanced reliability and therefore stability. A structured
interview, the Structured Clinical Interview for DSM–III–R, was
used to provide DSM–III–R psychiatric diagnoses in the study by
Schwartz et al
(2000).

Explanations and implications for clinicians and policy makers

There may be several explanations for the differences in diagnostic
stability across clinical settings. First, it may be easier to diagnose a
disorder correctly when symptom severity is at its highest, as in hospital
admissions and emergency visits. We did not have data regarding illness
severity; however, it would be interesting to conduct a similar study
controlling for symptom severity. Second, during hospitalisations,
round-the-clock surveillance and symptom observation may increase the accuracy
of the diagnoses. In addition, during hospitalisations, clinicians can more
easily interview the patient's family, and there is more time for thorough
diagnostic assessment and questioning about areas of functioning and symptoms.
According to Spitzer et al
(1978), this may contribute to
information variance, and may partially explain the differences in diagnostic
stability across clinical settings. Third, the duration of the follow-up
period was much longer in the out-patient setting (1992–2004) than in
the emergency and hospitalisation settings (2000–2004). Finally, the
number of psychiatric contacts was different in each setting (data not shown).
Some authors have suggested that the causal relationship between diagnostic
stability and the number of psychiatric contacts is unknown:

`Patients who have many psychiatric contacts may present with more unstable
psychiatric illness leading to more diagnostic variation. On the other hand,
it may be that clinicians have problems with diagnosing some patients
accurately and that this may lead to less effective treatment and more
psychiatric contacts for these patients.'
(Kessing, 2005b).

It is surprising
that diagnostic stability was higher in the emergency department setting than
in the out-patient setting. Other authors
(Segal et al, 1995;
Rufino et al, 2005)
have noted that psychiatric diagnoses assigned in an emergency department may
be less accurate than diagnoses assigned in other settings. In emergency
department settings, time is usually limited, frequently there is no
additional information from relatives, and in most cases, there is a need for
immediate intervention (Segal et
al, 1995; Rufino et
al, 2005).

The temporal consistency of mental disorders in our study is lower than
that found in other longitudinal studies. The relative lack of diagnostic
stability over time is striking given that there is likely to be a bias
towards maintaining the same diagnosis over time. Psychiatrists treating the
patients in this study often had access to past records and diagnoses, and may
have been inclined to keep the previous diagnosis rather than assign a
different one. It should be noted that the view that disorders may not be
discrete `disease entities' but rather dimensions of continuous variations has
gained currency (Kendell & Jablensky,
2003). The categorical approach to psychiatric diagnostic
classification has been criticised in favour of other classification systems,
such as symptom-cluster dimensions (Kendell & Jablensky, 2003). The possibility of alternative approaches
to diagnoses also raises questions about the value of diagnostic stability as
an indicator of the validity of the diagnoses. Krishnan
(2005) has recently stated
that `the limits of the nominalist tradition have been reached' and has
suggested four criteria for defining disease: clinical symptoms; course and
outcome; familial pattern; and treatment response.

The results of our investigation raise concerns regarding the
validity of results of epidemiological, clinical and pharmacological
psychiatric research, particularly in studies of chronic disorders with short
follow-up periods that may not allow enough time to reach the right diagnosis
or in studies that do not take setting into account. The instability of
diagnoses underscores the inherent weaknesses in our diagnostic system; it
could reflect limitations of the nosology and result in inappropriate
treatment recommendations or interventions.

Future research

It is likely that psychiatric diagnostic categories require revision. This
can only be determined definitively with a large-scale study using structured
or semi-structured interviews. Such a project may be feasible, but we believe
that it might not accurately reflect the conditions of psychiatric practice in
the real world.