The basic information required for the study of comorbidity
is not difficult to collect, and any study that collects information
on more than one diagnosis has already done so. What the researchers
may not have done, however, is publish their data in a form that makes
the calculation of comorbidity rates possible. The data required are laid out in
detail elsewhere (Angold et al., 1999), but basically consist of (1)
the base rates of disorders X and Y; (2) the prevalence of X|Y and X|not-Y;
(3) the prevalence of Y|X and Y|not-X. Thus, a simple 2x2 table provides
the data needed. However, most researchers, while having the data readily
available to construct such tables, have seen no reason to publish them
in this form.
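
As a minimal sketch of how the quantities listed above follow from a single 2x2 table, the Python below derives the base rates, the conditional prevalences, and an odds ratio from four cell counts. The cell counts are invented for illustration and do not come from any study. The names `TwoByTwo` and `comorbidity_rates` are hypothetical. The odds ratio is included only as one common summary of association and is not prescribed by the text.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts of a 2x2 table for disorders X and Y (illustrative layout)."""
    both: int      # subjects with X and Y
    x_only: int    # subjects with X but not Y
    y_only: int    # subjects with Y but not X
    neither: int   # subjects with neither disorder

    @property
    def n(self) -> int:
        return self.both + self.x_only + self.y_only + self.neither


def comorbidity_rates(t: TwoByTwo) -> Dict[str, float]:
    """Derive the rates from one table; assumes no row or column total is zero."""
    return {
        "base_rate_x": (t.both + t.x_only) / t.n,
        "base_rate_y": (t.both + t.y_only) / t.n,
        "x_given_y": t.both / (t.both + t.y_only),
        "x_given_not_y": t.x_only / (t.x_only + t.neither),
        "y_given_x": t.both / (t.both + t.x_only),
        "y_given_not_x": t.y_only / (t.y_only + t.neither),
        # Cross-product odds ratio; undefined if x_only or y_only is zero.
        "odds_ratio": (t.both * t.neither) / (t.x_only * t.y_only),
    }


# Invented counts, for illustration only.
example = TwoByTwo(both=30, x_only=70, y_only=50, neither=850)
rates = comorbidity_rates(example)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Because every rate is a function of the same four cells, publishing the cells themselves lets other researchers recompute whichever measure they need.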

At a more complex level, one needs to account for other
comorbidities in examining the one of interest. Thus, one needs to account
for comorbidities among psychiatric disorders in examining comorbidity
of psychiatric disorders and substance use/abuse/dependence. This requires
that the data be laid out in the form of a multiway (stratified) table of the kind
needed to calculate, for example, a Mantel-Haenszel chi-square. Once
again, researchers have the data necessary for this purpose; they just
don't normally present them in this format. Similarly, to examine risk
factors and correlates of various types of comorbidity, one needs the
basic 2x2 tables broken out by age, sex, race/ethnicity, poverty, etc.,
and possibly by all of these simultaneously. Problems with working from
the published data increase when studies use complex sampling designs
rather than simple random sampling, or a combination of the two. It then becomes difficult
to combine reports unless the relevant variance estimates are also available.
Many studies also have the potential to examine comorbidity longitudinally
using repeated data waves. Meta-analysis of longitudinal data is more
complex, but still possible given the power of recent analytic software.
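
To illustrate the stratified adjustment mentioned above, the following sketch computes a Mantel-Haenszel pooled odds ratio and chi-square from a set of 2x2 tables, one per stratum (for example, one table per level of another comorbid disorder). It assumes simple random sampling within strata and omits the continuity correction. The function name `mantel_haenszel` and the counts are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple

# One stratum: (a, b, c, d) = (X and Y, X only, Y only, neither)
Stratum = Tuple[int, int, int, int]


def mantel_haenszel(strata: Sequence[Stratum]) -> Tuple[float, float]:
    """Return (pooled odds ratio, chi-square without continuity correction)."""
    num = den = 0.0                        # numerator and denominator of the pooled OR
    observed = expected = variance = 0.0   # sums over strata for the chi-square
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
        observed += a
        # Expectation and variance of cell a under no association,
        # conditional on the stratum's row and column totals.
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    return num / den, (observed - expected) ** 2 / variance


# Invented counts: two strata of 500 subjects each.
strata = [(10, 40, 20, 430), (20, 30, 30, 420)]
pooled_or, chi_square = mantel_haenszel(strata)
print(f"pooled OR = {pooled_or:.2f}, chi-square (1 df) = {chi_square:.2f}")
```

The same function applies when the strata are age, sex, or race/ethnicity groups, provided each published report supplies the four cell counts per stratum.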

In this section we (1) review the data sets that have
potential for this activity, (2) suggest some key questions that could
be addressed, and (3) suggest an approach to answering these questions.

The three main sources of information about potentially
useful data sets were (1) the National Institutes of Health's database
of currently and previously funded grants (Computer Retrieval of Information
on Scientific Projects [CRISP]), (2) the literature review described
earlier, and (3) personal contact with researchers, especially those
in other countries. The Principal Investigator (PI) of each study was
identified and an e-mail address sought for each one. Over 60 studies
were identified that might possibly be able to provide relevant data.

The goal was to collect information to answer three kinds
of questions about each data set: (a) Does it meet the basic requirement
for comorbidity analyses? (b) If so, what are the characteristics of
the data set relevant to these core requirements (sample size, etc.)?
and (c) Does the data set have other characteristics that would make
it valuable for additional analyses (e.g., repeated measures, risk and
protective factors)?

The basic requirements were those discussed earlier; we
were mainly interested in representative population samples, with reliably
collected DSM or ICD diagnoses, and enough information to permit a determination
of any substance use, substance abuse, and substance dependence, separately
for alcohol, tobacco, and other drugs. Beyond these basic data, we were
interested in knowing what other information might be available across
several data sets. We were also interested in the potential for analyses
using (1) longitudinal data, (2) different race/ethnic groups, (3) a
range of putative risk and protective factors, (4) information on treatment
for drug abuse or psychiatric disorders, and on the effectiveness of
treatment, and (5) a range of "real world" outcomes, such
as school dropout, arrest, incarceration, unwanted pregnancy, or suicide.
However, these were not criteria for inclusion in the list of useful
studies, but rather additional information for exploring what kinds
of analysis might be possible.

Table 3 presents a summary of the potentially usable data
sets on which information has been collected so far. This is an ongoing
project; as new studies reach an analyzable stage they can be added
to the list. At this point we can say that at least 16 studies, collecting
information on at least 17,000 children and adolescents, contain the
minimum necessary data (psychiatric diagnoses, substance use and abuse,
onset dates, demographic and risk factor data). What is even more important
for NIDA's purposes is that most of these are panel studies, with repeated
assessments of the same subjects. This provides the opportunity to examine
the timing and precipitators of the onset of drug use, and progressions
from use to abuse, prospectively, in large, ethnically diverse
samples of children and adolescents.

All the studies include approximately equal numbers of
male and female subjects. Several studies contain sizable samples of
minority participants. There are more than 3,500 African American youth
contributing over 11,000 person-observations, and 2,600 Hispanic participants
contributing some 6,000 person-observations. However, data on American
Indians (N = 450, person-observations = 2,000), Asians, and other minorities
in the United States are sparse.

All the data sets contain information on a range of correlates
and risk factors such as age, sex, school performance, urban/rural residence,
family income, family structure and functioning, and neighborhood and
community resources, although not all studies contain all the variables.
A few provide information on service use for drug and mental health
problems.

Data to examine the development of drug abuse comorbidity
have already been collected on some 17,000 children and
adolescents. With repeated assessments in many studies these data sets
provide over 84,000 person-observations. At a very rough estimate, the
dozen usable data sets have cost Federal and other agencies at least
$60 million over the years since the early 1970s, when the first of
these studies began. However, few of them have used their data to address
the specific question that NIDA wants answered (exceptions are Costello
et al., 1999; Newman, Silva, & Stanton, 1996). Additionally, the
combined strength of this resource has certainly not been exploited
to address this issue.

There are different methods of using data from multiple
sources. Meta-analyses of the type used in the first part of this report
are one approach. A second is for researchers to carry out cooperative
projects, in which they agree to carry out parallel studies using common
sets of variables (e.g., Costello, 1998). A third approach is
to combine the relevant variables from each study into a common data
set. Programs for data analysis are much more flexible than was the
case even a few years ago, and any or all of these approaches might
be feasible, depending on the questions to be answered. Inevitably,
problems would arise and considerable expertise would be needed to use
any of these approaches.
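
As one hedged illustration of the first approach, the sketch below pools odds ratios from several studies' 2x2 tables using fixed-effect inverse-variance weighting on the log scale. This is a standard textbook technique and is not necessarily the method used elsewhere in this report. The variance formula (Woolf's) assumes simple random sampling, so studies with complex designs would need to supply design-based variance estimates instead. The function name `pooled_odds_ratio` and the counts are hypothetical.

```python
import math
from typing import Sequence, Tuple

# One study: (a, b, c, d) = (X and Y, X only, Y only, neither)
Table = Tuple[int, int, int, int]


def pooled_odds_ratio(tables: Sequence[Table]) -> Tuple[float, float]:
    """Return (pooled odds ratio, standard error of the pooled log odds ratio)."""
    weighted_sum = total_weight = 0.0
    for a, b, c, d in tables:
        log_or = math.log((a * d) / (b * c))
        # Woolf's variance of the log odds ratio; requires all cells > 0.
        var = 1 / a + 1 / b + 1 / c + 1 / d
        weight = 1 / var
        weighted_sum += weight * log_or
        total_weight += weight
    pooled_log_or = weighted_sum / total_weight
    return math.exp(pooled_log_or), math.sqrt(1 / total_weight)


# Invented counts for three hypothetical studies.
studies = [(30, 70, 50, 850), (12, 38, 25, 425), (45, 105, 80, 1270)]
or_hat, se_log = pooled_odds_ratio(studies)
lower = math.exp(math.log(or_hat) - 1.96 * se_log)
upper = math.exp(math.log(or_hat) + 1.96 * se_log)
print(f"pooled OR = {or_hat:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A fixed-effect model treats every study as estimating the same underlying association. Where studies differ in age range, instruments, or sampling frame, a random-effects model or the pooled-data approach may be more appropriate.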

Clearly there are many questions that further analysis
of existing data will not answer. The core question of this conference, the
impact of early treatment on later drug abuse, needs answering in new studies with
different designs. But it would be helpful to be able to base those
new studies on a firm foundation of knowledge about prevalence, comorbidity,
and development.