Large Data Set Compendium for Mental Health Research

Welcome to our compendium! Use it to learn about the many population-based surveys and service system databases available to researchers who study mental health and illness. It is designed to direct investigators to sources of high-quality data without their first having to peruse protocols and documentation to determine content.

How it is organized

The compendium covers two types of national and regional data sets.

Population-based surveys of U.S. youth and adults that include information on mental health status, diagnoses, and symptoms, along with other relevant data.

System-level data sets that contain information on U.S. health and mental health service utilization, treatment episodes, service providers, provider organizations, costs, and payments.

Ways to use it

If you’re a researcher seeking to test hypotheses at national or regional levels…

use the compendium to determine which large data sets contain the variables you need for your planned analyses.

If you’ve identified a target population or topic of interest, but haven’t decided what questions to ask…

use the compendium to see which large data sets contain information about your group or topic in order to stimulate your thinking about potential lines of inquiry.

If you’re new to a mental health topic or have broad interests that you’re seeking to narrow down…

use the compendium to conceptualize your subject at the person and system levels in order to bring focus to your research interests.

If you’re unfamiliar with the use of large data sets, especially to study mental health…

use the compendium to learn more about this exciting approach to inquiry in the behavioral health field.

What’s in the compendium

Each data set is described by its name, content, methodology, and sponsoring agency.

A web link to the data set’s home page is provided, along with a sample article reporting the results of mental health research using the data.

Each entry also describes the types of mental health information the data set contains.

For population-based surveys, this includes: specific diagnoses and how they’re derived; psychiatric symptoms and how they’re measured; nature of self-report information gathered from respondents and providers; and types of case records and extraction methods.

For system-level data, this includes: how services are classified; what diagnostic systems are used; how costs are measured; and how payors are categorized (e.g., public, private).

For many researchers, this third component, the specific mental health content of a data set, is difficult to discern without extensive investigation. This difficulty was a major impetus for creating the compendium.