Cohorts ascertained by automated EHR phenotyping exhibited substantial genetic correlations with large, traditionally ascertained samples from earlier GWAS.

High-throughput phenotyping using the large data resources available in EHRs is a feasible way to advance genetic research in psychiatry.

The use of electronic health records (EHRs) has the power to accelerate genetic research in bipolar disorder (BD). Currently, the major rate-limiting step for genome-wide association studies (GWAS) is the need for ever-larger sample sizes to detect both common variants of modest effect and rarer variants of large effect.

Psychiatrists at Massachusetts General Hospital co-founded the International Cohort Collection for Bipolar Disorder (ICCBD), a consortium that applies high-throughput phenotyping methods to EHRs at sites in the U.S., U.K. and Sweden. The ICCBD previously demonstrated that automated algorithms could feasibly identify BD cases and controls from an EHR system.

In an earlier study, BD cases and controls were identified by creating a "datamart" of 52,235 patients from the EHR system at Mass General, which spans more than 20 years of data from 4.6 million patients. Eligible patients had at least one diagnostic code for BD or manic disorder.

Creation and Clinical Validation of the Algorithms

The researchers created four automated phenotyping algorithms to identify cases and one to identify controls:

Coded-strict: A rules-based algorithm that required at least three diagnostic codes for BD, a predominance of BD diagnoses in the longitudinal record and either (a) treatment with lithium or valproate within a year of BD diagnosis or (b) treatment at a bipolar specialty clinic

Coded-broad: Required at least two ICD codes for BD, a predominance of BD diagnoses and treatment with at least two medications commonly used for BD

Coded-broad-SV: Same as coded-broad except that two or more BD diagnoses could occur during the same episode of illness

Controls: Defined controls as patients at least 30 years of age with no diagnostic code or medication history related to a psychiatric or neurological condition
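The rules-based definitions above can be sketched in a few lines of Python. This is an illustration only, not the researchers' implementation: every field name is hypothetical, and the "predominance" test (BD codes as more than half of psychiatric diagnostic codes) is an assumed stand-in, since the article does not give the actual criterion. Coded-broad-SV is left out because the sketch does not model episodes of illness.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    # All field names are hypothetical; a real EHR datamart will differ.
    age: int
    bd_code_count: int                  # diagnostic codes for BD
    other_psych_code_count: int         # other psychiatric diagnostic codes
    any_psych_or_neuro_history: bool    # any psychiatric/neurological code or medication
    bd_medication_count: int            # distinct medications commonly used for BD
    lithium_or_valproate_within_year: bool
    seen_at_bipolar_clinic: bool


def bd_predominant(p: PatientRecord, threshold: float = 0.5) -> bool:
    """Assumed predominance rule: BD codes exceed `threshold` of all psychiatric codes."""
    total = p.bd_code_count + p.other_psych_code_count
    return total > 0 and p.bd_code_count / total > threshold


def is_coded_strict(p: PatientRecord) -> bool:
    # At least three BD codes, BD predominance, and lithium/valproate or specialty clinic.
    return (
        p.bd_code_count >= 3
        and bd_predominant(p)
        and (p.lithium_or_valproate_within_year or p.seen_at_bipolar_clinic)
    )


def is_coded_broad(p: PatientRecord) -> bool:
    # At least two BD codes, BD predominance, and at least two BD medications.
    return p.bd_code_count >= 2 and bd_predominant(p) and p.bd_medication_count >= 2


def is_control(p: PatientRecord) -> bool:
    # At least 30 years old with no psychiatric or neurological code or medication.
    return p.age >= 30 and not p.any_psych_or_neuro_history
```

Writing each definition as a separate predicate keeps the case and control rules independently testable, which matters when each one is later validated against clinician review.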

DNA Sample Collection and Genotyping

For the new study, the algorithms were applied to the EHR system to ascertain case and control DNA samples by linking de-identified phenotypic data to discarded blood samples. Genotyping was performed using a high-throughput genome-wide genotyping array that includes ~250,000 common variants, ~250,000 rare variants and ~50,000 additional markers.

The researchers limited further analysis to DNA samples of European ancestry. The final dataset included 3,330 BD cases and 3,952 controls.

SNP-based Heritability

The highest heritability (0.24, P = .015) was seen with the 95-NLP algorithm. That figure, the researchers note, is nearly identical to the heritability estimated in the GWAS that the ICCBD and the Psychiatric Genomics Consortium (PGC) conducted on large, traditionally ascertained BD cohorts.

The coded-strict and coded-broad algorithms also yielded significant, although relatively lower, heritability estimates (0.09–0.12). The coded-broad-SV algorithm did not exhibit significant heritability.

Even so, the overall heritability of the EHR-based BD sample was 0.12 (P = .004). The EHR-based BD definitions were nearly perfectly genetically correlated with each other, with pairwise correlations ranging from 0.98 to 1.0.

SNP-based Genetic Correlations

Overall, the correlation between the EHR-based BD case/control samples and the ICCBD + PGC samples was 0.83. Thus, the algorithms captured genetic influences that strongly overlap with those acting on BD in traditionally ascertained samples. The finding also suggests, according to the authors, that EHR-defined DNA samples can be combined with other samples to enhance the power of genetic discovery.
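For readers less familiar with the statistic, genetic correlation is conventionally defined as the genetic covariance between two traits scaled by their genetic standard deviations. The symbols below are introduced here for illustration, and the article does not specify which estimator the researchers used:

```latex
r_g = \frac{\sigma_{g,12}}{\sqrt{\sigma^2_{g,1}\,\sigma^2_{g,2}}}
```

Here \sigma_{g,12} is the genetic covariance between the two case definitions (for example, EHR-based BD and traditionally ascertained BD), and \sigma^2_{g,1} and \sigma^2_{g,2} are their respective genetic variances. A value near 1 indicates that largely the same common variants influence both definitions.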

