Failing to include people of Asian, African, Latino and other non-European ancestries in DNA databases is causing errors in the identification of disease genes.

The Exome Aggregation Consortium, or ExAC, is built on a simple idea. It pools sequences of the protein-coding part of the genome (the exome) from more than 60,000 people into one database, allowing scientists to compare them and gauge how variable each gene is. Yet the resource is having a tremendous impact on biomedical research. As well as helping scientists to discard spurious disease–gene links, it is generating new discoveries. By looking closely at how often mutations occur in different populations, researchers can gain insight into what many genes do and how their protein products function.

Many disease-association studies, particularly in recent years, have labelled mutations as pathogenic simply because scientists studying a group of people with a disorder found mutations that looked like the culprit and did not see them in healthy people. But the researchers may not have been looking hard enough, or in the right populations. Baseline 'healthy' genetic data have tended to come mainly from people of European descent, which can skew results: a variant that is rare in Europeans may be common, and harmless, in another population.

In August this year, MacArthur's group published its analysis of ExAC data in Nature, revealing that many mutations thought to be harmful probably are not. In one analysis, the group identified 192 variants that had previously been reported as pathogenic but turned out to be relatively common. The scientists reviewed the papers describing these variants, looking for plausible evidence that they actually caused disease, but found solid evidence for only nine. Most are probably benign by the standards of the American College of Medical Genetics and Genomics, and many have now been reclassified as such.

People found to carry a genetic risk of sudden cardiac arrest are sometimes counselled to get an implanted defibrillator, a device that delivers electrical shocks to the heart if it seems to be beating abnormally. Watkins checked ExAC for variants in genes that have been associated with these heart conditions, and found that many are far too common among healthy people to be pathogenic. About 60 genes had been implicated as harbouring mutations that cause one form of the disease; Watkins' analysis suggested that 40 of them probably have no link to it.
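The logic of such a frequency check can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method MacArthur's or Watkins' groups actually used: the variant names, allele counts and the 0.1% cut-off are all hypothetical, and real analyses set thresholds disease by disease, taking into account how common the condition is and how often carriers develop it. Allele frequency here is simply the number of sequenced chromosomes carrying a variant divided by the number of chromosomes sequenced at that site. The sketch also shows why ancestry matters: a variant that looks rare when only European samples are checked can turn out to be common in another population.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Variant:
    """A variant with per-population allele counts (all data hypothetical)."""
    name: str
    allele_counts: dict[str, int]   # chromosomes carrying the variant, per population
    allele_numbers: dict[str, int]  # chromosomes sequenced at this site, per population


def allele_frequency(count: int, number: int) -> float:
    """Fraction of sequenced chromosomes that carry the variant allele."""
    return count / number if number else 0.0


def too_common_to_be_pathogenic(variant: Variant, max_af: float,
                                populations: list[str] | None = None) -> bool:
    """Return True if the variant's frequency exceeds max_af in any checked population.

    max_af is a hypothetical cut-off; real thresholds depend on the disease.
    If populations is None, every population in the data is checked.
    """
    pops = populations if populations is not None else list(variant.allele_numbers)
    return any(
        allele_frequency(variant.allele_counts.get(pop, 0), variant.allele_numbers[pop]) > max_af
        for pop in pops
    )


MAX_AF = 0.001  # hypothetical 0.1% cut-off for a rare, dominant disorder

variants = [
    # Rare in Europeans (2 / 60,000) but common in Africans (150 / 10,000 = 1.5%).
    Variant("hypothetical_var_1", {"EUR": 2, "AFR": 150}, {"EUR": 60_000, "AFR": 10_000}),
    # Rare in every population sampled.
    Variant("hypothetical_var_2", {"EUR": 1, "AFR": 0}, {"EUR": 60_000, "AFR": 10_000}),
]

for v in variants:
    european_only = too_common_to_be_pathogenic(v, MAX_AF, populations=["EUR"])
    all_populations = too_common_to_be_pathogenic(v, MAX_AF)
    print(f"{v.name}: flagged with European data only={european_only}, "
          f"with all populations={all_populations}")
```

Run as written, it reports that the first hypothetical variant passes the European-only check but is flagged once the African samples are included, while the second is not flagged either way. That is the kind of skew that European-dominated baseline data can introduce.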

The frequency of mutations in ExAC is also revealing what genes do. MacArthur and his team found 3,200 genes that are almost never severely mutated in any of the ExAC exomes, suggesting that such mutations are weeded out by natural selection and that the genes are therefore important. Yet 72% of them have never been linked to disease. Researchers are eager to find out whether some of these genes play unappreciated parts in illness.

Conversely, the group has found nearly 180,000 instances of mutations so severe that they should render their protein products completely inactive. Scientists have long studied genes by knocking them out, or disabling them, in animals such as mice and observing the symptoms that develop. Such experiments are impossible in humans. Now researchers are eager to study these natural human knockouts to learn what they can reveal about how diseases develop or might be treated. MacArthur and others are gearing up to decide which knocked-out genes to prioritize and how best to contact the people who carry them.

The second phase of ExAC, to be unveiled this month, will more than double the data set to about 135,000 exomes and add some 15,000 whole-genome sequences, which should let researchers explore mutations in regulatory regions of the genome that exome sequencing does not capture.

ExAC is quietly becoming a standard tool in medical genetics. Clinical labs around the world now check it before telling a patient that a particular glitch in their genome might be making them ill.

ExAC has also driven home a point that Goldstein and other researchers have made repeatedly: failing to include people of Asian, African, Latino and other non-European ancestries limits the view of human genetic diversity, and so holds back understanding of how genes influence disease. There is now fresh impetus to include under-represented groups in planned large-scale studies that link genetic and health information, such as the US Precision Medicine Initiative.