The scene is set by a brief but useful review of genome-wide studies of human disease, which is worth reading if you need to get up to speed on the scope of progress in modern disease genomics. The main course, however, is provided by three opinion pieces tackling the recent results of genome-wide association studies from quite different angles. I’ll discuss two of these articles in this post, and will hopefully have time to tackle the third article – and the implications of this debate for the future of personal genomics – in a second post.

Before I get to the opinion pieces, however, here’s a whirlwind tour of the history of research into the genetic basis of common diseases (such as type 2 diabetes): prior to 2005, the field was largely a scientific wasteland scattered with the embarrassing and wretched corpses of unreplicated genetic association studies, with barely a handful of well-validated genetic risk factors peeking above the noise; in 2005, the first genome-wide association studies (GWAS) emerged from the combination of the hugely successful HapMap project with new technology for testing hundreds of thousands of single-base genetic variants simultaneously; from 2005 until today, GWAS have rapidly grown in scale and complexity, with studies now looking at over a million genetic markers in cohorts approaching a hundred thousand individuals.

From the outset the aim of genome-wide association studies has been two-fold: (1) identifying markers that can be used to predict individual disease risk; and (2) highlighting the molecular pathways underlying disease, providing potential targets for therapy.

There’s little disagreement in the scientific community that the appearance of GWAS has changed the face of common disease genetics: from that handful of genuine associations in 2005, we now have somewhere in the vicinity of 400 regions of the genome displaying replicated associations with around 70 common diseases or complex traits. Where the experts differ, however, is on the issue of whether continuing to increase the scale of GWAS to ever-larger sample sizes is worth the substantial costs, and whether personal genomics companies like 23andMe – who use GWAS results to provide personalised genetic disease risk predictions – are providing a valuable service or a scam.

The first two NEJM opinion pieces represent two poles of the diversity of views within the genetics community on these issues, and illustrate both the important lessons that have emerged from GWAS and the many questions that are yet to be answered.

Goldstein’s argument is straightforward: although GWAS have been “strikingly successful” in identifying sites of common genetic variation associated with complex diseases, the variants they have found – both individually and altogether – explain only a small fraction of the overall genetic contribution to common disease risk. These common variants typically only increase risk by 10 to 70% – so for a disease that affects 2% of the population, carrying one of these variants means your disease risk increases to 2.2 to 3.4%. Even for heavily studied diseases, the variants found to date typically explain less than 20% of the heritable variance in disease risk.
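To make that arithmetic concrete, here is a minimal sketch in Python. The function name is my own, and it makes a simplifying assumption: the 2% population prevalence is treated as the risk for non-carriers, and the 10 to 70% figures are treated as relative increases in risk (in practice GWAS usually report odds ratios, which are close to relative risks only when the disease is fairly uncommon).

```python
def risk_with_variant(baseline_risk, relative_increase):
    """Absolute risk for a carrier of one risk variant.

    baseline_risk: risk without the variant (assumed here to equal
        the population prevalence, which is an approximation).
    relative_increase: fractional increase in risk, e.g. 0.10 for 10%.
    """
    return baseline_risk * (1.0 + relative_increase)


baseline = 0.02  # disease affecting 2% of the population

low = risk_with_variant(baseline, 0.10)   # weakest typical variant
high = risk_with_variant(baseline, 0.70)  # strongest typical variant

print(f"{low:.1%} to {high:.1%}")  # prints "2.2% to 3.4%"
```

In absolute terms, then, even the strongest typical common variant moves a carrier’s risk by less than one and a half percentage points.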

This point is non-controversial, and has been apparent for quite a while – over a year ago, for instance, I wrote a post listing the places where the missing heritable risk could be hiding. The major response from researchers performing GWAS has been to continually increase sample sizes, giving them power to reveal variants with ever-smaller effect sizes.

However, Goldstein argues that this approach is doomed to failure. Based on what we know about the distribution of effect sizes of risk variants, he argues that if common risk variants underlie the totality of genetic risk there must be a ridiculously large number of them; and that means that these variants will provide little useful insight into the biology of disease:

If common variants are responsible for most genetic components of type 2 diabetes, height, and similar traits, then genetics will provide relatively little guidance about the biology of these conditions, because most genes are “height genes” or “type 2 diabetes genes.”

Goldstein’s negative tone is balanced by a buoyant review from Joel Hirschhorn celebrating the successes of the GWAS era. Hirschhorn argues that despite the failure to uncover the majority of the genetic disease risk, GWAS have in fact contributed substantially to our understanding of disease mechanisms. Here he has two striking examples to back him up: the revelations of the involvement of the complement pathway in age-related macular degeneration (AMD), and of the autophagy and IL23 pathways in Crohn’s disease. These pathways weren’t known to play a role in these diseases prior to GWAS, but the evidence for their involvement from GWAS (particularly in the case of AMD) is unambiguous.

For what it’s worth, I think Hirschhorn’s examples demonstrate that Goldstein is overstating his point here; clearly common variants can be highly informative about biology, and it seems likely that we will find plenty more such examples as we dig into the hundreds of genes uncovered by GWAS. Hirschhorn is prepared to bet on it:

“…by the 2012 [American Society of Human Genetics] meeting, genomewide association studies will have yielded important new biologic insights for at least four common diseases or polygenic traits — and efforts to develop new and improved treatments and preventive measures on the basis of these insights will be well under way.”

Of course the question of whether such revelations will be common and powerful enough to justify the fiendishly high costs of ever-larger GWAS remains an open one.

Where I think Goldstein does have the upper hand is in his critique of the success of the second goal of GWAS: identifying markers that can be used to predict disease risk. At the current time, the common variants identified by GWAS contribute very little of value to individual disease risk predictions over existing clinical markers for most common diseases. Hirschhorn’s response to this argument is rather muted and speculative; in fact, Goldstein himself provides the best counter-examples to this trend (GWAS for some drug response and infectious disease susceptibility traits, where common large-effect variants have been uncovered). But these counter-examples aside, Goldstein is perfectly correct that common variants have proved disappointing from a clinical predictive standpoint.

Some have interpreted this as meaning that personal genomics is dead; this is not the case. It simply means that health predictions from the current incarnation of personal genomics (with its single-minded focus on common variants) should not be relied on too heavily by consumers. Over the next few years, personal genomics will move with the science towards increasingly accurate predictions.

Moving beyond common variants

Somewhat surprisingly, one of the major themes of Goldstein’s review goes uncontested by Hirschhorn: the notion that it’s highly unlikely that more common variants explain the majority of the remaining genetic risk. Instead, Goldstein bets (and I completely agree) on a substantial role for rarer variants with substantially larger effect sizes. I’m planning to expand on the theoretical argument for the importance of rare variants in an upcoming series of posts, but for now I’ll simply repeat Goldstein’s summary: “the efficiency of natural selection in prohibiting increases in disease-associated variants in the population” means that the variants with the largest individual effects on disease will tend to be rare.

I think it’s important to note here that while Goldstein has been one of the most public voices noting the disappointing yield from GWAS, he is by no means a lone voice in the wilderness. Most of the researchers working on GWAS that I’ve spoken to are not increasing their sample sizes because they think common variants are the only source of disease risk; they’re doing it because surveying rare variants is only just becoming technologically feasible, while GWAS technology is extremely well-established and reliable, and there are still plenty of common variants out there to be discovered.

Nonetheless, as sequencing technology becomes cheaper you can expect to see an explosion of targeted gene sequencing studies looking for rarer risk variants (and finding them soon, I hope, since this was my top prediction for 2009!). At the same time, a second generation of GWAS will be performed using new chips targeting variants throughout the genome at ever-lower frequencies. The two strategies will finally converge on the holy grail of genetic analysis: complete genome sequencing of hundreds to thousands of disease patients and controls.

These studies will hopefully yield a fine harvest of rare disease-associated variants with much stronger effects on risk than the common variants uncovered to date – increases in risk of between two- and ten-fold. Such variants will provide fine-grained insights into disease pathways, but more importantly they will be much more useful for individual risk prediction – if an individual is unlucky enough to be carrying just one or two of them they will instantly have a substantially higher risk than non-carriers. The emergence of these variants will make personal genomics vastly more useful for health predictions.
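As a rough illustration of why such variants matter for prediction, here is a small Python sketch. Everything in it is an assumption for the sake of the example: the 2% baseline risk, the hypothetical function name, and in particular the idea that the effects of separate variants simply multiply on the risk scale, which real variants need not do.

```python
def carrier_risk(baseline_risk, fold_increases):
    """Absolute risk for someone carrying several risk variants.

    baseline_risk: risk for a non-carrier (illustrative value only).
    fold_increases: one fold-change in risk per variant carried.
    Assumes the effects combine multiplicatively; the result is
    capped at 1.0 because a risk cannot exceed 100%.
    """
    risk = baseline_risk
    for fold in fold_increases:
        risk *= fold
    return min(risk, 1.0)


baseline = 0.02  # assumed 2% risk for non-carriers

one_weak = carrier_risk(baseline, [2])      # one two-fold variant
one_strong = carrier_risk(baseline, [10])   # one ten-fold variant
two_weak = carrier_risk(baseline, [2, 2])   # two two-fold variants

print(f"{one_weak:.0%}, {one_strong:.0%}, {two_weak:.0%}")  # prints "4%, 20%, 8%"
```

Under these assumptions a single rare variant shifts a carrier’s risk by somewhere between 2 and 18 percentage points, far more than a typical common variant would.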

That’s enough for now – hopefully I’ll have time over the weekend to discuss the third NEJM article, and expand on its implications for personal genomics.