There is a common view that the human genome has two different parts – a “constant” part and a “variable” part. According to this view, the bases of DNA in the constant part are the same across all individuals. They are said to be “fixed” in the population. They are what make us all human – they differentiate us from other species. The variable part, in contrast, is made of positions in the DNA sequence that are “polymorphic” – they come in two or more different versions. Some people carry one base at that position and others carry another. The idea is that it is the particular set of such variations that we inherit that makes us each unique (unless we have an identical twin). According to this idea, we each have a hand dealt from the same deck.

The genome sequence (a simple linear code made up of 3 billion bases of DNA in precise order, divided among different chromosomes) is peppered with these polymorphic positions – about 1 in every 1,250 bases. That makes about 2,400,000 polymorphisms in each genome (and we each carry two copies of the genome). That certainly seems like plenty of raw material, with limitless combinations that could explain the richness of human diversity. This interpretation has fuelled massive scientific projects to try to find which common polymorphisms affect which traits (not to mention personal genomics companies that will try to tell you your risk of various diseases based on your profile of such polymorphisms).
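For anyone who wants to check the arithmetic, here is a minimal Python sketch. It uses only the two round figures quoted above; the variable names are my own and the result is no more precise than those inputs.

```python
# Round figures taken from the paragraph above; both are approximations.
GENOME_LENGTH_BASES = 3_000_000_000   # "3 billion bases"
BASES_PER_POLYMORPHISM = 1_250        # "about 1 in every 1,250 bases"

# Number of polymorphic positions implied by those two figures.
polymorphic_sites = GENOME_LENGTH_BASES // BASES_PER_POLYMORPHISM

print(f"{polymorphic_sites:,} polymorphic sites per copy of the genome")
```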

The problem with this view is that it is wrong. Or at least woefully incomplete.

The reason is that it ignores another source of variation: very rare mutations in those bases that are constant across the vast majority of individuals. There is now very good evidence that it is those kinds of mutations that contribute most to our individuality. Certainly, they are much more likely to affect a protein’s function and much more likely to contribute to genetic disease. We each carry hundreds of such rare mutations that can affect protein function or expression, and they are much more likely to have a phenotypic impact than common polymorphisms are.

Indeed, far from most of the genome being effectively constant, it can be estimated that every position in the genome has been mutated many, many times over in the human population. And each of us carries hundreds of new mutations that arose during generation of the sperm and egg cells that fused to form us. New mutations may spread in the pedigree or population in which they arise for some time, depending in part on whether they have a deleterious effect or not. Ones that do will likely be quickly selected against.

A new paper from the 1000 Genomes Project Consortium shows that:

“the vast majority of human variable sites are rare and that the majority of rare variants exhibit, at most, very little sharing among continental populations”.

This is a much more fluid picture of genetic variation than we are used to. We are not all dealt a genetic hand from the same deck – each population, sub-population, kindred, nuclear family has a distinct set of rare genetic variants. And each of these decks contains a lot of jokers – the new mutations that arise each time a hand is dealt.

Why have such rare mutations generally been ignored while the polymorphic sites have been the focus of intense research? There are several reasons, some practical and some theoretical. Practically, it has until recently been almost impossible to systematically find very rare mutations. To do so requires that we sequence the whole genome, which has only recently become feasible. In contrast, methods to survey which bases you carry at all the polymorphic sites across the genome were developed quite some time ago and are relatively cheap to use. (They rely on sampling about 500,000 such sites around the genome – because of unevenness in the way different bits of chromosomes get swapped when sperm and eggs are made, this sample actually tells you about most of the variable sites across the whole genome.) So, there has been a tendency to argue that polymorphic sites will be major contributors to human phenotypes (especially diseases) because those have been the only ones we have been able to look at.

Unfortunately, the results of genome-wide association studies, which aim to identify common variants associated with traits or diseases, have been disappointing. This is especially true for disorders with large effects on fitness, such as schizophrenia or autism. Some variants have been found but their effects, even in combination, are very small. Most of the heritability of most of the traits or diseases examined to date remains unexplained. (There are some important exceptions, especially for diseases that strike only late in life and for things like drug responses, where selective pressures to weed out deleterious alleles are not at play.)

In contrast, many more rare mutations causing disease are being discovered all the time, and the pace of such discoveries is likely to increase with technological advances. The main message that emerges from these studies has been called by Mary-Claire King the “Anna Karenina principle”, based on Tolstoy’s famous opening line:

“Happy families are all alike; every unhappy family is unhappy in its own way”

But can such rare variants really explain the “missing heritability” of these disorders? Some people have argued that they cannot, but this seems to me to be based on a pervasive misconception of how the heritability of a trait is measured and what it means. According to this misconception, if a trait is heritable across the population, that heritability cannot be accounted for by rare variants. After all, if a mutation only occurs in one or a few individuals, it could only minimally (nearly negligibly) contribute to heritability across the whole population. That is true. However, heritability is not measured across the population – it is measured in families and then averaged across the population.

In humans, it is usually derived by comparing phenotypes between people of different genetic relatedness (identical versus fraternal twins, siblings, parents, cousins, etc.). The values of these comparisons are then averaged across large numbers of pairs to allow estimates of how much genetic variance affects phenotypic variance – the population heritability. While a specific rare mutation may only affect the phenotype within a single family, such mutations could, collectively, explain all of the heritability. Completely different sets of mutations could be affecting the trait or causing the disease in different families.
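To make this concrete, here is a toy simulation in Python. It is a sketch under stated assumptions, not a model of any real trait: every family carries its own private mutation, found in no other family, with an effect drawn at random; one parent is heterozygous, so each child inherits it with probability 1/2; and heritability is estimated with Falconer's textbook approximation, twice the difference between the identical-twin and fraternal-twin correlations. The function names, effect sizes and noise level are all invented for illustration, chosen so that the true heritability is 0.5.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_twin_pairs(n_families, identical, rng,
                        effect_sd=1.0, noise_sd=0.5 ** 0.5):
    """Phenotypes for one twin pair per family (hypothetical toy model).

    Each family has a single private mutation with its own effect size.
    Identical twins share carrier status; fraternal twins inherit it
    independently, each with probability 1/2.
    """
    first, second = [], []
    for _ in range(n_families):
        effect = rng.gauss(0.0, effect_sd)      # unique to this family
        carrier_1 = rng.random() < 0.5
        carrier_2 = carrier_1 if identical else rng.random() < 0.5
        first.append((effect if carrier_1 else 0.0) + rng.gauss(0.0, noise_sd))
        second.append((effect if carrier_2 else 0.0) + rng.gauss(0.0, noise_sd))
    return first, second


def falconer_h2(r_identical, r_fraternal):
    """Falconer's approximation: h^2 = 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_identical - r_fraternal)


rng = random.Random(42)
r_mz = pearson(*simulate_twin_pairs(20_000, identical=True, rng=rng))
r_dz = pearson(*simulate_twin_pairs(20_000, identical=False, rng=rng))
h2_estimate = falconer_h2(r_mz, r_dz)

print(f"identical-twin correlation: {r_mz:.2f}")
print(f"fraternal-twin correlation: {r_dz:.2f}")
print(f"estimated heritability:     {h2_estimate:.2f}")
```

No mutation in this toy population is shared between families, so no single one could account for more than a sliver of the variance across the population, yet the family-based estimate should come out close to the 0.5 built into the simulation.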

The next few years will reveal the true impact of rare mutations. We should certainly expect complex genetic interactions and some real effects of common polymorphisms. But the idea that our traits are determined simply by the combination of variants we inherit from a static pool in the population is no longer tenable. We are each far more unique than that.

11 Comments

My compliments to the writer of this piece, you have done a fine job of explaining genome diversity and I now understand it in a new light. Excellent science writing and I encourage you to expand this brief piece as more becomes known.

Henry, I am not sure what you mean by a very large “neutral” population – could you explain please? (And yes, it’s no surprise that rare variants are population-specific and also more responsible for disease, although there is still quite some resistance to the idea among many medical geneticists who have been pursuing genome-wide association studies.)

Sorry to be obscure, I just mean under neutrality. Standard coalescent theory predicts that the number of loci with i copies of a mutant will be proportional to 1/i. So the number of loci with 1, 2, and 3 copies of a mutant should be in the ratio of 1 to 1/2 to 1/3, etc.
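The 1/i rule in the comment above can be put into numbers with a short Python sketch. It assumes the standard neutral expectation that the number of sites with i mutant copies is proportional to 1/i; the function name and the sample of four chromosomes are hypothetical choices for illustration.

```python
from fractions import Fraction


def neutral_sfs_proportions(n_chromosomes):
    """Expected share of variable sites with i = 1 .. n-1 mutant copies,
    assuming counts proportional to 1/i (standard neutral expectation)."""
    weights = [Fraction(1, i) for i in range(1, n_chromosomes)]
    total = sum(weights)
    return [w / total for w in weights]


# With 4 sampled chromosomes the weights are 1 : 1/2 : 1/3.
proportions = neutral_sfs_proportions(4)
for copies, share in enumerate(proportions, start=1):
    print(f"{copies} mutant copies: {share} of variable sites")
```

Even in this tiny sample, variants seen only once make up more than half of all variable sites, and that share of rare variants stays large as the sample grows.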

I don’t know where I dug out that 1/2 in my note above: senescence perhaps.

Ha! Well, I saw that paper (http://www.ncbi.nlm.nih.gov/pubmed/21826061) and have read it and am trying to make sense of it. Seems like they have conclusively proved that intelligence is heritable. And nothing more. Of course we knew that already. I do not think their other conclusion – that it is massively polygenic – is supported by their analyses, but I’ve spent a good part of the last couple days trying to figure out exactly what they did and see if I am mistaken. What did you think of it?

[...] to it) distributed trait like height. When I first mention it to Kevin Mitchell at GNXP classic he dissented, saying they failed to establish the polygenic nature of the trait. He has yet to make a post of [...]

I would interpret these findings very differently. What the authors do is analyse GWAS data in a very unusual way – they are not interested in finding specific SNPs affecting the trait, they simply use the SNPs to measure genetic relatedness between individuals. (Well, they were interested in finding specific SNPs, but once they didn’t find any significant ones they turned to this other kind of analysis.) Razib, you say that the people in the study are unrelated but they are not – they are all from the same population and are distantly related. The study uses SNPs across the genome to measure this relatedness and then shows it correlates with phenotypic similarity – i.e., the trait is heritable. We knew that already.

What they claim is that you can break down this effect by chromosome or by subregion. When they use the SNPs along longer chromosomes they seem to get a bigger effect – “explaining more of the phenotypic variance”. The inference is that thousands of SNPs, scattered across the whole genome, contribute to the trait or, more specifically, to variance in the trait across the population (the implication is that they contribute to the value of the trait in individuals).

There is an alternative explanation for this effect, however, which is that using more SNPs simply gives a better estimate of genetic relatedness. So, the SNPs on chromosome 1 (the longest) give a better estimate than those on chromosome 21 (the shortest) – they index relatedness with more precision. As a result, they correlate better with phenotypic similarity – this looks like you have “explained more of the variance”. But getting such a signal from SNPs on chromosome 1 does not mean that any of the causal variants are actually on chromosome 1. Nor does the fact that such signals can be derived from anywhere in the genome mean that there are thousands of variants across the genome affecting the trait.

In fact, the authors can conclude very little from this study beyond a replication of the known fact that IQ is heritable.

They can say nothing about how many variants are involved across the population or how many affect the trait in each individual. Note that those could be very different from each other – you could have hundreds or thousands of genes affecting a trait across the population, but only one, two or a handful of variants affecting the phenotype in any individual. Nor can they say whether the causal variants are common or rare. One could expect different combinations of small numbers of different rare variants to be determining phenotype in different individuals. In fact, I would say that is exactly what one should expect. That is not the picture they try to sell in this paper, which is that the phenotype in any individual is determined by the combination of thousands of common variants.
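The alternative explanation about chromosome length can be illustrated with a toy simulation in Python. This is only a sketch, and not a reconstruction of what the authors did: I assume that no marker is causal, that each marker gives an independent and very noisy reading of a pair's true overall relatedness, and that phenotypic similarity depends only on that true relatedness. The marker counts (20 for a "short chromosome", 200 for a "long" one) and all the noise levels are made up.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def similarity_signal(n_markers, n_pairs=2000, seed=1):
    """Correlation between marker-estimated relatedness and phenotypic
    similarity across pairs of distantly related people (toy model)."""
    rng = random.Random(seed)
    estimated, phenotypic = [], []
    for _ in range(n_pairs):
        true_relatedness = rng.gauss(0.0, 0.01)
        # Each non-causal marker is a noisy reading of true relatedness;
        # averaging more markers gives a more precise estimate.
        readings = [true_relatedness + rng.gauss(0.0, 0.1)
                    for _ in range(n_markers)]
        estimated.append(sum(readings) / n_markers)
        # Phenotypic similarity depends only on true relatedness,
        # never on which markers were used to estimate it.
        phenotypic.append(true_relatedness + rng.gauss(0.0, 0.01))
    return pearson(estimated, phenotypic)


short_chromosome = similarity_signal(n_markers=20)
long_chromosome = similarity_signal(n_markers=200)

print(f"'variance explained' with  20 markers: {short_chromosome ** 2:.2f}")
print(f"'variance explained' with 200 markers: {long_chromosome ** 2:.2f}")
```

In this toy setup the larger marker set appears to explain more of the variance even though, by construction, none of the markers affects the trait.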