Thursday, August 28, 2014

I’m sitting here on a smooth, quiet train from Zurich to Innsbruck, a few days after the mini-course that we taught in Helsinki. In this post I want to make a few reflections on things said by people reacting to Facebook or Twitter messages about the course, comments that were too short to do justice to what we actually said.

In particular, the issues have to do with the nature of genome mapping strategies and what they are or mean. There seems to be a good bit of confusion in this area, perhaps because of a lack of proper explanation of what these methods do, and why and how they work.

First, nobody should be doing mapping, looking for genes causally responsible for traits, unless they have some legitimate reason for believing that a trait is substantially affected by genes; that is, that variation in the trait, or in the risk of a trait like a disease, is causally associated with variation in a particular spot in the genome. Such a reason, at best, would be that the trait seems to segregate in families as if caused by a single Mendelian factor. If the evidence is weaker than that, as it so often is, then mapping becomes all the more problematic.

If we don’t know the part of the genome that affects the trait, then we use many measured variable sites, called markers, that span the genome, with the idea that wherever the causal site is, it will be near one of our markers. Essentially, we are searching for statistically significant associations between marker and trait, based on some basically subjectively chosen criterion, like a p-value cutoff, in samples that we believe are appropriate for detecting causal effects.
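
For readers who like to see the arithmetic, here is a minimal sketch in Python of the kind of single-marker test we mean: a Pearson chi-square test on a 2x2 table of allele counts in cases and controls. The counts and the function name `chi_square_2x2` are made up for illustration, and real mapping software does a good deal more (covariates, corrections for testing many markers, and so on).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for the table

        [[a, b],
         [c, d]]

    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented allele counts, for illustration only:
# cases carry 60 G and 40 T alleles; controls carry 45 G and 55 T.
chi2, p = chi_square_2x2(60, 40, 45, 55)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

With these invented counts the statistic is about 4.5 and the p-value is a bit over 0.03. Whether that counts as ‘significant’ depends entirely on the cutoff one chooses, which is the subjective part.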

What is perhaps not widely appreciated is the nearly essential way that such searches rely on evolutionary assumptions. We say ‘nearly’ because if one happens by huge luck to genotype the causal site itself, the test for association may be a bit more direct, as we’ll try to explain.

Mapping is based on evolutionary history

Evolution, or population history, generates the variation that causes the trait effect, and the variation we use as markers. Mutational events generating these variants occur when they occur, and we choose markers based on the idea that they vary in our chosen type of sample, and that the instances of a given marker allele (variant) are descendant copies of some original mutation. These instances of the same allele are said to be identical by descent (IBD) from that common ancestral copy. Sets of instances of the marker also mark nearby chromosomal regions that have been passed down the same chain of descent. That shared region is called a haplotype, and it gradually shortens over the post-mutation generations by a process called recombination.

If at some later time in the history of the haplotype ‘tagged’ by the marker variant another mutation occurs in a gene and alters that gene’s effects to generate the trait we are interested in, then the marker variant will be present in subsequent descendant copies of that twice-hit haplotype, and the causal signal will be associated with the presence of the marker variant. This is called linkage disequilibrium (LD), and is the reason that mapping works. That is, mapping works because of shared evolutionary (population) history of the marker and causal variants.
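
To make LD a little more concrete, one common way to quantify it is with a coefficient usually written D (the observed haplotype frequency minus what independence between the two sites would predict) and its scaled form r². The Python sketch below computes both from counts of the four possible haplotypes. The allele labels (marker M/m, causal C/c), the counts, and the function name are our own illustrative assumptions.

```python
def ld_stats(n_MC, n_Mc, n_mC, n_mc):
    """Return (D, r_squared) from counts of the four haplotypes.

    M/m are the two marker alleles and C/c the two alleles at the causal
    site (hypothetical labels chosen for this example).
    """
    total = n_MC + n_Mc + n_mC + n_mc
    if total == 0:
        raise ValueError("need at least one haplotype")
    p_MC = n_MC / total              # frequency of the M-C haplotype
    p_M = (n_MC + n_Mc) / total      # frequency of marker allele M
    p_C = (n_MC + n_mC) / total      # frequency of causal allele C
    d = p_MC - p_M * p_C             # zero when the two sites are independent
    denom = p_M * (1 - p_M) * p_C * (1 - p_C)
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


# Invented counts: C is found only on M-bearing chromosomes.
d_tight, r2_tight = ld_stats(30, 20, 0, 50)
# Invented counts: C is spread evenly across M and m chromosomes.
d_none, r2_none = ld_stats(15, 35, 15, 35)
print(d_tight, r2_tight, d_none, r2_none)
```

In the first invented sample the causal allele sits only on M-bearing chromosomes, so D is positive (0.15) and r² is about 0.43. In the second the two sites are independent and both measures are zero, which is the situation in which a marker tells us nothing about the causal site.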

A hypothetical, simple example

[I’m continuing this post a couple of days from when I started it on the train to Innsbruck, and now finishing it in a nice hotel in Old Town, overlooking the Inn River. Beautiful!]

Let’s say that we have a marker at which some people have a G nucleotide and others a T. And let’s say the disease causal site, D, is near the G/T site, and that the D mutation, wherever it is on the chromosome, occurred on a copy of the chromosome that has the G at the marker site. Then, what we hope is that the disease will be associated with the G: that noticeably more people with the disease will have the G than people without the disease. This is the kind of association between trait-cause and marker that mapping is looking for. But what can make it happen?

If we’re lucky, everyone with the D allele at the causal site will have the trait (the ‘D’ mutation is fully penetrant, as we’d say). And if there has been no recombination, and no other way to get the trait, then nobody with a T at the marker will also have the D variant, so none of the T-bearers will have the disease. In the cleanest scenario, cases will have the G and controls the T.

This sort of perfect association depends on when the D-mutation, wherever it is on the chromosome, occurred relative to the mutation that produced the T at the marker. We usually pick marker sites because we know that the variation (here, G vs T) is common in the population, and that means that the mutation is rather old. Enough generations have passed for there to be a substantial fraction of T-bearing and G-bearing people in the population.

If the ‘D’ mutation occurred right after this G-T marker’s mutation, then nearly all copies of the G variant at the marker will also carry the D variant. But if the trait-mutation occurred much later, then only a few of the G-bearing chromosomes will carry the trait-causing D variant. The association, even if true, will be weak. And if the D-site is far from the G-T marker site, there’s a trap: even if the D mutation occurred long enough ago for most G-bearers also to carry it, there will also have been enough time for recombination to switch the D-site onto a T-bearing marker chromosome. The G-D association will no longer be perfect.
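
A standard textbook approximation puts numbers on this trap. In a large, randomly mating population, the LD coefficient between two sites shrinks by a factor of (1 − c) each generation, where c is the recombination fraction between them, so that after t generations D_t = D_0 (1 − c)^t. The Python sketch below just evaluates that formula. The function name and the particular values of c and t are our own illustrative choices, not estimates from any real data.

```python
def ld_after(d0, recomb_fraction, generations):
    """Expected LD coefficient after some generations of recombination.

    Assumes an idealized large, randomly mating population with no
    selection or new mutation: D_t = D_0 * (1 - c) ** t.
    """
    if not 0.0 <= recomb_fraction <= 0.5:
        raise ValueError("recombination fraction must be between 0 and 0.5")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1.0 - recomb_fraction) ** generations


# Fraction of the original association left after 1000 generations
# (both recombination fractions are invented for illustration).
near = ld_after(1.0, 0.0001, 1000)  # marker very close to the causal site
far = ld_after(1.0, 0.01, 1000)     # marker a hundred times farther away
print(f"near marker: {near:.3f}, far marker: {far:.6f}")
```

With these made-up values, about 90% of the original association survives for the close marker, while for the distant one it is essentially gone. That is why an old causal mutation can only be tagged by a marker that sits very near it.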

Likewise, if there are many different causes of the trait, then some cases will not be due to the D-variant (tagged by the G-allele at the nearby marker), even if the latter really is also a cause. We’ll have cases with the T-marker variant, and in this case it’s not because of recombination. The more causes of the trait, the weaker the association between the trait and any specific marker, like the G-T one.
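
A bit of toy arithmetic shows how quickly this dilution bites. Suppose, purely as simplifying assumptions of ours, that every D-caused case carries the G (no recombination), that cases with other causes carry the G at its general population frequency, and that D is rare enough that controls also carry the G at about that frequency. Then the G frequency among cases is f + (1 − f)q, where f is the fraction of cases due to D and q is the population frequency of G. In Python, with an invented value of q:

```python
def g_freq_in_cases(frac_cases_due_to_d, pop_freq_g):
    """G-allele frequency among cases under the toy assumptions above."""
    for value in (frac_cases_due_to_d, pop_freq_g):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must be proportions between 0 and 1")
    f, q = frac_cases_due_to_d, pop_freq_g
    # D-caused cases all carry G; the rest carry G at the population rate.
    return f + (1.0 - f) * q


pop_freq_g = 0.4  # invented population (and control) frequency of G
for f in (1.0, 0.5, 0.1, 0.01):
    excess = g_freq_in_cases(f, pop_freq_g) - pop_freq_g
    print(f"fraction of cases due to D = {f:<4}  excess G in cases = {excess:.3f}")
```

Under these assumptions the excess of G among cases over controls is f(1 − q). So when D accounts for only 1% of cases, the difference is well under one percentage point, which takes a very large sample to detect.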

Science or cold fusion?

So mapping is a multiple-edged sword. Now, there are several ways to try to find trait-associated parts of the genome. One is called linkage mapping, another association mapping (genomewide association, or GWAS). And one can also think that causal sites can be found not by relying on linkage disequilibrium, but simply by looking for causal variants directly.

These various strategies have their strong and weak points, and there is equally strong disagreement as to which to apply when. That’s why someone can, sometimes sneeringly, claim that this or that approach is ‘cold fusion’: that it’s imaginary, and won’t or can’t work. But mapping for complex traits is not doing very well. As we’ve posted many times (and many others have repeatedly observed), we are usually explaining only a rather small if not trivial fraction of causation by mapping. So the issues are serious, regardless of the vested interests of those contending with them.
