On genetic causality: forwards and backwards

Genetics is getting more complicated.
Previously clear and strong links between particular mutations and particular
diseases are becoming muddied and weaker with increasing knowledge. Such
mutations were usually initially identified in families with a heavy burden of
illness, where the mutation segregated clearly with illness. But with our
increasing ability to sequence large numbers of people, we are now seeing that
many such mutations have a much more variable presentation.

Even classically “Mendelian” mutations,
such as those causing cystic fibrosis and Huntington’s disease, are subject to
modifying effects in the genetic background. The same mutation in one person
may not cause the same symptoms or disease progression in another. And for more
complex “disorders”, such as autism, epilepsy or schizophrenia, these effects
are far more pervasive. Even in cases where a primary mutation is identifiable,
there may often be additional genetic factors that strongly influence the
phenotype (not to mention intrinsic developmental variation, environmental
factors and personal experiences, which may all also have a very large
influence). Many such mutations can often be found in individuals without any
clinical diagnosis. And in many cases, a disease may emerge due to non-additive interactions between multiple mutations, none of which can be said to be
primary.

Given these complexities (even for
Mendelian disorders), several commentators, including Anne Buchanan and Ken
Weiss, here, and Gholson Lyon, here, have recently questioned the validity of
the whole idea of making definitive, categorical genetic diagnoses based on
single mutations. Both pieces make excellent and valid points.

Buchanan and Weiss have argued,
convincingly, that the highly variable effects of many specific mutations make
them almost useless for prediction of disease based on genotype. While I agree
completely about the inherent complexity of relating single genotypes to
phenotypes (as discussed here), I think it is important not to throw the baby
out with the bathwater. In particular, a clear distinction should be drawn between
explanation and prediction, as the probability relationships are entirely
different in these two directions.

This can be illustrated with a couple of
examples of specific mutations that increase risk of neurodevelopmental
disorders. Most mutations associated with these conditions show “incomplete penetrance” – that simply means that not everyone who carries the mutation
develops the disease (or, more accurately, not all carriers are given the
diagnosis). For example, about 30% of carriers of a chromosomal deletion at 22q11.2 develop psychosis and would meet criteria for a diagnosis of
schizophrenia. This is a hugely increased risk over the baseline population
rate of ~1%, but obviously still far from a majority of carriers.

[As an aside, it is important to note that the
value determined for the penetrance depends entirely on what phenotype we are
assessing. If it is whether the individual has been given a diagnosis of
schizophrenia, then it is around 30% for 22q11.2 deletions. But if it includes
clinically determined intellectual disability, developmental delay or autism,
then the penetrance approaches 100%. Indeed, a recent study found general effects on cognition even in clinically “unaffected” carriers of this and many
other recurrent chromosomal aberrations only sometimes associated with frank
disease].

What can we say, based on these numbers? For
prediction, we are asking, given the presence of mutation X, what is the
likelihood of disease Y? The only thing we can currently base that on is the
frequency of disease in carriers of a given mutation. To follow the example
above, given the presence of a 22q11.2 deletion, the risk of developing
schizophrenia is 30%. Other known mutations associated with neurodevelopmental
disorders have differing penetrance – for example, only ~6% of carriers of an
NRXN1 deletion develop schizophrenia and only a third are clinically affected
overall (versus nearly 100% of 22q11 deletion carriers).

Those numbers make predictions of the
prognosis of individual mutation-carriers pretty fuzzy. With a disease like
schizophrenia, this kind of prediction is clinically important as there may be
methods to intervene during pre-morbid or prodromal phases of the illness,
prior to the onset of frank psychosis and the full clinical syndrome. But current
medical interventions in individuals at high risk of developing psychosis
employ the crude hammer of antipsychotic medication, with all the attendant
downsides and potentially serious side-effects – not something to be taken
lightly or administered without strong justification.

On the other hand, risks of the magnitude
referred to above may well represent actionable information in terms of
prenatal screening and reproductive decisions.

Nevertheless, predictions based on genetic
information will remain drastically underpowered until we reach a point where
the risk associated with an individual’s entire genome-type, and not just with
a single mutation, can be assessed. Making predictions is hard, especially
about the future (Niels Bohr or Yogi Berra, depending on who you ask).

But what about going in the opposite
direction? This is really a very different situation. If we find an individual
with disease Y and with mutation X, can we infer that the mutation is the cause
of the disease? Here, we start with two givens (two rare events) and want to
infer the likely relationship between them (based on their known contingency).
So, if we have a patient with schizophrenia and a test shows they carry a
22q11.2 deletion, how strongly can we infer that that deletion is the primary
cause of their illness?

I suppose there is a fancier statistical
way to do this, but naïvely, we can say that if that person did not have that
mutation, their likelihood of having schizophrenia would only have been ~1%
(given no other relevant information). So, I think it right to say,
intuitively, that it is 30-fold more likely that their disease was caused by
the 22q11 deletion than by some other, unknown factor. We can put more definite numbers on this as
follows:
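
A plausible way to write this down, reconstructed from the figures quoted in the next paragraph rather than taken from any original expression, is the share of risk among carriers that exceeds the baseline. The symbols $D$ (a diagnosis of schizophrenia) and $M$ (carrying the mutation) are labels introduced here for illustration only:

```latex
\[
P(\text{mutation was causal} \mid D, M) \;\approx\; \frac{P(D \mid M) - P(D)}{P(D \mid M)}
\]
\[
\text{22q11.2: } \frac{0.30 - 0.01}{0.30} = \frac{29}{30} \approx 96.7\%
\qquad
\text{NRXN1: } \frac{0.064 - 0.01}{0.064} = \frac{54}{64} \approx 84.4\%
\]
```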

The P(A|B) notation means the probability
of A, given B, which we are going to compare to the prior probability of A,
given no knowledge of B. Because we take the presence of the mutation as a
given, these calculations should be independent of the frequency of the
mutation (I think). For 22q11 deletions, this fraction comes to 29/30, which
corresponds to about a 96.7% probability. For NRXN1 deletions, the penetrance
is much lower – 6.4% vs 1% baseline – but the inference of causality still
comes out to 84.4%. (Another way to word this is, if we take 1000 individuals
with NRXN1 deletions, we would expect 64 to have schizophrenia. But 10 of those
would be expected anyway, so we can say the increased burden in this group,
which we can equate to the likelihood of causality of the NRXN1 mutation in any
individual is 54/64 = 84.4%).

I feel like I may have just committed some egregious
statistical sin with the way that last statement is worded, but it’s not that
important. Those calculations are very naïve (and not something any clinical
geneticist actually carries out), but I think they capture the general
intuition – if the known penetrance of a mutation for a particular disease is
higher, then the inference of causality is stronger when you find someone with
both the disease and the mutation. They also illustrate a surprising result:
even in cases where predictive power is quite low (only about 6%), post hoc
explanatory power may still be quite high – because now we’re given the
presence of disease, an otherwise rare event.
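
The contrast between the two directions can be sketched in a few lines of Python. This is only an illustration of the naïve calculation described above, using the penetrance and baseline figures quoted in the post; the function and variable names are invented for the example and do not come from any clinical tool.

```python
# Figures quoted in the post; all names here are illustrative assumptions.
BASELINE_RISK = 0.01  # approximate population rate of schizophrenia

PENETRANCE = {
    "22q11.2 deletion": 0.30,
    "NRXN1 deletion": 0.064,
}


def predictive_risk(penetrance: float) -> float:
    """Forwards: the chance that a carrier develops the disease."""
    return penetrance


def explanatory_fraction(penetrance: float, baseline: float = BASELINE_RISK) -> float:
    """Backwards: share of affected carriers in excess of the baseline expectation."""
    if not 0.0 < baseline <= penetrance <= 1.0:
        raise ValueError("expected 0 < baseline <= penetrance <= 1")
    return (penetrance - baseline) / penetrance


if __name__ == "__main__":
    for name, p in PENETRANCE.items():
        print(
            f"{name}: prediction {predictive_risk(p):.1%}, "
            f"explanation {explanatory_fraction(p):.1%}"
        )
```

With these inputs the forward risk for NRXN1 stays at about 6%, while the backward fraction is about 84%, which is the asymmetry the paragraph above describes.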

[This is somewhat analogous to interpreting
medical tests in a Bayesian framework, by comparing the false positive rate to
the underlying prevalence of the condition being screened for (the prior
probability) – see here for a great example of this counter-intuitive effect, in the context of autism].
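
To make the analogy concrete, here is a minimal sketch of the standard Bayesian calculation for a screening test. The prevalence, sensitivity and false positive rate used in the usage lines are hypothetical numbers chosen for illustration; they are not taken from the linked autism example.

```python
def positive_predictive_value(
    prevalence: float, sensitivity: float, false_positive_rate: float
) -> float:
    """Probability of having the condition given a positive test (Bayes' rule)."""
    for value in (prevalence, sensitivity, false_positive_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be probabilities between 0 and 1")
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * false_positive_rate
    total_positives = true_positives + false_positives
    if total_positives == 0.0:
        raise ValueError("no positive results are possible with these inputs")
    return true_positives / total_positives


if __name__ == "__main__":
    # Hypothetical test: 90% sensitivity and a 5% false positive rate.
    rare = positive_predictive_value(0.01, 0.90, 0.05)
    common = positive_predictive_value(0.50, 0.90, 0.05)
    print(f"rare condition (1% prevalence): {rare:.1%}")
    print(f"common condition (50% prevalence): {common:.1%}")
```

The same test is far less informative when the condition is rare, because false positives from the large unaffected group swamp the true positives.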

Now, when we use a word like “cause” we are
wading into some treacherous philosophical waters. When I use it here, I do not
mean that the presence of the mutation is a sufficient
cause of the illness, nor is it a complete explanation of the person’s
phenotype. But calculations of the type shown above give a value to the strength
of the inference that a particular mutation was a necessary condition for the emergence of illness in that
individual. They allow us to assign a probability to the idea that, of all the
factors and events that led to illness in this person, the presence of the
mutation was a difference-maker. It
was the main culprit, even if there were multiple accomplices.

This is not causality in a reductive sense
(where a single cause fully explains the entire phenotype), but in a
counterfactual sense (where a single difference explains a difference in the
phenotype – in this case, developing disease versus not developing it). It
says, if cause X had not been the case, then phenotype Y would not have arisen.
For cases like cystic fibrosis and Huntington’s disease, this inference is rock
solid – these disorders do not arise without mutations in the CFTR gene or the
Htt gene (even if the disease symptoms and progression can be affected by
modifying mutations in other genes). For examples like the mutations listed
above that lead to common neurodevelopmental disorders, where there are
multiple causes across the population, the best we can do is assign a
probability of causal involvement for any particular potentially pathogenic
mutation discovered, based on rates of illness across many carriers of that
mutation, compared to the baseline rate.

At least, that’s usually the best we can do
for humans – we can do a lot better in animal models that are amenable to
experimental manipulation. When worm or fly or mouse geneticists map and
identify a mutation that they think is causing a particular phenotype, they can
do two different experiments to test that hypothesis. First, they can introduce
the same mutation into a different animal and see if it reproduces the
phenotype. And second, they can repair the mutation in the initial line of
animals and see if it rescues the phenotype.

Obviously we can’t do those kinds of things
in humans, but we can approach those kinds of experimental tests of causality
in two ways. First, we can introduce the putatively causal mutation into an animal
and see if it recapitulates known aspects of the disease phenotype (in an animal sense). This is very indirect and suffers from many caveats (especially
in knowing which phenotypes to look for and in interpreting negative results)
but a positive result in some validated assay does give some confidence that
the suspect mutation is having an important and relevant effect.

The second approach relies on two fairly
new technologies – the first is the development of induced pluripotent stem
cells (iPS cells) from human patients. These can be differentiated in a dish
into many different cell types and tissues, which can be tested for
cellular-level phenotypes relevant to the function of the damaged gene. This
system is obviously highly simplified and far from ideal, especially for
disorders that manifest at a physiological or even psychological level, but
even in those cases, they must arise initially from changes in the way cells
function and these may be definable if we can assay the right cell types in the
right ways.

Testing causality of a particular mutation
for any such phenotype in a patient’s cells can now be achieved using an even
newer technology: the CRISPR method of genome editing. This uses an RNA guide
molecule to direct an enzyme to cut the DNA in the genome at a specific
position (with astonishingly, game-changingly high efficiency). If a non-mutant
template is supplied, this break will be repaired in such a way as to change
the sequence of DNA in that region, providing the means to revert a mutation to
the “wild-type” version. Then one can determine whether it was really that
single mutation that led to the cellular phenotype or, alternatively, if it was
not involved at all or only one of many factors contributing. (Exciting proof
of principle of this approach was recently provided in a mouse model of cataracts and in cultured intestinal stem cells from cystic fibrosis patients).

Now, for most diseases, we don’t currently have
good animal models or proxies at the cellular level. But there is an analogous
approach to the rescue experiment that can be performed in humans for some
conditions – that is to treat with a medication that targets the candidate
pathogenic molecular mechanism. If the patient improves, then we can conclude
that that mutation was in fact making a major contribution to their illness. This
is the “House, M.D.” method of confirming a diagnosis (it’s never lupus).

Of course, for most mutations, no such specifically
tailored medication currently exists. But there are a few exceptions for
neurodevelopmental disorders. Fragile X syndrome is one – this condition is a
common cause of autism, accounting for 2-3% of cases. Research over several decades has established the nature of the molecular defect in Fragile X
patients and the cellular consequences in how nerve cell synapses work, and is
beginning to elucidate the emergent physiological consequences on neural
networks and brain systems. This detailed knowledge has led to the
identification of candidate cellular components that can be targeted to restore
the balance of the biochemical pathway affected by the Fragile X mutation. This
approach shows great promise in animal models of the disorder and is currently
in clinical trials.

Tuberous sclerosis is another genetic
condition also often associated with symptoms of autism. It is caused by
mutations in either one of two other genes, which also encode proteins that
function in synapses. However, when these genes are mutated the biochemical
defect is the opposite of that when the Fragile X gene is mutated. It turns out
that if this pathway is either too active or not active enough, the functions
of neural synapses are impaired, especially in how they change in response to
activity. Either situation can lead to autism. In mice, crossing Fragile X
mutants with tuberous sclerosis mutants actually restores the balance of this
pathway and the resultant double mutants are much more normal than either
single mutant alone.

So, if a child comes into a clinic with
symptoms of autism, it is important to know if they have mutations in Fragile X
or the tuberous sclerosis genes because the medication that may prove
beneficial for Fragile X patients would be likely to exacerbate symptoms in those
with tuberous sclerosis mutations. (And, of course, there are hundreds of other
potential causes of autistic symptoms that may also respond differently or not
at all).

But even for cases where no targeted
medication exists, the identification of a putatively pathogenic mutation can
still inform clinical treatment. Once a large enough database is generated,
clinicians will be able to ask how patients with different mutations respond to
currently available medications. Perhaps schizophrenic people with 22q11.2
deletions respond better to typical antipsychotics than people with NRXN1
deletions. Or maybe some medications should be avoided in the presence of
certain mutations – that is the case for mutations in a sodium channel gene,
which are associated with Dravet syndrome, a severe form of epilepsy. Patients
with these mutations should generally not be treated with traditional anticonvulsants that
block sodium channels, as these are known to worsen their seizures.

To avoid semantic arguments, we should
probably just drop the term “genetic diagnosis” and replace it with “genetic
information”. I agree completely that a genetic diagnosis will often be too
categorical and definitive, conferring a label based only on one component of a
person’s genetic make-up, which may in turn be only one factor in their
disease. But despite these complexities, the identification of major mutations still
provides very useful genetic information that will often be relevant to the
patient’s prognosis and treatment.

Comments

Nice post, Kevin. We agree with you that genetics is getting more complicated. Though, as we said the other day in a post, while Mendelian diseases have gotten more complex, people too often treat complex diseases as though they should be simple. And, yes, asking whether an individual will get a particular disease given a particular genotype can be very different from asking why an individual has a particular set of symptoms. Both questions are sometimes answerable, but for only a subset of alleles and diseases, usually rare and often applicable to multiply affected families. The second is unlikely ever to be answerable for diseases that involve a strong environmental component.

Yes, I agree. The post deals with cases where we do find some reasonably penetrant mutation. (But does illustrate that even ones with fairly low penetrance can be informative in patients with an otherwise rare disease). Of course, in many cases we may not find any such allele, and it certainly may not always be the case that there is a single, "primary" mutation. My own view on complex disorders is that they are indeed complex, but not in the way that many people have thought. See here, What is Complex about Complex Disorders? http://genomebiology.com/2012/13/1/237/

I just want to thank you for this wonderful blog. As an undergraduate student in psychology (with a background in philosophy) I appreciate the approach to psychology this blog represents - that is an approach that exists across the levels of explanation. By making genetics accessible you have changed the way I think about psychology and the brain improving how I approach topics. Moreover, your articles are just thoroughly enjoyable to read. Thanks again.
