
Friday, March 20, 2015

Science, causality and belief

Questions about cause and effect in the natural world have been bandied about since long before Aristotle's time. Aristotle, though, believed that the use of the idea of causality must be supported by theory. Without theory, he thought, a thorough and systematic investigation of the world, answering the 'why' questions, was not possible. He distinguished four kinds of cause:

The material cause: “that out of which”, e.g., the bronze of a statue.

The formal cause: “the form”, “the account of what-it-is-to-be”, e.g., the shape of a statue.

The efficient cause: “the primary source of the change or rest”, e.g., the artisan, the art of bronze-casting the statue, the man who gives advice, the father of the child.

The final cause: “the end, that for the sake of which a thing is done”, e.g., health is the end of walking, losing weight, purging, drugs, and surgical tools.

The efficient cause may be the only one of Aristotle's four types of cause that actually addresses our contemporary idea of cause and effect, explaining what produced an end result. While Aristotle's theorizing about causality is important in the history of science, when we think about cause and effect today we generally aren't thinking in terms of what a trait is made of, what it looks like, or what it is 'for', even if these may describe our trait. 'Because X' may answer why questions, but it doesn't generally lead to an understanding of how one event necessarily follows another, or allow us to predict future events, which is a large part of what interests us in science. So, Aristotle's efficient cause, 'the primary source' of the thing we're interested in, is probably the one category we still spend time and money on.

This week's episode of the BBC Radio 4 program In Our Time is a discussion of the life and thought of the 11th-century Islamic thinker Al-Ghazali. Al-Ghazali wrote about theology, jurisprudence, mysticism and philosophy, all interesting in their own right, but what struck us was the discussion of his thinking about causality. His book, The Incoherence of the Philosophers, changed Islamic thinking about epistemology. Al-Ghazali vehemently rejected the material explanations of causation of Aristotle and others, instead proposing that all causality was due to the Will of God.

The example given on In Our Time was of cotton burning when it touches fire. Must the cotton burn? The philosophers whom Al-Ghazali rejected would have said yes, but Al-Ghazali believed that the material world must be seen as based on God's relationship to the world. So, causality has to be viewed in relation to God, and from this perspective, the cotton could fail to burn because miracles are always possible. Or, because God makes everything happen, he could stop the burning. Or, yes, fire causes cotton to burn, but only because God intervenes in everything, and makes the cotton burn when it touches fire. It only looks as though there are physical causes to everything. In reality, God is the cause.

Of course, this view is easily dismissed by scientists, who believe in material causes. But science doesn't, in fact, always do much better. As was noted on In Our Time, David Hume, the 18th-century Scottish philosopher who played a central role in the Enlightenment (the approach to questions about the material world based on empirical, objective observation), had read Al-Ghazali and was influenced in part by what he read.

Hume would have said that, yes, we think that fire must burn cotton, but this is something we believe only out of habit. We've always seen cotton burn when it touches fire, but it's sloppy thinking to assume it will always burn. Indeed, he was a critic of this kind of inductive reasoning, the assumption that nature will continue to be regular and will always behave as we expect based on our past observations. This is the "all the swans we've seen are white, therefore all swans are white" kind of pure induction. The problem, of course, is that the first black swan we see destroys our generality about swans, and brings us back to square one. Or, in practice, we try to cling to our theory and explain the exception away (e.g., all swans are white; the black one is a different species or type).

An alternative to induction is deduction -- "All men are mortal, Socrates is a man, Socrates is mortal." But deduction relies on a starting generality that is assumed to be correct, and if it is not, the conclusion may be false as well, however valid the logic. So, both forms of reasoning have fundamental problems, but both are still in use, in part for lack of better alternatives, and in part because in practice they are often useful. If gravity is universal and based on distance and mass, then we can predict the motion of objects we have never seen before, or we can understand the motion of multiple objects. We use deduction in genetics by reasoning from assumptions about what genes are and do to infer what has genetically caused some outcome (e.g., an instance of a disease). But here, too often, our premises lack strong deductive rigor.

Since the 19th century and the birth of statistical approaches to evaluating the strength of association between risk factors and outcomes, though, in some ways science has sidestepped the problems with determining causality by relying on statistical tests rather than logical reasoning. But it can be problematic if we use statistics to formalize our beliefs about the world, and lend them a rigor they don't deserve, which can happen when the underlying assumptions being tested are wrong.

Ancestry testing is an example that comes to mind. Our DNA carries signatures of our geographic ancestry. So, you can send a saliva sample to an ancestry testing company, and get back an estimate of the proportion of your ancestors who were African, or European or Native American, say. This is based on the assumption that genetic variants that you carry are from these 'parental populations'. If instead the parental populations used were, say, Vanuatu, Mongolia and Tahiti, you'd be told the proportion of your ancestry that was from each of these ancestral populations.
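The dependence on the chosen parental populations can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the allele frequencies are random made-up numbers, not real data; the function names `estimate_ancestry` and `simulate` are hypothetical; and a coarse least-squares grid search stands in for whatever method a testing company actually uses. The individual is simulated from the first panel, yet the same procedure returns a tidy set of proportions summing to 1 for either panel.

```python
import random


def estimate_ancestry(dosages, panel, steps=20):
    """Return mixture weights (summing to 1) over the three populations in
    `panel` that best fit `dosages` by least squares on a coarse grid.

    dosages: allele counts (0, 1 or 2) per marker for one individual.
    panel:   {population name: list of allele frequencies per marker}.
    """
    names = list(panel)
    assert len(names) == 3, "this sketch handles exactly three populations"
    best_err, best_w = None, None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = (i / steps, j / steps, (steps - i - j) / steps)
            err = 0.0
            for k, d in enumerate(dosages):
                expected = sum(wt * panel[name][k] for wt, name in zip(w, names))
                err += (d / 2 - expected) ** 2
            if best_err is None or err < best_err:
                best_err, best_w = err, w
    return dict(zip(names, best_w))


def simulate(n_markers=300, seed=0):
    """Build two made-up reference panels and one individual drawn from panel A."""
    rng = random.Random(seed)

    def freqs():
        return [rng.uniform(0.05, 0.95) for _ in range(n_markers)]

    panel_a = {"African": freqs(), "European": freqs(), "Native American": freqs()}
    panel_b = {"Vanuatu": freqs(), "Mongolia": freqs(), "Tahiti": freqs()}
    true_w = {"African": 0.5, "European": 0.3, "Native American": 0.2}
    dosages = []
    for k in range(n_markers):
        p = sum(true_w[name] * panel_a[name][k] for name in panel_a)
        # Two allele copies per marker, each drawn with probability p.
        dosages.append(sum(rng.random() < p for _ in range(2)))
    return dosages, panel_a, panel_b


dosages, panel_a, panel_b = simulate()
print("Panel A estimate:", estimate_ancestry(dosages, panel_a))
print("Panel B estimate:", estimate_ancestry(dosages, panel_b))
```

Nothing in the output for the second panel signals that the reference populations were the wrong ones to ask about. The arithmetic is the same in both cases, and only our prior belief tells us which answer to take seriously.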

If done correctly, both sets of results would be statistically robust, but thoroughly dependent on the starting assumptions. In most cases, depending on your ancestry, you'd believe the first and not the second. This is because what we decide is true is often based on our a priori assumptions; this is either, as Hume would have said, a lazy way to identify truth, or it's the correct way, or at least the only way we have. But the important thing is to recognize that our choice of scientific truth is often based on belief, not rigorous scientific methods. Indeed, in the ancestry example, the methods were the same in each case.

We still do use inductive and deductive reasoning in science, of course. We might work under the assumption that, say, all complex traits have a genetic cause; asthma (or heart disease, or schizophrenia, etc.) is a complex trait; therefore asthma (or heart disease, or schizophrenia, etc.) must have a genetic cause. And we design our experiments and our analysis based on this reasoning.

Or, we believe that all observed cases of disease X are caused by a variant in gene Y; John has disease X, so John must have the causal variant in gene Y. And this informs how we carry out our work. And so on. Even though the starting, or concluding, generalizations may be false, this kind of decision-making about causation is done all the time, based on our beliefs about the arguments. We believe the causal link because we want to believe it.

Essentially, the statistical test we choose to apply, if passed by the data, is routinely interpreted as telling us that the putative factor (e.g., a genetic variant) causes our outcome... except when it doesn't. We don't put it this way; instead we say it influences the risk of the outcome. Sometimes we can defend this by a known mechanism, but all too often statistical association is assumed to mean causation. Does 'affects risk significantly' mean that it affects risk all the time, but only sometimes does the outcome occur? Or do we mean it sometimes affects risk or even causes the outcome, but we don't know which times that is? Or something different from these alternatives? Statistical tests rest on subjective choices such as the significance cutoff, correlation is not the same as causation, and we have very solid reasons to know that many traits have a great many causes, each individually so weak that it will not pass statistical tests and hence remains elusive.

So, in some ways, our understanding of causality isn't measurably better than Al-Ghazali's, when he invoked the invisible hand of God. Ours invokes generalities that are based on our beliefs about cause and effect, with statistical testing lending them a formality that may or may not be valid. The point is not that our working assumptions are false, but that we don't have a way to reliably determine whether or not they are. This also shows why the widely touted criterion of 'falsification' is itself faulty (or false?). We can seem to falsify a hypothesis because of problems in our sample, or data, or the computer program we used to analyze it. Maybe the black swan really isn't a swan!

Indeed, the weaknesses of our approaches suggest that they are false or at least misleading in some fundamental way that we have not yet understood -- or perhaps we're not asking the right sorts of questions, because what we want to know is determined by us, not by Nature. We can never know when there's a black swan we haven't yet seen. Statistical sophistication gives us the illusion that we're past this, and we all know that correlation doesn't imply causation. But sometimes correlation does in fact imply causation, and we still don't really know how to determine when.
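The point made above about individually weak causes failing statistical tests can be given rough numbers. The sketch below is an approximation with hypothetical inputs, not an analysis of any real study: it uses the Fisher z-transform to approximate the power of a two-sided test of zero correlation, and assumes 1,000 people, 200 variants tested with a Bonferroni-corrected cutoff, and a weak variant whose correlation with the trait is 0.05. The function name `approx_power` is ours.

```python
from math import atanh, sqrt
from statistics import NormalDist


def approx_power(r, n, alpha):
    """Approximate power of a two-sided test of zero correlation when the true
    correlation is r, using the Fisher z-transform (a normal approximation)."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)   # cutoff the test statistic must exceed
    shift = atanh(r) * sqrt(n - 3)         # expected value of the test statistic
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)


n_people = 1000                # hypothetical sample size
n_variants = 200               # hypothetical number of variants tested
alpha = 0.05 / n_variants      # Bonferroni-corrected significance cutoff

weak = approx_power(0.05, n_people, alpha)    # one of many weak causes
strong = approx_power(0.30, n_people, alpha)  # a single strong cause

print(f"chance of detecting a weak cause:   {weak:.3f}")
print(f"chance of detecting a strong cause: {strong:.3f}")
```

Under these assumptions a real but weak cause is detected only a few percent of the time, while a strong one is detected almost always. Failing the test therefore says little about whether a weak factor is causal.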

4 comments:

The 'environment' of every point along the genome includes the rest of the genome, the cell, the organ, the body, and the outside. So in that sense, no, every site depends on the environment for what it does.

But if you really mean just the external environment then, yes, most of the genome's functions have to do with things inside the cell or among cells, that are not directly related to the external environment, except in the very generic sense that there must be oxygen, water, food, and so on.

Polygenic traits are almost always modeled as if they are additive. But we know that is generally not true, in that interactions are the basis of all the effects. It is essentially intractable, though, to deal with more than one or a very few interactions, and then only strong and consistent ones. Interactions with the environment clearly happen too, probably between a given dose of an environmental effect and a genetic variant (here, one needs to be clear what 'environment' means, since one gene's environment includes the rest of the genome, the cell, the organ, the individual and the external environment).
