We describe a method that infers whether statistical dependences between two observed variables X and Y are due to a "direct" causal link or only due to a connecting causal path that contains an unobserved variable of low complexity, e.g., a binary variable. This problem is motivated by statistical genetics. Given a genetic marker that is correlated with a phenotype of interest, we want to detect whether this marker is causal or whether it merely correlates with a causal one. Our method is based on the analysis of the location of the conditional distributions P(Y | x) in the simplex of all distributions of Y. We report encouraging results on semi-empirical data.
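To make the geometry concrete: if X acts on Y only through a hidden binary variable Z, every P(Y | x) is a mixture of P(Y | Z=0) and P(Y | Z=1), so the estimated conditionals must lie close to a line segment in the simplex. The sketch below scores this collinearity; the SVD-based criterion is a simplification for illustration, not the paper's actual decision rule.

import numpy as np

def segment_score(counts):
    # counts[i, j]: number of samples with X = x_i and Y = y_j
    cond = counts / counts.sum(axis=1, keepdims=True)  # row i estimates P(Y | x_i)
    centered = cond - cond.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    # Fraction of variance outside the leading direction: near zero means the
    # conditionals are nearly collinear in the simplex, as a binary
    # intermediate variable would enforce.
    return (s[1:] ** 2).sum() / (s ** 2).sum()

A small score is then evidence for an unobserved binary variable on the path, while a clearly positive score points toward a direct link.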

This work addresses the following question: under what assumptions on the data-generating process can one infer the causal graph from the joint distribution? The approach taken by conditional independence-based causal discovery methods rests on two assumptions: the Markov condition and faithfulness. It has been shown that under these assumptions the causal graph can be identified up to Markov equivalence (some arrows remain undirected) using methods like the PC algorithm. In this work we propose an alternative based on Identifiable Functional Model Classes (IFMOCs). As our main theorem we prove that if the data-generating process belongs to an IFMOC, one can identify the complete causal graph. To the best of our knowledge this is the first identifiability result of this kind that is not limited to linear functional relationships. We discuss how the IFMOC assumption relates to the Markov and faithfulness assumptions and explain why we believe that the IFMOC assumption can be tested more easily on given data. We further provide a practical algorithm that recovers the causal graph from finitely many data points; experiments on simulated data support the theoretical findings.

Inferring the causal structure of a set of random variables from a finite sample of the joint distribution is an important problem in science. The case of two random variables is particularly challenging since no (conditional) independences can be exploited. Recent methods that are based on additive noise models suggest the following principle: whenever the joint distribution P(X, Y) admits such a model in one direction, e.g., Y = f(X) + N with N independent of X, but does not admit the reversed model X = g(Y) + Ñ with Ñ independent of Y, one infers the former direction to be causal (i.e., X → Y). Up to now, these approaches only dealt with continuous variables. In many situations, however, the variables of interest are discrete or even have only finitely many states. In this work, we extend the notion of additive noise models to these cases. We prove that it almost never occurs that additive noise models can be fit in both directions. We further propose an efficient algorithm that is able to perform this way of causal inference on finite samples of discrete variables. We show that the algorithm works on both synthetic and real data sets.
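A minimal sketch of the two-directions test for discrete data follows; the conditional-mode regression and the chi-squared independence test are illustrative stand-ins for the paper's actual search over functions and its significance test.

import numpy as np
from scipy.stats import chi2_contingency

def crosstab(a, b):
    ai = np.unique(a, return_inverse=True)[1]
    bi = np.unique(b, return_inverse=True)[1]
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=int)
    np.add.at(table, (ai, bi), 1)
    return table

def anm_pvalue(x, y):
    # Regression function: conditional mode of y given each value of x
    # (y is assumed to be coded as non-negative integers here).
    f = {v: np.bincount(y[x == v]).argmax() for v in np.unique(x)}
    resid = y - np.array([f[v] for v in x])
    # Large p-value: residuals look independent of the regressor.
    return chi2_contingency(crosstab(resid, x))[1]

# Infer X -> Y if anm_pvalue(x, y) is large while anm_pvalue(y, x) is small.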

Conditional independence testing is an important problem, especially in Bayesian network learning and causal discovery. Due to the curse of dimensionality, testing for conditional independence of continuous variables is particularly challenging. We propose a Kernel-based Conditional Independence test (KCI-test) by constructing an appropriate test statistic and deriving its asymptotic distribution under the null hypothesis of conditional independence. The proposed method is computationally efficient and easy to implement. Experimental results show that it outperforms other methods, especially when the conditioning set is large or the sample size is not very large, in which case other methods encounter difficulties.
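For illustration, here is a minimal sketch of a KCI-style statistic. The kernel widths, the regularization eps, and the scaling are placeholder choices, and turning the statistic into a p-value requires the asymptotic null distribution derived in the paper.

import numpy as np

def rbf(a, sigma):
    # a: array of shape (n, d)
    d2 = ((a[:, None, :] - a[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kci_statistic(x, y, z, sigma=1.0, eps=1e-3):
    # x, y, z: arrays of shape (n, d); null hypothesis is X indep. Y given Z.
    n = len(z)
    H = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    Kx = H @ rbf(np.c_[x, z], sigma) @ H             # kernel on (X, Z)
    Ky = H @ rbf(y, sigma) @ H
    Kz = H @ rbf(z, sigma) @ H
    Rz = eps * np.linalg.inv(Kz + eps * np.eye(n))   # regresses out Z
    return np.trace((Rz @ Kx @ Rz) @ (Rz @ Ky @ Rz)) / n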

Inferring the causal structure of a set of random variables from a finite sample of the joint distribution is an important problem in science. Recently, methods using additive noise models have been suggested to approach the case of continuous variables. In many situations, however, the variables of interest are discrete or even have only finitely many states. In this work we extend the notion of additive noise models to these cases. Whenever the joint distribution P(X, Y) admits such a model in one direction, e.g., Y = f(X) + N with N independent of X, it does not admit the reversed model X = g(Y) + Ñ with Ñ independent of Y, as long as the model is chosen in a generic way. Based on these deliberations we propose an efficient new algorithm that is able to distinguish between cause and effect for a finite sample of discrete variables. We show that this algorithm works on both synthetic and real data sets.

We propose two kernel-based methods for detecting the time direction in empirical time series. First, we apply a Support Vector Machine to the finite-dimensional distributions of the time series (classification method) by embedding these distributions into a Reproducing Kernel Hilbert Space. For the ARMA method, we fit the observed data with an autoregressive moving average process and test whether the regression residuals are statistically independent of the past values. Whenever the dependence in one direction is significantly weaker than in the other, we infer the former to be the true one. Both approaches were able to detect the direction of the true generating model for simulated data sets. We also applied our tests to a large number of real-world time series. The ARMA method made a decision for a significant fraction of them, in which it was mostly correct, while the classification method did not perform as well, but still exceeded chance level.
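A rough sketch of the classification method follows, under simplifying assumptions: the window length, the use of random Fourier features as an explicit stand-in for the RKHS mean embedding, and the labeling scheme are all illustrative choices.

import numpy as np
from sklearn.svm import SVC

W = np.random.default_rng(0).normal(size=(3, 50))   # shared random frequencies

def embed(series):
    # Sliding windows of length 3 approximate the finite-dimensional
    # distributions; the feature mean is a finite-dimensional stand-in for
    # the kernel mean embedding of the window distribution.
    windows = np.lib.stride_tricks.sliding_window_view(series, 3)
    proj = windows @ W
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1).mean(axis=0)

def fit_direction_classifier(train_series):
    # Forward copies get label +1, time-reversed copies get label -1.
    feats = [embed(s) for s in train_series] + [embed(s[::-1]) for s in train_series]
    labels = [1] * len(train_series) + [-1] * len(train_series)
    return SVC(kernel="rbf").fit(feats, labels)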

2009

In Proceedings of the 26th International Conference on Machine Learning, pages: 801-808, (Editors: A Danyluk and L Bottou and M Littman), ACM Press, New York, NY, USA, ICML, June 2009 (inproceedings)

Abstract

We propose a method that detects the true direction of time series by fitting an autoregressive moving average model to the data. Whenever the noise is independent of the previous samples for one ordering of the observations, but dependent for the opposite ordering, we infer the former direction to be the true one. We prove that our method works in the population case as long as the noise of the process is not normally distributed (in the latter case, the direction is not identifiable). A new and important implication of our result is that it confirms, for the case of time series, a fundamental conjecture in causal reasoning: if after regression the noise is independent of the signal for one direction and dependent for the other, then the former represents the true causal direction. We test our approach on two types of data: simulated data sets conforming to our modeling assumptions, and real-world EEG time series. Our method makes a decision for a significant fraction of both data sets, and these decisions are mostly correct. For real-world data, our approach outperforms alternative solutions to the problem of time direction recovery.
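A simplified sketch of the decision rule: a pure AR(p) fit and an uncalibrated HSIC-style score stand in for the paper's ARMA fit and significance test.

import numpy as np
from statsmodels.tsa.ar_model import AutoReg

def rbf_gram(v):
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2 * np.median(d2) + 1e-12))

def dependence(a, b):
    # HSIC-style score; the paper uses a proper independence test instead.
    n = len(a)
    H = np.eye(n) - 1.0 / n
    return np.trace(H @ rbf_gram(a) @ H @ rbf_gram(b)) / n ** 2

def direction_score(x, p=2):
    res = AutoReg(x, lags=p).fit().resid    # length len(x) - p
    past = x[p - 1 : len(x) - 1]            # value preceding each residual
    return dependence(res, past)

# A clearly smaller score for x than for x[::-1] suggests that the given
# ordering of the series is the true direction of time.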

The discovery of causal relationships between a set of observed variables is a fundamental problem in science. For continuous-valued data linear acyclic causal models are often used because these models are well understood and there are well-known methods to fit them to data. In reality, of course, many causal relationships are more or less nonlinear, raising some doubts as to the applicability and usefulness of purely linear methods. In this contribution we show that in fact the basic linear framework can be generalized to nonlinear models with additive noise. In this extended framework, nonlinearities in the data-generating process are in fact a blessing rather than a curse, as they typically provide information on the underlying causal system and allow more aspects of the true data-generating mechanisms to be identified. In addition to theoretical results we show simulations and some simple real data experiments illustrating the identification power provided by nonlinearities.
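To illustrate how nonlinearities are exploited in the bivariate case, here is a minimal sketch of the additive-noise direction test; Gaussian process regression and an HSIC-style dependence score are placeholder estimators, not prescribed by the paper.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def hsic(a, b):
    def gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        return np.exp(-d2 / (2 * np.median(d2) + 1e-12))
    n = len(a)
    H = np.eye(n) - 1.0 / n
    return np.trace(H @ gram(a) @ H @ gram(b)) / n ** 2

def residual_dependence(cause, effect):
    # alpha acts as a noise level; without it the GP would interpolate the
    # data and the residuals would be trivially small.
    gp = GaussianProcessRegressor(alpha=0.1).fit(cause[:, None], effect)
    return hsic(effect - gp.predict(cause[:, None]), cause)

# Infer X -> Y when residual_dependence(x, y) is clearly smaller than
# residual_dependence(y, x); in practice an independence test decides this.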

We propose a method for inferring the existence of a latent common cause ("confounder") of two observed random variables. The method assumes that the two effects of the confounder are (possibly nonlinear) functions of the confounder plus independent, additive noise. We discuss under which conditions the model is identifiable (up to an arbitrary reparameterization of the confounder) from the joint distribution of the effects. We state and prove a theoretical result that provides evidence for the conjecture that the model is generically identifiable under suitable technical conditions. In addition, we propose a practical method to estimate the confounder from a finite i.i.d. sample of the effects and illustrate that the method works well on both simulated and real-world data.
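A small generative sketch of the assumed model class (the functions f, g and the noise levels below are arbitrary illustrative choices): a scalar confounder T drives both effects, so the observations concentrate around a one-dimensional curve in the (X, Y) plane, and estimating the confounder amounts to recovering each sample's position along that curve.

import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(-1, 1, 500)                        # latent confounder T
x = np.sin(np.pi * t) + 0.1 * rng.normal(size=500)  # X = f(T) + N
y = t ** 3 - t + 0.1 * rng.normal(size=500)         # Y = g(T) + M
# Estimating the confounder then amounts to fitting a curve (f(t), g(t))
# through this cloud, e.g., with a principal-curve style method, and
# reading off each sample's coordinate along it.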

In Proceedings of the 26th International Conference on Machine Learning, pages: 745-752, (Editors: A Danyluk and L Bottou and M Littman), ACM Press, New York, NY, USA, ICML, June 2009 (inproceedings)

Abstract

Motivated by causal inference problems, we propose a novel method for regression that minimizes the statistical dependence between regressors and residuals. The key advantage of this approach to regression is that it does not assume a particular distribution of the noise, i.e., it is non-parametric with respect to the noise distribution. We argue that the proposed regression method is well suited to the task of causal inference in additive noise models. A practical disadvantage is that the resulting optimization problem is generally non-convex and can be difficult to solve. Nevertheless, we report good results on one of the tasks of the NIPS 2008 Causality Challenge, where the goal is to distinguish causes from effects in pairs of statistically dependent variables. In addition, we propose an algorithm for efficiently inferring causal models from observational data for more than two variables. The required number of regressions and independence tests is quadratic in the number of variables, which is a significant improvement over the simple method that tests all possible DAGs.
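As a sketch of the core idea (restricting the regression function to a cubic polynomial and fixing the kernel widths are simplifications made here for illustration):

import numpy as np
from scipy.optimize import minimize

def hsic(a, b, s=1.0):
    # HSIC-style dependence score with fixed RBF kernel width s.
    n = len(a)
    H = np.eye(n) - 1.0 / n
    Ka = np.exp(-(a[:, None] - a[None, :]) ** 2 / (2 * s ** 2))
    Kb = np.exp(-(b[:, None] - b[None, :]) ** 2 / (2 * s ** 2))
    return np.trace(H @ Ka @ H @ Kb) / n ** 2

def fit_dependence_minimizing(x, y):
    def objective(theta):
        return hsic(y - np.polyval(theta, x), x)
    # The problem is non-convex, so restarts from several initializations
    # are advisable; a least-squares fit gives one reasonable start.
    theta0 = np.polyfit(x, y, 3)
    return minimize(objective, theta0, method="Nelder-Mead").x

Unlike least squares, the objective rewards residuals that carry no information about the regressor, which is exactly the criterion additive noise model based causal inference relies on.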

2008
