
Experimental Indistinguishability of Causal Structures

Frederick Eberhardt

fde@cmu.edu

Abstract: Using a variety of different results from the literature, we show how causal discovery with experiments is limited unless substantive assumptions about the underlying causal structure are made. These results undermine the view that experiments, such as randomized controlled trials, can independently provide a gold standard for causal discovery. Moreover, we present a concrete example in which causal underdetermination persists despite exhaustive experimentation, and argue that such cases undermine the appeal of an interventionist account of causation as its dependence on other assumptions is not spelled out.

1. Introduction

Causal search algorithms based on the causal Bayes net representation (Spirtes et al. 2000; Pearl 2000) have primarily focused on the identification of causal structure using passive observational data. The algorithms build on assumptions that connect the causal structure, represented by a directed (acyclic) graph among a set of vertices, with the probability distribution of the data generated by the causal structure. Two of the most common such bridge principles are the causal Markov assumption and the causal faithfulness assumption. The causal Markov assumption states that each causal variable is probabilistically independent of its (graphical) non-descendants given its (graphical) parents. Causal Markov enables the inference from a probabilistic dependence between two variables to a causal connection, and from a causal separation to a statistical independence. The precise nature of such causal separation and connection relations is fully characterized by the notion of d-separation (Geiger et al. 1990; Spirtes et al. 2000, 3.7.1).

The causal faithfulness assumption can be seen as the converse to the Markov assumption. It states that all and only the independence relations true in the probability distribution over the set of variables are a consequence of the Markov condition. Thus, faithfulness permits the inference from probabilistic independence to causal separation, and from causal connection to probabilistic dependence.

Together, causal Markov and faithfulness provide the basis for causal search algorithms based on passive observational data. For the simplest case they are combined with the assumption that the causal structure is acyclic and that the measured variables are causally sufficient, i.e. that there are no unmeasured common causes of the measured variables. For example, given three variables x, y and z, if we find that the only (conditional or unconditional) independence relation that holds among the three variables is that x is independent of z given y, then causal Markov and faithfulness allow us to infer that the true causal structure is one of those presented in Figure 1.

Figure 1: The three causal structures over x, y and z that imply exactly the independence of x and z given y: x → y → z, x ← y ← z, and x ← y → z.

Causal Markov and faithfulness do not determine which of the three causal structures is true, but this underdetermination is well understood for causal structures in general. It is characterized by so-called “Markov equivalence classes” of causal structures. These equivalence classes consist of sets of causal structures (graphs) that have the same independence and dependence relations among the variables. The three structures in Figure 1 are one such equivalence class. There are causal search algorithms, such as the PC-algorithm (Spirtes et al. 2000), that are consistent with respect to the Markov equivalence classes over causal structures. That is, in the large sample limit they return the Markov equivalence class that contains the true causal structure.

To identify the true causal structure uniquely there are two options: One can make stronger assumptions about the underlying causal model, or one can run experiments. Here we will first focus on the latter, to then show that one cannot really do without the former.

We will take an experiment to consist of an intervention on a subset of the variables under consideration. While there are a variety of different

the error distributions on the variables are non-Gaussian, then the causal structures can also be uniquely identified. A set of causal variables is related linearly when the value of each variable is determined by

as would be obtained by not making the assumptions about the causal relations, but instead running a set of single-intervention experiments.

In either case, whether by strengthening assumptions or using experiments, the results rely on the assumption of causal sufficiency – that there are no unmeasured common causes. In many discovery contexts it is implausible that such an assumption is appropriate. Moreover, part of the rationale for randomized controlled trials in the first place was that a randomization makes the intervened variable independent of its normal causes, whether those causes were measured or not. Thus, if there is an unmeasured common cause u – a confounder – of x and z, then randomizing x would break the (spurious) correlation observed between x and z that is due to the confounder u. However, without the assumption of causal sufficiency, underdetermination returns despite the possibility of experiments.
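The rationale for randomization can be made concrete with a small simulation (again a hypothetical linear-Gaussian parameterization; all coefficients are made up): when x passively listens to the unmeasured u, x and z are spuriously correlated; once x is randomized, the correlation disappears even though u remains unmeasured.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
u = rng.normal(size=n)                  # unmeasured common cause of x and z

# Passive observation: x depends on u, so x and z are spuriously correlated.
x_obs = 0.9 * u + rng.normal(size=n)
z_obs = 0.9 * u + rng.normal(size=n)    # z depends on u only, not on x
c_obs = np.corrcoef(x_obs, z_obs)[0, 1]

# Randomized trial: x is assigned independently of u (here: fresh noise).
x_do = rng.normal(size=n)
z_do = 0.9 * u + rng.normal(size=n)
c_do = np.corrcoef(x_do, z_do)[0, 1]

print(round(c_obs, 2), round(c_do, 2))  # spurious correlation vs. near zero
```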

Figure 2: Structure 1: x → y, y → z, x → z, with unobserved u → x, u → z and v → y, v → z. Structure 2: the same, but without the x → z edge.

In Figure 2, x, y and z are observed (and can be subject to intervention), while u and v are unobserved. If only causal Markov, faithfulness and acyclicity are assumed, the two causal structures in Figure 2 cannot be distinguished by any set of experiments that intervene on only one variable in each experiment (or by a passive observation).

Since u and v are not observed, no variable is (conditionally) independent of any other variable under passive observation. The same is true when x is subject to an intervention, even though the surgical intervention would break the influence of u on x: x is not independent of z conditional on y, since conditioning on y induces a dependence via v (conditioning on a common effect makes the parents dependent). In an experiment intervening on y only, x and y are independent, but x and z remain dependent for both causal structures (because of u in Structure 2, and because of u and the direct effect x → z in Structure 1).

In an experiment intervening on z, the edge x → z that distinguishes the two causal structures is broken, so both structures inevitably have the same independence and dependence relations. The problem is that no set of single-intervention experiments is sufficient to isolate the x → z edge in Structure 1, and so the underdetermination remains.

This underdetermination can, of course, be resolved: If one could intervene on x and y simultaneously, then x would be independent of z if the second structure is true, but dependent if the first is true. So, assuming only causal Markov, faithfulness and acyclicity, the two causal structures are experimentally indistinguishable for single-intervention experiments, but distinguishable for double-intervention experiments.
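The single- versus double-intervention contrast can be checked numerically. The sketch below assumes a hypothetical linear-Gaussian parameterization of the two structures in Figure 2 (all coefficients are invented; the paper's own examples in the appendices are discrete): under an intervention on x alone, x and z are correlated in both structures, while under a simultaneous intervention on x and y only Structure 1 retains the correlation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

def simulate(direct_xz, do):
    """One experiment on a linear-Gaussian version of Figure 2.
    direct_xz: include the x -> z edge (Structure 1) or not (Structure 2).
    do: set of observed variables replaced by surgical interventions."""
    u, v = rng.normal(size=n), rng.normal(size=n)   # unobserved confounders
    x = rng.normal(size=n) if 'x' in do else 0.8 * u + rng.normal(size=n)
    y = rng.normal(size=n) if 'y' in do else 0.8 * v + 0.8 * x + rng.normal(size=n)
    z = (0.8 * u + 0.8 * v + 0.8 * y
         + (0.5 * x if direct_xz else 0.0) + rng.normal(size=n))
    return x, z

results = {}
for direct_xz in (True, False):
    x, z = simulate(direct_xz, do={'x'})
    single = np.corrcoef(x, z)[0, 1]     # nonzero for BOTH structures
    x, z = simulate(direct_xz, do={'x', 'y'})
    double = np.corrcoef(x, z)[0, 1]     # nonzero only with the x -> z edge
    results[direct_xz] = (single, double)
    print(direct_xz, round(single, 2), round(double, 2))
```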

How does this generalize to arbitrary causal structures? The resolution of the underdetermination of the causal structures in Figure 2 depended on an experiment that intervened on all but one variable simultaneously. This is true in general: Assuming causal Markov, faithfulness and acyclicity, but not causal sufficiency, there exist at least two causal structures over N variables that are indistinguishable on the basis of the independence and dependence structure for all experiments that intervene on at most N-2 variables, where N is the number of observed variables. That is, at least one experiment intervening on all but one variable is necessary to uniquely identify the true causal structure. In fact, the situation is worse, because a whole set of experiments, each intervening on at least N-i variables, for each integer i in 0 < i < N, is in the worst case necessary to ensure the underdetermination is resolved (see Appendix 1 for a proof). So, even when multiple simultaneous interventions are possible, a large number of experiments, each intervening on a large number of variables simultaneously, is necessary to resolve the underdetermination.

Again, one need not pursue this route. One could instead strengthen the search space assumptions. Part of why single-intervention experiments were not sufficient to resolve the underdetermination of the causal structures in Figure 2 is that independence tests are a general, but crude, tool of analysis. Combined with causal Markov and faithfulness, independence tests indicate whether or not there is a causal connection, but do not permit a more quantitative comparison that can separate the causal effect along different pathways. If one could separate the causal effect of the x → y → z pathway from the direct causal effect of x → z in the structures in Figure 2, then the two causal structures could be distinguished. In general such a separation of the causal effect along different pathways is not possible, since causal relations can be interactive. When causal effects interact, the causal effect of variable A on another variable B depends on the values of B's other causes. As a trivial example, a full gas tank has no effect on the motor starting when the battery is empty. But when the battery is full, it makes a big difference whether or not the tank is empty.

edge is c. So-called trek-rules state that the correlation between two variables in a linear model is given by the sum-product of the correlations along the (active) treks that connect the variables. That is, if the second structure is true, then in an experiment that intervenes on x, we have cor(x,z) = ab, while if the first structure is true, then cor(x,z) = ab + c in the same experiment. We can measure the correlations and compare the result to the predictions: In an experiment that intervenes on y, we can determine b by measuring cor(y,z). In an experiment intervening on x, we can determine a by measuring cor(x,y), and we can measure cor(x,z). If cor(x,z) = cor(x,y)cor(y,z) = ab, then the second structure is true, while if the first structure is true, then cor(x,z) ≠ cor(x,y)cor(y,z), and we can determine c = cor(x,z) - cor(x,y)cor(y,z). Thus, on the basis of single-intervention experiments alone we are able to resolve the underdetermination. But we had to assume linearity.
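The trek-rule calculation can be reproduced numerically. One caveat: the identity cor(x,z) = ab + c presupposes standardized variables, so the sketch below works with covariances under unit-variance interventions, for which the same sum-product identity holds exactly with the structural coefficients a, b and c (all values here are hypothetical, and the confounders u and v drop out of the covariances because of the interventions).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
a, b, c = 0.7, 0.6, 0.4   # hypothetical coefficients for x->y, y->z, x->z

def cov(p, q):
    return np.cov(p, q)[0, 1]

def experiment(do):
    """Structure 1 of Figure 2 with a unit-variance surgical intervention on `do`."""
    u, v = rng.normal(size=n), rng.normal(size=n)
    x = rng.normal(size=n) if do == 'x' else 0.8 * u + rng.normal(size=n)
    y = rng.normal(size=n) if do == 'y' else 0.8 * v + a * x + rng.normal(size=n)
    z = 0.8 * u + 0.8 * v + b * y + c * x + rng.normal(size=n)
    return x, y, z

_, y, z = experiment('y')
b_hat = cov(y, z)                   # trek y -> z gives b
x, y, z = experiment('x')
a_hat = cov(x, y)                   # trek x -> y gives a
c_hat = cov(x, z) - a_hat * b_hat   # cov(x,z) = ab + c, so the residual is c
print(round(a_hat, 2), round(b_hat, 2), round(c_hat, 2))
```

Under Structure 2 the same computation would return a residual near zero, which is exactly the comparison described in the text.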

Eberhardt et al. (2010) show that this approach generalizes: if the causal model is linear (with any non-degenerate distribution on the error terms), but causal sufficiency does not hold, then there is a set of single-intervention experiments that can be used to uniquely identify the true causal structure among a set of variables. This result holds even when the assumptions of acyclicity and faithfulness are dropped. It shows just how powerful the assumption of linearity is.

Linearity is sufficient to achieve identifiability even for single-intervention experiments, but it is known not to be necessary. Hyttinen et al. (2011) have shown that similar results can be achieved for particular types of discrete models –

(unsurprisingly) for an experiment intervening only on z. That is, the two parameterized models are not only indistinguishable on the basis of independence and dependence tests for any single-intervention experiment or passive observation. They are indistinguishable in principle, that is, for any statistical tool, given only single-intervention experiments (and passive observation), because those (experimental) distributions are identical for the two models. This underdetermination exists despite the fact that all (experimental) distributions are faithful to the underlying causal structure.

The models are, however, distinguishable in a double-intervention experiment intervening on x and y simultaneously. Only for such an experiment do the experimental distributions differ, so that the presence of the x → z edge in PM1 is in principle detectable.

We do not know, but conjecture, that this in-principle underdetermination (rather than just the underdetermination based on the (in-)dependence structure, as shown in Appendix 1) can be generalized to arbitrary numbers of variables, and will hold for any set of experiments that intervene on at most N-2 variables.

The example shows that in order to identify the causal structure by single-intervention experiments, some additional parametric assumption beyond Markov, faithfulness and acyclicity is necessary. Alternatively, without additional assumptions, causal discovery requires a large set of very demanding experiments, each intervening on a large number of variables simultaneously. For many fields of study it is not clear that such experiments are feasible, let alone affordable or ethically acceptable. Currently, we do not know how common cases like PM1 and PM2 are. It is possible that in practice such cases are quite rare. When the assumption of faithfulness was subject to philosophical scrutiny, one argument in its defense was that a failure of faithfulness was, for certain types of parameterizations, a measure-zero event (Spirtes et al. 2000, Thm 3.2). While this defense of faithfulness has not received much philosophical sympathy, such assessments of the likelihood of trouble are of interest when one is willing or forced to make the antecedent parametric assumptions anyway.

The example here does not involve a violation of faithfulness, but a similar analysis of the likelihood of underdetermination despite experimentation is possible.

PM1 and PM2 cast a rather dark shadow on the hopes that experiments on their own can provide a gold standard for causal discovery. They suggest that causal discovery, whether experimental or observational, depends crucially on the assumptions one makes about the true causal model. As the earlier examples show, assumptions interact with each other and with the available experiments to yield insights about the underlying causal structure. Different sets of assumptions and different sets of experiments result in different degrees of insight and underdetermination, but there is no clear hierarchy either within the set of possible assumptions, or between experiments and assumptions about the model space or parameterization.

3. Interventionism

On the interventionist account of causation, “X is a direct cause of Y with respect to some variable set V if and only if there is a possible intervention on X that will change Y (or the probability distribution of Y) when all other variables in V besides X and Y are held fixed at some value by interventions.” (Woodward 2003). The intuition is easy enough: In Figure 2, x is a direct cause of z because x and z are dependent in the double-intervention experiment intervening on x and y simultaneously.

According to this definition of a direct cause, it is true by definition that N experiments, each intervening on N-1 variables, are sufficient to identify the causal structure among a set of N variables even when causal sufficiency does not hold. (Above we had only discussed necessary conditions.) If each of the N experiments leaves out a different variable from its intervention set, then each experiment can be used to determine the presence of the direct effects from the N-1 intervened variables to the one non-intervened one. Together the experiments determine the entire causal structure.
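The leave-one-out scheme can be sketched for a linear-Gaussian model with a hypothetical direct-effect matrix B (all values invented). Because every variable other than the target is randomized, the regressors are independent of any latent confounders of the target, which would merely add independent noise; the regression in each experiment therefore recovers the direct effects into the passive variable.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Hypothetical direct-effect matrix over three variables: B[j, i] is the
# coefficient of variable i in variable j's structural equation (0 = no edge).
B = np.array([[0.0, 0.0, 0.0],
              [0.7, 0.0, 0.0],
              [0.4, 0.6, 0.0]])
N = B.shape[0]

B_hat = np.zeros_like(B)
for target in range(N):
    others = [i for i in range(N) if i != target]
    # Experiment: randomize every variable except `target` ...
    X = rng.normal(size=(n, N))
    # ... and let `target` respond to its (randomized) direct causes.
    X[:, target] = X[:, others] @ B[target, others] + rng.normal(size=n)
    # Regress the passive variable on the intervened ones: this recovers the
    # direct effects, since the regressors are independent of everything else.
    coef, *_ = np.linalg.lstsq(X[:, others], X[:, target], rcond=None)
    B_hat[target, others] = coef

print(np.round(B_hat, 2))   # recovers B up to sampling error
```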

An interventionist should therefore have no problem with the results discussed so far, since the cases of experimental underdetermination that we have considered were all restricted to experiments intervening on at most N-2 variables. The causal structures could always be distinguished by an experiment intervening on all but one variable.

But there are unusual cases. In Appendix 3 we provide another parameterization (PM3) for the first causal structure in Figure 2 (the one with the extra x → z edge). The example and its implications are discussed more thoroughly than can be done here in Eberhardt (unpublished). PM3 is very similar to PM1 and PM2. In fact, for a passive observation and a single intervention on x, y or z they all imply the exact same distributions. However, PM3 is also indistinguishable from PM2 for a double-intervention experiment on x and y (and similarly, of course, for all other double-intervention experiments). That is, PM3 and PM2 differ in their causal structure with regard to the x → z edge, but are experimentally indistinguishable for all possible experiments on the observed variables.

In what sense, then, is the direct arrow from x to z in PM3 justified? After all, in a double-intervention experiment on x and y, x will appear independent of z. Given Woodward's definition of a direct cause, x is not a direct cause relative to the set of observed variables {x, y, z}. However, if one included u and v as well, x would become a direct cause of z, since x changes the probability distribution of z in an experiment that changes x and holds y, u and v fixed.

So, the interventionist can avoid the apparent contradiction. The definition of a direct cause is protected from the implications of PM3 since it is relativized to the set of variables under consideration. But one may find a certain level of discomfort that this interventionist definition permits the possibility that a variable

and z. Thus, we are faced here with a violation of faithfulness that does not follow the well-understood case of canceling pathways. But like those cases, it shows that the interventionist account of causation either misses certain causal relations or implicitly depends on additional assumptions about the underlying causal model. The interventionist need not assume faithfulness. As indicated earlier, the assumption of linearity guarantees identifiability using only single-intervention experiments even if we do not assume faithfulness. In other words, a linear parameterization of Structure 1 cannot be made indistinguishable from a linear parameterization of Structure 2.

Part of the appeal of the interventionist account is its sensitivity to the set of variables under consideration when defining causal relations. This helped enormously to disentangle direct from total and contributing causes. Examples like PM3 suggest that the relativity may be too general for definitional purposes unless one makes additional assumptions: I may measure one set of variables in an experiment and say there is no causal connection between two variables. You may measure a strict superset of my variables and intervene on a strict superset of my intervened variables, and come to the conclusion that the same pair of variables stand in a direct causal relation. Moreover, the claim would hold when all the interventions were successfully surgical, i.e. breaking causal connections.

The other part of the interventionist appeal was the apparent independence of the interventionist account from substantive assumptions such as faithfulness that have received little sympathy despite their wide application. This paper suggests that you cannot have both.

Appendix 2:

(We write P(A | B || B) to mean the conditional probability of A given B in an experiment where B has been subject to a surgical intervention.)

Experimental distribution when x is subject to an intervention

PM1: P(Y, Z | X || X) = sum_uv P(U) P(V) P(Y | V, X) P(Z | U, V, X, Y)

PM2: P(Y, Z | X || X) = sum_uv P(U) P(V) P(Y | V, X) P(Z | U, V, Y)

Experimental distribution when y is subject to an intervention

PM1: P(X, Z | Y || Y) = sum_uv P(U) P(V) P(X | U) P(Z | U, V, X, Y)

PM2: P(X, Z | Y || Y) = sum_uv P(U) P(V) P(X | U) P(Z | U, V, Y)

Experimental distribution when z is subject to an intervention

PM1: P(X, Y | Z || Z) = sum_uv P(U) P(V) P(X | U) P(Y | V, X)

PM2: P(X, Y | Z || Z) = sum_uv P(U) P(V) P(X | U) P(Y | V, X)

By substituting the terms of PM1 and PM2 in the above equations it can be verified that PM1 and PM2 have identical passive observational and single-intervention distributions, but that they differ for the following double-intervention distribution on x and y.

Experimental distribution when x and y are subject to an intervention

PM1: P(Z | X, Y || X, Y) = sum_uv P(U) P(V) P(Z | U, V, X, Y)

PM2: P(Z | X, Y || X, Y) = sum_uv P(U) P(V) P(Z | U, V, Y)

PM1 and PM2 (unsurprisingly) have identical distributions for the other two double-intervention distributions, since the x → z edge is broken and the remaining parameters are identical in the parameterizations:

Experimental distribution when x and z are subject to an intervention

PM1: P(Y | X, Z || X, Z) = sum_v P(V) P(Y | V, X)

PM2: P(Y | X, Z || X, Z) = sum_v P(V) P(Y | V, X)

Experimental distribution when y and z are subject to an intervention

PM1: P(X | Y, Z || Y, Z) = sum_u P(U) P(X | U)

PM2: P(X | Y, Z || Y, Z) = sum_u P(U) P(X | U)

Appendix 3:

Parameterization PM3 for Structure 1 in Figure 2

p(u=1)=0.5

p(v=1)=0.5

p(x=1|u=1)=0.8

p(x=1|u=0)=0.2

p(y=1|v=1,x=1)=0.8

p(y=1|v=1,x=0)=0.8

p(y=1|v=0,x=1)=0.8

p(y=1|v=0,x=0)=0.2

p(z=1|u=1,v=1,x=1,y=1)=0.825

p(z=1|u=1,v=1,x=1,y=0)=0.8

p(z=1|u=1,v=1,x=0,y=1)=0.8

p(z=1|u=1,v=1,x=0,y=0)=0.8

p(z=1|u=1,v=0,x=1,y=1)=0.775

p(z=1|u=1,v=0,x=1,y=0)=0.8

p(z=1|u=1,v=0,x=0,y=1)=0.8

p(z=1|u=1,v=0,x=0,y=0)=0.8

p(z=1|u=0,v=1,x=1,y=1)=0.7

p(z=1|u=0,v=1,x=1,y=0)=0.8

p(z=1|u=0,v=1,x=0,y=1)=0.8

p(z=1|u=0,v=1,x=0,y=0)=0.8

p(z=1|u=0,v=0,x=1,y=1)=0.9

p(z=1|u=0,v=0,x=1,y=0)=0.2

p(z=1|u=0,v=0,x=0,y=1)=0.8

p(z=1|u=0,v=0,x=0,y=0)=0.2

Substituting the parameters of PM3 in the equations for the passive observational or any experimental distributions of PM1 in Appendix 2, it can be verified that PM2 and PM3 are experimentally indistinguishable for all possible experiments on {x, y, z}. Nevertheless, it should be evident that in an experiment intervening on x, y, u and v, the difference between p(z=1|u,v,x=1,y) and p(z=1|u,v,x=0,y) in the table above will indicate that x is a direct cause of z.
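Both claims about PM3 can be verified directly from the table above: marginalizing out u and v (each with probability 0.5), the double intervention on x and y yields the same distribution of z for x = 1 and x = 0, yet with u and v also held fixed, the probability of z does depend on x.

```python
import itertools

p_z = {  # p(z=1 | u, v, x, y), copied from the PM3 table above
    (1, 1, 1, 1): 0.825, (1, 1, 1, 0): 0.8, (1, 1, 0, 1): 0.8, (1, 1, 0, 0): 0.8,
    (1, 0, 1, 1): 0.775, (1, 0, 1, 0): 0.8, (1, 0, 0, 1): 0.8, (1, 0, 0, 0): 0.8,
    (0, 1, 1, 1): 0.7,   (0, 1, 1, 0): 0.8, (0, 1, 0, 1): 0.8, (0, 1, 0, 0): 0.8,
    (0, 0, 1, 1): 0.9,   (0, 0, 1, 0): 0.2, (0, 0, 0, 1): 0.8, (0, 0, 0, 0): 0.2,
}

def p_z_do_xy(x, y):
    """P(z=1 || x, y): u and v marginalized with p(u=1) = p(v=1) = 0.5."""
    return sum(0.25 * p_z[(u, v, x, y)]
               for u, v in itertools.product((0, 1), repeat=2))

# Double intervention on x and y: z's distribution does not depend on x.
print(p_z_do_xy(1, 1), p_z_do_xy(0, 1))
# With u and v also fixed (e.g. u=0, v=0, y=1): the x-dependence reappears.
print(p_z[(0, 0, 1, 1)], p_z[(0, 0, 0, 1)])
```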

References

Eberhardt, Frederick, Clark Glymour, and Richard Scheines. 2005. “On the Number of Experiments Sufficient and in the Worst Case Necessary to Identify all Causal Relations among n Variables.” Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence,