simulateWitnessModel: Generates Synthetic CausalFX Problems

Description

This function generates simple synthetic problems that can be used to test
the methods in the CausalFX package. CausalFX problems are objects
of class cfx, and specify a causal inference task of estimating the effect
of a given treatment X on a given outcome Y, with a corresponding dataset.
This function generates only binary data from a multinomial distribution.

Usage

simulateWitnessModel(p, q, par_max, M, no_sol)

Arguments

p

number of background variables (besides X and Y).

q

number of sink variables.

par_max

maximum number of parents in the background set.

M

sample size.

no_sol

if TRUE, then latent variables are parents of both X
and Y, meaning no adjustment set will theoretically be found
(barring sampling variability) if a method such as covsearch
is applied.

Details

The function first generates a directed acyclic graph over a given number of "background variables":
variables that have no latent common parents with treatment X and outcome Y. Conditioning
on a subset of the background variables will block all measured confounding in this problem.
The function then generates a set of "sink" variables K, each of which has one common latent parent with
either X or Y but is otherwise not adjacent to any observed variable. Conditioning on the sink variables
will generate confounding paths between treatment and outcome. The latent variables form
a pool of mutually independent variables with no parents. If no_sol is FALSE, each latent variable is a parent of either X or Y, but not both.
If no_sol is TRUE, all latent variables are parents of both X and Y, so
no adjustment set of observed variables will remove unmeasured confounding between treatment and outcome.
The remaining parents of observed variables are sampled uniformly at random from the pool of background
variables, subject to the maximum number of parents given by par_max.
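
The parent-sampling step for the background variables can be illustrated with a minimal sketch. This is not the CausalFX implementation: the helper sample_parents is hypothetical, and two choices are assumptions made only for illustration, namely that the number of parents is drawn uniformly between 0 and par_max, and that variable i may only take parents among variables 1 to i - 1 (which guarantees acyclicity).

```r
# Hypothetical helper, not part of CausalFX: draw a parent set for one node
# from a pool of candidates, with at most par_max parents.
sample_parents <- function(candidates, par_max) {
  max_size <- min(par_max, length(candidates))
  # Assumption: the number of parents is uniform on 0..max_size.
  n_par <- sample.int(max_size + 1, 1) - 1
  if (n_par == 0) return(integer(0))
  # Index through sample.int to avoid sample()'s special case for length-1 input.
  candidates[sample.int(length(candidates), n_par)]
}

set.seed(1)
p <- 5        # number of background variables
par_max <- 2  # maximum number of parents per variable

# Assumption: variable i may only take parents among 1..(i - 1), which keeps
# the graph acyclic under the ordering 1, ..., p.
parents <- lapply(seq_len(p), function(i) sample_parents(seq_len(i - 1), par_max))
```

Here parents[[i]] holds the indices of the sampled parents of background variable i. The first variable always has an empty parent set under this ordering.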

Given a graph structure, each variable i is given a binary conditional distribution, defining the probability
of i being equal to 1 given its parents in the graph. This conditional distribution is generated
randomly from a logistic regression model with pairwise interactions, where the coefficients are drawn
independently from zero-mean Gaussians with standard deviation 10 / (number of parents).
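
As an illustration of the conditional distribution described above, the following sketch builds the probability table for one variable with three parents. It is not the package's internal code: the variable names are hypothetical, the inclusion of an intercept term is an assumption, and the case of a variable with no parents is not covered here.

```r
set.seed(1)
n_par <- 3              # number of parents of the variable (assumed >= 1)
coef_sd <- 10 / n_par   # standard deviation stated in the text above

# All 2^n_par binary parent configurations, one per row.
configs <- as.matrix(expand.grid(rep(list(0:1), n_par)))
colnames(configs) <- paste0("pa", seq_len(n_par))

# Design matrix: intercept (an assumption), main effects, and all
# pairwise products of parents.
design <- model.matrix(~ .^2, data = as.data.frame(configs))

# Independent zero-mean Gaussian coefficients.
beta <- rnorm(ncol(design), mean = 0, sd = coef_sd)

# P(variable = 1 | parents) for each parent configuration.
prob_one <- as.vector(plogis(design %*% beta))

# Draw one binary value for an observed parent configuration, e.g. (1, 0, 1).
pa_obs <- c(1, 0, 1)
row <- which(apply(configs, 1, function(r) all(r == pa_obs)))
value <- rbinom(1, size = 1, prob = prob_one[row])
```

With three parents the design matrix has seven columns: one intercept, three main effects, and three pairwise interactions. Because the coefficient scale shrinks as the number of parents grows, the linear predictor stays on a comparable scale for variables with many parents.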