The aim of dynamic causal modeling (DCM) is to infer the causal architecture of coupled or distributed dynamical systems. It is a Bayesian model comparison procedure that rests on comparing models of how time series data were generated. Dynamic causal models are formulated in terms of stochastic or ordinary differential equations (i.e., nonlinear state-space models in continuous time). These equations model the dynamics of hidden states in the nodes of a probabilistic graphical model, where conditional dependencies are parameterised in terms of directed effective connectivity. Unlike Bayesian networks, the graphs used in DCM can be cyclic, and unlike structural equation modelling and Granger causality, DCM does not depend on the theory of martingales, i.e., it does not assume that random fluctuations are serially uncorrelated.

DCM was developed for (and applied principally to) estimating coupling among brain regions and how that coupling is influenced by experimental changes (e.g., time or context). The basic idea is to construct reasonably realistic models of interacting (cortical) regions or nodes. These models are then supplemented with a forward model of how the hidden states of each node (e.g., neuronal activity) map to measured responses. This enables the best model and its parameters (i.e., effective connectivity) to be identified from observed data. Bayesian model comparison is used to select the best model in terms of its evidence (inference on model-space), which can then be characterised in terms of its parameters (inference on parameter-space). This enables one to test hypotheses about how nodes communicate; e.g., whether activity in a given neuronal population modulates the coupling between other populations, in a task-specific fashion.

In functional neuroimaging, the data may be functional magnetic resonance imaging (fMRI) measurements or electrophysiological recordings (e.g., magnetoencephalography or electroencephalography; MEG/EEG). Brain responses are evoked by known deterministic inputs (experimentally controlled stimuli) that embody designed changes in sensory stimulation or cognitive set. These experimental or exogenous variables can change hidden states in one of two ways. First, they can elicit responses through direct influences on specific network nodes. This would be appropriate, for example, in modelling sensory evoked responses in early visual cortex. The second class of inputs exerts its effects vicariously, through a modulation of the coupling among nodes; an example is the influence of attention on the processing of sensory information. The hidden states cover any neurophysiological or biophysical variables needed to form observed outputs. These outputs are measured (hemodynamic or electromagnetic) responses over the sensors considered. Bayesian inversion furnishes the marginal likelihood (evidence) of the model and the posterior distribution of its parameters (e.g., neuronal coupling strengths). The evidence is used for Bayesian model selection (BMS) to disambiguate between competing models, while the posterior distribution of the parameters is used to characterise the model selected.

DCM for fMRI

DCM for fMRI uses a simple (deterministic) model of neural dynamics in a network or graph of n interacting brain regions or nodes (Friston et al. 2003). It models the change of a neuronal state-vector x in time, where each region is represented by a single hidden state, using the following bilinear differential equation:

\[ \dot{x} = \Big( A + \sum_{j} u_j B^{(j)} \Big) x + C u \]

where \(\dot{x}= dx/dt\ .\) This equation results from a bilinear Taylor approximation to any dynamic model of how changes in neuronal activity in one node \(x_i\) are caused by activity in the others. More precisely, this bilinear form is the simplest low-order approximation that accounts both for endogenous and exogenous causes of system dynamics. The matrix A represents the fixed (or Average) coupling among nodes in the absence of exogenous input \(u(t)\ .\) This can be thought of as the latent coupling in the absence of experimental perturbations. The B matrices are effectively the change in latent coupling induced by the j-th input. They encode context-sensitive changes in A or, equivalently, the modulation of coupling by experimental manipulations. Because \(B^{(j)}\) are second-order derivatives, they are referred to as Bilinear. Finally, the matrix C embodies the influences of exogenous input that Cause perturbations of hidden states. The parameters \(\theta \supset \{A,B,C\}\) are the connectivity or coupling matrices that we wish to identify. These define the functional architecture and interactions among brain regions at a neuronal level. Figure 1 summarises this bilinear state-equation and shows the model in graphical form. DCM for fMRI does not account for conduction delays in either inputs or inter-regional influences. This is not necessary because, due to the large regional variability in hemodynamic response latencies, fMRI data do not contain enough temporal information to estimate inter-regional axonal conduction delays, which are typically on the order of 10-20 ms (Friston et al., 2003). In contrast, conduction delays are an important part of DCM for ERPs (see below).
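The bilinear form described above can be made concrete with a short simulation. The following is a minimal sketch, not the integration scheme used in any DCM software: it assumes simple Euler integration, a hypothetical two-region network, and illustrative parameter values, with the function name `simulate_bilinear` and the convention that entry (i, j) of a coupling matrix is the influence of region j on region i chosen purely for this example.

```python
import numpy as np

def simulate_bilinear(A, B, C, u, dt, x0=None):
    """Euler integration of dx/dt = (A + sum_j u_j(t) B[j]) x + C u(t).

    A: (n, n) fixed coupling; B: (m, n, n) modulatory matrices, one per input;
    C: (n, m) driving inputs; u: (T, m) input time series.
    Returns the state trajectory x with shape (T, n).
    """
    T, m = u.shape
    n = A.shape[0]
    x = np.zeros((T, n))
    if x0 is not None:
        x[0] = x0
    for t in range(T - 1):
        # Effective coupling at time t: A plus the input-weighted modulations
        J = A + np.tensordot(u[t], B, axes=1)
        x[t + 1] = x[t] + dt * (J @ x[t] + C @ u[t])
    return x

# Hypothetical two-region example (illustrative values only):
# input 0 drives region 0; input 1 modulates the connection from region 0 to 1.
A = np.array([[-1.0, 0.0],
              [0.4, -1.0]])
B = np.zeros((2, 2, 2))
B[1, 1, 0] = 0.6
C = np.array([[1.0, 0.0],
              [0.0, 0.0]])

dt = 0.01
T = 2000
u_off = np.zeros((T, 2))
u_off[:, 0] = 1.0          # driving input on, modulatory input off
u_on = u_off.copy()
u_on[:, 1] = 1.0           # driving input on, modulatory input on

x_off = simulate_bilinear(A, B, C, u_off, dt)
x_on = simulate_bilinear(A, B, C, u_on, dt)
print(x_off[-1], x_on[-1])
```

In this toy network the modulatory input raises the effective coupling from region 0 to region 1 from 0.4 to 1.0, so the response of region 1 to the same driving input grows accordingly. This is the sense in which the B matrices encode context-sensitive changes in coupling.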

Figure 1: (A) The bilinear state equation of DCM for fMRI. (B) An example of a DCM describing the dynamics in a simple hierarchical system of visual areas. This system consists of two areas, each represented by a single state variable \((x_1, x_2)\ .\) Black arrows represent connections, grey arrows represent exogenous inputs and thin dotted arrows indicate the transformation from neural states (blue colour) into hemodynamic observations (red colour); see Figure 2 for the hemodynamic forward model. The state equation for this particular model is shown on the right. Adapted from (Stephan et al., 2007a).

DCM for fMRI combines this bilinear model of neural dynamics with an empirically validated hemodynamic model that describes the transformation of neuronal activity into a BOLD response. This so-called “Balloon model” was initially formulated by (Buxton et al., 1998) and later extended (Friston et al., 2000; Stephan et al., 2007c). Briefly, it comprises differential equations that describe the coupling among four hemodynamic state variables, using six parameters \(\vartheta\subset\theta\ .\) More specifically, changes in neural activity elicit a vasodilatory signal that leads to increases in blood flow and subsequently to changes in blood volume and deoxyhemoglobin content. The predicted BOLD signal is a non-linear function of blood volume and deoxyhemoglobin content. This hemodynamic model is summarised schematically in Figure 2 and is described in detail in Friston et al. (2000) and Stephan et al. (2007c).

Figure 2: Schematic of the hemodynamic model used by DCM for fMRI. Neuronal activity induces a vasodilatory and activity-dependent signal s that increases blood flow f. Blood flow causes changes in volume and deoxyhemoglobin (\(v\) and \(q\)). These two hemodynamic states enter an output nonlinearity, which results in a predicted BOLD response y. In recent versions, this model has six hemodynamic parameters (Stephan et al., 2007c): the rate constant of the vasodilatory signal decay (\(\kappa\)), the rate constant for auto-regulatory feedback by blood flow (\(\gamma\)), transit time (\(\tau\)), Grubb’s vessel stiffness exponent (\(\alpha\)), capillary resting net oxygen extraction (\(E_0\)), and ratio of intra-extravascular BOLD signal (\(\epsilon\)). \(E\) is the oxygen extraction function. This figure encodes graphically the transformation from neuronal states to hemodynamic responses; adapted from (Friston et al., 2003).
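The chain from neuronal activity to the BOLD signal can be sketched numerically. The code below is an illustration under stated assumptions rather than the model implemented in DCM software: it uses the state equations and the older output nonlinearity as commonly written for the Balloon model in Friston et al. (2000, 2003), not the six-parameter version with \(\epsilon\) of Stephan et al. (2007c). The parameter values, the resting blood volume fraction `V0`, the Euler integration and the function name `balloon_bold` are all assumptions made for this example.

```python
import numpy as np

def balloon_bold(x, dt, kappa=0.65, gamma=0.41, tau=0.98,
                 alpha=0.32, E0=0.34, V0=0.02):
    """Map a neuronal time series x(t) to a predicted BOLD signal (Euler scheme).

    States: s (vasodilatory signal), f (flow), v (volume), q (deoxyhemoglobin),
    with f, v and q normalised to 1 at rest. Parameter values are illustrative.
    """
    s, f, v, q = 0.0, 1.0, 1.0, 1.0
    k1, k2, k3 = 7.0 * E0, 2.0, 2.0 * E0 - 0.2   # assumed output coefficients
    y = np.zeros(len(x))
    for t in range(len(x)):
        # Output nonlinearity: BOLD as a function of volume and deoxyhemoglobin
        y[t] = V0 * (k1 * (1.0 - q) + k2 * (1.0 - q / v) + k3 * (1.0 - v))
        # Oxygen extraction as a function of flow
        E = 1.0 - (1.0 - E0) ** (1.0 / f)
        ds = x[t] - kappa * s - gamma * (f - 1.0)
        df = s
        dv = (f - v ** (1.0 / alpha)) / tau
        dq = (f * E / E0 - v ** (1.0 / alpha) * q / v) / tau
        s, f, v, q = s + dt * ds, f + dt * df, v + dt * dv, q + dt * dq
    return y

dt = 0.01
t = np.arange(0.0, 30.0, dt)
x_stim = (t < 1.0).astype(float)            # one second of neuronal activity
y_stim = balloon_bold(x_stim, dt)
y_rest = balloon_bold(np.zeros_like(t), dt)
print("peak BOLD at t =", t[np.argmax(y_stim)], "s")
```

With no neuronal input the states stay at their resting values and the predicted signal is zero; a brief burst of activity produces a delayed and dispersed response, which is why the timing of neuronal events is only weakly constrained by fMRI data.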

Together, the neuronal and hemodynamic state equations furnish a deterministic DCM. For any given combination of parameters \(\theta\) and inputs \(u\ ,\) the measured BOLD response \(y\) is modelled as the predicted BOLD signal (the generalised convolution of inputs; \(h(x,u,\theta)\)) plus a linear mixture of confounds \(X\beta\) (e.g. signal drift) and Gaussian observation error \(\epsilon\ :\)

\(y=h(x,u,\theta) + X\beta + \epsilon\)

A schematic representation of the hierarchical structure of DCM is

\( u \overset{f}{\longrightarrow} x \overset{g}{\longrightarrow} y \)

where u influences the dynamics of the hidden (neuronal) states x of the system through the evolution function f; x is then mapped to the predicted data y through the observation function g. The combined neural and hemodynamic parameters \(\theta \supseteq \{A,B,C,\vartheta\}\) are estimated from the measured BOLD data, using a Bayesian scheme with empirical priors for the hemodynamic parameters and conservative shrinkage priors for the coupling parameters (see below). Once the parameters of a DCM have been estimated, the posterior distributions of the parameters can be used to test hypotheses about connection strengths (e.g., Ethofer et al., 2006; Fairhall and Ishai, 2007; Grol et al., 2007; Kumar et al., 2007; Posner et al., 2006; Stephan et al., 2006; Stephan et al., 2007b; Stephan et al., 2005).

DCM for evoked responses

DCM for evoked responses was developed as a biologically plausible model to understand how event-related responses result from the dynamics of coupled neural populations. It rests on neural mass models, which use established connectivity rules in hierarchical brain systems to describe the dynamics of a network of coupled neuronal sources (David and Friston, 2003; David et al., 2005; Jansen and Rit, 1995). In contrast to the low-order approximations used in DCM for fMRI, the state equations of DCM for ERPs are far more detailed and realistic, describing interactions between different neural masses (subpopulations). This increased biophysical specificity and realism is possible because M/EEG data contain much more information about underlying neuronal dynamics than the BOLD signal (Daunizeau et al., 2007).

The DCM developed by (David et al., 2006) uses the connectivity rules described in (Felleman and Van Essen, 1991) to assemble a network of coupled sources. Each source is modelled using a neural mass model described in (David and Friston, 2003), based on the model of (Jansen and Rit, 1995). This model emulates the activity of a cortical area using three neuronal subpopulations, assigned to granular and agranular layers. A population of excitatory pyramidal (output) cells receives inputs from inhibitory and excitatory populations of interneurons, via intrinsic connections (which are confined to the cortical sheet). Within this model, excitatory interneurons can be regarded as spiny stellate cells found predominantly in layer four and in receipt of forward connections. Excitatory pyramidal cells and inhibitory interneurons are considered to occupy agranular layers and receive backward and lateral inputs.

Figure 3: Schematic of the DCM used to model evoked electrophysiological responses. This schematic shows the state equations describing the dynamics of sources or regions. Each neuronal source is modelled with three subpopulations (pyramidal, spiny stellate and inhibitory interneurons) which are connected by four intrinsic connections with weights \(\gamma_{1,2,3,4}\ ,\) as described in (Jansen and Rit, 1995) and (David and Friston, 2003). These have been assigned to granular and agranular cortical layers which receive forward \(A^{F}\ ,\) backward \(A^B\) and lateral \(A^L\) connections respectively. Adapted from (Kiebel et al., 2008).

To model event-related responses, the network receives exogenous inputs via input connections. These connections are exactly the same as forward connections and deliver inputs to the spiny stellate cells. In the present context, inputs \(u(t)\) model sub-cortical auditory inputs. The vector \(C\subset\theta\) controls the influence of the input on each source. The lower, upper and leading diagonal matrices \(A^{F},A^{B},A^{L}\subset\theta\) encode forward, backward and lateral connections, respectively. The DCM here is specified in terms of the state equations and a linear output equation

\[ \dot{x}=f(x,u,\theta) \]
\[ y= L(\theta)x_0+\epsilon \]

where \(x_0\) represents the trans-membrane potential of pyramidal cells and \(L(\theta)\) is a lead field matrix coupling electrical sources to the EEG channels (Kiebel et al., 2006).

Within each subpopulation the evolution of neuronal states rests on two operators. The first transforms the average density of pre-synaptic inputs into the average postsynaptic membrane potential. This is modelled by a linear transformation with excitatory and inhibitory kernels parameterised by \(H_{e,i}\) and \(\tau_{e,i}\ .\) \(H_{e,i}\subset\theta\) control the maximum post-synaptic potential, and \(\tau_{e,i}\subset\theta\) represent lumped rate-constants. The second operator S transforms the average potential of each subpopulation into an average firing rate. This is assumed to be an instantaneous process that follows a sigmoid function (Marreiros et al., 2008b). Interactions among the subpopulations depend on constants \(\gamma_{1,2,3,4}\ ,\) which control the strength of intrinsic connections and reflect the total number of synapses expressed by each subpopulation.
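The two operators can be illustrated with a few lines of code. This is a sketch under assumptions: the kernel is written in the form used by Jansen and Rit (1995), \(h(t) = (H/\tau)\, t\, e^{-t/\tau}\) for \(t \geq 0\), and the sigmoid uses a generic logistic form; the exact parameterisation of the sigmoid in DCM for ERPs differs. The numerical values are Jansen-Rit-style illustrations and the function names are hypothetical.

```python
import numpy as np

def synaptic_kernel(t, H, tau):
    """Impulse response h(t) = (H / tau) * t * exp(-t / tau) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0, (H / tau) * t * np.exp(-t / tau), 0.0)

def sigmoid_rate(v, slope=0.56, threshold=6.0, max_rate=5.0):
    """Instantaneous mapping from mean membrane potential to mean firing rate."""
    return max_rate / (1.0 + np.exp(-slope * (np.asarray(v, dtype=float) - threshold)))

# Illustrative excitatory kernel: H_e in mV, tau_e in seconds (assumed values)
H_e, tau_e = 3.25, 0.010
t = np.arange(0.0, 0.1, 1e-4)
h_e = synaptic_kernel(t, H_e, tau_e)

t_peak = t[np.argmax(h_e)]      # the kernel peaks at t = tau
peak = h_e.max()                # with height H / e
print(t_peak, peak, sigmoid_rate(6.0))
```

The kernel peaks at \(t=\tau\) with height \(H/e\), which shows how \(H_{e,i}\) scales the maximum post-synaptic potential while \(\tau_{e,i}\) sets the time course. Convolving this kernel with an incoming firing rate gives the mean membrane potential, which the sigmoid then converts back into a firing rate.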

Model evidence and selection

Bayesian model selection (BMS) is a powerful method for determining the most likely among a set of competing hypotheses about the mechanisms that generated observed data. In the context of DCM, BMS is used to distinguish between different system architectures. Model comparison and selection rests on the model evidence \(p(y|m)\ ;\) i.e. the probability of observing the data y under a particular model m. The model evidence is obtained by integrating out dependencies on the model parameters

\(
p(y|m)=\int p(y|\theta,m)p(\theta|m)d\theta
\)

In many cases, this integration is analytically intractable and numerically difficult to compute. Usually, it is therefore necessary to use computationally tractable approximations to the model evidence (or the log-evidence). Commonly used approximations include the Akaike information criterion (AIC), Bayesian information criterion (BIC; Schwarz 1978), and variants such as Akaike’s Bayesian information criterion (ABIC; Akaike 1985). All these approximations can be decomposed into an accuracy term (i.e., log likelihood) and a complexity term. For AIC and BIC, the complexity term is simply a function of the number of model parameters; these criteria can be used both for frequentist and Bayesian models (Penny et al., 2004). They are blind, however, both to interdependencies amongst the parameters and to the form of the prior densities used. For this reason, a different approximation to the log-evidence is preferred in DCM, i.e., the (negative) free-energy F, which properly accounts for posterior and prior dependencies among the parameters (see equation for F below).
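For a model simple enough to be integrated directly, the evidence integral can be checked numerically. The sketch below is a toy example and not a DCM: it assumes a single observation \(y \sim N(\theta, \sigma^2)\) with a zero-mean Gaussian prior on \(\theta\), for which the evidence is known in closed form, \(p(y|m) = N(y; 0, \sigma^2 + \sigma_\theta^2)\). All names and values are chosen for illustration.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def evidence_numeric(y, noise_var, prior_var, n_grid=20001, width=12.0):
    """Approximate p(y|m) = integral of p(y|theta,m) p(theta|m) on a grid."""
    sd = np.sqrt(prior_var)
    theta = np.linspace(-width * sd, width * sd, n_grid)
    integrand = gaussian_pdf(y, theta, noise_var) * gaussian_pdf(theta, 0.0, prior_var)
    return float(np.sum(integrand) * (theta[1] - theta[0]))   # Riemann sum

def evidence_exact(y, noise_var, prior_var):
    """Closed form for this conjugate toy model."""
    return float(gaussian_pdf(y, 0.0, noise_var + prior_var))

y_obs, noise_var = 1.5, 1.0
ev_narrow = evidence_numeric(y_obs, noise_var, prior_var=4.0)
ev_broad = evidence_numeric(y_obs, noise_var, prior_var=100.0)
print(ev_narrow, evidence_exact(y_obs, noise_var, 4.0))
print(ev_broad, evidence_exact(y_obs, noise_var, 100.0))
```

Both models can fit the observation equally well, yet the model with the very broad prior has lower evidence. This illustrates how the evidence penalises complexity through the prior, something a simple parameter count (as in AIC or BIC) cannot capture.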

F is also the objective function that is optimised during model inversion in DCM. For a given DCM, say model m, inversion corresponds to approximating the moments of the posterior or conditional distribution given by Bayes rule

\(
p(\theta|y,m)= \frac{ p(y|\theta,m)p(\theta|m)}{p(y|m)}
\)

Inversion of a DCM involves minimizing the free energy in order to maximize the model evidence or marginal likelihood (c.f. “type-II likelihood”; Good 1965). Details of this procedure are described in (Friston et al., 2003). The posterior moments (mean and covariance) are updated iteratively using Variational Bayes under a fixed-form Laplace (i.e., Gaussian) approximation \( q(\theta) \) to the conditional density. This can be regarded as an Expectation-Maximization (EM) algorithm (Dempster et al., 1977) that employs a local linear approximation of the predicted responses around the current conditional expectation. This Bayesian method was developed for dynamic system models based on differential equations. In contrast, conventional inversions of state space models typically use maximum likelihood methods and operate in discrete time (c.f. Valdes et al., 1999). Generalisations of this Variational (Laplace) scheme extend the scope of DCM to cover models based on stochastic differential equations and difference equations (Friston et al. 2008; Daunizeau et al. 2009a).
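The basic Variational scheme for DCM can be summarized as follows (where λ is the error variance and q is the conditional density):

\[ \text{E-step:}\quad q \leftarrow \arg\min_{q} F(q,\lambda,m) \]
\[ \text{M-step:}\quad \lambda \leftarrow \arg\min_{\lambda} F(q,\lambda,m) \]
\[ F(q,\lambda,m) = KL\big[ q(\theta) \,\|\, p(\theta|y,\lambda,m) \big] - \ln p(y|\lambda,m) \]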

The free-energy is the Kullback–Leibler divergence (denoted by KL) between the real and approximate conditional density minus the log-evidence. This means that when the free-energy is minimised, the discrepancy between the true and approximate conditional density is suppressed. At this point the free-energy approximates the negative log-evidence

\[ F \approx -\ln p(y|\lambda,m) \]

(Friston et al., 2007; Penny et al., 2004). Model selection is based on this approximation, where the best model is characterised by the greatest log-evidence (i.e. the smallest free-energy). Pairwise model comparisons can be conveniently described by Bayes factors (Kass and Raftery, 1995)

\[ BF_{i,j} = \frac {p(y|m_i)}{p(y|m_j)} \]
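The relationship between the free-energy and the log-evidence described above can be verified on a toy problem. The following sketch assumes a parameter that takes only three discrete values, so that every quantity can be computed exactly. The hyperparameter λ is omitted, and the probabilities are hypothetical numbers chosen for illustration.

```python
import numpy as np

def kl_divergence(q, p):
    """Kullback-Leibler divergence KL[q || p] for discrete distributions."""
    return float(np.sum(q * (np.log(q) - np.log(p))))

def free_energy(q, prior, lik):
    """F = E_q[ln q(theta) - ln p(y, theta | m)] on a discrete parameter grid."""
    joint = prior * lik
    return float(np.sum(q * (np.log(q) - np.log(joint))))

prior = np.array([0.5, 0.3, 0.2])        # p(theta | m), hypothetical
lik = np.array([0.10, 0.40, 0.25])       # p(y | theta, m), hypothetical
evidence = float(np.sum(prior * lik))    # p(y | m)
posterior = prior * lik / evidence       # p(theta | y, m)

q = np.array([0.4, 0.4, 0.2])            # an arbitrary approximate density
F_q = free_energy(q, prior, lik)
F_post = free_energy(posterior, prior, lik)

print(F_q, kl_divergence(q, posterior) - np.log(evidence))
print(F_post, -np.log(evidence))
```

For any q the free-energy equals the KL divergence to the true posterior minus the log-evidence, so it can never fall below the negative log-evidence, and it reaches that value exactly when q is the true posterior. In DCM the fixed-form Gaussian q generally cannot match the posterior exactly, which is why F is an approximation to the negative log-evidence.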

Just as a culture has developed around the use of P values in classical statistics (e.g., P < 0.05), conventions have developed for the use of Bayes factors (BF). Raftery (1995), for example, presents an interpretation of the BF as providing weak (BF < 3), positive (3 ≤ BF < 20), strong (20 ≤ BF < 150) or very strong (BF ≥ 150) evidence for preferring one model over another. Strong evidence in favor of one model thus requires the difference in log-evidence to be three or more (Penny et al. 2004). Under flat priors on models, this corresponds to a conditional confidence that the winning model is exp(3) ≈ 20 times more likely than the alternative. From the equations above, it can be seen that the Bayes factor is simply the exponential of the difference in log-evidences.
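These conventions translate directly into a few helper functions. The sketch below is illustrative only: the function names are hypothetical, the thresholds are those quoted above from Raftery (1995), and the log-evidences are made-up numbers.

```python
import numpy as np

def bayes_factor(log_ev_i, log_ev_j):
    """Bayes factor BF_ij as the exponential of the log-evidence difference."""
    return float(np.exp(log_ev_i - log_ev_j))

def raftery_label(bf):
    """Verbal category for a Bayes factor, using the thresholds quoted above."""
    if bf < 3:
        return "weak"
    if bf < 20:
        return "positive"
    if bf < 150:
        return "strong"
    return "very strong"

def posterior_model_probabilities(log_evidences):
    """Posterior probabilities of models under flat priors on models."""
    le = np.asarray(log_evidences, dtype=float)
    w = np.exp(le - le.max())        # subtract the maximum for numerical stability
    return w / w.sum()

# Hypothetical log-evidences differing by three
bf = bayes_factor(-100.0, -103.0)
probs = posterior_model_probabilities([-100.0, -103.0])
print(bf, raftery_label(bf), probs)
```

A log-evidence difference of three gives a Bayes factor of about 20 and, under flat model priors, a posterior probability of roughly 0.95 for the better model.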

The search for the best model precedes (and is often more important than) inference on the parameters of the model selected. Many studies have used BMS to adjudicate among competing DCMs for fMRI (Acs and Greenlee, 2008; Allen et al., 2008; Grol et al., 2007; Heim et al., 2009; Kumar et al., 2007; Leff et al., 2008; Smith et al., 2006; Stephan et al., 2007c; Summerfield and Koechlin, 2008) and EEG data (Garrido et al., 2008; Garrido et al., 2007). This approach, to search for a single best model (amongst those deemed plausible a priori) and then proceed to inference on its parameters, is pursued most often and could be complemented with diagnostic model checking procedures as, for example, suggested by Box (1980). However, alternatives to this single-model approach exist. For example, one can partition model space and make inferences about model families (Stephan et al. 2009; Penny et al. 2010). Alternatively, one can use Bayesian model averaging, where the parameter estimates of each model considered are weighted by the posterior probability of the model (Hoeting et al. 1999; Penny et al. 2010).

In short, DCM rests on two components: biophysical modelling using differential equations and Bayesian statistical methods for model inversion (parameter estimation) and comparison. For a critical review on these biophysical and statistical foundations of DCM, see (Daunizeau et al., 2010). In the next section, some practical applications are presented; see also (Stephan et al., 2010) for good practice recommendations on using DCM.

Applications: fMRI

Here, we briefly describe, as a practical example, the use of DCM for fMRI by analysing data acquired in a study of attentional modulation during visual motion processing (Büchel and Friston, 1997). These data have been used previously to validate DCM (Friston et al., 2003) and are available from http://www.fil.ion.ucl.ac.uk/spm/data. The experimental manipulations were encoded as three exogenous inputs: a photic stimulation input indicated when dots were presented on a screen, a motion variable indicated that the dots were moving and the attention variable indicated that the subject was attending to possible velocity changes. The activity was modelled in three regions: V1, V5 and superior parietal cortex (SPC).

Three different DCMs were specified, each of which embodies different assumptions about how attention modulates connectivity to V5. Model 1 assumes that attention modulates the forward connection from V1 to V5, model 2 assumes that attention modulates the backward connection from SPC to V5 and model 3 assumes attention modulates both connections. Each model assumes that the effect of motion is to modulate the connection from V1 to V5 and uses the same reciprocal hierarchical intrinsic connectivity. The models were fitted and the Bayes factors provided consistent evidence in favour of the hypothesis embodied in model 1, that attention modulates the forward connection from V1 to V5.
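The three competing models differ only in which connections the attention input is allowed to modulate, which can be written down as binary masks on the A, B and C matrices. The sketch below is a hypothetical specification for illustration and not the format used by any particular software package. It assumes the region order (V1, V5, SPC), the input order (photic, motion, attention), self-connections on the diagonal of A, and the convention that entry (i, j) is the connection from region j to region i.

```python
import numpy as np

REGIONS = ["V1", "V5", "SPC"]
INPUTS = ["photic", "motion", "attention"]

def build_model(attention_targets):
    """Return binary masks (A, B, C) for one model.

    attention_targets: list of (source, target) region-name pairs whose
    connection may be modulated by attention.
    """
    r = {name: i for i, name in enumerate(REGIONS)}
    u = {name: j for j, name in enumerate(INPUTS)}

    # Fixed coupling: self-connections plus reciprocal V1-V5 and V5-SPC links
    A = np.eye(3, dtype=int)
    for src, tgt in [("V1", "V5"), ("V5", "V1"), ("V5", "SPC"), ("SPC", "V5")]:
        A[r[tgt], r[src]] = 1

    # Modulatory effects: one (3, 3) mask per input
    B = np.zeros((3, 3, 3), dtype=int)
    B[u["motion"], r["V5"], r["V1"]] = 1           # motion modulates V1 -> V5
    for src, tgt in attention_targets:
        B[u["attention"], r[tgt], r[src]] = 1

    # Driving input: photic stimulation enters V1
    C = np.zeros((3, 3), dtype=int)
    C[r["V1"], u["photic"]] = 1
    return A, B, C

models = {
    "model 1": build_model([("V1", "V5")]),
    "model 2": build_model([("SPC", "V5")]),
    "model 3": build_model([("V1", "V5"), ("SPC", "V5")]),
}
for name, (A, B, C) in models.items():
    print(name, "attention modulates", int(B[2].sum()), "connection(s)")
```

All three models share the same A and C masks, so any difference in their evidence can be attributed to the hypothesised site of attentional modulation.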

Figure 4: DCM applied to data from a study on attention to visual motion by (Büchel and Friston, 1997). In all models, photic stimulation enters V1 and motion modulates the connection from V1 to V5. All models have reciprocal and hierarchically organised connectivity. They differ in how attention (red) modulates the connectivity to V5; with model 1 assuming modulation of the forward connection (V1 to V5), model 2 assuming modulation of the backward connection (SPC to V5) and model 3 assuming both. The broken lines indicate the modulatory connections, adapted from (Penny et al., 2004).

Figure 5: Nonlinear DCM for fMRI applied to the attention to motion paradigm. Left panel: Numbers alongside the connections indicate the maximum a posteriori (MAP) parameter estimates. Right panel: Posterior density of the estimate for the nonlinear modulation parameter for the V1→V5 connection. Given the mean and variance of this posterior density, we can be 99.1% confident that the true parameter value is larger than zero or, in other words, that there is an increase in gain of V5 responses to V1 inputs that is mediated by parietal activity. Adapted from (Stephan et al., 2008).

Note that this model does not specify the source of the attentional top-down effect. This becomes possible with nonlinear dynamic causal models (Stephan et al. 2008). Nonlinear DCM for fMRI enables one to model how activity in one population gates connection strengths among others. Figure 5 shows an application to the previous example where parietal activity, induced by attention to motion, modulates the connection from V1 to V5.

Applications: Evoked responses

To illustrate DCM for event-related responses (ERPs) we will use data acquired under a mismatch negativity (MMN) paradigm (http://www.fil.ion.ucl.ac.uk/spm/data). In this example, various models over twelve subjects are compared. The results shown are a part of a program that considered the MMN and its underlying mechanisms (Garrido et al., 2007). Three plausible models were specified under an architecture motivated by electrophysiological and neuroimaging MMN studies (Doeller et al., 2003; Opitz et al., 2002). Each has five sources, modelled as equivalent current dipoles (ECDs; Kiebel et al., 2006), over left and right primary auditory cortex (A1), left and right superior temporal gyrus (STG) and right inferior frontal gyrus (IFG). An exogenous (auditory) input enters bilaterally at A1; each A1 is connected to its ipsilateral STG. Right STG is connected to the right IFG. Inter-hemispheric (lateral) connections are placed between left and right STG. All connections are reciprocal (i.e., connected with forward and backward connections or with bilateral connections).

Three models were tested, which differed in the connections which could show putative repetition-dependent changes, i.e., differences between listening to standard or deviant tones. Models F, B and FB allowed changes in forward, backward and both, respectively. All three models were compared against a baseline or null model, which had the same architecture but precluded any coupling changes between standard and deviant trials.

Figure 7: Bayesian model selection among DCMs for the three models, F, B and FB, expressed relative to a null model in which no connections were allowed to change across conditions. The graphs show the negative free-energy approximation to the log-evidence. (Left) Log-evidence for models F, B and FB for each subject (relative to the null). The diamond attributed to each subject identifies the best model on the basis of the subject’s highest log-evidence. (Right) Log-evidence at the group level, i.e., pooled over subjects, for the three models, adapted from (Garrido et al., 2007).

Bayesian model selection based on the increase in log-evidence over the null model was performed for all subjects. The log-evidences of the three models, relative to the null model (for each subject), reveal that they are substantially better than the null model in all subjects. In particular, the FB-model was best in seven out of eleven subjects. The sum of the log-evidences over subjects (which is equivalent to the log group Bayes factor, see below) showed that there was very strong evidence in favour of model FB at the group level.
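Pooling log-evidences over subjects amounts to a column sum. The sketch below uses made-up relative log-evidences for four hypothetical subjects, not the values from the study, to show how the per-subject winners and the group-level comparison are obtained under the assumption that the data of different subjects are independent.

```python
import numpy as np

MODELS = ["F", "B", "FB"]

# Hypothetical log-evidences relative to the null model (rows: subjects)
rel_log_ev = np.array([
    [12.0,  5.0, 15.0],
    [20.0,  8.0, 18.0],
    [ 9.0, 11.0, 16.0],
    [14.0,  6.0, 21.0],
])

# Best model for each subject (highest log-evidence in that row)
best_per_subject = rel_log_ev.argmax(axis=1)
wins = np.bincount(best_per_subject, minlength=len(MODELS))

# Group level: log-evidences add over subjects
group_log_ev = rel_log_ev.sum(axis=0)
order = np.argsort(group_log_ev)[::-1]
log_gbf = group_log_ev[order[0]] - group_log_ev[order[1]]

print("wins per model:", dict(zip(MODELS, wins.tolist())))
print("group log-evidence:", dict(zip(MODELS, group_log_ev.tolist())))
print("best model:", MODELS[order[0]], "log group Bayes factor vs runner-up:", log_gbf)
```

Because the sum can be dominated by a few subjects with very large log-evidence differences, this fixed-effects summary is most appropriate when all subjects can be assumed to share the same model.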

When the functional architecture is unlikely to differ across subjects, the conventional group Bayes factor (GBF) is both sufficient and appropriate. However, subjects may exhibit different models or functional architectures; for example, due to different cognitive strategies or pathology. In this case, a hierarchical random effects procedure is required (Stephan et al., 2009). This rests on treating the model as a random variable and estimating the parameters of a Dirichlet distribution describing the probabilities of all models considered. These probabilities then define a multinomial distribution over model-space, allowing one to compute how likely it is that a specific model generated the data of a randomly chosen subject (and the exceedance probability of one model being more likely than any other).
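The final step of this procedure, going from Dirichlet parameters to expected model probabilities and exceedance probabilities, can be sketched by Monte Carlo sampling. The code below covers only that step: it assumes the Dirichlet parameters have already been estimated, and the values of `alpha` are hypothetical. Sampling is one simple way to approximate exceedance probabilities and is not claimed to be the method used in any specific implementation.

```python
import numpy as np

def exceedance_probabilities(alpha, n_samples=100_000, seed=0):
    """Probability that each model is more likely than any other model.

    alpha: parameters of the Dirichlet density over model probabilities.
    Approximated by sampling model-probability vectors and counting winners.
    """
    rng = np.random.default_rng(seed)
    r = rng.dirichlet(alpha, size=n_samples)      # shape (n_samples, n_models)
    winners = r.argmax(axis=1)
    return np.bincount(winners, minlength=len(alpha)) / n_samples

# Hypothetical Dirichlet parameters for three models
alpha = np.array([3.0, 2.0, 9.0])

expected_prob = alpha / alpha.sum()   # expected probability of each model
xp = exceedance_probabilities(alpha)

print("expected model probabilities:", expected_prob)
print("exceedance probabilities:", xp)
```

The expected probabilities describe how likely it is that each model generated the data of a randomly chosen subject, whereas the exceedance probabilities express the belief that a given model is more likely than all the others; the latter are typically more extreme.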

DCM developments

In contrast to many causal models, DCM does not look for statistical dependencies among measured time-series directly. Instead, it combines a biophysical model of the hidden (latent) dynamics with a forward model that translates hidden states into predicted measurements, to furnish an explicit generative model of how observed data were caused (Friston, 2009). This means the exact form of the DCM changes with each application and speaks to their progressive refinement:

In relation to model selection, a hierarchical variational Bayesian framework (Stephan et al., 2009) accounts for random effects at the between-subjects level, e.g. when dealing with group heterogeneity or outliers. This work was extended by (Penny et al., 2010) to allow for comparisons between model families of arbitrary size and for Bayesian model averaging within model families.