Figure 1: The BCM Synaptic Modification Rule. \(y\) denotes the output activity of the neuron, \(\theta_M\) is the modification threshold.

BCM theory (Bienenstock, Cooper, and Munro 1982) refers to the theory of synaptic modification first proposed by Elie Bienenstock, Leon Cooper, and Paul Munro in 1982 to account for experiments measuring the selectivity of neurons in primary sensory cortex and its dependence on neuronal input. It is characterized by a rule expressing synaptic change as a Hebb-like product of the presynaptic activity and a nonlinear function, \(\phi(y;\theta_M)\), of postsynaptic activity, \(y\) (see Figure 1). For low values of the postsynaptic activity (\(y<\theta_M\)), \(\phi\) is negative; for \(y>\theta_M\), \(\phi\) is positive. The rule stabilizes by allowing the modification threshold, \(\theta_M\), to vary as a super-linear function of the previous activity of the cell. Unlike traditional methods of stabilizing Hebbian learning, this "sliding threshold" provides a mechanism for incoming patterns, as opposed to converging afferents, to compete.

A detailed exploration can be found in the book Theory of Cortical Plasticity (Cooper et al., 2004). For an open-source implementation of the BCM rule, among other synaptic modification rules, see the Plasticity package.

By the mid-1970s, years of experimentation in visual cortex had already led to two (sometimes controversial) conclusions.

In normal animals, visual cortical neurons are selective and binocular.

These properties depend on the visual experience of the animal.

In order to account for these two experimental conclusions, Bienenstock et al. proposed three postulates. The BCM theory of synaptic plasticity, as described briefly in the previous section, and in more detail in the sections below, is based on these three postulates.

Postulate 1: The change in synaptic weights (\(dw_i/dt\)) is proportional to presynaptic activity (\(x_i\)).

Postulate 2: The change in synaptic weights is also proportional to a non-monotonic function (denoted by \(\phi\)) of the postsynaptic activity (\(y\)):

for low \(y\), the synaptic weight decreases (\(dw_i/dt<0\))

for larger \(y\), it increases (\(dw_i/dt>0\))

The crossover point between \(dw_i/dt<0\) and \(dw_i/dt>0\) is called the modification threshold, and is denoted by \(\theta_M\).

Postulate 3: The modification threshold (\(\theta_M\)) is itself a super-linear function of the history of postsynaptic activity \(y\).
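The three postulates can be illustrated with a minimal code sketch. The quadratic \(\phi\) and the squared time-average used for the threshold below are illustrative choices (specific forms are discussed in the next section), not part of the postulates themselves:

```python
def phi(y, theta_m):
    """Modification function: negative for y below theta_M, positive above.
    The quadratic form y * (y - theta_M) is an illustrative choice."""
    return y * (y - theta_m)

def weight_update(w, x, y, theta_m, eta=0.01):
    """Postulates 1 and 2: the weight change is proportional to presynaptic
    activity x times the non-monotonic function phi of postsynaptic activity y."""
    return w + eta * phi(y, theta_m) * x

def sliding_threshold(y_history):
    """Postulate 3: theta_M is a super-linear function of past activity;
    the squared time-average is one illustrative choice."""
    y_bar = sum(y_history) / len(y_history)
    return y_bar ** 2
```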

Mathematical forms of BCM

BCM as specified by the three postulates in the introduction is under-determined: there are many mathematical forms that satisfy them. Traditionally, researchers have used the simplest form that remains consistent with experiment. Some of the common forms are given below.

Original BCM (Bienenstock et al. 1982)

In the original form, the neuron is assumed to be linear, a uniform weight decay (\(-\epsilon w_i\)) is present, and the modification threshold is calculated as a super-linear power of the mean of the neuron's output (possibly scaled by a constant, \(y_0\)). The average used for the threshold is assumed to be an average over all input patterns, although a temporal average will usually be equivalent.
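With these assumptions, the original rule can be written as follows (the exponent \(p>1\) and the scaling constant \(y_0\) follow the description above; the exact notation varies between presentations):
\[
\frac{dw_i}{dt} = \phi(y;\theta_M)\,x_i - \epsilon w_i, \qquad y=\sum_j w_j x_j, \qquad \theta_M = \left(\frac{E[y]}{y_0}\right)^p, \quad p>1
\]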

Intrator and Cooper (Intrator and Cooper, 1992)

This form of BCM can be derived by minimizing the following loss function:
\[
R=-\frac{1}{3}E[y^3] + \frac{1}{4}E^2[y^2]
\]
which measures the sparseness or bi-modality of the output distribution. Similar rules can be derived from objective functions based on kurtosis and skewness, although not all of them have the stability properties of this one. (Blais et al., 1999, Cooper et al., 2004)
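Gradient descent on \(R\), using the instantaneous sample in place of the averages (a standard stochastic approximation, and an assumption of this sketch), yields the learning rule
\[
\frac{dw_i}{dt} = \eta\, y\,(y-\theta_M)\,x_i, \qquad \theta_M = E[y^2]
\]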

In order to have stable fixed points, the average used for the modification threshold is calculated with the square of the output, as opposed to the original form, which squared the average of the output itself. If the mean of the output were zero, the original form would have only trivial stable fixed points.

The Intrator and Cooper form, also called the IBCM rule, has some nice mathematical properties (see Intrator and Cooper, 1992, Cooper et al. 2004):

It is an exploratory projection index that emphasizes deviation from a Gaussian distribution at the center of the distribution, in the form of multi-modality.

The formulation naturally extends to a lateral inhibition network (with a non-linear saturation transfer function), which can find several projections at once.

The number of gradient calculations grows linearly with the number of projections sought, so the rule is very efficient for high-dimensional feature extraction.

The search is constrained by seeking projections that are orthogonal to all but one of the clusters (in the original space). Thus, there are at most \(K\) optimal projections and not \(K(K-1)/2\) separating hyperplanes as in discriminant analysis methods. This property is very important as it suggests why the "curse of dimensionality" is less problematic with this learning rule -- every minimum is an optimal one.

Most importantly, the neuronal output (or the projection) of an input \(x\) (or a cluster of inputs) is proportional to \(1/P(x)\), where \(P(x)\) is the a priori probability of the input \(x\). This property, which follows directly from the analysis (see Intrator, 1996), is essential for creating coincidence detectors, and it also indicates the optimality of the learning rule in terms of energy (or code) conservation. If a biologically plausible log saturation transfer function is used as the neuronal non-linearity, it follows that the amplitude or code length associated with the input \(x\) is proportional to \(-\log P(x)\), which is optimal from information-theoretic considerations.

The IBCM rule has been used in many applications. In speech, it was used to provide discrimination between phonetic features (Intrator, 1992, Seebach et al., 1994). In vision, it was used to model 3D object recognition and produce a model that compared favorably with experimental results; the experiment was constructed to enable distinction between different 3D model representations (Intrator and Gold, 1993, Intrator et al., 1995). The IBCM rule was also proposed in a general framework combining unsupervised with supervised learning (Intrator, 1993) and in many other pattern classification tasks (Dotan and Intrator, 1998, Huynh et al., 1998).

Law and Cooper (Law and Cooper, 1994)

The Law and Cooper form has all of the same fixed points as the Intrator and Cooper form, but the speed of synaptic modification increases when the threshold is small and decreases as the threshold increases. The practical result is that simulations can be run with artificially high learning rates while wild oscillations are reduced. This form has been used primarily in simulations of networks, where run-time can otherwise be prohibitive.
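This behavior is obtained by scaling the modification by the inverse of the threshold; written in the quadratic-\(\phi\) notation used above (a sketch consistent with the description here), the rule reads
\[
\frac{dw_i}{dt} = \frac{\eta}{\theta_M}\, y\,(y-\theta_M)\,x_i, \qquad \theta_M = E[y^2]
\]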

Experimental verification

In order to compare theory with experiment, various simplifying assumptions must be made. In general these involve:

the synaptic modification postulate

the network architecture

the nature of the visual environment.

Results at all levels of complexity that have been examined are in good agreement with experiment. Highly selective oriented receptive fields evolve for natural image environments (Law and Cooper, 1994, Shouval et al., 1996).

Figure 2: The BCM Rule develops orientation selectivity in a natural image environment. Shown are the responses to an oriented stimulus versus time, for the left and right eye (bottom) and the final synaptic weight configuration (top).

When a two-eye visual environment is used, receptive fields with varying degrees of ocular dominance evolve (Shouval et al., 1996, Shouval et al., 1997). The effect of network interactions has been analyzed (Cooper and Scofield, 1988) and simulated (Shouval et al., 1997). Simulations reveal the same type of receptive fields as in the single cell case but with ocular dominance patches and slowly varying orientation selectivity. Deprivation experiments have been simulated as well (Clothiaux et al., 1991, Blais et al., 1996, Blais et al., 1999) and the development of direction selectivity has been shown (Blais et al., 2000). All types of experimental results can be replicated by BCM neurons with the same set of parameters.

Throughout these simulations it is assumed that the input channel (or channels) originating from the closed eye (or eyes) provides uncorrelated noise to the cortical cell. The results obtained depend critically on the level of this noise. The time it takes oriented receptive fields to decay in deprivation experiments such as MD, RS and BD depends on the noise level -- the higher the noise level, the faster the decay. This happens because noise from the deprived eye seldom creates activity above the threshold \(\theta_M\) and thus mostly contributes to the decay; the stronger the noise, the faster the decay. Such results are contrary to what would be obtained using models that employ explicit competition between the eyes (Blais et al., 1999), where the decay time typically increases as the level of noise increases.
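This noise dependence can be shown with a minimal deterministic sketch: quadratic \(\phi\), the threshold held fixed at a high value by the open eye, and the deprived-eye noise idealized as an alternating \(\pm\sigma\) input. All parameter values are illustrative, not taken from the simulations cited above:

```python
def deprived_weight_decay(sigma, theta_m=1.0, eta=0.01, w0=0.5, steps=400):
    """Track a deprived-eye weight driven by zero-mean noise of amplitude sigma,
    with the modification threshold held fixed (high) by the open eye."""
    w = w0
    for t in range(steps):
        x = sigma if t % 2 == 0 else -sigma  # idealized zero-mean noise input
        y = w * x                            # deprived-channel response
        w += eta * y * (y - theta_m) * x     # quadratic BCM update
    return w

w_low = deprived_weight_decay(sigma=0.5)   # weak deprived-eye noise
w_high = deprived_weight_decay(sigma=1.0)  # strong deprived-eye noise
# Higher noise drives the deprived weight toward zero faster (w_high < w_low).
```

Averaged over a \(\pm\sigma\) pair of steps, the update is \(-2\eta\,\theta_M\,\sigma^2 w\), so the decay rate grows with the noise power \(\sigma^2\), as described above.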

Figure 3: Dynamics of monocular deprivation using the BCM Rule. Shown are the responses to oriented stimuli versus time for the left and right (deprived) eyes. Above is deprivation with high noise in the deprived channel, analogous to lid suture, and below is deprivation with low noise in the deprived channel, analogous to TTX injection in the deprived eye. BCM predicts that deprivation effects occur faster in the presence of larger, uncorrelated, deprived-eye activity.

The noise level might be experimentally manipulated in deprivation experiments by using different methods of deprivation. It was thought that retinal activity in lid-sutured animals should be higher than in those with a dark patch placed over the eyes, and should be reduced close to zero in animals that have TTX (a sodium channel blocker) injected into the eye. The relevant parameter for the models is LGN activity. If the level of LGN activity indeed depends on retinal activity, these different protocols could be used to manipulate the noise level in the LGN and thus to determine experimentally which of the proposed models agrees better with experimental results.

A set of experiments performed by Rittenhouse et al. (1999) seemed to show that the noise level in the deprived eye could be manipulated by comparing normal MD with lid suture to MD in which TTX was injected into the eye. TTX abolishes action potentials in the retina and was thought to significantly reduce the spontaneous rate in LGN.

The interpretation of Rittenhouse et al. was based on the assumption that TTX reduces the noise level in LGN. Recent experimental results may call this assumption into question, which has led to an ongoing reexamination of both experiment and theory. Whatever the outcome, this is an excellent illustration of the fruitful interaction between theory and experiment.

Cellular basis

Postulate 1 states that plasticity occurs only in synapses that are stimulated presynaptically. This is what biologists refer to as synapse specificity, which has strong support for both LTP and LTD (Dudek and Bear, 1992). In addition, this postulate is consistent with the observation that more presynaptic activity results in a higher degree of plasticity, although the relation might not be linear.

There is now substantial evidence both in hippocampus and neocortex (Dudek and Bear, 1992, Mulkey and Malenka, 1992, Artola and Singer, 1992, Kirkwood and Bear, 1994, Mayford et al., 1995) in support of postulate 2. There is significant evidence that active synapses undergo LTD or LTP depending on the level of postsynaptic spiking or depolarization in a manner that is consistent with the BCM theory, as shown in Figure 4.

The postulate of the moving threshold -- that after a period of increased activity \(\theta_M\) increases, promoting synaptic depression, while after a period of decreased activity \(\theta_M\) decreases, promoting synaptic potentiation -- has been tested directly by studying LTD and LTP of layer III synaptic responses in slices of visual cortex prepared from 4-6 week-old light-deprived and control rats (Kirkwood et al., 1996). This experiment shows that in deprived animals \(\theta_M\) is lower than in normal animals. In control slices from the hippocampus, no change in \(\theta_M\) is observed.

An additional experiment by Wang and Wagner in 1999 produced similar results in hippocampal slices in which postsynaptic activity was controlled directly by different stimulation protocols (that did not themselves induce plasticity). Here too, the threshold in highly stimulated slices was higher than in control slices.

From rates to spikes

BCM, as originally formulated, is a phenomenological rate-based model (see models of synaptic plasticity). As such, spike timing plays no role in the theory. A more detailed treatment of the biological mechanisms should be able to account for both spike-timing dependent plasticity (STDP) and BCM. Under certain assumptions the qualitative features of the BCM rule can be seen as a consequence of a spiking model (Izhikevich and Desai, 2003, Pfister and Gerstner, 2006) or of a more complete biophysical model (Castellani et al., 2001, Shouval et al., 2002, Yeung et al., 2004). These more biophysical models have not been formally shown to be equivalent to BCM, but they do capture some of its features.

Biophysics of BCM theory

The BCM theory can be formulated by a matrix equation on synaptic weights or neuronal activities (Castellani et al., 1999):
\[
\dot W = X^T\Phi, \qquad \dot Y = XX^T\Phi
\]
where the matrices \(W\), \(Y\), \(\Phi\) and \(X\) contain the weights, the outputs, the activation functions, and the inputs, respectively. This formalism allows the study of stability properties in networks of connected non-linear neurons. The stability properties are strongly influenced by the connectivity scheme. It has been proposed that one of the elementary mechanisms governing the induction of LTP and LTD is a change in the conductance of the AMPA ion channel as a result of bidirectional, activity-dependent phosphorylation and dephosphorylation (Kameyama et al., 2000).

This cycle can be described by the following kinetic equation (Castellani et al., 2001):
\[\dot A= R\Phi\]
where the matrix \(R\) is a function of the dynamical variables and \(A\) indicates the four states of the AMPA receptor, \(A=(A, A^P, A_P, A^P_P)\).

The stability properties, as well as the equivalence with the BCM model, depend strongly on the type and number of enzymes involved, and ultimately on the structure of the \(R\) matrix (Castellani et al., 2005).

BCM and scaling

One very interesting consequence of BCM is a possible explanation of synaptic scaling, a homeostatic form of synaptic modification that has been observed experimentally (Turrigiano and Nelson, 2004). In scaling, chronic increases (decreases) in the global level of cellular activity weaken (strengthen) the synaptic weights. At first sight, scaling seems to contradict previous results on experience-dependent synaptic modifications, such as LTD and LTP, where synaptic strengths increase with high and decrease with low levels of neural excitation. But scaling is reminiscent of metaplasticity (the sliding modification threshold). The key to the BCM explanation of homeostasis and scaling is that the modifications that lead to LTD and LTP take place in seconds, while those that yield the sliding modification threshold or scaling require a much longer time period (hours), as shown in Figure 6 (Yeung et al., 2004). This time difference plays an important role in understanding the genetic basis of these events.

Selectivity

The outputs to the five patterns are shown with colored vertical lines, and the modification function \(\phi(\cdot)\) is drawn with the modification threshold labeled. If the activities start below the threshold, the modification threshold decreases and "picks up" some of the patterns. Likewise, if the responses are above the threshold, the threshold increases and stabilizes the output. The result, in this case with independent patterns, is that the neuron becomes responsive to just one of the patterns, and is highly selective.

In a natural image environment, the neurons become selective to orientation.

Figure 7: Examples of deprivation protocols in the natural image environment, using the BCM Rule.

Competition: monocular versus binocular deprivation

The experimental result that responses to the closed eye fall faster in monocular deprivation (MD) than responses fall in binocular deprivation (BD) has led researchers to suggest that a competitive mechanism is present in the visual cortex. BCM contains such competition as a natural consequence, without imposing an additional competitive constraint on synaptic modification. In essence, under binocular deprivation the neuron's modification threshold drops, making reductions in the synapses more difficult to obtain. In contrast, the open eye in monocular deprivation keeps the activity high, so the threshold does not drop significantly, making the reduction of synapses easier to obtain.

Figure 8: Monocular deprivation (MD) versus Binocular deprivation (BD). The structured presynaptic input into the open eye in monocular deprivation keeps the activity high, and thus keeps the modification threshold higher than in the case of binocular deprivation. As a result, the negative part of the \(\phi\) function is larger in MD, and deprivation occurs much more rapidly than in BD.

One consequence of this is the noise-dependence discussed in the previous section.

Hebbian learning and BCM

BCM differs significantly from Hebbian modification in its consequences. These differences are briefly summarized here; a more thorough comparison can be found in Cooper et al. (2004) and the software package Plasticity.

| | Hebbian learning | BCM |
|---|---|---|
| Statistics | Depends only on 2nd-order statistics. Fails to converge in whitened (sphered) environments. | Depends on higher-order statistics, and can find structure in nearly any environment. |
| Noise dependence of deprivation | More input of uncorrelated noise into the closed eye slows the deprivation effect. | More input into the closed eye increases the deprivation effect. |
| Orientation and direction selectivity | Requires an asymmetry, either in the neural architecture or the temporal structure, to achieve direction selectivity. Strong orientation and direction selectivity cannot both be achieved simultaneously (Blais, 1999). | Spatio-temporal correlations on the order of the receptive field size, such as a movie environment, are required to achieve both strong orientation and direction selectivity. |