Introduction

Leabra stands for Local, Error-driven and Associative, Biologically Realistic Algorithm, and it implements a balance between associative (Hebbian) and error-driven learning on top of a biologically-based point-neuron activation function with inhibitory competition dynamics (either via inhibitory interneurons or an approximation thereof), which produce k-Winners-Take-All (kWTA) sparse distributed representations. Extensive documentation is available from the online textbook: http://ccnbook.colorado.edu which serves as a second edition to the original book:
Computational Explorations in Cognitive Neuroscience: Understanding
the Mind by Simulating the Brain, O'Reilly and Munakata, 2000,
Cambridge, MA: MIT Press. (Computational Explorations..)

As of November 3, 2014: Please see Leabra 710 Update for important information about updating to the 7.1.0 version of emergent.

Overview

The basic activation dynamics of Leabra are based on standard electrophysiological principles of real neurons, and in discrete spiking mode we implement exactly the AdEx (adapting exponential) model of Gerstner and colleagues (Scholarpedia article on AdEx). The rate code mode (which runs faster and allows for smaller networks) provides a very close approximation to the AdEx model behavior, in terms of a graded activation signal matching the actual instantaneous rate of spiking across a population of AdEx neurons. We generally conceive of a single rate-code neuron as representing a microcolumn of roughly 100 spiking pyramidal neurons in the neocortex. There are also short-term plasticity dynamics that produce synaptic depression and potentiation, as described in Hennig (2013).

The excitatory synaptic input conductances (i.e., net input) is computed as an average, not a sum, over connections, based on normalized, sigmoidaly transformed weight values, which are subject to scaling on a connection-group level to alter relative contributions. Automatic scaling is performed to
compensate for differences in expected activity level in the different projections. See Leabra Netin Scaling for details.

Inhibition is computed using a feed-forward (FF) and feed-back (FB) inhibition function (FFFB) that closely approximates the behavior of inhibitory interneurons in the neocortex. FF is based on a multiplicative factor applied to the average net input coming into a layer, and FB is based on a multiplicative factor applied to the average activation within the layer. These simple linear functions do an excellent job of controlling the overall activation levels in bidirectionally connected networks, producing behavior very similar to the more abstract computational implementation of kWTA dynamics implemented in previous versions.

There is a single learning equation, derived from a very detailed model of spike timing dependent plasticity (STDP) by Urakubo et al, 2008, that produces a combination of Hebbian associative and error-driven learning. For historical reasons, we call this the XCAL equation (eXtended Contrastive Attractor Learning), and it is functionally very similar to the BCM learning rule developed by Bienenstock, Cooper & Munro (1982). The essential learning dynamic involves a Hebbian co-product of sending neuron activation times receiving neuron activation, which biologically reflects the amount of calcium entering through NMDA channels, and this co-product is then compared against a floating threshold value. To produce the Hebbian learning dynamic, this floating threshold is based on a long-term running average of the receiving neuron activation. This is the key idea for the BCM algorithm. To produce error-driven learning, the floating threshold is based on a much faster running average of activation co-products, which reflects an expectation or prediction, against which the instantaneous, later outcome is compared.

Weights are subject to a contrast enhancement function, which compensates for the soft (exponential) weight bounding that keeps weights within the normalized 0-1 range. Contrast enhancement is important for enhancing the selectivity of self-organizing learning, and generally results in faster learning with better overall results. Learning operates on the underlying internal linear weight value. Biologically, we associate the underlying linear weight value with internal synaptic factors such as actin scaffolding, CaMKII phosphorlation level, etc, while the contrast enhancement operates at the level of AMPA receptor expression. We also support (optionally) a distinction between fast and slow weights -- biologically fast weights reflect the early LTP dynamics, which rise and fall over roughly 15 minutes after induction, while slow weights reflect the stable long-term weight values that the fast weights decay back to.

There are various extensions to the algorithm that implement special neural mechanisms associated with the prefrontal cortex and basal ganglia (PBWM), dopamine systems PVLV, the Hippocampus, and temporal integration dynamics associated with the thalamocortical circuits (LeabraTI, and CIFER).

Leabra Algorithm Equations

The pseudocode for Leabra is given here, showing exactly how the pieces of the algorithm fit together, using the equations and variables from the actual code. The emergent implementation contains a number of optimizations (including vectorization and GPU code), but this provides the core math in simple form.

See the Matlab directory in the emergent svn source directory for a complete implementation of these equations in Matlab, coded by Sergio Verduzco-Flores -- this can be a lot simpler to read than the highly optimized C++ source code.

Variables

LeabraUnits are organized into LeabraLayers, which sometimes have unit groups (which are now typically purely virtual, not actual Unit_Group objects). The LeabraUnit has the following key parameters, along with a number of others that are used for other non-default algorithms and various optimizations, etc.

avg_l_lrn = how much to use the avg_l-based Hebbian learning for this receiving unit's learning -- in addition to the basic error-driven learning -- this is dynamically updated based on the avg_l factor, so that this Hebbian learning constraint can be stronger as a unit gets too active and needs to be regulated more strongly.

avg_s_eff = effective avg_s value used in learning -- includes a small fraction (.1) of the avg_m value, for reasons explained below.

Units are connected via synapses parameterized with the following variables. These are actually stored in an optimized vector format, but the LeabraCon object contains the variables as a template.

this is the exponential component of AdEx, if in use (typically only for discrete spiking), exp_slope = .02 default

v_m += dt.integ * dt.vm_dt * (I_net - adapt)

in emergent, we use a simple midpoint method that evaluates v_m with a half-step time constant, and then uses this half-step v_m to compute full step in above I_net equation. vm_dt = 1/3.3 default.

v_m is always computed as in discrete spiking, even when using rate code, with v_m reset to vm_r etc -- this provides a more natural way to integrate adaptation and short-term plasticity mechanisms, which drive off of the discrete spiking.

v_m_eq += dt.integ * dt.vm_dt * (I_net - adapt)

the equilibrium version of the membrane potential does not reset with spikes, and is important for rate code per below

the amount of excitatory conductance required to put the neuron exactly at the firing threshold, thr = .5 default.

if(v_m > spk_thr) { spike = 1; v_m = vm_r } else { spike = 0 }

spk_thr is spiking threshold (1.2 default, different from rate code thr), vm_r = .3 is the reset value of the membrane potential after spiking -- we also have an optional refractory period after spiking, default = 3 cycles, where the vm equations are simply not computed, and vm remains at vm_r.

it is important that the time to first "spike" be governed by v_m integration dynamics, but after that point, it is essential that activation drive directly from the excitatory conductance (g_e or net) relative to the g_e_thr threshold -- activation rates are linear in this term, but not even a well-defined function of v_m_eq -- earlier versions of Leabra only used the v_m_eq-based term, and this led to some very strange behavior.

NXX1 = noisy-x-over-x+1 function, which is implemented using a lookup table due to the convolving of the XX1 function with a gaussian noise kernel

XX1(x) = gain * x / (gain * x + 1)

gain = 100 default

act_nd += dt.integ * dt.vm_dt * (new_act - act_nd)

non-depressed rate code activation is time-integrated using same vm_dt time constant as used in v_m, from the new activation value

act = act_nd * syn_tr (or just act_nd)

if short-term plasticity is in effect, then syn_tr variable reflects the synaptic transmission efficacy, and this product provides the net signal sent to the receiving neurons. otherwise syn_tr = 1.

Learning

The XCAL dWt function, showing direction and magnitude of synaptic weight changes (dWt) as a function of the short-term average activity of the sending neuron (x) times the receiving neuron (y). This quantity is a simple mathematical approximation to the level of postsynaptic Ca++, reflecting the dependence of the NMDA channel on both sending and receiving neural activity. This function was extracted directly from the detailed biophysical Urakubo et al. (2008) model, by fitting a piecewise linear function to the synaptic weight change behavior that emerges from it as a function of a wide range of sending and receiving spiking patterns.

After iterating over some number of cycles, with typically the first set of 50 to 75 cycles being in the "minus phase" (no target values clamped on output layers, or other sources of plus phase signals as in LeabraTI), and the final 25 cycles being the "plus phase", where targets or other training signals are present, then learning equations are computed. These are based on running averages of activations as shown first.

Running averages computed continuously every cycle, and note the compounding form (see LeabraUnitSpec.cpp for code)

avg_ss += dt.integ * ss_dt * (act_nd - avg_ss)

super-short time scale running average, ss_dt = 1/2 default -- this was introduced to smooth out discrete spiking signal, but is also useful for rate code

avg_s += dt.integ * act_avg.s_dt * (avg_ss - avg_s)

short time scale running average, s_dt = 1/2 default -- this represents the "plus phase" or actual outcome signal in comparison to avg_m

long-term running average -- this is computed just once per learning trial, not every cycle like the ones above -- max = 1.5, min = .1, dt = .1 by default

this is a basic exponential approach to max / min depending on whether unit is active or not -- having a clear max / min is important for the learning modulation shown next -- x ? y : z terminology is C syntax for if x is true, then y, else z

learning strength factor for how much to learn based on avg_l floating threshold -- this is dynamically modulated by strength of avg_l itself, and this turns out to be critical -- the amount of this learning increases as units are more consistently active all the time (i.e., "hog" units). avg_l.lrn_min = 0.005, avg_l.lrn_max = 0.05.

this depends on having a clear max to avg_l, which is why we use the exponential bounding form above.

avg_s_eff = m_in_s * avg_m + (1 - m_in_s) * avg_s

mix in some of the medium-term factor into the short-term factor -- this is important for ensuring that when neuron turns off in the plus phase (short term), that enough trace of earlier minus-phase activation remains to drive it into the LTD weight decrease region -- m_in_s = .1 default.

this is now done at the unit level -- previously was done at the connection level which is much less efficient!

Learning equation (see LeabraConSpec.h for code) -- most of these are intermediate variables used in computing final dwt value

srs = ru->avg_s_eff * su->avg_s_eff

short-term sender-receiver co-product -- this is the intracellular calcium from NMDA and other channels

weight change is sum of two factors: error-driven based on medium-term threshold (srm), and BCM Hebbian based on long-term threshold of the recv unit (ru->avg_l)

in earlier versions, the two factors were combined into a single threshold value, using normalized weighting factors -- this was more elegant, but by separating the two apart, we allow the hebbian component to use the full range of the XCAL function (as compared to the relatively small avg_l_lrn factor applied inside the threshold computation). By multiplying by avg_l_lrn outside the XCAL equation, we get the desired contrast enhancement property of the XCAL function, where values close to the threshold are pushed either higher (above threshold) or lower (below threshold) most strongly, and values further away are less strongly impacted.

m_lrn is a constant and is typically 1.0 when error-driven learning is employed (but can be set to 0 to have a completely Hebbian model).

XCAL is the "check mark" linearized BCM-style learning function (see figure) that was derived from the Urakubo Et Al (2008) STDP model, as described in more detail in the CCN textbook: http://ccnbook.colorado.edu

The fwt value here is the linear, non-contrast enhanced version of the weight value, while wt is the sigmoidal contrast-enhanced version, which is used for sending netinput to other neurons. One can compute fwt from wt and vice-versa, but numerical errors can accumulate in going back-and forth more than necessary, and it is generally faster to just store these two weight values (and they are needed for the fast weights version show below).

Simple Recurrent Network Context Layer

Scalar, Graded Value Representation

ScalarValLayerSpec -- encodes and decodes scalar, real-numbered values based on a coarse coded distributed representation (e.g., a Gaussian bump) across multiple units. This provides a very efficient and effective way of representing scalar values -- individual Leabra units do not do a very good job of that, as they have a strong binary bias.

Temporal Differences and General Da (dopamine) Modulation

PVLV -- Pavlovian Conditioning

Simulates behavioral and neural data on Pavlovian conditioning and the midbrain dopaminergic neurons that fire in proportion to unexpected rewards (an alternative to TD). It is described in these papers: O'Reilly, Frank, Hazy & Watz, 2007Hazy, Frank & O'Reilly, 2010. The current version (described here) is as described in the 2010 paper. A PVLV model can be made through the LeabraWizard -- under Networks menu.