Thoughts and comments on some of our reading in theoretical neuroscience, systems biology, and network theory.

Where is the noise? Key features of spontaneous activity and neural variability arise through learning in a deterministic network

Introduction

In this article (which can be found here) the authors try to reproduce in a spiking recurrent neural network key features of the spontaneous activity and neural variability observed physiologically in humans and animals. The authors focus on four features found in the experimental literature:

Trial-to-trial variability […] decreases following the onset of a sensory stimulus… [1]

Spontaneous activity outlines evoked sensory activity [2,3] — meaning that once a (metric) space of neural activity has been defined (for instance [0Hz,50Hz]^N for the firing rates of a population of N neurons), the spontaneous activity falls into similar regions as evoked activity.

Similarity between spontaneous and evoked activity increases during development [4,5]

Spontaneous activity (prior to stimulus onset) can be used to predict perceptual decisions [6,7]. In this context, a perceptual decision can mean, for example, classifying an ambiguous Face-Vase image into one of two categories: Face or Vase.

SORN networks

The artificial neural network used in the study is called SORN, which stands for Self-Organizing Recurrent Neural Network. Earlier papers from the group described this type of network and its capabilities in detail [8,9]. In particular, these networks outperform reservoir computing networks in sequential learning tasks [8] and in learning artificial grammars [9]. Open source (Python) code for simulating this network is available here.

Network units

In the present study, the network is composed of 200 excitatory units and 40 inhibitory units. Each unit computes a weighted sum of its inputs, compares it to a dynamic threshold, T, and applies a Heaviside step function Θ to produce a binary output. Below are the update equations for the excitatory population x and the inhibitory one, y; the subscripts for the weights, W, stand for: E=Excitatory, I=Inhibitory, U=External inputs.
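As a rough illustration of the unit update described above, here is a minimal NumPy sketch of one synchronous time step. The exact form is an assumption: inhibition is taken to enter the excitatory drive with a negative sign, inhibitory units are assumed to receive input only from excitatory units, and the function name `sorn_step` and its argument names are hypothetical.

```python
import numpy as np

def sorn_step(x, y, u, W_EE, W_EI, W_IE, W_EU, T_E, T_I):
    """One synchronous update of binary excitatory (x) and inhibitory (y) units.

    Assumed convention: W_XY holds the weights onto population X from
    population Y (E=excitatory, I=inhibitory, U=external input).
    """
    # Excitatory drive: recurrent excitation, minus inhibition, plus external input.
    drive_E = W_EE @ x - W_EI @ y + W_EU @ u
    # Heaviside step: a unit fires when its drive exceeds its threshold.
    x_new = (drive_E - T_E > 0).astype(int)
    # Inhibitory units are driven by the excitatory population (assumption).
    y_new = (W_IE @ x - T_I > 0).astype(int)
    return x_new, y_new
```

In the study's setting `x` would have 200 entries and `y` 40; the sketch works for any sizes as long as the matrix shapes agree.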

Plasticity rules

The excitatory units of the network obey three plasticity rules:

(discrete-time) STDP: the weight from neuron i to j is increased if neuron j spikes right after neuron i and decreased if neuron j spikes right before neuron i. The authors claim this rule to be the main learning mechanism in the network.

Synaptic Normalization (SN): all incoming connections are scaled at each time step, for each neuron, so that they sum to 1. The same holds for all outgoing connections from a neuron. This rule keeps the weights in a bounded range and seems to imply that the weight matrix is doubly stochastic.

Intrinsic Plasticity (IP): the threshold of each neuron varies with time in order for the average firing rate of the neuron to track a target firing rate. Those target firing rates are uniformly sampled from a small neighborhood around a fixed value of 0.1 (arbitrary units). This rule ensures stable average activity of the network. Both IP and SN are labeled homeostatic mechanisms by the authors.
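The three rules above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the learning rates `eta` are hypothetical values, STDP is assumed to act only on existing (non-zero) synapses and to keep weights non-negative, and only the incoming normalization is shown (the outgoing one would sum over the other axis).

```python
import numpy as np

def stdp(W, x_prev, x_now, eta=0.001):
    """Discrete-time STDP. W[j, i] is the weight from neuron i to neuron j."""
    # Potentiate i->j when i fired at t-1 and j fires at t;
    # depress i->j when j fired at t-1 and i fires at t.
    dW = eta * (np.outer(x_now, x_prev) - np.outer(x_prev, x_now))
    # Assumption: only existing synapses change, and weights stay non-negative.
    return np.clip(W + dW * (W > 0), 0.0, None)

def normalize_incoming(W):
    """Synaptic normalization: rescale each row so incoming weights sum to 1."""
    s = W.sum(axis=1, keepdims=True)
    return np.divide(W, s, out=np.zeros_like(W), where=s > 0)

def intrinsic_plasticity(T, x, target_rate, eta=0.001):
    """Raise the threshold of units firing above target, lower it otherwise."""
    return T + eta * (x - target_rate)
```

With a target rate of 0.1, a unit that fires sees its threshold increase by `0.9 * eta`, while a silent unit sees it decrease by `0.1 * eta`, so the long-run firing rate is pulled towards the target.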

Network stimulation

A stimulus for the network is defined as an activation of 10 excitatory units. In the article, stimuli are labeled by letters. Thus, the network receives the stimulus A at time n if the 10 excitatory units corresponding to A receive a 1 as an input with input weight 0.5 at time n. Stimulus B would correspond to another subset of 10 excitatory units. In general the sub-populations of A and B may overlap, but in the inference task presented below, they are chosen to be disjoint as stimulus ambiguity is an independent variable.

Input weights are always 0.5.
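To make the stimulus definition concrete, the following sketch builds the external drive for two stimuli with disjoint input groups. The choice of which units belong to each letter (a random draw with a fixed seed) and the names `groups` and `input_drive` are assumptions made for illustration only.

```python
import numpy as np

N_E = 200           # excitatory units, as in the study
GROUP_SIZE = 10     # units activated by each stimulus
INPUT_WEIGHT = 0.5  # input weight used throughout

rng = np.random.default_rng(0)
# Draw 20 distinct units and split them, so the A and B groups are disjoint.
units = rng.choice(N_E, size=2 * GROUP_SIZE, replace=False)
groups = {"A": units[:GROUP_SIZE], "B": units[GROUP_SIZE:]}

def input_drive(letter):
    """External drive to the excitatory population when `letter` is shown."""
    drive = np.zeros(N_E)
    drive[groups[letter]] = INPUT_WEIGHT * 1.0  # input of 1 times weight 0.5
    return drive
```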

Sequence learning task

A first set of experiments involving the SORN network aimed at reproducing facts 2-3 from the Introduction above.

Task timeline

Results

(corresponding to fact 2 from Intro) When both evoked and spontaneous activity are projected onto the first three principal components of the evoked activity, the authors notice two things: a) evoked activity forms 4 distinguishable clusters which ‘represent’ the letter position in the sequence. That is, the letters A and E fall in one cluster, B and F in another one, etc. b) the spontaneous activity ‘hops’ between clusters in an order consistent with the positioning 1->2->3->4->1-> etc. The authors also compared the evoked activity, the spontaneous activity and the shuffled spontaneous activity in a 2D projection of the high-dimensional space of population activity, obtained with multidimensional scaling (MDS). Here, the authors observe a high overlap of evoked and spontaneous activity, and a clear separation of both from shuffled spontaneous activity.

(corresponding to fact 3 from Intro) Here, the authors modified the timeline presented above in order to observe the effect of learning on the KL-divergence between the distributions of evoked and spontaneous activity. Firstly, different networks were trained with the same stimuli as above and with p=0.5, but for different training times. Secondly, the evoked activity was observed with two types of stimuli: a) the same stimuli as during training (natural condition) b) sequences EDCBA and FGH which were new to the network (control condition). Spontaneous activity was subsequently observed for each network. The results are shown in Fig. 1:

Fig. 1: “Spontaneous activity becomes more similar to evoked activity during learning”, and more so for familiar stimuli than for new stimuli.
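The two analyses in this section, projecting spontaneous activity onto the principal components of evoked activity and comparing the two distributions with a KL-divergence, can be sketched as follows. This is a generic illustration, not the authors' pipeline: the empirical distribution is assumed to be over binary activity patterns (rows), the smoothing constant `eps` is a hypothetical choice to avoid division by zero, and both function names are made up for the example.

```python
from collections import Counter
import numpy as np

def project_on_evoked_pcs(evoked, spontaneous, n_components=3):
    """Project both data sets (rows = time steps) onto the evoked PCs."""
    mean = evoked.mean(axis=0)
    # Rows of Vt are the principal axes of the centered evoked activity.
    _, _, Vt = np.linalg.svd(evoked - mean, full_matrices=False)
    axes = Vt[:n_components].T
    return (evoked - mean) @ axes, (spontaneous - mean) @ axes

def kl_divergence(p_samples, q_samples, eps=1e-6):
    """KL(P || Q) between empirical distributions of binary patterns."""
    p_counts = Counter(map(tuple, p_samples))
    q_counts = Counter(map(tuple, q_samples))
    patterns = list(set(p_counts) | set(q_counts))
    # Additive smoothing so patterns unseen in one set get a small probability.
    p = np.array([p_counts[k] for k in patterns], dtype=float) + eps
    q = np.array([q_counts[k] for k in patterns], dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))
```

Because the number of possible binary patterns grows exponentially with the number of units, a count-based estimate like this is only practical for a small subset of units.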

Inference task

This task was designed by the authors in order to reproduce facts 1 and 4 from the Introduction above.

Task timeline

Results

Neural variability (measured by the Fano Factor) decreases at stimulus onset, and it decreases more for stimuli that were presented frequently during training, compared to those presented rarely.

The network’s decision statistics can be modeled by sampling-based inference. See below.
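For reference, the Fano factor used in the first result is the variance of the spike count across trials divided by its mean. A minimal sketch, assuming the counts are arranged as a trials-by-time-bins array (the layout and the function name are assumptions):

```python
import numpy as np

def fano_factor(spike_counts):
    """Fano factor per time bin; spike_counts has shape (n_trials, n_bins)."""
    mean = spike_counts.mean(axis=0)
    var = spike_counts.var(axis=0, ddof=1)  # unbiased variance across trials
    # Bins with zero mean count have an undefined Fano factor (NaN).
    return np.divide(var, mean, out=np.full(mean.shape, np.nan), where=mean > 0)
```

A drop in this quantity in the bins following stimulus onset is what "variability decreases at stimulus onset" refers to.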

Sampling-based inference

The network is assumed to perform Bayesian inference in the following way:

Only two stimuli can be presented, A and B (short version of AXXX___, BXXX___ above). Their relative frequency of presentation during the training phase of the network is the prior probability distribution: P(A), P(B)=1-P(A).

The neurons from the excitatory population that are stimulated at the presentation of A are called the A-population and similarly for the B-population, which is disjoint from the A-population. These populations play the role of sensory neurons (conditionally independent, given the stimulus) that collect evidence. If the encoding were perfect, a neuron from the A-population would spike at the presentation of A and remain silent at the presentation of B. The authors explain why the encoding is imperfect: the neuron’s threshold and inhibitory inputs depend on the history of the network.

Each sensory neuron can:

correctly spike on presentation of the stimulus that it is meant to code

correctly remain silent on the absence of the stimulus that it is meant to code

incorrectly spike on the absence of the stimulus that it is meant to code

incorrectly remain silent on presentation of the stimulus that it is meant to code

The four probabilities corresponding to the four possible events described just above fully characterize the likelihood functions of the network. Note that only two of those probabilities are needed, θ_1 corresponding to the first bullet point above and θ_0 corresponding to the second bullet point: the other two follow because the probabilities of complementary events sum to 1. These two probabilities together with the stimulus prior P(A) are the only three free parameters of the model. The parameter P(A) is manipulated in the experiments whereas θ_0 and θ_1 are fitted to the network’s realizations (see Fig. 2 and 3).

In the experiment, ambiguity of the stimulus is controlled by the fraction f_A of neurons from the A-population that are actually stimulated at presentation of A, the missing A-population being replaced by a portion of the B-population of size 1-f_A.

Given the assumptions above, the number n_a of A-neurons active at the time of stimulus (or non-stimulus) presentation is distributed as the sum of two binomial random variables; and similarly for the number n_b of active B-neurons. These numbers represent the evidence collected by each population.

From there, the authors can compute the expected posterior distribution over the stimulus, <P(A|n_a,n_b)> and <P(B|n_a,n_b)>=<1-P(A|n_a,n_b)>, where the average is taken over the possible values of n_a and n_b. After fitting θ_0 and θ_1, these average posterior distributions were explicitly computed; they are represented by the gray dotted lines in Fig. 2 and 3 for different values of f_A.

The authors assume that each time the network answers A or B, it is sampling from the posterior distribution. That is, the network will answer A with probability P(A|n_a,n_b), where n_a, n_b now result from a single realization. This is why the authors, and others before them, call this decision strategy sampling-based inference.
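The steps above can be put together in a short sketch. It rests on assumptions that go beyond the summary: the likelihoods are evaluated as if the stimulus were unambiguous (A-neurons spike with probability θ_1 under A and 1-θ_0 under B, and vice versa for B-neurons), each population has 10 neurons, and all function names are hypothetical.

```python
from math import comb
import numpy as np

def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def posterior_A(n_a, n_b, prior_A, theta_1, theta_0, n=10):
    """P(A | n_a, n_b) by Bayes' rule with binomial likelihoods (assumed form)."""
    like_A = binom_pmf(n_a, n, theta_1) * binom_pmf(n_b, n, 1 - theta_0)
    like_B = binom_pmf(n_a, n, 1 - theta_0) * binom_pmf(n_b, n, theta_1)
    num = like_A * prior_A
    return num / (num + like_B * (1 - prior_A))

def simulate_counts(f_A, theta_1, theta_0, rng, n=10):
    """Draw (n_a, n_b) for an ambiguous stimulus with A-fraction f_A."""
    k = round(f_A * n)  # stimulated A-neurons; n - k B-neurons are stimulated
    n_a = rng.binomial(k, theta_1) + rng.binomial(n - k, 1 - theta_0)
    n_b = rng.binomial(n - k, theta_1) + rng.binomial(k, 1 - theta_0)
    return int(n_a), int(n_b)

def sample_decision(n_a, n_b, prior_A, theta_1, theta_0, rng, n=10):
    """Answer 'A' with probability P(A | n_a, n_b): sampling, not maximizing."""
    p = posterior_A(n_a, n_b, prior_A, theta_1, theta_0, n)
    return "A" if rng.random() < p else "B"
```

Note the design choice in `sample_decision`: an ideal observer maximizing accuracy would always pick the stimulus with the larger posterior, whereas a sampling-based observer answers A in a fraction of trials equal to the posterior itself.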

Fig. 2: Here p=P(A)=1/3 was fixed. The green and blue curves always add up to 1 and represent an average over 20 experiments of the network’s responses. The dashed gray lines represent the averaged theoretical posterior distributions over the stimulus, after fitting the parameters θ_0 and θ_1 to minimize the mean squared error to the colored curves.

Fig. 3: This figure shows the agreement of the network behavior with the sampling-based inference model across several prior distributions (x-axis). The values on the y-axis correspond to the height of the point of intersection of the two curves in Fig. 2.

The authors argue that their network might be performing Bayesian inference since the dashed line above (optimal decision) is close to the network’s performance. One must not forget, however, that the authors effectively fitted the likelihoods of their model (θ_0 and θ_1) to the data, so part of this agreement is obtained by construction.

On noise…

The authors insist on the fact that noise may not play the important role that we tend to attribute to it in the brain. They argue that several qualitative and quantitative features of neural variability and spontaneous activity were reproduced by the SORN network, which is completely deterministic.

…we propose that the common practice to make heavy use of noise in neural simulations should not be taken as the gold standard.

If the SORN network is presented with input A at some time step, it might produce some output. But if it is stimulated again with A at a later time, the output might be different. This is because the internal state of the network might have changed.

At the end of their discussion section, the authors attempt to formalize a theory. It goes as follows:

Define W to be the state of the world (to be understood as the environment from which stimuli are drawn), S the brain state of the animal (or human) and R the neural response.

Efficient sensory encoding should maximize the mutual information between the neural response and the world state, conditioned on the animal’s current brain state: MI(W;R|S).

This quantity can be rewritten: H(R|S)-H(R|W,S).

Maximizing H(R|S) has the meaning of neural responses keeping a high variability, given that S is known. The authors do not insist too much on this point but say:

Maximizing the first term also implies maximizing a lower bound of H(R), which means that the brain should make use of a maximally rich and diverse set of responses that, to an observer who does not have access to the true state of the world and the full brain state, should look like noise.

Finally, minimizing H(R|W,S) amounts to making R a deterministic function of W and S. In other words, making the neural response a deterministic function of the current brain and world states. This is exactly the case for the SORN network.
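The information-theoretic steps used in this argument can be written out explicitly. The relations below are standard identities for discrete random variables, written with the same symbols W, S and R as above:

```latex
\begin{align}
MI(W;R \mid S) &= H(R \mid S) - H(R \mid W,S), \\
H(R) &\ge H(R \mid S), \\
H(R \mid W,S) &\ge 0, \quad \text{with equality iff } R = f(W,S) \text{ for some function } f.
\end{align}
```

The second line is why maximizing H(R|S) also maximizes a lower bound on H(R), and the third line is why the second term is minimized exactly when the response is a deterministic function of the world and brain states.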

As a final personal comment, let us note that the authors themselves mention the efficacy of stochastic modeling in theoretical neuroscience. Furthermore, it is well known in mathematics that deterministic chaotic systems and stochastic systems can be statistically indistinguishable. Hence, the successes (and failures) of stochastic modeling can continue to guide its use.