Wieland Brendel

The performance of recurrent neural networks in tasks like short-term memory or time-series classification relies heavily on the internal dynamics of the network in response to external stimuli. The dynamics of recurrent neural networks, in turn, is largely determined by the connectivity and the synaptic weights. Plasticity at the single-synapse level is thought to bias the dynamics contingent on the distribution of perceived stimuli. Many studies have described the effects of different plasticity rules on the neural dynamics. Theoretical underpinnings, however, have largely lagged behind due to the complex interactions of local learning rules over the course of training. Here we present how concepts from game theory provide a new set of convergence results for local synaptic learning in recurrent circuits and successfully apply these principles to an example from efficient coding.

Jascha Sohl-Dickstein

Most sampling algorithms rely on a constraint known as detailed balance to guarantee that samples come from the correct distribution. Detailed balance requires that as many transitions happen in the forward direction as in the backward direction. As a result, it tends to encourage random-walk behavior and slow mixing. I will present a method, applied to Hamiltonian Monte Carlo, for guaranteeing that samples come from the correct distribution without relying on detailed balance. I will also demonstrate preliminary results where this accelerates mixing.
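
For reference, detailed balance is the pairwise condition on the target distribution $\pi$ and transition kernel $T$,

$$\pi(x)\,T(x \to x') = \pi(x')\,T(x' \to x) \quad \text{for all } x, x',$$

whereas correctness of a sampler only requires the weaker stationarity (fixed-point) condition

$$\sum_{x} \pi(x)\,T(x \to x') = \pi(x'),$$

which leaves room for non-reversible transition operators whose persistent motion avoids diffusive random walks.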

Charles Cadieu

Redwood Center for Theoretical Neuroscience, Berkeley

November 9, 2011

Talk:
Learning Intermediate-Level Representations of Form and Motion from Natural Movies plus a speculative conjecture on the 'Gabor-function' for secondary visual area V2

Jonathan Victor

A central problem in systems neuroscience is to understand the nature of computations by neural networks. The visual system is an excellent model for addressing this question, and our knowledge of the retina illustrates the kind of success we can hope to achieve. Our understanding of cortical computations is at present much more limited. The standard model consists of a bank of feedforward filters and simple nonlinearities. However, there are substantial differences between the computations performed by real cortical neurons and computations of models based on a feedforward cascade. These differences are present not only for responses to complex real-world natural images, but, as we show, when visual cortex is probed with artificial stimuli with a relatively simple mathematical structure. The common denominator in these stimuli is the presence of high-order spatial correlations. These observations suggest that a strongly recurrent network is an appropriate basic framework for understanding cortical computations.

Tom Putzeys

University of Leuven

September 27, 2011

Talk:
Suboptimal Bayesian decoding of sensory responses

Roland Memisevic

Sparse coding (AKA "Dictionary Learning" or "Feature Learning") has become a standard tool for solving a variety of computer vision tasks, such as recognition of objects, scenes or people. Sparse coding supports recognition by re-representing images, so that their content becomes more explicit, which makes them amenable, for example, to classification. Although object recognition is an important sub-task in many computer vision problems, many interesting vision tasks go beyond static images. These include, for example, the understanding of motion or actions in videos, or of geometric relations between pairs of images. In this talk, I will show how we can turn sparse codes into "relational sparse codes", which model dependencies between images rather than the content of a single image. I will show how this can be achieved by letting sparse codes multiplicatively "gate" the connections between pixels across multiple different images. I will discuss how to efficiently perform learning and inference in the presence of these multiplicative interactions, and I will present results showing how relational codes make it possible to learn stereo features and to infer motion patterns from image pairs.
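
As a rough illustration of the multiplicative gating idea, here is a minimal numpy sketch in the spirit of factored gated models; the sizes, variable names, and the single-pass sigmoid inference are illustrative assumptions rather than the talk's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the talk)
D, F, K = 64, 32, 16   # pixels per image, factors, relational code units

U = rng.normal(0, 0.1, (D, F))   # filters applied to image x
V = rng.normal(0, 0.1, (D, F))   # filters applied to image y
W = rng.normal(0, 0.1, (K, F))   # maps factor products to code units

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def infer_code(x, y):
    """Relational code: hidden units 'gate' pixel-to-pixel connections
    through products of factor activations from both images."""
    return sigmoid(W @ ((U.T @ x) * (V.T @ y)))

def reconstruct_y(x, h):
    """Given image x and relational code h, predict the second image."""
    return V @ ((U.T @ x) * (W.T @ h))

x = rng.normal(size=D)
y = rng.normal(size=D)
h = infer_code(x, y)          # inference is a single feedforward pass
y_hat = reconstruct_y(x, h)   # h encodes the transformation x -> y
```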

Ivana Tosic

Finding efficient representations of the 3D structure of the world from the multiple 2D views that we observe remains a fundamental challenge in vision, both in biology and in technology. As a foundation of next-generation 3DTV and cinema technologies, multi-view imaging presents many exciting research challenges, mainly in compression and image analysis. Central to these challenges is the need for novel image models that capture the intrinsic structure of intra- and inter-view correlations in multi-view images.

In this talk, I will present a new sparse generative model for stereo and multi-view image representation using over-complete geometric dictionaries, which makes multi-view geometric structure explicit in the representation. Practical application of the proposed model raises two challenges: 1) Given stereo (multi-view) images, how can we find sparse representations under the model? 2) Given a database of multi-view images, can we learn dictionaries that yield optimal sparse representations under the model? I address both challenges by introducing Multi-View Matching Pursuit (MVMP), a novel algorithm that decomposes multi-view images into sparse representations governed by the proposed model. Subsequently, I will show how MVMP can be used within a maximum-likelihood framework to optimize dictionaries for multi-view image representation. Finally, I will demonstrate the benefits of the proposed approach for camera pose estimation in omnidirectional camera networks and compression of stereo perspective images.
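
MVMP itself is specific to this work, but its greedy backbone is standard matching pursuit, sketched below for a single signal; the multi-view variant additionally couples atom selection across views through the geometric model.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy sparse decomposition: repeatedly pick the dictionary atom
    (unit-norm column) most correlated with the residual."""
    residual = signal.copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_atoms):
        correlations = dictionary.T @ residual
        k = np.argmax(np.abs(correlations))
        coeffs[k] += correlations[k]
        residual -= correlations[k] * dictionary[:, k]
    return coeffs, residual

rng = np.random.default_rng(0)
D = rng.normal(size=(128, 512))
D /= np.linalg.norm(D, axis=0)     # normalize atoms
x = D[:, [3, 40, 200]] @ np.array([1.0, -0.5, 2.0])   # 3-sparse test signal
coeffs, res = matching_pursuit(x, D, n_atoms=3)
```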

Tom Tetzlaff

Norwegian University of Life Sciences, Ås, Department of Mathematical Sciences and Technology

Correlations in spike-train ensembles can seriously impair the encoding of information by their spatio-temporal structure. An inevitable source of correlation in finite neural networks is common presynaptic input to pairs of neurons. Recent studies demonstrate that spike correlations in recurrent neural networks are considerably smaller than expected based on the amount of shared presynaptic input. Here, we provide an explanation of this seemingly contradictory observation by means of a linear network model and simulations of networks of leaky integrate-and-fire neurons. We show that pairwise correlations and, hence, population-rate fluctuations are suppressed by inhibitory feedback. This assigns inhibitory neurons the new role of active decorrelation. To elucidate the effect of feedback, we compare the responses of the intact recurrent system to systems where the statistics of the feedback channel are perturbed. Perturbations of the feedback statistics can lead to a significant increase in power and coherence of the population response. In particular, neglecting correlations within the ensemble of feedback channels or between the external stimulus and the feedback amplifies population-rate fluctuations by orders of magnitude. The fluctuation suppression in homogeneous inhibitory networks is explained by a negative feedback loop in the one-dimensional dynamics of the compound activity. Similarly, a change of coordinates exposes an effective negative feedback loop in the compound dynamics of stable excitatory-inhibitory networks. The suppression of input correlations in finite networks is explained by the population-averaged correlations in the linear network model: in purely inhibitory networks, shared-input correlations are canceled by negative spike-train correlations. In excitatory-inhibitory networks, spike-train correlations are typically positive. Here, the suppression of input correlations is not a result of the mere existence of correlations between excitatory (E) and inhibitory (I) neurons, but a consequence of a particular structure of correlations among the three possible pairings (EE, EI, II).
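
The one-dimensional feedback argument can be made explicit in a linearized sketch (the notation here is assumed for illustration, not taken from the paper): for compound activity $a(t)$ with time constant $\tau$, effective feedback gain $w$, and external drive $x(t)$,

$$\tau\,\dot a(t) = -a(t) + w\,a(t) + x(t) \quad\Rightarrow\quad |A(\omega)|^2 = \frac{|X(\omega)|^2}{(1-w)^2 + (\omega\tau)^2},$$

so inhibition-dominated feedback ($w < 0$) suppresses low-frequency population-rate fluctuations by a factor of roughly $(1-w)^2$.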

Nicolas Heess

University of Edinburgh, Informatics Forum

January 20, 2011

Talk:
Learning a generative model of images by factoring appearance and shape

Modeling the structure in natural images is a challenging problem. One hallmark of natural images is the variability of visual characteristics across different image regions and the presence of sharp boundaries between regions, which arise, for instance, from objects occluding each other. Many generative models of generic natural images have difficulties representing this type of structure. In my talk I will discuss a model that addresses this problem. It builds on insights from the computer vision literature, such as the layered representation of images, and combines them with ideas from "deep" unsupervised learning. I will first describe the basic building block of the model, the Masked Restricted Boltzmann Machine, which allows occlusion boundaries to be modeled by factoring out the appearance of an image region from its shape, representing each with a separate RBM. The Masked RBM explicitly models the relative depth of image regions and allows for regions to overlap and occlude each other. In the second part of my talk I will describe how this model can be extended to deal with images of realistic size: while a straightforward application of the Masked RBM to large images would be expensive, an efficient extension can be obtained in the form of a field of Masked RBMs. The Field of Masked RBMs models an image in terms of a large number of independent, small, and partially overlapping "objects", each of which has an associated shape and appearance. Restricting the size of "objects" as well as limiting their number locally keeps inference and learning relatively efficient. Finally, I will give an outlook on how the Field of Masked RBMs naturally gives rise to a recursive, hierarchical framework for modeling images at different scales and levels of abstraction, the "Deep Segmentation Network", and I will discuss some of the challenges ahead.

Joint work with Nicolas Le Roux, John Winn, and Jamie Shotton.

Christian Machens

A central goal in sensory neuroscience is to fully characterize a neuron’s input-output relation. However, strong nonlinearities in the responses of sensory neurons have made it difficult to develop models that generalize to arbitrary stimuli. Typically, the standard linear-nonlinear models break down when neurons exhibit stimulus-dependent modulations of their gain or selectivity. We studied these issues in optic-flow processing neurons in the fly. We found that the neurons’ receptive fields are fully described by a time-varying vector field that is space-time separable. Increasing the stimulus strength, however, strongly reduces the neurons’ gain and selectivity. To capture these changes in response behavior, we extended the linear-nonlinear model by a biophysically motivated gain and selectivity mechanism. We fit all model parameters directly to the data and show that the model now characterizes the neurons’ input-output relation well over the full range of motion stimuli.

[Joint work with Franz Weber and Axel Borst]
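
A minimal sketch of the extended model class described above; the divisive form of the gain below is an illustrative assumption on my part, standing in for the talk's biophysically motivated mechanism.

```python
import numpy as np

def ln_with_adaptive_gain(stimulus, kernel, g0=1.0, c=0.5):
    """Linear-nonlinear (LN) response with a stimulus-strength-dependent
    gain: linear filtering, a gain that shrinks as the stimulus gets
    stronger, then a rectifying output nonlinearity."""
    drive = np.convolve(stimulus, kernel, mode="same")
    strength = np.sqrt(np.mean(stimulus ** 2))   # RMS stimulus strength
    gain = g0 / (1.0 + c * strength)             # gain drops with strength
    return np.maximum(gain * drive, 0.0)
```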

Joerg Luecke

In the nervous system of humans and animals, sensory data are represented as combinations of elementary data components. While for data such as sound waveforms the elementary components combine linearly, other data can better be explained by non-linear forms of component super-positions. Using examples of visual data and auditory spectrogram data, I will motivate and define probabilistic generative models of super-position non-linearities. In benchmark applications the non-linear approaches are quantitatively compared to state-of-the-art approaches for component extraction.
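
One concrete instance of such a super-position non-linearity, shown for illustration (the max rule used in maximal-causes-style models): for hidden causes $s_h$ and weights $W_{dh}$, the linear assumption $\bar y_d = \sum_h W_{dh}\, s_h$ is replaced by

$$\bar y_d = \max_h \{ W_{dh}\, s_h \},$$

with observations generated as $p(y \mid s) = \prod_d p(y_d;\, \bar y_d)$, so that at each data dimension the strongest cause wins, as in occlusion.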

Crucial for the applicability of the models are efficient learning procedures. I briefly introduce a novel learning scheme before discussing two main application domains of non-linear models: (A) Computational Neuroscience and (B) Computer Vision.

In the first application I study predictions of non-linear models for information processing in primary visual cortex. New results on predicted response properties of cortical neurons are presented and are compared to predictions of linear models and experimental findings.

In Computer Vision, applications of non-linear models to the autonomous learning of objects are discussed, and recent results are presented.

Henning Sprekeler

We develop a group-theoretical analysis of slow feature analysis for the case where the input data are generated by applying a set of continuous transformations to static templates. As an application of the theory, we analytically derive nonlinear visual receptive fields and show that their optimal stimuli, as well as the orientation and frequency tuning, are in good agreement with previous simulations of complex cells in primary visual cortex (Berkes and Wiskott, 2005). The theory suggests that side and end stopping can be interpreted as a weak breaking of translation invariance. Direction selectivity is also discussed.
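
For reference, the slow feature analysis optimization problem underlying this analysis: find output functions $g_j$, with $y_j(t) = g_j(x(t))$, that minimize the slowness objective

$$\Delta(y_j) = \langle \dot y_j^2 \rangle_t$$

subject to $\langle y_j \rangle_t = 0$, $\langle y_j^2 \rangle_t = 1$, and $\langle y_i y_j \rangle_t = 0$ for $i < j$ (zero mean, unit variance, decorrelation).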

Valero Laparra

The conventional approach in Computational Neuroscience in favor of the efficient encoding hypothesis goes from image statistics to perception. It has been argued that the behavior of the early stages of biological visual processing (e.g. spatial frequency analyzers and their non-linearities) may be obtained from image samples and the efficient encoding hypothesis using no psychophysical or physiological information. In this work we address the same issue in the opposite direction, from perception to image statistics: we show that a psychophysically fitted image representation in V1 has appealing statistical properties, e.g. approximate PDF factorization and substantial mutual information reduction, even though no statistical information is used to fit the V1 model. These results are additional evidence in favor of the efficient encoding hypothesis.
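
The mutual information reduction referred to here can be quantified by the multi-information (total correlation) of a representation $\mathbf{x} = (x_1, \dots, x_n)$,

$$I(\mathbf{x}) = \sum_{i=1}^{n} H(x_i) - H(\mathbf{x}),$$

which is non-negative and vanishes exactly when the joint PDF factorizes; a drop in $I$ from image space to the V1 representation is the statistical signature of efficient encoding.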

Tom Putzeys

University of Leuven

March 16, 2010

Talk:
A new perceptual bias reveals aspects of Bayesian population decoding in the early visual system

Perceptual decisions ultimately rely on visual information encoded by a population of neurons in the visual cortex, whose reliability is limited by response variability and varies across different perceptual tasks. To reconstruct as much information as possible, the visual system has to decode population responses into more reliable sensory representations. The statistically optimal strategy is to decode the population response into a likelihood function and combine this function with prior knowledge of task characteristics to form a decision variable. Various psychophysical and neurophysiological studies have investigated to what extent such Bayesian decoding strategies are used in a variety of 2-AFC discrimination tasks. Although seemingly simple, these tasks are challenging since the visual system has to exploit the knowledge that only two stimuli are involved in the task to achieve maximal performance. More specifically, a ratio of two discrete likelihoods has to be constructed by reading out the likelihood function at two specific locations specified by infinitely narrow prior probability functions. To date, it is unclear whether the visual system has access to such elaborate prior information. In the present study, we infer the width of the prior functions involved in a typical 2-AFC discrimination task by measuring and modelling a new perceptual bias. Observers are presented with two known gratings of different spatial frequencies and asked to choose the grating of the highest spatial frequency. One grating is embedded in white noise, while the other is presented in either white noise, low-pass filtered or high-pass filtered noise. We find a strong perceptual bias: the perceived spatial frequency of a known grating embedded in low-pass or high-pass filtered noise is biased to respectively lower or higher spatial frequencies. To account for our results, we formulated a Bayesian population code model consisting of a physiologically-plausible encoding front-end, followed by a Bayesian decoding stage. An optimal Bayesian decoder takes the presence of filtered noise into account when decoding population responses and does not display a bias. Surprisingly, even a simplified Bayesian decoder that fails to account for filtered noise but uses small priors reaches similar high performance levels. Therefore, the only model that can account for our results discards prior information to a large extent. Fitting the model to our behavioural measurements allows us to recover the exact width of the prior functions used by our observers.
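
In symbols, the decoding strategy at issue (a sketch of the decision stage only; the study's model adds an encoding front-end): given population response $\mathbf{r}$ and candidate stimuli $s_1, s_2$ with prior functions $\pi_1, \pi_2$, the decision variable is

$$d(\mathbf{r}) = \log \frac{\int p(\mathbf{r} \mid s)\, \pi_1(s)\, ds}{\int p(\mathbf{r} \mid s)\, \pi_2(s)\, ds},$$

which reduces to the discrete likelihood ratio $\log\,[\,p(\mathbf{r} \mid s_1)/p(\mathbf{r} \mid s_2)\,]$ only in the limit of infinitely narrow priors $\pi_i(s) = \delta(s - s_i)$; the width of the $\pi_i$ is exactly what the experiment infers.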

Pierre Chainais

During the last ten years, various works have provided a variety of stochastic processes with nice properties such as scale invariance and non-Gaussian statistics. Such processes belong to the family of multifractal processes. They have been generalized to N dimensions, which gives rise to the possibility of modeling images (2D) or porous media (3D), for instance. Moreover, these processes can be numerically synthesized, which proves very useful for applications. We will present the family of multifractal processes and focus in particular on the construction of infinitely divisible cascades. We will show that they share most of the statistical properties of natural images and present an application to the modeling of images of the Sun taken by the space telescope EIT. We will also touch on an application to texture synthesis.

Mario Dipoppa

Ecole Normale Superieure de Lyon, France

December 14, 2007

Nima Keshvari

I will talk about a novel method for analyzing protein sequences that we developed at the MPI for Molecular Plant Physiology. The method is based on estimating the mutual information between amino acid positions across different organisms. It provides new insights into C2H2 zinc fingers, a certain class of transcription factors. I will also give a short introduction to the web-based tool that we developed on the basis of this method.

Greg Stephens

A major challenge in analyzing animal behavior is to discover some underlying simplicity in complex motor actions. Here we show that the space of shapes adopted by the nematode C. elegans is surprisingly low dimensional, with just four dimensions accounting for 95% of the shape variance, and we partially reconstruct ‘equations of motion’ for the dynamics in this space. These dynamics have multiple attractors, and we find that the worm visits these in a rapid and almost completely deterministic response to weak thermal stimuli. Stimulus-dependent correlations among the different modes suggest that one can generate more reliable behaviors by synchronizing stimuli to the state of the worm in shape space. We confirm this prediction, effectively "steering" the worm in real time.
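
A minimal sketch of the shape-space analysis, assuming posture is summarized (as in this line of work) by tangent angles measured along the body; names and sizes are illustrative.

```python
import numpy as np

def eigenworms(angles, n_modes=4):
    """PCA on worm posture: rows are time points, columns are tangent
    angles along the body. Returns the leading shape modes and the
    fraction of variance they explain (the study reports ~95% of the
    shape variance in four modes)."""
    centered = angles - angles.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]            # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals[:n_modes].sum() / eigvals.sum()
    return eigvecs[:, :n_modes], explained
```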

Misha Ahrens

Gatsby Unit, UCL, UK

October 17, 2007

Talk:
A new class of compact neural encoding models that capture nonlinearities and dependence on stimulus context

The fitting of meaningful models to the stimulus-response functions of neurons is often hampered by several factors. Compact models might lack the flexibility to adequately capture the nonlinear dynamics of the neural responses, but elaborate models may be hard to estimate due to a lack of good estimation algorithms and large numbers of model parameters. Here we describe a class of nonlinear neural encoding models based on multilinear (tensor) mathematics, which share many of the conveniences of linear models -- such as robust estimation algorithms and low numbers of parameters -- yet are able to capture nonlinear effects such as short-term stimulus-specific adaptation. They achieve this through an (interpretable) multiplicative factorization in an extended stimulus space. The effectiveness of the methods is illustrated on firing rate (PSTH) data from primary auditory cortex. I will also briefly discuss extensions of the fitting algorithms: first, joint regularization in the various factorized stimulus dimensions using a variational approximation, and second, a variant of IRLS (iteratively re-weighted least squares) that can be used to efficiently fit the models to spike trains through the point-process likelihood.
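
A sketch of the core estimation idea for the simplest (rank-1, bilinear) case: holding one factor fixed makes the model linear in the other, so alternating least squares inherits the robustness of linear fitting. The talk's models include further factors (e.g. for stimulus context); the names below are illustrative.

```python
import numpy as np

def fit_bilinear(stim, rate, n_lags, n_iter=20):
    """Alternating least squares for a rank-1 multilinear encoding model
    r(t) ~ sum_{i,j} a_i * b_j * S(t - i, j), with temporal factor a and
    spectral factor b. Each half-step is an ordinary least-squares solve."""
    T, F = stim.shape
    # Lagged stimulus tensor X[t, i, j] = S(t - i, j)
    X = np.zeros((T, n_lags, F))
    for i in range(n_lags):
        X[i:, i, :] = stim[: T - i, :]
    a, b = np.ones(n_lags), np.ones(F)
    for _ in range(n_iter):
        Xa = X @ b                                    # collapse frequency -> (T, n_lags)
        a = np.linalg.lstsq(Xa, rate, rcond=None)[0]
        Xb = np.einsum("tif,i->tf", X, a)             # collapse lags -> (T, F)
        b = np.linalg.lstsq(Xb, rate, rcond=None)[0]
    return a, b
```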

Michael Bach

Visual acuity (VA) is the most basic and widely used measure of visual function. Much hinges on it, for instance the outcome of clinical studies or medico-legal issues such as pension claims. Thus a reliable (goal 1) and speedy (goal 2) measure is required, and the influence of both the examiner (goal 3) and the examinee (goal 4) needs to be minimized. Signal detection theory helps towards goals 1 & 2: adaptive staircase procedures optimize speed and accuracy, and forced choice reduces the influence of the observer criterion. Goal 3 is addressed by automating the procedure. These ideas are embodied in the Freiburg Acuity Test (FrACT), which will be demonstrated with members of the audience as subjects. For patients who cannot or will not fully cooperate (goal 4), visual evoked potentials provide a non-invasive objective assessment of VA. We have advanced that methodology, employing signal statistics, Fourier techniques and fully automatic algorithms, to a state where we close in on subjective acuity within ±1 octave in 95% of the cases. Plans will be outlined to address the question to what degree subjective perception coincides with the visual resolution of V1.
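
For illustration, the adaptive forced-choice idea in its simplest form, a transformed up-down staircase; FrACT itself uses the more efficient Best-PEST procedure, so this is only a sketch of the principle.

```python
class TwoDownOneUp:
    """Minimal transformed up-down staircase for a forced-choice task:
    two consecutive correct responses make the stimulus harder, any
    error makes it easier; this rule converges near 70.7% correct."""

    def __init__(self, level=1.0, step=0.1):
        self.level = level        # e.g. log optotype size
        self.step = step
        self.n_correct = 0

    def update(self, correct):
        if correct:
            self.n_correct += 1
            if self.n_correct == 2:       # 2 correct in a row -> harder
                self.level -= self.step
                self.n_correct = 0
        else:                             # error -> easier
            self.level += self.step
            self.n_correct = 0
        return self.level
```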

Jonathan Pillow

A central problem in systems neuroscience is to understand how ensembles of neurons convey information in their collective spiking activity. Correlations, or statistical dependencies between neural responses, are of critical importance to understanding the neural code, as they affect both the amount of information carried by population responses and the manner in which downstream brain areas are able to decode it. I will show that multi-neuronal correlations can be understood using a simple, highly tractable computational model. The model captures both the stimulus dependence and detailed spatio-temporal correlation structure in the light responses of a complete population of parasol retinal ganglion cells (27 cells), making it possible to assess how correlations affect the encoding of stimulus-related information. We find that correlations strongly influence the precise timing of spike trains, explaining a large fraction of trial-to-trial response variability in individual neurons that otherwise would be attributed to intrinsic noise. We can assess the importance of correlations by performing Bayesian decoding of multi-neuronal spike trains; we find that exploiting the full correlation structure of the population response preserves 20% more stimulus-related information than decoding under the assumption of independent encoding. These results provide a framework for understanding the role that correlated activity plays in encoding and decoding sensory signals, and should be applicable to the study of population coding in a wide variety of neural circuits.

Andreas Steimer

Institute of Neuroinformatics, ETH, Zürich

May 25, 2007

Talk:
Implementing the Belief-Propagation Algorithm with Networks of Spiking Neurons - An approach based on Liquid-State-Machines

In many real-world situations living beings have to deal with incomplete knowledge about their environment and still have to be able to act reasonably. For example, think of a herbivore that sees just parts of a predator hiding in high grass. Based on this incomplete visual information, the animal has to infer the 'true' stimulus (the predator) in order to initiate an appropriate behavioral response (to flee). A large variety of such problems, e.g. in the general framework of object recognition, can be described by so-called 'graphical models' like Bayesian Networks or Markov Random Fields [Löliger (2004), Weiss & Freeman (2001)]. These models describe statistical relationships between a set of variables and give rise to algorithms computing probabilities about states of unknown variables based on the observed information. 'Belief-Propagation' is an efficient method for this task [Kschischang et al. (2001), Rao (2006)] and is also a potential candidate for a biological implementation in the brain, because it is entirely based on local information processing [Rao (2006)]. Computational units (the nodes of a graphical model) communicate with each other by distributing so-called 'messages' exclusively to their neighbors in the graph. Within this contextual framework, our working hypothesis is that the experimentally observed patches of synaptic boutons, prominent in layer 2/3 all over the cortex [Angelucci et al. (2002)], are a physical representation of nodes in a graphical model. At the same time we assume that a patch constitutes a canonical microcircuit of neurons which is able to calculate the Belief-Propagation message update equations. For that, each patch is interpreted as a collection of 'Liquid State Machines' [Maass et al. (2002)] consisting of a common liquid-pool of recurrently connected neurons, and several readout units. Each combination of the liquid-pool and any readout realizes a particular message signal transmitted from a node to one of its neighbors. Messages arriving at a node are all fed into the common liquid-pool, whose main task is to implement nonlinear projections of the low-dimensional input into a high-dimensional space [Maass et al. (2002)]. This operation is crucial for the Belief-Propagation algorithm, which utilizes highly nonlinear message update rules [Löliger (2004)]. 'pDelta' learning [Auer et al. (2005)] is used for the supervised training of the readouts. It is based on populations of perceptron units; however, it can also be applied to spiking neurons when message signals are represented in a space-rate code [Maass et al. (2002)]. Therefore, our aim is to formulate microcircuits with a modular character, i.e. with the same input and output coding of spikes.
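
The message update that each microcircuit has to approximate is, for a pairwise graphical model, the standard sum-product rule

$$m_{u \to v}(x_v) = \sum_{x_u} \psi_u(x_u)\, \psi_{uv}(x_u, x_v) \prod_{w \in N(u) \setminus \{v\}} m_{w \to u}(x_u),$$

whose product over incoming messages is the highly nonlinear operation that motivates projecting the inputs into the liquid's high-dimensional state.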

Michael Schnabel

Bernstein Center for Computational Neuroscience, Göttingen

May 11, 2007

Talk:
A symmetry of the visual world in the architecture of the visual cortex

We provide evidence that signatures of shift-twist symmetry (STS), a fundamental symmetry of visual cortical architecture [6,7], are present in the layout of tree shrew V1 orientation maps. On the theoretical side, we investigate the possible effects of STS on orientation map layout by modeling orientation preference maps (OPMs) within two different frameworks, Gaussian random fields and complex planforms, and find in both cases that STS leads to a specific coupling of the orientation map to the visuotopic map. This coupling can occur in two different ways, related to "even" and "odd" solutions previously introduced in [6,7]. However, our data analysis reveals that just one case - the "odd" one - is realized in tree shrew OPMs. Due to the prevalence of collinear contours in natural images [9], STS is an inherent feature of natural image statistics. However, in terms of the two symmetry classes, natural images belong to the opposite - the "even" - class.

In order to address the question of whether the STS observed in tree shrew OPMs may reflect natural image statistics, i.e. collinearity, we simulated map development using a modified elastic net model [4]. In this model, instead of presenting isolated, pointlike contour elements (as originally done in [4]), we used elongated collinear arrangements of contour elements for network training. Resulting maps showed an increasing degree of STS with increasing degree of collinearity of the stimuli used. Their correlation functions as well as the geometric arrangement of separate columns with given orientation preferences were found to be consistent with the layout of OPMs in tree shrew V1. These results suggest that the signatures of STS observed in tree shrew OPMs might originate from the structure of natural scene statistics.

Hans-Peter Frey

Bottom-up stimulus features and top-down control both contribute to the allocation of visual attention. The aim of my research was to gain a better understanding of how and when these processes control attention. The talk comprises two parts that illuminate the problem from somewhat different, but complementary angles. In the first part, the most prominent model in the field of bottom-up control, the saliency map, is analyzed. We compared its predictions to eye-movements made by human observers in different categories of grayscale as well as colored images. We analyzed the model's performance for different numbers of fixations, with an unexpected result. In the second part, different image features at fixated image regions are considered. I show that color-contrast influences eye-movements, but that this is not a sign of a causal relation. Top-down influences override the features' effect as soon as we manipulate the stimuli.

Human depth perception involves combining multiple, possibly conflicting, sensory measurements. Previous work with slightly conflicting cues has shown that this process is performed by statistically optimal weighted averaging. Here we ask whether the brain has a mechanism to be robust to large cue conflicts. We investigated how disparity and texture are combined in estimating slant as a function of their conflict. When the two cues only had a small conflict, we found evidence for optimally weighted averaging. At larger conflicts, we observed robust behavior in which one of the discrepant cues was rejected. Interestingly, the ignored cue could be either disparity or texture, and was not necessarily the less reliable cue. Optimally weighted averaging has previously been modeled as the combination of Gaussian sensory estimates. We show that both weighted averaging and robustness are predicted if the tails of the sensory estimates are heavier than a Gaussian. Lastly, we probed whether access to single-cue estimates determined robustness behavior. We found no evidence for access, suggesting nearly full cue fusion. We used these data to estimate a 'coupling prior' for disparity-texture combination.
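
For reference, the small-conflict rule under Gaussian assumptions: with single-cue slant estimates $\hat s_d, \hat s_t$ (disparity, texture) of variances $\sigma_d^2, \sigma_t^2$, the optimal combined estimate is

$$\hat s = w_d\, \hat s_d + w_t\, \hat s_t, \qquad w_i = \frac{1/\sigma_i^2}{1/\sigma_d^2 + 1/\sigma_t^2},$$

a rule that would hold for arbitrarily large conflicts if the likelihoods were truly Gaussian; heavier-than-Gaussian tails are what allow the posterior to switch to vetoing one cue instead.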

Andreas Kotowicz

University of Würzburg

April 11, 2007

Talk:
Modeling feature-based top-down attention during visual search

Christian Machens

During short-term memory maintenance, different neurons in prefrontal cortex (PFC), recorded under identical conditions, show a wide variety of temporal dynamics and response properties [1]. These data are a specific example of the more general finding that neural recordings from frontal cortices often reveal that different neurons have very different response characteristics. Modeling this complexity of responses has been difficult. Most commonly, some features of the responses are focused on, and models that fit those reduced features are built (e.g., [2]). But can the full complexity of responses be easily captured?

We have previously reported that the complex responses in PFC during short-term memory can be summarized in 5 dimensions (i.e., 5 parameters suffice to capture most of the variance in the data across neurons; Machens et al., COSYNE ’06). Olasagasti, Goldman, and colleagues have described a method to fit experimentally-obtained steady-state firing rates (that is, no dynamics) in a network model of persistent activity (Olasagasti et al., COSYNE ’05, COSYNE ’06). We now combine and extend these two approaches, and show how a simple linear fitting procedure leads to a model that describes the data in few dimensions yet captures most of the complexity and dynamics of the neural responses.

Let us assume we have observed, experimentally, M timepoints in the firing rates of N neurons -- a total of M ⋅ N data points. Let us model this data in a recurrent network of N neurons, with full connectivity. Such a network will have N^2 weights (i.e., as yet undetermined connection strengths). If N > M we have more unknowns than data points, and we could in principle solve the system exactly, reproducing all of the measured neural firing rates. The fitting procedure we use to achieve this requires the inversion of a matrix D representing all the data. To avoid overfitting the data, we use the singular value decomposition to represent, and then easily invert, the data matrix D: setting small singular values to zero corresponds to reducing the dimensionality of the model, which avoids overfitting. For the PFC data during short-term memory that we have previously analyzed, we find, in accordance with our previous results, that five dimensions suffice to describe the data (Machens et al., COSYNE ’06). The current approach now maps these data directly onto a neural network model, reproducing the dynamics of the data with most of their experimentally-observed richness and variety.
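
A minimal sketch of the fitting procedure as described, assuming linear rate dynamics $\tau\,\dot{\mathbf r} = -\mathbf r + W\mathbf r$; the function and variable names are mine.

```python
import numpy as np

def fit_linear_network(rates, dt=1.0, tau=1.0, k=5):
    """Fit recurrent weights W of a linear rate network
    tau * dr/dt = -r + W r to measured firing rates (neurons x timepoints).
    The data matrix is inverted via a rank-k truncated SVD: zeroing small
    singular values reduces the model's dimensionality and avoids
    overfitting (k ~ 5 sufficed for the PFC data described above)."""
    R = rates[:, :-1]                            # r(t)
    dR = (rates[:, 1:] - rates[:, :-1]) / dt     # finite-difference dr/dt
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    s_inv = np.zeros_like(s)
    s_inv[:k] = 1.0 / s[:k]                      # keep k leading components
    R_pinv = Vt.T @ np.diag(s_inv) @ U.T         # low-rank pseudoinverse of R
    return np.eye(rates.shape[0]) + tau * (dR @ R_pinv)
```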