journal articles

A fundamental function of the visual system is to encode the building blocks of natural scenes (edges, textures, and shapes) that subserve visual tasks such as object recognition and scene understanding. Essential to this process is the formation of abstract representations that generalize from specific instances of visual input. A common view holds that neurons in the early visual system signal conjunctions of image features, but how these produce invariant representations is poorly understood. Here we propose that to generalize over similar images, higher-level visual neurons encode statistical variations that characterize local image regions. We present a model in which neural activity encodes the probability distribution most consistent with a given image. Trained on natural images, the model generalizes by learning a compact set of dictionary elements for image distributions typically encountered in natural scenes. Model neurons show a diverse range of properties observed in cortical cells. These results provide a new functional explanation for nonlinear effects in complex cells and offer insight into coding strategies in primary visual cortex (V1) and higher visual areas.

Capturing statistical regularities in complex, high-dimensional data is an important problem in machine learning and signal processing. Models such as PCA and ICA make few assumptions about the structure in the data and have good scaling properties, but they are limited to representing linear statistical regularities and assume that the distribution of the data is stationary. For many natural, complex signals, the latent variables often exhibit residual dependencies as well as non-stationary statistics. Here we present a hierarchical Bayesian model that is able to capture higher-order non-linear structure and represent non-stationary data distributions. The model is a generalization of ICA in which the basis function coefficients are no longer assumed to be independent; instead, the dependencies in their magnitudes are captured by a set of density components. Each density component describes a common pattern of deviation from the marginal density of the pattern ensemble; in different combinations, they can describe non-stationary distributions. Adapting the model to image or audio data yields a non-linear, distributed code for higher-order statistical regularities that reflect more abstract, invariant properties of the signal.
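The generative structure described above can be sketched in a few lines. This is an illustrative simplification, not the paper's exact parameterization: it assumes coefficient log-variances are a linear function of density-component activations, with random (rather than learned) bases.

```python
import numpy as np

# Minimal sketch of the density-components model, assuming:
#   log-variances of coefficients are linear in density components v,
#   u_i ~ N(0, exp(lambda_i)), and the signal is x = A u.
# A and B are random stand-ins for the learned image basis and
# density components.
rng = np.random.default_rng(0)

n_pixels, n_coeffs, n_density = 64, 64, 10
A = rng.standard_normal((n_pixels, n_coeffs))   # image basis (learned in the model)
B = rng.standard_normal((n_coeffs, n_density))  # density components (learned in the model)

def sample_image(v):
    """Draw one signal given density-component activations v."""
    log_var = B @ v                              # v reshapes the coefficient density
    u = rng.standard_normal(n_coeffs) * np.exp(0.5 * log_var)
    return A @ u

# Different v's correspond to different local signal distributions,
# so one dictionary of density components describes non-stationary statistics.
x1 = sample_image(rng.standard_normal(n_density))
x2 = sample_image(rng.standard_normal(n_density))
```

Because the coefficients share variance structure through v, they are uncorrelated but not independent, which is exactly the residual magnitude dependence the model is designed to capture.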

The theoretical principles that underlie the representation and computation of higher-order structure in natural images are poorly understood. Recently, there has been considerable interest in using information-theoretic techniques, such as independent component analysis, to derive representations for natural images that are optimal in the sense of coding efficiency. Although these approaches have been successful in explaining properties of neural representations in the early visual pathway and visual cortex, because they are based on a linear model, the types of image structure that can be represented are very limited. Here, we present a hierarchical probabilistic model for learning higher-order statistical regularities in natural images. This non-linear model learns an efficient code that describes variations in the underlying probabilistic density. When applied to natural images, the algorithm yields coarse-coded, sparse, distributed representations of abstract image properties such as object location, scale, and texture. This model offers a novel description of higher-order image structure and could provide theoretical insight into the response properties and computational functions of lower-level cortical visual areas.

conference papers + abstracts

Natural sounds exhibit complex statistical regularities at multiple scales. Acoustic events underlying speech, for example, are characterized by precise temporal and frequency relationships, but they can also vary substantially according to the pitch, duration, and other high-level properties of speech production. Learning this structure from data while capturing the inherent variability is an important first step in building auditory processing systems, as well as understanding the mechanisms of auditory perception. Here we develop Hierarchical Spike Coding, a two-layer probabilistic generative model for complex acoustic structure. The first layer consists of a sparse spiking representation that encodes the sound using kernels positioned precisely in time and frequency. Patterns in the positions of first-layer spikes are learned from the data: on a coarse scale, statistical regularities are encoded by a second-layer spiking representation, while fine-scale structure is captured by recurrent interactions within the first layer. When fit to speech data, the second-layer acoustic features include harmonic stacks, sweeps, frequency modulations, and precise temporal onsets, which can be composed to represent complex acoustic events. Unlike spectrogram-based methods, the model gives a probability distribution over sound pressure waveforms. This allows us to use the second-layer representation to synthesize sounds directly, and to perform model-based denoising, on which we demonstrate a significant improvement over standard methods.
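The first-layer synthesis step, a waveform built from kernels placed at precise times and frequencies, can be sketched as follows. The gammatone-like kernel shape, sampling rate, and spike parameters here are illustrative assumptions, not the model's fitted kernels.

```python
import numpy as np

# Toy sketch of spike-based synthesis: a waveform is a sum of
# time-frequency kernels, each with a time index, amplitude, and
# center frequency. Kernel shape and all parameters are illustrative.
fs = 16000
t = np.arange(0, 0.02, 1 / fs)                   # 20 ms kernel support

def kernel(freq_hz):
    env = t ** 3 * np.exp(-2 * np.pi * 200 * t)  # gamma-like envelope
    g = env * np.cos(2 * np.pi * freq_hz * t)
    return g / np.linalg.norm(g)

def synthesize(spikes, n_samples):
    """spikes: list of (time_index, amplitude, freq_hz) triples."""
    x = np.zeros(n_samples)
    for ti, amp, f in spikes:
        k = kernel(f)
        x[ti:ti + len(k)] += amp * k[: max(0, n_samples - ti)]
    return x

# Two "spikes" at different times and center frequencies.
x = synthesize([(100, 1.0, 500.0), (800, 0.5, 1200.0)], 4000)
```

In the full model the spike times, amplitudes, and kernels are inferred from data, and the second layer places distributions over these spike patterns; this sketch covers only the deterministic synthesis direction.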

Efficient coding provides a powerful principle for explaining early sensory processing. Most attempts to test this principle have been limited to linear, noiseless models, and when applied to natural images, have yielded localized oriented filters (e.g., Bell and Sejnowski, 1995). Although this is generally consistent with cortical representations, it fails to account for basic properties of early vision, such as the receptive field organization, temporal dynamics, and nonlinear behaviors in retinal ganglion cells (RGCs). Here we show that an efficient coding model that incorporates ingredients critical to biological computation -- input and output noise, nonlinear response functions, and a metabolic cost on the firing rate -- can predict several basic properties of retinal processing. Specifically, we develop numerical methods for simultaneously optimizing linear filters and response nonlinearities of a population of model neurons so as to maximize information transmission in the presence of noise and metabolic costs. We place no restrictions on the form of the linear filters, and assume only that the nonlinearities are monotonically increasing.

In the case of vanishing noise, our method reduces to a generalized version of independent component analysis; training on natural image patches produces localized oriented filters and smooth nonlinearities. When the model includes biologically realistic levels of noise, the predicted filters are center-surround and the nonlinearities are rectifying, consistent with properties of RGCs. The model yields two populations of neurons, with On- and Off-center responses, which independently tile the visual space. As observed in the primate retina, Off-center neurons are more numerous and have filters with smaller spatial extent. Applied to natural movies, the model yields filters that are approximately space-time separable, with a center-surround spatial profile, a biphasic temporal profile, and a surround response that is slightly delayed relative to the center, consistent with retinal processing.

Efficient coding provides a powerful principle for explaining early sensory coding. Most attempts to test this principle have been limited to linear, noiseless models, and when applied to natural images, have yielded oriented filters consistent with responses in primary visual cortex. Here we show that an efficient coding model that incorporates biologically realistic ingredients - input and output noise, nonlinear response functions, and a metabolic cost on the firing rate - predicts receptive fields and response nonlinearities similar to those observed in the retina. Specifically, we develop numerical methods for simultaneously learning the linear filters and response nonlinearities of a population of model neurons, so as to maximize information transmission subject to metabolic costs. When applied to an ensemble of natural images, the method yields filters that are center-surround and nonlinearities that are rectifying. The filters are organized into two populations, with On- and Off-centers, which independently tile the visual space. As observed in the primate retina, the Off-center neurons are more numerous and have filters with smaller spatial extent. In the absence of noise, our method reduces to a generalized version of independent components analysis, with an adapted nonlinear "contrast"function; in this case, the optimal filters are localized and oriented.

Linear implementations of the efficient coding hypothesis, such as independent component analysis (ICA) and sparse coding models, have provided functional explanations for properties of simple cells in V1. These models, however, ignore the non-linear behavior of neurons and fail to match individual and population properties of neural receptive fields in subtle but important ways. Hierarchical models, including Gaussian Scale Mixtures and other generative statistical models, can capture higher-order regularities in natural images and explain non-linear aspects of neural processing such as normalization and context effects. Previously, the lower-level representation had been assumed to be independent of the hierarchy and was held fixed when training these models. Here we examine the optimal lower-level representations derived in the context of a hierarchical model and find that the resulting representations are strikingly different from those based on linear models. Unlike the basis functions and filters learned by ICA or sparse coding, these functions individually more closely resemble simple cell receptive fields and collectively span a broad range of spatial scales. Our work unifies several related approaches and observations about natural image structure and suggests that hierarchical models might yield better representations of image structure throughout the hierarchy.
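The Gaussian Scale Mixture model class mentioned above has a very compact generative form, sketched below with illustrative parameters: coefficients share a common positive scale variable, which makes them uncorrelated yet dependent in magnitude and gives the heavy-tailed marginals characteristic of natural image coefficients.

```python
import numpy as np

# Minimal Gaussian Scale Mixture sketch: x_i = sqrt(z) * u_i with a
# shared lognormal scale z and Gaussian u. The lognormal prior and
# sample size are illustrative choices, not a fitted model.
rng = np.random.default_rng(0)
n = 200_000

z = rng.lognormal(mean=0.0, sigma=1.0, size=n)    # shared scale variable
u = rng.standard_normal((n, 2))                   # underlying Gaussian pair
x = np.sqrt(z)[:, None] * u

corr = np.corrcoef(x[:, 0], x[:, 1])[0, 1]                  # ~0: uncorrelated
kurt = np.mean(x[:, 0] ** 4) / np.mean(x[:, 0] ** 2) ** 2   # > 3: heavy-tailed
dep = np.corrcoef(x[:, 0] ** 2, x[:, 1] ** 2)[0, 1]         # > 0: magnitude dependence
```

This is exactly the kind of higher-order regularity a linear model cannot represent: the pair is second-order white, so ICA-style objectives see little structure, while the squared coefficients remain strongly correlated.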

We present a hierarchical Bayesian model for learning efficient codes of higher-order structure in natural images. The model, a non-linear generalization of independent component analysis, replaces the standard assumption of independence for the joint distribution of coefficients with a distribution that is adapted to the variance structure of the coefficients of an efficient image basis. This offers a novel description of higher-order image structure and provides a way to learn coarse-coded, sparse, distributed representations of abstract image properties such as object location, scale, and texture.

Some genes produce transcripts that function directly in regulatory, catalytic, or structural roles in the cell. These non-coding RNAs are prevalent in all living organisms, and methods that aid the understanding of their functional roles are essential. RNA secondary structure, the pattern of base-pairing, contains the critical information for determining the three-dimensional structure and function of the molecule. In this work we examine whether the basic geometric and topological properties of secondary structure are sufficient to distinguish between RNA families in a learning framework. First, we develop a labeled dual graph representation of RNA secondary structure by adding biologically meaningful labels to the dual graphs proposed by Gan et al. [1]. Next, we define a similarity measure directly on the labeled dual graphs using the recently developed marginalized kernels [2]. Using this similarity measure, we were able to train Support Vector Machine classifiers to distinguish RNAs of known families from random RNAs with similar statistics. For 22 of the 25 families tested, the classifier achieved better than 70% accuracy, with much higher accuracy rates for some families. Training a set of classifiers to automatically assign family labels to RNAs using a one-vs-all multi-class scheme also yielded encouraging results. From these initial learning experiments, we suggest that the labeled dual graph representation, together with kernel machine methods, has potential for use in automated analysis and classification of uncharacterized RNA molecules or efficient genome-wide screens for RNA molecules from existing families.
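The similarity measure above can be illustrated with a toy random-walk kernel on labeled graphs, in the spirit of the marginalized kernels of [2]. This sketch differs from the actual formulation in several respects (it truncates walk length, uses only vertex labels, and ignores edge labels), so it is an assumption-laden simplification rather than the method itself.

```python
import numpy as np

# Toy label-matching random-walk kernel: simultaneous random walks on
# two labeled graphs contribute to the kernel when their vertex-label
# sequences agree. Graphs are given as adjacency matrices plus a list
# of vertex labels; p_stop and max_len are illustrative choices.
def walk_kernel(adj_a, labels_a, adj_b, labels_b, p_stop=0.3, max_len=20):
    n_a, n_b = len(labels_a), len(labels_b)
    match = np.array([[la == lb for lb in labels_b] for la in labels_a], float)
    t_a = (1 - p_stop) * adj_a / np.maximum(adj_a.sum(1, keepdims=True), 1)
    t_b = (1 - p_stop) * adj_b / np.maximum(adj_b.sum(1, keepdims=True), 1)
    q = match / (n_a * n_b)              # uniform start over label-matching pairs
    k = 0.0
    for _ in range(max_len):
        k += p_stop ** 2 * q.sum()       # both walks terminate at this length
        q = (t_a.T @ q @ t_b) * match    # both walks step; labels must still match
    return k

# Two small labeled graphs with the same labels but different topology.
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
labels = ["A", "B", "C"]

k_tt = walk_kernel(triangle, labels, triangle, labels)
k_pp = walk_kernel(path, labels, path, labels)
k_tp = walk_kernel(triangle, labels, path, labels)
```

Each fixed walk length contributes an inner product between label-sequence distributions, so the kernel is positive semidefinite and can be plugged directly into an SVM, as in the classification experiments described above.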