35(689.17), 35th Annual Meeting of the Society for Neuroscience (Neuroscience), November 2005 (poster)

Abstract

A fundamental problem in neuroscience is determining whether or not particular neural signals are dependent. The correlation is the most straightforward basis for such tests, but considerable work also focuses on the mutual information (MI), which is capable of revealing dependence of higher orders that the correlation cannot detect. That said, there are other measures of dependence that share with the MI an ability to detect dependence of any order, but which can be easier to compute in practice. We focus in particular on tests based on the functional covariance, which derive from work originally accomplished in 1959 by Rényi. Conceptually, our dependence tests work by computing the covariance between (infinite-dimensional) vectors of nonlinear mappings of the observations being tested, and then determining whether this covariance is zero; we call this measure the constrained covariance (COCO). When these vectors are members of universal reproducing kernel Hilbert spaces, we can prove this covariance to be zero only when the variables being tested are independent. The greatest advantage of these tests, compared with the mutual information, is their simplicity: when comparing two signals, we need only take the largest eigenvalue (or the trace) of a product of two matrices of nonlinearities, where these matrices are generally much smaller than the number of observations (and are very simple to construct). We compare the mutual information, the COCO, and the correlation in the context of finding changes in dependence between the LFP and MUA signals in the primary visual cortex of the anaesthetized macaque, during the presentation of dynamic natural stimuli. We demonstrate that the MI and COCO reveal dependence which is not detected by the correlation alone (which we show by artificially removing all correlation between the signals, and then testing their dependence with COCO and the MI), and that COCO and the MI give results consistent with each other on our data.
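As a rough illustration of the "largest eigenvalue of a product of two matrices" step, the following minimal sketch computes one common form of the empirical COCO: the square root of the largest singular value of the product of two centred Gram matrices, scaled by 1/n. The Gaussian kernel, the bandwidth `sigma`, and the function names are assumptions made for this sketch. It uses full n-by-n Gram matrices, not the smaller matrices mentioned in the abstract.

```python
import numpy as np

def gaussian_gram(x, sigma=1.0):
    """Gram matrix of a Gaussian kernel (an assumed choice of universal kernel)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def empirical_coco(x, y, sigma=1.0):
    """Sketch of an empirical constrained covariance between samples x and y."""
    n = len(x)
    # Centring matrix: removes the mean of the feature maps in each RKHS.
    H = np.eye(n) - np.ones((n, n)) / n
    K = H @ gaussian_gram(x, sigma) @ H
    L = H @ gaussian_gram(y, sigma) @ H
    # Largest singular value of the product of the centred Gram matrices.
    top = np.linalg.norm(K @ L, 2)
    return np.sqrt(top) / n
```

For example, with `x` on a symmetric grid and `y = x ** 2` the linear correlation vanishes, yet this quantity should come out clearly larger than for a shuffled copy of `y`.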

Thorpe et al. (Nature 381, 1996) first showed how rapidly human observers are able to classify natural images as to whether they contain an animal or not. Whilst the basic result has been replicated using different response paradigms (yes-no versus forced-choice), modalities (eye movements versus button presses) as well as while measuring neurophysiological correlates (ERPs), it is still unclear which image features support this rapid categorisation. Recently Torralba and Oliva (Network: Computation in Neural Systems, 14, 2003) suggested that simple global image statistics can be used to predict seemingly complex decisions about the absence and/or presence of objects in natural scenes. They show that the information contained in a small number (N=16) of spectral principal components (SPC), obtained by applying principal component analysis (PCA) to the normalised power spectra of the images, is sufficient to achieve approximately 80% correct animal detection in natural scenes.
Our goal was to test whether human observers make use of the power spectrum when rapidly classifying natural scenes. We measured our subjects' ability to detect animals in natural scenes as a function of presentation time (13 to 167 msec); images were immediately followed by a noise mask. In one condition we used the original images; in the other we used images whose power spectra were equalised (each power spectrum was set to the mean power spectrum over our ensemble of 1476 images). Thresholds for 75% correct animal detection were in the region of 20–30 msec for all observers, independent of the power spectrum of the images: this result makes it very unlikely that human observers make use of the global power spectrum. Taken together with the results of Gegenfurtner, Braun & Wichmann (Journal of Vision [abstract], 2003), showing the robustness of animal detection to global phase noise, we conclude that humans use local features, like edges and contours, in rapid animal detection.
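The power-spectrum equalisation described above can be sketched as follows. This is a minimal illustration, assuming greyscale images of equal size and assuming that each image keeps its own Fourier phase while its amplitude is replaced by the square root of the ensemble mean power. The function name is hypothetical and the abstract does not give the exact procedure.

```python
import numpy as np

def equalise_power_spectra(images):
    """Give every image the ensemble mean power spectrum, keeping its phase.

    images: array of shape (n_images, height, width), greyscale (assumed).
    """
    images = np.asarray(images, dtype=float)
    spectra = np.fft.fft2(images, axes=(-2, -1))
    # Mean power spectrum over the ensemble, one value per frequency.
    mean_power = np.mean(np.abs(spectra) ** 2, axis=0)
    target_amplitude = np.sqrt(mean_power)
    # Keep each image's own phase (assumption of this sketch).
    phases = np.angle(spectra)
    equalised = target_amplitude * np.exp(1j * phases)
    # The result is real up to rounding error; discard the imaginary residue.
    return np.real(np.fft.ifft2(equalised, axes=(-2, -1)))
```

After this step all images share one power spectrum, so any remaining difference between them lies in the phase structure alone.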

The algorithmic classification of complex, natural scenes is generally considered a difficult task due to the large amount of information conveyed by natural images. Work by Simon Thorpe and colleagues showed that humans are capable of detecting animals within novel natural scenes with remarkable speed and accuracy. This suggests that the relevant information for classification can be extracted at comparatively limited computational cost. One hypothesis is that global image statistics such as the amplitude spectrum could underlie fast image classification (Johnson & Olshausen, Journal of Vision, 2003; Torralba & Oliva, Network: Comput. Neural Syst., 2003).
We used linear discriminant analysis to classify a set of 11,000 images into animal and non-animal images. After applying a DFT to each image, we put the Fourier spectrum into bins (8 orientations with 6 frequency bands each). Using all bins, classification performance on the Fourier spectrum reached 70%. However, performance was similar (67%) when only the high spatial frequency information was used and decreased steadily at lower spatial frequencies, reaching a minimum (50%) for the low spatial frequency information. Similar results were obtained when all bins were used on spatially filtered images. A detailed analysis of the classification weights showed that a relatively high level of performance (67%) could also be obtained when only 2 bins were used, namely the vertical and horizontal orientation at the highest spatial frequency band.
Our results show that in the absence of sophisticated machine learning techniques, animal detection in natural scenes is limited to rather modest levels of performance, far below those of human observers. When one is limited to global image statistics such as the DFT, mostly the information at the highest spatial frequencies is useful for the task. This is analogous to the results obtained with human observers on filtered images (Kirchner et al., VSS 2004).
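The binning step (8 orientations with 6 frequency bands) might look like the sketch below. The log-spaced band edges, the summing of amplitudes within a bin, and the function name are assumptions of this illustration; the abstract specifies only the number of bins.

```python
import numpy as np

def spectrum_bins(image, n_orient=8, n_bands=6):
    """Sum the DFT amplitude of one image into orientation x frequency bins."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    amp = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.hypot(fx, fy)
    # Orientation folded into [0, pi): opposite frequencies share a bin.
    theta = np.mod(np.arctan2(fy, fx), np.pi)
    orient = np.floor(theta / np.pi * n_orient).astype(int) % n_orient
    # Log-spaced band edges from the lowest nonzero frequency to Nyquist
    # (an assumed layout).
    edges = np.geomspace(radius[radius > 0].min(), 0.5, n_bands + 1)
    band = np.digitize(radius, edges[1:-1])  # values 0 .. n_bands - 1
    valid = (radius > 0) & (radius <= 0.5)   # drop DC and the corners
    features = np.zeros((n_orient, n_bands))
    np.add.at(features, (orient[valid], band[valid]), amp[valid])
    return features.ravel()
```

The resulting 48-dimensional vector per image is the kind of input a linear discriminant could then be trained on.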

The 22nd International Conference on Machine Learning, August 2005 (talk)

Abstract

We propose a general framework for learning from labeled and unlabeled data on a directed graph in which the structure of the graph including the directionality of the edges is considered. The time complexity of the algorithm derived from this framework is nearly linear due to recently developed numerical techniques. In the absence of labeled instances, this framework can be utilized as a spectral clustering method for directed graphs, which generalizes the spectral clustering approach for undirected graphs. We have applied our framework to real-world web classification problems and obtained encouraging results.
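One way such a framework can be instantiated is sketched below: labels are propagated with a symmetrised operator built from a random walk on the directed graph and its stationary distribution. The teleporting walk, the parameters `eta` and `alpha`, and the function names are assumptions of this sketch and are not taken from the abstract. It uses a dense solver for brevity, so it does not show the nearly linear running time.

```python
import numpy as np

def teleporting_walk(A, eta=0.99):
    """Row-stochastic transition matrix with uniform teleporting (assumed)."""
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    # Nodes without out-links jump uniformly.
    P = np.where(out > 0, A / np.maximum(out, 1e-12), 1.0 / n)
    return eta * P + (1.0 - eta) / n

def classify_on_digraph(A, y, alpha=0.1, eta=0.99):
    """Propagate labels y (+1, -1, or 0 for unlabeled) over a directed graph."""
    n = A.shape[0]
    P = teleporting_walk(np.asarray(A, dtype=float), eta)
    # Stationary distribution: left eigenvector of P with eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi = pi / pi.sum()
    s = np.sqrt(pi)
    M = (s[:, None] * P) / s[None, :]   # Pi^{1/2} P Pi^{-1/2}
    theta = (M + M.T) / 2.0             # symmetrised, direction-aware operator
    return np.linalg.solve(np.eye(n) - alpha * theta, np.asarray(y, dtype=float))
```

The sign of each entry of the returned vector would then serve as the predicted label of the corresponding node.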

The algorithmic classification of complex, natural scenes is generally considered a difficult task due to the large amount of information conveyed by natural images. Work by Simon Thorpe and colleagues showed that humans are capable of detecting animals within novel natural scenes with remarkable speed and accuracy. This suggests that the relevant information for classification can be extracted at comparatively limited computational cost. One hypothesis is that global image statistics such as the amplitude spectrum could underlie fast image classification (Johnson & Olshausen, Journal of Vision, 2003; Torralba & Oliva, Network: Comput. Neural Syst., 2003).
We used linear discriminant analysis to classify a set of 11,000 images into animal and non-animal images. After applying a DFT to each image, we put the Fourier spectrum of each image into 48 bins (8 orientations with 6 frequency bands). Using all of these bins, classification performance on the Fourier spectrum reached 70%. In an iterative procedure, we then removed the bins whose absence caused the smallest damage to the classification performance (one bin per iteration). Notably, performance stayed at about 70% until fewer than 6 bins were left. A detailed analysis of the classification weights showed that a comparatively high level of performance (67%) could also be obtained when only 2 bins were used, namely the vertical orientations at the highest spatial frequency band. When using only a single frequency band (8 bins), we found that 67% classification performance could be reached when only the high spatial frequency information was used; performance decreased steadily at lower spatial frequencies, reaching a minimum (50%) for the low spatial frequency information. Similar results were obtained when all bins were used on spatially pre-filtered images.
Our results show that in the absence of sophisticated machine learning techniques, animal detection in natural scenes is limited to rather modest levels of performance, far below those of human observers. When one is limited to global image statistics such as the DFT, mostly the information at the highest spatial frequencies is useful for the task. This is analogous to the results obtained with human observers on filtered images (Kirchner et al., VSS 2004).
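The iterative bin removal can be sketched as a greedy backward elimination around a two-class Fisher discriminant. This is a simplified illustration: it scores each candidate by training accuracy with no held-out data, adds a small ridge term for numerical stability, and uses hypothetical function names, none of which is stated in the abstract.

```python
import numpy as np

def lda_accuracy(X, y):
    """Training accuracy of a two-class Fisher discriminant (simplified)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter, with a small ridge for stability.
    Sw = np.atleast_2d(np.cov(X0, rowvar=False)) + np.atleast_2d(np.cov(X1, rowvar=False))
    Sw = Sw + 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, m1 - m0)
    threshold = (w @ (m0 + m1)) / 2.0
    pred = (X @ w > threshold).astype(int)
    return float(np.mean(pred == y))

def backward_elimination(X, y, n_keep=1):
    """Drop, one per iteration, the feature whose removal hurts accuracy least."""
    remaining = list(range(X.shape[1]))
    history = [(tuple(remaining), lda_accuracy(X, y))]
    while len(remaining) > n_keep:
        scores = [
            lda_accuracy(X[:, [b for b in remaining if b != drop]], y)
            for drop in remaining
        ]
        best = int(np.argmax(scores))
        remaining.pop(best)
        history.append((tuple(remaining), scores[best]))
    return history
```

Each entry of the returned history pairs the surviving bin indices with the accuracy reached using only those bins, which is how a plateau like the one reported above would show up.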

A psychometric function can be described by its shape and four parameters: position or threshold, slope or width, false alarm rate or chance level, and miss or lapse rate. Depending on the parameters of interest, some points on the psychometric function may be more informative than others. Adaptive methods attempt to place trials on the most informative points based on the data collected in previous trials. We introduce a new adaptive Bayesian psychometric method which collects data for any set of parameters with high efficiency. It places trials by minimizing the expected entropy [1] of the posterior pdf over a set of possible stimuli. In contrast to most other adaptive methods, it is neither limited to threshold measurement nor to forced-choice designs. Nuisance parameters can be included in the estimation and lead to less biased estimates. The method supports block designs, which do not harm the performance when a sufficient number of trials are performed. Block designs are useful for control of response bias and short-term performance shifts such as adaptation. We present the results of evaluations of the method by computer simulations and experiments with human observers. In the simulations we investigated the role of parametric assumptions, the quality of different point estimates, the effect of dynamic termination criteria and many other settings.
[1] Kontsevich, L.L. and Tyler, C.W. (1999): Bayesian adaptive estimation of psychometric slope and threshold. Vis. Res. 39 (16), 2729-2737.
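Placing a trial by minimising expected posterior entropy can be sketched on a discrete parameter grid as follows. The logistic shape, the fixed guess and lapse rates, the binary response, and all function names are assumptions of this illustration. It shows the general idea only and not the specific method evaluated in the abstract.

```python
import numpy as np

def psi(x, threshold, slope, guess=0.5, lapse=0.02):
    """Logistic psychometric function (shape and fixed rates are assumed)."""
    core = 0.5 * (1.0 + np.tanh(0.5 * slope * (x - threshold)))
    return guess + (1.0 - guess - lapse) * core

def entropy(p):
    """Shannon entropy of a discrete distribution, ignoring zero entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def expected_entropies(posterior, p_correct):
    """Expected posterior entropy after one trial at each candidate stimulus.

    posterior: (n_params,); p_correct: (n_stimuli, n_params).
    """
    expected = np.empty(p_correct.shape[0])
    for i, pc in enumerate(p_correct):
        total = 0.0
        for lik in (pc, 1.0 - pc):          # correct / incorrect response
            p_resp = np.sum(posterior * lik)
            total += p_resp * entropy(posterior * lik / p_resp)
        expected[i] = total
    return expected

def next_stimulus(posterior, p_correct):
    """Index of the stimulus that minimises the expected posterior entropy."""
    return int(np.argmin(expected_entropies(posterior, p_correct)))

def update(posterior, p_correct_row, correct):
    """Bayes update of the grid posterior after observing one response."""
    lik = p_correct_row if correct else 1.0 - p_correct_row
    post = posterior * lik
    return post / post.sum()
```

In use, `p_correct` would be tabulated once for every stimulus and grid point, and `next_stimulus` and `update` would alternate trial by trial.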

In psychophysical studies of perception the psychometric function is used to model the relation between the physical stimulus intensity and the observer's ability to detect or discriminate between stimuli of different intensities. We propose the use of Bayesian inference to extract the information contained in experimental data to learn about the parameters of psychometric functions. Since Bayesian inference cannot be performed analytically, we use a Markov chain Monte Carlo method to generate samples from the posterior distribution over parameters. These samples can be used to estimate Bayesian confidence intervals and other characteristics of the posterior distribution. We compare our approach with traditional methods based on maximum-likelihood parameter estimation combined with parametric bootstrap techniques for confidence interval estimation. Experiments indicate that Bayesian inference methods are superior to bootstrap-based methods and are thus the method of choice for estimating the psychometric function and its confidence intervals.
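A minimal random-walk Metropolis sampler gives a flavour of how posterior samples over psychometric-function parameters can be drawn. The logistic shape, the fixed guess rate (0.5) and lapse rate (0.02), the flat prior on a box, and the function names are all assumptions of this sketch; the abstract does not say which MCMC variant or priors were used.

```python
import numpy as np

def log_posterior(params, x, k, n):
    """Binomial log-likelihood plus a flat prior on a box (assumed prior)."""
    threshold, log_slope = params
    if abs(threshold) > 5.0 or abs(log_slope) > 5.0:
        return -np.inf
    z = np.exp(log_slope) * (x - threshold)
    # Logistic core written with tanh to avoid overflow; range [0.5, 0.98].
    p = 0.5 + 0.48 * 0.5 * (1.0 + np.tanh(0.5 * z))
    return float(np.sum(k * np.log(p) + (n - k) * np.log1p(-p)))

def metropolis(x, k, n, n_samples=5000, step=0.2, seed=0):
    """Random-walk Metropolis over (threshold, log slope).

    x: stimulus levels; k: correct responses; n: trials per level.
    """
    rng = np.random.default_rng(seed)
    current = np.array([0.0, 0.0])
    current_lp = log_posterior(current, x, k, n)
    samples = np.empty((n_samples, 2))
    for i in range(n_samples):
        proposal = current + step * rng.standard_normal(2)
        lp = log_posterior(proposal, x, k, n)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.random()) < lp - current_lp:
            current, current_lp = proposal, lp
        samples[i] = current
    return samples
```

After discarding an initial burn-in, percentiles of the sampled thresholds (for instance via `np.percentile`) give an interval estimate of the kind discussed above.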

We discuss reproducing kernel Hilbert space (RKHS)-based measures of statistical dependence, with emphasis on constrained covariance (COCO), a novel criterion to test dependence of random variables. We show that COCO is a test for independence if and only if the associated RKHSs are universal. That said, no independence test exists that can distinguish dependent and independent random variables in all circumstances. Dependent random variables can result in a COCO which is arbitrarily close to zero when the source densities are highly non-smooth. All current kernel-based independence tests share this behaviour. We demonstrate exponential convergence between the population and empirical COCO. Finally, we use COCO as a measure of joint neural activity between voxels in MRI recordings of the macaque monkey, and compare the results to the mutual information and the correlation. We also show the effect of removing breathing artefacts from the MRI recording.

Kernel methods are popular tools in machine learning and statistics that can be implemented in a simple feed-forward neural network. They have strong connections to several psychological theories. For example, Shepard's universal law of generalization can be given a kernel interpretation. This leads to an inner product and a metric on the psychological space that is different from the usual Minkowski norm. The metric has psychologically interesting properties: it is bounded from above and does not have additive segments. As categorization models often rely on Shepard's law as a model for psychological similarity, some of them can be recast as kernel methods. In particular, ALCOVE is shown to be closely related to kernel logistic regression. The relationship to the Generalized Context Model is also discussed. It is argued that functional analysis, which is routinely used in machine learning, also provides valuable insights for psychology.
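The two metric properties mentioned above can be checked numerically under one simple reading of Shepard's law: similarity decays exponentially with Euclidean distance, and that similarity is treated as a kernel. The specific kernel, the decay constant `c`, and the function names are assumptions of this sketch.

```python
import numpy as np

def shepard_kernel(x, y, c=1.0):
    """Exponential-decay similarity, read as a kernel (assumed form)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-c * np.linalg.norm(diff)))

def kernel_metric(x, y, c=1.0):
    """Distance induced by the kernel's inner product in feature space."""
    squared = (
        shepard_kernel(x, x, c)
        + shepard_kernel(y, y, c)
        - 2.0 * shepard_kernel(x, y, c)
    )
    return float(np.sqrt(max(squared, 0.0)))
```

Because the kernel equals 1 on the diagonal, the induced distance is sqrt(2 - 2k(x, y)) and can never exceed sqrt(2). For three points on a line, such as 0, 1 and 2, the distance from the first to the last is strictly smaller than the sum of the two intermediate distances, which is what the absence of additive segments means.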
