Independence of Conditionals (IC) has recently been proposed as a basic rule for causal structure learning. If a Bayesian network represents the causal structure, its Conditional Probability Distributions (CPDs) should be algorithmically independent. In this paper we compare IC with causal faithfulness (FF), stating that only those conditional independences that are implied by the causal Markov condition hold true. The latter is a basic postulate in common approaches to causal structure learning. The common spirit of FF and IC is to reject causal graphs for which the joint distribution looks ‘non-generic’. The difference lies in the notion of genericity: FF sometimes rejects models just because one of the CPDs is simple, for instance if the CPD describes a deterministic relation. IC does not behave in this undesirable way. It only rejects a model when there is a non-generic relation between different CPDs although each CPD looks generic when considered separately. Moreover, it detects relations between CPDs that cannot be captured by conditional independences. IC therefore helps in distinguishing causal graphs that induce the same conditional independences (i.e., they belong to the same Markov equivalence class). The usual justification for FF implicitly assumes a prior that is a probability density on the parameter space. IC can be justified by Solomonoff’s universal prior, assigning non-zero probability to those points in parameter space that have a finite description. In this way, it favours simple CPDs, and therefore respects Occam’s razor. Since Kolmogorov complexity is uncomputable, IC is not directly applicable in practice. We argue that it is nevertheless helpful, since it has already served as inspiration and justification for novel causal inference algorithms.
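To make the contrast concrete, here is a minimal worked example (our own illustration, not taken from the abstract) of a faithfulness violation caused by a single deterministic CPD:

```latex
% DAG: X -> Y, X -> Z, with the deterministic CPD Y := X.
\[
  P(x,y,z) \;=\; P(x)\,\mathbf{1}[y = x]\,P(z \mid x).
\]
% Since Y determines X, conditioning on Y fixes X, hence
\[
  P(z \mid x, y) \;=\; P(z \mid y)
  \quad\Longrightarrow\quad
  Z \perp\!\!\!\perp X \mid Y,
\]
% although X and Z are d-connected given Y (there is a direct edge X -> Z).
% FF therefore rejects the graph merely because the single CPD P(Y|X) is
% "simple"; IC does not, as long as P(X), P(Y|X) and P(Z|X) share no
% algorithmic information.
```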

Neurons deep in cortex interact with the environment extremely indirectly; the spikes they receive and produce are pre- and post-processed by millions of other neurons. This paper proposes two information-theoretic constraints on the production of spikes that help ensure that bursting activity deep in cortex relates meaningfully to events in the environment. First, neurons should emphasize selective responses with bursts. Second, neurons should propagate selective inputs by burst-firing in response to them. We show that the constraints are necessary for bursts to dominate information transfer within cortex, thereby providing a substrate that allows neurons to distribute credit amongst themselves. Finally, since synaptic plasticity degrades the ability of neurons to burst selectively, we argue that homeostatic regulation of synaptic weights is necessary, and that it is best performed offline during sleep.

We consider the problem of imitation learning when the examples, provided by a human expert, are scarce. Apprenticeship learning via inverse reinforcement learning provides an efficient tool for generalizing the examples, based on the assumption that the expert's policy maximizes a value function that is a linear combination of state and action features. Most apprenticeship learning algorithms use only simple empirical averages of the features in the demonstrations as a statistic of the expert's policy. However, this method is efficient only when the number of examples is sufficiently large to cover most of the states, or when the dynamics of the system is nearly deterministic. In this paper, we show that the quality of the learned policies is sensitive to the error in estimating the averages of the features when the dynamics of the system is stochastic. To reduce this error, we introduce two new approaches for bootstrapping the demonstrations, assuming that the expert is near-optimal and the dynamics of the system is known. In the first approach, the expert's examples are used to learn a reward function and to generate further examples from the corresponding optimal policy. The second approach uses a transfer technique, known as graph homomorphism, to generalize the expert's actions to unvisited regions of the state space. Empirical results on simulated robot navigation problems show that our approach is able to learn sufficiently good policies from a remarkably small number of examples.
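As a concrete illustration of the statistic in question, the following sketch estimates discounted empirical feature expectations from a set of demonstrations; the function name, trajectory format and discount factor are hypothetical conventions, meant only to make the quantity explicit:

```python
import numpy as np

def empirical_feature_expectations(trajectories, feature_fn, gamma=0.95):
    """Monte-Carlo estimate of the expert's discounted feature expectations
    mu = E[ sum_t gamma^t * phi(s_t, a_t) ], the statistic most apprenticeship
    learning methods try to match.  `trajectories` is a list of
    [(state, action), ...] lists; `feature_fn(state, action)` returns a
    feature vector.  (Hypothetical helpers, not the paper's own estimator.)"""
    mu = None
    for traj in trajectories:
        acc = None
        for t, (s, a) in enumerate(traj):
            phi = (gamma ** t) * np.asarray(feature_fn(s, a), dtype=float)
            acc = phi if acc is None else acc + phi
        mu = acc if mu is None else mu + acc
    return mu / len(trajectories)
```

With only a handful of trajectories and stochastic dynamics, this average is a noisy estimate of the true feature expectations, which is exactly the error the two bootstrapping approaches aim to reduce.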

Four decades after their invention, quasi-Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted as such, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many quasi-Newton methods, including the most popular ones, can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of the available information at a computational cost similar to that of its predecessors.
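For reference, the classical update these methods perform looks as follows; this is the standard BFGS inverse-Hessian update (a textbook formula, not the paper's new method), which the regression view reinterprets as inference from observed gradient differences:

```python
import numpy as np

def bfgs_inverse_hessian_update(H, s, y):
    """One BFGS update of the inverse-Hessian estimate H, given the step
    s = x_new - x_old and the gradient difference y = g_new - g_old.
    The update enforces the secant condition H_new @ y = s; the Bayesian
    reading treats the (s, y) pairs as regression data for the Hessian."""
    rho = 1.0 / float(y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)
```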

Most arterial spin labeling experiments assume a global transit delay time, with blood flowing from the tagging region to the imaging slice in plug flow without any dispersion of the magnetization. However, because of cardiac pulsation, the nonuniform cross-sectional flow profile, and complex vessel networks, the transit delay time is not a single value but follows a distribution. In this study, we explored the regional effects of magnetization dispersion on quantitative perfusion imaging for transit times varying over a very large interval, based on a direct comparison of pulsed, pseudo-continuous, and dual-coil continuous arterial spin labeling encoding schemes. The longer distances between the tagging and imaging regions typically used for continuous tagging schemes enhance the regional bias in the quantitative cerebral blood flow measurement, causing an underestimation of up to 37% when plug flow is assumed, as in the standard model.

We study the scenario of graph-based clustering algorithms such as spectral clustering. Given a set of data points, one first has to construct a graph on the data points and then apply a graph clustering algorithm to find a suitable partition of the graph. Our main question is whether and how the construction of the graph (choice of the graph, choice of parameters, choice of weights) influences the outcome of the final clustering result. To this end we study the convergence of cluster quality measures such as the normalized cut or the Cheeger cut on various kinds of random geometric graphs as the sample size tends to infinity. It turns out that the limit values of the same objective function are systematically different on different types of graphs. This implies that clustering results systematically depend on the graph and can be very different for different types of graph. We provide examples to illustrate the implications for spectral clustering.
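For readers unfamiliar with the two quality measures mentioned above, the standard definitions are:

```latex
% For a partition of the vertex set into A and its complement \bar{A},
% with cut(A,\bar{A}) the total weight of edges crossing the partition
% and vol(A) the sum of the degrees of the vertices in A:
\[
  \mathrm{Ncut}(A,\bar{A}) \;=\; \frac{\mathrm{cut}(A,\bar{A})}{\mathrm{vol}(A)}
                              \;+\; \frac{\mathrm{cut}(A,\bar{A})}{\mathrm{vol}(\bar{A})},
  \qquad
  \mathrm{Cheeger}(A,\bar{A}) \;=\; \frac{\mathrm{cut}(A,\bar{A})}{\min\{\mathrm{vol}(A),\,\mathrm{vol}(\bar{A})\}} .
\]
```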

Several aspects of primate visual physiology have been identified as adaptations to local regularities of natural images. However, much less work has measured visual sensitivity to these local regularities. Most previous work focuses on global perception of large images and shows that observers are more sensitive to visual information when image properties resemble those of natural images. In this work we measure human sensitivity to local natural image regularities using stimuli generated by patch-based probabilistic natural image models that have been related to primate visual physiology. We find that human observers can learn to discriminate the statistical regularities of natural image patches from those represented by current natural image models after very few exposures, and that discriminability depends on the degree to which the model captures those regularities. The quick learning we observed suggests that the human visual system is biased towards processing natural images, even at very fine spatial scales, and that it has surprisingly detailed knowledge of the regularities in natural images, at least in comparison to state-of-the-art statistical models of natural images.

Whenever eye movements are measured, a central part of the analysis has to do with where subjects fixate and why they fixated where they did. To a first approximation, a set of fixations can be viewed as a set of points in space; this implies that fixations are spatial data and that the analysis of fixation locations can be beneficially thought of as a spatial statistics problem. We argue that thinking of fixation locations as arising from point processes is a very fruitful framework for eye-movement data, helping turn qualitative questions into quantitative ones. We provide a tutorial introduction to some of the main ideas of the field of spatial statistics, focusing especially on spatial Poisson processes. We show how point processes help relate image properties to fixation locations. In particular, we show how point processes naturally express the idea that image features' predictability for fixations may vary from one image to another. We review other methods of analysis used in the literature, show how they relate to point process theory, and argue that thinking in terms of point processes substantially extends the range of analyses that can be performed and clarifies their interpretation.
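As an illustration of the modelling idea, the sketch below simulates fixation-like locations from an inhomogeneous spatial Poisson process by Lewis-Shedler thinning; the intensity function is a made-up example (a central-bias bump), not a fitted model from the tutorial:

```python
import numpy as np

def sample_inhomogeneous_poisson(intensity, width, height, rng=None):
    """Simulate points from an inhomogeneous spatial Poisson process with
    rate intensity(x, y) >= 0 on [0, width] x [0, height], via thinning.
    `intensity` could, e.g., be derived from an image-based saliency map
    (hypothetical example)."""
    rng = np.random.default_rng() if rng is None else rng
    # Upper bound on the intensity, estimated on a coarse grid.
    xs, ys = np.meshgrid(np.linspace(0, width, 50), np.linspace(0, height, 50))
    lam_max = intensity(xs, ys).max()
    # Homogeneous candidate points with rate lam_max ...
    n = rng.poisson(lam_max * width * height)
    cand = np.column_stack([rng.uniform(0, width, n), rng.uniform(0, height, n)])
    # ... kept with probability intensity(x, y) / lam_max.
    keep = rng.uniform(size=n) < intensity(cand[:, 0], cand[:, 1]) / lam_max
    return cand[keep]

# Example: intensity peaked at the image centre ("central fixation bias").
points = sample_inhomogeneous_poisson(
    lambda x, y: 5e-3 * np.exp(-((x - 400) ** 2 + (y - 300) ** 2) / (2 * 150 ** 2)),
    width=800, height=600)
```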

The latest climate projections suggest that both the frequency and the intensity of climate extremes will be substantially modified over the course of the coming decades. As a consequence, we need to understand to what extent and via which pathways climate extremes affect the state and functionality of terrestrial ecosystems and the associated biogeochemical cycles on a global scale. So far, the impacts of climate extremes on the terrestrial biosphere have mainly been investigated on the basis of case studies, while global assessments are largely lacking. In order to facilitate global analyses of this kind, we present a methodological framework that firstly detects spatiotemporally contiguous extremes in Earth observations, and secondly infers the likely pathway of the preceding climate anomaly. The approach does not require long time series, is computationally fast, and is easily applicable to a variety of data sets with different spatial and temporal resolutions. The key element of our analysis strategy is to search directly in the relevant observations for spatiotemporally connected components exceeding a certain percentile threshold. We also put an emphasis on characterizing the distribution of extreme events, and scrutinize the attribution issue. We exemplify the analysis strategy by exploring the fraction of absorbed photosynthetically active radiation (fAPAR) from 1982 to 2011. Our results suggest that the hot spots of extremes in fAPAR lie in Northeastern Brazil, Southeastern Australia, Kenya and Tanzania. Moreover, we demonstrate that the size distribution of extremes follows a distinct power law. The attribution framework reveals that extremes in fAPAR are primarily driven by phases of water scarcity.
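A minimal sketch of the key detection step (percentile thresholding followed by labelling of spatiotemporally connected components); the function name, the use of a single global percentile, and the NaN handling are simplifying assumptions rather than the exact pipeline used in the study:

```python
import numpy as np
from scipy import ndimage

def contiguous_extremes(cube, q=5.0):
    """Detect spatiotemporally contiguous negative extremes in a data cube
    of shape (time, lat, lon): mark values below the q-th percentile and
    label connected components jointly in space and time."""
    threshold = np.nanpercentile(cube, q)
    mask = cube < threshold
    labels, n_events = ndimage.label(mask)  # face connectivity in 3-D by default
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n_events + 1))
    # Rank events by their spatiotemporal extent (number of voxels),
    # e.g. to inspect the size distribution for power-law behaviour.
    order = np.argsort(sizes)[::-1]
    return labels, sizes[order]
```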

Combined positron emission tomography (PET) and magnetic resonance imaging (MRI) is a new tool to study functional processes in the brain. Here we study brain function in response to a barrel-field stimulus simultaneously using PET, which traces changes in glucose metabolism on a slow time scale, and functional MRI (fMRI), which assesses fast vascular and oxygenation changes during activation. We found spatial and quantitative discrepancies between the PET and the fMRI activation data. The functional connectivity of the rat brain was assessed by both modalities: the fMRI approach identified a total of nine known neural networks, whereas the PET method identified seven glucose metabolism-related networks. These results demonstrate the feasibility of combined PET-MRI for the simultaneous study of the brain during activation and at rest, revealing comprehensive and complementary information to further decode brain function and brain networks.

Structural learning in motor control refers to a metalearning process whereby an agent extracts (abstract) invariants from its sensorimotor stream when experiencing a range of environments that share similar structure. Such invariants can then be exploited for faster generalization and learning-to-learn when experiencing novel, but related task environments.

Journal of the Royal Society Interface, 10(87):1-11, October 2013 (article)


Decision-makers have been shown to rely on probabilistic models for perception and action. However, these models can be wholly or partially wrong, in which case the decision-maker has to cope with model uncertainty. Model uncertainty has recently also been shown to be an important determinant of sensorimotor behaviour in humans, one that can lead to risk-sensitive deviations from Bayes-optimal behaviour towards worst-case or best-case outcomes. Here, we investigate the effect of model uncertainty on cooperation in sensorimotor interactions similar to the stag-hunt game, where players develop models of the other player and decide between a pay-off-dominant cooperative solution and a risk-dominant, non-cooperative solution. In simulations, we show that players who allow for optimistic deviations from their opponent model are much more likely to converge to cooperative outcomes. We also implemented this agent model in a virtual reality environment and let human subjects play against a virtual player. In this game, subjects' pay-offs were experienced as forces opposing their movements. During the experiment, we manipulated the risk sensitivity of the computer player and observed the human responses. We found not only that humans adaptively changed their level of cooperation depending on the risk sensitivity of the computer player but also that their initial play exhibited characteristic risk-sensitive biases. Our results suggest that model uncertainty is an important determinant of cooperation in two-player sensorimotor interactions.
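One common way to formalize such risk-sensitive valuation under an uncertain opponent model is an exponential (log-sum-exp) transform of the pay-offs; the sketch below is an illustrative toy, not the exact agent model used in the paper, and the pay-off matrix is made up. It shows how an optimistic transform tips the choice towards the cooperative stag option:

```python
import numpy as np
from scipy.special import logsumexp

# Stag-hunt pay-offs for the row player: rows = own action (stag, hare),
# columns = opponent action (stag, hare).  Cooperating pays off only if
# the opponent also cooperates.  Illustrative numbers.
U = np.array([[4.0, 0.0],
              [3.0, 3.0]])

def risk_sensitive_value(U_row, p_opponent, beta):
    """V_beta(a) = (1/beta) * log E_b[ exp(beta * U(a, b)) ].
    beta > 0 is optimistic (best case dominates), beta < 0 pessimistic,
    beta -> 0 recovers risk-neutral expected utility."""
    if abs(beta) < 1e-9:
        return U_row @ p_opponent
    return logsumexp(beta * U_row + np.log(p_opponent), axis=-1) / beta

p = np.array([0.5, 0.5])  # uncertain opponent model
for beta in (-2.0, 0.0, 2.0):
    v = [risk_sensitive_value(U[a], p, beta) for a in (0, 1)]
    print(beta, "-> cooperate" if v[0] > v[1] else "-> defect", np.round(v, 2))
```

Under these numbers, the pessimistic and risk-neutral players defect, while the optimistic player cooperates, mirroring the simulation result described above.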

Proceedings of the Royal Society of London A, 469(2153):1-18, May 2013 (article)


Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here, we propose a thermodynamically inspired formalization of bounded rational decision-making where information processing is modelled as state changes in thermodynamic systems that can be quantified by differences in free energy. By optimizing a free energy, bounded rational decision-makers trade off expected utility gains and information-processing costs measured by the relative entropy. As a result, the bounded rational decision-making problem can be rephrased in terms of well-known variational principles from statistical physics. In the limit when computational costs are ignored, the maximum expected utility principle is recovered. We discuss links to existing decision-making frameworks and applications to human decision-making experiments that are at odds with expected utility theory. Since most of the mathematical machinery can be borrowed from statistical physics, the main contribution is to re-interpret the formalism of thermodynamic free-energy differences in terms of bounded rational decision-making and to discuss its relationship to human decision-making experiments.
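In standard form (our notation, paraphrasing the abstract), the variational principle reads:

```latex
% Given a prior policy p_0(a), a utility U(a) and an inverse temperature
% \beta that prices information processing, the bounded rational policy
% maximizes the free-energy trade-off
\[
  p^{*} \;=\; \arg\max_{p}\;\Big\{ \mathbb{E}_{p}[U(a)]
      \;-\; \tfrac{1}{\beta}\, D_{\mathrm{KL}}\!\big(p \,\|\, p_0\big) \Big\},
  \qquad
  p^{*}(a) \;=\; \frac{p_0(a)\, e^{\beta U(a)}}{\sum_{a'} p_0(a')\, e^{\beta U(a')}} .
\]
% As \beta \to \infty the information-processing cost becomes negligible and
% p^{*} concentrates on \arg\max_a U(a), recovering maximum expected utility;
% for small \beta the policy stays close to the prior p_0.
```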

A simple HPLC method was developed for the determination of quercitrin and isoquercitrin in rat plasma. Reversed-phase HPLC was employed for the quantitative analysis, using kaempferol-3-O-β-d-glucopyranoside-7-O-α-l-rhamnoside as an internal standard. Following extraction from the plasma samples with ethyl acetate-isopropanol (95:5, v/v), the two compounds were successfully separated on a Luna C18 column (250 × 4.6 mm, 5 µm) with isocratic elution of acetonitrile-0.5% aqueous acetic acid (17:83, v/v) as the mobile phase. The flow rate was set at 1 mL/min and the eluent was detected at 350 nm for both quercitrin and isoquercitrin. The method was linear over the studied ranges of 50-6000 and 50-5000 ng/mL for quercitrin and isoquercitrin, respectively. The intra- and inter-day precisions of the analysis were better than 13.1 and 13.2%, respectively. The lower limits of quantitation for quercitrin and isoquercitrin in plasma were both 50 ng/mL. The mean extraction recoveries were 73 and 61% for quercitrin and isoquercitrin, respectively. The validated method was successfully applied to pharmacokinetic studies of the two analytes in rat plasma after oral administration of Hypericum japonicum Thunb. ethanol extract.

In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. At first glance spectral clustering appears slightly mysterious, and it is not obvious why it works at all and what it really does. The goal of this tutorial is to give some intuition on these questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.
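As a concrete reference point, a minimal implementation of normalized spectral clustering on a fully connected Gaussian similarity graph might look as follows; practical implementations typically use sparse k-nearest-neighbour graphs and sparse eigensolvers instead:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    """Minimal sketch of normalized spectral clustering (not tuned for
    large data sets)."""
    # Gaussian similarity graph on the data points.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized graph Laplacian L_sym = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
    # Embed the points with the first k eigenvectors, row-normalize,
    # then cluster the embedded points with k-means.
    _, U = eigh(L_sym, subset_by_index=[0, k - 1])
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```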

The abilities to learn and to categorize are fundamental for cognitive systems, be they animals or machines, and have therefore attracted attention from engineers and psychologists alike. Modern machine learning methods and psychological models of categorization are remarkably similar, partly because these two fields share a common history in artificial neural networks and reinforcement learning. However, machine learning is now an independent and mature field that has moved beyond psychologically or neurally inspired algorithms towards providing foundations for a theory of learning that is rooted in statistics and functional analysis. Much of this research is potentially interesting for psychological theories of learning and categorization but is hardly accessible to psychologists. Here, we provide a tutorial introduction to a popular class of machine learning tools, called kernel methods. These methods are closely related to perceptrons, radial-basis-function neural networks and exemplar theories of categorization. Recent theoretical advances in machine learning are closely tied to the idea that the similarity of patterns can be encapsulated in a positive definite kernel. Such a positive definite kernel can define a reproducing kernel Hilbert space, which allows one to use powerful tools from functional analysis for the analysis of learning algorithms. We give basic explanations of some key concepts (the so-called kernel trick, the representer theorem, and regularization), which may open up the possibility that insights from machine learning can feed back into psychology.
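To make the kernel trick and the representer theorem tangible, here is a minimal kernel ridge regression sketch (a standard kernel method, not code from the tutorial):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix; a standard positive definite kernel."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_ridge_fit(X, y, lam=1e-2, gamma=1.0):
    """Kernel ridge regression: by the representer theorem the regularized
    solution in the RKHS has the form f(x) = sum_i alpha_i * k(x_i, x),
    and the kernel trick reduces fitting to linear algebra on the Gram matrix."""
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda X_new: rbf_kernel(X_new, X, gamma) @ alpha

# Toy usage: fit a noisy sine and predict at two test points.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)
f = kernel_ridge_fit(X, y)
print(f(np.array([[0.0], [1.5]])))
```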

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.