
The free-energy principle: a unified brain theory?

Karl Friston is a neuroscientist at the Wellcome Trust Centre for Neuroimaging at University College London, UK. He is an authority on brain imaging. His group invented statistical parametric mapping and voxel-based morphometry. These technical contributions were motivated by schizophrenia research and theoretical studies of value learning (formulated as the disconnection hypothesis of schizophrenia). He later introduced dynamic causal modelling to infer the architecture of distributed systems like the brain. Mathematical contributions include variational filtering and dynamic-expectation maximization. Friston currently works on models of functional integration in the human brain and the principles that underlie neuronal interactions. His main contribution to theoretical neurobiology is a free-energy principle for action and perception.

Abstract

A free-energy principle has been proposed recently that accounts for action, perception and learning. This Review looks at some key brain theories in the biological (for example, neural Darwinism) and physical (for example, information theory and optimal control theory) sciences from the free-energy perspective. Crucially, one key theme runs through each of these theories — optimization. Furthermore, if we look closely at what is optimized, the same quantity keeps emerging, namely value (expected reward, expected utility) or its complement, surprise (prediction error, expected cost). This is the quantity that is optimized under the free-energy principle, which suggests that several global brain theories might be unified within a free-energy framework.

Knill, D. C. & Pouget, A. The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci. 27, 712–719 (2004).
A nice review of Bayesian theories of perception and sensorimotor control. Its focus is on Bayes optimality in the brain and the implicit nature of neuronal representations.

Desimone, R. Neural mechanisms for visual memory and their role in attention. Proc. Natl Acad. Sci. USA 93, 13494–13499 (1996).
A nice review of mnemonic effects (such as repetition suppression) on neuronal responses and how they bias the competitive interactions between stimulus representations in the cortex. It provides a good perspective on attentional mechanisms in the visual system that is empirically grounded.

Montague, P. R., Dayan, P., Person, C. & Sejnowski, T. J. Bee foraging in uncertain environments using predictive Hebbian learning. Nature 377, 725–728 (1995).
A computational treatment of behaviour that combines ideas from optimal control theory and dynamic programming with the neurobiology of reward. This provided an early example of value learning in the brain.

Acknowledgements

This work was funded by the Wellcome Trust. I would like to thank my colleagues at the Wellcome Trust Centre for Neuroimaging, the Institute of Cognitive Neuroscience and the Gatsby Computational Neuroscience Unit for collaborations and discussions.

Glossary

Free energy

An information theory measure that bounds (by being greater than) the surprise on sampling some data, given a generative model.
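As a minimal numerical sketch (with made-up probabilities), the bound can be checked for a discrete generative model over two causes: the free energy of any recognition density exceeds the surprise, and equals it when the recognition density is the true posterior.

```python
import math

# Hypothetical discrete generative model: two causes and one observed datum.
# p_joint[c] is the joint probability p(cause = c, data = observed).
p_joint = {0: 0.3, 1: 0.1}
p_data = sum(p_joint.values())        # marginal p(data) = 0.4
surprise = -math.log(p_data)          # surprise = -log p(data)

def free_energy(q):
    """Variational free energy E_q[log q(c) - log p(c, data)]."""
    return sum(q[c] * (math.log(q[c]) - math.log(p_joint[c])) for c in q)

# An arbitrary recognition density versus the true posterior density.
q_arbitrary = {0: 0.5, 1: 0.5}
q_posterior = {c: p_joint[c] / p_data for c in p_joint}

# The bound: F >= surprise, with equality when q equals the posterior.
assert free_energy(q_arbitrary) > surprise
assert abs(free_energy(q_posterior) - surprise) < 1e-12
```

The gap between the two quantities is exactly the Kullback-Leibler divergence between the recognition density and the posterior, which is why minimizing free energy makes the recognition density approximate the posterior.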

Homeostasis

The process whereby an open or closed system regulates its internal environment to maintain its states within bounds.

Entropy

The average surprise of outcomes sampled from a probability distribution or density. A density with low entropy means that, on average, the outcome is relatively predictable. Entropy is therefore a measure of uncertainty.

Surprise

(Surprisal or self information.) The negative log-probability of an outcome. An improbable outcome (for example, water flowing uphill) is therefore surprising.
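The two definitions above can be illustrated in a few lines of Python with a toy density (the probabilities are made up for illustration): surprisal is the negative log-probability of one outcome, and entropy is its average under the density.

```python
import math

def surprisal(p):
    """Negative log-probability of an outcome (self information)."""
    return -math.log(p)

# A toy density over four outcomes, peaked on 'a'.
p = {'a': 0.7, 'b': 0.1, 'c': 0.1, 'd': 0.1}

# Entropy: the average surprise of outcomes sampled from the density.
entropy = sum(p[x] * surprisal(p[x]) for x in p)

# A uniform density over the same outcomes is maximally uncertain,
# so its entropy is higher than that of the peaked density.
p_uniform = {x: 0.25 for x in p}
entropy_uniform = sum(p_uniform[x] * surprisal(p_uniform[x]) for x in p_uniform)
assert entropy < entropy_uniform
```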

Fluctuation theorem

(A term from statistical mechanics.) Deals with the probability that the entropy of a system far from thermodynamic equilibrium will increase or decrease over a given amount of time. It states that the probability of the entropy decreasing becomes exponentially smaller with time.

Attractor

A set to which a dynamical system evolves after a long enough time. Points that get close to the attractor remain close, even under small perturbations.

Kullback-Leibler divergence

(Or information divergence, information gain or cross entropy.) A non-negative, non-commutative measure of the difference between two probability distributions.
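Both properties are easy to verify numerically. The sketch below (using an arbitrary pair of coin-flip densities) shows that the divergence is zero only between identical densities and that it is not symmetric in its arguments.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) between discrete densities."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p)

p = {'heads': 0.9, 'tails': 0.1}   # a biased coin
q = {'heads': 0.5, 'tails': 0.5}   # a fair coin

# Non-negative, and zero only when the densities coincide...
assert kl(p, p) == 0.0 and kl(p, q) > 0.0
# ...and non-commutative: D(p || q) differs from D(q || p) in general.
assert kl(p, q) != kl(q, p)
```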

Recognition density

(Or 'approximating conditional density'.) An approximate probability distribution of the causes of data (for example, sensory input). It is the product of inference or inverting a generative model.

Generative model

A probabilistic model (joint density) of the dependencies between causes and consequences (data), from which samples can be generated. It is usually specified in terms of the likelihood of data, given their causes (parameters of a model) and priors on the causes.
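A hypothetical two-line generative model makes the definition concrete: draw a cause from the prior, then data from the likelihood. This ancestral sampling yields samples from the joint density, and the empirical marginal over data matches the model's prediction (all probabilities here are invented for illustration).

```python
import random

random.seed(0)

# Hypothetical generative model: a prior over a binary hidden cause, and
# a likelihood mapping each cause to noisy binary data.
prior = {0: 0.8, 1: 0.2}                      # p(cause)
likelihood = {0: {'low': 0.9, 'high': 0.1},   # p(data | cause)
              1: {'low': 0.2, 'high': 0.8}}

def sample():
    """Ancestral sampling from the joint density p(cause, data):
    first a cause from the prior, then data from the likelihood."""
    cause = random.choices(list(prior), weights=list(prior.values()))[0]
    data = random.choices(list(likelihood[cause]),
                          weights=list(likelihood[cause].values()))[0]
    return cause, data

samples = [sample() for _ in range(10_000)]

# The empirical marginal p(data = 'high') should approximate
# 0.8 * 0.1 + 0.2 * 0.8 = 0.24.
freq_high = sum(1 for _, d in samples if d == 'high') / len(samples)
assert abs(freq_high - 0.24) < 0.03
```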

Conditional density

(Or posterior density.) The probability distribution of causes or model parameters, given some data; that is, a probabilistic mapping from observed data to causes.

Prior

The probability distribution or density of the causes of data that encodes beliefs about those causes before observing the data.

Bayesian surprise

A measure of salience based on the Kullback-Leibler divergence between the recognition density (which encodes posterior beliefs) and the prior density. It measures the information that can be recognized in the data.

Bayesian brain hypothesis

The idea that the brain uses internal probabilistic (generative) models to update posterior beliefs, using sensory information, in an (approximately) Bayes-optimal fashion.
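A single Bayes-optimal update can be sketched with an illustrative (entirely made-up) example: inferring a hidden cause of wet grass from a prior and a likelihood, and scoring the update with the Bayesian surprise defined above.

```python
import math

# Illustrative inference problem: which hidden cause explains wet grass?
prior = {'rain': 0.2, 'sprinkler': 0.8}       # beliefs before the datum
likelihood = {'rain': 0.9, 'sprinkler': 0.3}  # p(wet | cause)

# Bayes' rule: p(cause | wet) ∝ p(wet | cause) p(cause).
evidence = sum(likelihood[c] * prior[c] for c in prior)
posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}

# Bayesian surprise: the KL divergence from prior to posterior, i.e.
# how much the datum changed the beliefs.
bayesian_surprise = sum(posterior[c] * math.log(posterior[c] / prior[c])
                        for c in prior)

# The datum raises the probability of rain, and beliefs did change.
assert posterior['rain'] > prior['rain']
assert bayesian_surprise > 0.0
```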

Analysis by synthesis

Any strategy (in speech coding) in which the parameters of a signal coder are evaluated by decoding (synthesizing) the signal and comparing it with the original input signal.

Epistemological automata

Possibly the first theory for why top-down influences (mediated by backward connections in the brain) might be important in perception and cognition.

Empirical prior

A prior induced by hierarchical models; empirical priors provide constraints on the recognition density in the usual way but depend on the data.

Sufficient statistics

Quantities that are sufficient to parameterize a probability density (for example, mean and covariance of a Gaussian density).

Laplace assumption

(Or Laplace approximation or method.) A saddle-point approximation of the integral of an exponential function that uses a second-order Taylor expansion. When the function is a probability density, the implicit assumption is that the density is approximately Gaussian.
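A classic worked instance is Stirling's approximation of the factorial: writing n! as the Gamma integral of exp(n·ln x − x), expanding the exponent to second order about its maximum at x = n, and integrating the resulting Gaussian.

```python
import math

# Laplace's method on n! = ∫ exp(n·ln x − x) dx: a second-order Taylor
# expansion of the exponent about its mode x₀ = n turns the integrand
# into a Gaussian, giving Stirling's approximation n^n e^{-n} √(2πn).
def stirling(n):
    f_x0 = n * math.log(n) - n   # exponent evaluated at the mode x₀ = n
    f2 = -1.0 / n                # second derivative of the exponent at x₀
    return math.exp(f_x0) * math.sqrt(2 * math.pi / -f2)

n = 20
exact = math.factorial(n)
approx = stirling(n)
# The relative error shrinks as n grows; at n = 20 it is well under 1%.
assert abs(approx - exact) / exact < 0.01
```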

Predictive coding

A tool used in signal processing for representing a signal using a linear predictive (generative) model. It is a powerful speech analysis technique and was first considered in vision to explain lateral interactions in the retina.
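A minimal sketch of the signal-processing sense of the term, with an invented first-order predictor: each sample is predicted from the previous one and only the prediction errors are represented, from which the original signal can be reconstructed exactly.

```python
# First-order linear predictive coding of a toy signal: represent each
# sample by its prediction error rather than its raw value.
signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0]

a = 1.0  # hypothetical predictor coefficient (predict "same as before")
errors = [signal[0]] + [signal[t] - a * signal[t - 1]
                        for t in range(1, len(signal))]

# The signal is recovered exactly by running the predictor forward and
# adding back each transmitted error...
reconstructed = []
for t, e in enumerate(errors):
    pred = a * reconstructed[t - 1] if t else 0.0
    reconstructed.append(pred + e)
assert reconstructed == signal

# ...while, for a smooth signal, the errors span a much smaller range
# than the raw samples — the source of the coding gain.
assert max(map(abs, errors)) < max(map(abs, signal))
```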

Infomax

An optimization principle for neural networks (or functions) that map inputs to outputs. It says that the mapping should maximize the Shannon mutual information between the inputs and outputs, subject to constraints and/or noise processes.

Stochastic

Governed by random effects.

Biased competition

An attentional effect mediated by competitive interactions among neurons representing visual stimuli; these interactions can be biased in favour of behaviourally relevant stimuli by both spatial and non-spatial and both bottom-up and top-down processes.

Reentrant signalling

Reciprocal message passing among neuronal groups.

Reinforcement learning

An area of machine learning concerned with how an agent maximizes long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to actions performed by the agent.

Optimal control theory

An optimization method (based on the calculus of variations) for deriving an optimal control law in a dynamical system. A control problem includes a cost function that is a function of state and control variables.

Bellman equation

(Or dynamic programming equation.) Named after Richard Bellman, it is a necessary condition for optimality associated with dynamic programming in optimal control theory.
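The equation can be solved by value iteration on a small example. The sketch below uses a hypothetical two-state, two-action decision process (rewards and transitions are invented) and iterates the Bellman backup V(s) = max_a [r(s,a) + γ·V(s′)] to its fixed point.

```python
# Value iteration on a hypothetical two-state Markov decision process.
gamma = 0.9
# transitions[state][action] = (reward, next_state); deterministic for brevity.
transitions = {
    'low':  {'wait': (0.0, 'low'),  'work': (1.0, 'high')},
    'high': {'wait': (2.0, 'high'), 'work': (1.0, 'low')},
}

V = {s: 0.0 for s in transitions}
for _ in range(200):  # iterate the Bellman backup until it converges
    V = {s: max(r + gamma * V[s2] for r, s2 in transitions[s].values())
         for s in transitions}

# The optimal policy is greedy with respect to the converged values:
# work to reach the rewarding state, then wait there.
policy = {s: max(transitions[s],
                 key=lambda a: transitions[s][a][0]
                 + gamma * V[transitions[s][a][1]])
          for s in transitions}
assert policy == {'low': 'work', 'high': 'wait'}
```

Here V('high') converges to 2/(1 − γ) = 20, as the fixed point of the Bellman equation requires.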

Optimal decision theory

(Or game theory.) An area of applied mathematics concerned with identifying the values, uncertainties and other constraints that determine an optimal decision.

Gradient ascent

(Or method of steepest ascent.) A first-order optimization scheme that finds a maximum of a function by changing its arguments in proportion to the gradient of the function at the current value. In short, a hill-climbing scheme. The opposite scheme is gradient descent.
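The scheme is two lines of code for a toy objective, here f(x) = −(x − 3)², whose single maximum the iteration climbs to.

```python
# Gradient ascent on f(x) = -(x - 3)^2, whose gradient is -2(x - 3).
def grad(x):
    return -2.0 * (x - 3.0)

x, rate = 0.0, 0.1
for _ in range(100):
    x += rate * grad(x)   # step in proportion to the local gradient

assert abs(x - 3.0) < 1e-6  # converges to the maximum at x = 3
```

Flipping the sign of the update (x −= rate · grad(x)) gives gradient descent, which would instead move away from this maximum.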

Principle of optimality

An optimal policy has the property that whatever the initial state and initial decision, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

Exploration–exploitation trade-off

Involves a balance between exploration (of uncharted territory) and exploitation (of current knowledge). In reinforcement learning, it has been studied mainly through the multi-armed bandit problem.
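The multi-armed bandit setting admits a very short illustration: an ε-greedy agent mostly exploits the arm with the best running estimate but occasionally explores at random. The payout probabilities below are invented for the sketch.

```python
import random

random.seed(1)

# Two-armed bandit with hypothetical payout probabilities.
p_win = [0.3, 0.7]
counts, values = [0, 0], [0.0, 0.0]   # pull counts and running mean rewards
epsilon = 0.1                          # fraction of pulls spent exploring

for _ in range(5000):
    if random.random() < epsilon:
        arm = random.randrange(2)                     # explore: random arm
    else:
        arm = max(range(2), key=lambda a: values[a])  # exploit: best so far
    reward = 1.0 if random.random() < p_win[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # running mean

# Exploration is enough to discover the better arm, which then dominates.
assert counts[1] > counts[0]
```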

Dynamical systems theory

An area of applied mathematics that describes the behaviour of complex (possibly chaotic) dynamical systems governed by differential or difference equations.

Synergetics

Concerns the self-organization of patterns and structures in open systems far from thermodynamic equilibrium. It rests on the order parameter concept, which was generalized by Haken to the enslaving principle: that is, the dynamics of fast-relaxing (stable) modes are completely determined by the 'slow' dynamics of order parameters (the amplitudes of unstable modes).

Autopoietic

Referring to the fundamental dialectic between structure and function.

Helmholtzian

Refers to a device or scheme that uses a generative model to furnish a recognition density and learns hidden structures in data by optimizing the parameters of generative models.