Predictive brains

“Whatever next? Predictive brains, situated agents, and the future of cognitive science” (Andy Clark 2013, Behavioral and Brain Sciences) is an interesting paper on the computational architecture of the brain. It argues that a large part of the brain is made up of hierarchical systems, where each system maintains an internal model of the system below it and uses that model to predict the lower system’s next outputs. Whenever a higher system mispredicts a lower system’s output, it adjusts itself in an attempt to make better predictions in the future.

So, suppose that we see something, and this visual data is processed by a low-level system (call it system L). A higher-level system (call it system H) attempts to predict what L’s output will be and sends its prediction down to L. L sends back a prediction error, indicating the extent to which H’s prediction matches L’s actual activity and processing of the visual stimulus. H will then adjust its own model based on the prediction error. By gradually building up a more accurate model of the various regularities behind L’s behavior, H is also building up a model of the world that causes L’s activity. At the same time, systems H+, H++ and so on that are situated “above” H build up still more sophisticated models.
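To make this concrete, here’s a minimal toy sketch of that two-level loop (my own illustration, not from the paper), assuming for simplicity that L’s behavior is linear and that H learns with plain gradient steps on its prediction error:

```python
# Toy two-level predictive hierarchy: H predicts L's next output and
# adjusts its internal model whenever it mispredicts. Everything here
# is an illustrative simplification.
import numpy as np

rng = np.random.default_rng(0)
dim = 4
true_dynamics = rng.normal(scale=0.5, size=(dim, dim))  # the "world" driving L

class HigherSystem:
    """Stand-in for H: a linear model of what L will do next."""
    def __init__(self, dim, learning_rate=0.01):
        self.W = np.zeros((dim, dim))
        self.lr = learning_rate

    def predict(self, l_activity):
        return self.W @ l_activity

    def update(self, l_activity, error):
        # Adjust the model in the direction that would have reduced
        # the error (a gradient step on the squared prediction error).
        self.W += self.lr * np.outer(error, l_activity)

H = HigherSystem(dim)
for step in range(2000):
    l_activity = rng.normal(size=dim)      # L's current activity
    prediction = H.predict(l_activity)     # sent "down" from H to L
    actual = true_dynamics @ l_activity    # what L actually does next
    error = actual - prediction            # prediction error sent back "up"
    H.update(l_activity, error)

print(np.linalg.norm(H.W - true_dynamics))  # ~0: H has learned L's regularities
```

Note that H never observes the world directly: by learning to predict L, it ends up recovering the structure (here, `true_dynamics`) that drives L’s activity.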

So the higher-level systems have some kind of model of what kind of activity to expect from the lower-level systems. Of course, different situations elicit different kinds of activity: one example given in the paper is that of an animal “that frequently moves between a watery environment and dry land, or between a desert landscape and a verdant oasis”. The kinds of visual data that you would expect in those two situations differ, so the predictive systems should adapt their predictions based on the situation.

And apparently, that is what happens – when salamanders and rabbits are moved between different environments, about half of their retinal ganglion cells rapidly adjust their behavior to keep up with the changing image statistics. Presumably, if the change of scene was unanticipated, the higher-level systems predicting the ganglion cells’ activity will quickly get an error signal indicating that the ganglion cells are now behaving differently from what was expected based on their activity just a moment ago; this should also cause them to adjust their predictions, and data about the scene change gets propagated up through the hierarchy.
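In the same toy spirit (again my own made-up illustration, not a model of actual ganglion cells), an unanticipated scene change shows up as a spike in the error signal, which then drives rapid re-adaptation:

```python
# Scalar toy: a predictor tracking a signal whose statistics suddenly
# change when the "environment" switches. All constants are illustrative.
import numpy as np

rng = np.random.default_rng(2)
prediction, lr = 0.0, 0.1
for step in range(400):
    environment_mean = 1.0 if step < 200 else -3.0  # scene change at step 200
    signal = environment_mean + 0.1 * rng.normal()  # noisy sensory input
    error = signal - prediction      # the error signal sent up the hierarchy
    prediction += lr * error         # error-driven re-adaptation
    if step in (199, 200, 240):
        print(step, round(abs(error), 2))  # small; spikes at the switch; small again
```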

This process involves the development of “novelty filters”, which learn to recognize and ignore the features of the input that most commonly occur together within some given environment. Thus, things that are “familiar” (based on previous experience) and behave in expected ways aren’t paid attention to.
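Here is a crude toy version of a novelty filter (my own simplification – the actual filters learn co-occurring features, but the principle is similar), in which “familiar” just means “close to a running average of recent inputs”, and only the unexplained residual gets passed upward:

```python
# Toy "novelty filter": learns the typical input for the current
# environment and passes on only what its prediction fails to explain.
import numpy as np

class NoveltyFilter:
    def __init__(self, dim, adaptation_rate=0.05):
        self.expected = np.zeros(dim)   # learned model of the familiar input
        self.rate = adaptation_rate

    def __call__(self, observation):
        residual = observation - self.expected   # the unpredicted part
        self.expected += self.rate * residual    # absorb it into "familiar"
        return residual                          # only novelty is passed up

rng = np.random.default_rng(1)
f = NoveltyFilter(dim=3)
familiar = np.array([1.0, -0.5, 2.0])

for _ in range(200):                   # habituate to a stable environment
    f(familiar + 0.01 * rng.normal(size=3))

print(np.abs(f(familiar)).max())                    # small: familiar, ignored
print(np.abs(f(familiar + [0.0, 0.0, 5.0])).max())  # large: novelty breaks through
```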

So far we’ve discussed a low-level system sending the higher-level system an error signal when the predictions of the higher-level system do not match the activity of the lower-level system. But the predictions sent by the higher-level system also serve a function, by acting as Bayesian priors for the lower-level systems.

Essentially, high up in the hierarchy we have high-level models of how the world works, and what might happen next based on those models. The highest-level system, call it H+++, makes a prediction of what the next activity of H++ is going to be like, and the prediction signal biases the activity of H++ in that direction. Now the activity of H++ involves making a prediction of H+, so this also causes H++ to bias the activity of H+ in some direction, and so on. When the predictions of the high-level models are accurate, this ends up minimizing the amount of error signals sent up, as the high-level systems adjust the expectations of the lower-level systems to become more accurate.
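In Bayesian terms, this biasing can be illustrated with the standard formula for combining two Gaussian estimates, weighting each by its precision (the numbers below are made up for illustration):

```python
# Combining a top-down prediction (prior) with bottom-up evidence
# (likelihood), both Gaussian; hypothetical numbers for illustration.
def fuse(prior_mean, prior_var, obs_mean, obs_var):
    """Precision-weighted average of prediction and sensory evidence."""
    prior_precision = 1.0 / prior_var
    obs_precision = 1.0 / obs_var
    posterior_mean = (prior_precision * prior_mean + obs_precision * obs_mean) \
                     / (prior_precision + obs_precision)
    posterior_var = 1.0 / (prior_precision + obs_precision)
    return posterior_mean, posterior_var

# A confident top-down prediction (low variance) dominates noisy input...
print(fuse(prior_mean=10.0, prior_var=0.1, obs_mean=0.0, obs_var=5.0))
# ...while reliable, surprising input overrides a weak prior.
print(fuse(prior_mean=10.0, prior_var=5.0, obs_mean=0.0, obs_var=0.1))
```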

Let’s take a concrete example (this one’s not from the paper but rather one that I made up, so any mistakes are my own). Suppose that I am about to take a shower, and turn on the water. Somewhere in my brain there is a high-level world model which says that turning on the shower faucet will lead to water pouring out, and because I’m standing right below it, the model also predicts that the water will soon be falling on my body. This prediction is expressed in terms of the expected neural activity of some (set of) lower-level system(s). So the prediction is sent down to the lower systems, each of which has its own model of what it means for water to fall on my body, and each of which sends that prediction down to still lower-level systems.

Eventually we reach some pretty low-level system, like one predicting the activity of the pressure- and temperature-sensing cells on my skin. No water is falling on me yet, and this system is a pretty simple one, so it is currently predicting that the pressure- and temperature-sensing cells will continue to have roughly the same activity as they do now. But that’s about to change, and if the system kept predicting “no change”, it would end up being mistaken. Fortunately, the prediction originating from the high-level world-model has now propagated all the way down, and it ends up biasing the activity of this low-level system, so that the low-level system now predicts that the sensors on my skin are about to register a rush of warm water. Because this is exactly what happens, the low-level system generates no error signal to be sent up: everything happened as expected, and the overall system acted to minimize the overall prediction error.
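As a toy numerical version of this (my own, with made-up numbers): without the top-down prior, the low-level system’s “no change” prediction would produce a large error when the water hits; with the prior biasing it, the error is mostly pre-empted.

```python
# Toy version of the shower example: the low-level system's default
# prediction is "no change", but a top-down prior shifts it, so the
# actual rush of warm water arrives mostly unsurprising.
skin_reading = 20.0                        # current skin temperature signal
default_prediction = skin_reading          # simple system: predict no change
top_down_prior = 38.0                      # world-model: warm water incoming
prior_weight = 0.9                         # how strongly the prior biases it

biased_prediction = (1 - prior_weight) * default_prediction \
                    + prior_weight * top_down_prior

actual_next_reading = 38.0                 # the water does arrive
print(abs(actual_next_reading - default_prediction))  # 18.0: big error without the prior
print(abs(actual_next_reading - biased_prediction))   # 1.8: error mostly pre-empted
```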

If the prediction from the world-model had been mistaken – if the water had been cut off, or I had accidentally turned on cold water when expecting warm – then the biased prediction would have been wrong, and an error signal would have propagated upwards, possibly causing an adjustment to the overall world-model.

This ties into a number of interesting theories that I’ve read about, such as the one about conscious attention as an “error handler”: as long as things follow their familiar routines, no error signals come up, and we may become absent-minded, just carrying out familiar habits and routines. It is when something unexpected happens, or when we run into something where we don’t have a strong prediction of what’s going to happen next, that we are jolted out of our thoughts and forced to pay attention to our surroundings.

The paper has a particularly elegant explanation of how this model would explain binocular rivalry, a situation where a test subject is shown one image (for example, a house) to their left eye and another (for example, a face) to their right eye. Instead of seeing two images at once, people report seeing one at a time, with the two images alternating. Sometimes elements of the unseen image are perceived as “breaking through” into the seen one, after which the perceived image flips.

The proposed explanation is that there are two high-level hypotheses of what the person might be seeing: either a house or a face. Suppose that the “face” hypothesis ends up dominating the high-level system, which then sends its prediction down the hierarchy, suppressing activity that would support the “house” interpretation. This decreases the error signal from the systems which support the “face” interpretation. But even as the error signal from those systems decreases, the error signal from the systems which are seeing the “house” increases, as their activity does not match the “face” prediction. That error signal is sent to the high-level system, decreasing its certainty in the “face” hypothesis until it flips its best guess to be one of a house… propagating that prediction down, which eliminates the error signal from the systems supporting the “house” interpretation but starts driving up the error from the systems supporting the “face” interpretation, and soon the cycle repeats. No single hypothesis of the world-state can account for all the existing sensory data, so the system ends up alternating between two conflicting hypotheses.
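To see why this produces alternation rather than a stable compromise, here’s a small simulation (my own construction with made-up constants, not the paper’s model): two hypothesis units each receive constant evidence, suppress each other, and slowly accumulate “fatigue” – standing in for the unexplained error – while dominant:

```python
# Toy rivalry dynamics: mutual inhibition plus slow fatigue makes the
# dominant hypothesis keep flipping. Equations and constants are my own
# illustration, not taken from the paper.
import numpy as np

evidence = np.array([1.0, 1.0])   # both images are genuinely present
activity = np.array([0.6, 0.4])   # "face" starts slightly ahead
fatigue = np.zeros(2)
dominant_history = []

for t in range(3000):
    inhibition = activity[::-1]              # each hypothesis suppresses the other
    drive = evidence - 1.5 * inhibition - fatigue
    activity += 0.05 * (np.clip(drive, 0, None) - activity)
    fatigue += 0.005 * (activity - fatigue)  # slow cost of staying dominant
    dominant_history.append("face" if activity[0] > activity[1] else "house")

# Count how many times the dominant percept flips over the run.
flips = sum(a != b for a, b in zip(dominant_history, dominant_history[1:]))
print(flips, dominant_history[:3], dominant_history[-3:])  # several flips
```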

One particularly fascinating aspect of the whole “hierarchical error minimization” theory as presented so far is that it can cover not only perception, but also action! As hypothesized in the theory, when we decide to do something, we are creating a prediction of ourselves doing that thing. The fact that we are not actually doing anything yet causes an error signal, which in turn ends up modifying the activity of our various motor systems so as to bring about the predicted behavior. As Jeff Hawkins and Sandra Blakeslee put it in On Intelligence:

> As strange as it sounds, when your own behaviour is involved, your predictions not only precede sensation, they determine sensation. Thinking of going to the next pattern in a sequence causes a cascading prediction of what you should experience next. As the cascading prediction unfolds, it generates the motor commands necessary to fulfill the prediction. Thinking, predicting, and doing are all part of the same unfolding of sequences moving down the cortical hierarchy.
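Here’s the same idea as a toy sketch (again my own illustration, with hypothetical names and numbers): a “decision” is entered as a prediction of where my hand will be, and rather than the model updating to match the senses, a simple reflex moves the hand to cancel the error:

```python
# Toy "action as prediction": a decision is a prediction about my own
# body, and movement is what cancels the resulting prediction error.
hand_position = 0.0
predicted_position = 1.0   # the "decision": my hand will be at the cup

for step in range(50):
    sensed = hand_position                  # proprioceptive input
    error = predicted_position - sensed     # prediction error
    # Rather than revising the prediction to match the senses, the
    # motor system moves so as to make the prediction come true.
    hand_position += 0.2 * error

print(round(hand_position, 3))  # ~1.0: the predicted state has been brought about
```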

Everything that I’ve written here so far only covers approximately the first six pages of the paper: there are 18 more pages of it, as well as plenty of additional commentaries. I haven’t yet had the time to read the rest, so I recommend checking out the paper itself if this seemed interesting to you.


4 comments

As I understand it, the recent success of “deep learning” neural networks comes from the earlier layers of the network performing automatic feature selection for the machine learning task. Essentially, these layers find regularities in the input, which are then passed on to the later layers, boosting their effectiveness.

You might be interested in this paper, if you haven’t seen it: Bayesian Surprise Attracts Human Attention: “We find that subjects are strongly attracted towards surprising locations, with 72% of all human gaze shifts directed towards locations more surprising than the average.”

There’s also a bit in Hawkins’ On Intelligence where he suggests a sort of emerging perspective in sensory neuroscience that only surprising information is propagated to higher levels of sensory processing, which seems like it would dovetail nicely with this model.

Regarding meditation (specifically vipassana), my theory of my own practice is that it’s a process of bringing automatic mental processes under conscious control, resulting in more conscious access to lower levels of sensory processing. (Note, for instance, that meditators perform better on the Stroop task, but that might just be an executive attention thing.) This seems (at least weakly) supported by meditators having larger brain regions associated with sensory processing than controls, such as in this study.

Oooh, I love this summary Kaj. Very succinct, and using excellent concrete examples to pull the whole concept together. It hits all the points on how it’s related to other interesting ideas, leaving me burning with curiosity to find out more. Nicely done.
