Tuesday, 26 June 2018

A Reply to Dan Williams on Hierarchical Bayesian Models of Delusions

This post is a reply by Phil Corlett (Yale) to Dan Williams's recent post on Hierarchical Bayesian Models of Delusions.
Dan Williams has put forward a lucid and compelling critique of hierarchical Bayesian models of cognition and perception and, in particular, their application to delusions. I want to take the opportunity to respond to Dan’s two criticisms outlined so concisely on the blog (and in his excellent paper) and then comment on the paper more broadly.

Dan is “sceptical that beliefs—delusional or otherwise—exist at the higher levels of a unified inferential hierarchy in the neocortex.” He says, “every way of characterising this proposed hierarchy... is inadequate.”

He adds that “it can’t be true both that beliefs exist at the higher levels of the inferential hierarchy and that higher levels of the hierarchy represent phenomena at large spatiotemporal scales. There are no such content restrictions on beliefs, whether delusional or not. (Delusional parasitosis concerns tiny parasites).”

I agree that ‘the’ hierarchy is thus far poorly specified, to the extent that it may even seem nebulous. The notion of hierarchy has to some extent been invoked as a sort of get-out-of-jail-free card when, for example, some priors appear to be weak in patients with delusions and others strong (e.g. the very elegant work from Philipp Sterzer, Katharina Schmack and others in Schmack et al. 2013, Stuke et al. 2018, and Stuke et al. 2017), and both effects correlate with delusions.

One way, within a hierarchical model, for this to make sense would be for the weak priors (often evinced as failures to perceive certain perceptual illusions) to generate prediction errors that must be reconciled. Such prediction errors create a state of perceptual hunger for priors (as Steve Dakin, and Jerzy Konorski before him, have speculated), which is only satisfied by imposing stronger (and perhaps inaccurate) higher-level priors.

Hence the shift toward prior knowledge observed by Teufel, Fletcher and colleagues. This is what we mean by a hierarchy of prior beliefs. And it seems to relate importantly to psychotic symptoms and in particular delusions (although see their more recent work for data both consistent with, and challenging, this idea of a hierarchy of priors and psychosis). What I don’t think we mean is that if delusions involve high-level prior beliefs, they necessarily have to entail only high-level concepts (or even large rather than small things, as Dan suggests – this would indeed make parasitosis impossible).

I agree, we could be clearer. We will be in future publications. We are trying to characterize neural and psychological hierarchies in ongoing experiments in healthy and delusional subjects. One approach that seems to be bearing fruit is hierarchical computational modeling of behavior (Mathys et al. 2014), with which we have implicated priors and hierarchical organization in the genesis of hallucinations (Powers et al. 2017) – watch this space for similar with delusions.

Second, Dan is “sceptical that belief fixation is Bayesian”. I think Dan alludes to a solution to his skepticism in his own piece. None of these models demand optimally Bayesian inference. As Dan says, they involve “(approximate) Bayesian inference”. They entail inferences to the best explanation for a particular subject given their prior experiences and current sensory evidence.

They explicitly allow for deviations from optimality. Those deviations can be different for different inferences and different people, and those differences create opportunities to apply the theory to the myriad conditions it has been brought to bear on, theoretically and empirically (with some success).

Addressing the second part of Dan’s second concern, can a Bayesian account explain “bias, suboptimality, motivated reasoning, self-deception, and social signaling”? These are important riffs on the first part of Dan’s second concern “Can biased beliefs be Bayesian?”

My answer is yes. One can model the irrational spread of rumors in crowds in a Bayesian manner (Butts 1998). Partisan political biases and polarization can be predicted by Bayesian models (Bullock 2009). I’ve previously argued, with Sarah Fineberg, that motivated reasoning and self-deception in delusional individuals can fall under the umbrella of a Bayesian account.

The key move here (borrowed from Tali Sharot’s work on belief updating biases) is to factor in a degree of doxastic conservatism (championed in delusion models by Ryan McKay amongst others). That is, if one allows some value in consistency, in sustaining the status quo, then belief updating will be biased. Predictive processing based accounts have this value built into them, since they abhor unreconciled prediction error and have myriad ways to minimize it (including ignoring the conflicting data, which is the move when one is biased, motivated and self-deceiving).
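To make the conservatism move concrete, here is a minimal sketch (my own toy numbers, not taken from Sharot, McKay or any published model): updating on the log-odds scale with a "conservatism" knob below 1 keeps the posterior pinned near the prior even when the evidence runs ten-to-one against it.

```python
import math

def logodds(p):
    return math.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def update(prior, likelihood_ratio, conservatism=1.0):
    """One belief update on the log-odds scale.

    conservatism = 1.0 is exact Bayes; values below 1 discount the
    evidence, so the posterior stays closer to the prior (the status quo).
    The parameter name is illustrative, not from any cited paper.
    """
    return sigmoid(logodds(prior) + conservatism * math.log(likelihood_ratio))

prior = 0.9   # a strongly held belief
lr = 0.1      # new evidence, ten-to-one against the belief

exact = update(prior, lr, conservatism=1.0)   # drops to roughly 0.47
sticky = update(prior, lr, conservatism=0.2)  # barely moves, roughly 0.85
```

The "sticky" agent still looks Bayesian in form; it simply under-weights prediction error, which is one way of cashing out the bias described above.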

I’d like to finish with a comment on explaining delusions in general. Unlike his blog post, Dan’s paper reads not only as a critique of hierarchical predictive models but also as something of an apologia for 2-factor theory. This is partly because those two model types have been adversaries for some time (as you can see in previous exchanges on this blog).

They needn’t be. It could be that 2-factor and prediction error models are expressing similar ideas at different levels of abstraction. I don’t subscribe to this view. But some people do. Regardless, if one critiques PE models for a lack of clarity, for being vague with regards to their inner workings, one ought to level similar challenges to 2-factor theory, or indeed any explanation of delusions.

The point here is not to attack 2-factor theory per se, but rather to recognize that explanations develop over time, through thought-experiments and real-world experiments. Some have been around longer than others. Some have more empirical support than others. Some have different explanatory ranges and scopes. It is important that we critically evaluate, compare and contrast all theories, if only to signpost the key areas for future inquiry and perhaps, ultimately, kill our darlings and approach a more complete explanation of delusions.

8 comments:

Hi Phil and Dan - nice posts. I have a few points to add. Btw I admit I'm making the cardinal sin of commenting on a paper I haven't read (it is on my list, Dan) but I'm going by Dan's blog...

First, about this point: “it can’t be true both that beliefs exist at the higher levels of the inferential hierarchy and that higher levels of the hierarchy represent phenomena at large spatiotemporal scales. There are no such content restrictions on beliefs". Dan is correct - but no one is arguing for restrictions on their content. The restrictions are on their form. The point is that a hierarchical model will use more invariant features of things to predict their more dynamic features. This invariance could be spatial or temporal but it doesn't have to be both. Believing I am infested by one species of parasite that consistently inhabits my skin predicts the dynamic itching sensations that move around my body - for example.

About the second part, “can a Bayesian account explain ‘bias, suboptimality, motivated reasoning, self-deception, and social signalling’? ... Can biased beliefs be Bayesian?”. This seems a strange question - to me biased beliefs seem best explained by priors (e.g. about how great I am, or how much people should like me, etc.) interacting with likelihoods, rather than by any other explanation. Even errors in logical syllogisms are well accounted for by Bayesian reasoning (Oaksford & Chater's brilliant work).
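As a toy illustration of that point (the numbers are invented, not from Oaksford & Chater), two agents applying exactly the same Bayes' rule to exactly the same mildly unflattering evidence end up in very different places purely because of their priors:

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Bayes' rule for a binary hypothesis H given evidence E."""
    num = prior_h * p_e_given_h
    return num / (num + (1 - prior_h) * p_e_given_not_h)

# H = "people like me". Both agents see identical lukewarm feedback:
# P(feedback | H) = 0.3, P(feedback | not-H) = 0.7.
evidence = (0.3, 0.7)

neutral = posterior(0.5, *evidence)    # prior 0.5 -> posterior 0.30
flattered = posterior(0.95, *evidence) # prior 0.95 -> posterior about 0.89
```

The flattered agent's belief looks "biased" to an outside observer, yet every step is textbook Bayesian inference; the work is all being done by the prior.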

Whether Bayesian belief updating has broken down in (rather than explaining) delusions is an interesting question. I'm speculating but I don't think so. One under-investigated aspect of delusions is the powerful affect that accompanies most psychiatric ones (e.g. paranoia, grandiosity, love, depression, etc). Such affects must have profound reinforcing effects on prior beliefs in those domains.

I guess the question is whether Bayesian inference is really going wrong in delusions (e.g. whether some neurobiological change means precision is not encoded correctly) or whether some other 'pathological' process is occurring whose effects on brain computations play out in Bayesian ways because that is what the brain does. I.e. is it an explanation of pathology or just a vehicle for it? I think the latter is probably the case in lots of psychiatric disorders - but not always.

But even in psychotic disorders, it does seem likely that prefrontal dysfunction would do more than just reduce prior precision - it likely disrupts heuristics that compensate for our inability to do complex things Bayes-optimally, like planning (and maybe belief-testing? cue 2 factor theory...). So there will probably be limits to a pure predictive coding based explanation of psychosis.

Hi Rick, Nice to get your input. I wanted to follow up on a couple of things. You are right, affect is important in psychosis and psychiatry more broadly. I am a little surprised that you treat it as some independent factor though. I would argue that affect is also inference, inference to the best explanation for our bodily state. It is a belief about our homeostatic integrity, and as such, its disruption in people with delusions is explicable in terms of the same precision-weighted mechanisms as other beliefs. Affect is simply part of our world model and subject to the same derangements that aberrant PE or priors would cause.

On your last point: I am not sure I follow. You are suggesting psychological and biological causes needn’t align? Can you clarify? Re: PFC dysfunction and cognition, I think we should be careful. This thread is about delusions, not the whole of schizophrenia (negative symptoms, cognition and all). Whilst I can see the appeal of bringing those deficits into the discussion, the fact remains cognitive deficits (assayed with spatial working memory tasks, for example) have almost zero statistical association with delusions (or hallucinations for that matter; they do relate to disorganization/thought disorder). I don’t understand the allusion to 2-factor theory here either. Can you clarify?

Yes I completely agree with you about affect. I just meant that the current debates about whether prior beliefs in schizophrenia are too imprecise (in visual illusions, ERPs, sensory attenuation, etc) or too precise (in the Teufel/Schmack paradigms) and what this means for understanding delusions don't tend to incorporate affect. For instance, affective manipulations are rarely used in belief-updating paradigms (except Archy de Berker's stress study and Oliver Robinson's work), which are usually about things like beads in jars!

To me, affects are like priors at the very top of the hierarchy. If they are strong and the mid-level 'cognitive' beliefs are weak, one could easily see how delusions could come about. Such delusions might appear 'unupdatable' and thus not caused by Bayesian processes but I doubt that is the case.

The enigmatic comment in the last paragraph (sorry) was meant to get at the potential limits of the 'precision imbalances in cortical hierarchies' explanation of delusions. There are a few potential points here - all speculation obviously.

1) The cortex might not behave like a perfect hierarchical model all the way up. For instance, a perfect model - on receiving some unexpected input - would update every single relevant belief (i.e. every belief that predicted some aspect of the input) consistently. Even a precision imbalance in this model shouldn't make it inconsistent.

2) But it seems unlikely that our brains can update *every* relevant belief - too much computation. Perhaps a small (attended) subset are used to make predictions and only those are updated? Might we then have some heuristic mechanism for checking for inconsistencies that may arise?

3) If so, and if this mechanism is compromised in schizophrenia (e.g. due to prefrontal-hippocampal dysfunction), then one might expect many more inconsistent beliefs. Indeed, many delusions are inconsistent with other beliefs (or actions) that the person holds/takes. I would argue this is a deficit over and above those that a plain 'precision imbalance in the cortical hierarchy' would predict.

4) There may be many such heuristic mechanisms that we use for updating/maintaining our constrained models of the world. E.g. separating memory systems into short, medium and long term. Or 'pruning' when doing planning (I didn't mean to imply any connection between delusions and planning though). Or using the beliefs of others as heuristic constraints on how we update (I guess it's debatable whether this would be heuristic vs optimal). Or strategies for gleaning more information about an uncertain inference. Losing these functions may appear a bit like a second 'factor' contributing to delusions - kind of similar to the one in 2F theory, but also distinct.

You are right though - I'm always surprised how little cognitive deficits relate to delusions. Maybe we aren't measuring the right ones? The beads task data I analysed recently (just accepted - out soon) implied that the Scz group had a high 'cognitive instability' parameter, very different to controls, but it didn't correlate with symptoms or working memory!

Hi Phil, Rick, Thank you for such interesting comments. It was largely reading your work in this area that got me so interested in delusions, so it’s great to get to discuss these issues with you. I found Phil’s original response so thought-provoking that I tried to put my thoughts on this topic together in a longer blog post that can be found on my website here: https://danwilliamsphilosophy.com/2018/06/28/bayes-mania-just-so-stories-and-the-irrational-mind/

Here I will just make some brief remarks.

1. I will set aside the topic of the hierarchy here. I address it at much greater length in a (hopefully) forthcoming article.

2. Phil says that I answered my own worries about apparent deviations from rational inference in human cognition: “None of these models demand optimally Bayesian inference. As Dan says, they involve ‘(approximate) Bayesian inference.’” I am not convinced. The thing about approximate Bayesian inference is that it is approximate *Bayesian inference*. Approximation algorithms are designed to approximate the optimal inferential profile exhibited in exact Bayesian inference. In what specific way do the properties of variational Bayes implemented in predictive coding, for example, lead to the psychological phenomena I point to?

3. Phil points to some Bayesian models of things like partisan political biases and some cool work on motivated reasoning and self-deception in delusional individuals that he has done with Sarah Fineberg. Rick goes further and says that it is “strange” to even think that things like bias, suboptimality, motivated reasoning, self-deception, denial, social signalling, and so on, constitute a problem for Bayesian views of belief fixation: “To me, biased beliefs seem best explained by priors… interacting with likelihoods, rather than any other explanation.” Rick doesn’t explain why things seem this way to him, though, and the reference to Oaksford and Chater’s work on errors in logical syllogisms doesn’t address any of the psychological phenomena I outline in my paper.

I respond to Phil and Rick’s argument at much greater length in the blog post mentioned above. In a nutshell, though, here is why I am not convinced: I explicitly note in my paper that one always *can* model a given psychological phenomenon in Bayesian terms. The question is whether one should. The problem is that the psychological phenomena I outline systematically bias human cognition (in the *healthy brain*) away from true beliefs. As such, they are highly unexpected on the assumption that the brain is an approximately optimal inference machine.

To Phil, Rick, and other proponents of the Bayesian brain, I want to ask this: if Bayesian updating underlies both perception and belief formation, why is it that (*in the healthy brain*) the former so reliably produces accurate representations of the world whereas the latter so reliably and systematically produces beliefs that are just totally out of whack with reality?

Here is one of my favourite quotes from the evolutionary biologist Robert Trivers: “At the heart of our mental lives, there seemed to be a striking contradiction… On the one hand… our sensory systems are organised to give us a detailed and accurate view of reality, exactly as we would expect if truth about the outside world helps us to navigate it more effectively. But once this information arrives in our brains, it is often distorted and biased to our conscious minds. We deny the truth to ourselves. We project onto others traits that are in fact true of ourselves—and then attack them! We repress painful memories, create completely false ones, rationalize immoral behaviour, act repeatedly to boost positive self-opinion, and show a suite of ego-defence mechanisms. Why?”

A good model of cognition should explain these differences. A theory that subsumes both perception and belief formation under a general scheme of approximate Bayesian inference doesn’t.

A request for clarification: Will keep this brief and focused only on one point:

I feel I must be misunderstanding Dan's argument that perception and belief are underpinned by different processes: "If Bayesian updating underlies both perception and belief formation, why is it that (*in the healthy brain*) the former so reliably produces accurate representations of the world whereas the latter so reliably and systematically produces beliefs that are just totally out of whack with reality?"

I am not sure how relative success in one area of endeavour and relative lack thereof in another should preclude a similar mechanism operating in both cases? Perceptual inferences have more direct, albeit ambiguous, contact with the reality they seek to model and are more readily refuted by signals violating the expectancies based upon those inferences. The subject matter of higher beliefs is much more remote from direct sensory evidence and less likely to be directly challenged by it.

I think that there are other questions central to this point about what would constitute optimal inference - accurately modeling objective reality is not the only criterion for success.

I would also recall that even low level perceptual experience can play fast and loose with objective reality.

I think there are two (partly overlapping) issues here - biases that are due to approximations or heuristics, and biases that are due to (often) evolutionary adaptations.

There are numerous cognitive biases in the literature that have been shown to be compatible with approximation/heuristic strategies - for instance, Tom Griffiths's work on the overestimation of extreme events. Given it is impossible for the brain to integrate over all expected outcomes and their probabilities, it is likely to use sampling of outcomes. The computational cost of sampling increases linearly with the number of samples. Unbiased sampling is probably bad because a rare extreme event is unlikely to be sampled unless lots of samples are taken. Instead you can do 'importance sampling', which weights under-sampled parts of the distribution more - this reduces variance but increases bias (towards extreme events, in this case).
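Here is a toy sketch of the mechanics (my own illustration, not Griffiths's actual model): estimating a rare tail probability under a standard normal. Naive sampling from the distribution itself rarely hits the tail, whereas drawing from a deliberately wider proposal and reweighting each draw by p(x)/q(x) makes the rare event visible at the same sample budget. (This corrected version is unbiased; the bias toward extremes arises in the resource-limited, normalized-weight variants discussed in the cognitive literature.)

```python
import math
import random

def normal_pdf(x, mu=0.0, sd=1.0):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def estimate_tail(n, threshold=3.0, proposal_sd=2.0, seed=0):
    """Estimate P(X > threshold) for X ~ N(0,1) two ways.

    Plain Monte Carlo samples from N(0,1) itself and mostly misses the
    rare event. Importance sampling draws from a wider proposal
    N(0, proposal_sd) that over-samples the tail, then reweights each
    draw by p(x)/q(x) to correct for the mismatch.
    """
    rng = random.Random(seed)
    mc_hits = 0
    is_sum = 0.0
    for _ in range(n):
        x_mc = rng.gauss(0.0, 1.0)
        if x_mc > threshold:
            mc_hits += 1
        x_is = rng.gauss(0.0, proposal_sd)
        if x_is > threshold:
            is_sum += normal_pdf(x_is) / normal_pdf(x_is, sd=proposal_sd)
    return mc_hits / n, is_sum / n

mc, imp = estimate_tail(5000)
# True value is about 0.00135; the importance-sampled estimate is far
# more stable across runs than the naive count at the same budget.
```

The trade-off in the text corresponds to choosing the proposal: tilt it further toward the tail and variance falls, but any small-sample or normalization shortcut then drags estimates toward the over-sampled extremes.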

I don't know exactly what you mean by 'suboptimality' but it sounds like what you get when you have to approximate.

About motivated reasoning - take the 'optimism bias’. In the first active inference scheme Karl published (2013, Frontiers), the equation updating inferences about states contains terms for the expected utility of policies and its precision. This means that inference about states is biased towards desired outcomes - like the optimism bias. I don’t think this is even because of any approximation - it is just a consequence (I think) of doing inference about states and policies within the same scheme. That said, that scheme does also use a mean field approximation which doubtless will have its own effects on inference (I doubt any of the ones you mention though).

Then I suspect there are plenty of prior beliefs given to us by evolution because they add adaptive fitness, whether or not they help us reach the 'truth'. I must say I’m surprised that Trivers doesn’t see evolutionary explanations for the biases he talks about. Sure, in some cases survival depends on accuracy (e.g. in the sensorimotor domain), but is this likely to be true for decision-making? I’d have thought it might be better at a population level to be slightly overconfident about what you can achieve, etc. Consider also the basic Pavlovian biases studied by Marc Guitart-Masip: that one ought to activate approach movements to obtain rewards and inhibit them to avoid punishments. These are useful biases that would probably be there even if we could do exact inference.

Then when it comes to rationalisations of behaviour, what is the reason these should be accurate? If one accepts Dan Sperber’s compelling argument that Kahneman’s ‘System 2’ is not actually a logical thought system but in fact a socially motivated ‘justify and convince others of our plans/actions’ system, then the kinds of ways that system seems to err start to make more sense: rationalisations, self-deception the better to deceive others, etc.

Why do I think these processes might be Bayesian? Mostly conjecture, it is true. But you said yourself the belief system “reliably and systematically produces beliefs that are just totally out of whack…" To me the fact that such biases are reliable and systematic indicates they could be explained as consistent prior beliefs. Were the biases random in extent and timing, one would look elsewhere for an explanation, but they are not. But I must say you are asking too much of any process theory to explain ALL of these effects using one explanation, e.g. approximations. They are a pretty heterogeneous bunch.

Thanks for this interesting discussion. A brief comment on why perception might look Bayesian while 'higher' cognition does not: Bayesian inference is applied to information, not data. Perception filters out a lot. It appears Bayesian because our measurement of it is biased.

Hi Paul, Rick, Thanks for these comments. Just a few quick remarks:

Paul questions why “relative success in one area of endeavour and relative lack thereof in another should preclude a similar mechanism operating in both cases.” Of course there’s no logical inconsistency (“preclusion”) here. It should just be surprising: if similar mechanisms underlie both domains, why the stark differences? Paul suggests the following hypothesis: “perceptual inferences have more direct, albeit ambiguous contact with the reality they seek to model and are more readily refuted by signals violating the expectancies based upon those inferences.” By contrast, “higher beliefs” are “less likely to be directly challenged by” sensory evidence. That’s an interesting suggestion but I don’t think it will work. To take just one example, consider the famous finding by Dan Kahan: the better informed someone is on a topic like climate change, the *more polarised* their opinion (in both directions). More generally, the problem with wacky belief formation is not a lack of evidence but a range of factors that have no real parallels in perception: tribal/coalition affiliation, clever post hoc rationalisation of what we want to be true or makes us look good (motivated reasoning), and so on. An example I use in my paper: the “backfire effect,” in which presenting people with evidence that contradicts a deeply held view *increases* their confidence in that view—the exact opposite of Bayesian inference.

I agree with most of Rick’s comments. As I stress in my paper, there are two questions here: (1) what task is a cognitive system performing? (2) How well is it performing that task? On (2), yes, sometimes deviations from optimality/biases can be accounted for in terms of approximations (i.e. optimality + tractability, resource constraints). But sometimes suboptimality arises from plain old suboptimality. Why should it be otherwise? (See, e.g., Gary Marcus’s wonderful book “Kluge”).

On (1), yes, we agree completely: much of belief formation is not optimized for veridical representation. Belief formation and reasoning have social functions, in addition to inferential ones, as I argue in the original paper. Trivers in fact does see an evolutionary explanation for the biases he highlights: namely, that the prolific self-deception humans engage in evolved for social manipulation, as you note. (“We deceive ourselves the better to deceive others”). In line with work in both evolutionary and social psychology, I view the conscious mind as in large part the brain’s “press secretary,” forming beliefs and broadcasting explanations that make us look maximally good to others. I don’t know whether psychological phenomena such as these would be well explained with Bayesian models (with suitable utility functions), but I look forward to seeing such explanations in the future (or being pointed towards them now if they already exist).

Rick says: “I must say you are asking too much of any process theory to explain ALL of these effects using one explanation. They are a pretty heterogeneous bunch.” I think we must be talking past each other here, however, because I thought this is the point I’ve been making. I explicitly conclude my paper by advocating a conception of the brain as an organ that serves “a plurality of functions—often imperfectly realised—in the life of an exceedingly complex social primate.” I’m not the one offering a unified brain theory.

Thank you to all who have commented for such an interesting and helpful discussion! I’m much less confident of my position on this topic after seeing the many interesting arguments against it here and elsewhere.