Abstract

Prominent theories suggest that compulsive behaviors, characteristic of obsessive-compulsive disorder and addiction, are driven by shared deficits in goal-directed control, which confers vulnerability for developing rigid habits. However, recent studies have shown that deficient goal-directed control accompanies several disorders, including those without an obvious compulsive element. Reasoning that this lack of clinical specificity might reflect broader issues with psychiatric diagnostic categories, we investigated whether a dimensional approach would better delineate the clinical manifestations of goal-directed deficits. Using large-scale online assessment of psychiatric symptoms and neurocognitive performance in two independent general-population samples, we found that deficits in goal-directed control were most strongly associated with a symptom dimension comprising compulsive behavior and intrusive thought. This association was highly specific when compared to other non-compulsive aspects of psychopathology. These data showcase a powerful new methodology and highlight the potential of a dimensional, biologically-grounded approach to psychiatry research.

eLife digest

When an individual resists the temptation to stay out late in order to get a good night’s sleep, he or she is exercising what is known as “goal-directed control”. This kind of control allows individuals to regulate their behaviour in a deliberate manner. It is thought that a reduction in goal-directed control may be linked to compulsiveness or compulsivity, a psychological trait that involves excessive repetition of thoughts or actions. Furthermore, evidence shows that goal-directed control is reduced in people with compulsive disorders, such as obsessive-compulsive disorder (or OCD) and drug addiction. However, failures of goal-directed control have also been reported in other mental health conditions that are not linked to compulsivity, such as social anxiety disorder.

The fact that reduced goal-directed control is found across various mental health conditions highlights a core issue in modern psychiatric research and treatment. Mental health conditions are typically defined and diagnosed by their clinical symptoms, not by their underlying psychological traits or biological abnormalities. This makes it difficult to determine the cause of a specific disorder, as its symptoms are often rooted in the same psychological and biological traits seen in other mental health conditions.

To start to tackle this issue, Gillan et al. used a strategy that allowed them to look at compulsivity as a “trans-diagnostic dimension”; that is, as something that exists on a spectrum and is not specific to one disorder but involved in numerous different mental health conditions. Nearly 2,000 people completed an online task that assessed goal-directed control, and filled in questionnaires that measured symptoms of various mental health conditions. Gillan et al. showed that, as expected, people with reduced goal-directed control were generally more compulsive, and that this relationship could be seen in the context of both OCD and other compulsive disorders such as addiction.

Further, by leveraging the efficiency of online data collection to collect such a large sample, Gillan et al. were also able to examine how much different symptoms co-occurred in people. This enabled them to use a statistical technique to pick out three trans-diagnostic dimensions – compulsive behaviour and intrusive thought, anxious-depression and social withdrawal – and they found that only the compulsive factor was associated with reduced goal-directed control. In fact, reduced goal-directed control was found to be more closely related to compulsivity than to the symptoms of traditional mental health disorders including OCD. These findings show that research into the causes of mental health conditions and perhaps ultimately diagnosis and treatment – all of which have traditionally approached specific disorders in isolation – would benefit greatly from a trans-diagnostic approach.

Introduction

Compulsivity is a theoretical clinical phenomenon that reflects the loss of control over repetitive self-deleterious behavior seen in a range of disorders, most notably obsessive-compulsive disorder (OCD) and addiction (Everitt and Robbins, 2005; Gillan and Robbins, 2014). But what are the underlying neural, computational, or psychological mechanisms? Researchers have suggested that compulsivity in these disorders may be partially explained by an imbalance between two different modes of control, one more flexible and one less flexible (Everitt and Robbins, 2005; Graybiel and Rauch, 2000). In particular, a deficit in deliberative, ‘goal-directed’ control may leave individuals vulnerable to relying excessively on more rigid habits. Habits are behaviors that animals and humans learn to execute automatically when presented with familiar environmental cues (Dickinson, 1985). While habits are typically very useful, allowing us to efficiently perform routine actions while expending minimal cognitive effort, they cannot adapt flexibly to new situations. To override habits, organisms are capable of ‘goal-directed behavior’. This refers to our ability to make more considered choices, reflecting both (i) knowledge of the outcomes that our actions typically produce and (ii) our current motivation for those outcomes (Dickinson and Balleine, 1994). Consistent with the hypothesis that compulsion is linked to an imbalance between these modes of control, deficits in goal-directed learning have been observed across a range of putatively compulsive disorders such as drug addiction (Sjoerds et al., 2013; Voon et al., 2015), OCD (Voon et al., 2015; Gillan et al., 2011; 2014a; 2014b; 2015a) and also binge-eating disorder (Voon et al., 2015).
These deficits in goal-directed control have been linked to abnormal structure and function of the caudate and medial orbitofrontal cortex (Voon et al., 2015; Gillan et al., 2015a), suggesting that they may be a promising target for understanding the etiology of these disorders and thus for future treatment development.

Critically, the scope of the relationship between goal-directed learning deficits and psychopathology, and particularly their specificity to compulsive versus non-compulsive aspects, has not been established. In fact, a similar deficit in goal-directed control was recently reported in other patient groups (Alvares et al., 2014; Morris et al., 2015), including those diagnosed with social anxiety disorder and schizophrenia, at least the former of which is not characterized by repetitive compulsive acts. This casts serious doubt over the hypothesis that goal-directed deficits are a neurocognitive mechanism that is partly responsible for psychiatric compulsivity. This lack of specificity is unfortunately ubiquitous in psychiatry research (Lipszyc and Schachar, 2010; Bickel et al., 2012), a result, we suggest, of the broader issue that psychiatric diagnostic categories do not reflect the most discrete and neurobiologically informative phenomena. Of particular relevance to the present study are the high rates of co-morbidity between OCD and social anxiety disorder (Ruscio et al., 2010), the preponderance of OCD symptoms in the schizophrenia population (Poyurovsky and Koran, 2005), and more broadly the fact that the vast majority of patients diagnosed with OCD meet the criteria for another lifetime psychiatric disorder (Ruscio et al., 2010). Given these major overlaps, dissociating the neurocognitive bases for these respective diagnostic categories in their current form may be untenable.

Indeed, the Diagnostic and Statistical Manual of Mental Disorders (DSM), now in its fifth edition (American Psychiatric Association, 2013), was developed to provide a reliable, descriptive psychiatric taxonomy, rather than an etiologically valid one. As such, it is difficult to clearly discriminate the diagnostic categories it defines on the basis of genetics, neuroimaging, or indeed any of the modern tools of cognitive neuroscience. These issues have been described in detail by others (Cuthbert and Kozak, 2013; Hyman, 2007; Robbins et al., 2012), and have been recognized by the National Institute of Mental Health (NIMH), which has launched the Research Domain Criteria (RDoC) initiative, aiming to identify biologically plausible, trans-diagnostic markers of psychiatric disturbances (Insel et al., 2010). Although progress towards this goal has already been made by studies examining dissociable clusters of patients within groups diagnosed with the same disorder (Brodersen et al., 2014; Fair et al., 2012), the identification of robust, generalizable and specific markers that contribute to psychiatric co-morbidity has been curtailed by the small sample sizes that are typical of patient studies.

Accordingly, we hypothesized that a dimensional approach leveraging the efficiencies of large-scale online data collection among healthy individuals could be used to determine the precise psychiatric phenotype associated with deficits in goal-directed control, and test the specificity of this relationship with respect to other aspects of psychopathology. We hypothesized that this phenotype would broadly relate to compulsive behavior, which is seen across multiple disorders, including OCD and addiction (Gillan et al., 2015b), but were interested to reveal the scope and generality of this, e.g. with respect to impulsivity, a putatively related clinical phenotype (Robbins et al., 2012). We also wished to study how any psychiatric correlates of goal-directed control relate to variation in age and IQ, two more general factors that have been shown to covary both with goal-directed control and with some aspects of psychopathology (Eppinger et al., 2013; Schad et al., 2014; Sandstrom et al., 1998). To this end, rather than diagnosed patients, we used two large general-population samples collected online via Amazon’s Mechanical Turk (AMT) to test (i) if compulsivity as indicated by self-report OCD symptoms is associated with individual differences in goal-directed learning, (ii) if this association generalizes to self-report symptoms of other DSM diagnostic categories that involve compulsivity, and (iii) if this association is specific to compulsive versus non-compulsive psychopathology.

Goal-directed control has recently been computationally formalized as arising from a form of reinforcement learning known as ‘model-based’ (Daw et al., 2011), which can be expressed as an individual difference measure that has been shown to predict how likely individuals are to form habits (Gillan et al., 2015c). Using this well-validated task (Figure 1) (Daw et al., 2011), we found support for all three postulates. In Experiment 1, we found that total scores on a self-report questionnaire measuring the severity of OCD symptoms were tracked by normal variation in model-based learning in the general population, but not by self-report anxiety or depression symptoms. In Experiment 2, we replicated the association with self-report OCD symptoms and showed that it generalized to a broader set of psychiatric symptoms that similarly involve failures in exerting control over self-deleterious behaviors, specifically alcohol addiction, eating disorders and impulsivity. Once again, we found tentative evidence for specificity with respect to non-compulsive aspects of psychopathology. Next, we conducted a factor analysis, which indicated the existence of three latent symptom dimensions that cut across the nine different questionnaires assessed in this study. Crucially, the second symptom dimension identified was characterized by ‘Compulsive Behavior and Intrusive Thought’, in which items were most consistently drawn from the questionnaires assessing symptoms of OCD, eating disorders and addiction, pertaining not just to repetitive compulsive behaviors (as was our prediction), but also to associated preoccupations and cognitive distortions. This factor, which was defined independently of task performance, was a significant predictor of deficits in model-based learning. Crucially, this effect was highly specific to this factor, when directly compared to the two other factors identified in this analysis, ‘Anxious-Depression’ and ‘Social Withdrawal’.

(a) Subjects chose between two fractals, which probabilistically determined whether they would transition to the orange or blue second stage state. For example, the fractal on the left had a 70% chance of leading to the blue second stage state (‘common’ transition) and a 30% chance of leading to the orange state (‘rare’ transition). These transition probabilities were fixed and could be learned over time. In the second stage state, subjects chose between two fractals, each of which was associated with a distinct probability of being rewarded with a 25-cent coin. The probability of receiving a reward associated with each second stage fractal could also be learned, but (unlike the transition structure) these probabilities drifted slowly over time (0.25 < P < 0.75, panel b). This meant that in order to earn the most rewards possible, subjects had to track which second stage fractals were currently best as they changed over time. Reward probabilities depicted (34%, 68%, 72%, 67%) refer to example trial 50, denoted by the vertical dashed line in (b). (b) Drifting reward probabilities determined by Gaussian Random Walks for 200 trials with grey horizontal lines indicating boundaries at 0.25 and 0.75. (c) Schematic representing the performance of a purely ‘model-free’ learner, who only exhibits sensitivity to whether or not the previous trial was rewarded vs. unrewarded, and does not modify their behavior in light of the transition that preceded reward. (d) Schematic representing the performance of a purely ‘model-based’ learner, who is more likely to repeat an action (i.e. ‘stay’) following a rewarded trial, only if the transition was common. If the transition to that rewarded state was rare, they are more likely to switch on the next trial.
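The drifting reward probabilities described in panel (b) can be illustrated with a short simulation. The following is a minimal sketch rather than the task code itself: the step size (`sd`), the uniform starting values, and the reflecting-boundary rule are assumptions that are not stated in the caption, and the function name is hypothetical.

```python
import random


def drifting_reward_probs(n_trials=200, n_options=4, sd=0.025,
                          lo=0.25, hi=0.75, seed=0):
    """Simulate bounded Gaussian random walks, one per second-stage fractal.

    ASSUMPTIONS (not given in the caption): step size `sd`, uniform start
    values, and reflection at the boundaries.
    """
    rng = random.Random(seed)
    probs = [rng.uniform(lo, hi) for _ in range(n_options)]
    history = [list(probs)]
    for _ in range(n_trials - 1):
        for k in range(n_options):
            p = probs[k] + rng.gauss(0.0, sd)
            # Reflect any overshoot back inside the [lo, hi] band.
            if p > hi:
                p = 2 * hi - p
            elif p < lo:
                p = 2 * lo - p
            # Final clamp guards against an extremely large step.
            probs[k] = min(hi, max(lo, p))
        history.append(list(probs))
    return history


walks = drifting_reward_probs()
print(len(walks), [round(p, 2) for p in walks[49]])  # probabilities at trial 50
```

Because the probabilities keep moving, no second-stage fractal stays best for long, which is why subjects need to keep learning throughout the 200 trials.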

Results

In Experiment 1, we tested the hypothesis that individual differences in total scores on a questionnaire assessing the severity of OCD symptoms are associated with normal variation in goal-directed control, rather than necessitating the categorical comparison of OCD patient vs. control groups. Participants (N = 548) first completed a reinforcement-learning task that quantifies individual differences in goal-directed (‘model-based’) learning, which is operationalized as a parameter estimate from a logistic regression analysis predicting choices in the task (see Materials and methods and refs [Daw et al., 2011; Gillan et al., 2015c]). Next, we administered a short Intelligence Quotient (IQ) test, followed by self-report questionnaires assessing symptoms of OCD, along with depression and trait anxiety, which we did not expect to be associated with goal-directed deficits. In line with our hypothesis, there was a significant association between scores on the OCD questionnaire and goal-directed deficits (i.e. a negative relationship between OCD severity and model-based learning; β = −0.040, Standard Error (SE) = 0.02, p=0.049) when (as in all analyses reported henceforth) controlling for age, IQ and gender, which have been previously reported to covary with goal-directed behavior (Eppinger et al., 2013; Schad et al., 2014; Sandstrom et al., 1998). Specifically, for each increase of 1 standard deviation (SD) in the total score on the OCD questionnaire, model-based learning was reduced by 14%. No such relationship was observed for self-report depression (β = −0.016, SE = 0.02, p=0.439) or trait anxiety (β = −0.006, SE = 0.02, p=0.777) severity (Table 1, Figure 2A). Moreover, the relationship between total scores on the OCD questionnaire and goal-directed deficits survived inclusion of the depression and trait anxiety total scores in the same model as covariates (β = −0.048, SE = 0.02, p=0.04). 
These data indicate that deficits in goal-directed control are a marker of normal variation in OCD symptomatology in the general population.
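To make the regression-based measure concrete, the sketch below computes a single-subject version of it. It assumes that the logistic regression predicts staying from previous reward, previous transition type, and their interaction with ±1 coding (an assumption about the coding scheme). Under that assumption the model is saturated for one subject, so its coefficients can be read directly off the log-odds of staying in the four cells. The 0.5 pseudo-count and all function names are illustrative choices, and the analyses reported here are fit across subjects rather than one at a time.

```python
import math


def stay_counts(trials):
    """trials: iterable of (rewarded, common, stayed) booleans.

    `rewarded` and `common` describe the previous trial; `stayed` says
    whether the same first-stage fractal was chosen again.
    """
    counts = {(r, c): [0, 0] for r in (True, False) for c in (True, False)}
    for rewarded, common, stayed in trials:
        cell = counts[(rewarded, common)]
        cell[0] += int(stayed)  # number of stays
        cell[1] += 1            # number of trials in this cell
    return counts


def logit(stays, n):
    # The 0.5 pseudo-count (an assumption) keeps the log-odds finite
    # when a cell is all-stay or all-switch.
    return math.log((stays + 0.5) / (n - stays + 0.5))


def model_indices(trials):
    """Return (model_free, model_based) indices on the log-odds scale."""
    log_odds = {k: logit(*v) for k, v in stay_counts(trials).items()}
    rc, rr = log_odds[(True, True)], log_odds[(True, False)]
    uc, ur = log_odds[(False, True)], log_odds[(False, False)]
    model_free = (rc + rr - uc - ur) / 4   # main effect of reward
    model_based = (rc - rr - uc + ur) / 4  # reward x transition interaction
    return model_free, model_based
```

A purely model-free learner yields a large first index and an interaction near zero, while a purely model-based learner shows the opposite pattern, matching panels (c) and (d) of Figure 1.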

The y-axes indicate the % change in model-based learning for each change of 1 standard deviation (SD) of clinical symptoms. Error bars denote standard error. (a) In Experiment 1, total scores on a self-report questionnaire assessing OCD symptoms in a general population sample were associated with deficits in goal-directed (model-based) learning. Specifically, for each increase of 1 SD in OCD symptoms reported, model-based learning was 14% lower than the group mean. No effects were observed in depression or trait anxiety. (b) In Experiment 2, the results from Experiment 1 were replicated: OCD symptoms were associated with deficits in goal-directed learning, while total scores on questionnaires assessing depression and trait anxiety were not. We found that the association between compulsive behavior and goal-directed deficits generalized to symptoms associated with other disorders that are similarly characterized by a loss of control over behavior: alcohol addiction, eating disorders and impulsivity. No significant effects were observed for scores on questionnaires assessing schizotypy, depression, trait anxiety, apathy or social anxiety.
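The percentage scale on the y-axes can be related to the regression coefficients with simple arithmetic. The sketch below assumes that the percentage is the symptom coefficient divided by the group-mean model-based coefficient. That baseline coefficient is not reported in this section, so the value computed here is back-calculated from the two reported numbers (β = −0.040 and 14%) and should be read as an inference, not a reported result. The function names are hypothetical.

```python
def percent_change(beta_symptom, beta_baseline):
    """% change in model-based learning per 1 SD of symptoms,
    relative to the group-mean model-based effect (assumed definition)."""
    return 100.0 * beta_symptom / beta_baseline


def implied_baseline(beta_symptom, pct_change):
    """Back-calculate the group-mean effect from a reported beta and %."""
    return 100.0 * beta_symptom / pct_change


# Experiment 1, OCD questionnaire: beta = -0.040, reported change = -14%.
baseline = implied_baseline(-0.040, -14.0)
print(round(baseline, 3))
print(round(percent_change(-0.040, baseline), 1))
```

Because the reported values are rounded, the implied baseline is only approximate.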

In Experiment 2, we aimed to test the reliability, generalizability and specificity of this finding in a larger cohort of task-naïve subjects (N = 1413, based on a power analysis given the aforementioned results). The procedure was identical to that in Experiment 1, except for the addition of several more clinical questionnaires. To test for generalizability, we assessed symptoms associated with other disorders that have been hypothesized to have compulsive features. In addition to the OCD questionnaire used in Experiment 1, these pertained to alcohol addiction, eating disorders, along with aspects of impulsivity and schizotypy (Everitt and Robbins, 2005; Poyurovsky and Koran, 2005; Robbins et al., 2012; Godier and Park, 2014). To test for specificity, in addition to the mood symptoms assessed previously (depression and trait anxiety) we added self-report measures assessing social anxiety and apathy; we also predicted that non-compulsive aspects of schizotypy and impulsivity might not be associated with goal-directed control. In this independent sample, we replicated the results from Experiment 1; scores on the OCD questionnaire were significantly associated with goal-directed deficits (β=−0.026, SE=0.01, p=0.020), while controlling for age, gender and IQ (Table 1, Figure 2B). As we hypothesized, this effect generalized to phenotypically disparate manifestations of psychiatric compulsivity: total scores on self-report measures of eating disorder severity (β=−0.037, SE=0.01, p<0.001), impulsivity (β=−0.034, SE=0.01, p=0.007) and alcohol addiction (β=−0.025, SE=0.01, p=0.029). Also as predicted, we found no significant associations between goal-directed deficits and total scores on the depression (β=−0.01, SE=0.01, p=0.385), apathy (β=−0.001, SE=0.01, p=0.953), trait anxiety (β=−0.008, SE=0.01, p=0.498) or social anxiety (β=0.008, SE=0.01, p=0.496) questionnaires. 
We found no significant association between self-report levels of schizotypy and goal-directed control (β=−0.017, SE=0.01, p=0.14), possibly reflective of the great deal of heterogeneity within this particular psychiatric construct.

Previous studies using this task have assessed an individual’s goal-directed learning in either of two ways: predicting their choices using either a regression model (as reported above) or the fit of a more elaborate computational learning model, which the regression model approximates. In separate analyses using fits of the computational learning model (see Materials and methods; Supplementary file 5A), all of the aforementioned results were recapitulated, with the exception that the relationship between OCD and model-based learning in Experiment 1 fell short of significance (but was significant in Experiment 2) and schizotypy reached significance in Experiment 2 as a negative predictor of model-based learning.

Given both the heterogeneity within, and the high correlation across, these questionnaires (e.g., Depression and Trait Anxiety scores correlate at r=0.81), assessing the statistical specificity of these effects by including their total scores in the same model is both methodologically and conceptually fraught. To address this issue, we conducted a factor analysis based on the 209 individual questionnaire items, thereby reducing the collinearity across scores on these psychiatric questionnaires. Note that this analysis was carried out on the questionnaire scores alone, without reference to the results on the reinforcement learning task. We found evidence for three dissociable factors (‘dimensions’) that cut across the nine questionnaires from which items were drawn, which we labeled ‘Anxious-Depression’, ‘Compulsive Behavior and Intrusive Thought’ and ‘Social Withdrawal’, based on the loadings of individual items (Supplementary file 2A–C, Table 2, Figure 3A). Although the labeling of factors is of course a subjective process, quantitatively speaking, ‘Compulsive Behavior and Intrusive Thought’ had high and consistent loadings from almost all items pertaining to eating disorders (Mean loading=0.36, SD=0.15), OCD (Mean loading=0.50, SD=0.06) and addiction (Mean loading=0.31, SD=0.07), which have all been couched as ‘compulsive’ disorders in the literature (Everitt and Robbins, 2005; Gillan and Robbins, 2014; Godier and Park, 2014) (Table 2). Factor 2 picked up every self-report item in our question pool that pertained to compulsive behavior, but its loadings were not confined to compulsive behaviors: they equally featured items pertaining to related patterns of thought, i.e. obsessions, preoccupations, or intrusive thoughts.
We cannot speak to causality here, but this suggests that repetitive behavior and repetitive, irrational patterns of thought are not orthogonal symptom dimensions, but perhaps share a common neurobiological root. Items from the impulsivity scale (of which the total score was a significant predictor of goal-directed deficits) did not load as strongly or consistently on this factor (M=0.15, SD=0.15; significantly less than the former three questionnaires, Eating Disorders vs. Impulsivity: t(52)=5.178; OCD vs. Impulsivity: t(41)=11.379; Alcohol Addiction vs. Impulsivity: t(33)=4.342, all p<0.001) (Table 2).
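The idea of item loadings on a latent dimension can be illustrated with a deliberately simplified stand-in for factor analysis. The sketch below extracts only the first principal axis of an item correlation matrix by power iteration, with no rotation and a single factor, so it is not the procedure used for the analysis reported here. The simulated items and all function names are hypothetical.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlations between columns (items) of a subjects x items table."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]


def first_factor_loadings(corr, n_iter=500):
    """Loadings on the dominant axis: eigenvector scaled by sqrt(eigenvalue)."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):  # power iteration
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[a] * sum(corr[a][b] * v[b] for b in range(p))
                 for a in range(p))
    return [x * math.sqrt(eigval) for x in v]


def simulate_items(n=1000, seed=1):
    """Hypothetical data: three items share one latent trait, a fourth does not."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        trait = rng.gauss(0, 1)
        shared = [trait + 0.5 * rng.gauss(0, 1) for _ in range(3)]
        rows.append(shared + [rng.gauss(0, 1)])
    return rows


loadings = first_factor_loadings(correlation_matrix(simulate_items()))
print([round(x, 2) for x in loadings])
```

In this toy example the three items that share a latent trait load strongly on the extracted axis and the unrelated item does not, which is the same logic by which items from different questionnaires can end up on one trans-diagnostic factor.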

Trans-diagnostic factors.

(a) Factor analysis on the correlation matrix of 209 questionnaire items suggested that a 3-factor solution best explained these data. Factors were ‘Anxious-Depression’, ‘Compulsive Behavior and Intrusive Thought’ and ‘Social Withdrawal’. Item loadings for each factor are presented on the top, left and bottom sides of the correlation matrix; color codes indicate the questionnaire from which each item was drawn. (b) These factors were entered into mixed-effects models, revealing that only Factor 2, ‘Compulsive Behavior and Intrusive Thought’, was associated with goal-directed deficits. The effect size (17% reduction in model-based learning for every 1 SD increase in ‘Compulsive Behavior and Intrusive Thought’) was larger than for any individual questionnaire, and pairwise contrasts revealed that these deficits were specific to this factor, compared to Factor 1 ‘Anxious-Depression’ and Factor 3 ‘Social Withdrawal’. The y-axes indicate the % change in model-based learning for each change of 1 standard deviation (SD) of clinical symptomatology. Error bars denote standard error.

We next tested for an association between subjects’ scores on these three factors and their separately measured goal-directed performance. When tested alone, ‘Compulsive Behavior and Intrusive Thought’ was significantly associated with deficits in goal-directed learning (β=−0.046, SE=0.01, p<0.001), and this effect size was greater than that of any of the questionnaires used in this study, corresponding to a 17% reduction in model-based learning for an increase of 1 SD in ‘Compulsive Behavior and Intrusive Thought’ (Figure 3B, Table 3). There were no significant effects of Factor 1 (β=−0.001, SE=0.01, p=0.92) or Factor 3 (β=0.013, SE=0.01, p=0.24) on model-based learning. Finally, we directly compared the associations between goal-directed deficits and these factors by including them in the same model and conducting planned contrasts. We found that deficits in goal-directed control were highly specific to the ‘Compulsive Behavior and Intrusive Thought’ factor (vs. ‘Anxious-Depression’, β=−0.062, SE=0.02, p=0.001; vs. ‘Social Withdrawal’, β=−0.089, SE=0.02, p<0.001). Moreover, when included in the same model with the other factors, ‘Social Withdrawal’ (onto which addiction and aspects of impulsivity load negatively) emerged as a significant positive predictor of goal-directed control over action (β=0.031, SE=0.01, p=0.014). To test the extent to which the relationship between goal-directed deficits and ‘Compulsive Behavior and Intrusive Thought’ is truly continuous, we carried out a supplementary analysis in which this factor was entered as a quadratic term in our model, thereby testing for a nonlinear effect. We found no evidence for nonlinearity (β=−0.0016, p=0.822), and the linear effect remained significant when included in this model (β=−0.045, p=0.001).
Similarly, we repeated our analyses in subsets of our population comprising either ‘putative patients’ (defined as those who scored in the top 25% on a given self-report measure) or subjects in the normal range (bottom 75%) and the results were broadly consistent across sub-samples (Materials and methods, Supplementary file 3).

Finally and in complement to the unsupervised factor analysis used to define ‘Compulsive Behavior and Intrusive Thought’, we carried out a fully supervised analysis (regression with elastic net regularization) to identify directly from the individual questionnaire items those most predictive of goal-directed learning, as assessed using the regression model. Supporting our previous conclusions, those items that predicted model-based deficits in the negative direction substantially overlapped with items with above-threshold loadings on ‘Compulsive Behavior and Intrusive Thought’ (75% overlap; Supplementary file 4). One noteworthy pattern arises among the exceptions. The supervised analysis also identified several additional items from the impulsivity questionnaire, which had not loaded on ‘Compulsive Behavior and Intrusive Thought’, but did predict goal-directed learning. Those were items that tracked subjects’ motivation to engage with the experimental paradigm, e.g. “I (do not) like to think about complex problems”. Other, more compulsivity-relevant items from the impulsivity scale, involving compulsive shopping and general loss of control over action, were identified in both analyses. The former items are likely of little clinical relevance, but can explain the strong association between impulsivity total scores and goal-directed deficits, despite the fact that impulsivity did not load strongly onto ‘Compulsive Behavior and Intrusive Thought’.
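For readers unfamiliar with elastic net regularization, the following minimal sketch shows how it selects a subset of predictors, here standing in for questionnaire items predicting a per-subject learning score. It is a generic coordinate-descent implementation and not the analysis pipeline used in this study. The penalty strength `lam` and mixing weight `alpha` are arbitrary illustrative values, the predictors and outcome are assumed to be centered, and the toy data are invented.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t; values within [-t, t] become exactly 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for the objective
        (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2).

    ASSUMPTIONS: columns of X and y are centered (no intercept is fit);
    lam and alpha are illustrative, not tuned by cross-validation.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residuals y - Xb for the current coefficients
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of item j with the residual that excludes item j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_b = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new_b - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_b
    return b


# Toy data: the first "item" drives the outcome, the second is irrelevant.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2, 2, -2, -2]
print(elastic_net(X, y))
```

The L1 part of the penalty sets the coefficient of the irrelevant item to exactly zero while shrinking the relevant one, which is what allows a supervised analysis of this kind to single out a subset of predictive items.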

In addition to tracking one well-delineated aspect of psychopathology, we found that task performance was significantly related to other measures collected in this study. First, although individual variation in ‘model-free’ performance on the learning task did not track any of the scores from our psychiatric questionnaires (Supplementary file 1C), in Experiment 1, model-free performance did relate significantly to age (Supplementary file 1B, Reward*Age interaction). ‘Model-based’ learning was also related to age and IQ. In particular, higher IQ was associated with increases in goal-directed, ‘model-based’ learning. In contrast to the effect of age on ‘model-free’ learning, older people were significantly less ‘model-based’ compared to their younger counterparts. All of these results were replicated in Experiment 2. Additionally, the larger sample size in Experiment 2 allowed us to detect small but significant associations between gender and model-free and model-based learning. Males were significantly less model-free and more model-based relative to females tested in this study. Importantly, all of these effects are controlled for (by including age, IQ, and gender as additional covariates) in the analyses relating learning to psychiatric symptoms.

Discussion

Here, we tested the utility of a dimensional approach to investigating the neurocognitive basis of compulsivity using two large-scale general population samples. Evidence from multiple complementary analyses supported the conclusion that ‘Compulsive Behavior and Intrusive Thought’ is a symptom dimension associated with deficits in goal-directed control that links features of multiple psychiatric disorders, most notably symptoms of OCD, addiction, and eating disorders. Interestingly, this dimension goes beyond the uncontrolled behaviors that have been previously associated with compulsivity, to include obsessions, preoccupations and intrusive thoughts.

That self-report scores of OCD and addiction symptoms were associated with these deficits is consistent with previous research in patient populations (Sjoerds et al., 2013; Voon et al., 2015; Gillan et al., 2011; 2014a; 2014b; 2015a), and extends these results for the first time to a general population sample. Likewise, binge-eating disorder has also been previously associated with reduced goal-directed control in one patient study and an animal model (Voon et al., 2015; Furlong et al., 2014). Critically, the results of the present study extend this finding to self-report symptoms of other subtypes of eating disorders, suggesting that Compulsive Behavior and Intrusive Thought (and associated deficits in goal-directed control) are a key component of more aspects of eating disorders than previously documented. An entirely consistent exception was that items relating to exerting control over food intake (e.g. “I display self-control around food”) did not load strongly on the ‘Compulsive Behavior and Intrusive Thought’ factor.

A previous study reported an association between social anxiety disorder and deficits in goal-directed control (Alvares et al., 2014). Using self-report social anxiety symptom scores in our general population sample, we did not replicate this finding, and in fact observed a trend towards enhanced goal-directed control associated with social anxiety symptoms. Specifically, in most analyses social anxiety symptoms (both total scores and the ‘Social Withdrawal’ factor) were unrelated to task performance. We did however observe a significant positive association between the ‘Social Withdrawal’ factor and goal-directed control in one analysis, while controlling for the other factors in the same analysis. This result should be interpreted with caution, given that the association was not sufficiently robust to predict goal-directed control alone, but it serves to illustrate that ‘Social Withdrawal’ trended towards predicting better goal-directed control, not worse. Two possible explanations for the discrepant findings between the present study and the prior investigation with diagnosed social anxiety disorder patients are the difference in sample size between the two studies and the possibility that the co-morbidities reported for the social anxiety disorder population of the study by Alvares and colleagues (2014) could not be controlled for and may have driven the reported association. This underscores the importance of a dimensional approach to psychiatric phenotyping.

Schizophrenia has also been previously associated with deficits in goal-directed control (Morris et al., 2015), a finding that was partially supported by the present study (to the limited extent that ‘schizotypy’, measured here, has implications for schizophrenia as a clinical condition). Consistent with the heterogeneous nature of schizophrenia, where two diagnosed patients can have entirely non-overlapping symptoms (American Psychiatric Association, 2013), we did not find a significant association between the total score on the schizotypy questionnaire and deficits in goal-directed control (although this was significant in a second analysis based on a full computational model). However, using our trans-diagnostic approach, we found that in particular ‘unusual experiences’ characteristic of schizotypy loaded onto the ‘Compulsive Behavior and Intrusive Thought’ factor, which in turn was a strong predictor of goal-directed deficits. This finding converges with studies highlighting that delusions are more closely linked to executive deficits than the negative symptoms of schizophrenia (Lysaker et al., 1998; 2003). In terms of clinical phenomenology, schizophrenia and OCD share a common pattern of abnormal beliefs and, as DSM-5 and others have noted, the distinction between a delusion in schizophrenia and a strongly held belief in OCD is often blurred (Poyurovsky and Koran, 2005; American Psychiatric Association, 2013). These data suggest that ‘Compulsive Behavior and Intrusive Thought’, which comprises automatic behaviors as well as associated repetitive thoughts, may be common to both schizophrenia and OCD and explained by deficits in goal-directed control.

Earlier work investigating deficits in goal-directed learning in compulsive patient populations did not employ a positive clinical control (Voon et al., 2015), therefore until now the possibility that goal-directed deficits were non-specific, i.e. evident in all psychiatric populations, remained untested. For instance, prior studies have found a consistent association between stress and goal-directed learning deficits (Otto et al., 2013; Schwabe and Wolf, 2009), which might in principle mediate non-specific effects due to the considerable burdens of mental illness. Here, we tested this possibility rigorously in two independent samples. We found no association between ‘Anxious-Depression’ and deficits in goal-directed control, and moreover the specificity of goal-directed deficits to ‘Compulsive Behavior and Intrusive Thought’ was confirmed through direct statistical comparisons.

Prior work has shown that the model-based learning deficits predict the presence of habits using a devaluation probe (Gillan et al., 2015c), providing a tentative mechanism through which the goal-directed deficits observed in the present study might cause the development of compulsive behaviors. Indeed, this converges with prior work showing that when OCD patients are performing habits, they show dysfunctional hyperactivity in the caudate (Gillan et al., 2015a), a region associated with goal-directed control over behavior (Dolan and Dayan, 2013). An outstanding question, however, is the extent to which excessive stimulus-response habit learning also contributes to Compulsive Behavior and Intrusive Thought. The model-free component of the task we employed in the present study did not relate significantly to psychiatric symptomatology, as indeed we had hypothesized because it also does not appear to be sensitive to slow habitual learning (indeed, unlike the model-based component of the task, it does not predict devaluation [Gillan et al., 2015c]). Future work is needed to develop a computational marker of individual differences in stimulus-response habit formation, so that this possibility can directly be tested.

Another interesting question that emerges from these data is how deficits in goal-directed control might result in both cognitive distortions (which take the form of obsessions in OCD, such as a fear of germs) and compulsive behavior (e.g. repetitive hand-washing), which our factor analysis suggested are inextricably linked. One possibility was raised by a recent study, which demonstrated that just like low-level stimulus-response behaviors, more abstract goal selection can also be rendered habitual (Cushman and Morris, 2015). If these habitual cognitive actions can be conceived as a sort of ‘habit of thought,’ this might indicate a common mechanism for both compulsive behavior and the related repetitive patterns of thought (i.e. ‘habits of thought’). An alternative possibility posits that obsessive thoughts may develop as a result of compulsive behavior (Gillan and Robbins, 2014). Evidence for this idea comes from a study where OCD patients were found to engage in post-hoc rationalization in order to explain a series of habitual responses (Gillan et al., 2014b). The notion is that in OCD, experiencing a recurrent urge to wash one’s hands might cause a patient to infer that they are concerned about hygiene. Future, longitudinal work will be needed to dissect the temporal dynamics of these symptom features to test these hypotheses, which are not mutually exclusive.

Researchers have suggested that ‘Impulsivity’ and ‘Compulsivity’ are partially overlapping neurocognitive features relevant for many psychiatric disorders (Robbins et al., 2012). The present study offers some insights in this regard. While the total score of the impulsivity scale was a strong predictor of goal-directed deficits, it did not load significantly onto the ‘Compulsive Behavior and Intrusive Thought’ factor, suggesting it has an independent association with goal-directed deficits. The supervised analysis identified the items from the impulsivity scale that best predicted goal-directed deficits. In terms of the overlap between the impulsivity questionnaire items and Factor 2, the two above-threshold predictors of model-based deficits were “I spend or charge more than I earn” and “I do things without thinking”, each of which is qualitatively characteristic of compulsive, habitual behavior. Importantly, the three items that did not overlap with ‘Compulsive Behavior and Intrusive Thought’, but still predicted model-based learning, tracked subjects’ general interest in engaging with the task (e.g. “I do not like puzzles”, “I do not like to think about complex problems”). We suggest that these items may not be of particular clinical importance, but simply serve as a marker of how likely individuals are to engage with the task material. In summary, while a small subset of the impulsivity items contributed to ‘Compulsive Behavior and Intrusive Thought’, impulsivity as assessed by our scale was mostly distinct. Of course, impulsivity as a construct itself involves a broad range of potentially distinct behaviors, such as impatient inter-temporal choice preferences and premature responding (Dalley et al., 2011). Further work will be needed to assess how such behaviors relate to the features measured here; notably, our large-scale online methodology is well suited for examining such questions.

As has been shown for other tests that broadly fall within the category of executive function (Arffa, 2007), model-based learning was also associated with IQ and age (and gender in experiment 2 only). Although these effects were controlled for in all analyses and therefore do not bias the interpretation of our results, they highlight the fact that the coupling between model-based learning and ‘Compulsive Behavior and Intrusive Thought’ is far from perfect. One particularly interesting observation is that as people get older, they show greater deficits in model-based learning (Supplementary file 1B), but fewer psychiatric symptoms on all nine questionnaires collected in the present study (Supplementary file 1A), in line with prior work with diagnosed patients (Kessler et al., 2005). This incongruence suggests that there may be multiple dissociable processes responsible for model-based learning. Future studies are needed to dissect this somewhat complex construct into its constituent parts (as has been already attempted for other executive tasks [Miyake et al., 2000]), with a view to identifying the simpler component that is specific to the compulsive phenotype. Relatedly, future work might test whether working memory contributes to the association observed in the present study (Otto et al., 2013). Also, the strength of the association between a clinical phenotype and an underlying mechanism is fundamentally limited by the accuracy with which we can assess that phenotype. Aside from issues of relatively low reliability of self-report clinical symptoms (e.g. self-report OCD, r=0.71 [Hajcak et al., 2004]), we are also limited by the questions we ask. For example, in the present study we did not account for pathological gambling or trichotillomania, which are similarly defined clinically by a loss of control over repetitive behavior (Potenza, 2008; Chamberlain et al., 2007) and therefore may contribute noise to our signal.
It is clear that iterative improvements to both self-report assessment and behavioral testing are needed to increase effect sizes and further refine the neurobiological characterization of Compulsive Behavior and Intrusive Thought suggested by these data.

Although we have labeled the three factors that emerged from our unsupervised analysis based on theoretical considerations, we acknowledge that this is an inherently subjective process and that some may rightfully disagree with our choice of terminology. An important distinction to be made here is that although this labeling process was subjective, the way in which these clusters were identified was not. We first identified a heretofore-unrecognized collection of trans-diagnostic psychiatric symptoms based on their inter-correlations and then validated this clustering by demonstrating an association with neurocognitive performance in an independent task. ‘Compulsive Behavior and Intrusive Thought’ is not intended to be a fixed or final definition – rather it is hoped that future work can (i) use the clusters defined in this study to find closer links between biological markers and clinical phenotypes and (ii) improve and augment these clusters through further data-driven evaluations. More broadly, we hope that this methodology can be employed in many other areas of psychiatry where the considerable issues of heterogeneity within and homogeneity across the existing diagnostic categories are curtailing efforts to delineate the precise neurobiological basis of psychiatric problems.

In the present study, we did not screen for psychiatric disorders, favoring the acquisition of a large sample within which we could leverage normal variation in psychopathology. Although our results converge with prior work using this neurocognitive marker in compulsive disorders (Voon et al., 2015), future studies will be needed to test if these dimensional results map onto clinically diagnosed patients. For example, based on the results of the present study, we hypothesize that the co-morbidity between OCD and addiction might be largely explained by a common deficit in goal-directed control. Conversely, the co-morbidity between OCD and anxiety disorders might be explained by an orthogonal (equally important) symptom dimension. This kind of exciting work should be coupled with studies aiming to use such trans-diagnostic markers to predict treatment response on an individual basis within the existing diagnostic categories.

Altogether, these data suggest that ‘Compulsive Behavior and Intrusive Thought’ together constitute a dimensional psychiatric phenotype that can be tracked in the general population and is linked to deficits in goal-directed control over action, which has a clear neurobiological foundation (Dolan and Dayan, 2013). These data highlight the utility of a computational approach to psychiatry (Montague et al., 2012) and specifically our novel approach of leveraging large datasets, online testing, and normal variation in psychopathology to isolate the neurocognitive basis of psychiatric dimensions that may be relevant for multiple disorders. More broadly, the results of this study constitute progress toward realizing the promise of the RDoC initiative, suggesting that dimensional markers of psychiatric disturbances may map more closely to underlying biological states than do the overlapping and heterogeneous definitions of DSM disorders.

Materials and methods

Participants

Data were collected online using Amazon’s Mechanical Turk (AMT). Participants were paid a base rate (Experiment 1: $2, Experiment 2: $2.50) in addition to a bonus based on their earnings during the reinforcement-learning task (in each experiment, M=$0.54, SD=0.04). Subjects were based in the USA (i.e. had a US billing address with an associated US credit card, debit card or bank account), had had 95% of their previous tasks approved and were 18 years or older. Participants in Experiment 1 (N=548) were 357 females (65%) and 191 males with ages ranging from 18 to 72 (M=35, SD=11). Using the effect size of the relationship between OCD symptoms and model-based learning observed in Experiment 1, we estimated that to achieve 80–90% power on a two-tailed test with a significance level of p<0.05, the sample size needed in Experiment 2 was between 1223 and 1637 subjects. In Experiment 2, participants (N=1413) were 823 females (58%) and 590 males with ages ranging from 18 to 76 (M=33, SD=11). The research team did not know participants’ identities; participants provided their consent online by clicking ‘I Agree’ after reading the study information and consent language in accordance with procedures approved by the New York University Committee on Activities Involving Human Subjects.

Exclusion criteria

In line with suggestions made in the literature with respect to studies conducted using Amazon’s Mechanical Turk (AMT), several a priori exclusion criteria were applied to ensure data quality (Crump et al., 2013). Prior to completing the RL task subjects completed a practice phase, which consisted of written instructions, passively viewing 20 trials demonstrating the probabilistic nature of the associations between second stage fractals and subsequent 25c rewards, and actively participating in 20 trials demonstrating the probabilistic transition structure of the task (i.e. selecting a top-stage box on each trial and observing the transition to second-stage states). After this practice phase, participants were required to correctly answer a 3-item basic comprehension test regarding the rules of the reinforcement-learning task (Gillan et al., 2015c). If subjects failed to answer the questions correctly, they were sent back to the beginning and required to repeat the instructional section prior to re-taking the comprehension test. Participants were permitted to repeat this cycle as many times as was necessary for them to pass this test and continue to the main experiment.

The RL instructions and associated comprehension test were always administered first, followed by the RL task, then the IQ test and finally the self-report psychiatric assessments. Within the self-report section, the order of the questionnaires was fully randomized. Exclusions based on task performance/engagement were applied sequentially, in the order listed below. Reinforcement-Learning Task Exclusion Criteria: Subjects were excluded if they missed more than 10% of trials (Exp1: n=11; Exp2: n=62), responded on the same key on more than 95% of trials on which they registered a response (Exp1: n=46; Exp2: n=85) or had implausibly fast reaction times, i.e. ± 2 standard deviations from the mean (Exp1: n=9; Exp2: n=18). Clinical Questionnaires Exclusion Criterion: In an effort to identify participants who were not reading the questions prior to selecting their responses, we included one catch item: “If you are paying attention to these questions, please select 'A little' as your answer”. Very few subjects failed to select the appropriate response to this catch question; those who did were excluded (Exp1: n=0; Exp2: n=6). IQ Test Exclusion Criterion: Participants who did not answer any of the IQ questions correctly were excluded from further analysis (Exp1: n=32; Exp2: n=87). The adaptive character of the test meant that participants responding incorrectly received increasingly easy items; consistently failing to respond correctly indicates that these participants might have been inattentive or dishonest. In total, 98/646 (15%) subjects who submitted data were excluded in Experiment 1 and 258/1671 (15%) were excluded in Experiment 2.
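The sequential logic of the task-based criteria can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `exclusion_reason` and the per-trial encoding are hypothetical, and only the two fully specified thresholds (more than 10% missed trials, then the same key on more than 95% of registered responses) are implemented.

```python
def exclusion_reason(responses):
    """Return the first exclusion criterion a subject meets, or None.

    `responses` holds one entry per trial: 'E' or 'I' for a key press,
    or None for a missed trial (hypothetical encoding).
    """
    n_trials = len(responses)
    registered = [r for r in responses if r is not None]

    # Criterion 1: more than 10% of trials missed.
    if (n_trials - len(registered)) / n_trials > 0.10:
        return "missed_trials"

    # Criterion 2: same key on more than 95% of registered responses.
    most_common = max(registered.count("E"), registered.count("I"))
    if most_common / len(registered) > 0.95:
        return "same_key"

    return None
```

Because the checks run in a fixed order, a subject who meets several criteria is counted only under the first one, which is what lets the per-criterion counts sum to the reported totals.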

We tested post hoc if subjects excluded on the basis of RL task performance were typical in terms of psychiatric self-report and other assessments. In Experiment 1, we found that those subjects who were excluded had lower symptoms of OCD (t(604)=2.477, p=0.014), trait anxiety (t(604)=2.225, p=0.027), and a trend towards lower levels of depression (t(604)=1.799, p=0.073). These differences were not observed in Experiment 2, where no questionnaire total score differed significantly across groups (p>0.05). For both Experiments 1 and 2, the results presented in this paper are not changed by the inclusion of these subjects in the analyses.

Reinforcement learning task

To assess individual differences in goal-directed learning, we used a reinforcement-learning task (Daw et al., 2011) that distinguishes goal-directed ('model-based') learning from basic temporal difference ('model-free') learning. Model-based learning, like ‘goal-directed learning’, reflects the extent to which individuals integrate contingency information with estimations of outcome value to make choices, and predicts whether or not individuals can exert control over their habits in a devaluation test (Gillan et al., 2015c; Friedel et al., 2014). While model-free learning has been suggested to capture slow incremental learning characteristic of habit-formation itself, empirical studies using sequential decision tasks have not detected this relationship (Gillan et al., 2015c; Friedel et al., 2014), and this converges with the empirical observation that deficits in model-based (but not model-free) learning have been observed in compulsive disorders (Voon et al., 2015). The design of the task is presented in Figure 1. On each trial, subjects were presented with a choice between two fractals (2.5 s choice window). Each fractal usually (i.e. ‘common’ transitions: 70%, Figure 1A, white arrow) led to a particular second state (orange or blue) displaying another two fractal options. Selecting one of the fractals in the second stage resulted in participants being probabilistically rewarded with a picture of a 25¢ coin. There was a unique probability of receiving a reward associated with each second stage fractal, and these drifted slowly and independently over time (never being less than 0.25 or greater than 0.75). Responses were indicated using the left (‘E’) and right (‘I’) keys. Critically, on 30% of ‘rare’ trials (Figure 1A, grey arrow), choices uncharacteristically led to the alternative second state. 
A purely ‘model-free’ learner makes choices based solely on whether or not they were rewarded the last time they performed this action, regardless of whether the transition was rare or common (Figure 1C). A ‘model-based’ learner, in contrast, makes decisions based not only on the history of reward, but also the transition structure of the task, i.e. the environmental contingency (Figure 1D). For example, if a choice was followed by a rare transition to a second state, and that second state was rewarded, a model-based learner would be more likely to choose the alternate action on the next trial, because this is more likely to return them to that rewarding second state. A model-free learner, on the other hand, would be more likely to repeat that same action again, making no adjustment based on the transition type. We used a logistic regression based on this logic to identify from their switching patterns the extent to which each participant exhibited goal-directed (model-based, vs. model-free) choices (Daw et al., 2011).
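The switching logic described above can be made concrete by tabulating how often a participant repeats a first-stage choice after each combination of reward and transition type. The Python sketch below is illustrative only (the paper's analysis is a mixed-effects logistic regression); the function names and the input encoding are assumptions made for this example.

```python
from collections import defaultdict


def stay_probabilities(choices, rewards, transitions):
    """Proportion of 'stay' choices by previous-trial reward and transition.

    choices:     first-stage choice per trial (e.g. 0 or 1)
    rewards:     True if the trial was rewarded
    transitions: 'common' or 'rare'
    """
    stays = defaultdict(int)
    counts = defaultdict(int)
    for t in range(1, len(choices)):
        key = (rewards[t - 1], transitions[t - 1])
        counts[key] += 1
        stays[key] += int(choices[t] == choices[t - 1])
    return {key: stays[key] / counts[key] for key in counts}


def model_based_index(p):
    """Reward x Transition interaction contrast on stay probabilities."""
    return ((p[(True, "common")] - p[(True, "rare")])
            - (p[(False, "common")] - p[(False, "rare")]))
```

A purely model-free pattern (stay after reward, switch after no reward, regardless of transition) gives an index of 0, whereas a purely model-based pattern gives a positive index.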

IQ - Progressive matrices

Intelligence Quotient (IQ) was approximated using a Computerized Adaptive Test (CAT) based on a bank of n=26 items similar to those used in Raven's Standard Progressive Matrices (SPM: [Raven, 2000]). The item bank was built using a two-parameter logistic Item Response Theory model (2pl: [Baker, 1992]). Item parameters were estimated using an online piloting sample of 760 participants (not included in the present study) who took both the test used in this study and the original SPM. Items retained in the item bank were characterized by parameters (item-fit and discrimination) comparable to or better than those of the original SPM items. The length of the CAT test was 5 items (plus one non-diagnostic starting item). The items, including the starting item, were selected using the Maximum Fisher Information criterion (van der Linden et al.) with a randomesque parameter of n=3 (Kingsbury and Zara, 1989). The scores were estimated using a Bayes Modal estimator (Birnbaum, 1969). Estimates based on the piloting sample showed that the score based on a 5-item CAT correlates relatively highly (r=0.77) with the score of a full SPM test.
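As a rough illustration of this item-selection scheme, the Python sketch below implements the standard 2PL response probability, its Fisher information, and a randomesque choice among the most informative unused items. It is a minimal sketch, not the test software used in the study: the function names, the `item_bank` layout (item id mapped to a discrimination/difficulty pair) and the parameter values in the usage note are all hypothetical.

```python
import math
import random


def p_correct(theta, a, b):
    """2PL probability of a correct response (a: discrimination, b: difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_information(theta, a, b):
    """Item information under the 2PL model: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_item(theta, item_bank, administered, n_randomesque=3, rng=random):
    """Pick at random among the n most informative items not yet administered."""
    candidates = [i for i in item_bank if i not in administered]
    candidates.sort(key=lambda i: fisher_information(theta, *item_bank[i]),
                    reverse=True)
    return rng.choice(candidates[:n_randomesque])
```

For example, with `item_bank = {0: (1.0, 0.0), 1: (1.0, 0.1), 2: (1.0, 3.0)}`, calling `select_item(0.0, item_bank, administered=set(), n_randomesque=2)` returns item 0 or item 1, the two items whose difficulty is closest to the current ability estimate.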

Self-report psychiatric questionnaires


Quantifying model-based learning (Logistic regression)

Logistic regression analyses were conducted using mixed-effects models implemented with the lme4 package in the R programming language, version 3.1.1 (http://cran.us.r-project.org). The model tested if subjects’ choice behavior (coded as switch: 0; stay: 1, relative to the previous choice) was influenced by Reward (coded as rewarded: 1; unrewarded: -1), Transition (coded as common: 1, rare: -1), and their interaction, on the preceding trial. A main effect of reward indicates that there is a significant contribution of model-free learning to choice behavior. An interaction between Reward and Transition indicates that there is a significant contribution of model-based learning to choice behavior. Within-subject factors (the intercept, main effects of reward and transition, and their interaction) were taken as random effects, i.e. allowed to vary across subjects. First, we tested our basic logistic regression model, which included age, gender and IQ as fixed effects covariates. We used Bound Optimization by Quadratic Approximation (bobyqa) with 1e5 functional evaluations. The model was specified in the syntax of R as follows:
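As a complement, the trial-level predictor coding described in this paragraph (not the mixed-effects fit itself) can be sketched in Python. The function name `code_trials`, the input encoding and the column names are assumptions made for illustration.

```python
def code_trials(choices, rewards, transitions):
    """Build one regression row per trial, starting from the second trial.

    Each row holds Stay (1 = stay, 0 = switch), Reward (+1 / -1),
    Transition (+1 common / -1 rare) and their product, with the
    predictors taken from the preceding trial.
    """
    rows = []
    for t in range(1, len(choices)):
        reward = 1 if rewards[t - 1] else -1
        transition = 1 if transitions[t - 1] == "common" else -1
        rows.append({
            "Stay": int(choices[t] == choices[t - 1]),
            "Reward": reward,
            "Transition": transition,
            "RewardxTransition": reward * transition,
        })
    return rows
```

With this ±1 coding, the interaction term is +1 after rewarded common and unrewarded rare transitions, the two cases in which a model-based learner is expected to stay, so a positive interaction coefficient indexes model-based choice.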

In each Experiment, we found a significant main effect of Reward ('model-free') and a significant Reward x Transition interaction ('model-based') (Figure 4, Supplementary file 1B). There was also an unhypothesized significant main effect of Transition and an interaction between Transition and IQ, such that subjects were more likely to stay following a common transition and individuals higher in IQ showed this pattern more strongly. This seemingly anomalous effect is likely a side effect of additional structure in the choices that the regression model fails to capture. In particular, in the full computational model, choices are impacted by incremental learning that accrues over trials, such that a choice on some trial is affected by rewards on multiple preceding trials. Although the regression model considers only the most recent trial’s rewards, some aspects of additional learning might be correlated with the transition term, producing a small bias that can be detected given the large sample size of the current study (Skatova et al., 2013). For instance, the full model tends to encounter a negative reward prediction error immediately following a rare transition, which reflects learning about second-stage state values from rewards received on previous trials. Such structure is more interpretably subsumed within the model-based and model-free learning terms in the fits of the fuller computational model, where, notably, the key results were all recapitulated (see below).

Model-based learning and self-report clinical phenomenology

To test the hypothesis that the symptom severity of a given clinical construct ('SymptomScore') was associated with model-based learning deficits, we included the total score for each questionnaire (z-scored) as a between-subjects predictor and tested for interactions with all other factors in the model. We included age, gender and IQ (all z-scored) as fixed effects predictors interacted with Reward, Transition and Reward x Transition, to control for potentially confounding relationships between model-based learning and these covariates of no interest. We hypothesized that there would be a significant three-way interaction between Reward, Transition and SymptomScore, only if those symptoms pertained to compulsive patterns of behavior. Specifically, we expected that greater severity of self-reported compulsive symptoms (i.e. OCD, addiction, eating disorders and aspects of impulsivity) would be predictive of reductions in model-based control over action. In the syntax of the lme4 package, the specification for the regression was the same as above with the addition of the SymptomScorez, as follows:

In Experiment 1, three models were tested in which ‘SymptomScorez’ refers to the z-scored OCD, Trait Anxiety and Depression total scores in each respective model. Additionally, in Experiment 1, we also tested a model where self-report symptoms of OCD, trait anxiety and depression were included in the same model, to illustrate that the association with OCD symptoms survived the exclusion of shared variance. This was specified as follows:

In Experiment 2, due to the high correlations across the different clinical scales, including all of the questionnaires in the same model would not produce an interpretable result - such that meaningful shared variance would be lost. Therefore, the associations between model-based learning and each questionnaire were assessed using separate models for each questionnaire (SymptomScorez, as specified above). As expected based on prior literature in this area (Voon et al., 2015), there was no relationship between clinical symptomatology and model-free learning in either Experiment (Supplementary file 1C). Note that we tested this model without gender (as gender was not itself significant in the model), and the results do not change - the effect of OCD symptoms on model-based learning remains significant (β=−0.041, SE=0.02, p=0.043). We nonetheless include gender in the presented models for Experiment 1 for consistency with Experiment 2, where gender effects were observed.

Factor analysis

In order to (i) reduce the collinearity between the total scores for each of the 9 questionnaires employed and (ii) investigate the possibility that a more parsimonious latent trans-diagnostic structure could explain item-level responses in this dataset, we employed factor analysis using Maximum Likelihood Estimation (MLE). Factor analysis was conducted using the fa() function from the psych package in R, with an oblique rotation (oblimin). Two hundred and nine individual questionnaire items were entered as measured variables into the factor analysis. As responses on the schizotypy scale were binary at the item level, a heterogeneous correlation matrix was computed using the hetcor function in the polycor package in R. This allowed for Pearson correlations between numeric variables, polyserial correlations between numeric and binary items and polychoric correlations between binary variables. Factor selection was based on Cattell’s criterion (Cattell, 1966), wherein a sharp transition from horizontal to vertical (‘elbow’) indicates that there is little benefit to retaining additional factors. The scree-plot was analyzed using an objective implementation of this criterion, the Cattell-Nelson-Gorsuch (CNG) test, which computes the slopes of all possible sets of three adjacent eigenvalues and determines the point at which there is the greatest difference in slope (nFactors package in R) (Gorsuch and Nelson, 1981). The CNG test indicated the existence of a 3-factor latent structure (Figure 5), which comprises factors that we labeled ‘Anxious-Depression’, ‘Compulsive Behavior and Intrusive Thought’ and ‘Social Withdrawal’ based on the strongest individual item loadings (Supplementary files 2A, 2B & 2C, respectively).
Although Cattell’s criterion is perhaps the most widely utilized rule-of-thumb for factor selection, we acknowledge that there are many alternatives and indeed another objective method, ‘Parallel Analysis’ (Drasgow and Lissak, 1983), suggests an 8-factor solution to our data. This model was not only less parsimonious than the 3-factor solution, but in addition, a post hoc analysis revealed that it was also quantitatively inferior at predicting task performance when these 8 factors were entered as predictors in a mixed effects model (as per our main task analyses). Specifically, both Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) were lower for the mixed effects model with covariates derived from the 3-factor solution relative to the 8-factor solution, indicating that the 3-factor model was better at predicting behavior.
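The slope-comparison idea behind the CNG test can be illustrated with a simplified Python sketch. This is an assumption-laden illustration, not the nFactors implementation, whose details may differ: here each window of three adjacent eigenvalues is compared with the next three, and the window after which the slope flattens most determines the number of retained factors. The function names and the toy eigenvalues in the usage note are hypothetical.

```python
def slope3(y):
    """Least-squares slope through three equally spaced points."""
    return (y[2] - y[0]) / 2.0


def cng_elbow(eigenvalues):
    """Simplified CNG-style scree test.

    For each position i, compare the slope of eigenvalues i..i+2 with
    the slope of the following three (i+3..i+5). The retained factors
    are those up to the end of the steep window with the largest
    slope difference.
    """
    ev = list(eigenvalues)
    if len(ev) < 6:
        raise ValueError("need at least 6 eigenvalues")
    best_i, best_diff = 0, float("-inf")
    for i in range(len(ev) - 5):
        diff = slope3(ev[i + 3:i + 6]) - slope3(ev[i:i + 3])
        if diff > best_diff:
            best_i, best_diff = i, diff
    return best_i + 3  # number of factors to retain
```

For the toy eigenvalues `[20, 12, 8, 2, 1.9, 1.8, 1.7, 1.6]`, the function returns 3: the eigenvalues level out from the fourth onward, so three factors are retained.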

Scree plot of eigenvalues.

The outer frame shows the eigenvalues for every possible factor solution, N=209. Inset is data for the first 20 potential factor solutions only. An empirically defined elbow, where eigenvalues begin to level out, was identified at factor 4 using the nFactors package in R, providing evidence for a 3-factor solution (Cattell, 1966), indicated in orange. *p<0.05 **p<0.01 ***p<0.001.

Labeling factors 1 and 3

As outlined in the results section, factors were labeled based on items that loaded most strongly and consistently. For the ‘Anxious-Depression’ factor, the highest average loadings came from the Trait Anxiety questionnaire (M=0.52, SD=0.17), followed by Apathy (M=0.44, SD=0.16) and Depression (M=0.38, SD=0.23) (Table 2). No other questionnaires reached the 0.25 average loading threshold we apply throughout this manuscript, but impulsivity came very close (M= 0.24, SD=0.22). Those impulsivity items that loaded most consistently reflected a tendency to not plan for the future and reduced ability to concentrate.

Factor 3 was labeled ‘Social Withdrawal’. This factor was dominated by items from the Social Anxiety questionnaire (M=0.57, SD=0.14), and interestingly did not have a significant contribution from trait anxiety (M=0.13, SD=0.17). We chose the term ‘withdrawal’ primarily to distinguish this factor from the original social anxiety disorder questionnaire. Interestingly this factor had borderline negative contributions from the alcohol addiction scale, which were low but consistent (M=−0.23, SD=0.06). Overall, this factor describes a phenotype that fears and avoids social situations, but interestingly also thinks excessively about future events and appears risk averse.

Dimensional factors predicting model-based learning

A mixed effects logistic regression analysis was conducted to test the extent to which ‘Anxious-Depression’, ‘Compulsive Behavior and Intrusive Thought’, and ‘Social Withdrawal’ factors predicted deficits in goal-directed control over action. Specifically, these three factors were entered as z-scored fixed effect predictors in the basic model described above (i.e. interacted with Reward, Transition and Reward*Transition), while controlling for age, gender and IQ:

The extent to which a factor is related to deficits in goal-directed control is indicated by the presence of a significant Reward*Transition*Factorz interaction (in the negative direction). Unlike the analysis of the original questionnaire total scores, in addition to testing the predictors separately in independent models, here we also tested a model where all three clinical predictors were included in the same model, which allowed us to statistically compare their effect sizes and thereby make claims about the specificity of our effects to compulsive (versus non-compulsive) aspects of psychopathology. Table 3 shows the effects for model-based learning. There were no effects on model-free reinforcement learning.

To test the extent to which these results reflect a continuous relationship between model-based learning and Factor 2 (‘Compulsive Behavior and Intrusive Thought’), we constructed subsets of our total sample comprising either ‘putative patients’ (defined as those who scored in the top 25% on a given self-report measure) or subjects in the normal range (bottom 75%). We then repeated our analyses in these sub-samples. The slopes of the regression lines were consistent across all analyses, such that the relationship between model-based deficits and Factor 2 was observed both in individuals reporting the most severe symptoms and those in the normal range (see Supplementary file 3). In all 9 analyses with individuals in the ‘normal range’, the relationship between Factor 2 and model-based deficits was significant at p<0.05. In 5/9 analyses with ‘putative patients’ who were in the top 25% of symptom severity, the relationship between Factor 2 and model-based deficits was significant at p<0.05. This analysis had just ¼ of the total sample and was therefore severely underpowered. Nonetheless, the direction and slope of the effect were consistent across the board, providing evidence to suggest that these relationships will likely generalize to patient populations.

Supervised analysis

In addition to the factor analysis, we also carried out a fully supervised analysis to identify the individual items that explained the most independent variance in goal-directed learning using linear regression with elastic net regularization. Elastic Net (Zou and Hastie, 2005) regularization imposes a hybrid of both L1- and L2-norm penalties (i.e., penalties on the absolute values (L1 norm) and squared values (L2 norm) of the β weights). This allows relevant but correlated coefficients to coexist in a sparse model fit, by doing automatic variable selection and continuous shrinkage simultaneously, and selects or rejects groups of correlated variables. The least absolute shrinkage and selection operator (LASSO; Tibshirani, 1996) and ridge regression (Hoerl and Kennard, 1970) are special cases of the Elastic Net. The dependent measure in this analysis was each subject’s model-based score (i.e., each subject’s coefficient for reward x transition, corrected for age, IQ and gender, from the analysis in Experiment 2, Supplementary file 1B). All predictor data were first feature scaled (z-score transformed). We implemented ten-fold cross-validation with nested cross-validation for tuning and validating the model. Briefly, to implement cross-validation, the data were randomly split into 10 groups. A model was then generated based on 9 training groups, and then applied to the remaining independent testing group. Each group served as the testing group once, resulting in 10 different models, and predictions for every subject based on independent data. Nested cross-validation involved subdividing the 9 training groups (i.e., 90% of the sample) into a further 10 groups (‘inner’ folds). Within these 10 inner folds, 9 were utilized for training a model over a range of 50 alpha (0.01–1) and 50 lambda (0.0001–1) values, where alpha is the weight of lasso versus ridge optimization and lambda is the regularization coefficient.
This generated a resulting model fit on the inner fold test set for each possible combination of alpha and lambda. The mean fit over all 10 inner folds for each combination of alpha and lambda was then calculated and then used to determine the optimal parameters for the outer fold. We conducted 100 iterations of regularization with tenfold validation and retained items that were significant predictors of model-based learning in >=95% of final models. The overall model was significant, with the median cross-validated p=0.00003, median cross-validated r=0.11. Twenty-eight features met these criteria and are listed in Supplementary file 4.
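The nested cross-validation scheme can be sketched in a few lines. The code below is a minimal pure-Python illustration, not the implementation used in the study: the coordinate-descent solver, the use of mean squared error as the "fit" criterion, the absence of an intercept (predictors and outcome are assumed to be centered), and all function names are assumptions. Here `alpha` follows the text's convention (1 = lasso, 0 = ridge) and `lam` is the overall penalty strength; some libraries use these names the other way around.

```python
import random


def soft_threshold(x, t):
    """Shrink x toward zero by t (the L1 proximal step)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_elastic_net(X, y, alpha, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # equals y - Xb while b is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(x * x for x in col) / n
            if z == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(x * (r + x * b[j]) for x, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
            delta = new - b[j]
            if delta != 0.0:
                resid = [r - x * delta for x, r in zip(col, resid)]
                b[j] = new
    return b


def predict(X, b):
    return [sum(x * w for x, w in zip(row, b)) for row in X]


def mse(X, y, b):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, predict(X, b))) / len(y)


def k_folds(indices, k, rng):
    """Randomly partition indices into k roughly equal groups."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nested_cv(X, y, alphas, lams, k_outer=10, k_inner=10, seed=0):
    """Return out-of-sample predictions and the (alpha, lam) chosen per outer fold."""
    rng = random.Random(seed)
    preds = [None] * len(y)
    chosen = []
    for test in k_folds(range(len(y)), k_outer, rng):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        inner = k_folds(train, k_inner, rng)
        best = None
        for a in alphas:
            for lam in lams:
                errs = []
                for val in inner:
                    val_set = set(val)
                    fit = [i for i in train if i not in val_set]
                    b = fit_elastic_net([X[i] for i in fit], [y[i] for i in fit], a, lam)
                    errs.append(mse([X[i] for i in val], [y[i] for i in val], b))
                score = sum(errs) / len(errs)  # mean fit over inner folds
                if best is None or score < best[0]:
                    best = (score, a, lam)
        _, a, lam = best
        b = fit_elastic_net([X[i] for i in train], [y[i] for i in train], a, lam)
        for i, p in zip(test, predict([X[i] for i in test], b)):
            preds[i] = p
        chosen.append((a, lam))
    return preds, chosen
```

The sketch covers a single pass of ten-fold validation. Repeating it over 100 random splits and retaining items selected in at least 95% of final models, as described above, would wrap `nested_cv` in an outer loop over seeds.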

Quantifying model-based learning (Computational model)

The logistic regression analyses presented are a simplified method for analyzing the data, but as this approach only considers events taking place on the trial immediately preceding choice, it does not fully capture the influence of slow, incremental learning that takes place over many trials. The regression and full computational analyses have been shown to produce very similar results, particularly when estimating model-based learning (Gillan et al., 2015c; Otto et al., 2013) (indeed they are correlated at 0.87 here). Nonetheless, to complement these analyses, we verified that the relationship between model-based learning and compulsive behavior holds in the full computational instantiation of model-based and model-free reinforcement learning. For this analysis, choices were modeled as arising due to the weighted combination of model-free and model-based reinforcement learning. The model is equivalent to that used by Otto et al. (2013), which is itself a simplified variant of the one used by Daw et al. (2011). At each trial $t$, a participant makes a stage-1 choice $c_{1,t}$, occasioning a transition to a stage-2 state $s_t$ where she makes another choice $c_{2,t}$ and receives reward $r_t$. At stage 2, subjects are assumed to learn a value function over states and choices, $Q_t^{\mathrm{stage2}}(s,c)$, whose value for the chosen action is updated in light of the reward received at each trial according to a delta rule, $Q_{t+1}^{\mathrm{stage2}}(s_t,c_{2,t}) = (1-\alpha)\,Q_t^{\mathrm{stage2}}(s_t,c_{2,t}) + r_t$. Here, $\alpha$ is a free learning rate parameter, and (in this and all analogous update equations throughout) we have omitted a factor of $\alpha$ from the last term of the update, which is equivalent to rescaling the rewards and $Q$s by $1/\alpha$ and the corresponding weighting parameters $\beta$ by $\alpha$ (Otto et al., 2013). The probability that a subject will make a particular stage-2 choice is modeled as governed by these values according to a logistic softmax, with free inverse temperature parameter $\beta^{\mathrm{stage2}}$: $P(c_{2,t}=c) \propto \exp\left(\beta^{\mathrm{stage2}}\,Q_t^{\mathrm{stage2}}(s_t,c)\right)$, normalized over both options $c$.

Stage-1 choices are modeled as determined by the weighted combination of both model-free and model-based value predictions about the ultimate, stage-2 value of each stage-1 choice. Model-based values $Q^{\mathrm{MB}}$ are given by the learned values of the corresponding stage-2 state, maximized over the two actions: $Q_t^{\mathrm{MB}}(c) = \max_{c_2} Q_t^{\mathrm{stage2}}(s,c_2)$, where $s$ is the stage-2 state predominantly produced by stage-1 choice $c$. Model-free values are learned by two learning rules, each of which updates according to a delta rule with a different estimate of the second-stage value: TD(0), where $Q_{t+1}^{\mathrm{MF0}}(c_{1,t}) = (1-\alpha)\,Q_t^{\mathrm{MF0}}(c_{1,t}) + Q_t^{\mathrm{stage2}}(s_t,c_{2,t})$, and TD(1), where $Q_{t+1}^{\mathrm{MF1}}(c_{1,t}) = (1-\alpha)\,Q_t^{\mathrm{MF1}}(c_{1,t}) + r_t$. Stage-1 choice probabilities are then given by a logistic softmax, with contributions from each value estimate, each weighted by its own free inverse temperature parameter: $P(c_{1,t}=c) \propto \exp\left(\beta^{\mathrm{MB}} Q_t^{\mathrm{MB}}(c) + \beta^{\mathrm{MF0}} Q_t^{\mathrm{MF0}}(c) + \beta^{\mathrm{MF1}} Q_t^{\mathrm{MF1}}(c) + \beta^{\mathrm{stick}} I(c=c_{1,t-1})\right)$. Here, $I(c=c_{1,t-1})$ is a binary indicator for the choice that repeats the one made on the previous trial; the corresponding weight $\beta^{\mathrm{stick}}$ measures the tendency to alternate or perseverate regardless of feedback.

At the conclusion of each trial, the value estimates Q (of all three sorts) for all unchosen actions and unvisited states are decayed multiplicatively by (1−α).
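To make the update and choice rules concrete, the following sketch computes the negative log-likelihood of one participant's choices under the model as described. It is an illustration under stated assumptions, not the study's Julia implementation: states and actions are coded 0/1, stage-1 choice `c` is assumed to lead predominantly to stage-2 state `c`, all values start at zero, and the function and argument names are hypothetical.

```python
import math


def log_softmax(values, chosen):
    """Log-probability of `chosen` under a softmax over `values`."""
    m = max(values)
    log_norm = m + math.log(sum(math.exp(v - m) for v in values))
    return values[chosen] - log_norm


def neg_log_likelihood(trials, alpha, b_stage2, b_mb, b_mf0, b_mf1, b_stick):
    """Negative log-likelihood of a sequence of (c1, s, c2, r) trials.

    c1: stage-1 choice, s: stage-2 state, c2: stage-2 choice (all 0 or 1),
    r: reward. Assumed coding: choice c commonly transitions to state c.
    """
    q2 = [[0.0, 0.0], [0.0, 0.0]]  # Q^stage2(s, c)
    mf0 = [0.0, 0.0]               # Q^MF0(c), TD(0) values
    mf1 = [0.0, 0.0]               # Q^MF1(c), TD(1) values
    prev_c1 = None
    nll = 0.0
    for c1, s, c2, r in trials:
        # Stage-1 softmax over the weighted value estimates.
        v1 = []
        for c in (0, 1):
            q_mb = max(q2[c])  # model-based value of the commonly reached state
            stick = 1.0 if c == prev_c1 else 0.0
            v1.append(b_mb * q_mb + b_mf0 * mf0[c] + b_mf1 * mf1[c] + b_stick * stick)
        nll -= log_softmax(v1, c1)

        # Stage-2 softmax over the values of the visited state.
        nll -= log_softmax([b_stage2 * q for q in q2[s]], c2)

        # Delta-rule updates with the factor of alpha omitted from the last term.
        # Decaying every entry by (1 - alpha) and then adding the increment to
        # the chosen entries reproduces both the updates and the decay rule.
        q2_before = q2[s][c2]  # Q_t^stage2(s_t, c_2t), used by TD(0)
        for c in (0, 1):
            mf0[c] *= 1.0 - alpha
            mf1[c] *= 1.0 - alpha
            for state in (0, 1):
                q2[state][c] *= 1.0 - alpha
        mf0[c1] += q2_before
        mf1[c1] += r
        q2[s][c2] += r
        prev_c1 = c1
    return nll
```

A function of this form could be handed to a generic optimizer for per-subject maximum likelihood. The hierarchical estimation used in the study additionally places group-level distributions over the parameters.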

Altogether, the model has six free parameters: five weights β and a learning rate α. These represent a minor change of variables with respect to the equations in Otto et al. (2013). In particular, by separating the TD(0) and TD(1) stages of the model-free update into separate Q values, we split Otto et al.’s aggregate model-free weight $\beta^{\mathrm{MF}}$ into two variables, thereby also replacing their eligibility trace parameter λ, which encodes the balance between the two updates, and eliminating the (0,1) boundaries associated with that variable. Following estimation, we reconstruct the aggregate model-free weighting as $\beta^{\mathrm{MF}} = \beta^{\mathrm{MF0}} + \alpha\,\beta^{\mathrm{MF1}}$, where the factor of α accounts for the difference in scaling between the two weights arising from the omission of α from the update equations.

For each participant, we estimated the free parameters of the model by maximizing the likelihood of her sequence of choices, jointly with group-level distributions over the entire population using an Expectation Maximization procedure (Huys et al., 2011) implemented in the Julia language (Bezanson et al., 2012). We extracted the per-subject model-based and model-free weightings βMB and βMF as indices of the strength of each sort of learning for further analysis of individual differences. Specifically, we used subject-level estimates of model-based and model-free learning from the computational model as dependent variables in regression analyses where clinical characteristics (i.e. questionnaire total scores and factors from factor analysis) were independent variables. The results of the full reinforcement-learning model mirrored those of the logistic regression analysis in almost every respect. There were two differences. First, when estimated using the computational model, the relationship between self-report OCD symptoms and goal-directed learning in Experiment 1 fell short of significance at p<0.05 (Supplementary file 5A). The size of this effect was similar in Experiment 2 but, with the benefit of an increased sample size, was highly significant, indicating this was an issue of statistical power. Second, while Schizotypy did not reach significance as a predictor of model-based deficits using the regression model, it was a significant predictor of model-based learning when computationally estimated. There were no relationships between self-report psychopathology and model-free learning defined using the computational model. A side-by-side comparison of the predictive power of model-based learning defined using the computational model versus one-trial-back regression analysis is presented in Supplementary file 5B.

Decision letter

Michael J Frank

Reviewing Editor; Brown University, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your work entitled "Compulsivity is a trans-diagnostic trait characterized by deficits in goal-directed control" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by Reviewing Editor Michael Frank and Timothy Behrens as the Senior Editor. One of the two reviewers has agreed to reveal his identity:

Klaas Stephan.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

The authors conduct a comprehensive analysis from a large sample of subjects using questionnaire-based measures of clinical variables and two independent experiments using a sophisticated reinforcement learning task that dissociates goal-directed behavior from habitual stimulus-response learning. They report consistent demographic and clinical factors that underlie reductions in goal-directed behavior during learning. Supervised and unsupervised analysis of the link from questionnaire data to task performance points to compulsivity (and its various manifestations) as one key clinical factor that is related to reduced goal-directed behaviors.

Essential revisions:

Overall, the reviewers were impressed with the sophistication of the analysis and agreed that this study represents an important step toward large scale quantitative assessment of relevant phenotypes informed by computational cognitive neuroscience – that is, one of the main goals of computational psychiatry. They also agreed that the approach is original, exploits a large online sample (with appropriate controls for data quality), and is based on a systematic body of work, conducted by the authors over several years. However, they all expressed concerns with respect to the main take-home message of the manuscript that compulsivity is a trans-diagnostic factor that relates to deficits in model-based learning. This was particularly concerning given that other demographic factors (e.g., age) had even greater impact on the same measure of model-basedness, limiting the conclusions (and application) that one could garner from using this theoretically grounded construct clinically. Nevertheless, we all agree that the analysis is itself useful and sophisticated and we would like to see a revised manuscript that substantially tones down the main claim in the thrust of the motivation (including the title). We would be happy to consider a more nuanced, balanced and thorough (perhaps longer) characterization of the phenotype that is not as centrally focused on compulsivity, but rather presents a large scale analysis of factors that relate to MB vs. MF performance (and indeed other task measures not obviously MB per se, like the transition effect), including age, gender, IQ but also other clinical factors (impulsivity vs. compulsivity), whether these are independent or interactive factors, with discussions on whether they are likely to have similar or different mechanisms, etc. 
While this may seem like a major undertaking, we think that ultimately this kind of description can be more useful for showcasing the strength of your combination of theory-driven and data-driven approaches. Should you wish to maintain the stronger claim and message and focus on compulsivity we would suggest submitting elsewhere. Below these points are elaborated by comments from the individual reviewers, compiled together.

1) I've spent a substantial amount of time mulling over this paper, which really does have many strengths and is exemplary in many ways: a well-motivated task; applied to a large population; combined with an interesting methodology that makes an important step forwards in terms of relating neurobiological/cognitive mechanisms to psychopathology. The results are initially very intriguing – particularly those from the elastic net where impairments in goal-directed control seem to pick out symptoms of compulsivity and intrusive thoughts. However, on reading it more closely there are some important drawbacks which I think require the conclusions to be very significantly toned down; or additional analyses to substantiate them. This, in turn, might make it, in my view, more appropriate for a less general journal. My major concerns are:

Age has much more of an effect than compulsivity – but OCD prevalence does not increase with age (e.g. Kessler et al. 2005, Arch Gen Psych 62(6):593-602). How can this be if goal-directed impairments underlie compulsivity in such a specific manner? The answer presumably is that goal-directedness depends on multiple processes, and that those related to compulsivity and age might in some way be dissociable. But does that not then make compulsivity a less specific guide to the underlying neurobiology? Isn't this also suggested by the fact that the relationship is, overall, quite weak: in the elastic net, the cross-validated correlation is 0.11? The temporal evolution of OCD decreasing with age also jars with the influence of age.

OCD, addiction, etc. are characterised by the positive presence of certain behaviors that bear the hallmark of 'habits'. Why does this not show up in the task? The possibility that the task seems to be insensitive to habitual variation (it never seems to show up in correlations despite the model-free prediction error regressors showing the strongest correlations with BOLD, i.e. neurobiology) somewhat questions the strong conclusions about compulsivity being specifically due to an impairment in goal-directed control: subjects could also have an impairment (excess?) in habitual learning (as one might conclude from excessive habitual behavior in e.g. Gillan et al., 2015 AJP), but this doesn't show up in the task because it's not sensitive. This again makes the conclusions they are drawing from the results just too strong. They state that the literature shows that deficits in model-based but not model-free decision-making has been found and cite Voon et al., Mol. Psych. 2014, but that study used the same task, hence not really addressing this point.

They make statements about patient populations but include neither patients nor any other measure by which functional impairment could be judged, and refer to diagnostic categories ('OCD', 'Alcohol addiction') despite not performing any diagnostic tests. The results, figures and discussion need to avoid reference to diagnostic categories, and I find the term 'trans-diagnostic' difficult in the absence of any diagnosis. At the core of this is that it is unclear whether the results are driven by what one might observe in a typical patient population. One way to address this is to recapitulate the results only amongst those subjects with scores above cutoffs in any one measure, and then talk about 'putative patients' or so. We also need to know whether those subjects excluded based on performance were typical in terms of self-report.

There is no information about the stability of the effects over time, and hence the term trait is confusing. In fact, the covariates are mostly measures of state, not trait.

2) On closer inspection, the elastic net analysis is far less convincing than on reading the results – the strongest loadings (I tried to sort them in descending order from Table 3A):

I feel that there are good and bad numbers;

Am preoccupied with the thought of having fat on my body;

I vomit after I have eaten;

I check things more often than necessary;

Am terrified about being overweight;

Like my stomach to be empty;

My heart beats faster than usual.

With overall only two items from the OCI-R (the measure of OCD used), and neither of these is being significantly loaded onto by the compulsivity factor. The fact that so many eating disorder items show up certainly deserves some comment beyond it being just another compulsive phenotype, but overall this just doesn't quite capture 'compulsivity'. Only one out of the top 8 items has anything obvious to do with compulsivity (other than referring to a disease which they labelled as compulsive).

3) I do wonder about how overall severity contributes. This is important because severity is strongly related to comorbidity (see e.g. Kessler et al., 2005, in the same volume as above), and hence important for any trans-diagnostic processes. Half the questionnaires are correlated (and picked up by the compulsivity factor). The most severely ill patients might thus be most likely to respond positively on many compulsivity items. Could it be that the most severely impaired patients simply look compulsive because they are more likely to have more comorbid disorders and hence show up in the compulsive category?

4) In the FA, the first component doesn't contain anxiety at all. Anxiety loads much more on the second factor, and does so possibly even more than compulsivity: there are around 9 or 10 items that clearly relate to anxiety loading onto it, but only 2 items relating to compulsive behaviours. A number of the AUDIT variables are hard to relate to compulsions: alcoholics start drinking early as they experience withdrawal symptoms after a night of sleep. If anything, this component is more related to obsessions, anxious worries and difficulties controlling thoughts – which is, in terms of constructs, much closer to goal-directed deficits, it seems to me.

5) The task itself isn't obviously specific as it is not clear what the model-free component quite captures. This makes it more of a shame they didn't test components we know impact on m-b choices, such as working memory or stress. Impairments in this are also 'trans-diagnostic', and it would have been nice to show that they don't have the specificity of g-d choices.

6) Both reviewers expressed concerns about the explanatory power (of excessive habit formation due to deficient model-based control) for understanding clinical aspects of compulsivity. As you outlined in the Introduction, a key motivation for studying the relation between model-based /goal-directed decision-making and compulsive symptoms is the notion that "a deficit in deliberative, goal-directed control may leave individuals vulnerable to rely excessively on forming more rigid habits". I understand why this is a straightforward and attractive perspective to explain certain aspects of compulsivity. However, I think it would also be appropriate to mention challenges and potential limitations of this perspective in the Discussion – particularly because the dimensional approach chosen here suggests applicability of the proposed mechanism to clinical phenomena. For example, how exactly would a putative deficit in model-based control lead to prominent symptoms in OCD, such as excessive checking, fear of germs, or desire for order? The nomenclature and with it the framing need quite some work, e.g. categorical/dimensional measures, in terms of state/trait distinction, and distinctions between compulsions and obsessions.

7) The paper is very well written and of beautiful simplicity – a pleasure to read. However, sometimes a few more technical details or conceptual distinctions may have to be included in the main text to avoid confusion. First, the Introduction repeatedly refers to unspecified "OCD symptoms" which I found confusing, given that the paper is about the general population and that numerous symptoms of OCD exist. I would recommend avoiding the clinical label OCD and referring to compulsivity instead, stating the specific questionnaire you used. Similarly, in the Results section (second paragraph), there is a tension between using trait labels (impulsivity, compulsivity) and diagnostic labels (eating disorders, alcohol addiction); the latter is confusing (and not quite appropriate), given that your study examines the general population. You could eliminate this tension and, at the same time, increase clarity by always referring to the scores of the respective questionnaires. Second, the Results section should define the measure of model-based learning used (first paragraph). Until I went through the Methods section, I was not sure how exactly model-based learning was operationalised, and whether you were referring to a behavioural readout or to the parameter estimates of a computational model.

8) You report analyses based on behavioural readouts (trial-by-trial stay/switch behaviour), not model parameter estimates, because the qualitative conclusions drawn from both types of analyses seemed to be almost equivalent. Does this also hold with regard to how well questionnaire scores can be predicted, or does the computational model have a competitive advantage there? It would be instructive for the technically interested reader if you could include estimates of predictive accuracy for both approaches, perhaps in the supplementary material.

9) In the subsection “Quantifying Model-based Learning (Logistic Regression)”, second paragraph: The significant main effect of Transition is very interesting. Could you please state the direction of this effect and perhaps even offer a (speculative) interpretation? This is another place in which a more thorough analysis of the factors on both sides (task measures and demographic/clinical variables) can be useful.

Author response

Essential revisions: Overall, the reviewers were impressed with the sophistication of the analysis and agreed that this study represents an important step toward large scale quantitative assessment of relevant phenotypes informed by computational cognitive neuroscience – that is, one of the main goals of computational psychiatry. […] Below these points are elaborated by comments from the individual reviewers, compiled together.

We thank the reviewers and Reviewing Editor for their comments. We have made substantial changes to the manuscript in light of the issues raised by the reviewers. The main changes are summarized here:

1) We have changed the title, Abstract and body, making clear that the symptom dimension related to failures in goal-directed control is characterized by both compulsive behavior and associated repetitive thoughts (i.e. not just behavior), and more generally making clear that our goal is to delineate the scope of the psychiatric phenotype associated with failures of goal directed control, including or beyond compulsivity. Related to this issue, we speculate in more detail about the functional relationship between compulsions and obsessions in the Discussion, drawing on recent empirical work (Cushman and Morris, 2015 and Gillan et al., 2014).

2) We have made it more explicit throughout the manuscript that we are not studying diagnosed patients, but rather normal variation in self-report symptomatology. We do this by clearly referring to ‘self-report questionnaires’ at all points where we invoke the name of a clinical diagnostic category such as ‘OCD’. Similarly, we make clear that we do not have any evidence for state vs. trait-dependence of these symptom dimensions. We have eliminated the word ‘trait’ from the manuscript (except when describing the ‘trait anxiety’ questionnaire). We also omit the term “trans-diagnostic.”

3) We have moved the age, IQ and gender findings to the main Results section and talk about these in more detail in the Discussion, thereby providing a more comprehensive analysis of what demographic (as well as psychiatric) features relate to task performance.

4) We have responded to all analytical queries, providing supplementary summary data, figures and statistics where appropriate.

We think that these changes deliver on the suggestion of a more balanced and broader exploration of the phenotype. That said, we do retain some of the framing and emphasis (though toned down) on the notion of compulsivity and its relationship to goal-directed control as the initial idea that launches our study.

This is for several reasons:

1) Several of the reviewers’ concerns with respect to how appropriate it was to label Factor 2 ‘compulsivity’ arose from problems with how we presented the results tables, which we believe led reviewers to underappreciate the strength of the evidence. Specifically, in the prior submission we displayed just the top 6 items from each questionnaire that loaded on a given factor. This evidently led one reviewer to believe certain items from the OCD scale did not load at all on Factor 2, when in fact they were amongst the highest loading items of all items (just not in the top 6). We have changed our presentation approach to avoid this misunderstanding and believe this resolves several of the reviewers’ concerns. There was a similar problem with the table presenting the results of the regularized regression, which we have corrected.

2) That being said, we agree with the reviewers that the label ‘compulsivity’ was not apt to describe Factor 2, because thought processes also contributed significantly to this factor. This should rightly be seen as a novel, positive finding of the study. Not directly mentioning these in the label may serve to inappropriately diminish their contribution, and this was not our intention. We have renamed the factor ‘Compulsive Behavior and Thought’, to reflect the tight coupling between compulsive behaviors and cognitive processes (i.e. interpretations, obsessions, preoccupations, unusual beliefs). We struggled somewhat with the nomenclature, which (as we say in the Discussion) is obviously subjective but also necessary simply for communication. If the appearance of the term “compulsive” in this more expansive and tentative label remains unsatisfactory we are happy to take further advice.

3) The design choices for our study (including things like the choice of questionnaires and the power analysis/sample size) are all driven by, and really only sensible given our specific motivating hypothesis. That hypothesis, while admittedly imperfect, is based on the intersection of: (i) a consistent body of our own work published over the past 5 years and (ii) recently highlighted issues with our current psychiatric taxonomy.

4) Relatedly, while we give more attention in both Results and Discussion to the additional findings regarding age and IQ, we think that nevertheless retaining greater emphasis on the psychiatric features is appropriate given that these are the novel findings. Age did not have a greater impact on model-based learning than Factor 2 (relevant data detailed below): age and Factor 2 were statistically equivalent. While we can appreciate that age and IQ effects may be of interest to the reader, these effects have both been previously reported in the literature (Eppinger et al., 2013; Schad et al., 2014). We now report these findings in the main Results section and discuss them in more detail in the Discussion section.

Below are direct responses to all reviewer comments, including those summarized above.

1) I've spent a substantial amount of time mulling over this paper, which really does have many strengths and is exemplary in many ways: a well-motivated task; applied to a large population; combined with an interesting methodology that makes an important step forwards in terms of relating neurobiological/cognitive mechanisms to psychopathology. The results are initially very intriguing – particularly those from the elastic net where impairments in goal-directed control seem to pick out symptoms of compulsivity and intrusive thoughts. However, on reading it more closely there are some important drawbacks which I think require the conclusions to be very significantly toned down; or additional analyses to substantiate them. This, in turn, might make it, in my view, more appropriate for a less general journal. My major concerns are:

Age has much more of an effect than compulsivity – but OCD prevalence does not increase with age (e.g. Kessler et al. 2005, Arch Gen Psych 62(6):593-602). How can this be if goal-directed impairments underlie compulsivity in such a specific manner? The answer presumably is that goal-directedness depends on multiple processes, and that those related to compulsivity and age might in some way be dissociable. But does that not then make compulsivity a less specific guide to the underlying neurobiology? Isn't this also suggested by the fact that the relationship is, overall, quite weak: in the elastic net, the cross-validated correlation is 0.11? The temporal evolution of OCD decreasing with age also jars with the influence of age.

We apologize; we believe our presentation caused some confusion here. Age did not have more of an effect on model-based learning than compulsivity (e.g. βAge = -0.049 vs. βCompulsivity = -0.046, difference not significant at p=.83 from mixed effects model); they were statistically equivalent. Nonetheless, the point regarding specificity is an important one and we have clarified our stance in the paper and toned down some of our language to make clear what we mean by ‘specificity’. It was not our intention to suggest that goal-directed deficits are exclusively characteristic or solely causal of compulsivity, and indeed previously published studies have already shown that age, gender and IQ relate to performance on this task (Eppinger et al., 2013; Schad et al., 2014). Our experiment was designed to test if, taking into account these other potentially confounding factors, goal-directed deficits are specific to one well-delineated aspect of psychopathology (e.g., compulsive symptoms or some other construct with some larger scope) compared to others (i.e. depressive or anxious symptoms). This is an important question because in psychiatry, deficits on cognitive tasks are generally found to be non-specific, i.e. observed in multiple diagnosed patient groups.

The reviewers’ point regarding the temporal evolution of OCD is interesting. Indeed, consistent with Kessler et al., OCD symptoms also lessen with age in our sample (as do scores for all psychopathologies). We now mention this in the Discussion and weave it in with our existing discussion regarding multiple processes that may contribute to model-based learning.

We also now report the relationship between psychopathology scores and age, gender and IQ (Supplementary file 1A).

OCD, addiction, etc. are characterised by the positive presence of certain behaviors that bear the hallmark of 'habits'. Why does this not show up in the task? The possibility that the task seems to be insensitive to habitual variation (it never seems to show up in correlations despite the model-free prediction error regressors showing the strongest correlations with BOLD, i.e. neurobiology) somewhat questions the strong conclusions about compulsivity being specifically due to an impairment in goal-directed control: subjects could also have an impairment (excess?) in habitual learning (as one might conclude from excessive habitual behavior in e.g. Gillan et al., 2015 AJP), but this doesn't show up in the task because it's not sensitive. This again makes the conclusions they are drawing from the results just too strong. They state that the literature shows that deficits in model-based but not model-free decision-making has been found and cite Voon et al., Mol. Psych. 2014, but that study used the same task, hence not really addressing this point.

We agree that we cannot completely rule out the possibility that excessive habit learning may also contribute to compulsive behaviors, as the model-free component of the task is evidently not very sensitive to this, which we make clear in the Discussion. (Though note that the model-free portion of the task does detect relationships with Age, IQ, and gender in the current study, suggesting it is not entirely powerless to detect individual differences in a sample of this size.) However, based on converging data from prior studies, we believe that the most parsimonious explanation for devaluation failures in OCD is deficits in goal-directed control.

The data from the Gillan et al. (2015) paper showed that OCD patients perform habitually in a devaluation test, but this does not provide preferential evidence for excess in habit learning over deficits in goal-directed control – this is ambiguous in all devaluation tests. Crucially, the study showed that devaluation failures were associated with dysfunction in a goal-directed structure (the caudate), not for example the putamen, which has been associated with stimulus-response habit learning. While we acknowledge that this is indirect evidence, it converges on the notion that it is goal-directed deficits that are associated with excessive habit-forming in OCD patients. We highlight this observation in the Discussion and also comment on the need for future work to develop a viable marker of individual differences in slow stimulus-response learning that is not confounded with goal-directed control.

They make statements about patient populations but include neither patients nor any other measure by which functional impairment could be judged, and refer to diagnostic categories ('OCD', 'Alcohol addiction') despite not performing any diagnostic tests. The results, figures and discussion need to avoid reference to diagnostic categories, and I find the term 'trans-diagnostic' difficult in the absence of any diagnosis. At the core of this is that it is unclear whether the results are driven by what one might observe in a typical patient population. One way to address this is to recapitulate the results only amongst those subjects with scores above cutoffs in any one measure, and then talk about 'putative patients' or so.

We thank the reviewer for this important point, which we have addressed in two key ways. First, we have carefully omitted any sign of claims of this sort from the paper. While we do of course hope that our results extend to diagnosed patient populations, we acknowledge that we cannot speak to this directly in the present study. We have changed the text throughout to make it clearer that we are talking exclusively about variation of self-report symptoms in the general population; we specifically omit the term “trans-diagnostic”.

For the same reason, we think it is probably inappropriate to even speculate about “putative patients” in the way the reviewer suggests. That said, at the reviewer’s request, we repeated our key analyses using (i) subjects in the top 25% of the population in terms of symptom severity, ‘probably patients’ and (ii) using subjects in the ‘normal range’, i.e. the bottom 75%. Results of these analyses are presented in Author response image 1, but we have chosen not to include them in the revised manuscript for precisely the reason stated by the reviewer. We do not know which subjects in this study had psychiatric diagnoses and which did not. Therefore, sub-setting our data to create artificial groups (i.e. ‘putative patients’) is not sensible and may mislead the reader. If the reviewer feels strongly about this point, we are open to including this analysis in the paper, but it is our preference not to.

For the reviewers’ interest, the analysis of ‘putative patients’:

The slopes of the regression lines were broadly consistent across all analyses, such that the relationship between model-based deficits and Factor 2 was observed both in individuals reporting the most severe symptoms and in those in the normal range (see Author response image 1). In all 9 analyses with individuals in the ‘normal range’, the relationship between Factor 2 and model-based deficits was significant at p<.05. In 5/9 analyses with ‘probably patients’ who were in the top 25% of symptom severity, the relationship between Factor 2 and model-based deficits was significant at p<.05. This analysis had just ¼ of the total sample and was therefore severely underpowered. Nonetheless, the direction and slope of the effect were consistent across the board, providing evidence to suggest that these relationships will likely generalize to patient populations.

Plotted are the regression lines for 18 different analyses, where the population was split into the top 25% and bottom 75% for each of the nine clinical questionnaires.
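As a rough illustration of the split-sample comparison described above, the following Python sketch fits a simple regression line separately in the top 25% and bottom 75% of a severity score. It is a simplification under stated assumptions: the analyses reported here used a mixed effects model, whereas this sketch uses ordinary least squares on a per-subject summary score, and the data, effect size and names (`ols_slope`, `split_slopes`) are all hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of the least-squares line y = a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def split_slopes(severity, factor2, model_based, top_fraction=0.25):
    """Fit model_based ~ factor2 separately in the top and bottom severity groups."""
    order = sorted(range(len(severity)), key=lambda i: severity[i])
    n_top = int(round(len(order) * top_fraction))
    bottom, top = order[:-n_top], order[-n_top:]

    def slope(indices):
        return ols_slope([factor2[i] for i in indices],
                         [model_based[i] for i in indices])

    return {"bottom": slope(bottom), "top": slope(top)}


# Synthetic data (hypothetical, not the study data): a single linear effect
# of Factor 2 on a model-based score, with a questionnaire total score that
# is correlated with Factor 2.
rng = random.Random(0)
factor2 = [rng.gauss(0, 1) for _ in range(1000)]
severity = [f + rng.gauss(0, 1) for f in factor2]
model_based = [-0.05 * f + rng.gauss(0, 0.05) for f in factor2]

slopes = split_slopes(severity, factor2, model_based)
print(slopes)
```

If the underlying relationship is dimensional, the two slopes should have the same sign and a similar size, although the estimate in the smaller top group is noisier.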

To answer a related question (i.e. whether the effects might be driven by discontinuous effects among only the sickest people), we tested the extent to which our effects are truly dimensional, i.e. linear across the range of severity observed in the study. To do so, we added both a linear and a quadratic ‘Factor 2’ interaction term to our mixed effects model. The quadratic term was not significant (β=-.0016, p=.822), while the linear term was still highly significant (β=-.045, p=.001), indicating that our results are best described as linear. We include this in the paper (Results, fifth paragraph).
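The logic of this linearity check can be sketched in Python as follows. This is a simplified stand-in, not the mixed effects model used in the paper: it fits an ordinary least-squares model with an intercept, a linear term and a quadratic term to a per-subject score. The data are synthetic and the helper names (`solve`, `fit_ols`) are hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Synthetic data (hypothetical): a purely linear effect of Factor 2.
rng = random.Random(0)
factor2 = [rng.gauss(0, 1) for _ in range(1000)]
model_based = [0.3 - 0.045 * f + rng.gauss(0, 0.05) for f in factor2]

# Design matrix columns: intercept, linear term, quadratic term.
design = [[1.0, f, f * f] for f in factor2]
intercept, linear, quadratic = fit_ols(design, model_based)
print(intercept, linear, quadratic)
```

When the true relationship is linear, the quadratic coefficient should sit close to zero while the linear coefficient recovers the simulated slope.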

We also need to know whether those subjects excluded based on performance were typical in terms of self-report.

In Study 1, we found that those subjects who were excluded had lower symptoms of OCD (t(604)=2.477, p=.014) and trait anxiety (t(604)=2.225, p=.0265), and a trend towards lower levels of depression (t(604)=1.799, p=.073). These differences were not observed in Study 2, where all questionnaire total scores were equivalent across groups, all p>.05. For both Experiment 1 and 2, results presented in this paper are not altered by the inclusion of these subjects in the analyses (and indeed are slightly stronger when these subjects are included) (Subsection “Exclusion Criteria”, last paragraph).

There is no information about the stability of the effects over time, and hence the term trait is confusing. In fact, the covariates are mostly measures of state, not trait.

We agree and have omitted all references to ‘trait’ throughout the manuscript.

2) On closer inspection, the elastic net analysis is far less convincing than on reading the results – the strongest loadings (I tried to sort them in descending order from Table 3A):

I feel that there are good and bad numbers; Am preoccupied with the thought of having fat on my body; I vomit after I have eaten; I check things more often than necessary; Am terrified about being overweight; Like my stomach to be empty; My heart beats faster than usual.

With overall only two items from the OCI-R (the measure of OCD used), and neither of these is being significantly loaded onto by the compulsivity factor.

We apologize that this was not clear. The reviewer is referring to the last column of Supplementary file 3A, which are not the betas for the regularized regression – these are the loadings from the FA. The questions are already presented in descending order of importance for the regularized regression in this table, based on the Beta column (which is relevant to this analysis). We have taken steps to prevent this kind of confusion in future.

In our previous submission, we reported the FA loadings in the last column of Supplementary file 3A to facilitate cross-analysis comparison, but we see now that their prominence in the table could lead the reader to think they were results from the regression. We have changed the ‘loading’ column so that it now no longer reports the numerical loadings on compulsivity, but instead indicates whether or not there was overlap on any of the factors from factor analysis. Specifically, we use F1, F2 or F3 to indicate whether an item in the regularized regression also loaded above a threshold (>.25) onto each of the factors. It should now be clear that of the negative predictors of goal-directed performance, the overlap with Factor 2 (F2) is substantial at 75% (15/20), compared to F1, ‘anxious-depression’ (overlap 10%, 2/20), or F3 ‘social withdrawal’ (overlap 10%, 2/20).
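The overlap labeling described above amounts to a simple thresholding step, sketched here in Python. Only the >.25 cutoff comes from the text; the item names, loading values and function names (`factor_labels`, `overlap_fraction`) are hypothetical placeholders, not entries from Supplementary file 3A.

```python
THRESHOLD = 0.25  # cutoff stated in the text


def factor_labels(loadings, threshold=THRESHOLD):
    """Return the factor labels (e.g. 'F2') whose loading exceeds the threshold."""
    return [name for name, value in loadings.items() if value > threshold]


def overlap_fraction(items, factor, threshold=THRESHOLD):
    """Share of items whose loading on `factor` exceeds the threshold."""
    hits = sum(1 for loadings in items.values() if loadings[factor] > threshold)
    return hits / len(items)


# Hypothetical loadings for four placeholder items selected by the regression.
items = {
    "item_a": {"F1": 0.10, "F2": 0.52, "F3": 0.05},
    "item_b": {"F1": 0.30, "F2": 0.47, "F3": 0.12},
    "item_c": {"F1": 0.08, "F2": 0.20, "F3": 0.31},
    "item_d": {"F1": 0.02, "F2": 0.40, "F3": 0.01},
}

labels = {name: factor_labels(loadings) for name, loadings in items.items()}
f2_overlap = overlap_fraction(items, "F2")
print(labels)
print(f2_overlap)
```

An item can carry more than one label (here `item_b` exceeds the cutoff on both F1 and F2), so the per-factor overlap percentages need not sum to 100%.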

The two items from the OCI-R that the reviewer has flagged above do in fact load substantially on the compulsivity factor, as do all of the OCD items: “I feel that there are good and bad numbers” (loading on ‘compulsivity’ = 0.52) and “I check things more often than is necessary” (loading on ‘compulsivity’ = 0.47).

We believe there is some misunderstanding here due to another similar formatting problem, for which we apologize. In our previous submission, we provided just a snapshot of the 6 top loadings from each questionnaire in Supplementary file 2B, but we see that this led the reviewer to erroneously conclude that certain items did not load at all on Factor 2. We apologize for the confusion and now present the results in descending order regardless of questionnaire or origin so that the main contributors to each factor are plainly seen. We believe this makes it clear that all OCD items loaded similarly and very strongly on Factor 2.

The fact that so many eating disorder items show up certainly deserves some comment beyond it being just another compulsive phenotype, but overall this just doesn't quite capture 'compulsivity'. Only one out of the top 8 items has anything obvious to do with compulsivity (other than referring to a disease which they labelled as compulsive).

There are four eating disorder items in the regularized regression result and two OCD items. We do not think this difference is particularly notable for several reasons. This difference is likely due to the fact that the eating disorders questionnaire addresses a greater number of putatively discrete DSM disorders compared to the OCD questionnaire. DSM-5 defines OCD as a unitary disorder and the OCI questionnaire records severity of this disorder. The EAT scale addresses at least three different disorders, which the clinician must consider in the context of differential diagnosis: Anorexia Nervosa, Bulimia Nervosa, and Binge Eating Disorder. Moreover, the eating disorder questionnaire also simply has more items (n=26) compared to the OCD questionnaire (n=18) and the addiction scale (n=10).

The suggestion that some disorders are ‘compulsive’ comes from prior work upon which our hypotheses were based. We do not make any original claims about certain disorders being compulsive here, so our labeling cannot be construed as circular. As we cite in the introduction, previous work has suggested that OCD, eating disorders and disorders of addiction are characterized by compulsivity, in that patients feel compelled to perform behaviors (i.e. they are urge-driven), which are repetitive and relatively insensitive to negative consequences (e.g. Voon et al., 2014; Godier & Park, 2014; Everitt & Robbins, 2005).

Nonetheless, we agree with the reviewers’ more general concern regarding the specificity of the results to behavioral compulsivity. Indeed, we did not anticipate that the repetitive thoughts that accompany these compulsions would load so tightly with the behaviors. This is an interesting and novel finding and in response to this and other comments, we have relabeled the factor ‘Compulsive Behavior and Thought’ and discuss this relationship in more detail in the Discussion.

3) Finally, I do wonder about how overall severity contributes. This is important because severity is strongly related to comorbidity (see e.g. Kessler et al., 2005, in the same volume as above), and hence important for any trans-diagnostic processes. Half the questionnaires are correlated (and picked up by the compulsivity factor). The most severely ill patients might thus be most likely to respond positively on many compulsivity items. Could it be that the most severely impaired patients simply look compulsive because they are more likely to have more comorbid disorders and hence show up in the compulsive category?

Although an interesting suggestion, a severity hypothesis does not sit with several observations in our study.

Firstly, one of the reasons that assessing the specificity of this effect is so crucial is because it can refute exactly the kind of general severity confound that the reviewer suggests. For example, individuals reporting the most severe depression symptoms in our sample are no more impaired on model-based learning than those with no depression symptoms. In other words, leaving aside the ‘compulsive behavior and thought’ factor for a moment, the basic effects on the total scores of questionnaires show the basic pattern of specificity we later formalize.

Second, depression and anxiety are much more common compared to compulsive disorders (as in Kessler et al., 2005). This means that individuals with more comorbidities would be expected to show depression more often than compulsive symptoms. This again does not sit with the view that the more severe patients are preferentially picked up by the compulsive factor.

Third, although Factor 3 included contributions from fewer questions overall compared to the other factors, Factors 1 and 2 had a similar number of loadings. In other words, Factor 2 did not tap into a greater number of symptoms compared to Factor 1.

Fourth, the Mechanical Turk population has been reported to have significantly higher rates of social anxiety relative to the general population – 7x the 12-month prevalence reported by Kessler et al., 2005 (Shapiro et al., 2013, Clinical Psychological Science). Social anxiety was marginally associated with better goal-directed control, not worse.

4) In the FA, the first component doesn't contain anxiety at all. Anxiety loads much more on the second factor, and does so possibly even more than compulsivity: there are around 9 or 10 items that clearly relate to anxiety loading onto it, but only 2 items relating to compulsive behaviours.

We apologize that the way we have displayed the loadings in the Supplementary file 2A (much like above for file 2B) was unclear. As described above, we now display the loadings in descending order and we believe this resolves the reviewer’s concern. Anxiety does not load more on Factor 2 than Factor 1. Empirically, of all the items that loaded onto Factor 2 (‘Compulsivity and Related Cognitions’) at >0.25, just 6/87 (7%) are from the STAI-T (trait anxiety inventory), whereas for the Anxious-Depressive factor, 18/79 (23%) are from the STAI-T. Taking into account the total number of STAI-T items available, 90% of these items loaded onto the Anxious-Depressive factor, while just 30% loaded onto the Compulsivity factor. The mean loading of the STAI-T items onto the Anxious-Depressive factor is 0.52, while the mean loading onto compulsivity is 0.15 (difference is significant at p<.001, Supplementary file 2D). Finally, the highest average loadings for the Depressive-Anxious factor came from the Trait Anxiety questionnaire (M=0.52, SD=0.17), followed by Apathy (M=0.44, SD=0.16) and Depression (M=0.38, SD=0.23). The data are unequivocal: items from the trait anxiety questionnaire loaded more onto Factor 1 compared to Factor 2.

We now include a more detailed characterization, including many of these descriptive statistics in the Materials and methods section.

To make this clearer, we also now include what was formerly Supplementary file 2D as a table in the main manuscript (now Table 2) and we refer the reader there multiple times.

A number of the AUDIT variables are hard to relate to compulsions: alcoholics start drinking early as they experience withdrawal symptoms after a night of sleep. If anything, this component is more related to obsessions, anxious worries and difficulties controlling thoughts – which is, in terms of constructs, much closer to goal-directed deficits, it seems to me.

While we acknowledge that the labeling of Factors is a subjective process, our position is that persistent use of alcohol despite adverse consequences is a compulsive behavior and that all AUDIT items are indicators of the severity of alcohol addiction and are thus indicators of compulsion. Experiencing withdrawal is one such indicator, in that the extent to which this is experienced marks the severity of addiction.

Nonetheless, in response to a number of comments received on this issue, we have relabeled this factor as ‘Compulsive Behavior and Thought’. Again, if the appearance of the term “compulsive”, even in this deliberately more inclusive rephrasing, remains problematic, we are open to advice.

5) The task itself isn't obviously specific as it is not clear what the model-free component quite captures. This makes it more of a shame they didn't test components we know impact on m-b choices, such as working memory or stress. Impairments in this are also 'trans-diagnostic', and it would have been nice to show that they don't have the specificity of g-d choices.

While we cannot rule out a role for stress on model-based learning, we believe that the specificity of our effect to compulsive phenotypes speaks to this in some sense. I.e. the lack of association between ‘Anxious-Depression’ and model-based learning suggests that a general stress mechanism is unlikely. We unfortunately cannot rule out a possible role for working memory in the effects reported, and think this is a plausible hypothesis that warrants further investigation. We highlight this as a target for future research in the Discussion.

6) Both reviewers expressed concerns about the explanatory power (of excessive habit formation due to deficient model-based control) for understanding clinical aspects of compulsivity. As you outlined in the Introduction, a key motivation for studying the relation between model-based /goal-directed decision-making and compulsive symptoms is the notion that "a deficit in deliberative, goal-directed control may leave individuals vulnerable to rely excessively on forming more rigid habits". I understand why this is a straightforward and attractive perspective to explain certain aspects of compulsivity. However, I think it would also be appropriate to mention challenges and potential limitations of this perspective in the Discussion – particularly because the dimensional approach chosen here suggests applicability of the proposed mechanism to clinical phenomena. For example, how exactly would a putative deficit in model-based control lead to prominent symptoms in OCD, such as excessive checking, fear of germs, or desire for order?

This is an important issue and we now include a paragraph detailing our speculations on this issue and refer the reader to a more detailed exposition published previously (Gillan and Robbins, 2014), along with more recent data illustrating how habits of thought might arise via a similar mechanism (Discussion, seventh paragraph).

The nomenclature and with it the framing need quite some work, e.g. categorical/dimensional measures, in terms of state/trait distinction, and distinctions between compulsions and obsessions.

We have addressed each of these points in earlier responses. We make clear that we are dealing with dimensional measures, that we do not have any particular stance or evidence for state vs. trait-dependence of these symptom dimensions and include more detailed commentary on our findings regarding the tight relationship between repetitive compulsive behaviors and the associated cognitions.

7) The paper is very well written and of beautiful simplicity – a pleasure to read. However, sometimes a few more technical details or conceptual distinctions may have to be included in the main text to avoid confusion. First, the Introduction repeatedly refers to unspecified "OCD symptoms" which I found confusing, given that the paper is about the general population and that numerous symptoms of OCD exist. I would recommend avoiding the clinical label OCD and referring to compulsivity instead, stating the specific questionnaire you used. Similarly, in the Results section (second paragraph), there is a tension between using trait labels (impulsivity, compulsivity) and diagnostic labels (eating disorders, alcohol addiction); the latter is confusing (and not quite appropriate), given that your study examines the general population. You could eliminate this tension and, at the same time, increase clarity by always referring to the scores of the respective questionnaires.

Given there are nine different questionnaires, with acronyms, we feel it is still beneficial to refer for example to ‘eating disorders’ rather than EAT-26. But we now do so in longer form at every instance to eliminate any possibility of confusion, while still allowing the reader to easily digest the material e.g. “total scores on self-report measures of eating disorder severity, impulsivity and alcohol addiction”.

Second, the Results section should define the measure of model-based learning used (first paragraph). Until I went through the Methods section, I was not sure how exactly model-based learning was operationalised, and whether you were referring to a behavioural readout or to the parameter estimates of a computational model.

We have made this clearer upfront, and also refer the reader to the Materials and methods section for a full description.

8) You report analyses based on behavioural readouts (trial-by-trial stay/switch behaviour), not model parameter estimates, because the qualitative conclusions drawn from both types of analyses seemed to be almost equivalent. Does this also hold with regard to how well questionnaire scores can be predicted, or does the computational model have a competitive advantage there? It would be instructive for the technically interested reader if you could include estimates of predictive accuracy for both approaches, perhaps in the supplementary material.

The correlation between the two estimates of model-based learning is r=.87. The difference between the two approaches in terms of capturing psychopathology is therefore necessarily negligible and we therefore do not think it is worthwhile to include this in the manuscript (except for reporting this correlation coefficient: subsection “Supervised Analysis”). We are fine with being overruled on this, but it is our opinion that this analysis does not make sense given the high correlation.

The requested analysis: we tested the extent to which total scores on all 9 questionnaires and the three factors from our factor analysis could be predicted by model-based learning from the one trial back regression versus the full computational model. Prior to conducting these analyses, we regressed out the effects of age, gender and IQ so that we could directly compare the r² of the models. The difference between the two approaches is negligible. However, the computational model did produce nominally higher r² and lower p-values for the relationship between clinical scores and model-based learning.

Author response table 1

Each row reflects the results from an independent analysis where each questionnaire total score (z-transformed) was entered as SymptomScorez in the following model: lm(SymptomScorez ~ Agez + Genderz + IQz + ModelBasedScore). ModelBasedScore was derived from the one-trial back regression (first three columns) or the computational model (last three columns). For each, positive β values indicate that the ModelBasedScore is associated with fewer symptoms, whereas negative β values indicate that the ModelBasedScore is associated with increased symptoms.
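One way to read the covariate adjustment described above is as a partial r²: regress age, gender and IQ out of both the symptom score and the model-based estimate, then square the correlation of the residuals. The Python sketch below illustrates that reading using Gram-Schmidt orthogonalisation. It is an assumption about the procedure, not a reproduction of it; the data are synthetic, the noise levels are arbitrary, and the function names are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residualize(v, basis):
    """Remove from v its projection onto each vector of an orthogonal basis."""
    out = list(v)
    for b in basis:
        coef = dot(out, b) / dot(b, b)
        out = [o - coef * bi for o, bi in zip(out, b)]
    return out


def orthogonal_basis(columns):
    """Gram-Schmidt: turn covariate columns into mutually orthogonal vectors."""
    basis = []
    for col in columns:
        basis.append(residualize(col, basis))
    return basis


def partial_r2(y, x, covariates):
    """Squared correlation of y and x after regressing the covariates out of both."""
    intercept = [1.0] * len(y)
    basis = orthogonal_basis([intercept] + covariates)
    ry, rx = residualize(y, basis), residualize(x, basis)
    return dot(ry, rx) ** 2 / (dot(ry, ry) * dot(rx, rx))


# Synthetic data (hypothetical): two noisy estimates of one latent score.
rng = random.Random(1)
n = 500
age = [rng.gauss(0, 1) for _ in range(n)]
gender = [float(rng.random() < 0.5) for _ in range(n)]
iq = [rng.gauss(0, 1) for _ in range(n)]
latent = [rng.gauss(0, 1) - 0.3 * a + 0.3 * q for a, q in zip(age, iq)]
symptom = [-0.2 * m + rng.gauss(0, 1) for m in latent]
mb_regression = [m + rng.gauss(0, 0.5) for m in latent]  # stand-in: one-trial-back estimate
mb_model = [m + rng.gauss(0, 0.4) for m in latent]       # stand-in: computational-model estimate

r2_regression = partial_r2(symptom, mb_regression, [age, gender, iq])
r2_model = partial_r2(symptom, mb_model, [age, gender, iq])
print(r2_regression, r2_model)
```

Because both estimates are adjusted for the same covariates, the two r² values are directly comparable; with highly correlated estimates they would be expected to differ only slightly.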

9) In the subsection “Quantifying Model-based Learning (Logistic Regression)”, second paragraph: The significant main effect of Transition is very interesting. Could you please state the direction of this effect and perhaps even offer a (speculative) interpretation? This is another place in which a more thorough analysis of the factors on both sides (task measures and demographic/clinical variables) can be useful.

We now state the direction of this effect and state that this is likely due to small biases due to un-modeled structure in the data that is more correctly captured in the full model fit.

Acknowledgements

Ethics

Human subjects: Participants provided their consent online after reading the study information and consent language in accordance with procedures approved by the New York University Committee on Activities Involving Human Subjects.
