There has been some discussion of our article on social media. In particular, some people have noted that the principal authors of the original article (Cole and Fredrickson) have replied with a 550-word letter (which I guess is all they were allowed; they have more material, as shown below) claiming that our article contains multiple errors and is, hence, invalid. So I'm writing this blog post to put our side of the story and explain why we feel that our article is correct, and that Cole and Fredrickson have not made a dent in it. I was a little disappointed that Dr. Martin Seligman, on the APA
Friends-of-PP mailing list, chose to describe our article as a "hatchet
job". We believe that we have identified a number of serious scientific
problems with Fredrickson et al.'s original article, which are not
adequately addressed either by Cole and Fredrickson's published letter,
or by their more extensive unpublished analysis.

I will address each of the principal points of Cole and Fredrickson's response to our article in turn (although not in the exact sequence in which they appeared in their letter). Before I begin, though, I want to apologise for the length and complex nature of some of the points I will be making here (which is unfortunately necessary, as most of the issues under discussion here are quite technical). This is particularly true of the section entitled "Bitmapping?" below, which might appear, to the reader who has tried to struggle through our article and SI, and Cole and Fredrickson's letter and additional analysis, to be not much more than a case of "he said/she said". I note, however, that each of the major issues that we raised in our article is sufficient, on its own, to render Fredrickson et al.'s results meaningless. These major issues are:
- The MHC-SF psychometric scale does not measure hedonic versus eudaimonic well-being
- Fredrickson et al.'s regression procedure produces mostly spurious correlations, even with random psychometric data
- The errors in Fredrickson et al.'s dataset directly invalidate their published numerical results

MHC-SF factor analysis

Cole and Fredrickson criticise us for attempting to perform factor analyses on the MHC-SF psychometric with such a small sample size. It is interesting to contrast this with Cole and Fredrickson's 2013 letter to PNAS, in reply to Coyne's criticism of the high degree of intercorrelation between their "hedonic" and "eudaimonic" factors, in which they describe how they themselves performed exploratory and confirmatory factor analyses on exactly the same data, apparently claiming to have found the hedonic/eudaimonic factor pair with a very good model fit. (A p-value of < .0001 is offered, but without sufficient context to establish to what exactly it refers; however, the message seems clear: we did EFA and CFA and obtained a fantastic model.) We attempted to reproduce this, but were unable to do so; indeed, we noted ourselves in our article that the sample size was an issue here. But the only reason we did this was to try to reproduce the factor analyses that Cole and Fredrickson claimed, in their 2013 letter, to have performed. We look forward to seeing the results of those analyses, which have so far not been published.

Still on the factor analysis, Cole and Fredrickson claim that their assumption of a hedonic/eudaimonic split for the MHC-SF scale is supported by three references from Fredrickson et al.'s article. We have examined these references (for your convenience, they are here, here, and here) and have not found any point at which any of them supports this claim, or indeed makes any statements at all about the factor structure of the MHC-SF. Please feel free to check this yourself, and if you find such a discussion, let me know the page number. In the meantime, the claim of a two-factor, hedonic/eudaimonic split for the MHC-SF seems to be supported by no published evidence. (However, there has been plenty of reporting in the literature of a clear three-factor structure, e.g. here and here and here.)

Again, we stand by our analysis: Fredrickson et al.'s claim for a hedonic/eudaimonic factor split of the MHC-SF is not supported by theory, nor by the data, nor by historical studies. The factor structure of the MHC-SF that emerges from Fredrickson et al.'s dataset is unclear, but of the possible two-factor structures, the one that we described in our article and SI (i.e., "personal well-being" and "evaluative perception of the social environment") is a considerably better fit to the data in all respects than Fredrickson et al.'s claimed hedonic/eudaimonic split. The only structure that has been documented for the MHC-SF in prior published work is a three-factor structure corresponding to its three subscales, as designed by Keyes.

Bitmapping?

The "bitmapping"
operation to which Cole and Fredrickson devote part of their letter (and
most of their additional analysis document) is merely an artifact of
the way in which our R program loops over all possible combinations of
the 14 MHC-SF psychometric items into two factors. There are many ways
in which we could have done this that do not involve the programming
technique of converting an integer into a bitmap. Indeed, the inclusion
in our SI document of the brief mention of how our outer loop works (the
inner loop does the regressions, using Fredrickson et al.'s exact
parameters) is arguably slightly redundant, but we included it to
facilitate the understanding of our code, should someone wish to
undertake a reproduction of our results.
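
For readers who have not met the technique, here is a minimal sketch in Python (not the R program itself) of how an integer can be read as a bitmap in order to loop over every way of dividing 14 items into two groups. The function name `two_factor_splits` is hypothetical, and the decision to skip splits that leave one group empty is an assumption made for this illustration.

```python
from typing import Iterator, List, Tuple

N_ITEMS = 14  # the MHC-SF has 14 items


def two_factor_splits(n_items: int = N_ITEMS) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield every assignment of items 0..n_items-1 to two non-empty groups.

    Each integer from 1 to 2**n_items - 2 is read as a bitmap: bit i set
    means item i goes in the first group, bit i clear means the second.
    Skipping 0 and 2**n_items - 1 (one group empty) is an assumption here.
    """
    for mask in range(1, 2 ** n_items - 1):
        first = [i for i in range(n_items) if (mask >> i) & 1]
        second = [i for i in range(n_items) if not (mask >> i) & 1]
        yield first, second


# The outer loop: one pass per candidate factor split.
# An inner step (not shown) would run the regressions for each split.
splits = list(two_factor_splits())
print(len(splits), "candidate splits")
```

The bitmap is only a counting device: any other way of enumerating the same splits would give the same set of candidate factor pairs.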

Cole and
Fredrickson's analysis seems mainly aimed at demonstrating that our
"bitmapping" technique is an inadequate way to resample from a dataset.
We agree. We never suggested that it was a way to perform resampling. We
are not even sure how it could. We are not performing any resampling,
bootstrapping, or any other form of Type 1 error reduction. Our program
simply generates every possible factor combination of the psychometric
data and determines whether or not it appears to show an effect, using
Fredrickson et al.'s own regression procedure. The results of this
procedure demonstrate that, no matter how the data are sliced or diced,
Fredrickson et al.'s regression procedure will generate apparently
statistically significant results in the majority of cases; indeed, in
most of those cases, it will appear to show effect sizes larger than
those found by Fredrickson et al.

The graphs in our SI
document (Figures 7-11) plot the results obtained by iterating over all
possible two-factor combinations of several forms of psychometric data:
Fredrickson et al.'s actual data, assorted random numbers, etc. We are
not completely sure what Cole and Fredrickson think that these graphs
show. To be clear: they show all the possible "effects" (relationships
between psychometric "factors" and gene expression values) that
Fredrickson et al. could have obtained, had they chosen another factor
split of their data from the MHC-SF scale than the one that they did
choose. Figure 7, in particular, uses the real psychometric data to
show that most of the possible factor combinations would have produced
effects greater in magnitude than the ones on which Fredrickson et al.
based their claim that their "Hedonic/Eudaimonic" split was associated
(presumably uniquely) with differential gene expression.

Why, then, does this procedure continue to produce apparently significant results even when the psychometric data are replaced with uniformly-distributed random numbers (aka "white noise")? We believe that this is due to strong correlations within the gene data.
As shown by Neuroskeptic, this leads to an enormous false-positive rate. Thus, when Fredrickson et al. ran their
regression procedure (which we called "RR53") and averaged the resulting
correlation coefficients, they were making the elementary mistake of
running a t-test on a set of non-independent observations.
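
The effect of non-independence can be illustrated with a small simulation. This is a simplified sketch, not the RR53 procedure itself: it uses a single pure-noise predictor with no control variables, and the sample size of 80, the 50% shared variance among genes, and all function names are assumptions chosen for illustration.

```python
import math
import random
import statistics

N_GENES = 53      # number of genes, as in the RR53 procedure
N_PEOPLE = 80     # assumed sample size, for illustration only
T_CRIT = 2.007    # two-sided 5% critical value of t with 52 df


def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def false_positive_rate(shared, n_sims, rng):
    """Share of simulations in which a t-test on the 53 slopes is 'significant'.

    `shared` is the (assumed) proportion of each gene's variance that comes
    from a factor common to all genes; 0.0 gives independent genes.
    """
    hits = 0
    for _ in range(n_sims):
        x = [rng.random() for _ in range(N_PEOPLE)]           # uniform noise
        common = [rng.gauss(0, 1) for _ in range(N_PEOPLE)]   # shared factor
        slopes = []
        for _ in range(N_GENES):
            gene = [math.sqrt(shared) * c + math.sqrt(1 - shared) * rng.gauss(0, 1)
                    for c in common]
            slopes.append(slope(x, gene))
        # One-sample t-test of the 53 slopes against zero,
        # treating them as if they were independent observations.
        t = statistics.fmean(slopes) / (statistics.stdev(slopes) / math.sqrt(N_GENES))
        hits += abs(t) > T_CRIT
    return hits / n_sims


rng = random.Random(1)
rate_independent = false_positive_rate(0.0, 100, rng)
rate_correlated = false_positive_rate(0.5, 100, rng)
print("independent genes:", rate_independent)
print("correlated genes: ", rate_correlated)
```

With independent genes the rate should sit near the nominal 5%. With correlated genes it should be far higher, because the slopes share a common error component that the t-test's standard error does not account for.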

Incidentally, there is an alternative way of doing the regression analysis. Fredrickson et al. regressed each individual gene on Hed/Eud (and some control variables), collected the 53 coefficients per IV, and averaged them; this is what we called the "RR53" procedure. The alternative is to average the gene expression values and regress this average on Hed/Eud. We had noticed that this gave non-significant results. Then, just after the PNAS window for updating our supplementary information document closed, a colleague --- who, I believe, wishes to remain anonymous --- pointed out that using this alternative method, the apparent effect sizes are exactly the same as the ones "found" by RR53. Only the p-values are different. We believe this is because, when the RR53 procedure picks up the regression coefficients of the individual genes to analyse them, it conveniently loses the associated confidence intervals (almost all of these coefficients are associated with non-significant t-tests or model ANOVA) and re-inserts the coefficients into the mix as if they were perfect fresh data from a measuring instrument, whereas in fact they are almost all carrying an amount of "noise" that makes them highly unreliable.
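
The equality of the effect sizes follows from the fact that least-squares coefficients are linear in the outcome variable, so averaging the coefficients and regressing the average give the same number whenever every gene is regressed on the same predictors. The sketch below checks this on random data. It is simplified to a single predictor with no control variables, and the sample size and variable names are assumptions for illustration.

```python
import math
import random
import statistics


def ols_slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)
n_people, n_genes = 80, 53   # 53 genes as in RR53; 80 people is an assumption

x = [rng.gauss(0, 1) for _ in range(n_people)]
genes = [[rng.gauss(0, 1) for _ in range(n_people)] for _ in range(n_genes)]

# Route 1: regress each gene separately, then average the 53 slopes.
mean_of_slopes = statistics.fmean([ols_slope(x, g) for g in genes])

# Route 2: average the genes per person, then run a single regression.
average_gene = [statistics.fmean(person) for person in zip(*genes)]
slope_of_mean = ols_slope(x, average_gene)

print(mean_of_slopes, slope_of_mean)
print(math.isclose(mean_of_slopes, slope_of_mean, rel_tol=1e-9, abs_tol=1e-12))
```

The two routes differ only in how uncertainty is computed: the single regression carries its own standard error, whereas a t-test on the 53 slopes ignores the uncertainty attached to each slope.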

We
have made many of our materials available online, including our
program's source code, and we are happy to share our other files (some
of which are quite voluminous) on request, or to answer any specific
questions that anybody might have about how our program works. (We could
have reproduced this work with SPSS, but it would have taken an awfully
long time.)

Thus, we stand by our analysis: Fredrickson et
al.'s regression procedure is guaranteed to produce huge numbers of
spurious "effects" which have no psychological or physiological meaning.

Issues with the dataset

Finally, Cole and Fredrickson claim that they have recently reproduced the same numerical results as their 2013 study with a new sample. I will leave aside, for now, the question of how meaningful it is to show that a procedure which has been criticised (by us) for producing invalid results can be "shown" to be correct if it produces much the same results a second time; perhaps there is some validity in having two data points rather than one. However, this question turns out to be irrelevant. In Cole and Fredrickson's reply, they notably fail to address the question of the various errors in their original dataset, which we discuss quite extensively (and for good reason) in our supporting information. In particular, Cole and Fredrickson do not address the coding error in their original dataset for participant SOBC1-1299, which we examine on pages 3 and 24 of our Supporting Information. Near the end of our Table 7, we show that this coding error can be (and should have been, in the original study) resolved in one of two ways, either of which results in a reduction of over half in the magnitude of the effect for "hedonic" well-being that was reported by Fredrickson et al. (as well as a small change in the magnitude of the effect for eudaimonic well-being). In other words, had this coding error not existed in the 2013 dataset, Fredrickson et al.'s figures of +0.28(hedonic)/-0.28(eudaimonic) for the 2013 study should have been calculated and reported as approximately +0.13(hedonic)/-0.27(eudaimonic). To subsequently obtain +0.25(hedonic)/-0.21(eudaimonic) with the new sample thus appears to be evidence against a successful reproduction of the original results (unless some new theory can explain why the effect of hedonic well-being has suddenly doubled).

Summary

In summary, we stand by our overall conclusions, namely that Fredrickson et al.'s article does not tell us anything about the relationship between different types of well-being and gene expression. We will be sending a summary of the above position to PNAS for peer review and possible publication as a Letter to the Editor.

I am sure that this will not be the final word on this matter. We trust that Cole and Fredrickson will go back and re-examine their study in the light of our response, and perhaps return with some additional information that might clarify matters further. We anticipate that our peers will also contribute to this debate.

22 August 2014

(A few weeks ago I wrote a comment on Erik-Jan Wagenmakers' blog. Somebody contacted me to say that they would like to be able to link to my comment, but Disqus doesn't provide individual URLs for comments. So I am using my blog to repeat the comment here. The only change I have made is to italicise two words, which was not possible in the original comment format, or at least I didn't know how to do it. Please read the original post first to establish a little bit of context and perhaps save yourself wondering what I'm rambling on about.)

The standard human problems of power/status and money seem to be all-pervading in psychology; why should we expect anything else?

I would like to advance the radical thesis that the *entire point* of a whole class of contemporary social-psychological research (i.e., not just a nice side-effect, but the PI's principal purpose in running the study) is to generate "Gladwellizable" results. Such results will, as a minimum, earn you considerable kudos among the less critical of your colleagues and grad students, and probably also keep your institution's director of communications very happy ("University of Madeupstuff research is featured in the Economist/NY Times again"). More advanced practitioners can leverage their research into their own mass-market publications/lectures/audiotape series, thus bypassing the Gladwell/Pink axis and turning the results of their grant-funded research into $$$ for themselves.

I'm with Kahneman: this will not stop until a train wreck occurs, quite probably involving some major public policy decision. The actual train wreck will be 10-15 years down the line when the Government Accountability Office (etc) catches up with things, by which time the damage will have been done (it will take a generation or more to undo some of the myths floating around out there) and the perpetrators will be lying in the sun, untouchable (they will, perhaps, mutter "science self-corrects", aka "heads I win, tails I get away with it"). The asymmetry is visible from space: find a gee-whiz result, speculate loudly on its implications for humanity, and make a pile of money/power/influence; have it refuted (which almost never happens anyway, since in psychology "A" and "not A" seem to be very happy co-existing for ever) and the worst that can happen is that you have to spin your idea as having been "refined" by the latest findings, which in fact "make my idea even stronger".

As EJ is finding out here, defiant denial seems to impose very little cost on those who engage in it. Until the industry [sic] decides to change that, this will continue. But, remind me again what Gladwell's advance was for his latest book? That's what you're up against.

We were not very happy about this, to say the least. Our article addressed two other papers published in other journals (Losada, 1999; Losada & Heaphy, 2004 [PDF]) as well as Fredrickson and Losada's. Furthermore, we initially had our article summarily rejected because (according to AP) it was a comment, and comments are only accepted for 3 months after the appearance of an article. (Quite how science is meant to self-correct if any mistakes which are not identified within 90 days become officially true, is an interesting but separate issue.) Only after we wrote directly to the CEO of the American Psychological Association did we get an offer for our article to be reviewed (which was done in a very thorough and professional manner, I should add). We expected that Dr. Fredrickson would be invited to reply --- indeed, we encouraged this, on the basis that we would have the last word, as is customary in these discussions. So when we were later told that we were not going to be offered a final response, we were not very happy.

We considered writing our reply and sending it to a different journal, combined perhaps with some complaining via blogs or the news media. However, in the end we decided to grit our teeth and embark upon the APA's publications appeal process. It took a lot of effort (almost all by Harris Friedman), but in the end, we were allowed a 1,000-word reply --- and with some further negotiating, this became 2,000 words. (I note, in passing, that when AP's position was that Dr. Fredrickson's reply was the end of the A-B-A sequence, she was allowed 5,800 words and 60 references, almost none of which addressed the points made in our article.)

After review, our response to Dr. Fredrickson's reply to our article is now in press at American Psychologist. You can see the final draft version here [PDF].

But wait --- there's more!

After our article, together with Dr. Fredrickson's reply, appeared in the December 2013 print edition of American Psychologist (which, as part of the benefits of APA membership, has a print run of about 100,000, meaning that my photo is now in the bathroom of most of the psychologists in North America), several people wrote to the editor with comments. Five of these were selected for publication. After some further discussion with the editorial team, we obtained the right to respond to these comments, which one would have thought would be automatic... never mind. We replied to each of the comments, concentrating in particular on the three which critically addressed our article. Our reply is now also in press, but I don't want to share the full draft in this case because to put it in context requires reading the readers' comments, and I'm not in a position to share them. (But for those who enjoyed the first article: there are a few zingers in this one to look forward to.)

I'll end on one of the things that didn't make the cut of our reply to those readers' comments. On p. 821 of the December 2013 issue of American Psychologist, Fredrickson and Losada jointly authored the "withdrawal" of Losada's mathematical model "as invalid". Yet, if you visit the web site of Losada's consulting company, you will find the model everywhere, right down to the stylised "Lorenz butterfly" which forms the favicon of the site. Quite how the model can be withdrawn as invalid while simultaneously forming the backbone of a training programme that has apparently been delivered to companies such as Apple, Boeing, GE, and AT&T (cf. the "Clients" tab of the Losada Line Consulting site) is, I suppose, just another miracle of mathematical logic to which we mere mortals are not to be allowed access.

PS: Hot off the press: American Behavioral Scientist, which published the Losada and Heaphy (2004) article, has recently issued an Expression of Concern about it. The last paragraph is particularly intriguing; have you ever heard of the editors of a journal, whether permanent or a guest team for a special issue, taking an incoming article and asking a grad student to polish it up a bit for their target audience? (I note that one of the guest editors of the special issue in question, Kim Cameron, is a senior professor at the Ross School of Business at the University of Michigan, where Emily Heaphy was studying at the time.)

11 March 2014

This is not an original idea. I saw it the other day, somewhere on the web. I haven't been able to find it again via Google; maybe it was inside some proprietary comment engine on someone's blog post.

It may well not even have been original with the person whose version I saw the other day. Very little is truly original. Maybe that person copied it; maybe they came up with it independently.

Anyway.

The discussion was about pre-registration of studies. Most people seem to think pre-registration is a good idea, as a way to help fight some of the current problems in science, especially the social sciences, especially psychology, especially (I would argue) social psychology. If you have to pre-register your study, including what you're going to look for, that ought to prevent or reduce all kinds of questionable research practices. If you have to state your hypotheses before you start, you can't HARK later. If you state how many people you're going to have in your sample, you can't apply your own stopping rules, or several other researcher degrees of freedom.

The difficulty seems to be in establishing a register of all these studies. There must be tens or even hundreds of thousands of studies conducted worldwide each year, in the psychology and related departments of thousands of universities worldwide. Just the logistics of creating a central repository for that would be overwhelming. Don't expect the journals to do it, either. Although a few journals have announced their intention to support replication via the concept of a "registered report," this is completely unscalable; it more or less requires the journal to guarantee in advance that the article will be accepted, regardless of the results. And while some journal editors are starting to pay lip service to the idea that it's important to publish null results, the incentive structure of impact factors means that these are likely to remain token efforts, a drop in the bucket alongside the flashy, underpowered Gladwell-bait that nobody will be able to reproduce (but the authors will be way over the line into Tenure County before anyone notices, if they ever do).

And yet, as this person whose contribution I can no longer find (seriously, if you wrote something about this on the Internet in the last 10 days, point me at it and I'll give you full credit) points out, pretty much every PI on any research topic involving human subjects already preregisters their study today via the Institutional Review Board form.

Everyone hates filling out IRB forms. They ask you dozens of questions that have nothing to do with protecting potentially-vulnerable participants and everything to do with covering every imaginable base in case the university gets sued. (Never mind that most such lawsuits would be dismissed as frivolous, or that if you did something really bad, the form wouldn't count for a row of beans.) These forms are so irrelevant to protecting the vulnerable, they even ask you for ridiculous things that are none of the IRB's business, such as your research design and your hypotheses...

Whoa.

You're already forced to register your hypotheses and your design before you can even begin your study. Your institution is going to keep a copy of this for a million years, or at least until the great-great-grandchildren of any of the participants are dead. Probably 80% of the important stuff that a hypothetical universal pre-registration protocol might contain, at some point in an unknown bureaucratic future once the APA and APS and other interested parties have ummmed and ahhhed for a couple of decades, is being recorded right now for almost every study being done in the world, even those cute little quantitative exercises done by Master's students as course assignments. (Actually, if the APA makes as big a mess of this as they have done with their "Open Data" journal, that 80% will probably turn into something more like 200%, but that's another story.)

To illustrate just how much information PIs are already being asked to provide, I picked a couple of results from a Google search for "IRB form" at random. Cornell's IRB form asks, "Please provide a lay summary of the study,
including the purpose, research questions and hypothesis to be
evaluated". The University of Minnesota's compact, succinct 18-page form
requires PIs to describe "the objective(s) of the proposed research including purpose, research question, hypothesis and relevant background information etc". As a Master's student at the
University of East London in the UK, doing a cute little quantitative exercise as a course assignment, I had to fill in a form that asked
for "... a statement on the aims and significance of the proposed
research, including potential impact to knowledge and understanding in
the field (where appropriate, indicate the associated hypothesis which
will be tested). This should be a clear justification of the proposed
research, why it should proceed and a statement on any anticipated
benefits to the community". Helpfully, the form then adds "(Do not
exceed 700 words)". 700 words? When did you last read an article with 700 words, or even half that number, of justification of the hypotheses and the research design?

So, here's the challenge to journal editors and IRB chairpersons: this is an approach that would allow you to implement a meaningful, albeit undoubtedly imperfect, form of study pre-registration which would have an immediate, overnight, substantial effect on the quality of published research, for a total worldwide cash outlay of approximately zero. All that has to happen is that the editors start to require that authors, instead of mechanically copying and pasting the words "Ethical approval was obtained from the IRB of Pabulum College" from their last 10 articles, actually submit a copy of the form granting that approval for the attention of the reviewers. And of course, the IRBs have to be prepared to give out this data --- but what reason would they have to refuse? Any specific information concerning vulnerable groups could be redacted, along with the PI's mobile phone number; but in most cases, the information on IRB forms is no more sensitive than last week's cafeteria menu. And surely if the IRB forms are as big and complex and bureaucratic as they are, it's because the IRB has a mission, beyond protecting human subjects, of protecting the institution's reputation more widely --- because after all, it would be very embarrassing for a major university if it became associated with the repeated publication of dubious research.

18 February 2014

Apparently, Scandinavia is big in the UK right now. (Something "foreign" is always big in the UK, it seems.) And when something is big, the backlash will be along very soon, as exemplified by this Guardian article that I came across last week. So far, so predictable. But while I was reading the introductory section of that article, before getting to the dissection of the dark side of each individual Nordic country, this stood out:

[T]he Danes ... claim to be the happiest people in the world, but why no mention of the fact they are second only to Iceland when it comes to consuming anti-depressants?

Hold on a minute. When those international happiness/wellbeing surveys come out each year, Denmark is always pretty close to the top. So I checked a couple of surveys of happiness (well-being, etc.). The United Nations World Happiness Report 2013 ranks Denmark #1. The OECD Better Life Index (BLI) 2013 rather coyly does not give an immediate overall ranking, but by taking the default option of weighting all the various factors equally, you get a list headed by Australia, Sweden, and Canada, with Denmark in seventh place (still not too shabby). The OECD report adds helpful commentary; for example, here you can read that "Denmark, Iceland and Japan feel the most positive in the OECD area, while Turkey, Estonia and Hungary show lower levels of happiness."

So what's with the antidepressants? Well, it turns out that the OECD has been researching that as well. Here is a chart listing antidepressant consumption in 2011, in standardised doses per head of population per day, for 23 OECD member states. Let's see. Denmark is pretty near the top. (It's not second, as mentioned in the Guardian article above, because the author of that piece was using the 2010 chart.) And the other top consumers? Iceland first (remember, Iceland is among the "most positive [countries] in the OECD area"), followed by Australia, Canada, and (after Denmark) Sweden. That's right: the top three countries on the OECD's Better Life Index are among the top five consumers of antidepressants in the OECD's survey(*). And those countries showing "lower levels of happiness"? Two of the three (Estonia and Hungary) are on the antidepressant list - right near the bottom. Perhaps they'd be happier if they just took some more pills?

I decided to see if I could apply a little science here. I wanted to examine more closely the relationship between antidepressant consumption and happiness/wellbeing/etc. So I built a dataset (available on request) with a rank-order number for each of the 23 countries in the antidepressant survey, on each of several measures. Then I asked my trusty computer to give me the correlation (Spearman's rho) between consumption of antidepressants and each of these measures. Here are the results:

| Measure | Correlation with antidepressant consumption (Spearman's rho) | p |
|---|---|---|
| UN World Happiness Report 2013 | .590 | .003 |
| OECD BLI 2013 - Life Satisfaction | .621 | .002 |
| OECD BLI 2013 - Self-reported health | .730 | .000 |
| OECD BLI 2013 - Educational attainment | -.077 | .727 |
| OECD BLI 2013 - Leisure time | .106 | .631 |
| OECD BLI 2013 - Overall (equal weighting) | .653 | .001 |

For the uninitiated: the first three lines, and the last line, show substantial and statistically significant correlations between the item concerned and antidepressant usage. The fourth and fifth lines show no significant correlation. Overall summary: countries with high antidepressant usage also report high happiness/life satisfaction/health.
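
For anyone who wants to try this kind of calculation, here is a minimal sketch of Spearman's rho in plain Python. It is not the software I used, the function names are hypothetical, and the five-country ranks at the bottom are made-up toy values, not the OECD or UN figures. It does not compute p-values.

```python
import statistics


def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman's rho: the Pearson correlation of the two sets of ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = statistics.fmean(ra), statistics.fmean(rb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    var_a = sum((p - ma) ** 2 for p in ra)
    var_b = sum((q - mb) ** 2 for q in rb)
    return cov / (var_a * var_b) ** 0.5


# Toy example: rank orders of five hypothetical countries on two measures.
antidepressant_rank = [1, 2, 3, 4, 5]
happiness_rank = [2, 1, 4, 3, 5]
print(spearman_rho(antidepressant_rank, happiness_rank))
```

Because the inputs here are already rank orders with no ties, the ranking step leaves them unchanged; it is included so that the same function also works on raw scores.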

Note that this is not a case of "everything being correlated with everything else" (as the great Paul Meehl put it). Only certain measures from the OECD survey are significantly correlated with antidepressant consumption. (I encourage you to explore the measures that I didn't include.)

Ah, I hear you say, but this is only correlational. When two variables, A and B, are correlated, there are usually several possible explanations. A might cause B, B might cause A, or A and B might be caused by C. So in this case, consuming antidepressants might make people feel happy and healthy; or, being happy and healthy might make people consume antidepressants; or, some other social factor might cause people to consume antidepressants and report that they feel happy and healthy. I'll let you decide which of those sounds plausible to you.

Now, what does this prove? Probably not very much; I'm not going to make any truth claims on the basis of some cute numbers. After all, it's been "shown" that autism correlates better than .99 with sales of organic food. But here's a thought experiment for you: Imagine what the positive psychology people would be telling us if the results had been the other way around --- that is, if Australia and Denmark and Canada had the lowest levels of antidepressant consumption. Do you think it's just remotely possible that we might have heard something about that by now?

(*) The OECD has 34 member states, of which some, such as the USA and Switzerland, do not appear in the antidepressant consumption report. All correlations reported in this post are based on comparisons in rank order among the 23 countries for which antidepressant consumption data are available.

[ Edit 2014-02-20: fixed the figures in the tables, following the discovery of a couple of minor typos in the data. Correlation of UN Happiness Report changed from .595 to .590. Correlation of OECD Better Life Index 2013 - Life Satisfaction changed from .617 to .621. Correlation of OECD Better Life Index 2013 - Self-reported Health changed from .734 to .730. ]