Thursday, 7 November 2013

Psychology is big into replication these days. A lot of people think that a major problem with the field is that many important results have not been replicated, and that this is in part because journals don't like to publish replications (not original or sexy enough).
I'm all for replication; it's part of good science. But I've never been that into the whole 'replication movement' that's kicking around, and the reason crystallised for me during a 4am baby feed: Being able to replicate a study is an effect, not a cause of good scientific practice. So the emphasis on replication as a goal has the whole thing backwards. We should actually be focusing on improving the experiments we run in the first place. If we run better experiments, the replicability will take care of itself.
Better experiments mean better theory. We need strong, clearly formulated theories in order to generate strong, clearly formulated hypotheses that we can then test rigorously, with robust results. We wrote a paper (Golonka & Wilson, 2012) about how Gibson's ecological psychology, while not a complete theory of psychology, stands as an excellent example of a) how to be theory driven in psychology and b) how well it can work out for you empirically. I'd like to quote the introductory section because I love how it came out and it summarises the argument:

When particle physicists recently found that some neutrinos had apparently travelled faster than light (Adams et al. 2011) it never actually occurred to them that this is what had happened. On the basis of the extraordinarily well supported theory of relativity, the physics community went ‘that's weird - I wonder what we did wrong?’, and proceeded to use that theory to generate hypotheses they could then test. It would take a lot of fast neutrinos to disprove relativity, and even though the result turned out to be caused by a faulty cable, the robust response by physicists stands as an example of the benefits of a good theory. Similarly, the core of modern biology is the theory of evolution. When creationists say ‘we can’t see how a bacterial flagellum which rotates like an outboard motor could possibly have evolved, it’s irreducibly complex’ (e.g. Dembski 2002), biologists are entitled to say ‘we have evidence that lots and lots of other things have evolved. Let’s see if we can figure out how the flagellum did it, and in the meantime, we’re going to operate on the assumption that it did evolve until we have strong evidence to the contrary’. The resulting theory driven empirical work then happily led to a coherent evolutionary story for the flagellum (e.g. Musgrave 2004).

Psychology has many individual theories describing isolated phenomena but no core theory of behaviour to guide our research, no analogue to the theories of relativity or evolution. This is beginning to cost the discipline. Recently Bem (2011) published a series of experiments purporting to demonstrate evidence of precognition. Bem took several standard psychological experiments and reversed the temporal ordering of the elements. Analyses showed a series of statistically significant effects that suggested that events in the near future were affecting earlier performance. For example, he showed participants a list of words then tested their free recall. After this test, he trained the participants on a subset of the words, and showed that there was improved recall of those words, even though the training had come last. Because he followed the rules of experimental design and had statistically significant results, the Journal of Personality & Social Psychology was unable to find a reason to reject the paper. The editors only noted that “the reported findings conflict with our own beliefs about causality and that we find them extremely puzzling” (Judd & Gawronski 2011: 406, emphasis ours). Note that the cited conflict was with their beliefs about causality, and not, for example, the laws of physics and what they have to say about time travel. This should have been an opportunity for Bem to discuss problems with the standard methods and analyses that produced these physically impossible results (the approach taken in a companion paper by Wagenmakers, Wetzels, Borsboom & van der Maas 2011). Instead, his discussion was framed in terms of a loose reading of quantum physics and an appeal to psychologists to keep an open mind. The paper simply described what had happened, without any real attempt to explain how it had happened. 
A failure to replicate Bem’s key effects has recently been published (Ritchie, Wiseman & French 2012), but this paper was also entirely empirical and descriptive in nature, with no reference to any underlying theory of how the world works.
Psychology needs a core theory in order to mature as a science. Theory serves a dual role in science. It allows the scientist to identify when a result is likely to be an anomaly (e.g. faster-than-light neutrinos), and, more critically, it provides a guide to discovery to structure the search for explanations of novel phenomena (e.g. the bacterial flagellum). The Bem experiments demonstrate how, without a theory, psychology is unable to deal rigorously with anomalous results. This paper will discuss how an example psychological theory (James J. Gibson’s ecological approach to visual perception; Gibson 1966; 1979) has been able to guide discovery and explanation of new phenomena, specifically how people learn to produce a novel coordinated rhythmic movement. It has been able to do this because it is a theory of both the objects of perception and the ecological information that supports that perception. The theory can therefore be used to propose specific mechanisms to explain a given behaviour, rather than simply providing some terms to describe that behaviour. We will suggest that the successes of this approach in the area of perceptually guided action stand as a clear model of what a truly theory-driven psychology could achieve.

Psychology is an empiricist discipline; it has no core theory and so it leans heavily on its empirical results to prove that it's doing something interesting. This, I think, is why replication has been held up as the saviour of psychology; our training makes us think that saving the phenomena will save the science. But that's not really how it works.
Take social priming; what a mess. Specific results stand, fall and stand again as people run replication attempts, and the reason for the mess is that social priming is a poorly thought out paradigm to begin with. Results will sometimes replicate; statistically, even a broken clock will be 'working' 5% of the time. This is just the nature of the statistical game we play. But none of these replications or failures to replicate do anything to make social priming a more scientifically rigorous paradigm. (An upcoming paper in Perspectives on Psychological Science seems to agree with me here; HT to Rolf Zwaan via this recent related post.)

But even in psychology there is good theory-driven work. Golonka & Wilson (2012) use coordinated rhythmic movement as a model. When I run coordination experiments, I never worry about being able to replicate the basic pattern of in-phase being easier than anti-phase, and thanks to the theory-based model I use, I know why I can count on it. I also discussed some work by Geoff Bingham to contrast with the typical small-effect psychology that is targeted for replication. When Geoff figured out metric shape perception, after years of hypothesis-driven work ruling out ideas, he got to a point where he could make it work, or not work, at will. There is no question of it only working sometimes, when the light is right and the statistical stars align. The moral of the story is that we know focusing on theory works.

Look, go forth and replicate. Your success or failure will tell you something about whether the theory your experiments come from is any good, because replicability is an effect of good science. But don't think that replications can save psychology; only theory can do that, and so it's time to start thinking about what that might look like (Wilson & Golonka, 2013 has our thoughts).

References

Golonka, S., & Wilson, A. D. (2012). Gibson's ecological approach: A model for the benefits of a theory driven psychology. Avant, 3(2), 40-53. Download

Wilson, A. D., & Golonka, S. (2013). Embodied cognition is not what you think it is. Frontiers in Psychology, 4. Download
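The 'broken clock' point above is easy to check for yourself. The sketch below (my own illustration, not from the paper) runs many simulated experiments in which the null hypothesis is exactly true, and a standard one-sample t-test still comes out 'significant' about 5% of the time:

```python
# Simulate experiments where there is no effect at all, and count how often
# a two-tailed t-test at alpha = .05 nonetheless declares "significance".
import random
import statistics

random.seed(42)

def one_sample_t(sample):
    """t statistic for testing whether the sample mean differs from 0."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    return mean / (sd / n ** 0.5)

def false_positive_rate(n=30, experiments=20000, t_crit=2.045):
    """Fraction of null experiments 'significant' at alpha = .05.

    t_crit = 2.045 is the two-tailed .05 cutoff for df = 29.
    """
    hits = 0
    for _ in range(experiments):
        sample = [random.gauss(0.0, 1.0) for _ in range(n)]
        if abs(one_sample_t(sample)) > t_crit:
            hits += 1
    return hits / experiments

print(f"False positive rate under the null: {false_positive_rate():.3f}")
```

The rate hovers around .05 no matter how many experiments you run, which is the sense in which a dead paradigm can still produce the occasional 'successful' replication.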

14 comments:

This deserves a longer, better thought out comment, but this will have to suffice as a start...

I agree with you to a large degree on two claims: 1) replication (alone) will not save psychology, and 2) we need better theory. Having said that, I think that while there has been an emphasis recently on replication (for very good reason), both replication and better theory are critical, and they are complements that balance one another out, rather than one trumping the other or vice versa.

A good theory is great, but theories, as the saying goes, are never right; some are just useful (or less wrong). When the neutrinos appeared to travel faster than light, it was critical to have a good theory there to make sense of the situation -- we must have made a mistake somewhere. But when astronomical observations contradicted the geocentric universe, that was the same reaction scientists had then, too.

Replication is essential because when you have a result that's anomalous given your theoretical expectations, it's really important to know whether it's something that happens consistently or not. If it does, then you can't just ignore it and sweep it under the rug.

When you live in a world where replication happens only occasionally, then the very data upon which theory is built is questionable, and theorists have their own degrees of freedom to exploit, which is just as problematic as researcher degrees of freedom.

It's fine to say that we need better theory and then replication will follow, but that's just a nice thing to say, it's not a plan of action. Given the current academic context, it seems to me that replication is an important place to start, even if it doesn't, by itself, solve our problems.

I'm far less troubled by the ESP research than you seem to be. It was an anomalous result, found (it can reasonably be assumed) in good faith. It was published (and maybe it shouldn't have been given the credibility it was, but that seems like a different argument; certainly it shouldn't have been suppressed). Then the rest of the academic community kicked into gear and tried to replicate it, unsuccessfully, and poked holes in Bem's methodology.

That's science working pretty well, I'd say. What would have been worse would have been to dismiss the findings outright because they didn't fit our theory. No harm done with ESP, but the outcome is a lot worse when a similarly anomalous, but true, set of data comes along.

I imagine we're pretty much on the same page most of the way here -- it's a mistake to let the emphasis be too strongly on replication or theory, particularly when it's to the exclusion of the other. I'm just not sure that the idea that replication will solve everything isn't a bit of a straw man. It's necessary, but certainly not sufficient. Still, more attention to better theory is a good idea -- it just can't save psychology by itself any more than replication can.

I think Nussbaum pretty much nailed it, but other points of difference are worth raising. The first is with the use of the term "save," which just sounds way too alarmist / pessimistic to me. Maybe I haven't been paying enough attention to broader trends, but it seems to me that psychology has found itself in (roughly) better shape every day since its inception. Isn’t that the natural trend of progress in any science that builds theory on a cumulative body of empirical data through decentralized dialogue among independent theorists? I’m an optimist at heart, so I admit that’s how I derive my opinion on this matter, and won’t try to support it any further than you justify the implication that the damsel psychology is in distress. (I’m not even supporting it that far, to be fair!)

However, in my next point, I’d go further than Nussbaum in saying the whole discussion of Bem's experiments is blatantly unfair. For instance, reconcile this sentence, "The Bem experiments demonstrate how, without a theory, psychology is unable to deal rigorously with anomalous results," with this sentence from the preceding paragraph: "This should have been an opportunity for Bem to discuss problems with the standard methods and analyses that produced these physically impossible results (the approach taken in a companion paper by Wagenmakers [and colleagues])." Since you acknowledge that JPSP chose to publish a companion piece by psychologists that did in fact directly "deal rigorously with anomalous results," it makes no sense to claim "psychology is unable to" do so. It's nonsense, and it's unfair to all of those psychologists who did deal with this rigorously in a variety of ways, and unfair to the rest of us who could do so in an even wider variety of ways if we didn’t already consider it a proverbial dead horse.

"Physically impossible results" is also nonsense if taken literally. As in quantum physics, no results are physically impossible until proven so thoroughly in all conceivable scenarios, and even then we would still have to worry about what we did to change the system by observing it. Sufficiently thorough disproof of a phenomenon’s general possibility is probably one of the most "physically impossible results" I can think of, so people should probably avoid claiming impossibility almost as carefully as the better empiricists among us avoid claiming proof. Even when we accept a null hypothesis, we shouldn't claim to have proven something doesn't or can't happen, only that it didn't happen (...consistently enough to produce results we think it should have produced). Even when using Bayesian inference to reject Bem's statistically significant results, we base this decision on the insufficiency of Bem's effect size to support his unparsimonious-hence-improbable explanatory theory. We judge its parsimony by comparison with the prior assumptions we accept collectively, and we thus conclude it is improbable, but we are better skeptics for not assuming it is impossible.
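The Bayesian logic sketched in this comment can be made concrete with a toy example (my own illustration; the numbers are invented, not Bem's, and real analyses such as Wagenmakers et al.'s integrate over a prior on effect size rather than using two point hypotheses):

```python
# Toy Bayes factor: how strongly do hypothetical "precognition" data favour
# a small effect (hit rate .53) over pure chance (.50)? With two point
# hypotheses on a binomial, the Bayes factor is just a likelihood ratio.
from math import comb

def bayes_factor_01(hits, trials, p0=0.50, p1=0.53):
    """P(data | H0) / P(data | H1) for two point hypotheses on a binomial."""
    def likelihood(p):
        return comb(trials, hits) * p**hits * (1 - p)**(trials - hits)
    return likelihood(p0) / likelihood(p1)

# 53 hits in 100 trials: the data barely distinguish the hypotheses, so a
# sceptical prior on precognition easily dominates the posterior.
print(round(bayes_factor_01(53, 100), 2))
```

Even when such data are 'statistically significant', the evidence against chance is weak, which is exactly the commenter's point: rejecting Bem on Bayesian grounds is a judgment about priors and evidence, not a claim of impossibility.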

Thus your criticism of the published failure to replicate Bem as “empirical and descriptive in nature, with no reference to any underlying theory of how the world works" rings hollow. First, the articles that use Bayesian inference to reject Bem's finding (there are at least two such articles I know of, and I think both were published by multiple coauthors in peer-reviewed journals...how's that for rigorous?) handle the business of referring to underlying theory quite nicely in this case. They also argue that psychologists should do this more often, and Bayesian analysis has already established a non-negligible role for itself in psychology. It would be ignorant and arrogant to think oneself the lone voice in the crowd advocating a science of psychology that attends sufficiently to prior theory, or any lesser shade of that caricature...

I agree not enough psychologists study and synthesize sufficiently, but it’s a popular complaint too (I don’t know, but I have a feeling its cumulative popularity throughout psychological literature would easily outweigh the recent replication uproar), and we’re working on it.

As for your argument that “only theory can save psychology,” you offer your own counterexample by mentioning this published failure to replicate Bem’s results. In as much as that study didn't rely on old theory to undermine Bem's new-ish theory, I'd say this is both a success story for the “replication movement,” and a demonstration that replication studies can cure bad theory (gradually) without depending on other theory (mostly). If one can reject new theory without referring to more underlying theory than is necessary, that's the more parsimonious way to do it. To reverse the conclusion of your 4AM reasoning, "If we run [worse] experiments, the [lack of] replicability will take care of [the theoretical problem] itself." This only requires that we make sufficient replication attempts before accepting new theory into the body of Bayesian priors (let alone textbooks and methods of application, which we all know are far enough behind the cutting edge to afford researchers plenty of time), and that we pay sufficient attention to failures to replicate when / if we do. Since the study you mention got published, it's hard to argue that hasn't happened in this case, or that the other failures to replicate really needed to be published--in this particular case, as it stands today. If someone else publishes a successful replication someday, as you say happens in other cases, that would surely renew the collective interest in all the other failures to replicate…and in Whoville they’d say, “Our file drawers grew three times that day!”

In light of your blog’s repeated (central?) claim that psychology "doesn't have a theory," it seems you’re busy grinding an axe for psychology-as-is, and that you’re using an unrealistic, inflated opinion of physics as a reference point to justify your condemnation. I could be reading too much into things though; I admit I have a chip on my own shoulder. I'm sick of people pretending it's fair to compare physics (or other classic "hard sciences") with psychology (or other social sciences) in terms of how much theorists agree with one another, or in terms of how well we can explain / predict / etc. the behavior of inanimate objects (or even animals) vs. humans. It wouldn't be such a bad thing if these comparisons were made fairly, but usually the specific comparison is between a relatively simple, directly observable physical phenomenon, and a complex or latent psychological phenomenon that depends on many more underlying processes. If the people with whom these comparisons are so popular were more aware of physical research on complex / latent / multi-determined phenomena, I think they'd see that even physicists also disagree with one another quite a bit, and rely on loose-fitting statistics quite a bit, and fail to predict quite a lot. I’m sure this would’ve been the case with Adams and colleagues’ seemingly superluminal neutrinos too, if they hadn’t found that faulty cable. As in the news article on this you originally linked, “[A] senior lecturer in particle astrophysics…said: ‘Neutrino experimental results are not historically all that reliable, so the words 'don't hold your breath' do spring to mind when you hear very counter-intuitive results like this.’" As in that article, even physicists know they need replication, not just theory.

That being said, one obvious contrast exists between physics and psychology: physics has a much longer history of rigorous empiricism. Even today, psychology is not a purely empiricist discipline, as many humanistic and old-school Freudian / Jungian psychologists would attest. The pure empiricists among us would probably say those theories are on their way out, and that’s what focusing on theory gets you if it comes at cost to new and replicative research. I’d say both approaches have their place, and the humanistic and psychodynamic theories will still have some utility at least as long as it takes us to replace them completely for all the purposes they serve. They do work for some things (e.g., counseling and describing certain defense mechanisms, respectively), and they don’t for others, so judging by these as representatives of theory-focused psychology, do “we know that focusing on theory works?” No. It’s not hard to find critics of ecological psychology either. Nonetheless, I trust your theory-focused work deserves consideration on its own merits apart from its epistemological class (and so do atheoretical experiments), I hope it gets big enough to find you your share of attention from the ubiquitous haters, and I sincerely hope you trounce them all, but good luck convincing them that you have.

As for physics, consider the modern state of the historical equivalents to Freud’s theoretical approach. When physics was as young as psychology, Newton practiced alchemy and Copernicus had us orbiting the center of the universe, unless you want to go back as far as Aristotle…and I don’t think you do. Now we have 24 times as many elements, no idea where the center of the universe is, and about as much respect for traditional alchemy as we do for astrology…which is still more than zero, all quacks considered! Getting our scientific ducks in line has always been a slow process, generally approaching but never achieving completion in any science. Hence if you set your standard for theoretical consensus too high, any science will fail, and psychology as a whole only really performs poorly in this when we fail to adjust the standard for its relative youth.

Last point (or set of points): this blog suggests an unfairly prejudicial disposition toward exploratory research in general. New research needn't (and in many cases shouldn't) depend on prior theory to contribute toward new theory. In your 2011 blog about superluminal neutrinos and the weaknesses they imply for psychology, you say this: "Psychologists hate ruling things out...because they have no particular reason to rule anything out. But this has a huge cost: psychology becomes a mere collection of empirical results, with nothing tying them together. Results from the different disciplines can't inform each other, because they aren't testing the same things." Respectively, these three sentences are unfair and false, unappreciative and false, and false and false. The first sentence again implies a false contrast with other sciences; hints at an underlying, unresolved, personal disagreement with or misunderstanding of skeptical empiricism; and ignores the whole point of null hypothesis significance testing. The second sentence dismisses the value of “mere” empirical research, which, when rigorously scientific and potentially valid, is often considerable even when conducted in isolation from theory, and is sometimes most valuable when theory can't tie it together at first (imagine if we hadn’t found the problems with superluminal neutrinos or Bem’s studies!); the second sentence also ignores all the theories that do tie together a great deal of empirical results (e.g., any halfway decent review paper/journal/text/book), however short they may fall from achieving a Grand Unified Theory of psychology (physics lacks a GUT too BTW).

The third sentence just suggests ignorance or an unfair dismissal of any good counterexample from the modern interactions of historically distinct disciplines, and an underlying personal disagreement with or misunderstanding of the means by which psychologists operationalize latent constructs.

I’ll only offer the one counterexample with which I’m intimately familiar: the slowly mending rift between social psychology and personality psychology. Plenty of theoretical cross-fertilizing, experimental paradigm-replicating, and measure-sharing has been going on there for over twenty years now. Sure, there could be more mutual informing going on, but the emphasis is on COULD, because some has been ongoing for decades. Practically every personality psychologist has some appreciation for the psychological power of the environment over people’s personalities, but personality psychologists have also published powerful theoretical accounts of the mountains of empirical evidence that speak to the consistency of personalities across situations. If you want the next best thing to proof of that, I suppose I can offer another counterexample: studies of twins raised apart. There again you have different disciplines informing one another, merging as behavioral genetics, and providing evidence that whether we measure at the level of the gene or the level of the latent personality trait, it seems we are in some part testing the same things.

There is another important, general problem in science (not just psychology), and I think you might agree: we expect all our scientists to do all the work themselves, including experimentation, theorization, AND teaching. This might be ideal, but it’s too idealistic. Not every scientist has all three talents, nor does one need all three to contribute meaningfully, even laudably. I recently argued this in a conversation about a colleague’s opinion article in the local newspaper, in which he proposed that scientists deserve more fame. The problem raised at the time was that exceptional, popular physicists like Stephen Hawking and Neil deGrasse Tyson catch flak from their researcher colleagues for spending too much time on public outreach, and not enough on original research. Even if they aren’t the most revolutionary original researchers (emphasis on “if,” not stating an opinion here), Dr. Hawking has certainly delivered exceptional, fame-worthy theory, and Dr. Tyson has certainly earned his fame as a public advocate for his science. (And for science in general!) By the same reasoning, we should not thumb our noses at excellent original researchers, even if they leave it to someone else to explain what they discover first or demonstrate best.

Truly valid theory is the purview of the literature reviewer. Yes, we should all try to review enough to write an introduction and theorize enough to write a discussion, but no, we don’t all need to focus on this. Personally, I'd argue that any psychological theory worth taking at face value should have at least one meta-analysis or another kind of big, serious literature review to back it up. Every theory that falls short of that standard should probably be regarded as a work in progress, analogous to the release of beta versions of software: potentially unstable, not to be trusted as bug-free, and in a state that users should expect will be revised, and remain wary of until it has been. That doesn’t make beta software useless, and you wouldn’t expect every individual who writes code for the beta version to also co-manage the beta testing.

Career specialization is okay (not to mention the thus-far irreversible trend of modernization), as long as every job still gets done. I agree the reviewers and Grand Unifying Theorists among us are having trouble keeping up with the original researchers, but let’s not hold back the asymmetrically talented researchers among us just so the theory can catch up, and let’s not expect too much from the discussion of every article that gives us some interesting results. Scientific discovery is a creative, opportunistic, and serendipitous process, so we simply can’t expect it to always follow a proper theoretical introduction. Sometimes we just have to write our intros after we have our results (imagine Alexander Fleming’s predicament when he had no better term for penicillin than “mould juice”). We should not force scientific pioneers to accept the same fetters that scientific synthesizers need to keep our more thoroughly vetted theory grounded. Instead, we should better appreciate the distinction between these jobs, not demand that everyone do both, and not portray the work of one as the work of the other, especially not as a straw man to burn as an effigy of any whole science.

Hi, I am commenting here because Twitter can sometimes be a little too brief. Having read the entire post (which I had not before, and for which many thanks) I can say that I agree with you that we need better theories, but I am unsure how psychology will go about achieving them.

I am with Nussbaum on very many points. Most theories in our playground are re-descriptions. You urge that we should create "strong, clearly formulated theories" to test. OK, how about "boy wants to kill father to make love to mother". It is certainly strong, and clearly formulated. However, I feel it lacks something. Even Freud felt so, and stuck in a hydraulic system pumping between different hypothetical reservoirs, id, ego and so on. Despite that, it remains unsatisfactory. Our problem is that we don't have good mechanisms. Perhaps only biology will provide those. I am tired of psychologists drawing boxes with arrows. In the meantime, we will have to wing it, trying our best to make sensible suggestions which we can then test, while being aware we are just skimming the surface.

Now I will go back to my long postponed post on the theory of complexity.

I agree with the sentiment. As an organizational psychologist, drawing boxes is the thing to do. The problem is, questions in psychology are all very isolated, and each requires its own language. Maybe that IS the problem. But we are never going to have a theory as grand as evolution, or relativity, or even macro-economic theories. So I'm torn whenever I talk about this issue, because in a perfect world, psychology is as precise, as theoretical, and as much of a "science" as biology, or physics, but...

"It's fine to say that we need better theory and then replication will follow, but that's just a nice thing to say, it's not a plan of action. Given the current academic context, it seems to me that replication is an important place to start, even if it doesn't, by itself, solve our problems."

Yes, that makes sense to me.

Also: doesn't replication help with formulating (more specific) theories? (i.e. when findings don't replicate it might point to important factors necessary for it to replicate?)

Also: I wonder if researchers already think they are contributing to theory building, and - testing. Take the social priming stuff: I gather the people engaged in that kind of research think they are contributing to theory building- and testing. If this is correct, then maybe one needs more specific guidelines/ statements concerning what better theory building, and - testing implies?

Also: At what point are "failed" replications seen as direct evidence for the probability that the original findings are so dependent on countless possible minor factors that they can basically be seen as random? Maybe it's just a faster way to get to that conclusion (compared to spending decades to find boundary conditions, and even more boundary conditions, and then interactions between boundary conditions, etc., etc.)

I also find the Bem findings very interesting. At what point does one say that certain findings are simply a waste of everybody's time, money, and effort? I certainly hope that Bem is going to spend the next few decades looking for boundary conditions, and providing more theoretical viewpoints. What's the difference between Bem's findings and, let's say, social priming findings?

I'm not so pessimistic about psychology lacking theory. The field of mathematical psychology, especially, has provided many useful models and theories. For example, how about the diffusion model for information accumulation in speeded decision making (popular nowadays in neuroscience), models of reinforcement learning (originating from Thorndike's law of effect), Tversky & Kahneman's work on preferential choice, Busemeyer & Townsend's decision field theory, and other detailed and empirically testable models of perception, memory, attention etc. It is true that in some areas of psychology there is not sufficient theory. But if you want psychological theory, any issue of the Journal of Mathematical Psychology will give you plenty of it.

"But if you want psychological theory, any issue of the Journal of Mathematical Psychology will give you plenty of it."

Not really, as most of "it" isn't psychological theory at all in as much as a theory is to leave you feeling you now understand something that the mere empirical findings did not. And *that*, I suggest, is the problem: what amounts to a "theory" in psychology? If it is to be in contradistinction to a "mere" (?) empirical regularity, then what is it? Most of the explanations in experimental psychology amount to little more than accounting for one empirical phenomenon in terms of another: "oh, that is just priming, or negative priming, or Stroop, or etc.", but one never gets any real feeling of closure. Instead, one is left with, "Yes, but how does *that* work?"

Andrew's point, I think, is that ``replication'' per se does not help with *that* issue. We need more: not more data, but more theories: real theories, not the logical positivist f(x) statements in most mathematical psychology "theorizing", but real explanations that leave us feeling that we have learned something, but also which provide for a broad explanation for the extant data, importantly including the non-replications.

BTW, my point is not anti math psych (hardly, as I am often accused of that sin myself), just that "it" rarely rises to any form of a real explanation for most.

This almost reminds me of classic conceptions of rationalism in economic behavior, such as making psychological models of choice behavior in a Prisoner's Dilemma game. Can a mathematical model predict the 'weight' of the personal relationship between the actor and player 2, the 'amount' of jealousy at a competitor's gain, and so on?

I think mathematical models have severe limitations in capturing psychological behavior, in the same way that broader psychological theories struggle to predict complex social behavior. The question for me is whether we need to abandon vague theories that do not work and start building up from the bottom, or take bad theories and try to improve them. For the latter, I guess replication and theory forming could go hand in hand (but I am just a PhD student, so I am not sure).