Hard on his heels now comes UCLA's Matthew Lieberman, who has published a piece in Edge on the replication crisis. Lieberman is careful to point out that he thinks we need replication. Indeed, he thinks no initial study should be taken at face value - it is, according to him, just a scientific anecdote, and we'll always need more data. He emphasises: "Anyone who says that replication isn't absolutely essential to the success of science is pretty crazy on that issue, as far as I'm concerned."

It seems that what he doesn't like, though, is how people are reporting their replication attempts, especially when they fail to confirm the initial finding. "There's a lot of stuff going on", he complains, "where there's now people making their careers out of trying to take down other people's careers". He goes on to say that replications aren't unbiased, and that people often go into them trying to shoot down the original findings, and this can lead to bad science:

"Making a public process of replication, and a group deciding who replicates what they replicate, only replicating the most counterintuitive findings, only replicating things that tend to be cheap and easy to replicate, tends to put a target on certain people's heads and not others. I don't think that's very good science that we, as a group, should sanction."

It's perhaps not surprising that a social neuroscientist should be interested in the social consequences of replication, but I would take issue with Lieberman's analysis. His depiction of the power of the non-replicators seems misguided. You do a replication to move up in your career? Seriously? Has Lieberman ever come across anyone who was offered a job because they failed to replicate someone else? Has he ever tried to publish a replication in a high-impact outlet? Give it a try and you'll soon be told it is not novel enough. Many of the most famous journals are notorious for turning down failures to replicate studies that they themselves published. Lieberman is correct in noting that failures to replicate can get a lot of attention on Twitter, but a strong Twitter following is not going to recommend you to a hiring committee (and, btw, that Kardashian index paper was a parody).

Lieberman makes much of the career penalty for those whose work is not replicated. But anyone who has been following the literature on replication will be aware of just how common non-replication is (see e.g. Ioannidis, 2005). There are various possible reasons for this, and nobody with any sense would count it against someone if they do a well-conducted and adequately powered study that does not replicate. What does count against them is if they start putting forward implausible reasons why the replication must be wrong and they must be right. If they can show the replicators did a bad job, their reputation can only be enhanced. But they'll be in a weak position if their original study was not methodologically strong and should not have been submitted for publication without further evidence to support it. In other words, reputation and career prospects will, at the end of the day, depend on the scientific rigour of a person's research, not on whether a particular result did or did not cross a threshold of p < .05.

The problem with failures to replicate is that they can arise for at least four reasons, and it can be hard to know which applies in an individual case. One reason, emphasized by Lieberman, is that the replicator may be incompetent or biased. But a positive feature of the group replication efforts that Lieberman so dislikes is that the methods and data are entirely open, allowing anyone who wishes to evaluate them to do so – see for instance this example. Others have challenged replication failures on the grounds that there are crucial aspects of the methodology that only the original experimenter knows about. To them I recommend making all aspects of their methods explicit.

A second possibility is that a scientist does a well-designed study whose results don't replicate because all results are influenced by randomness – this could mean that an original effect was a false positive, or the replication was a false negative. The truth of the matter will only be settled by more, rather than less, replication, but there's research showing that the odds are that an initial large effect will be smaller on replication, and may disappear altogether - the so-called Winner's Curse (Button et al, 2012).

The third reason why someone's work doesn't replicate is if they are a charlatan or fraudster, who has learned that they can have a very successful career by telling lies. We all hope they are very rare and we all agree they should be stopped. Nobody would make the assumption that someone must be in this category just because a study fails to replicate.

The fourth reason for lack of replication arises when researchers are badly trained and simply don't understand probability theory, and so engage in various questionable research practices to tweak their data to arrive at something 'significant'. Although they are innocent of bad intentions, they stifle scientific progress by cluttering the field with nonreplicable results. Unfortunately, such practices are common and often not recognised as a problem, though there is growing awareness of the need to tackle them.

There are repeated references in Lieberman's article to people's careers: not just the people who do the replications ("trying to create a career out of a failure to replicate someone") but also the careers of those who aren't replicated ("When I got into the field it didn't seem like there were any career-threatening giant debates going on"). There is, however, another group whose careers we should consider: graduate students and postdocs who may try to build on published work only to find that the original results don't stand up. Publication of non-replicable findings leads to enormous waste in science and demoralization of the next generation. One reason why I take reproducibility initiatives seriously is because I've seen too many young people demoralized after finding that the exciting effect they want to investigate is actually an illusion.

While I can sympathize with Lieberman's plea for a more friendly and cooperative tone to the debate, at the end of the day, replication is now on the agenda and it is inevitable that there will be increasing numbers of cases of replication failure.

So suppose I conduct a methodologically sound study that fails to replicate a colleague's work. Should I hide my study away for fear of rocking the boat or damaging someone's career? Have a quiet word with the author of the original piece? Rather than holding back for fear of giving offence, it is vital that we make our data and methods public. For a great example of how to do this in a rigorous yet civilized fashion I recommend this blogpost by Betsy Levy Paluck.

In short, we need to develop a more mature understanding that the move towards more replication is not about making or breaking careers: it is about providing an opportunity to move science forward, improve our methodology and establish which results are reliable (Ioannidis, 2012). And this can only help the careers of those who come behind us.

Saturday, 23 August 2014

This week saw the publication of a special issue of the International Journal of Language and Communication Disorders, focusing on labels for children with unexplained language difficulties. Two target articles, one by Sheena Reilly and colleagues, and one by me, are accompanied by an editorial by Susan Ebbels, twenty commentaries, and a final paper where Sheena and I join forces with Bruce Tomblin to try to synthesise the different viewpoints. These articles are free for anyone to access.

Terminological battles are often boring and seldom come to any consensus, so why are we putting time into this thorny issue? Quite simply, because it really matters. As we argue in the articles, having a label affects how children are perceived, what help they are offered, and how seriously their problems are taken. 'Specific Language Impairment' has very poor name recognition compared to dyslexia and autism, despite being at least as common. Furthermore, unless we can agree on some common language, it's difficult to make progress in research, and to discover, for instance, the underlying causes of language difficulties, how common they are in different parts of the world, or what interventions work.

I was first confronted with the full extent of the problem when I tried to analyse the amount of research and research funding associated with different developmental disorders (Bishop, 2010). There are other conditions, notably autism and dyslexia, where there is plenty of debate about diagnostic criteria, or even about whether the condition exists. But even so, the terminology is reasonably consistent. For children's language difficulties, this is not the case - they can be described as cases of language difficulty, disorder, impairment, disability, needs or delay, with various prefixes such as 'developmental', 'specific' or 'primary'. Some researchers will use such labels with precise meanings, often excluding children who have co-existing conditions, whereas others use them more descriptively. This made it extremely difficult to do a sensible internet search to estimate the amount of research funding associated with children's language difficulties.

The confusion over labels has, I think, also contributed to the lack of public recognition of language difficulties in children. A couple of years ago, I joined together with Courtenay Norbury, Maggie Snowling, Gina Conti-Ramsden and Becky Clark with the goal of remedying this situation. We started a campaign for Raising Awareness of Language Learning Impairments (RALLI) (Bishop et al., 2012), and set up a YouTube channel to provide basic information. We spent some time debating what terminology to use: "Language learning impairment" was our preferred choice, but many of our videos talk of Specific Language Impairment, simply because that is a more familiar label. The lack of an agreed label proved a real stumbling block for our attempts at public engagement, and we decided that, as well as producing videos, one of our goals would be to get the terminology issue discussed more widely, in the hope of achieving some consensus. It was a very happy coincidence that Sheena Reilly and colleagues were crystallizing their own position on this question in an article in IJLCD, and that they, and the Editors, were willing to include my article, and the commentaries of other RALLI founders, in the published debate.

One thing that came across when reading commentaries on our articles was the disconnect between research and practice. One point on which I agree with Sheena and colleagues is that there is no justification for drawing a distinction between children whose language problems are in line with below-average nonverbal ability, and those who have a mismatch between good nonverbal skills and low language. Research has failed to find any difference between children with uneven or even nonverbal-verbal profiles in terms of responsiveness to intervention or underlying causes. Such a distinction is, however, widely used in educational and clinical settings to decide which children gain access to extra support in school. Another issue raised by the Reilly et al paper is whether it is logical to use other exclusionary criteria, and to distinguish, for instance, between children who do and don't have autistic features in association with a language problem. As Susan Ebbels noted in her editorial, in everyday settings "diagnostic labels and criteria were being used creatively in disputes over access to services both by those seeking to obtain services for children (often parents and their lawyers) who could be accused of ‘diagnostic shopping’ and also by those seeking to deny services (often due to financial constraints) who may use particularly restrictive criteria in order to reduce the number of children qualifying for services".

We can't afford to ignore this confused situation any longer. The time has come to have a wider debate on these issues, with the aim of reaching a consensus about how terms are used. The Royal College of Speech and Language Therapists has set up a moderated discussion forum where people can give their views on the best way forward. Please do consider adding your voice: it is important that all those affected by this issue have a say, whether you are a speech-language therapist/pathologist, psychologist, teacher, health professional, legal expert, policymaker, a parent of a child with language difficulties, or someone who has experienced language difficulties. We'd also love to hear from those outside the UK - whether English-speaking or not. You can access the discussion forum here. Finally, to raise awareness of this debate, during the week of 24th-31st August I will be taking over the @WeSpeechies Twitter handle as guest curator. On Tuesday 26th at 8 a.m. BST there will be a live Twitter debate on this topic. Feel free to join in, even if you aren't a regular tweeter.

We've had a great week of interactions on Twitter. A transcript for the week is available here.

I'll look through this and aim to organise the material in due course, but meanwhile would encourage anyone who is interested to continue the discussion on Twitter. I'm appending below some tweets that I posted throughout the week to stimulate debate.

As noted above, the chat links in to a special issue of the International Journal of Language and Communication Disorders, which is free to access here http://t.co/ncTUaYvyoI. NB it is not all that obvious, but there are 10 commentaries after each target article.

If you want to join the discussion on Twitter, feel free to comment at any time, but please include the #WeSpeechies hashtag so we can aggregate comments easily. Also, if your comment relates to a numbered question, please add Q1, etc., so we can relate them.

Monday started with my attempt to summarise each of the twenty commentaries in a tweet-length message.

Lauchlan/Boyle, ed psych view. Must ask: ‘Will label change the child's life for the better?’ Aetiology often irrelevant

Bellair et al: community SALTs. No one label works for both research & clinical. SLI has problems but we can manage them.

Mabel Rice: "SLI has yet to receive widespread adoption in clinical practice, in spite of the great need for it." Critical of DSM5: excluded "well-researched category of SLI", included SCD, "with a minimal research base"

Kate Taylor SLP. SLI underidentified. Changing the term won't resolve the issue, which is one of measurement rather than label.

Conti-Ramsden: Any Consensus Panel on terminology must be international and include voices from different languages.

Strudwick/Bauer http://t.co/GSY5Xwz283 Concern that labels don't capture comorbidities; most ch with 'SLI' have other problems

Michael Rutter, psychiatrist: "both clinical & research classifications needed but they require a different approach"

Rutter: ‘Specific’ implies ‘pure’ language impairment; "not supported by any of the available evidence"

Larry Leonard: Many researchers already use broader definition of SLI: do not use term to mean children have a pure profile. Communication with the public/other disciplines will be even harder if we adopt generic label ‘language impairment’.

Parsons et al @wordaware Shockwaves through SALT profession if nonverbal IQ criteria and delay/disorder distinction removed. Use of marketing approaches to development of a new term, including consultation with parents & young people.

Wright: legal perspective. Much time spent in tribunal appeals arguing re labels: eg is it delay or disorder, is it specific?

Questions for debate

On Tuesday we had a live Twitter chat with four question topics, and later in the week I added further numbered questions. Here is the total list – we'd love to hear your thoughts on any or all of these:

Q1 What is your view on use of the diagnostic label SLI? Does it reflect a medical model, and is this appropriate?

Q2 What are appropriate criteria for identifying children's language problems?