Archive for April, 2018

I wasn’t planning to write about the Chartered College Of Teaching again. Nobody involved seems to care about my criticisms, so I’m sure that when I write about it the only effect is that I publicise them and probably get them a few more members.

But no blogger can resist the chance to say “I told you so”, so I have to comment on the bizarre saga of Greg Ashman’s article on metacognition which has been all over his blog and Twitter lately.

Back in 2014 I wrote about plans for a new professional body for teachers. I discussed at length what it would take for it to be something other than a new version of the despised GTC(E) and the potential problems if it tried to represent too many interest groups, or particular ideologies rather than the profession. Some supporters of the plans suggested that if the College focused on disseminating research and evidence then that could ensure that it wasn’t seen as a partisan interest group. I wrote this blogpost explaining how debates around the use of evidence were actually highly contentious and partisan. I gave examples of views of other people involved in education about education research and concluded:

Now if you know anything about my views, and what I consider to be the evidence that underpins them, I find it impossible to imagine that my disagreements with any of the above can be resolved by reference to evidence. I am not arguing here that I cannot be part of a College of Teaching which includes people with views like those above, but I am certain that no amount of evidence or research is going to allow us all to support a single College of Teaching that claims to be promoting “what the research shows”. Research and evidence are divisive, not unifying, forces in education.

The last few weeks have served to illustrate this. The Chartered College of Teaching has a publication called Impact. According to their website:

[Impact] supports the teaching community by promoting discussion around evidence in the classroom, and enabling teachers to share and reflect on their use of research.

Some great people are involved with it, and although I haven’t read it, I’ve heard wonderful things about the first issue. The second issue was to be edited by Jonathon Sharples of the EEF (the organisation I blogged about here). Among the topics they requested articles on was “Metacognition, self-regulation” which is one of those broad educational ideas (see “thinking skills”, “oracy” and “creativity” for other examples) which people build all sorts of teaching ideas around, without any teacher ever being clear precisely what it covers. The EEF has been promoting metacognition, for reasons that are somewhat mystifying, for a while now. Teacher, Greg Ashman, pointed out the problems with the EEF’s allegiance to this idea in a blogpost in January entitled Is ‘metacognition and self-regulation’ an actual thing?

Greg, without hiding his cynicism, suggested that this might end up being the main focus of the issue of Impact and was told “submit an abstract”.

As the call for papers says, we are looking for articles on lots of different topics relating to developing effective learners, including the relationship between skills and knowledge. Submit an abstract.

… the category of meta-cognition and self-regulation seems to have been stitched together from a range of different beasts, much like the mythical chimera… Practitioners should therefore be wary of any simplistic claims made for this category of intervention…

(The abstract can be found here). The abstract was accepted, and he was asked to write the article. He wrote it. It then went to peer review. It is at this point things got a little odd. One reviewer, Dylan Wiliam, claims to have said:

“The article is provocative, but essentially well-argued, and worth including as a prompt for debate. It would be great if someone from EEF could respond in a subsequent issue, because then it would mark out Impact as a forum for debate.”

The first sentence of this was sent to Greg. Another reviewer, in another brief comment, claimed Greg’s argument was not clear. Another of the three peer reviewers, however, sent two pages largely arguing against Greg’s position on the grounds that a review of the wider literature would find that the approach the EEF had used was well-established, and therefore Greg was wrong to think that what the EEF had done was “astonishing”.

Reading these three reviews (you can find them with 4 other reviews here) you can’t help but notice that a lot of the problems are about a lack of clarity about what Impact is for. One reviewer recommended Greg’s article as a provocation for debate. One was scathing that it did not address the wider ideas of education researchers, but addressed only something that had been aimed at teachers. I can see both points of view; it all comes down to whether Impact is for teachers to debate ideas that affect them, or for education researchers to discuss research. I think this reflects the lack of clarity about what the Chartered College of Teaching is for; is it for teachers or for educationalists? Additionally, reviewers do not seem to have been clearly asked whether the article should be rejected, amended or accepted. Had they been, it would have been 2 to 1 against, and perfectly legitimate to reject it outright.

Instead, Greg was asked to amend his article. He did so. It was then sent out to 4 more reviewers. I don’t know why. Worse, these additional reviewers did not seem to address the issues raised by the first reviewers, but commented on the tone of the piece and raised new issues about accuracy, which Greg didn’t have time to respond to (although he is now of the view that none of the points about accuracy were correct).

As a result, Greg could not address the further peer reviews and the article could not be published. A confused peer review process, and apparent confusion about the purpose of the journal, had served to exclude an interesting article, and a perspective relevant to teachers, from the journal, although not from the website. Greg, who I think had been sceptical from the beginning about whether the journal would ever accept his work, described on Twitter and in blogs what had happened.

Then two further odd things happened.

Firstly, supporters of the College began criticising Greg. It was assumed that he was bitter about rejection, rather than concerned about the process (which seemed to have wasted his time). People implied that Greg was so desperate to be published, that he was a bad loser seeking revenge for a personal blow to himself and his credibility. Given that Greg’s latest book is available for pre-order here; given the number of other people willing to publish Greg’s views, and given the praise from Dylan Wiliam and others for that article, such a line of attack seems implausible as well as unpleasant.

Secondly, the Chartered College of Teaching Twitter account commented on the matter in a long thread. In the thread they falsely claimed “At no point did reviewers take issue with the opinions”.

All reviews of the submission focused on accuracy, balance & tone to ensure it was a fit for the journal. At no point did reviewers take issue with the opinions. All agreed it was an important discussion. We agree & that's why we'll be including it, in full, on our website (2/6)

Since then, Greg has released the peer reviews and proved that this was false, and been attacked for that. Numerous supporters of the College have argued that a publicly funded professional body making a false statement about a teacher is not as unethical as a teacher releasing the evidence that the statement is false. More bizarrely, others have, apparently sincerely, claimed that the arguments the reviewer made against Greg’s views were not “taking issue with his opinions”.

Yes, really. Highlights of that discussion include people claiming that Greg’s opinions were not opinions but assertions, conclusions or claims and that arguing against his opinions was not taking issue with them, but challenging them, objecting to them or discussing the words used to express them. At times, those who defended the false statement could not even remember which bit of sophistry they were currently using:

These tweets, highlighted by Greg, are by a professor of education

I’m left amazed at the cult-like devotion to the Chartered College Of Teaching that exists among a small minority (many of whom aren’t teachers) who are willing to make themselves look silly rather than admit the College’s mistakes. I’m left appalled that the College has still not apologised to Greg for the false statement. I’m left grateful to Michael Fordham for this Twitter thread discussing the rights and wrongs of peer review in professional publications. But mainly I’m left smugly saying “I told you so”. Interpretations of evidence cannot unite the teaching profession. A professional association for teachers needs something else to underpin it other than research. I would suggest a belief in teacher professionalism would be a far better basis for building a professional association for teachers.

There are two contrasting elements to the way schools respond to bad behaviour and to responses to wrongdoing in society generally.

One is that of justice. Those who cause direct harm to others, undermine legitimate authority, or deliberately violate rules for their own ends, deserve negative consequences for themselves. Criminals deserve to go to prison, or pay a fine or whatever. Those who mistreat or betray those around them, whether that’s their colleagues, friends or family, deserve a diminished relationship with those around them (either temporarily, or in the worst cases permanently). Badly behaved children deserve a detention, or to lose a treat, or whatever.

The other element is behaviour change. We want undesirable behaviour to stop. We want criminals to stop committing crime. We want friends who let us down to become more reliable. We want an inconsiderate spouse to become considerate. We want a badly behaved child to become well-behaved. We also want others, who see the results of undesired behaviour, to be deterred from that same behaviour.

Both these elements are essential.

If we ignore justice, then we undermine the extent to which we are responsible for our own actions. We are not treating people as if they have chosen their actions, if we do not think that they deserve to lose out for deliberate wrong actions. We can temper justice with mercy, but we cannot reward wrongdoing, or punish virtuous acts. Without justice we would also lose all sense of proportionality in our responses. If the only thing that will deter people from dropping litter is the death penalty, then if all we cared about was changing behaviour, execution would be legitimate. Or if the only action that would change a litter bug’s behaviour is chopping off a hand, then amputation would be legitimate. Justice, however, requires some correspondence between the harm (or potential harm) of behaviour and the sanction that it warrants. Justice accepts that it would be better for somebody to continue doing a small wrong, and continue to suffer small, but deserved, punishments for it, than for them to be sanctioned so severely that they would be traumatised into the right behaviour, but with a large increase in the overall level of human suffering. Equally, it is justice that tells us that we should not attempt to change behaviour by appeasing or bribing wrongdoers. Perhaps a burglar would change his ways if given a million pounds; perhaps a rapist would stop their crimes if they could be provided an unending supply of consenting sexual partners, but justice demands that those who would harm others should not be “bought off”. There should not be rewards for a willingness to do wrong. Finally, it is through desert – through the notion that some things are deserved – that moral judgements are most clearly communicated. To say somebody can do wrong with impunity is to say the authorities, or the community, do not really believe those actions to be wrong, that either the rules and interests of the community don’t matter, or that violation of them is not a moral matter.

Equally, if we ignore behaviour change then we commit ourselves to writing off those who once do something wrong. We would not be recognising that to fall short is normal for human beings and accepting that we can all do better. We would be failing to help support those who want to change, despite the common sense notion that our behaviour often becomes a habit and we often need help and encouragement to break free of our bad habits. We would also be ignoring the possibility of reducing the amount of wrongdoing. This would be both irrational (if actions are genuinely wrong, we would want fewer of them) and harmful to the community.

I believe that virtuous, rational, individuals designing a system of criminal justice, or rules for a club, or the behaviour system for a school, would attend closely to both these considerations. We would ask what sanctions are deserved and what systems communicate a clear moral judgement. But we would also ask what is likely to change an individual’s behaviour and deter similar behaviour on the part of others. However, we are not virtuous, rational individuals. We cannot easily separate moral judgements from what they say about ourselves. We are not content simply to aspire to be virtuous, we also seek to demonstrate our virtue to others. We like to show that we are kinder, more merciful, more just, than others and a situation like the above, where we have two aims, gives us that opportunity. When arguing over a system or an action, we can pick whichever of the two aims of justice and behaviour change best justifies our favoured course of action, and ignore the other. In fact, we can go further than ignoring the aim that weakens our position, we can deliberately misinterpret it.

Educated middle class people like ourselves can easily imagine what it means to be only concerned with justice, but not changing people. We can easily picture somebody with no concept of mercy, no element of forgiveness, no belief in the improvement of the human condition. The political demagogue who has no positive vision of society, and is only interested in settling scores with those they consider to be the villains of the piece, is an archetype liberals can immediately bring to mind. Their “justice” actually causes harm and resentment, all the more so if we think those they target are actually just scapegoats.

However, we are far less adept at challenging those who would ignore justice. Those who would never hold somebody responsible for their actions. Those who would be outraged at continuing to punish somebody when it was clear that their behaviour was not changing. Those who would appease and excuse even the worst among us, rather than denounce them. And most of all, those who would see any notion of desert as indistinguishable from revenge. So pronounced is this tendency, that words such as “retribution” or “punitive” that originally referred to deserved punishment, are now widely understood to refer to revenge.

In schools, this is where a lot of problems lie. It is not universally accepted that children are responsible for their actions. It is not universally accepted that an important part of what needs to be done about wrong actions is moral judgement and punishment. And so, we often try to talk about behaviour without using the appropriate moral terms. Like the rest of society, we no longer know that “retribution” ever meant something different from revenge. Some are so confused about the word “punitive”, a word that literally refers to punishment, that they talk about non-punitive punishments. Some will avoid the word “punishment” or “sanction”, when “consequence” is a far less loaded term. Some will avoid the word “discipline”; why else would the phrase “behaviour management” ever have been coined? The word “sin”, one that so perfectly described the normal moral failings of humanity, is now seen as a relic of a superstitious past. “Moral” itself is often seen as an inappropriate and emotive term. One prominent progressive does not even approve of rewards, despite rewards being the more positive side of desert. Almost any term can be rejected as “unhelpful” or worse as “a label”, when people are signalling their virtue. And where words are not banned, they can be redefined, with “restorative justice” being one of the concepts most popular with those who oppose justice. And don’t get me started on those who seem to think the whole concept of reward and punishment was invented by behaviourists in the 1950s.

There’s little obvious to be done here, but next time you hear somebody say something along the lines of:

“I don’t believe in X, but I do use Y”

where both X and Y refer to deliberately inflicted undesirable consequences for breaking a rule, challenge it for the pious waffle it really is. Nobody really rejects “punishments” in favour of “consequences”; we just call it a consequence when we do it, and a punishment when somebody else does it. Nobody really eschews “discipline” in favour of “behaviour management”. Nobody actually replaces “detentions” with “time for reflection”. You either punish, or you let kids get away with it.

I’ve written a couple of posts recently about OFSTED (What OFSTED still needs to do and OFSTED and Workload) which brought up the issue of workload. I identified two problems in particular that relate to marking. Firstly, OFSTED look to see if school policies are being followed consistently, even if those policies add to workload. Secondly, OFSTED inspectors look for evidence of students responding to feedback. As a result schools are introducing marking policies that involve teachers having to elicit responses from students when they mark books, then mark those responses. This is often referred to as “triple marking” (as the same piece of work may be visited three times).

While “triple-marking” is not necessarily a bad thing – teachers will legitimately want to help students draft and redraft work on some occasions – having to mark this way consistently has workload implications. Also, for such marking to happen consistently, teachers will have to carry out this process even where they see no benefit for their students. I have seen this happen in multiple schools, and, unlike some fads, it is not simply being done by the worst managers. Even managers who really care about workload and are doing everything they can to make the process easier, are still feeling obliged to introduce such policies because they expect inspectors to be looking for responses to feedback.

According to the blogpost about triple-marking I linked to above (and partially confirmed by the photo caption on this article) at one point OFSTED had clarified this matter and said:

Ofsted does not expect to see a particular frequency or quantity of work in pupils’ books or folders. Ofsted recognises that the amount of work in books will often depend on the age and ability of the pupils.

Ofsted does not expect to see unnecessary or extensive written dialogue between teachers and pupils in exercise books and folders. Ofsted recognises the importance of different forms of feedback and inspectors will look at how these are used to promote learning. [My emphasis]

However, more recent versions of the mythbusting guidance just say:

Ofsted does not expect to see a particular frequency or quantity of work in pupils’ books or folders. Ofsted recognises that the amount of work in books and folders will depend on the subject being studied and the age and ability of the pupils.

Ofsted recognises that marking and feedback to pupils, both written and oral, are important aspects of assessment. However, Ofsted does not expect to see any specific frequency, type or volume of marking and feedback; these are for the school to decide through its assessment policy. Marking and feedback should be consistent with that policy, which may cater for different subjects and different age groups of pupils in different ways, in order to be effective and efficient in promoting learning.

This redraft seems to have replaced a clear statement that triple marking is not necessary with one that emphasises consistency with a policy, and scrutiny of the effectiveness of feedback, which would explain why schools seem to have gone backwards on this issue.

Andrew Old: As long as OFSTED look for evidence of a consistent marking policy *and* students responding to feedback it will continue.

Sean Harford: It’s up to schools to have a sensible assessment policy: inspectors inspect against the policy Andrew. If schools carry on with triple marking policies then that’s what inspectors will look at. Nobody at Ofsted looking for triple marking.

Andrew Old: But you look for evidence of students responding to feedback don’t you?

Sean Harford: Not necessarily written feedback – see para 163 of the handbook, fifth bullet point. That could be ascertained by talking to pupils and teachers.

Andrew Old: But inspectors will be looking for it in books too, won’t they?

Sean Harford: Not necessarily; that depends on the school’s assessment policy.

Andrew Old: So inspectors won’t be looking in books for kids responding to feedback *unless* the policy implies that’s where it is to be found? And there should be no disadvantage in not making that part of the policy?

Sean Harford: Absolutely.

The bullet point Sean referred to is in a section that begins:

Inspectors will make a judgement on the effectiveness of teaching, learning and assessment in schools by evaluating the extent to which:

And then lists a number of bullet points, including the one Sean pointed out:

assessment information is used to plan appropriate teaching and learning strategies, including to identify pupils who are falling behind in their learning or who need additional support, enabling pupils to make good progress and achieve well

I think the problem lies in the very next bullet point:

except in the case of the very young, pupils understand how to improve as a result of useful feedback, written or oral, from teachers

When I had the Twitter conversation, I was delighted; the agreement that “there should be no disadvantage” to schools that are not triple-marking was particularly welcome. That said, it is still going to be down to schools to come up with and enforce marking and feedback policies that fit what OFSTED want when they judge the extent to which “pupils understand how to improve as a result of useful feedback”. While I can be happy that this doesn’t have to be “triple-marking”, I don’t think I know what that would look like.

One of the many educational bodies Michael Gove dispensed with was the GTC(E), a government funded compulsory professional body for teachers, best summed up by Tom Bennett as “an expensive magazine that could sack you”. In line with a lot of the Gove reforms, even before this was carried out, people were already looking for a way to turn the clock back. A movement among the education establishment began, for a new teachers’ professional body, one that was more independent and not compulsory. An attempt to fund it through donations failed to reach a target of £250,000 and only raised pledges for £21,000, and before long it was announced that it would indeed be another government funded body, with promises of a frankly ludicrous £5 million from government.

Having blogged a lot during the initial discussion about what sort of organisation it should be, I haven’t had much to say since it became clear that it was not the sort of organisation I thought teachers needed. I would prefer a grassroots organisation, dominated by classroom teachers, steering clear of educationalists and consultants, and concentrating on tapping into the expertise that already exists in the profession rather than looking to the people who already tell teachers what to do for more of the same. There are good people involved, and there are things going on that I can be positive about, but what is being created still looks to me like what the education establishment thinks teachers should want, not what teachers actually want.

In particular, I remain critical of:

the existence of “associate members”, members who aren’t actually teachers;

the complete lack of attempts to balance SLT and unpromoted teachers within the organisation;

The establishment of the Chartered College of Teaching is an important development. The college opened in January 2017. The Department is providing funding of up to £5 million over four years. In the longer term, it expects the college to be self-sufficient through membership fees (currently £45 per year) and income from its activities. The college aims to recruit as quickly as possible and has a target of 18,000 members by April 2018.

It’s now April 2018 and so I have been asking about membership numbers:

It would appear they would have to grow by 50% just to reach the target they have already missed. And if 12,000 still seems impressive, remember the level of funding of the organisation and please note that it was indicated in January that thousands of those members were students with free memberships, and hundreds were non-teaching “professional affiliates”:

Hi James, thanks for your question and apologies if we missed a previous request. We currently have 6,100 paid memberships and the remainder are students. 600 of these are professional affiliates. We're delighted that teachers at all stages of their career are joining!

I do wish them well (honest); teachers do need a wide variety of organisations doing different things. I do, however, wonder how this particular organisation merits £5 million of public money.

Here’s my suggestion for how to improve teacher professionalism. Instead of giving £5 million to one organisation, give it directly to classroom teachers in the form of a voucher each for professional development, and let us decide which organisations and events to spend it on. The Chartered College of Teaching could compete for that money, but so could researchED, subject associations or any other group involved in working with the profession. Let’s empower the profession by trusting teachers to decide on their own professional development needs.

A couple of days ago, I wrote this post The EEF were even more wrong about ability grouping than I realised which revealed that the Education Endowment Foundation, a government funded body looking at what works in schools, had, through a series of mistakes and poor decisions, managed to turn a positive effect size of 0.12 for “ability grouping” into a negative effect size of -0.09 for “setting and streaming” and managed to obscure that they had done this. This figure was then widely shared with schools as evidence in favour of mixed ability teaching; for instance, headteacher Stuart Lock told me that the claim that the effect size was negative was shared with 150-200 headteachers at an event organised by a Regional Schools Commissioner.

Nobody has found any problem with the content of my analysis. The result was shared by schools minister Nick Gibb in this tweet:

This blog provides an excellent example of teachers engaging with research. Here, @oldandrewuk follows up his previous blog on the @EducEndowFoundn evaluation of setting and streaming, challenging what he believes to be some methodological errors: https://t.co/cqfZeZidHo

In my experience it has been very common to hear from educationalists, those conducting education research or teacher training, that setting didn’t work. So how did some current and former educationalists react to a teacher revealing a factual error in the work of some researchers?

Well, here are some replies and comments about Nick Gibb’s tweet.

From a research fellow at a school of education at a British university:

Absolutely scandalous! Not because teacher research shouldn’t be respected, nor because EEF is infallible & above criticism, but because Gibb opts for a mere blogger whose views fits his own, rather than beginning with a professional research outfit who just might be competent.

To make this absolutely clear, I am outraged at where this politician chooses to focus his attention. I am not saying that EEF is always right, and I am not saying that no blogger can ever offer a valid critique. But my focus is on the politician in this instance, not the blogger.

Those seeking evidence-informed teaching surely focus on professional research! If EEF or any other set of researchers offer flawed work then it should be critiqued-usually by other professional researchers. Why on earth is Gibb spending time on any blogs in this connection?

Of course politicians should listen to teachers.. it’s scandalous that they haven’t for years! But Gibb looks to blogs for academic research – for sure some bloggers are strong researchers, but shouldn’t Gibb’s first port of call be professional research in professional contexts?

…. it’s a matter of where Gibb should be spending his time and attention. And *normally* complex robust research needs the kind of professional space and vetting that academic publishing should provide, rather than a blog.

I agree that politicians are entitled to ideology.. and indeed educators are so entitled too. What is unacceptable is muddling up what research allegedly supports with ideology, and attempting to give the result of the muddle some kind of official status

If Gibb has found academic research that he sincerely believes to expose flaws in EEF research, then that is what he might helpfully highlight in the usual way that we use to refer to such research, rather than a blog.. don’t you think?

It’s not me you should be thinking about, but one of our Education Ministers, and whether you think he would be guilty of ‘academic snobbery’ if he first looked to professional researchers rather than bloggers so as to inform his education policies.

Gibb knows that potentially his comment can be seen by many teachers and parents, and may reasonably be used by citizens as providing clues as to ongoing educational policy development. He’s not unaware of the possible consequences of his posts

From somebody who had previously been a PGCE lecturer:

Sir. Do you not think that in your position you should be a little more impartial in sharing your views? I increasingly believe that you share only only views that support your own beliefs and that you dismiss evidence that contradicts those.

… to give your backing to one controversial blogger who is renowned for being rude and dismissive and less that accurate in his assumptions is not compatible with a Schools Minister’s role…

The one marvellous thing about having met with you and been on the receiving end of some of your own prejudiced and unfounded attacks but now retired is that I can question with courtesy but without fear.

I am not actually even expecting neutrality. Just reflection and balance which – and I may be wrong – is not neutrality but a mature, professional and more honest way of working with teachers and formulating Policy..

But I think you have to be a little bit more careful, researched and balanced when you hold high office. You cannot pick and choose according to personal prejudice.

No one was actually criticising content. Not in my [timeline]. No one was criticising [Old Andrew] personally (tho I openly feel his attacks and use of language are sometimes inappropriate which can damage his credibility) but many are concerned that Gibb will not look beyond his own beliefs.

This is about a Schools Minister who In my first hand experience makes Policy based on personal prejudice and those of his own unofficial advisers. This is what troubles me. He is blinkered beyond what is acceptable in his position.

It was really about questioning the selective way in which a Schools Minister promotes some bloggers and ignores and dismisses others and many teachers and academics. Personally I am interested in fairness, balance, accuracy and courtesy.

From “Senior Associate Innovation Unit” with an OBE for services to education (no evidence was given as to how it was cherry picking):

The responses to this tweet from people who actually teach, or research, for a living, should be concerning. You seem incapable of impartiality when it comes to people who say ‘evidence shows’. So don’t be surprised when people cynically dismiss your cherry-picking

From a senior lecturer in primary mathematics education (and no she did not give any evidence of cherry picking when asked):

If cherry-picking at/of research to support a pre-formed position, refusing to acknowledge a clearly-stated population focus, and claiming rigorous research to be ‘really pretty terrible’ with no justification is “excellent engagement” I’m teaching my students all wrong…

From a former lecturer in media and education (possibly paraphrasing others, but elsewhere they said the concerns were “entirely valid”):

As I understand it …. the issues are twofold. First, the seemingly special relationship between a blogger and a schools minister. Second, how this relationship elevates personal research (however informed this might be) above peer-reviewed/professional?

Okay, I admit, I have cherry picked the responses. Several educationalists were far more positive (particularly those more on the quantitative side of things), but I think the above suggests a certain tendency within the education establishment to see teachers as very much the junior partners in the enterprise of finding out what works in education, or letting politicians know about it.


For quite some time now, if I mentioned my support for setting, people would refer me to the EEF toolkit. This supposedly neutral source looked at the meta-analyses and found that setting or streaming had a negative effect, supported by evidence of “moderate strength”. In fact, the EEF found a negative effect size of -0.09. Leaving aside all issues related to whether this is a good way to evaluate the issue, this was a surprising result given that John Hattie, in his book Visible Learning, had also looked at the meta-analyses, and found a positive effect of 0.12.

I explored this in this blogpost and found that the toolkit referenced the following meta-analyses.

It was noticeable that this seemed to hinge on the three meta-analyses that found a negative result. There were issues with all 3, but the biggest issue was that the most negative of the effect sizes, Gutierrez, R., & Slavin, R. E.’s (1992) finding of -0.34, was wrong. They actually found a positive effect size of 0.34. Now one would assume that this would make a huge difference to the -0.09 figure. I pointed this out.

However, setting and streaming is a unique Toolkit topic in having meta-analytical evidence showing a split positive/negative impact depending on pupils’ level of attainment. Presenting an overall average, which is the usual Toolkit approach when there is evidence of varying effects, would mask the fact that the impact was actually negative for lower attaining pupils, who are disproportionately from disadvantaged backgrounds.

Indeed, in the first version developed by Durham University for the Sutton Trust as the Pupil Premium Toolkit (2011), the impact estimate was presented as “+1 / -1”, in order to communicate this variability of impact according to pupil attainment. This approach had merits, but risked being confusing for teachers and senior leaders.

We therefore present the headline estimate for low-attainers, and then use the Toolkit text to explain the variation (and clearly state the source of the estimate). This is something we continue to review as we develop the Toolkit.

In the interests of transparency, by the way, we do want to highlight that we have corrected a ‘typo’ in the online version of the Toolkit – one of the meta-analyses referenced in the technical appendix has been incorrectly shown as -0.34 when it should have read +0.34. Our thanks to Andrew Old, whose blog highlighted the mistake. We’re happy to reassure users of our Toolkit that this was a transposition error only and so does not in any way affect the impact figures we report on setting and streaming.

It took me a while to work out what they were claiming, as it seemed so unlikely and the implications so ridiculous. It appears to be that:

1. The -0.09 figure is only based on the figures for low attainers.

2. This was clearly stated.

3. This would mean that the mistake with the -0.34 figure would not affect the result.

4. It would also mean the -0.09 figure was even more ridiculous than I had thought.

Considering the first point, if the figure for setting and streaming was based only on the figures for low attainers, then we are talking about evidence from only 2 meta-analyses. One of these, the one with the larger negative effect size, (Lou et al, 1996), was not about setting or streaming; it was about grouping children within classes, a form of mixed ability teaching. Which leaves us with one negative effect size of -0.06 (described by the author of that study as “close to zero”) and an effect size of -0.12 that isn’t actually for setting or streaming, being combined to get -0.09 for setting or streaming. We have a completely bogus figure which is rated as evidence of “moderate strength” against setting and streaming, while ignoring all the evidence that found positive effect sizes. And this has been achieved by cherry-picking the data to ignore studies which did not specify “low attainers”.
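To check the arithmetic, here is a minimal Python sketch. It assumes the headline figure is a simple unweighted mean of the two low-attainer effect sizes. That is my assumption: the EEF does not say how the figures were combined, though the numbers are consistent with it. The labels in the dictionary are mine, for illustration only.

```python
from statistics import mean

# The two low-attainer effect sizes discussed above.
# Assumption: they are combined with a plain unweighted mean.
low_attainer_effects = {
    "setting/streaming meta-analysis ('close to zero')": -0.06,
    "Lou et al. (1996), within-class grouping": -0.12,
}

# Unweighted mean of both figures, rounded to two decimal places.
headline = round(mean(low_attainer_effects.values()), 2)

# The same calculation using only the figure that is about setting/streaming.
relevant_only = round(
    mean(v for k, v in low_attainer_effects.items() if "Lou" not in k), 2
)

print(f"both meta-analyses: {headline}")
print(f"setting/streaming only: {relevant_only}")
```

Under that assumption, the two figures do average to -0.09, and dropping the within-class grouping study leaves only the -0.06 that its own author called "close to zero".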

There are six meta-analyses suggesting that setting or streaming appears to benefit higher attaining pupils and be detrimental to the learning of mid-range and lower attaining learners. [my emphasis]

Within two weeks of that blogpost, the EEF released a (really pretty terrible) report on maths teaching, which included a section on ability grouping which repeated the -0.09 figure and claimed:

The EEF ‘Setting or Streaming’ toolkit draws on six meta-analyses (in addition to a range of single studies and reviews).

I did look at the original version of the Toolkit from 2011 mentioned in the blogpost. This cleared up one of the things that had most confused me, namely, how some unusually sensible academics from the CEM at Durham could have been involved in such flawed figures. The answer is that, while they were in my view too negative about ability grouping, they did, as described above, give two figures, a positive one for the average effect of setting (one that actually agreed with Hattie) and a negative one for low attainers. They also referred to “ability grouping” not “setting and streaming”, making the inclusion of Lou et al (1996) less of an obvious mistake.

So to summarise the origins of the negative effect size for setting and streaming in the EEF toolkit, we appear to have the following events:

1. Researchers in 2011 find a positive effect size for “ability grouping” but a negative effect size for low attainers from the 2 meta-analyses which specified low attainers. They report both figures. They include within class ability grouping in their studies, rather than just studies of setting and streaming.

2. This result is subsequently attributed to “setting and streaming” not “ability grouping” on the EEF website despite the figure for low attainers being dependent on a study which didn’t look at setting.

3. Also subsequently to point 1, the positive effect size from the 6 studies is removed and only the result for low attainers reported as a headline figure, without making it clear that only 2 meta-analyses (only one of which was relevant) were used for this figure.

4. The EEF website continues to claim incorrectly that the figure is based on 6 meta-analyses.

5. A “typo” which adds an extra larger negative effect size to the 6 studies, serves to conceal the fact that the figure does not reflect the 6 studies it is attributed to.

6. The EEF blog inaccurately claims that they “clearly state the source of the estimate”. Within a fortnight of this they publish a report claiming again to have used all 6 studies.

Now, to be honest, every one of the errors listed above could be an honest mistake. But every single one of them either misrepresented the research on setting and streaming, or obscured the source of the claims made. And that’s quite a few mistakes in the same general direction to have been made without some form of bias over the issue of setting, as well as a lot of carelessness.

I think the time has come for the EEF to admit they have really screwed up here, and misled a lot of people. I have heard their incorrect figure quoted in schools. I have seen it quoted in blogs. I have seen it quoted on Twitter. It is probably the most widely publicised result they have. And it’s wrong. And they still include it on their website. It needs to be withdrawn and replaced with the positive effect size of 0.12 for ability grouping that their researchers actually found.