How much do you weigh? I had a fascinating philosophical discussion today at Aviation House, home of everyone’s favourite bogeyman, Ofsted. Sean Harford was quite vociferous on the subject of weight after a sumptuous feast, although Mike Cladingbowl was a little more circumspect about his view of the merits of weighing oneself after a PIXL-style experience. Philip Moriarty and I had been invited into Ofsted’s HQ following discussions we had had online with Sean Harford and members of the RAISEonline team. There were no biscuits, but I baked Lemon Muffins. They were great, if I say so myself.

So, how much do you weigh? Sorry if it seems to be a personal question, but, well, think about the questions we ask of children, which are also a bit of an invasion of personal privacy. We subject children to tests and assessments at earlier and earlier ages, and they don’t get much of a say in the matter. You do, and it’s worth considering what your answer might be. Philip, who is a physicist, made the important point that the instrument used for measuring your weight matters. Not all weighing scales are accurate to the same degree, for example. Most can indicate more than ‘very light’, ‘light’, ‘medium’, ‘heavy’ and ‘very heavy’. And the conditions in which you were weighed would matter too, especially if you had access to an Anti-Gravity chamber. We discussed fuzzy numbers, and the fact that a measurement reduced to a number is inherently uncertain. Your weighing scales might be inaccurate, either because of bias or random error. We also discussed the point in time at which you were measured. Clearly, your weight after an enormous blow-out celebratory meal would be different to your weight if you had trained for a marathon. So what is your weight? Well, your weight is around about, mostly, sort of, about, well… about what it is. Sometimes more, sometimes less. Depends. What’s more, if you weighed all the people on, say, your street at any given time and found a mean weight, it would change the next time you weighed them all. That’s because samples or group means vary around what might be called their true score. Although if your street was full of people under the age of, say, 18, the concept of a true score for mean weight loses quite a bit of meaning. You can see where I’m going here, I hope. We discussed the fact that weights change, and measurement is troublesome, and means change all the time. Particularly with children. We discussed the presentation of data on student attainment and achievement too.
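For anyone who likes to see this sort of thing in numbers, here is a rough sketch in Python of the street-weighing idea. Every figure in it is made up for illustration (a 75 kg average, about a kilo of day-to-day wobble, a scale that reads 0.3 kg heavy with half a kilo of random error); none of it comes from real data or from anything Ofsted or RAISEonline actually does. It simply weighs the same imaginary street several times and shows that the mean moves each time, and moves more when the street is small.

```python
import random
import statistics


def weigh_street(true_weights_kg, rng, daily_sd_kg=1.0,
                 scale_bias_kg=0.3, scale_sd_kg=0.5):
    """Return one set of scale readings for everyone on the street.

    Each reading = true weight + day-to-day fluctuation
                 + systematic scale bias + random scale error.
    All default values are illustrative assumptions.
    """
    return [
        w + rng.gauss(0, daily_sd_kg) + scale_bias_kg + rng.gauss(0, scale_sd_kg)
        for w in true_weights_kg
    ]


def repeated_means(n_people, n_weighings, seed=0):
    """Weigh the same imaginary street several times; return each mean."""
    rng = random.Random(seed)
    # Hypothetical "true" weights, fixed across all weighings.
    true_weights = [rng.gauss(75, 12) for _ in range(n_people)]
    return [
        statistics.mean(weigh_street(true_weights, rng))
        for _ in range(n_weighings)
    ]


def spread_of_means(n_people, n_weighings=200, seed=0):
    """Standard deviation of the mean across repeated weighings."""
    return statistics.stdev(repeated_means(n_people, n_weighings, seed))


if __name__ == "__main__":
    for mean in repeated_means(n_people=30, n_weighings=5):
        print(f"mean weight of a 30-person street: {mean:.2f} kg")
    for size in (10, 30, 100):
        print(f"{size:>3} people: means wobble by about "
              f"{spread_of_means(size):.2f} kg between weighings")
```

Nobody on the imaginary street has actually changed, yet the mean is different every time, and the smaller the group, the bigger the wobble.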
In the world of Ofsted and RAISEonline, attainment is the score children were given in a test or by a teacher’s assessment. Both have serious problems. Achievement is the progress children have made compared to all the other children for whom the powers that be have scores, either as a whole or in smaller groups. It turns out that what we know now as the Data Dashboard was developed at Ofsted’s behest because RAISEonline is so difficult for non-statisticians to read. Governors struggle to understand what RAISE says about attainment and achievement, and the dashboard was an attempt to simplify the data. It was called the Governor Dashboard for a while, but then RMFFT developed their Governor Dashboard just to confuse things. With his experience as a parent governor, Philip was interested in the presentation of data in the Dashboards, so I’ll leave comment on that to him. I’m more interested in the way that schools have been compared to each other in RAISEonline and how this is used by Ofsted. To their credit, Mike Cladingbowl and Sean Harford were united in their strong defence of Ofsted’s use of data, and they are adamant that Inspectors haven’t used data to make their minds up before they go into schools. I’m still unconvinced, as we have plenty of evidence that that’s exactly what they do, whether they are aware of it or not. This is the Halo Effect in action, and it’s a big problem when trying to work out what might explain the success – or failure – of any organisation. So, back to your weight. Let’s say that I put you in a school, and the powers that be decide that by finding the mean weight of your class on particular days of the year, they can decide whether your school is any good or not. The rules are fairly well established. Higher mean scores are A Good Thing and lower ones are Not Good. You can imagine what would happen.
And let’s assume, for the sake of argument, that any given school was compared both collectively and in smaller groups against figures for all of the population for whom we can gather the data. We could also compare the mean for this school year with other school years. We could even draw little graphs of the ‘trend’ in mean weight. Mike Cladingbowl, to his eternal credit, was at pains to point out that he and Ofsted are aware that prior attainment is a huge factor in test scores. Taller, bigger children clearly have an inbuilt advantage in the weight stakes. Sean Harford acknowledged the effect of tutoring on attainment in school, and discussed a school which he had inspected which had good attainment and, in his view, underperforming children who were being supported by their external tutors. Mike and Sean were not the only people in the meeting with us. We were joined by Those Who Shall Not Be Named. They were very nice, by the way, but they don’t need to be identified. One person was from the DfE’s statistics team, one from the RAISEonline team and a third appeared in voice only from Mike Cladingbowl’s phone. This was very useful, as these were real live statisticians. What’s more, they had read my blog. And even more than that, they could see the points I have been making about the dubious use of statistics in data crunching exercises such as RAISEonline. I can’t say too much, other than to say that I expect to see changes in the future. The weight of children in a class represents that class and nothing else, and can’t be said to be drawn from a wider population, because schools are located within their geographical region and admit children according to their own criteria. Weight is a function of wider factors located with the child, and the feeding and exercise regime drawn up by a given school has a marginal impact which is similar to the marginal impact added by other schools across the country.
Both are drowned out by the pupils’ heights, weights, lifestyles, home life and so on, and on. The measurement of children is fuzzy, and the numbers can’t really be compared to anything other than themselves. I hope that those in power take note of this. Schools need support, not condemnation for factors outside of their control. The fact that one school gets lucky with a cohort's capacity to gain weight says little about schools which do not, especially in an era of increasingly homogenised schools all trying to please those who judge them. Mike Cladingbowl, who is leaving Ofsted at the end of the year to return to the chalkface, mooted the idea that Inspectors would not be given access to Attainment data until the end of the first day of an inspection. This would be a move forward, as Ofsted – if it is to gain any credibility within schools – has to move away from basing inspection on information which is shown to be subjective or simply wrong. This has happened with the grading of lessons to overwhelmingly positive feedback so far. It would be a step towards moving away from using data and trusting the inspection regime to take schools as they see them rather than what the data is ‘telling’ them. Sean Harford, who is taking over from Mike Cladingbowl, put forward the idea of running control inspections by more than one team of Inspectors. He has discussed this previously, and it would potentially help to reduce the wayward subjective input of a single inspection team. At the very least, a pilot to test the validity of Inspectors’ judgements would be useful. The Nameless Ones asked what we should do about ‘data’. My view is that schools all work hard with the children we have in front of us. We teachers do our level best to treat children as individuals, and differentiate according to need. We should do the same for schools, inspecting without fear (of bad judgements based on spurious negative Halo Effects) or favour (good or bad, based purely on Halo Effects).
Our children, whatever their weight, deserve that.

The weight analogy is very powerful. I think that in their heart of hearts everyone in the know knows that the data doesn't/can't deliver what is really required (objective, sharp data to allow effectiveness to be discerned and compared). Once high stakes accountability appears on the scene then things become even more problematic. Government is very used to making the best it can of incomplete data, and incomplete ways of interpreting that data. It is often required in order to actually make policy decisions. But, in this case, the weight of negative unintended consequences has got way too heavy. These are seriously and significantly undermining the prime purpose of the school system. Drive through the full consequences of the deficiencies of using the data currently collected, and the result will be an abandonment of the summative accountability system (most of this data is only of use for formative purposes) and the demise of the high stakes accountability approach. Are Ofsted and DfE really ready for that outcome?

Viv

25/10/2014 11:00:58 am

As a school governor I want to be able to trust that official data is telling me something worthwhile and I therefore really welcome RAISEonline/Ofsted/FFT statisticians engaging in the sort of dialogue you have clearly been having. Thank you for all your efforts and I look forward to hearing from officialdom about the changes you refer to.
Small schools do not have the pupil numbers to provide reliable statistical samples in a year group (sometimes even when combining data from several years) yet we are expected to drill down even further to analyse performance of groups – splitting our already small sample sizes to quite ridiculous levels. Schools up and down the country are tempted to mistake natural statistical variation for ‘trends’ and then waste time and resources on misdirected action plans.
An Ofsted clarification paper warning of the dangers of reading too much into data would be an excellent educational tool for school leaders, including governors; advisors; even inspectors?

Dan

4/11/2014 04:38:02 pm

I am a stressed out Teacher (secondary) and have been reading this and your other blogs with interest - and I think I understand the statistics points you have been making. It is wonderful to read that what has up to now been my uncomfortable gut feeling that "this can't be right" has a basis in mathematical orthodoxy.

What interests me now is this: if (as you hint) those overseeing our education system may be preparing to back away from current practices in the use of children's test score data to judge, what consequences will there be for other data-driven initiatives, for example, at school level, the focus on "Pupil Premium" performance gaps and, of vital importance to teachers, the (so-called) "performance related pay"?

It would be marvellous if this blog could "go viral" and ignite a more widespread discussion, which may lead to a return to sanity in the profession. Many thanks for your work, it's given me renewed hope.

Harry Fletcher-Wood

26/11/2014 06:09:44 am

I'm really glad this meeting happened and there are 'real live statisticians' reading your blog. It looks like you were able to raise a number of things that might make it easier for schools and teachers to do their jobs properly: I await results with interest!

Author

Me?
I work in primary education and have done for ten years. I also have children in primary school. I love teaching, but I think that school is a thin layer of icing on top of a very big cake, and that the misunderstanding of test scores is killing the love of teaching and learning.