Wilshaw’s Annual Report Reminds Us of Ofsted Weaknesses

It is the season of heart-warming tradition, joy, pre-Christmas sales and crass householders generating enough global warming for a whole town with their shameless lighting displays. So what better way is there to prepare for the holidays than unwrapping the education sector’s annual ticking-off, as Michael Wilshaw issues his Annual Report on Schools?

Wilshaw, forever fighting the urge to tell us how he single-handedly turned his Hackney school into an outstanding beacon of excellence by recruiting middle class students from out of town, this year ripped into schools with small sixth forms. Apparently, students attending small school sixth forms “achieve considerably poorer results than those in larger sixth forms”.

Very Small Sixth Forms

Wilshaw said:

Outcomes of school sixth forms are very similar to those for sixth form colleges: between 92% and 93% of their academic cohorts attain at least two substantial level 3 qualifications (such as A levels). However, not all school sixth forms offer a high standard of education.

So school sixth forms do well compared to the sixth form colleges, but there is something missing from the analysis: any statistical data.

I could find no data on school sixth forms anywhere on the Ofsted website, short of trawling through thousands of individual inspection reports: nothing on how many there are, which grades they achieve, or how those grades depend on the subjective factors Wilshaw mentions. In short, there is no way to verify his claim. And his history as a government political appointee doesn’t make me want to trust him.

But is there any reason to reject, rather than just ignore, his conclusions that very small school sixth forms are generally poor? Well, yes, there is.

Multiple Comparisons

When your primary measure turns up nothing of interest to you (“School sixth forms do as well as sixth form colleges”) then the temptation is to analyse sub-groups, to see if some poorly performing categories are hiding in the data. Such data mining is hazardous, since any statistical test gets weaker when you start doing multiple comparisons.

The principle is straightforward. Imagine you roll two dice. What is the chance of rolling two sixes in the first throw? It is 1/6 for each die, so (1/6) x (1/6) = 1/36, or about 2.8%. If you do indeed roll two sixes then you can legitimately suspect that the dice are loaded, since there should have been a 97.2% chance of not rolling two sixes. But if you rolled ten pairs of dice, and one came up with two sixes, does that suggest that pair is loaded? No, since there was roughly a one in four chance that at least one of the ten pairs would do so, which is far too likely to count as suspicious.
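The dice arithmetic is easy to check. The following is a minimal sketch, assuming fair and independent dice; the function names are made up for illustration.

```python
from fractions import Fraction


def p_double_six():
    """Probability that one fair pair of dice shows two sixes."""
    return Fraction(1, 6) * Fraction(1, 6)


def p_at_least_one_double_six(pairs):
    """Probability that at least one of `pairs` independent pairs shows two sixes."""
    p_miss = 1 - p_double_six()      # a single pair does NOT show two sixes
    return 1 - p_miss ** pairs       # complement of "every pair misses"


single = p_double_six()
ten = p_at_least_one_double_six(10)

print(f"one pair:  {float(single):.1%}")
print(f"ten pairs: {float(ten):.1%}")
```

The single-pair figure is exactly 1/36, while the ten-pair figure comes out at a little under one in four.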

If you would accept a 5% chance of occurrence as unlikely then you can consider it significant if that observation actually occurs. However, if you then repeat the experiment twenty times then you would expect at least one unlikely observation to happen more often than not. If you delve into a statistically insignificant data-set to look for significant sub-groups, then you must require much smaller significance levels, for example 1% for each of five additional tests. And it is very hard to reach that level of significance with ever smaller sub-group sample sizes.
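The same reasoning can be put into numbers. This sketch assumes the tests are independent, and it uses the simple Bonferroni rule of dividing the significance level by the number of tests; neither assumption comes from Ofsted's report, and the function names are illustrative only.

```python
def familywise_error_rate(alpha, tests):
    """Chance of at least one false positive across independent tests at level alpha."""
    return 1 - (1 - alpha) ** tests


def bonferroni_threshold(alpha, tests):
    """Per-test significance level that keeps the overall rate at or below alpha."""
    return alpha / tests


# Twenty tests, each at the 5% level
print(f"{familywise_error_rate(0.05, 20):.1%}")

# Five sub-group tests sharing an overall 5% level
print(f"{bonferroni_threshold(0.05, 5):.1%}")
```

With twenty tests at 5% the chance of at least one spurious "finding" is well over half, and sharing a 5% level across five sub-group tests gives the 1% per-test threshold mentioned above.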

Sub-group sample size

Only a small proportion of school sixth forms will be very small, so it is unlikely that the differences from the rest will reach statistical significance in their exam results. This has been noted by others already. Sam Freedman, director of research at Teach First and a one-time Gove adviser, said that the “certainty and confidence” of the inspectorate’s claims were “simply not justified by the available evidence”.
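A rough sketch of why small groups are a problem: the uncertainty in an observed pass rate grows as the group shrinks. The 92% rate is the figure quoted from the report; the group sizes are purely hypothetical, and the normal-approximation standard error is only a crude guide at small sizes.

```python
import math


def standard_error(rate, n):
    """Normal-approximation standard error of a proportion observed in a group of size n."""
    return math.sqrt(rate * (1 - rate) / n)


rate = 0.92                      # success rate quoted in the report
for n in (20, 200, 2000):        # hypothetical group sizes, not Ofsted data
    se = standard_error(rate, n)
    print(f"n={n:5d}  standard error = {se:.1%}")
```

A tenfold drop in group size widens the uncertainty by a factor of about three, so a gap that looks convincing in a large group can easily be noise in a small one.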

A final factor is that GCSE grades are not an independent variable that can be reliably used in these analyses. The norm referencing that is used in the setting of grade boundaries, whilst being a good thing for a public examination system, tends to cap the overall attainment of students. It effectively prevents the average from rising, so that if some schools successfully game the exams, with a focus on cramming and exam prep, then other schools will have their results depressed. It doesn’t take much imagination to believe that very small sixth forms will not have the exam experience to be in the lucky former group.

Wilshaw’s Conceit

Michael Wilshaw is convinced that he has a unique insight into the purpose and nature of the nation’s education system and that his intuition is, by definition, the correct one. With this viewpoint he is not willing, or able, to look deeply into the statistics to find evidence that doesn’t conform to his preconceptions and political ambitions. His report is notable, along with the entire Ofsted data website, for the absence of any education research by university academics. What does he fear from a clear independent look at the system?