‘Reporting and Analysis for Improvement through school Self-Evaluation’, universally known as RAISEonline, is the main data tool used by schools and Ofsted. It requires improvement. Although it aims to “help schools and inspectors see how effectively a school is performing in terms of the achievement of its pupils,” this task is made more difficult by the misunderstanding and misuse of basic statistical concepts. Following a meeting I had with Sean Harford and various statisticians from the RAISEonline team and the Department for Education, here are my initial suggestions for the steps required to reform RAISE.

1) Remove SIG+ and SIG-

RAISEonline misuses a cornerstone of classical statistics, the significance test. This has always been a controversial procedure, and it is much misunderstood even by the academics who use it. Tests of statistical significance require random samples drawn from populations. RAISE does not draw random samples, and it should not use significance testing.

The explanations of significance testing in RAISE are misleading and often completely wrong. In the current version of RAISE, readers are told that, “In RAISEonline, green and blue shading are used to demonstrate a statistically significant difference between the school data for a particular group and national data for the same group. This does not necessarily correlate with being educationally significant. The performance of specific groups should always be compared with the performance of all pupils nationally.” The key phrase used to say, “Inspectors and schools need to be aware that this does not necessarily correlate with being educationally significant.” But even this does not make clear how different statistical significance is from everyday significance. Everyday significance roughly translates as ‘important’; statistical significance does not mean ‘importance’.
Given the general misunderstanding of what tests of significance actually indicate, it is no surprise that RAISE struggles to make statistical significance clear. Sentences such as “In many tables, green or blue shading is used where school results are statistically significantly above or below the national figure”, “Statistical significance tests have been performed on the data using a 95% confidence interval. Where the school value differs significantly from the corresponding national value for this group, sig+ or sig- is shown. Where a school figure is significantly above or below that of the previous year an up or down arrow is displayed to the right of the figure” (in 2014 secondary school reports) and “School performance is significantly higher than the national VA figure for this group / School performance is significantly below the national VA figure for this group” (in 2014 primary reports) suggest that the concept of significance is not understood by those responsible for RAISE. It is little wonder that teachers, head teachers, governors and Ofsted inspectors struggle to understand that statistical significance does not equate to importance.

RAISEonline also uses the phrase “Relative Performance Indicators” in sections of the report, and this appears to be what many people take ‘sig+’ and ‘sig-’ to mean. Statistical significance does not mean this, and tests of significance should be removed from RAISEonline to prevent confusion. Schools’ numerical indicators may indeed be relatively high or low compared to national averages, but this is not ‘statistical significance’. Those using RAISE should draw their own conclusions about the relative performance of a school without being misled by bad analysis. ‘Sig+’ and ‘Sig-’, and their green and blue indicators, should therefore be removed from RAISEonline.
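To see why ‘sig+’ and ‘sig-’ mislead, consider a sketch of the kind of one-sample proportion test that shading like RAISE’s might rest on. This is not RAISE’s actual published method, and all the figures (a five-point gap over a 65% national rate, the 1.96 threshold for a 95% test) are illustrative assumptions. The point is that the same gap is flagged as ‘significant’ or not depending almost entirely on cohort size:

```python
import math

def sig_flag(school_rate, national_rate, n, z_crit=1.96):
    """One-sample z-test for a proportion at the 95% level -- a sketch
    of the style of test behind 'sig+'/'sig-' shading, not RAISE's
    actual method."""
    se = math.sqrt(national_rate * (1 - national_rate) / n)
    z = (school_rate - national_rate) / se
    return z, abs(z) > z_crit

# The same 5-percentage-point gap, two different cohort sizes:
z1, flagged1 = sig_flag(0.70, 0.65, 60)    # one cohort of 60 pupils
z2, flagged2 = sig_flag(0.70, 0.65, 2000)  # a notional cohort of 2000

print(flagged1)  # False: not 'significant'
print(flagged2)  # True: 'significant'
```

The gap, and whatever educational meaning it carries, is identical in both cases; only the cohort size changes the flag. And even setting that aside, the test presumes a random sample from a population, which a school cohort is not.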
2) Remove the word ‘trend’ and present data for at least seven years

As I have discussed with the DfE and RAISE representatives, schools are comprised of independent cohorts drawn from specific populations. It therefore makes no sense to label performance indicators as representing a ‘trend’. The test scores for each cohort are unique to that cohort and represent nothing but the cohort itself. Prior attainment and pupil effects account for more than 90% of an individual’s measured test scores, so comparing cohorts of individuals across years makes no sense: the cohorts are independent. Data should not be presented in line graphs which imply that the performance indicators from one school year are related in any way to those of the previous or following years. It would make no sense to compare the 7 year olds a GP sees in one calendar year with the previous year’s 7 year olds. The same logic applies to school year groups. They are not dependent on each other and should not be presented as if they were.

2014 RAISE reports state that “all tables show three-year trends, so the extent to which gaps are closing may be seen.” This is highly misleading. Three years is too short a period to show that individual school years are independent, and it leads to incorrect conclusions being drawn by RAISEonline users. Additionally, children spend seven years in the majority of schools, not three, and this should be the minimum period shown for multiple year groups. Data should be presented for the past seven years as a minimum, in order for more accurate assessments of the differences between cohorts to be made.

3) RAISEonline users should understand the implications of ‘average’

In RAISEonline, ‘average’ indicates the numerical mean. Roughly half of all schools are therefore, almost by definition, below average. Yet RAISEonline asks users to consider questions such as: “Is absence below average?
How much is it diminishing? Is the proportion of persistent absentees below average? Is it falling? Are levels of exclusion below average? Is attainment above average? Is progress above average (1000)? Is attainment across each subject family or cluster, such as science, above average?” These questions imply that a given school should expect to be either below average or above average in certain areas. A more honest approach would be to ask where a school appears to sit compared with an ‘average’ or ‘typical’ school, and the neutral meaning of ‘average’ in the context of RAISE should be made clear. Since most schools are, by definition, close to average, the implications of this should be made much clearer in RAISEonline summary reports.

There are further recommendations I will make in future posts. These are the initial areas which I suggest that those responsible for RAISEonline should address. As ever, all comments on the suggestions I have made are most welcome.


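The cohort-independence argument in suggestion 2 can be illustrated with a short simulation. All figures here are invented for illustration: seven yearly cohorts of 30 pupils, each an independent draw from one unchanging population with mean score 100 and standard deviation 15:

```python
import random

random.seed(1)

def cohort_mean(n=30, mu=100, sigma=15):
    """Mean score of one cohort: an independent draw of n pupils
    from the same underlying population every year."""
    return sum(random.gauss(mu, sigma) for _ in range(n)) / n

# Seven consecutive 'years' from a school that has not changed at all:
seven_years = [round(cohort_mean(), 1) for _ in range(7)]
print(seven_years)
```

Because nothing about the school changes, any rises and falls across the seven values are pure cohort-to-cohort noise, and a line through any three of them can easily look like a ‘trend’. Showing seven or more years at least makes the cohort-to-cohort variation, rather than a spurious trajectory, the visible feature.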

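The arithmetic behind suggestion 3 is easy to demonstrate. The ten school-level percentage figures below are invented for illustration:

```python
from statistics import mean, median

# Ten illustrative school-level attainment percentages:
scores = [52, 55, 58, 60, 61, 63, 64, 66, 70, 78]

avg = mean(scores)
mid = median(scores)
below = sum(1 for s in scores if s < avg)
print(avg, mid, below)  # half of these schools sit below the mean
```

Being ‘below average’ is the expected position for roughly half of all schools, so questions of the kind RAISE asks treat a near-inevitable statistical position as a cause for concern.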

There is a long and involved discussion about significance tests, with lots of useful links, here: https://twitter.com/Jack_Marwood/status/564886308666245120

James Pembroke has also written a useful post on significance testing in RAISE here: http://sigplus.blogspot.co.uk/2015/02/50-shades-of-green.html

Jack Marwood

23/7/2015 02:07:41 am

The issue of statistical significance is slowly moving out into the open, and this post from Dave Thompson at the FFT is a useful, grown-up contribution to the debate:
http://www.educationdatalab.org.uk/Blog/July-2015/Significance-tests-for-school-performance-indicato.aspx#.VbCgbdB-iNP




Author

Me?
I work in primary education and have done for ten years. I also have children in primary school. I love teaching, but I think that school is a thin layer of icing on top of a very big cake, and that the misunderstanding of test scores is killing the love of teaching and learning.