This is the second in a series of three posts that explore the philosophy behind some of the Welsh curriculum reform proposals, in particular those relating to assessment (see Part 1). In this post we explore Successful Futures’ focus on formative assessment.

Assessment for Learning

Principle 7 of Donaldson’s 12 Pedagogical Principles states that ‘Good teaching and learning means employing assessment for learning principles’. This raises questions around the crucial distinction between assessment for learning and assessment of learning. The former is usually referred to as ‘formative’ (identifying pupils’ weaknesses in order to find appropriate next steps), whereas the latter is ‘summative’ (giving an account of what pupils have learned). Donaldson observes that the distinction lies not in the means of the assessment (e.g. verbal questioning vs. a written exam) but rather in the use to which it is put – hence the same assessment can be used either formatively or summatively. Navigating the differing emphases of these two forms of assessment has been the cause of much controversy and confusion in education policy. Indeed, the OECD has demonstrated how all the assessment systems it has surveyed struggle with the tension between their often-competing functions.

Recommendation 37 in Successful Futures advocates that ‘Assessment arrangements should give priority to their formative role in teaching and learning’. Following the popularisation of formative assessment practices by education professors Paul Black and Dylan Wiliam, the Department for Education (DfE) in England has attempted to implement various Assessment for Learning (AfL) initiatives, albeit with minimal long-term success. Daisy Christodoulou argues this is in fact a direct result of government involvement:

When government get their hands on anything involving the word ‘assessment’, they want it to be about high-stakes monitoring and tracking, not low-stakes diagnostics. That is, the involvement of government in AfL meant that the assessment in AfL went from being formative to being summative…

Donaldson contends this is precisely what’s happened in Wales, supporting the conclusion of a major OECD report on assessment that ‘…high-stakes uses of evaluation and assessment results might lead to distortions in the education process…’, and that ‘…it is important to design the accountability uses of evaluation and assessment results in such a way these undesirable effects are minimised’. Donaldson argues that because teacher assessments have been used explicitly for accountability at the end of the Foundation Phase, Key Stage 2 and Key Stage 3, their reliability is compromised and ‘there can also be serious adverse effects on the curriculum’. Consequently, recommendation 68 of Successful Futures proposes that teacher assessment should no longer be reported to the Welsh Government – a proposal understandably applauded by the profession. It’s also worth noting that Kirsty Williams has now explicitly stated her intention to implement recommendation 68 in her response to the Children, Young People and Education Committee in February this year.

The Assessment Reform Group defines Assessment for Learning as ‘the process of seeking and interpreting evidence for use by learners and their teachers to decide where the learners are in their learning, where they need to go and how best to get there’. The removal of teacher assessment as an accountability measure supports Donaldson’s prioritisation of this practice. However, removing the requirement to report evidence of learning to government does not automatically mean that schools will have a better AfL culture in their classrooms. Practitioners will continue to be held accountable to their school leadership and ultimately to Estyn – they will still be required to provide evidence that their pupils are progressing at the expected rate. Some argue that any system which grades pupils with Outcomes and Levels encourages a culture of data-tracking that distorts the education process in just the way the OECD warns against, although, as we shall see, this critique is by no means unproblematic.

A New Model for Tracking Progress?

In England the DfE removed National Curriculum levels altogether with the express aim of counteracting problematic assessment practices arising from their misapplication. However, levels haven’t been replaced and schools are now expected to devise their own assessment systems for monitoring pupil progress. This has led to a proliferation of systems that essentially offer tracking based on ‘levels with another name’, drawing considerable criticism from various education stakeholders. The recent Education Committee enquiry into primary assessment has illustrated the extent of the confusion schools face concerning assessment policy in England. The situation in Wales is more promising, however, in that it offers schools an opportunity to develop a standardised assessment system in partnership, drawing on the expertise of international assessment experts. A key question facing the curriculum ‘Pioneer Schools’ tasked with this is: how will Progression Steps and Achievement Outcomes differ from Outcomes and Levels so that the pitfalls associated with ‘best-fit’ models of assessment can be avoided?

Successful Futures goes some way toward anticipating the ways in which the proposed assessment arrangements will differ from the current system. Donaldson describes learning as ‘akin to an expedition, with stops, detours and spurts’ – consequently, Progression Steps should differ from levels in providing more of a ‘road map’ for children’s learning. Moreover, in contrast to levels that are ‘based on a best-fit judgement of overall attainment in a subject at a specific point in time’, Progression Steps should be seen as a ‘staging post’ rather than a ‘judgement’. They will ‘be reference points and not universal expectations of the performance of all children and young people at fixed points’. Whilst Donaldson’s semantics strongly imply the new measures will avoid the pitfalls associated with levels, it’s not exactly clear from the report how this will be achieved, especially since, as we would argue, the ‘distorting tendencies’ of assessment data aren’t intrinsic to levels per se, but are rather a consequence of their misuse. We know that Progression Steps are to be assigned to the same intervals as the current Phase/Stage transitions and that they’ll comprise ‘a range of Achievement Outcomes for each Area of Learning and Experience’, written from the pupils’ point of view. Beyond these specifications, we have scarce concrete information as to how Donaldson’s proposed measures will differ from levels in such a way as to minimise ‘distorting effects’.

Do Descriptor-based Criteria Distort Assessment Data?

Christodoulou argues that the ‘distorting effects’ of best-fit judgements using levels are a problem with descriptor-based assessment more generally. She cites various reasons why it’s difficult to get valid formative data from descriptors. Firstly, they’re based on descriptions of pupils’ performance at different stages, as opposed to an analysis of what we know about how particular skills develop (which is arguably more useful for formative purposes). Moreover, they tend to describe skills in a generic way, so that the descriptors can apply to a range of content that isn’t specified. As we saw earlier, Christodoulou argues that the specific detail within each subject ought to be the focus, and that skills need to be broken down into their smaller components to be taught effectively. Finally, she contends that because descriptor-based assessment doesn’t distinguish between short-term performance and long-term learning, it doesn’t allow teachers to make valid inferences about whether pupils need further instruction on a given topic. Christodoulou’s suggestions for improving the quality of formative assessments include teaching with textbooks, low-stakes diagnostic testing and ‘comparative judgement’ for assessing pupils’ work.

Maths education specialist Christian Bokhove critiques Christodoulou’s appraisal of descriptor-based assessment as failing to distinguish poor implementation of good policy from merely bad policy. Whilst Bokhove acknowledges the drawbacks of descriptor-based systems cited by Christodoulou, he’s less convinced that her proposed alternatives avoid comparable limitations. Moreover, he highlights her lack of engagement with research aimed at mitigating those drawbacks – something the Assessment Foundation has sought to implement with our assessment tools. Bokhove also sees Christodoulou’s pitting of more generic, descriptive feedback against detailed analysis as a false dichotomy – there’s no reason why good systems wouldn’t employ both. This brings us to a commendable aspect of Donaldson’s assessment proposals. As we’ve seen already, he’s very keen to avoid extremes when it comes to models of assessment that are often polarised by academics. It appears that where descriptor-based assessment has dominated assessment policy, it has been unhelpfully made to serve both formative and summative ends. Donaldson has proposed that teacher assessment not be used as an accountability measure precisely to mitigate distortions and shift the focus to formative applications, but crucially he calls for the new assessment arrangements to ‘promote the use of a wide range of techniques that are appropriate to their purpose’.

For example, Successful Futures emphasises the importance of both self- and peer-assessment. The former enables children to take more responsibility for their learning and equips them to become lifelong learners who can assess their own progress and discern their next steps. Peer-assessment is useful for pupils because it requires them to cultivate a deeper understanding of the nature of the learning they’re evaluating. Another emerging approach Donaldson commends is online ‘adaptive testing’. This has the advantage of dynamically adjusting the level of difficulty in the sequence of questions to account for patterns in responses, and as such lends itself well to formative applications.
To ensure these and other assessment techniques are applied consistently and coherently, Donaldson supports the OECD’s recommendation that ‘a nationally agreed assessment and evaluation framework’ be developed:

In line with the wider Review proposals, the framework should ‘…aim to align curriculum, teaching and assessment around key learning goals and include a range of different assessment approaches and formats…’. It should be clear about the formative and summative roles of assessment and distinguish between those activities whose place lies in learning and teaching and those that will contribute to self-evaluation, external accountability and national monitoring. In particular, it should explain how the components of the assessment framework address issues of validity and reliability in the methods used.

Having a unified framework for assessment policy across Wales is certainly a laudable prospect. Exactly what this will look like and how prescriptive it should be isn’t specified in Successful Futures. Hopefully the framework will offer clear guidance on mitigating the drawbacks of ‘best-fit’ assessment criteria that Christodoulou has highlighted. Yet as education commentator Phillip Dixon observes, it ‘has become something of a truism… that the problems that Welsh education faces are essentially to do with poor delivery’. With Strand 2 of the curriculum development process underway, the spectre of implementation looms, as we await the first drafts of the AoLEs.

In the next post we’ll consider some of the outputs of Strand 1 of the curriculum reform process, which is now complete. In particular we’ll examine the ‘interim report’ from the Assessment and Progression working group to see if it offers any advance on the questions raised up to this point.