This page contains the findings of systematic reviews undertaken by review groups linked to the EPPI-Centre.

Assessment by teachers

Reliability

High-weight evidence was found for the following:[1]

The reliability of portfolio assessment where tasks were not closely specified was low.

The finer specification of criteria, describing progressive levels of competency, has been shown to be capable of supporting reliable teachers' assessment (TA) while allowing evidence to be used from the full range of classroom work.

Studies of the National Curriculum Assessment (NCA) for students aged 6 and 7 in England and Wales in the early 1990s found considerable error and evidence of bias in relation to different groups of students.

A study of the NCA for 11-year-olds in England and Wales in the later 1990s shows that the results of TA and standard tasks agree to an extent consistent with the recognition that they assess similar but not identical achievements.

The clearer teachers are about the goals of students’ work, the more consistently they apply assessment criteria.

When rating students' oral proficiency in a foreign language, teachers are consistently more lenient than moderators, but are able to place students in the same rank order as experienced examiners.

Validity

High-weight evidence was found for the following:[1]

Teachers' judgements of the academic performance of young children are influenced by the teachers' assessment of their behaviour; this adversely affects the assessment of boys compared with girls.

The introduction of TA as part of the national curriculum assessment initially had a beneficial effect on teachers' planning and was integrated into teaching; in the later 1990s, however, collaboration among teachers and the sharing of interpretations of criteria declined as support for TA fell away and the focus changed to other initiatives.

The validity of a science project as part of 'A' level examinations for assessing skills different from those used in regular laboratory work was reduced when the project assessment was changed from external to internal by teachers.

Teachers' judgements guided by checklists and other materials in the Work Sampling System were found to have high concurrent validity for assessment of kindergarten (Kg) to Grade 3 students.

Teachers' judgements of students' performance are likely to be more accurate in aspects more thoroughly covered in their teaching.

Conditions that affect reliability and validity [1]

There is bias in teachers' assessment relating to student characteristics, including behaviour (for young children), gender and special educational needs; overall academic achievement and verbal ability may influence judgement when assessing specific skills.

Both the level of TA and the difference between TA and standard tests or tasks vary in ways that are related to the school. The evidence is conflicting as to whether this variation is increasing or decreasing over time. There are differences among schools and teachers in approaches to conducting TA.

There is no consistent pattern suggesting that assessment in one subject is more or less reliable than in another.

It is important for teachers to follow agreed procedures if TA is to be sufficiently dependable to serve summative purposes. In efforts to increase reliability, there is a tension between closer specification of the task and of the conditions under which it is carried out, and closer specification of the criteria for judging performance.

The training required for teachers to improve the reliability of their assessment should involve teachers as far as possible in the process of identifying criteria so as to develop ownership of them and understanding of the language used. Training should also focus on the sources of potential bias that have been revealed by research.

Teachers can predict with some accuracy their students' success on specific test items and on examinations (for 16-year-olds), given specimen questions. There is less accuracy in predicting 'A' level grades (for 18-year-olds).

Detailed criteria describing levels of progress in various aspects of achievement enable teachers to assess students reliably on the basis of regular classroom work.

Moderation through professional collaboration is of benefit to teaching and learning as well as to assessment. Reliable assessment needs protected time for teachers to meet and to take advantage of the support that others, including assessment advisers, can give.

Impact on students [2]

When teachers' assessments are used for external purposes, there is high-weight evidence for the following:

Older students respond positively to summative assessment of their coursework by teachers, finding the work motivating and being able to learn during the assessment process.

Students need more help, in the form of better descriptions and examples, to understand the assessment criteria and what is expected of them in meeting these criteria.

The impact depends on the high-stakes use of the results.

The impact will be affected by the way teachers interpret their roles as assessors and by their orientation towards improving the quality of students' learning or maximising their marks.

When teachers' assessments are used for internal purposes, there is high-weight evidence for the following:

Feedback from earlier assessment impacts on the effort that students apply in further tasks of the same kind; effort is motivated by non-judgemental feedback that gives information about how to improve.

The way in which teachers present classroom assessment activities may affect students' orientation to learning goals or performance goals.

Changing teachers' assessment practices to include processes and explanations can lead to better student learning.

Using grades as rewards and punishments is harmful to students' learning by encouraging extrinsic motivation.

Impact on teachers and the curriculum [2]

When teachers' assessment is used for external purposes, teachers vary in how they respond to being given the role of assessor and in the approach they take to interpreting external assessment criteria; strict adherence to the regulations leads them to be less concerned with students as individuals.

When teachers' assessment is used for internal purposes, the introduction of assessment techniques that require students to think more deeply leads to changes in teaching that extend the range of students' learning experiences.

When teachers' assessment is used for internal purposes, close external control of teacher assessment inhibits teachers from gaining detailed knowledge of their students.

New assessment practices are likely to have a positive impact if teachers find them of value in helping them to learn more about their students and to develop their understanding of curriculum goals; time to experience and develop some ownership of practices enhances their positive impact.

When high-stakes judgements are associated with teachers' assessment, one effect is for teachers to reduce assessment tasks to routine events and restrict students' opportunities for learning from them; high stakes encourage some teachers to give high grades where there is doubt, which may not be in the students' interests.

Shared criteria for assessing specific aspects of achievement lead to positive impact on students and on teaching; in the absence of such guidance, there is little positive impact on teaching and a potential negative impact on students.

Summative assessment by teachers has a more positive impact on teachers and teaching when it is integrated into practice than when it is concentrated on particular occasions.

Opportunities that enable teachers to share and develop their understanding of assessment procedures enable them to review their teaching practice and their view of students' learning and of subject goals; such opportunities need to be sustained over time and should preferably include provision for teachers to work collaboratively across as well as within schools.