Assessment by linear ordering

This appears to be a new trend: proposals to give up traditional assessment of students’ performance and simply rank the students in linear order. I am aware of some other proposals of that kind, even more large-scale and dramatic.

This is from the abstract of a paper submitted by Ian Jones of Loughborough University to a BSRLM Day Conference on 11 June 2011 in Leeds:

Using paired comparisons to assess mathematical ability. Mathematics is seen as easy to assess reliably because exams typically comprise lots of short items that can be marked right or wrong. This has a fragmentary effect on teaching and learning in mathematics classrooms. I present results in which an alternative method to traditional marking, called paired comparisons, was used to assess Key Stage 4 mathematics. The paired comparisons method is based on holistic expert judgements of candidates’ mathematical abilities, and is therefore potentially better suited than mark schemes to assess open-ended problem-based tasks. The rank orders of scripts generated by paired comparisons had high reliability coefficients and correlated strongly with rank orders generated by traditional marking. These findings suggest paired comparisons offer promise for improving the quality of school mathematics exams and so reducing the negative backwash effect of the current summative assessment system.

Unfortunately, a complete “paired comparison” of 30 students involves (30 * 29)/2 = 435 comparisons, so in practice the use of some sorting algorithm is unavoidable. What are good, practically usable sorting algorithms for ordering based on pretty fuzzy expert judgements? For Markov chain approaches, how many comparisons of randomly chosen pairs are needed to get a result within a given confidence interval if individual judgements are made with errors? There has undoubtedly been some serious research on this. A random search in Google gave me a very old reference:
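To make the arithmetic concrete, here is a minimal Python sketch of one possible approach: binary insertion, with each decision taken by majority over repeated judgements. It is not the method used in the abstract quoted above. The function names, the `votes` parameter and the simulated `noisy_judge` are all my own assumptions for illustration. With a perfectly consistent judge, 30 scripts need at most 119 judgements instead of 435. With an error-prone judge the resulting order is only approximate, and repeating each judgement is a crude remedy.

```python
import random


def full_pairwise_count(n):
    """Number of judgements if every pair is compared exactly once."""
    return n * (n - 1) // 2


def rank_by_binary_insertion(items, prefers, votes=1):
    """Order items from weakest to strongest by binary insertion.

    prefers(a, b) returns True when the judge rates a above b.
    Each decision is a majority over `votes` calls (use an odd number);
    this repetition is an illustrative assumption, not an established scheme.
    Returns (ordered_items, number_of_judgements_made).
    """
    ordered = []
    judgements = 0
    for item in items:
        lo, hi = 0, len(ordered)
        while lo < hi:
            mid = (lo + hi) // 2
            wins = sum(prefers(item, ordered[mid]) for _ in range(votes))
            judgements += votes
            if 2 * wins > votes:
                lo = mid + 1   # item judged better: search the upper half
            else:
                hi = mid       # item judged worse: search the lower half
        ordered.insert(lo, item)
    return ordered, judgements


def noisy_judge(error_rate, rng):
    """Hypothetical judge: compares hidden numeric scores, errs at random."""
    def prefers(a, b):
        truth = a > b
        return truth if rng.random() >= error_rate else not truth
    return prefers


if __name__ == "__main__":
    rng = random.Random(2011)
    scripts = list(range(30))      # hidden "true" quality of each script
    rng.shuffle(scripts)
    order, used = rank_by_binary_insertion(
        scripts, noisy_judge(0.1, rng), votes=3)
    print(used, "judgements instead of", full_pairwise_count(30))
    print(order)
```

The obvious weakness is that a single wrong majority early in a binary search sends a script to the wrong half for good, which is exactly why the question about error-tolerant methods matters.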

Ordered sets, whether of stimuli or Ss, are characterized by certain regular properties in the relations between pairs of their members. An incomplete matrix of pair-wise preference judgments may thus be completed. The necessary condition is that the relations between adjacent members in the order be known. If the S is interacting with a computer, the computer can choose the pairs to be presented so as to minimize the number of judgments necessary to define the final order. Mental test data have a formal resemblance to incomplete order relations, so that the principles presented here may also be applicable to computer-interactive testing. Computer-interactive systems must take account of the unreliability of human responses and the limited capacity of computers.
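The completion step mentioned in that abstract can be illustrated with a short Python sketch. It assumes the judgements are perfectly consistent (no cycles), which real judges will not guarantee, and the function names are mine, not taken from the reference. Known preferences are extended by transitivity. If every pair of neighbours in the final order has been judged, nothing remains undecided.

```python
def complete_by_transitivity(n, known):
    """Fill in an n x n preference matrix from known (better, worse) pairs.

    above[i][j] is True when i is known, directly or by transitivity,
    to rank above j. Assumes the known judgements contain no cycles.
    """
    above = [[False] * n for _ in range(n)]
    for better, worse in known:
        above[better][worse] = True
    # Warshall's transitive closure: i > k and k > j imply i > j.
    for k in range(n):
        for i in range(n):
            if above[i][k]:
                for j in range(n):
                    if above[k][j]:
                        above[i][j] = True
    return above


def undecided_pairs(above):
    """Pairs (i, j), i < j, whose relative order is still unknown."""
    n = len(above)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if not above[i][j] and not above[j][i]]


if __name__ == "__main__":
    # Five scripts whose true order is 3 > 1 > 4 > 0 > 2;
    # only the four neighbouring pairs have been judged.
    chain = [(3, 1), (1, 4), (4, 0), (0, 2)]
    print(undecided_pairs(complete_by_transitivity(5, chain)))
```

Dropping a single neighbouring judgement from the chain splits the scripts into two groups that cannot be compared with each other, so those are the judgements a program would have to ask for first.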

Are there any good, practically usable and mathematically sound software packages that do that? Actually, an online Java applet would be great.

Responses

You probably already know this, but: in the chess world there’s something called the ‘Swiss system’, intended for tournaments with many players, and time for only a few rounds. From experience I can say that it works remarkably well.