
Saturday, March 10, 2007

Broken statistics: Normalising test scores

The state government recently announced that a common entrance test for admission into engineering courses will be abolished and replaced by normalisation. You see, there are many different boards, each with its own tests of differing difficulties. We need a method for comparing the scores from different boards for processing admission to colleges.

Let's say that we have two boards, Board A and Board B. The scores in the two boards are as follows:

Board A    Board B
100        98
100        83
98         82
95         81
93         80

Looking at the above scores, it seems that the test paper for Board A was a lot easier. If we compare the scores directly, Board A students have an unfair advantage. That's why we need to normalise the scores in order to properly judge the relative merits of the students from different boards.

The proposed normalisation scheme (according to the newspaper) is as follows:

The ratio between the highest marks constitutes a multiplication factor

This multiplication factor is applied to all the scores

To get back to the example, the multiplication factor is 100/98. The table now looks like this

Board A    Board B    Normalised Board B
100        98         98x(100/98) = 100
100        83         83x(100/98) = 84.7
98         82         83.7
95         81         82.7
93         80         81.6
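The normalised column above can be reproduced with a few lines of Python. This is only a sketch of the scheme as the newspaper described it; the function name normalise_by_top_mark and the rounding to one decimal place are my own choices for illustration.

```python
def normalise_by_top_mark(reference, scores):
    """Scale `scores` so that their top mark equals the top mark of `reference`."""
    top_reference = max(reference)
    top_scores = max(scores)
    # The multiplication factor is top_reference / top_scores (100/98 here).
    return [round(s * top_reference / top_scores, 1) for s in scores]


board_a = [100, 100, 98, 95, 93]
board_b = [98, 83, 82, 81, 80]

normalised_b = normalise_by_top_mark(board_a, board_b)
print(normalised_b)  # [100.0, 84.7, 83.7, 82.7, 81.6]
```

Note that the factor depends only on max(scores), so changing any other score in Board B leaves the factor untouched.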

Now we can see why this method is so broken. Although a casual glance tells us that Board B's test paper was a lot harder, the scores after normalisation have hardly changed! This is because one person got a good score of 98. This single data point is an exception to the rule, yet it has influenced the process so much as to render the normalisation completely meaningless.

This is an example of broken statistics. The top mark is usually an outlier, and it's a bad idea to calculate statistics for a data set based on its outlier values.

I'm pretty amazed that they adopted this method of normalisation. Surely some statistician must have brought up this issue??

So what can be done?

I'm not a statistician, but here are some ideas that come to mind.

Fitting to a normal curve

This works by mapping the top mark to 100 and the bottom mark to 0, and then mapping the intermediate scores according to a normal distribution with mean 50 and some experimentally obtained standard deviation. Once the scores from both boards have been mapped this way, the two distributions can be compared.

Drawbacks: This only works if the score distribution is normal! Usually it is not. The graph is generally skewed towards higher marks, as there are a lot more people passing the test than failing it. A common mistake is taking a non-normal distribution and fitting it to a normal curve.
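To make the idea concrete, here is a rough sketch of one possible reading of it, not a definitive implementation. It swaps in a rank-based mapping: each score is turned into a mid-rank fraction and pushed through the inverse CDF of a normal curve with mean 50. The standard deviation of 15 and the function name fit_to_normal are assumptions of mine, and because a normal curve has no upper or lower limit, this version does not pin the top mark to 100 or the bottom mark to 0.

```python
from statistics import NormalDist


def fit_to_normal(scores, mean=50.0, sd=15.0):
    """Map each distinct score onto a normal curve using its mid-rank.

    sd=15 is an assumed value; the scheme only says it is found experimentally.
    """
    curve = NormalDist(mu=mean, sigma=sd)
    n = len(scores)
    fitted = {}
    for s in set(scores):
        below = sum(1 for x in scores if x < s)
        ties = sum(1 for x in scores if x == s)
        # The mid-rank fraction is always strictly between 0 and 1,
        # which inv_cdf requires.
        fraction = (below + 0.5 * ties) / n
        fitted[s] = curve.inv_cdf(fraction)
    return fitted


board_b = [98, 83, 82, 81, 80]
fitted_b = fit_to_normal(board_b)
for score in sorted(fitted_b, reverse=True):
    print(score, round(fitted_b[score], 1))
```

The middle score (82) lands on 50 and the others are spread symmetrically around it, regardless of how the raw marks were actually distributed. That is exactly the drawback: the normal shape is imposed on the data, not discovered in it.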

Percentiles

Another scheme that is used is percentiles. Your percentile is the percentage of people who scored below you. So being at the 95th percentile means that 95% of the people who took the test scored below you. In other words, you are in the top 5% of the population. Then, instead of comparing the absolute marks, you compare the percentiles.

This is like comparing ranks, except that it normalises for the fact that different numbers of students might have taken the two tests.

Drawbacks: A big drawback of percentiles is that they can break down in areas of high density. Take the Board B scores from the above example again:

Score    Percentile
98       80
83       60
82       40
81       20
80       0

As you can see, only 3 marks separate the 0th percentile from the 60th percentile. Of course, the effect is pronounced in this example because the sample size is so small. The same thing happens to a lesser degree in larger samples if the data is very dense in certain parts of the distribution.
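The percentile table above can be reproduced with a short Python sketch. It uses the definition given earlier (the percentage of people who scored strictly below you); the function name percentile_of is just an illustrative choice, and other definitions of percentile handle ties differently.

```python
def percentile_of(score, scores):
    """Percentage of entries in `scores` that are strictly below `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


board_b = [98, 83, 82, 81, 80]
percentiles = {s: percentile_of(s, board_b) for s in board_b}
print(percentiles)  # {98: 80.0, 83: 60.0, 82: 40.0, 81: 20.0, 80: 0.0}
```

The 15 mark gap between 98 and 83 is worth 20 percentile points, and so is the 1 mark gap between 83 and 82.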

Conclusion

Neither of the above solutions is particularly satisfying. Both introduce distortions by mapping one distribution onto another. In one case we are mapping a non-normal distribution onto a normal one; in the other we are mapping it onto a linear (uniform) one.

The ideal solution would be to find out the actual distribution for test scores. Once that is done, both sets of scores can be equalised using the parameters of that distribution and compared. Since the distribution will be the same for both sets of test scores, the mapping will not introduce any distortion and the comparison will be fair.

