If you enjoy the writing found at this blog, please check out The Research Digest, - http://digest.bps.org.uk/ - where we continue to post new research every weekday, on the workplace and other domains where psychological science matters.


Thursday, 31 January 2013

Do test cheats matter if you test enough people?

Over the past decade, the cheapness and convenience of online testing have driven tremendous growth in its use. Critics point to the openings it creates for cheaters, who might take a test many times under different identities, collude with past test-takers to identify answers, or even employ a proxy candidate with higher levels of the desired trait. Defenders point to countermeasures, from data forensics to follow-up tests taken in person. But statistical models from researchers Richard Landers and Paul Sackett suggest that in recruitment situations, the validity lost to online cheating can be recovered simply because more applicants are able to take the test.

Landers and Sackett point out that test administrators normally intend to select a fixed number of candidates through testing, such as ten final interviewees. Because online tests are so accessible, you could grow your candidate pool from, say, 20 to 50. With those numbers, it's now possible to select only those who scored better than 80% of the other candidates, rather than merely those in the top half. And if some of your candidates cheat, oomphing their scores to the 82nd percentile when they only deserve the 62nd, that's still a better calibre than the 50th-percentile-or-better you would have been prepared to accept from your smaller face-to-face pool.
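The arithmetic behind that example is simple enough to sketch. The helper below (a hypothetical name, not from the paper) turns "how many we want" and "how many applied" into the percentile a candidate has to beat, ignoring ties and the off-by-one of comparing against "other" candidates.

```python
def selection_cutoff(n_selected, pool_size):
    """Percentile a candidate must exceed to land in the top n_selected (ties and off-by-one ignored)."""
    return 100 * (pool_size - n_selected) / pool_size


face_to_face = selection_cutoff(10, 20)  # ten interviewees from a pool of 20: top half
online = selection_cutoff(10, 50)        # ten interviewees from a pool of 50: top fifth
print(face_to_face, online)  # 50.0 80.0

# A cheater whose true standing is the 62nd percentile still clears the face-to-face bar.
print(62 > face_to_face)  # True
```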

Landers and Sackett moved from these first principles to modelling large, realistic data sets containing a range of true ability scores. They considered sets where cheating gave a small (.5 SD) or large (1 SD) boost to a candidate's test score. Against this they varied how strongly natural ability was related to the likelihood of cheating, from no relationship (r = 0) to increasingly strong negative relationships (from -.25 to -.75), modelling the idea that weaker performers are more likely to cheat. Finally, they varied the prevalence of cheating in increments from zero up to 100%.
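One way to picture this design is as a grid of conditions, each of which gets its own simulated data set. The sketch below assumes steps of .25 for the correlation and 10% for cheating prevalence; the post doesn't give the exact increments, so those levels are illustrative rather than the paper's.

```python
from itertools import product

# Levels described in the post; the .25 correlation steps and 10% prevalence steps are assumptions.
CHEAT_BONUS_SD = (0.5, 1.0)                          # small vs large score boost, in SD units
ABILITY_CHEAT_R = (0.0, -0.25, -0.5, -0.75)          # ability vs likelihood of cheating
CHEAT_PREVALENCE = tuple(i / 10 for i in range(11))  # 0%, 10%, ..., 100%

# Fully crossed design: every bonus x correlation x prevalence combination.
conditions = [
    {"bonus": b, "r": r, "prevalence": p}
    for b, r, p in product(CHEAT_BONUS_SD, ABILITY_CHEAT_R, CHEAT_PREVALENCE)
]
print(len(conditions))  # 88
```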

The researchers ran simulations on each data set by picking a random subset (the 'candidate pool') and selecting the half of the pool with the better test scores. In the entirely honest data sets, the mean genuine ability score of selected candidates was .24, but that value was lower for sets that contained cheaters, as some individuals passed without deserving it. Landers and Sackett then added more candidates to each pool, allowing pickier selection, and reran the process to see what true ability levels resulted. In many data sets, the loss of validity due to cheating was easily offset by growing the applicant pool. For instance, if cheating has only a modest effect and is only mildly related to ability (r = -.25), then doubling the applicant pool yields genuine scores of .24 even when 70% of candidates cheat, and higher scores when cheaters are fewer, such as .31 when 30% cheat.
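A rough Monte Carlo sketch of this procedure is below. It is not the authors' model: it assumes the test measures true ability perfectly apart from the cheating bonus, and that cheaters are the applicants with the highest latent "propensity" score, which correlates r with ability. Because of those simplifications its numbers will not match the .24 and .31 reported, but it shows the same qualitative pattern: cheating lowers the true ability of those selected, and a larger pool can more than make up for it. The function names are hypothetical.

```python
import random
import statistics


def selected_true_ability(pool_size, n_selected, cheat_rate, cheat_bonus, r, rng):
    """Mean true ability of the n_selected top scorers in one simulated applicant pool."""
    abilities, propensities = [], []
    for _ in range(pool_size):
        ability = rng.gauss(0.0, 1.0)
        # Assumed latent propensity to cheat, correlated r with ability (negative r: weaker applicants cheat more).
        propensities.append(r * ability + (1 - r * r) ** 0.5 * rng.gauss(0.0, 1.0))
        abilities.append(ability)
    # The cheat_rate share of the pool with the highest propensity cheats.
    n_cheaters = round(cheat_rate * pool_size)
    ranked = sorted(range(pool_size), key=lambda i: propensities[i], reverse=True)
    cheaters = set(ranked[:n_cheaters])
    # Assumed observed score: true ability plus the cheating bonus (no other measurement error).
    scores = [a + (cheat_bonus if i in cheaters else 0.0) for i, a in enumerate(abilities)]
    top = sorted(range(pool_size), key=lambda i: scores[i], reverse=True)[:n_selected]
    return statistics.mean(abilities[i] for i in top)


def mean_selected_ability(pool_size, n_selected, cheat_rate, cheat_bonus, r, reps=2000, seed=1):
    """Average of selected_true_ability over many simulated pools (fixed seed for repeatability)."""
    rng = random.Random(seed)
    return statistics.mean(
        selected_true_ability(pool_size, n_selected, cheat_rate, cheat_bonus, r, rng)
        for _ in range(reps)
    )


# Pick 10 candidates: honest pool of 20, the same pool with 30% cheaters, then a doubled pool of 40.
honest_20 = mean_selected_ability(20, 10, 0.0, 0.5, -0.25)
cheat_20 = mean_selected_ability(20, 10, 0.3, 0.5, -0.25)
cheat_40 = mean_selected_ability(40, 10, 0.3, 0.5, -0.25)
print(round(honest_20, 2), round(cheat_20, 2), round(cheat_40, 2))
```

Reusing the same seed means the honest and cheating runs on 20 applicants draw identical abilities, so the comparison isolates the effect of cheating rather than sampling noise.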

Great... but wait. There are two important take-aways relating to fairness. First, it's true that if we're getting .31 averages instead of .24, our selected candidates should be more capable in the job, even some of those who did cheat, and that's a win for whoever's hiring. But in the process we've rejected people who by rights deserved to go through. Essentially, this is a form of test error, and so not a uniquely terrible problem, but it's one we shouldn't become complacent about just because the numbers are in the organisation's favour.

Secondly, as anyone trained in psychometric testing will be aware, tightening the selection ratio from the top 50% to the top 25% is not a casual step. Best practice is that, without supporting evidence such as an in-house validity study, cut-offs on a single test should be capped at the 40th percentile, meaning you pass 60% of candidates. In particular, raising thresholds can have an adverse impact on minority groups, on whom many tests still show score differentials (although these are narrowing over time). Because members of minority groups usually make up a small share of any given applicant pool, such differentials can easily squeeze the diversity out of the process before you even get a chance to sit down with candidates and see what they have to offer in a rounded fashion.
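To see why a higher threshold hits a lower-scoring group harder, here is a small illustration with two normally distributed groups. The half-SD gap is a hypothetical figure chosen for illustration, not one from the post, and the cut-off is placed on the larger group's distribution on the simplifying assumption that the minority group is too small to shift the pool's percentiles much.

```python
from statistics import NormalDist


def pass_rates(cutoff_percentile, d):
    """Pass rates for two hypothetical normal groups, the second scoring d SD lower on average.

    The cut-off sits at the given percentile of the higher-scoring (majority) group's scores.
    """
    cut = NormalDist().inv_cdf(cutoff_percentile / 100)
    majority = 1 - NormalDist().cdf(cut)
    minority = 1 - NormalDist(mu=-d).cdf(cut)
    return majority, minority


for pct in (40, 75):
    maj, mino = pass_rates(pct, d=0.5)  # d = 0.5 is an assumed gap, for illustration only
    # Columns: cut-off percentile, majority pass rate, minority pass rate, minority/majority ratio.
    print(pct, round(maj, 2), round(mino, 2), round(mino / maj, 2))
```

Under these assumptions the minority group's pass rate falls proportionally further than the majority's as the cut-off rises, which is the squeeze described above.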

Nevertheless, this paper brings a fresh angle to the issue of test security.