Future Look: Real-Time Individual Test Inspection

Do you remember the little pieces of paper you sometimes find in the pocket of a new shirt or pair of pants? The paper says something like “Inspected by No. 11.” I believe it simply means that Inspector No. 11 found that piece of clothing to be made properly and placed the little piece of paper in the pocket. It feels good to me to know that the shirt was individually inspected and found to meet some standard. That’s good quality control and good customer value.

Even though you don’t see such inspection notices on everything you buy, you certainly expect that when you buy something new, it has been inspected before you use it.

So what does this have to do with IT certification tests? Well, let me try to explain it by starting at the end, with the final output of a certification test: the test score.

Unfortunately, some test scores are good and others are bad (that is, defective). They are good if they actually represent the knowledge or skills of the test-taker. Otherwise they are bad. What would cause a score to be bad and not represent the skills of the test-taker? There are several causes, actually:

The person cheated in some way. Perhaps he used only a brain-dump site to prepare for the test. The score would represent what was memorized from the brain-dump site, but not overall knowledge or skill.

The person hired someone else to take the test. The score would then represent the hired person, not the name associated with the score.

The test-taker didn’t take the test seriously, perhaps because she just wanted to see what the questions were like or was sent in by an illegal test-prep company to steal questions.

The person could have been sick during the test, making it difficult to think clearly and work hard.

Something happened during the administration of the test to distract the test-taker, causing strange errors during the test.

So, from my point of view, a score is not good or bad because it is a high score or a low score (although I’m sure that’s one way you could look at it). It is good if it is valid and bad if it is not valid. I would like every score report issued following IT certification exams to say: “This score is valid and can be used for certification decisions.” During the actual test, it would be ideal if algorithms were running that evaluated the performance of the test and the test-taker, and were able to judge properly whether that statement could be made. It’s like the clothing inspector looking over the garment and then placing the little piece of paper in the pocket.

The report for a bad test would not even provide the test score. It would simply state, “This test event is not valid.” How can a bad test score be identified and exposed? It’s possible by using statistical methods to compare each test event with the majority of normal tests taken.
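To make the idea of comparing one test event with the majority a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a description of any real inspection system: the statistic (median seconds spent per question), the sample numbers, the function names and the cutoff of 3.5 are all hypothetical choices. It measures how far each test event sits from the middle of the group, using the median and the median absolute deviation so that a few strange events do not distort the baseline.

```python
from statistics import median


def robust_z_scores(values):
    """Distance of each value from the group median, in scaled MAD units."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Every value is (nearly) identical, so nothing stands out.
        return [0.0 for _ in values]
    # 1.4826 makes the MAD comparable to a standard deviation
    # when the data are roughly normal.
    return [(v - med) / (1.4826 * mad) for v in values]


def flag_unusual_events(values, threshold=3.5):
    """Return the positions of test events far from the majority.

    The threshold is an assumed cutoff chosen for illustration only.
    """
    scores = robust_z_scores(values)
    return [i for i, z in enumerate(scores) if abs(z) > threshold]


# Hypothetical data: median seconds per question for eight test events.
events = [62.0, 58.5, 71.0, 66.0, 2.8, 60.0, 69.5, 64.0]
print(flag_unusual_events(events))  # prints [4]
```

A flag like this would only mark a test event for a closer look. Deciding that a score is actually invalid would take more evidence than a single number.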

Normal test-takers–whether they are high scorers, low scorers or somewhere in between–take the test seriously, do not cheat, are not sick, do not hire others to take the tests and are not distracted during the test. They work through the test in a normal fashion, reading each question, spending a typical amount of time to work things out and selecting or creating an answer.

Other test-takers answer questions very strangely, sometimes picking very unlikely answers, sometimes getting very difficult questions correct and missing easy ones. They also spend unusual amounts of time on questions, using less time on questions that should take more time, and vice versa. I once saw a test record where the test-taker spent less than three seconds on each of 50 questions and passed because he got most of the questions correct! How it was done, I haven’t a clue, but I do know that the test produced a score that should not have been accepted and used for certification. With statistics used properly, such a test would be very easy to detect.
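One simple way to put a number on “getting very difficult questions correct and missing easy ones” is to count the pairs of questions where the harder one was answered correctly and the easier one was missed. Measurement specialists often call these Guttman errors. The Python sketch below is a hedged illustration only: the function name, the difficulty values (taken here to be the proportion of test-takers who miss each question) and the two answer patterns are hypothetical.

```python
def guttman_errors(correct, difficulty):
    """Count (easier question wrong, harder question right) pairs.

    correct    -- list of True/False, one entry per question
    difficulty -- list of numbers, higher means harder (assumed here to be
                  the proportion of test-takers who miss the question)
    Questions with equal difficulty are kept in their original order,
    which is a simplification.
    """
    order = sorted(range(len(correct)), key=lambda i: difficulty[i])
    errors = 0
    easier_wrong = 0  # easier questions missed so far
    for i in order:  # walk from the easiest question to the hardest
        if correct[i]:
            # Each easier question already missed forms one odd pair.
            errors += easier_wrong
        else:
            easier_wrong += 1
    return errors


# Hypothetical six-question test, listed from easiest to hardest.
difficulty = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
typical = [True, True, True, True, False, False]
strange = [False, False, True, False, True, True]

print(guttman_errors(typical, difficulty))  # prints 0
print(guttman_errors(strange, difficulty))  # prints 8
```

A normal test-taker will still produce a few such pairs through slips and lucky guesses, so the count only means something when compared with what the majority of test-takers produce.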

It benefits us all, particularly you and the rest of the vast majority of normal test-takers out there, if we can provide a way to evaluate each testing event to make sure the score was properly produced. Unusual test events (such as those produced by cheating, illness or equipment malfunction) would then no longer have the negative impact they have today, when defective scores are allowed to be used for certification decisions.

David Foster, Ph.D., is president of Caveon (www.caveon.com) and is a member of the International Test Commission, as well as several measurement industry boards.