I would like to take this opportunity to continue the stretching by expanding on a topic they discuss in chapter two, which defines both criterion-referenced testing (CRT) and norm-referenced testing (NRT). A norm-referenced test is one in which test outcomes (e.g., grades or pass/fail) are determined by each examinee’s score relative to the other examinees. Although this practice is uncommon (and arguably unethical), it is occasionally still used today. For example, some of the state Bar exams are norm-referenced. Typically, the top X percent of examinees are awarded a passing mark, regardless of how competent or incompetent the group that took the exam together was. In other words, if a prospective lawyer were to take an exam along with the most competent group of graduates, then (s)he would have less chance of achieving a passing mark than (s)he would have if (s)he took the exam alongside a group of bottom feeders. Does it surprise you that the legal profession would endorse something outdated, scientifically unsupported, and arguably unethical?

On the other hand, a criterion-referenced test (CRT) is a test composed of specific objectives, or competency statements. This type of test is common in licensure and certification. The passing rates for CRTs vary with each test cycle since examinees are evaluated based on their competency relative to a criterion-referenced passing standard (aka cutscore). There are many other attributes of these two types of tests beyond their scoring methodology, and I’ll leave it up to future posts to expand upon these.
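To make the difference in scoring methodology concrete, here is a minimal sketch in Python. The function names, the cohorts, the 50 percent pass rate, and the cutscore of 80 are all made up for illustration; they are not taken from Shrock and Coscarelli or from any real exam, and the norm-referenced function ignores tied scores for simplicity.

```python
from typing import Dict


def norm_referenced_pass(scores: Dict[str, float], top_fraction: float) -> Dict[str, bool]:
    """Pass the top `top_fraction` of examinees, whatever their absolute scores.

    Illustrative only: ties at the boundary are broken by sort order.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_pass = int(len(ranked) * top_fraction)
    passers = set(ranked[:n_pass])
    return {name: name in passers for name in scores}


def criterion_referenced_pass(scores: Dict[str, float], cutscore: float) -> Dict[str, bool]:
    """Pass every examinee whose score meets or exceeds a fixed cutscore."""
    return {name: score >= cutscore for name, score in scores.items()}


# Two hypothetical cohorts. Cy and Gus earn the same score of 85.
strong_cohort = {"Ana": 92, "Ben": 88, "Cy": 85, "Dee": 81}
weak_cohort = {"Eve": 64, "Fay": 58, "Gus": 85, "Hal": 49}

# Norm-referenced: the same score fails in one cohort and passes in the other.
nrt_strong = norm_referenced_pass(strong_cohort, top_fraction=0.5)
nrt_weak = norm_referenced_pass(weak_cohort, top_fraction=0.5)

# Criterion-referenced: the same score gets the same outcome in both cohorts,
# and the pass rate is free to vary from one cohort to the next.
crt_strong = criterion_referenced_pass(strong_cohort, cutscore=80)
crt_weak = criterion_referenced_pass(weak_cohort, cutscore=80)
```

In this made-up data, Cy fails the norm-referenced version while Gus passes it with an identical score. Under the criterion-referenced version both pass, and the pass rate differs between the two cohorts (four of four versus one of four), which is the variation in passing rates described above.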

One other type of test that Shrock and Coscarelli refer to is a mastery test, a test where most examinees answer the vast majority of the content correctly. K-12 classroom tests are commonly designed this way. The distribution of scores for a mastery test is bunched toward the top of the score range. I think that it is important to point out that mastery tests are a form of criterion-referenced test. In other words, every mastery test is a criterion-referenced test, but not every criterion-referenced test is a mastery test. See below for a visual representation of this.

Types of Tests

So, what do we call a non-mastery CRT? To be honest, I don’t know. I have heard people refer to them as non-mastery tests or non-mastery, criterion-referenced tests.

Mastery tests are useful in the corporate training world, where the content domains are small (typically measured in class hours) and the shelf lives of the training programs and tests are generally short (measured in months or years). However, they are NOT optimal for certification (corporate or non-corporate).

Mastery Test Curve

Why should a corporation build a non-mastery, criterion-referenced test? There are two primary reasons.

First, if constructed properly, non-mastery, criterion-referenced tests provide more information than a simple pass/fail result. Non-mastery, criterion-referenced tests are competency measurement instruments. Just as a ruler measures the length of an object, a non-mastery, criterion-referenced test can measure the competency of an individual. This ruler can be used to measure the competency of individuals or the difficulty of the test questions, either of which can provide valuable feedback to the training program or corporation.

Second, when the level of mastery changes, it is much easier to change the level of competency required to achieve mastery than it is to write new content or a whole new exam.

… would it be accurate to say that a mastery test attempts to be a more comprehensive evaluation of subject matter in a specific area, with a cut-score tied to gross “mastery” of the content area (e.g., specified by some % between 70-100%); and conversely that a CRT is prioritized by desired competencies within the subject matter (i.e., sampled), and the cut-score operates as a professional judgement about the attainment of those competencies?

Yes, a Shrock and Coscarelli Mastery Test does contain a lot of easy items relative to the average examinee (which is in line with the common use of the term Mastery Test). I agree with you, Matt, that this would be less than optimal for certification exams, where content domains are large. But in corporate training, where content domains are small, I would agree with Sharon and Bill that we should design the training and exam content around specific and clearly definable criteria. When executed properly, this should yield a mastery test distribution. The difference between the two scenarios is the size of the content domain and the need to sample a test taker’s performance from that domain. Bob Hunt touched on that in his comment. I would encourage Bob to use a different term than CRT though. As I stated in the initial post, we really don’t have a good term in the industry, but this is really a non-mastery CRT.

I just found this thread and wanted to make a few comments: 1. The criteria in a criterion-referenced test should be objective and, in a corporate world, job based. 2. Mastery comes in when you expect/want most to succeed–like training an airline pilot. The score for mastery can be determined in a systematic manner using any or all three general approaches to standard setting–rarely done, I might add, in most companies and, I think, all schools. The tendency is to rely on passing scores that evolve from somewhere in our school days–70% for me at Central Catholic, but way too low for the local surgeon, I think. AND the organizations that seek mastery usually rely on systems approaches to the design of training, e.g. Gagne’s Events of Instruction.