Major Uses of Item Analysis

Item analysis can be a powerful aid to instructors in the guidance
and improvement of instruction. For this
to be so, the items to be analyzed must be valid measures of
instructional objectives. Further, the items must be diagnostic; that
is, knowing which incorrect options students select must offer a clue
to the nature of their misunderstanding and thus point to appropriate
remediation.

In addition, instructors who construct their own examinations may
greatly improve the effectiveness of test items and the validity of
test scores if they select and rewrite their items on the basis of item
performance data. Such data is available to instructors who have their
examination answer sheets scored at the Computer Laboratory Scoring
Office.

Item Analysis Reports

As the answer sheets are scored, records are written which contain each
student's score and his or her response to each item on the test.
These records are then processed and an item analysis report file is
generated. An instructor may obtain test score distributions and a list
of students' scores, in alphabetic order, in student number order,
in percentile rank order, and/or in order of percentage of total
points. Instructors receive their item analysis reports as e-mail
attachments. The item analysis report is contained in the file IRPT####.RPT,
where the four digits indicate the instructor's GRADER III file.
A sample of an individual long-form item analysis listing is shown below.

Item Analysis Response Patterns

Each item is identified by number and the correct option is
indicated. The group of students taking the test is divided into upper,
middle and lower groups on the basis of students' scores on the test.
This division is essential for providing information about the
operation of distracters (incorrect options) and for computing an
easily interpretable index of discrimination. It has long been
accepted that optimal item discrimination is obtained when the
upper and lower groups each contain twenty-seven percent of the
total group.
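Under this convention, the grouping step can be sketched in Python. This is a hypothetical helper, not the scoring office's actual procedure; it takes the integer part of 27% of the group, which matches the group sizes quoted later in this document.

```python
def split_groups(scores, fraction=0.27):
    """Split test scores into upper, middle, and lower groups.

    The upper and lower groups each contain ``fraction`` (27% by
    convention) of the examinees, taken from the sorted scores.
    """
    ordered = sorted(scores, reverse=True)     # highest scores first
    n_tail = int(len(ordered) * fraction)      # size of upper and of lower group
    upper = ordered[:n_tail]
    lower = ordered[-n_tail:]
    middle = ordered[n_tail:len(ordered) - n_tail]
    return upper, middle, lower
```

For a class of fifty, this yields upper and lower groups of thirteen students each, with the remaining twenty-four in the middle group.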

The number of students who selected each option or omitted
the item is shown for each of the upper, middle, lower and total groups.
The number of students who marked more than one option to the item is
indicated under the "error" heading. The percentage of each group who
selected each of the options, omitted the item, or erred, is also
listed. Note that the total percentage for each group may be other than
100%, since the percentages are rounded to the nearest whole number
before totaling.
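The rounding effect is easy to reproduce; in this short sketch the counts are invented for illustration, not taken from the sample item.

```python
def group_percentages(counts):
    """Percentage of the group choosing each option, rounded to whole numbers."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]

# Three options each chosen by one of three students: each rounds to 33%,
# so the rounded percentages total 99 rather than 100.
pcts = group_percentages([1, 1, 1])
```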

The sample item listed above appears to be performing well. About two-thirds
of the upper group but only one-third of the lower group answered the item
correctly. Ideally, the students who answered the item incorrectly should
select each incorrect response in roughly equal proportions, rather than
concentrating on a single incorrect option. Option two seems to be the
most attractive incorrect option, especially to the upper and middle
groups. It is most undesirable for a greater proportion of the upper
group than of the lower group to select an incorrect option. The item
writer should examine such an option for possible ambiguity. In the
sample item above, option four was selected by only five
percent of the total group. An attempt might be made to make this
option more attractive.

Item analysis provides the item writer with a record of
student reaction to items. It gives little information about the
appropriateness of an item for a course of instruction. The
appropriateness or content validity of an item must be determined by
comparing the content of the item with the instructional objectives.

Basic Item Analysis Statistics

A number of item statistics are reported which aid in
evaluating the effectiveness of an item. The first of these is the index
of difficulty which is the proportion of the total group who got the
item wrong. Thus a high index indicates a difficult item and a low
index indicates an easy item. Some item analysts prefer an index of
difficulty which is the proportion of the total group who got an item
right. This index may be obtained by marking the PROPORTION RIGHT
option on the item analysis header sheet. Whichever index is selected
is shown as the INDEX OF DIFFICULTY on the item analysis print-out. For
classroom achievement tests, most test constructors desire items with
indices of difficulty no lower than 20 nor higher than 80, with an
average index of difficulty from 30 or 40 to a maximum of 60.
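As a sketch, the two forms of the index differ only in which proportion is reported; the function below is hypothetical, not the scoring program's code.

```python
def index_of_difficulty(n_wrong, n_total, proportion_right=False):
    """Index of difficulty on a 0-100 scale.

    By default this is the percentage of the total group answering
    incorrectly (high index = hard item); with ``proportion_right=True``
    it is the percentage answering correctly instead.
    """
    p_wrong = n_wrong / n_total
    p = (1 - p_wrong) if proportion_right else p_wrong
    return round(100 * p)
```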

The INDEX OF DISCRIMINATION is the difference between the
proportion of the upper group who got an item right and the proportion
of the lower group who got the item right. This index is dependent upon
the difficulty of an item. It may reach a maximum value of 100 for an
item with an index of difficulty of 50, that is, when 100% of the upper
group and none of the lower group answer the item correctly. For items
of less than or greater than 50 difficulty, the index of
discrimination has a maximum value of less than 100. The
Interpreting the Index of Discrimination
document contains a more detailed discussion of the index of discrimination.
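The definition can be sketched directly (hypothetical helper; group counts are assumed to come from the upper/lower split described earlier):

```python
def index_of_discrimination(upper_right, upper_n, lower_right, lower_n):
    """Difference (x100) between the proportions of the upper and lower
    groups answering the item correctly."""
    return round(100 * (upper_right / upper_n - lower_right / lower_n))
```

Negative values signal an item that the lower group answered correctly more often than the upper group.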

Interpretation of Basic Statistics

To aid in interpreting the index of discrimination, the
maximum discrimination value and the discriminating efficiency are
given for each item. The maximum discrimination is the highest possible
index of discrimination for an item at a given level of difficulty. For
example, an item answered correctly by 60% of the group would have an
index of difficulty of 40 and a maximum discrimination of 80. This
would occur when 100% of the upper group and 20% of the lower group
answered the item correctly. The discriminating efficiency is the index
of discrimination divided by the maximum discrimination. For example,
an item with an index of discrimination of 40 and a maximum
discrimination of 50 would have a discriminating efficiency of 80. This
may be interpreted to mean that the item is discriminating at 80% of
the potential of an item of its difficulty. For a more detailed
discussion of the maximum discrimination and discriminating efficiency
concepts, see the Interpreting the Index of Discrimination
document.
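Both quantities follow directly from the definitions above; in this sketch the symmetric min() form covers items both easier and harder than 50 difficulty.

```python
def maximum_discrimination(difficulty):
    """Highest possible index of discrimination at a given difficulty (0-100).

    Peaks at 100 for difficulty 50 and falls off linearly on either side.
    """
    return 2 * min(difficulty, 100 - difficulty)

def discriminating_efficiency(discrimination, difficulty):
    """Observed discrimination as a percentage of the maximum possible."""
    return round(100 * discrimination / maximum_discrimination(difficulty))
```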

Other Item Statistics

Some test analysts may desire more complex item statistics.
Two correlations which are commonly used as indicators of item
discrimination are shown on the item analysis report. The first is
the biserial correlation, which is the correlation between a student's
performance on an item (right or wrong) and his or her total score on
the test. This correlation assumes that the distribution of test scores
is normal and that there is a normal distribution underlying the
right/wrong dichotomy. The biserial correlation has the characteristic,
disconcerting to some, of having maximum values greater than unity.
There is no exact test for the statistical significance of the biserial
correlation coefficient.
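One common formula for the biserial coefficient is r_bis = (M_right - M_wrong) / S x pq / y, where y is the normal-curve ordinate at the p/q split. The sketch below uses the standard library's NormalDist; the scoring program's exact computation may differ.

```python
from statistics import NormalDist, mean, pstdev

def biserial_r(scores, correct):
    """Biserial correlation between item performance and total test score.

    ``scores`` are total test scores; ``correct`` is a parallel list of
    booleans for the item.  Uses the population standard deviation and
    the standard normal ordinate at the point dividing p from q.
    """
    p = sum(correct) / len(correct)          # proportion answering correctly
    q = 1 - p
    m_right = mean(s for s, c in zip(scores, correct) if c)
    m_wrong = mean(s for s, c in zip(scores, correct) if not c)
    y = NormalDist().pdf(NormalDist().inv_cdf(p))   # ordinate at the split
    return (m_right - m_wrong) / pstdev(scores) * p * q / y
```

On strongly separated data the value can indeed exceed 1, the disconcerting property noted above.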

The point biserial correlation is also a correlation between
student performance on an item (right or wrong) and test score. It
assumes that the test score distribution is normal and that the
division on item performance is a natural dichotomy. The possible range
of values for the point biserial correlation is +1 to -1. The Student's
t test for the statistical significance of the point biserial
correlation is given on the item analysis report. Enter a table of Student's t
values with N - 2 degrees of freedom at the desired percentile point.
N, in this case, is the total number of students appearing
in the item analysis.
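A sketch of the conventional point biserial formula and its t statistic follows; these are the textbook forms, assumed here, and the report's exact computation may differ.

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(scores, correct):
    """Point biserial correlation between item performance and test score."""
    p = sum(correct) / len(correct)
    m_right = mean(s for s, c in zip(scores, correct) if c)
    m_wrong = mean(s for s, c in zip(scores, correct) if not c)
    return (m_right - m_wrong) / pstdev(scores) * sqrt(p * (1 - p))

def t_statistic(r, n):
    """Student's t for a correlation r based on n students (n - 2 df)."""
    return r * sqrt((n - 2) / (1 - r * r))
```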

The mean scores for students who got an item right and for
those who got it wrong are also shown. These values are used in
computing the biserial and point biserial coefficients of correlation
and are not generally used as item analysis statistics.

Generally, item statistics will be somewhat unstable for
small groups of students. Perhaps fifty students might be considered a
minimum number if item statistics are to be stable. Note that for a
group of fifty students, the upper and lower groups would contain only
thirteen students each. The stability of item analysis results will
improve as the group of students is increased to one hundred or more.
An item analysis for very small groups must not be considered a
stable indication of the performance of a set of items.

Summary Data

The item analysis data are summarized on the last page of
the item analysis report. The distribution of item difficulty indices
is a tabulation showing the number and percentage of items whose
difficulties are in each of ten categories, ranging from a very easy
category (00-10) to a very difficult category (91-100). The
distribution of discrimination indices is tabulated in the same manner,
except that a category is included for negatively discriminating items.

The mean item difficulty is determined by adding all of the item
difficulty indices and dividing the total by the number of items. The
mean item discrimination is determined in a similar manner.

Test reliability, estimated by the Kuder-Richardson formula number
20, is given. If the test is speeded, that is, if some of the students
did not have time to consider each test item, the reliability estimate
may be spuriously high.
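KR-20 is conventionally r = k/(k-1) x (1 - sum(pq)/variance); a sketch assuming 0/1 item scoring follows (hypothetical helper, not the scoring program's code).

```python
from statistics import pvariance

def kr20(item_responses):
    """Kuder-Richardson formula 20 reliability estimate.

    ``item_responses`` is a list of per-student lists of 0/1 item scores.
    """
    k = len(item_responses[0])                       # number of items
    n = len(item_responses)                          # number of students
    totals = [sum(student) for student in item_responses]
    p = [sum(student[i] for student in item_responses) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)          # item variance sum
    return k / (k - 1) * (1 - sum_pq / pvariance(totals))
```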

The final test statistic is the standard error of measurement.
This statistic is a common device for interpreting the absolute accuracy
of the test scores. The size of the standard error of measurement depends
on the standard deviation of the test scores as well as on the estimated
reliability of the test.
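The conventional formula, assumed here, is SEM = SD x sqrt(1 - reliability):

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM: test-score standard deviation scaled by sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)
```

So a test with a standard deviation of 10 and a reliability of .84 has a standard error of measurement of 4 score points.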

Occasionally, a test writer may wish to omit certain items from
the analysis although these items were included in the test as it was
administered. Such items may be omitted by leaving them blank on the
test key. The response patterns for omitted items will be shown but the
keyed options will be listed as OMIT. The statistics for these items
will be omitted from the Summary Data.

Report Options

A number of report options are available for item analysis data. The
long-form item analysis report contains three items per page. A standard-form
item analysis report is available in which the data for each item are
summarized on one line. A sample report is shown below.

ITEM ANALYSIS   Test 4482   125 Items   112 Students

Percentages: Upper 27% - Middle - Lower 27%

Item  Key      1         2         3         4        5      Omit   Error   Diff  Disc

  1    4    7-23-57   0- 4- 7  28- 8-36  64-62- 0   0-0-0   0-0-0   0-0-0    54    64

  2    2    7-12- 7  64-42-29  14- 4-21  14-42-36   0-0-0   0-0-0   0-0-0    56    35

The standard form shows the item number, key (number of the
correct option), the percentage of the upper, middle, and lower groups
who selected each option, omitted the item or erred, the index of
difficulty, and the index of discrimination. For example, in item 1
above, option 4 was the correct answer and it was selected by 64% of
the upper group, 62% of the middle group and 0% of the lower group. The
index of difficulty, based on the total group, was 54 and the index of
discrimination was 64.
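The indices can be checked against a row with simple arithmetic; the values below are taken from item 1 above.

```python
# Percentages of each group selecting the keyed option (4) for item 1.
upper_pct, middle_pct, lower_pct = 64, 62, 0

# The index of discrimination is the upper-group percentage minus the
# lower-group percentage for the correct option.
disc = upper_pct - lower_pct
```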

Item Analysis Guidelines

Item analysis is a completely futile process unless the results
help instructors improve their classroom practices and item writers
improve their tests. Let us suggest a number of points of departure in
the application of item analysis data.

Item analysis gives necessary but not sufficient information
concerning the appropriateness of an item as a measure of intended
outcomes of instruction. An item may perform beautifully with respect
to item analysis statistics and yet be quite irrelevant to the
instruction whose results it was intended to measure. A most common
error is to teach for behavioral objectives such as analysis of data or
situations, ability to discover trends, ability to infer meaning, etc.,
and then to construct an objective test measuring mainly recognition of
facts. Clearly, the objectives of instruction must be kept in mind when
selecting test items.

An item must be of appropriate difficulty for the students to whom it
is administered. If possible, items should have indices of difficulty
no less than 20 and no greater than 80. It is desirable to have most
items in the 30 to 50 range of difficulty. Very hard or very easy items
contribute little to the discriminating power of a test.

An item should discriminate between upper and lower groups. These
groups are usually based on total test score but they could be based on
some other criterion such as grade-point average, scores on other
tests, etc. Sometimes an item will discriminate negatively, that is, a
larger proportion of the lower group than of the upper group selected
the correct option. This often means that the students in the upper
group were misled by an ambiguity that the students in the lower group,
and the item writer, failed to discover. Such an item should be revised
or discarded.

All of the incorrect options, or distracters, should actually be
distracting. Preferably, each distracter should be selected by a
greater proportion of the lower group than of the upper group. If, in a
five-option multiple-choice item, only one distracter is effective, the
item is, for all practical purposes, a two-option item. Existence of
five options does not automatically guarantee that the item will
operate as a five-choice item.