Assessment

From Wikipedia, the free encyclopedia

Assessment is the process of documenting, usually in
measurable terms, knowledge, skills, attitudes and beliefs. This
article covers educational assessment including the work of
institutional researchers, but the term applies to other
fields as well including health and finance.

Types

Assessments can be classified in many different ways. The
most important distinctions are: (1) formative and summative;
(2) objective and subjective; (3) criterion-referenced and
norm-referenced; and (4) informal and formal.

Formative and summative

The main types of assessment are:

Summative assessment - Summative assessment is generally
carried out at the end of a course or project. In an
educational setting, summative assessments are typically
used to assign students a course grade.

Formative assessment - Formative assessment is generally
carried out throughout a course or project. Formative
assessment, also referred to as educative assessment,
is used to aid learning. In an educational setting,
formative assessment might involve a teacher (or peer) or the
learner providing feedback on a student's work, and would not
necessarily be used for grading purposes.

Ipsative assessment - Ipsative assessment is a style of
assessment (or testing) in which the tested individual's
performance is compared with their own previous performance
over time.

Summative and formative assessment are referred to in a
learning context as "assessment of learning" and "assessment
for learning" respectively.

A common form of formative assessment is diagnostic
assessment. Diagnostic assessment measures a student's
current knowledge and skills for the purpose of identifying a
suitable program of learning. Self-assessment is a form
of diagnostic assessment which involves students assessing
themselves. Forward-looking assessment asks those being
assessed to consider themselves in hypothetical future
situations.

Objective and subjective

Assessment (either summative or formative) can be objective
or subjective. Objective assessment is a form of questioning
which has a single correct answer. Subjective assessment is a
form of questioning which may have more than one correct answer
(or more than one way of expressing the correct answer). There
are various types of objective and subjective questions.
Objective question types include true/false,
multiple choice, multiple-response and matching questions.
Subjective questions include extended-response questions and
essays. Objective assessment is becoming more popular due to the
increased use of online assessment (e-assessment)
since this form of questioning is well-suited to
computerisation.
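The suitability of objective questions for computerisation can be sketched in a few lines of code. The following is a minimal illustration with entirely hypothetical question identifiers and answers: because each objective item has a single correct answer, scoring reduces to comparing responses against an answer key.

```python
# Minimal sketch of automated objective scoring (hypothetical data).
# Each objective item has exactly one correct answer, so a computer
# can grade responses by direct comparison with an answer key.
answer_key = {"q1": "b", "q2": True, "q3": "d"}

def score(responses, key=answer_key):
    """Return the number of responses that match the answer key."""
    return sum(1 for q, ans in key.items() if responses.get(q) == ans)

student = {"q1": "b", "q2": False, "q3": "d"}
print(score(student))  # 2 of 3 items correct
```

Subjective items (essays, extended responses) admit many correct phrasings, which is why they resist this kind of direct comparison and remain harder to automate.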

Criterion-referenced and norm-referenced

Criterion-referenced assessment, typically using a
criterion-referenced test, as the name implies, occurs when
candidates are measured against defined (and objective)
criteria. Criterion-referenced assessment is often, but not
always, used to establish a person’s competence (whether they
can do something). The best known example of
criterion-referenced assessment is the driving test, when
learner drivers are measured against a range of explicit
criteria (such as “Not endangering other road users”).
Norm-referenced assessment (colloquially known as "grading
on the curve"), typically using a
norm-referenced test, is not measured against defined
criteria. This type of assessment is relative to the student
body undertaking the assessment. It is effectively a way of
comparing students. The IQ test is the best known example of
norm-referenced assessment. Many entrance tests (to prestigious
schools or universities) are norm-referenced, permitting a fixed
proportion of students to pass (“passing” in this context means
being accepted into the school or university rather than an
explicit level of ability). This means that standards may vary
from year to year, depending on the quality of the cohort;
criterion-referenced assessment does not vary from year to year
(unless the criteria change).
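The distinction can be made concrete with a short sketch. The scores, threshold, and pass proportion below are hypothetical: a criterion-referenced pass applies a fixed threshold, while a norm-referenced pass admits a fixed proportion of the cohort, so its effective cutoff shifts with cohort quality.

```python
# Illustrative contrast between the two referencing schemes
# (hypothetical cohort scores, threshold, and pass proportion).

def criterion_pass(scores, threshold=70):
    """Criterion-referenced: pass everyone meeting a fixed standard."""
    return [s for s in scores if s >= threshold]

def norm_pass(scores, proportion=0.3):
    """Norm-referenced: pass a fixed proportion of the cohort,
    whatever scores that proportion happens to achieve."""
    n_pass = max(1, round(len(scores) * proportion))
    return sorted(scores, reverse=True)[:n_pass]

cohort = [55, 62, 68, 71, 74, 80, 83, 85, 90, 95]
print(criterion_pass(cohort))  # all scores of 70 or more
print(norm_pass(cohort))       # top 30% of this particular cohort
```

With a weaker cohort, `criterion_pass` simply passes fewer candidates, whereas `norm_pass` still passes the same number, at lower scores; this is exactly why norm-referenced standards can vary from year to year.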

Informal and formal

Assessment can be either formal or informal. Formal assessment
is usually a written document, such as a test, quiz, or paper,
and is given a numerical score or grade based on student
performance. Informal assessment, by contrast, does not
contribute to a student's final grade; it usually occurs in a
more casual manner, through observation, inventories,
participation, peer and self-evaluation, and discussion.

Standards of quality

The considerations of
validity and
reliability typically are viewed as essential elements for
determining the
quality of any assessment. However, professional and
practitioner associations frequently have placed these concerns
within broader contexts when developing
standards and making overall judgments about the quality of
any assessment as a whole within a given context.

Testing standards

In the field of
psychometrics, the
Standards for Educational and Psychological Testing[1]
place standards about validity and reliability, along with
errors of measurement and related considerations under the
general topic of test construction, evaluation and
documentation. The second major topic covers standards related
to fairness in testing and test use, the
rights and
responsibilities of test takers, testing individuals of
diverse
linguistic backgrounds, and testing individuals with
disabilities. The third and final major topic covers
standards related to testing applications, including the
responsibilities of test users,
psychological testing and assessment,
educational testing and assessment, testing in
employment and
credentialing, plus testing in
program evaluation and
public policy.

Evaluation standards

In the field of
evaluation, and in particular
educational evaluation, the
Joint Committee on Standards for Educational Evaluation[2]
has published three sets of standards for evaluations. The
Personnel Evaluation Standards[3] was published in 1988, The
Program Evaluation Standards (2nd edition)[4] was published in
1994, and The Student Evaluation Standards[5] was published in
2003.

Each publication presents and elaborates a set of standards
for use in a variety of educational settings. The standards
provide guidelines for designing, implementing, assessing and
improving the identified form of evaluation. Each of the
standards has been placed in one of four fundamental categories
to promote educational evaluations that are proper, useful,
feasible, and accurate. In these sets of standards, validity and
reliability considerations are covered under the accuracy topic.
For example, the student accuracy standards help ensure that
student evaluations will provide sound, accurate, and credible
information about student learning and performance.

Validity and reliability

A
valid assessment is one which measures what it is intended
to measure. For example, it would not be valid to assess driving
skills through a written test alone. A more valid way of
assessing driving skills would be through a combination of tests
that help determine what a driver knows, such as through a
written test of driving knowledge, and what a driver is able to
do, such as through a performance assessment of actual driving.
Teachers frequently complain that some examinations do not
properly assess the
syllabus upon which the examination is based; they are,
effectively, questioning the validity of the exam.

Reliability relates to the consistency of an assessment. A
reliable assessment is one which consistently achieves the same
results with the same (or similar) cohort of students. Various
factors affect reliability – including ambiguous questions, too
many options within a question paper, vague marking instructions
and poorly trained markers.

A good assessment has both validity and reliability, plus the
other quality attributes noted above for a specific context and
purpose. In practice, an assessment is rarely totally valid or
totally reliable. A ruler which is marked wrong will always give
the same (wrong) measurements. It is very reliable, but not very
valid. Asking random individuals to tell the time without
looking at a clock or watch is sometimes used as an example of
an assessment which is valid, but not reliable. The answers will
vary between individuals, but the average answer is probably
close to the actual time. In many fields, such as medical
research, educational testing, and psychology, there will often
be a trade-off between reliability and validity. A history test
written for high validity will have many essay and
fill-in-the-blank questions. It will be a good measure of
mastery of the subject, but difficult to score completely
accurately. A history test written for high reliability will be
entirely multiple choice. It isn't as good at measuring
knowledge of history, but can easily be scored with great
precision. We may generalise from this: the more reliable our
estimate of what we purport to measure, the less certain we are
that we are actually measuring that aspect of attainment. It is
also important to note that there are at least thirteen sources
of invalidity which could be estimated for individual students
in test situations, yet in practice they never are. Perhaps this
is because the social purpose of such tests demands the absence
of any error, and validity errors are usually so high that
acknowledging them would destabilise the whole assessment
industry.
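The ruler and time-guessing analogies above can be simulated directly. The following sketch uses entirely invented numbers (a true value of 100, a constant bias of 5, and a noise spread of 10) to show how a biased but precise instrument is reliable yet not valid, while an unbiased but noisy one is valid on average yet not reliable.

```python
# Simulation of the reliability/validity analogies (invented numbers).
import random
import statistics

random.seed(0)
true_value = 100.0

# Mis-marked ruler: every reading is off by the same fixed amount.
# Perfectly consistent (reliable) but systematically wrong (not valid).
biased = [true_value + 5.0 for _ in range(50)]

# Random time-guessers: individual answers scatter widely, but they
# are centred on the truth (valid on average, not reliable).
noisy = [random.gauss(true_value, 10.0) for _ in range(50)]

print(statistics.stdev(biased))                   # 0.0 -> fully consistent
print(abs(statistics.mean(biased) - true_value))  # 5.0 -> always wrong
print(statistics.stdev(noisy) > 2)                # readings disagree
```

The spread of repeated readings stands in for reliability, and the distance of their average from the true value stands in for validity; a good assessment keeps both small.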

Controversy

The assessments which have caused the most controversy are
high school graduation examinations, which first appeared in
support of the now-defunct
Certificate of Initial Mastery and which can be used to deny
diplomas to students who do not meet high standards. Critics
argue that one measure should not be the sole determinant of
success or failure. Technical notes for
standards based assessments such as Washington's
WASL
warn that such tests lack the reliability needed to use scores
for individual decisions, yet the state legislature passed a law
requiring that the
WASL
be used for just such a purpose. Others such as Washington State
University's
Don Orlich question the use of test items far beyond
standard cognitive levels for testing ages, and the use of
expensive, holistically graded tests to measure the quality of
both the system and individuals for very large numbers of
students.

High stakes tests, even when they do not invoke punishment,
have been cited for causing sickness and anxiety in students and
teachers, and narrowing the curriculum towards test preparation.
In an exercise designed to make children comfortable about
testing, a Spokane, Washington newspaper published a student's
drawing of a
monster that feeds on fear, produced when she was asked to
draw what she thought of the state assessment. This, however,
is thought to be acceptable if it increases student learning
outcomes.

Standardized multiple choice tests do not conform to the
latest education standards. Nevertheless, they are much less
expensive, less prone to disagreement between scorers, and can
be scored quickly enough to be returned before the end of the
school year. Legislation such as
No Child Left Behind also defines failure if a school does
not show improvement from year to year, even if the school is
already successful. The use of
IQ tests has been banned in some states for educational
decisions, and norm referenced tests have been criticized for
bias against minorities. Yet the use of
standards based assessments to make high stakes decisions,
with greatest impact falling on low-scoring ethnic groups, is
widely supported by education officials because such assessments
reveal the achievement gap that
standards based education reform promises to close. Many
states are currently
using testing practices which have been condemned by dissenting
education experts such as
Fairtest and
Alfie Kohn.

See also

Course evaluation is a series of questions given to
students to evaluate the instruction of a given course.

Evaluation is the process of looking at what is being
assessed to make sure the right areas are being considered.

Grading is the process of assigning a (possibly mutually
exclusive) ranking to learners.

Educational
measurement is a process of assessment or an evaluation
in which the objective is to quantify level of
attainment or competence within a specified domain. See the
Rasch model for measurement for elaboration on the
conceptual requirements of such processes, including those
pertaining to grading and use of raw scores from
assessments.

Educational evaluation deals specifically with
evaluation as it applies to an educational setting.
No Child Left Behind (NCLB) is a government program that
requires educational evaluation.

Educational psychology

Electronic portfolio is a personal digital record
containing information such as a collection of artifacts or
evidence demonstrating what one knows and can do.