Chapter 3

Controversy Over Polygraph Testing Validity

INTRODUCTION

The validity of polygraph examinations to detect
deception has long been a controversial issue
(cf. 108,136,194,195). Since development of polygraph
techniques almost 80 years ago, their use
both within and outside the Federal Government
has been the focus of numerous judicial opinions
and, as well, legislative and executive branch
debate. Polygraph examinations have been advocated
as a way to ascertain guilt of criminal suspects,
to exculpate innocent suspects, to protect
national security, and to maintain employee honesty.
Polygraph examinations have, at the same
time, been criticized for providing inaccurate and
misleading information, for failing to detect security
risks (167), for interfering with the rights of
private citizens (128), and for lowering employees’
morale. At the center of controversy over the use
of polygraph examinations is the question of their
validity: does a polygraph examination actually
identify truthful and nontruthful individuals?

Recent interest in polygraph examinations and
their validity stems from efforts to broaden Federal
Government use. The Department of Defense
(DOD), in late 1982, drafted revisions to existing
regulations (5210.48). DOD proposed expansion
of the use of polygraph tests for preemployment
screening and periodic or aperiodic testing of
employees who have access to highly classified
information. Currently, only the National Security
Agency (NSA) and the Central Intelligence
Agency (CIA) are able to use polygraph tests in
this way. Expanded use of polygraph testing in
all Federal agencies was made explicit in a Presidential
National Security Decision Directive (Mar.
11, 1983, NSDD-84). In part, the directive requires
agencies and departments which handle classified
information to revise existing regulations to permit
use of polygraph examinations as part of internal
investigations of unauthorized disclosure of
classified information. Prior to the directive, investigations
of unauthorized disclosures had to
be referred to the Department of Justice (DOJ).
Employees who refuse to submit to a polygraph
examination could, if NSDD-84 is implemented,
be subject to adverse consequences. In October
1983, DOJ announced that administration policy
would also permit Government-wide polygraph
use in personnel security screening of employees
(and applicants for positions) with access to highly
classified information.

Proposals to expand use of polygraph examinations
to maintain national security have renewed
the debate about the appropriateness of various
polygraph techniques and their ability to detect
deception. In order to provide a context for the
present evaluation of scientific evidence on the
validity of polygraph testing, previous assessments
of accuracy of polygraph testing are reviewed
in this chapter. Legal precedents regarding
polygraph testing and congressional hearings on
its use, both within and outside of Government,
are briefly considered. The chapter also describes
scientific criteria for establishing validity and
reviews other efforts to evaluate
the scientific literature on testing.

JUDICIAL REVIEWS

When courts have been called on to resolve disputes
concerned with use of polygraph examinations, they
have had to consider both the technique’s validity
and whether its use, however valid, interferes with
other values that the law
seeks to protect. The varying decisions reached
by State appellate courts and Federal circuits (see
8) may in large measure reflect varying beliefs
about the validity of polygraph examinations. Indeed,
for many years, the leading case on the admissibility
of novel scientific evidence (Frye v.
United States (58)) was a case about the admissibility
of polygraph evidence, and the opinion centered
on the question of validity. The issue of how
a court is to decide the question of any scientific
technique’s validity has brought the Frye test into
question in recent years and makes salient the
problem of establishing judicial standards for
assessing validity (60).

Polygraph Findings as Evidence

The Frye case involved a 19-year-old defendant
convicted of robbery and murder. Prior to his
trial, a well-known psychologist and one of the
originators of polygraph testing, Dr. William
Marston, administered a “systolic blood pressure
test" to detect deception (e.g., 114). Dr. Marston
determined, on the basis of this test, that Frye was
truthful when he denied involvement in the robbery
and murder. The trial judge, however, refused
to permit Dr. Marston to either testify about
the examination or conduct a reexamination using
the blood pressure test in court.

Frye appealed his conviction on the grounds
that relevant exculpatory evidence had not been
admitted. The appeals court, however, concurred
with the initial trial court judgment. The court
reasoned that the systolic blood pressure deception
test was validated only by “experimental"
evidence and was not based on a “well-recognized
scientific principle or discovery." The decision
stated that, “while courts will go a long way in
admitting expert testimony deduced from a well-recognized
scientific principle or discovery, the
things from which the deduction is made must be
sufficiently established to have gained general acceptance
in the particular field in which it belongs.
Just when a scientific principle crosses the line between
experimental and demonstrable is difficult
to define."

Ironically, Frye’s conviction was later reversed
when another man confessed to the crime, thereby
providing Frye with more convincing corroboration
of his denials of guilt. This did not settle the
case, however, and recent discussion of the facts
of the case indicates that Frye was, indeed, guilty.
The crude polygraph examination conducted by
Marston, thus, appears to have yielded an inaccurate
conclusion.

The Frye test is still used as precedent in most
Federal courts. Subsequent opinions (in areas
other than the polygraph) have tried to better define
that line between “experimental" and “demonstrable"
stages of a scientific innovation. For example,
the court in United States v. Stifel (190)
held that “neither newness nor lack of absolute
certainty in a test suffices to render it inadmissible
in court." In a second case, United States v.
Brown (189), the court also seemed to be concerned
with validity: “The fate of a defendant in
a criminal prosecution should not hang on his
ability to successfully rebut scientific evidence
which bears an ‘aura of special reliability and
trustworthiness,’ although, in reality the witness
is testifying on the basis of an unproved hypothesis
in an isolated experiment which has yet to
gain general acceptance in its field." The Frye test
has been held to be too high a hurdle by some
trial courts, which have replaced it with the test
for admissibility of expert testimony generally:
“testimony by a witness as to matters which are
beyond the ken of the layman will be admissible
if relevant and the witness is qualified to give an
opinion as to the specialized area of knowledge"
(190).

A closely related question for the courts has
been who should determine whether some procedure
has gained general acceptance in its field.
Some have held that the courts must look to the
judgment of the scientific community (e.g., 191).
In other decisions, the court refused to “surrender
to scientists the responsibility for determining the
reliability of (scientific) evidence," and stated that “a
determination of reliability cannot rest on a process
of ‘counting (scientific) noses.'"

Saks and Van Duizend (145) concluded that
whichever set of tests is employed, the courts are
in a weak position to assess validity directly or
to count scientific noses. The result has been:
1) general deference by the courts to the judgments
of scientific communities; and 2) “numerous incongruities
. . . where less reliable scientific and
technological information is admitted but the admission of demonstrably more reliable techniques
is delayed until the requisite consensus has
formed" (145; see, also, 60).

When the courts examine polygraph testing,
they are faced with a series of dilemmas. To which
“particular field" of expertise can the courts turn:
physiology, psychology, polygraph? If they look
to the data themselves, what are they to make of
it? As the present report suggests, validity assessment
involves a complex situation and technique-specific
answer. Even if a final, single accuracy
rate could be established, how should a court use
it? How accurate must a diagnostic or predictive
technique be to be deemed valid for evidentiary
purposes? Regularly admitted psychiatric evidence
is widely recognized (including by the U.S. Supreme
Court, see Addington v. Texas (2)) as having
accuracy rates comparable to flipping coins
(e.g., 55, 208). In Barefoot v. Estelle (13) the Supreme Court acknowledged that psychiatric predictions
of dangerousness and violent behavior
do not exceed an accuracy level of 33 percent (see
118). Yet, this evidence was held admissible in
Barefoot and sufficiently valid to uphold a decision
to execute a convicted person.

In summary, then, the courts have found themselves
disagreeing on methods to establish validity
for purposes of admissibility of evidence, and on where
the critical focus of such judgment should rest.
In addition, courts are inconsistent about what
decision to make on the basis of judicial findings
of fact regarding the validity of a diagnostic or
predictive device.

Laws Regulating Polygraphs
in Employment Settings

As described in chapter 2, screening employees
is the most frequent application of polygraph testing.
Many employers argue that use of polygraph
testing for preemployment screening, periodic
checking, and to resolve actual thefts is necessary.
Internal crime has been estimated to cost private
industry up to $10 billion annually (see 172), and
polygraph testing is regarded as a cost-effective
tool. Employers argue that screening applicants,
and periodic checking of employees, are the most
efficient ways to control pilferage, embezzlement,
poaching, and other forms of theft. The need for
polygraph testing is felt particularly in industries
which have high risk of theft and fraud (e.g., commercial
banks), high turnover (supermarkets,
other retail operations), or both.

According to Ansley (8), the use of private polygraph
testing is limited by statute in 18 States
plus the District of Columbia. Most of these laws
seek to protect employees from being requested,
required, demanded, or subjected to polygraph
examinations by their employers. Employers are
reported to be able to find ways around these
laws. For example, employers may tell the employee
that they suspect them of theft, but that
if the employee can find a way to demonstrate
innocence, the employer will not discharge the
employee. In addition to polygraph validity, other
polygraph-related concerns include issues of voluntariness,
invasions of privacy, being compelled
to inform on other employees, inhibiting union
activity, and the polygraph as a cover for racism
and sexism. This list does not exhaust concerns
that have been expressed.

A survey of 143 private firms by Belt and Holden
(25), regarding their use of polygraph testing,
yielded a number of interesting findings. Twenty
percent of respondents reported using polygraph
examinations for preemployment screening,
periodic surveys, and investigations of specific
onsite crimes. It is interesting that of reasons
given for using or not using polygraph tests, users
ranked moral or ethical considerations last and
efficiency first; nonusers, however, ranked validity
and reliability second in importance, cost third,
and the availability of qualified operators fourth
in importance. The survey found a positive relationship
between a State having a licensing requirement
for polygraphers and employers’ use
of polygraph testing. According to Ansley (8), 25
States have licensing requirements for polygraphers;
licensing is optional in one State.

Although there is testimony that use of polygraph
testing reduces employee crime (172), no
formal cost-benefit analyses appear to have been
conducted. In addition, there is no research on
the predictive validity of polygraph results
(72,144). Although employee issues are critical to
proposed Government uses of polygraph testing,
few data are available on Government employees
(see chs. 4 and 5).

One additional area of controversy has concerned
employee rights and employer-employee
relationships. The general matter of invasion of
privacy is particularly pertinent in preemployment
screening and periodic checking. In preemployment
screening, the range of questions that may
be asked has been subject to particularly heavy
criticism. Questions have been reported to include
items concerning union activity, sexual preference,
and family problems (169); and, in addition,
willingness to make a commitment to the job
(144), and whether the respondent has ever been
tempted to steal (71). During periodic checking,
respondents are sometimes asked not only about
their own possible improper behavior (e.g., underringing
in supermarkets), but also about their
level of job satisfaction, intention to remain with
the employer, and activities of their fellow employees
(204). There is some concern about
whether prejudices of the polygraph examiner
based on racial, ethnic, and gender stereotypes
bias employees’ responses (144). These assertions
do not appear to have been researched. And no
related claims under Title VII of the Civil Rights
Act have been upheld.

One argument against the use of polygraph examinations
in the employment situation is that it
destroys the trust relationship between employers
and employees, and creates employee dissatisfaction.
However, the few employee surveys that
have been conducted have not supported this argument.
Apparently, five studies have examined
whether the use of the polygraph causes private
sector employees to be dissatisfied. In one study
(144), 96 percent of applicants were willing to take
a polygraph examination to get a job, 86 percent
of the applicants thought the preemployment examination
was fair, and 88 percent were willing
to take it routinely as a condition of employment.
A problem with the study was that applicants
were surveyed immediately after taking the polygraph
examination so they may have thought their
responses were part of the screening process. In
the one known survey of Federal employees, the
Air Force (183a) surveyed individuals who had
volunteered to participate in a pilot project on the
use of the polygraph for counterintelligence/security
examinations. About 99 percent of the respondents
felt that the examination was fair, and
were willing to take an examination for counterintelligence
purposes.

FEDERAL DEBATE OVER POLYGRAPH VALIDITY

Concern about and debate over Federal Government
use of the polygraph have emerged at
several points during the past 20 years. As shown
in figure 1, the history is essentially one of legislative
concern triggered by some executive branch
proposal or action regarding polygraph testing.
The questions raised by Congress have included
constitutional and ethical as well as validity issues.
However, the scientific validity and reliability of
polygraph testing have been and remain a central congressional
concern. This chapter briefly describes
the history of Federal Government involvement
with the issue of polygraph validity.

The 1960’s

Congressional interest first intensified in 1963
when controversy developed over an executive
branch proposal to use lie detectors to find the
source of unauthorized disclosures of sensitive or
classified information, sometimes known as
"leaks" (192). The then chairman of the House
Committee on Government Operations asked the
Foreign Operations and Government Information
subcommittee to study the Federal Government’s
use of polygraphs. The study found that, excluding
the National Security Agency and Central Intelligence
Agency (for which information was
classified), Federal agencies had conducted 19,796
polygraph examinations in 1963. In 1964, the subcommittee
held hearings and received testimony
from private polygraphers, researchers, and Federal
officials. In a 1965 report (167), the House
Committee on Government Operations concluded
that there was no scientific evidence to support
the theory of the polygraph, and that the research
evidence as to its accuracy was inadequate. The
committee recommended that further research be
conducted and training for polygraph examiners
be upgraded, and that the President establish an
interagency committee to study and work out solutions
to problems posed by Federal Government
use of polygraphs.

Later in 1965, an interagency polygraph committee
of representatives from DOD, CIA, DOJ,
Bureau of the Budget (now Office of Management
and Budget), Office of Science and Technology
(now the Office of Science and Technology Policy),
and other executive agencies was established.
The interagency committee concluded that: 1)
there was insufficient scientific evidence concerning
the validity and reliability of polygraph testing;
and 2) the use of the polygraph constituted
an invasion of privacy of the individual being interrogated.
The committee recommended that the
“use of the polygraph in the executive branch
should be generally prohibited, and permitted
only in special national security operations and
in certain specified criminal cases" (166). The recommendations
made at that time concerning personnel
screening were promulgated as Civil Service
regulations on regulating the use of polygraphs
in personnel investigations of competitive service
applicants and appointees to competitive service
positions (ch. 736, app. D, of the Federal Personnel
Manual). According to these regulations,
which are still in effect, only executive agencies
with highly sensitive intelligence or counterintelligence
missions directly affecting the national security
such as “a mission approaching the sensitivity
of that of the Central Intelligence Agency"
are permitted to use the polygraph for employment
screening and personnel investigations of applicants
for and appointees to competitive service
positions. All other uses of a polygraph to screen
applicants for and appointees to competitive positions
are forbidden.

The regulations also set forth steps for determining
whether agencies met the criteria of having
a highly sensitive mission, and stipulated that
approval to use the polygraph would be granted
only for 1-year periods. Agencies intending to use
the polygraph for personnel screening were required
to prepare regulations and directives meeting
certain minimum standards. The minimum
standards included directives concerning the specific
purposes for which the polygraph may be
used, and directives that a person to be examined
must be informed as far in advance as possible
of the intent to use the polygraph and of the fact
that refusal to consent to a polygraph examination
will not be made a part of the person’s personnel
file.

Also in response to the House Government
Operations Committee’s 1965 report, DOD proposed,
and in part undertook, an extensive polygraph
research program. And in July 1965, DOD
issued directive 5210.48 (177) to regulate the conduct
of polygraph examinations and improve selection,
training, and supervision of its polygraph
operators. Some of the results of the DOD research
program were later reported in a scientific
journal (29), but other reliability and validity
studies proposed were never carried out (183).
Between 1967 and 1973 a number of bills were
introduced which would have either limited the
questions that could have been asked or banned
altogether polygraph use by Federal agencies
(170). None of these bills was enacted.

The 1970’s

Ten years after the 1964 hearings, this same
House Government Operations subcommittee
conducted another review of polygraph use by
Federal agencies (169). In 1974 hearings, the subcommittee
found that the use of polygraphs in the
Federal Government had declined substantially
since 1963. In fiscal year 1973, a total of 6,946
examinations were conducted, including 3,081 by
NSA. This compared to 19,796 in 1963, excluding
NSA and CIA. The subcommittee also found that
there was not much additional research on polygraph
validity. The only federally funded studies
conducted had been those reported by the DOD
Joint Services Group (183), and these studies were
considered by DOD to be inadequate for determining
the validity and reliability of Federal
polygraph testing.

In a 1976 report based partly on the 1974 hearings,
the House Government Operations Committee
concluded that “the nature of research undertaken,
both federally and privately funded, and
the results therefrom, have done little to persuade
the committee that polygraphs . . . have demonstrated
either their validity or reliability in differentiating
between truth and deception, other
than possibly in a laboratory situation" (171). The
1976 report concurred with the 1965 report that
“There is no ‘lie detector’" (171). Because of the
polygraph’s “unproven technical validity" and the
suggestion that the “inherent chilling effect on individuals
subjected to such examination clearly
outweighs any purported benefit to the investigative
function of the agency," the Committee
recommended a complete ban on the use of polygraphs
by all Federal Government agencies for
all purposes. However, 13 committee members
dissented, asserting both that the hearings had
been held during an entirely different Congress,
and participated in by an entirely different group
of Members, and that, while testimony at the
hearings represented a wide diversity of views,
no witness had urged prohibition of the polygraph
for all purposes. The dissenters urged adoption
of the recommendations originally proposed and
voted on by the members who had participated
in the hearings. These recommendations would
have, in part, prohibited the use of polygraphs
in all cases except “1) those clearly involving the
Nation’s security, and 2) those in which agencies
can demonstrate in compelling terms their need
for use of such devices for their law enforcement
purposes, and that such uses would not violate
the fifth amendment or any other provision of the
Constitution."

The concern with scientific validity and its
implications for the Federal Government’s use of
polygraph testing arose again in 1979 at hearings
held on preemployment security clearance procedures
by the House Permanent Select Committee
on Intelligence, Subcommittee on Oversight (175).
The subcommittee found that there had been insufficient
research on the accuracy of the polygraph
technique in screening job applicants and
that “gaps in the statistics kept by the intelligence
services do not make it possible to make the clear
judgment that the polygraph is unique and indispensable"
(173). The Director of Central Intelligence
(DCI) was urged to conduct a study to validate
the accuracy of the polygraph for preemployment
screening. DCI did conduct a study in 1980
to examine the utility of polygraph tests, but it
was not a validity study (165).

As shown in figure 1, in addition to interest in
Federal use of polygraphs, Congress has shown
interest in the use of polygraph examinations by
private employers, in part because of constitutional
and privacy issues (see, e.g., 169,172, 173;
the Privacy Protection Study Commission Report
(128) mandated by Public Law 93-579; and several
bills introduced since 1967). Various congressional
committees have questioned the validity of
polygraph testing in a private employment context,
in particular as a condition for employment.
Nevertheless, attempts to enact Federal legislation
regulating the use of polygraph examinations by
private employers and/or the Federal Government
have not been successful.

The 1980’s

In the recent past, the executive branch has
again taken initiatives concerning the Federal use
of polygraph testing. In April 1982, a DOD select
panel reviewed the DOD personnel security program
(180) and expressed dissatisfaction because
of inconsistency in polygraph use across component
programs (as did the U.S. Congress (173)),
and the lack of reinvestigations. The panel observed
that military personnel, unlike civilians,
were appointed to NSA and allowed access to
Sensitive Compartmented Information (SCI) without
undergoing a polygraph examination. In addition,
personnel could continue to get clearances
throughout their careers without ever being subject
to reexamination. The DOD panel recommended
a broadened application of the polygraph
for security screening purposes, and selective use
of counterintelligence scope polygraph examinations
during periodic reinvestigations. The panel
noted that the recommended expanded use of the
polygraph would require changes in DOD Directive
5210.48.

On August 6, 1982, the Office of the Deputy
Secretary of Defense (39) issued a memorandum
requiring employees with SCI access to agree to
submit to polygraph examinations on an aperiodic
basis, and revised DOD Directive 5210.48 accordingly.
Later in 1982 and again in early and mid-1983,
further revisions to DOD Directive 5210.48
were drafted (181). In 1983, the President issued
a National Security Decision Directive (NSDD-84)
also authorizing broader use of the polygraph.
Congress responded to these developments by
conducting several sets of hearings, by requesting
OTA and General Accounting Office studies, and
by passing an amendment to the DOD appropriations
authorization bill (S.675) putting a moratorium
until April 15, 1984, on any revisions to
DOD Directive 5210.48 retroactive to August 5,
1982. On October 19, 1983, DOJ announced a
new administration polygraph policy that would
permit further expansion in polygraph use. The
DOD draft revisions, NSDD-84, and administration
polygraph policy are discussed in more detail
below.

Draft Revisions to DOD 5210.48

The draft revisions to the DOD polygraph regulations
have gone through several iterations. For
the purposes of this validity study, a primary proposed
revision (as of the March 1983 draft) is to
authorize the use of the polygraph for determining
initial and continuing eligibility of DOD
civilian, military, and contractor personnel for
access to highly classified information (SCI and/or
special access). The use of the polygraph in determining
continuing eligibility would be on an
aperiodic (i.e., irregular) basis (181).

Also, the proposed revisions provide that refusal
to take a polygraph examination, when
established as a requirement for selection or
assignment or as a condition of access, may, after
consideration of all other relevant factors, result
in adverse consequences for the individual. Adverse
consequences are defined to include nonselection
for assignment or employment, denial
or revocation of clearance, or reassignment to a
nonsensitive position.

Technically, these expanded uses of the polygraph
are considered to be part of personnel security
investigations. Use of the polygraph within
DOD is already authorized under the existing 1975
version of 5210.48 for various criminal, counterintelligence,
and intelligence purposes.

A detailed review of the proposed changes is
beyond the scope of this technical memorandum.

NSDD-84

On March 11, 1983, the President issued a National
Security Decision Directive intended, according
to DOJ officials, to help safeguard against
unlawful disclosure of properly classified information.
One provision of NSDD-84 requires that
persons with authorized access to classified information
sign a nondisclosure agreement, and that
persons with access to SCI must also agree to prepublication
review. These provisions are outside
the scope of this memorandum, as is a full analysis
of NSDD-84.

With respect to the polygraph, NSDD-84 in
effect authorizes agencies and departments to
require employees to take a polygraph examination
in the course of internal investigations of
unauthorized disclosures of classified information.
NSDD-84 also provides that refusal to take
a polygraph test may result in adverse consequences.
NSDD-84 permits administrative sanctions,
including denial of security clearance, to
be applied even when a person is not subject to
a criminal investigation (184).

Administration Polygraph Policy

On October 19, 1983, DOJ announced a comprehensive
administration policy on Federal agency
polygraph use. The policy authorizes polygraph
testing:

1. as a condition of initial or continuing
employment with or assignment to agencies
with highly sensitive responsibilities directly
affecting national security;
2. as a condition of access to highly sensitive
categories of classified information;
3. to investigate serious criminal cases; and
4. to investigate serious administrative misconduct
cases including unauthorized disclosure
of classified information (185a).

The policy in essence authorizes use of the polygraph
on a Government-wide basis for the expanded
polygraph uses proposed by DOD. Thus,
for example, the policy provides agency heads
with the authority to give polygraph examinations
on a periodic or aperiodic basis to randomly
selected employees with access to highly sensitive
information, and to deny such access to employees
refusing to take a polygraph exam.

SCIENTIFIC VALIDITY AND POLYGRAPH RESEARCH REVIEWS

Thus, recent polygraph policy actions have
renewed interest in and debate over the scientific
validity of the polygraph. Reviews of scientific
literature form the principal means to cumulate
research findings and are especially important in
order to assess the validity of polygraph testing.
Single research studies, no matter how well conducted,
cannot answer global questions about validity
and must be considered in relation to other
evidence. Both because research evidence about
polygraph testing has rapidly increased, especially
within the last 10 years, and because there have
been disagreements about the nature of evidence
about polygraph testing, there have been a number
of such reviews. These reviews are important,
because they are frequently cited in both legal and
legislative considerations and because they serve
to shape future research.

Underlying each of the reviews is the application
of a set of criteria, only sometimes made explicit,
regarding the validity of individual studies
and their implications for overall assessments of
polygraph testing accuracy. As introduction to
the scientific reviews, the nature of these criteria
is described. The reviews, themselves, are then
summarized and a preliminary analysis of discrepancies
among reviews is presented. More detailed
analysis of individual validity studies is provided
in chapters 4 and 5.

Definitions of Scientific Validity

Validity

The validity of polygraph testing means, in
nontechnical terms, accuracy of the test in detecting
deception and truthfulness. The problem of
assessing polygraph validity is especially difficult,
not only because polygraph tests take a number
of forms, but also because validity has different
dimensions and can be measured in a number of
ways. There are, as a result, a number of different
forms of validity associated with polygraph examinations
depending on the type of polygraph
test as well as on its use (e.g., employee screening
v. investigation of a criminal suspect). These
difficulties underlie, in part, the failure to have
developed assessments of polygraph validity that
are accepted by the scientific community.
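
As a rough illustration of why "accuracy" has more than one dimension, the sketch below tallies examiner calls against known ground truth and reports separate rates for deceptive and truthful subjects. The function name, the labels, and the data are hypothetical and chosen only for illustration; they are not drawn from any study reviewed in this memorandum, and inconclusive outcomes are not modeled.

```python
from collections import Counter


def accuracy_rates(truth, calls):
    """Return correct-call rates for deceptive and truthful subjects.

    truth, calls: equal-length sequences of "deceptive" / "truthful" labels
    (hypothetical labels used only for this sketch).
    """
    if len(truth) != len(calls):
        raise ValueError("truth and calls must have the same length")
    n_deceptive = sum(1 for t in truth if t == "deceptive")
    n_truthful = sum(1 for t in truth if t == "truthful")
    if n_deceptive == 0 or n_truthful == 0:
        raise ValueError("need at least one deceptive and one truthful subject")
    # Count each (actual status, examiner call) pair
    counts = Counter(zip(truth, calls))
    hits_deceptive = counts[("deceptive", "deceptive")]
    hits_truthful = counts[("truthful", "truthful")]
    return {
        "deceptive_correct": hits_deceptive / n_deceptive,
        "truthful_correct": hits_truthful / n_truthful,
        "overall": (hits_deceptive + hits_truthful) / len(truth),
    }


# Invented data: 4 deceptive and 6 truthful subjects
truth = ["deceptive"] * 4 + ["truthful"] * 6
calls = ["deceptive", "deceptive", "deceptive", "truthful",
         "truthful", "truthful", "truthful",
         "deceptive", "deceptive", "deceptive"]
rates = accuracy_rates(truth, calls)
```

With these invented data, 75 percent of deceptive subjects and 50 percent of truthful subjects are classified correctly, for an overall figure of 60 percent. The single overall number conceals the fact that half of the truthful subjects were misclassified, which is one reason a lone accuracy rate can mislead.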

In order to make explicit the criteria for validity
used in this assessment, below are described several
dimensions of validity and how they are assessed.
This description is based both on standards
for psychological/psychometric tests (cf. 3,5) and
criteria to evaluate research designs (cf. 41,147).
Although criteria for validity can be described objectively,
it should be noted that it is essentially
a qualitative judgment as to whether (or, to what
extent) a given criterion is met. In addition,
assessments of the “preponderance” of evidence
necessary in order to assess the overall validity
of polygraph testing are similarly subjective. In
chapters 4 and 5, a systematic analysis of available
research is attempted, although it should be
recognized that there are a number of ways to
conduct such evaluations, each of which may
yield a somewhat different outcome.

Reliability

Assessment of any test’s validity is based on the
assumption that the test consistently measures the
same properties. This consistency, known as reliability,
is usually the degree to which a test yields
repeatable results (i.e., the extent to which the
same individual retested is scored similarly).

Reliability also refers to consistency across examiners/
scorers. A reliable polygraph test should
yield equivalent outcomes when subjects are retested
and, as well, be scored similarly by individuals
other than the initial examiner. For example,
if a polygraph examiner reviewed a set of
charts and concluded that a subject was deceptive,
any other polygraph examiner should be able
to review the same charts and conclude that deception
was indicated. This illustrates interrater reliability.
Such reliability might be affected by
the amount and type of training of examiners.
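
As a rough illustration of interrater reliability, the following Python sketch computes simple percent agreement between two scorers of the same charts. The function name, the chart calls, and the choice of percent agreement as the statistic are assumptions made for illustration; they are not drawn from any study reviewed here.

```python
def percent_agreement(calls_a, calls_b):
    """Share of charts on which two scorers reach the same call."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return matches / len(calls_a)


# Hypothetical calls on the same six charts: D = deceptive, N = nondeceptive.
original_examiner = ["D", "D", "N", "N", "D", "N"]
blind_rescorer = ["D", "N", "N", "N", "D", "N"]

agreement = percent_agreement(original_examiner, blind_rescorer)
print(f"{agreement:.0%} agreement")  # 5 of 6 charts scored alike
```

High agreement of this kind shows only that scorers read charts consistently; it does not show that what they agree on is deception.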

The present study focuses primarily on validity
because if a testing procedure is not measuring
what it purports to measure (validity), it matters
little that it can measure the same thing again
and again. Examiners who consistently agree that
they are seeing “deception” may in fact be measuring
anxiety or some other form of arousal. Reliability
is, however, a necessary condition for validity
to be established. A test that is valid will,
necessarily, be reliable.

Construct Validity

Construct validity refers, in broad terms, to
whether a test adequately measures the underlying
trait it is designed to assess. A polygraph test
is designed to detect deception. It is therefore important
to clearly define the construct of deception,
and distinguish it from other concepts such
as guilt.

To measure construct validity, it is necessary
to both describe the construct and show its relation
to a conceptual framework. Construct validation,
thus, requires that a test be based on some
theory or conceptual model. Since different types
of polygraph tests have different theoretical bases
(see ch. 2), there are multiple forms of construct
validity for the polygraph. Construct validity is
established by various means. Most importantly,
based on theoretical predictions of how items
should interrelate or how other tests should inter-correlate,
actual evidence (e.g., scores from similar
tests) is examined. If no such predictions are
possible, it is impossible to establish construct
validity.

Criterion Validity

Although from a theoretical point of view construct
validity is most important, from a practical
point of view, criterion validity is the central
component of a validity analysis. This aspect of
validity refers, in the case of polygraph examinations,
to the relationship between test outcomes
and a criterion of ground truth. In this respect,
criterion validity is what is meant by test accuracy.
In the absence of construct validity evidence,
however, it is difficult to determine to what extent
criterion validity data can be generalized. In
some situations, it is not clear which aspects of
a test are responsible for accuracy, and what factors
cause a test to be inaccurate.

Research Design

The above validity criteria are those which are
typically assessed in considering evidence about
the usefulness of a test. A related set of validity
criteria is also used to evaluate the validity of
any single study design. These research design criteria
include, most importantly, internal and external
validity (cf. 41,147).

Internal validity refers to the degree to which
a study has controlled for extraneous variables
which may be related to the study outcome. External
validity refers to the established generalizability
of a study to particular subject populations
and settings. Internal validity in the case of
a study of polygraph testing is usually enhanced
by the presence of control groups. Typically, such
conditions of an experiment permit analysis of
variables such as different question formats. In
most field studies, internal validity is difficult to
establish since the investigator cannot control or,
in many cases, have definitive knowledge about
whether a subject is guilty or innocent.

External validity depends simply on the nature of the subjects
and settings tested. The broader the population
examined and the type of setting investigated,
the wider that study’s results can be generalized.
In a parallel way, the more similar the research
situation to the “real life” situation, the greater
a study’s external validity. Evidence about external
validity is developed both from investigations
that test a broad range of subjects and situations
and from investigations that identify subject and
setting interactions with polygraph test outcomes.
The broader the population examined and the
type of setting or the more similar it is to the situation
for which one wants to use a test or a theoretical
construct, the greater a study’s external
validity.

False Positives and Negatives

With any test, the possibility exists of false positives
and negatives. False positives are decisions
that individuals are being deceptive when they are
providing truthful responses. Their charts are
scored as showing a “deceptive” reaction for some
other reason. False negatives are decisions that individuals
are not being deceptive when in fact they
are being deceptive. There are a number of reasons
why such false outcomes might be obtained
and, in part, they depend on the criteria (e.g.,
amount of physiological change) used to indicate
deception or truthfulness.

The rate of false positives or negatives is
sometimes difficult to establish because, in research
studies, a number of criteria for deception/
nondeception may be applied. Thus, for example,
in studies which employ numerical scoring
for polygraph charts, depending on the scoring
system (e.g., cutoff points), different diagnoses
will be made. The rate of false positives and
negatives may also depend on the examiner’s perception
of the “base rate” of guilt/innocence.

In some cases, the examiner will deal mostly
with deceptive subjects (e.g., in certain criminal
investigation contexts) and, thus, may be predisposed
to make false positive diagnoses. In other
settings (e.g., some personnel screenings), an examiner
may test only a small number of deceptive
subjects and, thus, may be predisposed to
false negative decisions. Regardless of rates,
assessment of conditions that contribute to either
type of error is a focus of the research literature.

Reviews of Polygraph Validity

Since at least 1973, a number of polygraph researchers
and psychologists interested in physiological
detection of deception have reviewed available
scientific literature to assess the validity and
reliability of polygraph testing. Most such reviews
focus on studies of criterion validity, although a
growing number of investigations deal with construct
validity. The most important difference
among these criterion studies has to do with
whether they are conducted in actual field situations
or in "analog" situations.

Field Studies

For purposes of this technical memorandum,
field studies are those studies of “naturally”
occurring polygraph test situations; i.e., studies
in which the researcher does not exercise experimental
control over the situation in which the
crime or other event occurred. Not exercising experimental
control means that the researcher does
not systematically assign people to conditions of,
for example, guilt or innocence. We refer here to
“field” studies, but others (e.g., 7) use the terminology
“real” cases (v. “laboratory”). Abrams
(1) differentiates between the laboratory and “actual
criminal cases.”

In polygraph field studies, polygraph examiners’
decisions are compared against some post hoc
determination of whether suspects are guilty or
innocent; i.e., “ground truth.” These post hoc
determinations may, in different studies, consist
of confessions by the presumably guilty party,
decisions by a panel of attorneys or judges assembled
specifically for a particular study who base
their decisions on investigative files excluding
references to polygraph decisions, judicial outcomes
(dismissals, acquittals, convictions), as well
as other criteria. The fact that determinations of
guilt or innocence are made post hoc makes drawing
conclusions from field studies difficult (126).
In real life situations, truth is seldom available
(62).

Attempts to use confessions, panel judgments,
judicial outcomes, and other criteria as indicators
of truth have their own problems. Individuals
may confess to crimes which they did not commit
(108). In addition, individuals are sometimes
falsely convicted (34). Panel decisions may be generalizable
only to cases in which sufficient investigative
information is available to make a decision
without the addition of polygraph testing.
One can never be certain that the panel decision
is indeed correct, and the panel and the polygraph
examiner may have been exposed to the same
prior information (62). Thus, while field studies
provide the most direct evidence about polygraph
test validity, they have been criticized because
they do not adequately meet the standards of
“ground truth” to establish criterion validity.

Comparison of Reviews

A number of independent reviews (listed in
table 2) of the field evidence on polygraph testing
were assessed in order to determine reasons for
differences among reviews. The reviews differ in
a number of respects. In part, reviewers’ conclusions
differ because they include different kinds
of studies and even different studies (despite, in
several cases, having had the same studies available
to them). In addition, some reviews differentiate
between accuracy in detecting deceptive v.
nondeceptive subjects, emphasizing the problems
of false positives and false negatives; others aggregate
the overall accuracy rates across both
groups of subjects. Finally, there are differences
in the way accuracy rates were calculated, in particular,
how inconclusive results are handled. Each of
these differences has important implications for
the conclusions developed by the reviews.

Several reviews (1,81) conducted 5 to 10 years
ago reported relatively positive conclusions based
on an evaluation of the scientific literature.

Abrams (1) in 1973 reviewed reports of the
polygraph’s accuracy dating from 1917, including
anecdotal as well as experimental data. He calculated
approximate estimates of overall accuracy
from this data, noting, however, that “it is almost
meaningless to total and average these findings
because of the great discrepancy in experimental
paradigms and the instruments employed." He reported
that in studies with complete verification
of ground truth, diagnoses were 100 percent correct.
In other field studies prior to 1963, Abrams
calculated an accuracy rate of 98 percent. In
laboratory experiments prior to 1963, Abrams
estimated an average accuracy rate of 81 percent.
Averaging the results of the reports between 1963
and 1973, Abrams’ estimates of laboratory and field
research accuracy were 83 and 98 percent, respectively.
Horvath’s (81) review in 1976 used somewhat
more stringent criteria in selecting data than
did Abrams. His review does not include an overall
average accuracy rate calculated across studies.

The early positive views of the polygraph’s
worth have recently been challenged by Lykken
(108) and, to some extent, by Ben-Shakhar, et al.
(28). Lykken in 1981 challenged the theoretical
assumptions of the most prevalent question technique,
the control question technique (CQT), and
asserted that an average 50-percent false positive
rate supported his theoretical challenge. Lykken,
however, continues to believe that particular polygraph
techniques are useful (i.e., the detection
of guilt by measuring physiological arousal) and
offers the use of the guilty knowledge technique
as a way to increase overall validity. Adoption
of Lykken’s suggestion would preclude the use of
the polygraph for preemployment testing and periodic
checking.

Ben-Shakhar, et al.’s (28) analysis also limited
its assessment of the polygraph to CQT. Their
1982 assessment of existing polygraph field research
indicated that polygraph testing was 83 to
84 percent accurate for guilty suspects and 76 to
81 percent accurate for innocent suspects. As a
result, Ben-Shakhar, et al., concluded that examiners
tend to value detection of guilty suspects
highly, even at the risk of falsely classifying innocent
suspects; their conclusion concurs with
Lykken’s. Ben-Shakhar, et al., in conducting their
review, employ a utility theory approach based
on Bayes’ theorem. They predict dramatically different
utility rates based on different base rate
assumptions.
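
The dependence on base rates can be sketched with Bayes’ theorem alone. The Python fragment below is not Ben-Shakhar, et al.’s utility analysis; it only computes the probability that a subject called deceptive is in fact deceptive. The function name and the two base rates are hypothetical, the 83- and 76-percent figures are the lower accuracy values cited above, and the sketch assumes every test ends in a definite call (no inconclusive results).

```python
def prob_truly_deceptive(guilty_correct, innocent_correct, base_rate):
    """P(subject is deceptive | test outcome is 'deceptive') via Bayes' theorem.

    Assumes every test ends in a deceptive/nondeceptive call (no inconclusives).
    """
    true_positives = guilty_correct * base_rate
    false_positives = (1.0 - innocent_correct) * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)


# 0.83 and 0.76 are the lower figures cited above; the base rates are hypothetical.
for base_rate in (0.50, 0.05):
    p = prob_truly_deceptive(0.83, 0.76, base_rate)
    print(f"base rate {base_rate:.0%}: P(deceptive | called deceptive) = {p:.2f}")
```

Under these assumptions the probability falls from roughly 0.78 when half of those tested are deceptive to roughly 0.15 when only 5 percent are, even though the test’s accuracy for each group is unchanged.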

Although these recent reviews, by authors who
are not professional polygraphers, cast doubt on
the validity of at least the most common polygraph
technique, a more recent review by Ansley
(7) comes to the most positive conclusions since
those of Abrams. Ansley’s 1983 review is an important
review because it represents the views of NSA’s
chief polygraph examiner. (NSA conducts
the largest number of polygraph examinations of
any Federal agency.) As shown in table 2, Ansley
concludes that field research shows a 97.2-percent
validity rate and laboratory research a 93.2-percent
validity rate. Based on these validity calculations
as well as separate calculations for reliability
and utility, Ansley concludes that the polygraph
is “clearly an excellent adjunct to the selection
process."

Unfortunately, for the most part, polygraph reviews
contained in table 2 do not explicitly state
their study selection criteria (see 63). The result
is that a number of different studies have been
included in various reviews, each of which presents
different problems for interpretations of
validity. The kinds of studies include reports of
single criminal investigations in which the actual
solution to the crime is the criterion for validity;
studies in which “blind” polygraph interpreters
compare their polygraph chart evaluations to
“ground truth” as established by confession; and
studies in which the judgment of legal professionals,
actual judicial outcome, or in one case,
the judgment of a single psychologist, is used to
establish ground truth.

Some reviews do specify criteria for exclusion.
Lykken, for example, does not include studies of
single criminal investigations. Abrams, on the
other hand, includes in his review a number of
such studies (e.g., 30,103). Lykken’s reasoning
was that in single criminal investigations, the examiner
has a large chance of being accurate (depending
on the number of suspects) merely by
calling everyone innocent. The fact that other
reviewers do not include Bitterman and Marcuse,
and other such reports, implies that they accept
Lykken’s evaluation of the usefulness of such studies
as indicators of validity. It is possible that
results of such reports could be useful in assessing
polygraph screening of large numbers of individuals
in specific incident cases, such as might
be the case in unauthorized disclosure investigations.
However, additional factors limit the external
validity of Bitterman and Marcuse and
other such studies. In Bitterman and Marcuse, for
example, the investigators were psychology professors
apparently conducting their first polygraph
tests, and they did not use accepted polygraph
procedures or instruments. There are no recent
systematic studies of specific incident investigations
involving a large number of suspects.
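
Lykken’s point about single criminal investigations can be made concrete with a small calculation. The Python sketch below assumes, hypothetically, that exactly one suspect in an investigation is guilty unless stated otherwise, and shows the accuracy an examiner would achieve by calling every suspect innocent. The function name and the suspect counts are illustrative only.

```python
def accuracy_calling_all_innocent(num_suspects, num_guilty=1):
    """Fraction of correct calls if every suspect is judged innocent."""
    if num_suspects <= 0 or not 0 <= num_guilty <= num_suspects:
        raise ValueError("need a positive suspect count and 0 <= guilty <= suspects")
    return (num_suspects - num_guilty) / num_suspects


# Hypothetical investigations of increasing size, one guilty suspect in each.
for n in (2, 5, 10, 20):
    print(f"{n} suspects: {accuracy_calling_all_innocent(n):.0%} correct")
```

With 20 suspects, a uniform call of innocent is correct 95 percent of the time while detecting no one, which is why such cases say little about validity.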

There is strong disagreement among reviewers
about whether another group of studies should
be included as indicators of validity. These studies
were conducted with records selected from the
files of the John E. Reid & Associates polygraph
firm. A group of cases was used which the authors
considered to be “verified” by confession of the
guilty suspect (in most cases they were also verified
by some form of corroboration (37)). The
polygraph charts in these cases were then reinterpreted
by a group of polygraphers who were “blind”
to (i.e., did not know) the suspect’s guilt or innocence.
The degree of agreement of the “blind”
evaluators with verified guilt or innocence is the test
of validity. Two reviewers (Horvath, Lykken) explicitly
excluded the group of studies conducted
based on Reid files. Horvath excluded them because
they used confessions as a criterion (confessions
not being independent of the polygraph
examinations), and Lykken because both examiners
and “blind” evaluators were polygraphers from
the same firm. His claim was that the studies were,
thus, “merely demonstrations that Reid’s examiners
score charts in a similar way” (108) and so
were estimates of reliability rather than validity.
However, reviews by Raskin and Podlesny (138)
and Ben-Shakhar, et al. (27), each use all four Reid
studies to assess validity.

Conclusions about the validity of the polygraph
may depend on whether the reviewer attends to
the average accuracy rate or to the accuracy for
guilty and innocent subjects separately. The inclusion
of all decision statistics contributes to the
ability to make an accurate assessment of polygraph
testing validity, particularly in view of the
concern over both high false positive and high
false negative decisions. If, for example, the innocent
correct rate is 80 percent but the remaining
20 percent consists of inaccurately calling
innocent subjects guilty, a different policy conclusion
may be drawn than if the remaining 20
percent consists of “inconclusive” or of false
negatives. In some cases (e.g., preemployment
screening), inaccurately designating nondeceptive
people as deceptive may have worse consequences
for the employee than inaccurately deciding that
deceptive individuals are nondeceptive. In some
cases (e.g., a heinous crime by a potential repeat
offender, infiltration by a foreign agent), a false
negative may have serious consequences.
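
The difference between an average accuracy rate and separate rates for guilty and innocent subjects can be illustrated with hypothetical counts. In the Python sketch below, the function name and both sets of counts are invented for illustration; they do not correspond to any study in table 2.

```python
def decision_rates(guilty_correct, guilty_total, innocent_correct, innocent_total):
    """Accuracy for guilty and innocent subjects separately and combined."""
    return {
        "guilty": guilty_correct / guilty_total,
        "innocent": innocent_correct / innocent_total,
        "overall": (guilty_correct + innocent_correct)
        / (guilty_total + innocent_total),
    }


# Two hypothetical studies, each with 100 guilty and 100 innocent subjects.
study_a = decision_rates(95, 100, 65, 100)
study_b = decision_rates(80, 100, 80, 100)

for name, rates in (("A", study_a), ("B", study_b)):
    print(name, {key: f"{value:.0%}" for key, value in rates.items()})
```

Both hypothetical studies show 80 percent overall accuracy, but study A misclassifies 35 percent of innocent subjects while study B misclassifies 20 percent, a distinction that an aggregate figure conceals.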

In only two reviews (Ben-Shakhar, Lykken) are
summary percentages provided in terms of the
percent accurately detected for both guilty and
innocent; in other reviews, these figures are presented
as the average percent of accurate detections.
In some cases, the percent inaccurately
"detected" as nondeceptive (when they were really
deceptive) or deceptive (when they were really
nondeceptive) as well as percent inconclusive
were also reported by reviewers. But for purposes
of clarity these have been omitted from table 2.

Another reason reviews differ about the results
of the same studies is the fact that they make different
decisions about the base rate of subjects or
cases that are included. If, for example, a panel
cannot make a decision about 30 percent of the
cases (e.g., 22), some reviewers will omit the
number of nonagreements from the number included
in the accuracy rate and base accuracy
percentages on only the remaining cases. This accounts
for the difference between Horvath and
Ben-Shakhar, et al., analyses of the Barland and
Raskin results. In other studies (and reviews of
those studies, e.g., Ansley, Abrams) inconclusive
polygraph results are excluded from the analysis.
This has the effect of inflating the accuracy rates.
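
A brief Python sketch shows how the handling of inconclusive results changes a reported accuracy rate. The counts and names below are hypothetical and chosen only to make the arithmetic visible.

```python
def accuracy_rates(correct, incorrect, inconclusive):
    """Accuracy computed with and without inconclusives in the denominator."""
    decided = correct + incorrect
    total = decided + inconclusive
    return {
        "excluding_inconclusives": correct / decided,
        "including_inconclusives": correct / total,
    }


# Hypothetical study: 100 examinations, 20 of them scored inconclusive.
rates = accuracy_rates(correct=70, incorrect=10, inconclusive=20)
print(f"excluding inconclusives: {rates['excluding_inconclusives']:.1%}")
print(f"including inconclusives: {rates['including_inconclusives']:.1%}")
```

With these counts the same data yield 87.5 percent accuracy when inconclusives are dropped and 70 percent when they are counted among the cases examined.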

Apart from the different base rates on which
most of the reviewers calculated accuracy rates
(see above), one source of different accuracy rates
applies uniquely to Ansley (7). In any case in
which there is not 100-percent accuracy, the
Ansley review computes validity by dividing the
difference between the accuracy rate and 100 percent
(the so-called error rate) in half and adds half
of the difference to the accuracy rate. Ansley uses
this procedure on the grounds that on the basis
of chance, errors were probably half in favor of
the panel (or other criterion measure) and half in
favor of the examiners. For example, in the Bersh
study, the difference between the typically
reported 92.4-percent rate and 100 percent is
7.6 percent, which Ansley divides in half, leaving a validity
rate of 96.2 percent and an error rate of 3.8 percent. The
same method is used for the Peters, Elaad, and
Widacki studies, for which the preadjustment validity
rates are 90.2, 96.6, and 91.6 percent,
respectively. Each of these studies, particularly
Elaad (see ch. 4), has other problems of interpretation
as well.
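
The adjustment described above amounts to adding half of the error rate back to the reported accuracy rate. The Python sketch below applies that arithmetic to the four preadjustment rates given in the text. The function name is hypothetical, and the adjusted values for Peters, Elaad, and Widacki are computed here from the stated procedure rather than quoted from Ansley’s review.

```python
def ansley_adjusted_validity(reported_accuracy_pct):
    """Add half of the error rate to the reported accuracy (in percent)."""
    error_pct = 100.0 - reported_accuracy_pct
    return reported_accuracy_pct + error_pct / 2.0


# Preadjustment rates as given in the text, in percent.
reported = {"Bersh": 92.4, "Peters": 90.2, "Elaad": 96.6, "Widacki": 91.6}

for study, pct in reported.items():
    adjusted = ansley_adjusted_validity(pct)
    print(f"{study}: {pct:.1f} -> {adjusted:.1f} "
          f"(remaining error {100.0 - adjusted:.1f})")
```

For Bersh this reproduces the 96.2-percent validity rate and 3.8-percent error rate cited above. Because the procedure halves every error rate regardless of its source, it raises any accuracy figure below 100 percent.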

CONCLUSIONS

Central to legal, legislative, and scientific assessment
of polygraph tests is their validity. Yet,
despite many decades of judicial, legislative, and
scientific discussion, no consensus has emerged
about the accuracy of polygraph tests. One explanation
is that scientific criteria for validity deal
with a number of dimensions and that the criteria
vary widely among specific research studies. In
order to assess overall polygraph examination
validity, it will be necessary to examine details
of each of the relevant studies. Such analysis is
presented in chapters 4 and 5.

Another explanation is that polygraph testing
has been viewed as a single technique. Thus,
despite testimony (e.g., 137) which urged differential
consideration of polygraphs used in, for example,
employment screening and criminal investigations,
the scientific evidence for particular purposes
has not been differentiated. As is demonstrated
by the analysis of scientific literature (here
and in chs. 5 and 6), in assessing validity it is
necessary to separate clearly the purposes for
which polygraph examinations are conducted and
the types of techniques employed.