!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
CHANCE News 3.15
(16 Oct to 4 Nov 1994)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Prepared by J. Laurie Snell, with help from Jeanne
Albert, William Peterson and Fuxing Hou, as part of the
CHANCE Course Project supported by the National Science
Foundation.
Please send comments and suggestions for articles to
jlsnell@dartmouth.edu
Back issues of Chance News and other materials for
teaching a CHANCE course are available from the
Chance Web Data Base in the Multimedia Online
Document Library at the Geometry Center
(http://geom.umn.edu/) or from the Geometry
Center Gopher (geom.umn.edu) in Geometry Center
Resources.
=======================================
A career is nothing to leave to chance.
American Statistical Association
=======================================

FROM OUR READERS
We have no suggestions from our readers this time but
we owe thanks to Goran Djuknic for submitting a previous
article "How Numbers Can Trick You." This should remind
you that some of the best items have come from our
readers!

FROM THE INTERNET
Since we are still talking about the "Bell Curve" you
might want to read the article "The Case for
Conservative Multiculturalism" by Herrnstein and
Murray in the October 31 issue of the New Republic.
It can be read on the web from the Electronic
Newsstand (gopher://gopher.internet.com:2100/11/).
You can also find a lot of discussions of the "Bell
Curve" on news groups such as sci.stat.consult and
sci.psychology. There were also some interesting
letters to the editor in The New York Times Book
Review on November 13 about their review of "The
Bell Curve". I will mention some of these next time.

A reader asks Marilyn whether he was being "taken" in a
dice game, where he bet on his throw of a single die
coming out higher than an opponent's. The reader gives
a symmetry argument that led him to think the game was
fair: "If he [opponent] throws a 1, there are five
numbers higher (2,3,4,5,6); but if he throws a 6, there
are five numbers lower (1,2,3,4,5). If he throws a 2,
there are four numbers higher; but if he throws a 5
there are four numbers lower. And if he throws a 3,
there are three numbers higher; but if he throws a 4
there are three numbers lower. It looks the same to me..."
Marilyn correctly points out that the reader's chances
of winning were only five out of twelve. The above
argument overlooks ties, on which the opponent wins.
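Marilyn's answer is easy to check by enumerating the 36 equally
likely outcomes (a quick sketch; as in her answer, ties count as
losses for the reader):

```python
from fractions import Fraction

# Enumerate all 36 equally likely (reader, opponent) rolls.
outcomes = [(r, o) for r in range(1, 7) for o in range(1, 7)]

wins = sum(1 for r, o in outcomes if r > o)   # reader's roll is higher
ties = sum(1 for r, o in outcomes if r == o)  # ties go to the opponent

print(wins, ties)                      # 15 wins, 6 ties
print(Fraction(wins, len(outcomes)))   # 5/12, as Marilyn says
```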
<<<========<<

"The public needs to understand that the DNA finger-
printing controversy has been resolved. There is no
scientific reason to doubt the accuracy of forensic DNA
typing results." The article cites these remarks by
Eric Lander and Bruce Budowle made in their article in
the current issue of "Nature". The remarks are
particularly noteworthy because Lander had been a critic
of the lack of scientific standards for DNA
fingerprinting in earlier use. And Budowle, chief
scientist for the FBI, has been a staunch defender of
its use. The latest standards, however, use deliberately
conservative assumptions about the possibility of a
"four-point match" (match at four genetic loci), which
can still provide odds of more than six million to one
against accidental misidentification. These are based on
the "ceiling principle" recommended by the 1992 report
of the National Research Council which may be described
as follows.
If a population is made up of sub-populations with
different gene frequencies, then independence across
loci cannot simply be assumed. The NRC proposed that
gene frequencies be determined for the various possible
sub-populations. Then, to estimate the probability of
agreement at several sites, choose, for each gene, the
highest frequency among all sub-populations and multiply
these "worst case" probabilities. This is called the
ceiling principle.
The authors suggest that the independence computation
could also be given in the trial, but with the
understanding that the truth might well be somewhere in
between this and the estimate given by the ceiling
principle.
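The two computations can be sketched with made-up numbers (the
frequencies below are purely hypothetical; the article gives none):

```python
# Sketch of the ceiling principle with hypothetical allele
# frequencies. Rows: four genetic loci; columns: frequency of the
# matching allele in three hypothetical sub-populations.
freqs = [
    [0.05, 0.08, 0.03],
    [0.10, 0.06, 0.07],
    [0.04, 0.09, 0.05],
    [0.02, 0.03, 0.06],
]

# Independence estimate within one sub-population (column 0):
independence = 1.0
for locus in freqs:
    independence *= locus[0]

# Ceiling estimate: take the HIGHEST frequency at each locus, then
# multiply -- a deliberately conservative match probability.
ceiling = 1.0
for locus in freqs:
    ceiling *= max(locus)

print(independence)   # about 4e-06 with these made-up numbers
print(ceiling)        # larger, i.e. more favorable to the defendant
```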
James E. Starrs, professor of law and forensic sciences
at George Washington University, still criticizes the
standards, maintaining that "it's not scientists who are
supposed to give the benefit of the doubt. It's the
jury and the law that are supposed to give the benefit
of the doubt."
DISCUSSION QUESTIONS
1. What do you think Starrs meant by his remark?
2. Another report on this article asserts that a
spokesman for the FBI once said that it had done six
proficiency tests without an error and so its error rate
was zero. How high an error rate could they have, in
fact, and still have a reasonable chance for six tests
in a row to be without an error?
3. Do you think this article was timed to affect the
Judge's decision in ruling on the admissibility of DNA
evidence in the Simpson trial? If so, is that bad?
<<<========<<

This article cites a number of recent cases in which
scientific evidence plays a role in court decisions: if
there is a 1 in 40,000 chance that the recovered DNA is
not O. J. Simpson's, does this constitute reasonable
doubt? [see preceding article]; which of the
conflicting studies on cancer risk and cellular phones
is to be believed?; does Gulf War Syndrome exist?
The article reviews the Frye admissibility standard for
scientific evidence and the 1993 Daubert v. Merrell Dow
Pharmaceuticals case, in which the Supreme Court
ruled that judges need to be more active gatekeepers
rather than leaving juries to sort out scientific
claims.
The article concludes with the following interesting
quote, relevant to the previous article. With regard to
DNA evidence, Albert A. Scherr of the Franklin Pierce
Law Center notes: "When juries make their decisions,
it's always been more like the decision of whether to
marry someone. Relying on a number makes it a lot
easier, and more like buying a house. If you put a
number on reasonable doubt -- if you make the decision
so quantifiable -- you'll totally change the legal system."
DISCUSSION QUESTION
Do you think a number can be put on reasonable doubt?
If so, what number would be reasonable?
<<<========<<

A study in the Journal of the National Cancer Institute
reports that the annual risk of breast cancer for a
40-year-old woman is 0.6 per 1000 among women who have
had an abortion, as compared to 0.4 per 1000 for women who
have not.
Higher risk was associated with abortions performed when
women are younger than 18 or older than 30. The risk
was not found to be associated with number of abortions
or number of live births or miscarriages. These
findings were based on interviews with 845 breast cancer
patients and 961 healthy women of the same age group.
This is another example of a kind of study that
epidemiologists call a "case-control study". In the
last Chance News, we asked how the relative risk is
determined in such a study. I confess that I did not
know the answer but it appears to be as follows:
The "relative risk" of an abortion for breast cancer is:
P(breast cancer|abortion)/P(breast cancer|no abortion)
This can be written as the product of the two terms:
P(abortion|breast cancer)/P(no abortion|breast cancer)
and
P(no abortion)/P(abortion).
We can estimate P(abortion|breast cancer) by the
proportion of the breast cancer cases who say they had
an abortion. We estimate P(abortion) by the
proportion of the control cases who said they had an
abortion. (This is reasonable if the incidence of
breast cancer in the population is relatively small.)
From these we can estimate the two terms whose
product estimates the relative risk.
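The arithmetic above can be sketched as follows. The exposure
counts here are hypothetical, chosen only to illustrate the
computation (the article does not report them):

```python
# Estimating relative risk from case-control counts, following the
# factorization above. All counts are hypothetical.
cases_exposed, cases_unexposed = 200, 645        # breast cancer group
controls_exposed, controls_unexposed = 160, 801  # healthy controls

# First term: P(abortion|breast cancer)/P(no abortion|breast cancer),
# estimated from the cases:
case_odds = cases_exposed / cases_unexposed

# Second term: P(no abortion)/P(abortion), estimated from the
# controls (reasonable when the disease is relatively rare):
control_factor = controls_unexposed / controls_exposed

relative_risk = case_odds * control_factor   # the familiar odds ratio
print(round(relative_risk, 2))               # about 1.55 here
```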
DISCUSSION QUESTION
(1) The findings are preliminary, and a co-author of the
report said it would be "premature" for women to make
any abortion decision based on the study. Do you think
the difference found in the study is large enough to be
taken seriously?
(2) Other reports on this study described the outcome
of the study by saying that the risk of breast cancer
for women under 40 who had an abortion was 50% higher
than for women who had not had an abortion. Is that
consistent with the information given in this report?
Which method of reporting the results do you think is
most informative?
(3) A commentator on this study remarked that the small
difference could be due to errors from the interview
process. She suggested that a person who had had cancer
would be more apt to admit having had an abortion than
one who did not. Why might this be true?
(4) In an observational study the relative risk is directly
estimated from the outcome of the study. How might you
think of this case control study as an observational study?
<<<========<<

This article describes a report by Arnold Barnett of MIT
Sloan School of Management, called "How Numbers Can
Trick You", which appeared in the October issue of
MIT Technology Review. Barnett says the ideal way for
readers to be better served by numbers is to use the
newspaper as a prod for future research--assuming the
newspaper gives the source. Even when this is not
possible, common sense is called for. [This almost
sounds like a plug for a CHANCE course!]
Barnett cites an example of ambiguity from an Associated
Press story which appeared in the Globe last spring [and
in Chance News], reporting that outbursts of anger can
double your risk of heart attack. Since the conclusion
was based only on interviews with recent heart attack
victims, Barnett wonders about healthy people who get
angry. Might they not benefit from letting off steam
in a way that reduced their heart risk?
The author gives the readers lots of good advice about
how to handle articles that include statistical
analysis, concluding with: "Don't be frightened by
statistics, just apply common sense."
DISCUSSION QUESTION
The article gives a number of cases where people have
challenged the Globe's use of statistics. One example
is the following: An editorial plugging handgun
controls mentioned that deaths by firearms exceeded
38,000 a year. A reader suggested it would have been
helpful to mention that the number is more than twice
the one usually given because it includes 18,526
suicides. Do you agree?
<<<========<<

The current New England Journal of Medicine reports the
results of a study that was stopped early, showing
that, when a pregnant woman who has the HIV virus takes
AZT, it helps prevent the virus from being passed on to
the child.
The study involved 477 pregnant women infected with the
HIV virus. Half were given the AIDS drug AZT and the
other half a placebo. Only 8.3 percent of the children
of women who took the drug were infected with the virus,
as compared to 25.5 percent of the children of women
who had the placebo.
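A rough check of how decisive this difference is: assuming the 477
women split about evenly between the two groups (an assumption; the
article gives only the total), a standard two-proportion z-test puts
the difference far beyond chance:

```python
from math import sqrt

# Rough significance sketch for the AZT trial. Group sizes are an
# assumption: roughly half of the 477 women in each arm.
n1, p1 = 238, 0.083   # AZT group: 8.3% of children infected
n2, p2 = 239, 0.255   # placebo group: 25.5% infected

# Two-proportion z-test with a pooled estimate of the common rate.
p = (p1 * n1 + p2 * n2) / (n1 + n2)
z = (p2 - p1) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
print(round(z, 1))   # about 5 standard errors -- no surprise the
                     # monitors stopped the trial early
```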
Editorials in the NEJM recommend that HIV testing of
pregnant women be encouraged but should be voluntary.
Experts are quoted on both sides of the issue of
mandatory testing of pregnant women. On the one hand it
seems likely that this would save the lives of many of
the children; but on the other hand, to be effective,
it would also require mandatory treatment for the
mother, which to some is a significant violation of
civil rights.
DISCUSSION QUESTION
(1) In New York State, a group has been set up to make
recommendations to the Governor relating to this issue.
What would your recommendation be?
(2) Studies of the effectiveness of AZT in extending the
life of those who have the HIV virus have not been very
encouraging. Should this make a difference in
considering mandatory treatment of the mother in an
attempt to save the child?
<<<========<<

A study being published in the "Journal of the American
Medical Association" reports that high cholesterol for
men and women over 70 seems not to affect the chance of
a heart attack or dying of heart disease or, for that
matter, of anything else. They followed 997 men and
women from 1988 to the end of 1992. A third of the
women and a sixth of the men had high cholesterol
levels.
This result is a bit surprising, given other studies
showing that high cholesterol is a risk factor for
heart disease in middle-aged men and women. One
explanation is that high cholesterol does not affect
the arteries of some people, and that those for whom
it does have been, to some extent, selected out before
reaching their 70s.
DISCUSSION QUESTION
Snell has a higher than average cholesterol level but will
be 70 in January. Should he go back to eating hot fudge
sundaes after his birthday?
<<<========<<

Two studies, designed to test the effectiveness of the
drug heparin, routinely used to treat heart attacks,
were stopped early (in April) when it was noticed that
high doses of the drug were causing lethal bleeding in
some of the patients.
Doctors are just now being publicly alerted to this
danger. Authors of the study explain that the delay was
caused by concern about their work being peer-reviewed
and also to allow for a careful presentation that would
not discourage physicians from using the drug in
moderate amounts. On the other hand, the delay was
criticized because it was known that doctors were at
times using doses larger than those that had been found
to produce unacceptable results.
DISCUSSION QUESTION
Do you think the delay in publicizing this result was
justified?
<<<========<<

This is a review of the following two books, which came
out of the sex survey we reported on in the last Chance
News.
SEX IN AMERICA A Definitive Survey
By Robert T. Michael, John H. Gagnon,
Edward O. Laumann and Gina Kolata
Little, Brown. 300 pp. $ 22.95
THE SOCIAL ORGANIZATION OF SEXUALITY
Sexual Practices in the United States
By Edward O. Laumann, John H. Gagnon,
Robert T. Michael and Stuart Michaels
University of Chicago. 718 pp. $ 49.95
The first book is a carefully written account of the
survey but with a minimum of technical statistics; it is
obviously aimed at the best-seller list.
The second book is a more technical report of the survey
but also written with the intent of being read by the
non-expert. Indeed, it is, in many ways, a fine primer
on survey sampling. The authors discuss in detail how
they went about making the decision on the nature and the
size of the sample they chose, how they decided between
a telephone survey, interviews, written forms etc. They
also discuss possible biases and how they planned to
check for these. Like the "Bell Curve," they often take
the time along the way to explain in quite simple terms
the statistical techniques they use to determine sample
sizes, look for correlations etc.
This reviewer does a good job of highlighting some of
the inevitable problems in a study of this kind.
Because of the withdrawal of federal support, the
researchers had to decrease the size of the sample from
the proposed 20,000 households to 9,004. They had to
eliminate 1,141 of these because the addresses turned
out to be vacant buildings. Again for financial reasons
they had to limit the study to English-speaking adults
between 18 and 59. This eliminated another 3,514. Of
the remaining 4,369 households, the researchers
interviewed people in 3,432 of them, for a response
rate of 78.6 percent.
Once more for financial reasons, they chose not to
cover people in institutions such as college
dormitories, military barracks, or prisons, which
might cause problems with the use of their survey for
AIDS policies, one of their main objectives.
The reviewer comments on the obvious difficulty of
telling whether people are being truthful. The
researchers asked some of the people interviewed for
written responses as well and checked these against the
interviews, but he is not convinced that this is much
of a check.
The book itself gives a careful discussion of all these
problems but does put a bit of a positive spin on them.
The reviewer concludes that "The National Health and
Social Survey represents a mountain of hard work, but
its findings should be greeted with more skepticism than
its authors would like."
DISCUSSION QUESTIONS
(1) In discussing possible biases in their survey in
the more technical book, the authors state: "Only 6
percent of the interviews took place with the spouse or
other type of sex partner present, and an additional 15
percent had other people present ... These 'others'
were overwhelmingly likely to be children or
stepchildren of the respondent... When interviewed
alone, 17 percent of the residents reported having two
or more sex partners in the past year, while only 5
percent said so when their partners were present during
the interviews."
After discussing how they looked into this potential
problem, the authors remark: "On the basis of these
bivariate analyses, we cannot conclude that the presence
of others caused the reporting differences in the sense
of suppressing the truth."
What else could cause the difference?
(2) The authors list seven possible explanations for the
fact that the median number of sex partners since age 18
was 6 for men and 2 for women and comment on which they
feel is most likely. See if you can come up with some
of their explanations and decide which you find most
likely.
<<<========<<

>>>>>==========>>
The Bell Curve.
Intelligence and Class Structure
in American Life
by Richard J. Herrnstein and
Charles Murray
Free Press, 845 pp. $30.00

I promised last time to try to say what is in this book
but my review is getting as long as the book so I will
give only Part I this time and continue next time.

INTRODUCTION
Tests to measure intelligence (called cognitive ability
here) play a central role in this book. Thus, in their
introduction, the authors discuss the history and the
controversies surrounding attempts to measure
intelligence. Modern theory traces its beginnings to
Spearman. Spearman noticed that performances on tests
attempting to measure intelligence were positively
correlated. To explain this, he postulated the existence
of a single variable that he called g, which is a
person's general intelligence. It is a quantity, like
height or weight, that a person has and that varies
from person to person. When you take a test to measure
intelligence, your score is a weighted sum ag + bs + e
of the factors g, s, and e, with g your general
intelligence, s a measure of your intelligence relating
to this particular test, and e a random error. If you
take
several different tests, g is common to all of them
and causes the positive correlation. The magnitude
of a tells you how heavily the test is "loaded" with
general intelligence -- the more the better. This
simple model is only consistent with a very special
class of correlation matrices (those with rank 1) and so
had to be generalized to include more than one kind of
g. This led to the development of factor analysis as a
mathematical model for what is going on. It also led to
the development of IQ tests to measure intelligence.
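The one-factor model can be simulated to see how a shared g
produces the positive correlations Spearman observed. The loadings
and noise levels below are arbitrary choices for illustration:

```python
import numpy as np

# Simulating Spearman's one-factor model: each test score is
# a*g + b*s + e, with g shared across tests. Here the test-specific
# factor b*s and the error e are lumped into one noise term.
rng = np.random.default_rng(0)
n = 10_000
g = rng.normal(size=n)                      # general intelligence

loadings = [0.9, 0.7, 0.5]                  # how g-"loaded" each test is
tests = np.column_stack([
    a * g + rng.normal(scale=0.5, size=n)   # test score for loading a
    for a in loadings
])

corr = np.corrcoef(tests, rowvar=False)
print(np.round(corr, 2))   # every off-diagonal correlation is positive,
                           # and larger for more heavily loaded tests
```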
The controversies over the use of IQ tests began when
it was proposed that they be used to justify
sterilization laws in an attempt to eliminate mental
retardation, and immigration laws to favor Nordic
stock. They continued when Arthur Jensen suggested
that remedial education programs (begun in the War on
Poverty) did not work because they were aimed at
children with relatively low IQs, largely inherited
and therefore difficult to change. Then followed
debates over whether differences
in IQ were due mostly to genetic difference or to
differences in environment, culminating in Stephen Jay
Gould's best-seller "The Mismeasure of Man". Gould
concluded that "deterministic arguments for ranking
people according to a single scale of intelligence, no
matter how numerically sophisticated, have recorded
little more than social prejudice." While the authors
admit that Gould's ideas still reflect a strong public
sentiment about IQ tests, they feel that these ideas
bear very little relation to the current state of
knowledge among scholars in the field.
Finally, the authors discuss current attempts to
understand intelligence, describing three different
schools.
THE CLASSICISTS: intelligence as a structure.
This school continues to extend the work of Spearman
using factor analysis and assuming, as Spearman did,
that some kind of general intelligence is associated
with each individual. Workers in this school continue
to try to understand the physiological basis for the
variables identified by factor analysis and to
improve methods of measuring general intelligence.
THE REVISIONISTS: intelligence as information
processing.
This school tries to figure out what a person is doing
when exercising intelligence, rather than what elements
of intelligence are being put together. A leading
worker in this field, Robert Sternberg, writes: "Of
course a tester can always average over multiple scores.
But are such averages revealing, or do they camouflage
more than they reveal? If a person is a wonderful
visualizer but can barely compose a sentence, and
another person can write glowing prose but cannot begin
to visualize the simplest spatial images, what do you
really learn about those two people if they are reported
to have the same IQ?"
THE RADICALS: the theory of multiple intelligences.
This school, led by Howard Gardner, rejects the notion
of a general g and argues instead for seven distinct
intelligences: linguistic, musical, logical-
mathematical, spatial, bodily-kinesthetic, and two
forms of "personal" intelligence. Gardner sees no
justification for calling musical ability a talent
while calling language and logical thinking
intelligence. He would be happy calling them all
talents. He claims that the correlations that lead to
the concept of g come precisely because the tests are
limited to questions that call on these two special
aspects of intelligence.
Herrnstein and Murray consider themselves classicists
and state that, despite all the apparent controversies,
most workers in the field of psychometrics would agree
with the following six conclusions, which they feel are
consequences of classical theory.
(1) There is such a thing as cognitive ability
on which humans differ.
(2) All standardized tests of academic aptitude
or achievement measure this general factor
to some degree, but IQ tests expressly
designed for that purpose measure it most
accurately.
(3) IQ scores match, to a first degree, whatever
it is that people mean when they use the word
intelligent or smart in ordinary language.
(4) IQ scores are stable, though not perfectly
so, over much of a person's life.
(5) Properly administered IQ tests are not
demonstrably biased against social, economic,
ethnic, or racial groups.
(6) Cognitive ability is substantially
heritable, apparently no less than 40
percent and no more than 80 percent.
The authors stress that IQ tests are useful in studying
social phenomena but are "a limited tool for deciding
what to make of any individual."
THE DATA USED IN THIS BOOK
Throughout the book the authors make use of data from
the National Longitudinal Survey of Youth (NLSY) started
in 1979. This was a representative sample of 12,686
persons ages 14 to 21 in 1979. This group has been
interviewed annually and the authors use the data
collected through 1990.
REVIEWER NOTE
While this study was meant to follow labor trends, a
number of other groups used the subjects for their
studies. One of these provided the IQ data necessary
for this book. The Army had been using a test called
the Armed Services Vocational Aptitude Battery (ASVAB)
since 1950 to help in the selection of recruits and for
special assignments. It had been suggested that the
volunteer army was selecting a group less well
qualified than the Army obtained by the draft. To
check this, they decided to administer the ASVAB to the
sample chosen for the NLSY. This study was called the
Youth Profile and was administered by the National
Opinion Research Center.
The results showed that the volunteer army was getting a
higher quality army, as measured by these tests, than
the draft, but at the same time found significant
differences between the performance of various ethnic
groups. A study of these differences and a summary of
explanations for them are provided in "The profile of
American youth : demographic influences on ASVAB test
performance" by R. Darrell Bock and Elsie G.J. Moore.
It is interesting to compare their analysis with that of
the authors of this book.
The ASVAB has ten subtests, which range from tests you
might find on an IQ test to vocational tests, such as
automobile repair and electronics. The Armed Forces
Qualification Test (AFQT) is made up of four subtests
of the ASVAB: word knowledge, paragraph comprehension,
arithmetic reasoning, and mathematical knowledge. The
authors show in an appendix that this test has the
properties of a very good IQ test. In particular, they
found that over 70% of the variance on the AFQT could
be accounted for by a single factor, g, which they
identify with general intelligence.
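For standardized subtests, the share of variance accounted for by
the first factor can be read off the correlation matrix: it is the
largest eigenvalue divided by the number of tests. The matrix below
is hypothetical, not the actual AFQT correlations:

```python
import numpy as np

# Hypothetical correlation matrix for four subtests, each pair
# correlating 0.7 (made up for illustration).
R = np.array([
    [1.0, 0.7, 0.7, 0.7],
    [0.7, 1.0, 0.7, 0.7],
    [0.7, 0.7, 1.0, 0.7],
    [0.7, 0.7, 0.7, 1.0],
])

# Largest eigenvalue over the number of tests = share of total
# variance carried by the first (general) factor.
share = np.linalg.eigvalsh(R).max() / R.shape[0]
print(round(share, 3))   # 0.775 -- over 70% from a single factor
```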
CHAPTER I. COGNITIVE CLASS AND
EDUCATION 1900-1990
In this part the authors provide a number of graphs that
show there is a cognitive sorting process going on in
education. While many more students are going to
college, a higher and higher proportion of the really
bright students are going to a few select schools. We
see graphs that exhibit the following:
(1) In the twentieth century the prevalence
of college degrees increased rather
continuously from 2% to 33%.
(2) From 1925 to 1950 about the same
percentage (55%) of the top IQ quartile
of high school graduates went to
college. Starting in 1950, this percentage
increased dramatically from 55% to over
80% in 1980.
(3) In 1930 the mean IQ scores for all
those attending college were about .7
standard deviations above the mean, and
those attending Ivy League and Seven
Sisters colleges were about 1.3 standard
deviations above the mean. In 1990 the
mean IQ for all attending college had
remained about the same, at .8 standard
deviations, while the mean for the Ivy
League and Seven Sisters colleges had
increased to 2.7 standard deviations
above the mean.
Since these graphs display standardized scores, the
authors spend some time explaining the concepts of mean
and standard deviation in the text and provide a more
complete discussion in an appendix.
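Standardizing is just subtracting the mean and dividing by the
standard deviation. On the usual IQ scale (mean 100, standard
deviation 15) the figures above translate as:

```python
# Convert a raw score to standard units (a z-score).
def standardize(score, mean=100.0, sd=15.0):
    return (score - mean) / sd

print(standardize(110.5))   # 0.7 -- the 1930 all-college figure
print(standardize(140.5))   # 2.7 -- the 1990 Ivy League figure
```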
The authors express the fear that the clustering of the
high IQ students in a small number of colleges will make
them isolated from and unaware of the real world.
CHAPTER 2. COGNITIVE PARTITIONING BY
OCCUPATION
The point of this chapter is that jobs sort people by
cognitive ability in much the same way that colleges do.
A group of twelve professions: accountants, architects,
chemists, college teachers, dentists, engineers,
lawyers, physicians, computer scientists,
mathematicians, natural scientists, social scientists
are considered to be "high-IQ professions". The mean IQ
of people entering these professions is said to be about
120, which is the cutoff point for the top decile by IQ.
The authors provide a graph showing that, until 1950,
about 12% of the top IQ decile were in these jobs. Then
the percentage significantly increased, reaching about
38% in 1990. They link this to education with another
graph showing that the proportion of CEOs with
graduate training remained around 10% until 1950, when
the proportion increased dramatically to about 60% in
1976. Combining these observations, the authors
conclude that, at mid-century, the bright people were
scattered throughout the labor force but, as the century
draws to a close, a very high proportion of these people
are concentrated within a few occupations, paralleling
the cognitive partitioning by education.
CHAPTER 3. THE ECONOMIC PRESSURE TO
PARTITION
This chapter is devoted to showing that IQ is a good
predictor of job performance. It is the first use of
correlation, and an appendix devoted to explaining the
concept of correlation is available for the reader not
familiar with this concept.
The authors discuss a number of studies showing that the
correlation between IQ and job performance is typically
at least .4 and often more. They point out that the
military offers huge data sets for these studies, since
everyone in the military must take the ASVAB tests (and
hence also the AFQT IQ test) and members of the
military attend training schools where they are measured
for "training success" at the end of their schooling,
based on measures that amount to job assessment skills
and knowledge. In these studies, the average correlation
between IQ and job performance is about .6. By looking
at the high correlation between the g factor for the IQ
test and job performance, they conclude that the g
factor is the key to success in these jobs.
Modern studies in the civilian population are typically
done by meta-analysis of small studies leading to
results similar to those found in the military studies.
An exception was a report of the National Academy of
Sciences, "Fairness in Employment Testing", which
reported a correlation of only about .25. The authors
suggest that this is because the researchers for that
study did not apply corrections for restricted range,
which they feel were appropriate for the purposes of
their study. (Restricted range means that the sample
did not include reasonable numbers from the entire
range of possible scores.) When these corrections are
made, they say, the correlation would increase to
around .4, consistent with other studies.
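One common correction for restriction of range (often called
Thorndike's "Case II") shows how a .25 could become a .4. The ratio
u of the standard deviations is a hypothetical value chosen to make
the numbers work out; the NAS report's actual ratio is not given here:

```python
from math import sqrt

# Thorndike Case II correction for restriction of range.
# r: observed correlation in the restricted sample;
# u: SD(full range of scores) / SD(restricted sample), u > 1.
def correct_for_restriction(r, u):
    return r * u / sqrt(1 - r**2 + (r * u)**2)

# With the NAS figure of .25 and a hypothetical ratio u = 1.7, the
# corrected correlation lands near the .4 the authors cite.
print(round(correct_for_restriction(0.25, 1.7), 2))   # 0.4
```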
The authors also compare various predictors for job
performance and report the results of a study that
showed that the highest correlation between a predictor
and job performance rating was the cognitive test score
(.53) followed by biographical data (.37), reference
checks (.26), education (.22), interview (.14), college
grades (.11) and interest (.10).
The chapter concludes by remarking that the Supreme
Court decision in Griggs v. Duke Power Co. in 1971,
which severely limited the use of IQ tests for job
selection, is costing the American economy billions of
dollars.
REVIEWER NOTE.
The main issue referred to in the Griggs v. Duke
decision is the possibility of so-called "disparate-
impact" lawsuits. These are lawsuits that challenge
employment practices that unintentionally but
disproportionately affect people of a particular race,
color, religion, sex, or national origin.
The Supreme Court has twice changed the ground rules
set up in the Griggs v. Duke decision. The current
rules related to these suits are governed by the Civil
Rights Act of 1991. According to this law, if a
plaintiff
shows that a specific part of the employment practice
disproportionately affects a particular group, then the
employer must be able to demonstrate that the employment
practice or criterion in question is consistent with a
business necessity (whatever that means).
In order to prove disparate impact using statistical
comparisons, the comparison must be with racial
compositions of the qualified people in the work force,
not the racial composition of the entire work force.
When multiple employment criteria are required and it
can be argued that they cannot be separated, then the