Tuesday, 28 May 2013

In
Woodley, te Nijenhuis, and Murphy (2013, in press) we argue that intelligence
has declined substantially since Victorian times, based on a meta-analysis of
simple reaction time. An exchange of ideas has started at several blogs. Here we
reply to the blogposts of Scott Alexander and HBD Chick, which react to an
earlier post of ours.

A paper
has come to our attention that provides strong evidence against the supposed
representativeness problem across cohorts (e.g. Alexander, 2013). The study in
question is that of Wilkinson and Allison (1989) using a sample of 5,324
visitors to the London Science Museum, which is situated at the exact site of
Galton’s 19th century Anthropometric Laboratory in South Kensington.
All visitors undertook psychophysical
testing on a simple reaction time-measuring apparatus, just as the people in
Galton’s study did. Of these mixed-sex participants, 1,189 were aged between 20
and 29, and thus closely match the age range
employed in our own study. Their simple RT mean was substantially slower than the weighted 1889 RT mean (245 ms vs. 194.06
ms), and furthermore the mean of this sample falls very close to the meta-regression-estimated mean across studies for
the late 1980s (approximately 250 ms, see: Figure 1 in Woodley, te Nijenhuis
& Murphy, 2013). The remarkable features of this study are the ways in
which it replicates virtually every significant demographic aspect of Galton’s
study.

There
is the issue of a participation fee. Galton is known to have requested a
participation fee of 3 pennies (approximately £5 in modern UK currency). The London Science Museum required the
payment of an admissions fee right up until December 2001. Furthermore, it still
requires the payment of fees of £6 to £10 for access to some special
exhibitions (London Science Museum, 2013a). The Wilkinson and Allison (1989)
study was in fact conducted as part of a special exhibition entitled Medicines for Man, which was hosted by
the Museum from the early 1980s (Medicines for Man Organizing Committee, 1980).
Therefore participation fees were employed in the case of both studies.

There
is strong evidence for the demographic convergence between the two studies.
Johnson et al. (1985) indicate that whilst Galton’s sample included persons
from all occupational and socioeconomic groups in Victorian London, it was
nonetheless skewed towards students and professionals, and both groups could
fairly be described as solidly White and middle class. In the last decades of
the 20th century, museum attendance in the UK exhibited precisely
the same skew in terms of sociodemography. Eckstein and Feist (1992) for example
noted that most UK museum visitors are drawn from White and upper-middle-class
populations. Furthermore Hooper-Greenhill (1994) observed that the largest
minority ethnic groups in the UK (i.e. Asians and Afro-Caribbeans) are
underrepresented amongst museum visitors. In acknowledging this issue, a House
of Commons report in 2002 stated that free admission to museums would be unlikely
to ‘… be effective in attracting significant numbers of new visitors from the
widest range of socio-economic and ethnic groups’ (House of Commons report,
2002, p. 23).

The
presence of this self-selection amongst visitors strongly aligns the studies of Galton and of Wilkinson and Allison.
Add to this the fact that participation fees were employed in both cases, that
the geographical locations were exactly the same, and that the age demographic
of interest (i.e. twenty-somethings) was intensively sampled in both cases
(i.e. 3,410 in Silverman’s subset of Galton’s sample and 1,189 in Wilkinson and
Allison), and the studies become even more strongly convergent in terms of
comparing like with like. Thus the argument that more heterogeneous samples
visited museums in the 1980s than the more restricted samples visiting museums
in the 1880s is critically weakened. The principal objections that can be
leveled against this are as follows.

Firstly
there is the issue of tourism. Most tourists to the UK are from the US and
Europe (Tourism 3B), meaning that they are likely to be both ethnically and
socioeconomically matched to the majority of the participants in this study
(i.e. UK citizens). In fact, worldwide international arrival figures for
1990 show that of the 439 million inbound tourists, 60% were European in origin
and 21% emanated from the Americas. Hence, 81% of the tourist population came
from groups which are highly ethnically similar to the British. Only 12% came
from Asia and the Pacific, with a meager 3% coming from the Middle East and 2%
from Africa (Tourism 3B). In sum, it is unlikely that tourists being tested in
the 1989 study were substantially ethnically different from the typical UK
museum visitor. Based on current statistics from the Science Museum, the
preponderance of visitors hail from the UK (69%) and the preponderance of those
are from Greater London (44%; London Science Museum, 2013b). Historically,
especially prior to the 1990s, this figure would have been much higher, owing to
far lower levels of tourism to the UK (in 1990 international tourism was at
less than half its current level of over 940
million arrivals per year; BBC, 2013). This means that in all likelihood well over 70%
of the participants in Wilkinson and Allison’s study would have been British,
and the overwhelming majority of these would have been White, upper
middle-class and from London. The overwhelming majority of the international visitors
would have been ethnically and broadly socioeconomically matched to the British
visitors.

Secondly
is the issue of instrumentation. Galton utilized a pendulum chronoscope with a
temporal resolution of around a centi-second (i.e. 1/100th of a
second, or 0.01 seconds). The electronic apparatus employed by Wilkinson and Allison
in all likelihood had a higher resolution (post-1908 chronoscopy at least had
the potential to be accurate to a single millisecond; Haupt, 2001); however, a
resolution of only a centi-second in Galton’s apparatus cannot account for the
substantial discrepancies between these two studies.

Thirdly,
Galton’s study used a single trial per person, whereas Wilkinson and Allison’s
study employed two practice trials followed by 10 trials per person for the
purposes of averaging. This protocol would almost certainly have enhanced the
reliability of Wilkinson and Allison’s data relative to Galton’s (Jensen,
1980); however in both cases we are dealing with aggregates. Strong biases
(i.e. jumping the gun vs. slow to start) have the potential to cancel each
other out when employing these sorts of very large datasets, as these sources
of error are distributed in a Gaussian fashion. This means that aggregate-level
mean-wise comparisons are appropriate for comparisons between data exhibiting
different coefficients of reliability coupled with very large Ns.
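This cancellation can be sketched with a quick simulation (illustrative numbers only, not data from either study): true reaction times around 250 ms plus a large zero-mean error per measurement, where some participants jump the gun and others are slow to start.

```python
import random
import statistics

# Illustrative sketch only: true RTs around 250 ms, plus a zero-mean Gaussian
# error per measurement (jumping the gun vs. slow to start). With a very
# large N, these opposing errors cancel in the aggregate mean.
random.seed(42)

N = 100_000
true_rts = [random.gauss(250.0, 30.0) for _ in range(N)]
observed = [rt + random.gauss(0.0, 50.0) for rt in true_rts]  # noisy single trials

bias = statistics.mean(observed) - statistics.mean(true_rts)
print(f"aggregate bias: {bias:.2f} ms")  # a small fraction of a millisecond
```

The per-measurement error here is huge (SD of 50 ms), yet the aggregate mean shifts by only a fraction of a millisecond, which is why mean-wise comparisons over very large Ns remain defensible even when single measurements are unreliable.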

On
this basis Wilkinson and Allison’s (1989) study must be considered an excellent
replication of Galton’s study. Its mean reaction time for the relevant age
cohort is almost precisely where our meta-regression predicts it should be.
This is clearly strong supporting evidence for the robustness of the increase
in simple RT latency documented to date, and so puts even more nails in the
coffin of the argument that the trend can be accounted for by lack of
representativeness across cohorts.

Medicines
for Man Organizing Committee. (1980). Medicines
for Man: A Booklet Based on an Exhibition at the Science Museum about Medicines
- how They are Discovered and how They Work, how They are Made and Tested, how
They are Prescribed and Dispensed, and how Laws Control Their Use. London:
Science Museum.

Woodley,
M. A., te Nijenhuis, J., & Murphy, R. (2013). Were the Victorians cleverer
than us? The decline in general intelligence estimated from a meta-analysis of
the slowing of simple reaction time. Intelligence.
doi:10.1016/j.intell.2013.04.006

While some researchers toil over birth cohorts,
diligently tracking every child born in a particular week, others go searching
for exceptional children and track them instead. More fun, I suppose. They do
so to answer the question: is intelligence all it’s cracked up to be? Even more
pedantically: is there any real difference between those who get a high score
on an intelligence test compared with those who get an extremely high score, or
is being reasonably bright good enough for most purposes in ordinary life?

Youth
identified before age 13 (N = 320) as having profound mathematical or verbal reasoning
abilities (top 1 in 10,000)
were tracked for nearly three decades. Their awards and creative accomplishments
by age 38, in combination with specific details about their occupational
responsibilities, illuminate the magnitude of their contribution and professional
stature. Many have been entrusted with obligations and resources for making
critical decisions about individual and organizational well-being. Their
leadership positions in business, health care, law, the professoriate, and STEM
(science, technology, engineering, and mathematics) suggest that many are
outstanding creators of modern culture, constituting a precious human-capital
resource. Identifying truly profound human potential, and forecasting differential
development within such populations, requires assessing multiple cognitive
abilities and using atypical measurement procedures. This study illustrates how
ultimate criteria may be aggregated and longitudinally sequenced to validate
such measures.

The Lubinski and Benbow gang have been tracking very
bright kids for ages, and the results are clear: being brighter than 99.25% of
the general population, whilst all very well in itself and an almost guaranteed
passport to a productive and happy life, doesn’t amount to all that much. Such
people have a modest sufficiency of intellect, but no more. For a real impact,
you have to be brighter than 99.75% of humanity. Those in the latter category
have four times the impact of their less able colleagues. They publish more,
have more doctorates, register more patents, and have more impact on their
disciplines. How can such a small margin make such a difference? Well, once you
are that far out on the right tail of the normal distribution you move quickly
from being 1 in 1000 to being 1 in 10,000. Galton referred to those in the last
category as having achieved “eminence”. These are “scary bright” minds.
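Assuming the usual convention that IQ is normally distributed with mean 100 and SD 15 (a standard scaling, not a figure from the paper), the distance between these rarities can be checked with Python's standard library:

```python
from statistics import NormalDist

# Under the IQ ~ N(100, 15) convention, where do the "1 in 1,000" and
# "1 in 10,000" cut-offs sit on the right tail?
iq = NormalDist(mu=100, sigma=15)

cut_1_in_1000 = iq.inv_cdf(1 - 1 / 1_000)    # ~146
cut_1_in_10000 = iq.inv_cdf(1 - 1 / 10_000)  # ~156

print(f"top 1 in 1,000:  IQ >= {cut_1_in_1000:.0f}")
print(f"top 1 in 10,000: IQ >= {cut_1_in_10000:.0f}")
```

Roughly ten IQ points separate the two groups: a small margin in absolute terms, but a ten-fold difference in rarity, which is how a "small margin" out on the tail translates into eminence.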

Earlier work looked at some key early academic
achievements, but now the research has looked at their worldly successes in
mid-career. The authors give a long list of what their prodigies have achieved,
and it is clear that they have been busy, successful, and are well plugged into
the commanding heights of American academia and industry.

So, once you achieve the eminence of being 1 in
10,000 are these paragons of intellect on a level playing field, comrades of
equal brilliance? No. A little extra something is required, and those that have
it shoot ahead of even this advanced class. A few of them take most of the prizes:
one or two raise disproportionate amounts of research money, and account for several
major advances. There are degrees of brilliance among this cognitive elite. So,
what is it like to scoop the top prizes? Although the paper does not report any
personal testimonies or self-evaluations (these may follow in a later paper,
perhaps) it would not be surprising to find that many of them are probably very
happy with their achievements. In my view, some aspects of academic life make
for perpetual uncertainty, as if the peer review process can never be turned
off.

A physicist of my acquaintance who made a habit of
inviting Nobel Laureates to his departmental seminars found that they often
doubted that they had deserved the accolade, and feared that the assembled
physicists would spot the error of attribution the moment they began their
lecture. Driving them to the department, he had to do his best to calm their
nerves. So, even among this select crew, there are orders of precedence.
Mercifully, as a visiting psychologist following on from the Nobel Laureates I
was spared any such harsh evaluation, and accepted for my peripheral
entertainment value.

So, to summarise the results of this rich source of
research results on intelligence, if you get asked: “What does IQ mean, really?”
you will find that Lubinski and Benbow have many of the answers.

Just for the record, by the age of 38 I had achieved
(at the most generous and inclusive count) 28 rather slight publications,
tenure, and one promotion, but few citations, no patents, no companies founded,
and no perceptible impact either on my discipline or the course of Western
civilization. All that may change soon, with any luck, but if you have passed
38, then look back and check your comparative achievements, and if you are
approaching 38, remember that the clock is ticking.

It has become part of popular wisdom that expertise
requires 10,000 hours of practice. Some sages further imply that if you have
the discipline to put in those ten thousand hours you can achieve success in
any calling. This notion fits in nicely with the general nostrum that “you can
be anything you want to be”.

The basis for this work (Ericsson, 1993) was to
identify student musicians of high, middling and low achievements and ask them
to look back at their careers to estimate how long they had practiced. The same
approach was used to look at chess players. The results were written up to
suggest that the hours of practice were the main causal variable. Other
possible causal variables were not investigated in significant depth.

However, there were one or two problems with this
notion. First, there was plentiful evidence that very many people tried to
practice for many hours, and then gave up. In my own case, when I gave up
trying to learn to play the piano after several years, my class mates who had
been listening against their will on the other side of the wall while doing
their homework were universally grateful. My fruitless hunt for the correct
note had been pure torture for them. I had the wish, but not the wherewithal. Second, if you study those who are forced to
practice by pushy parents (generally lauded by journalists for having trained
their children to excel at chess) you find two things: the best chess-playing children
in the family have studied for much longer (one standard deviation more) than their siblings, and they have spent many more hours in
practice than grandmasters who nonetheless have no difficulty beating them soundly in chess
competitions. Forced drill and endless practice yield relatively meagre results,
while talent plus substantial practice soars ahead.

The
authors have taken a more open-minded approach to the subject. They have looked
at practice in student musicians and chess players, and their best estimate is
that practice accounts for one third of the variance. The other two thirds are
up for grabs. It is likely, but not directly measured, that talent contributes
to the remaining two thirds. There is plentiful data showing that good
musicians and chess players are much brighter than average, and we know from
more mundane activities in the Army that intelligence is associated with
learning things faster and with
performing to a higher level, particularly when the task becomes more
complicated and when you have to apply general principles rather than follow a
checklist. Linda Gottfredson has assembled a lot of data on the importance of g in everyday life. http://www.udel.edu/educ/gottfredson/reprints/1997whygmatters.pdf
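If practice really does account for about one third of the variance, the implied practice-attainment correlation follows directly; a back-of-envelope conversion (my arithmetic, not a figure from the Hambrick et al. paper) runs as follows:

```python
import math

# A variance share converts to a correlation by taking its square root,
# since r**2 is the proportion of variance explained.
variance_share = 1 / 3
implied_r = math.sqrt(variance_share)

print(f"implied practice-attainment correlation: {implied_r:.2f}")  # ~0.58
print(f"variance left for talent and everything else: {1 - variance_share:.0%}")  # ~67%
```

A correlation of about 0.58 is substantial, but it leaves two thirds of the variance on the table, which is the authors' point.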

Have a look at the Hambrick et al. paper, which also includes discussions
about the contribution of personality
(not much, once you have measured hours of practice) and genes directly
involved in particular skills.

Bottom line from the authors:

1) The evidence is quite clear that some people do reach an elite level of
performance without copious practice, while other people fail to do so despite
copious practice.

2) Ten thousand hours are not required: some chess players take 26 years of
practice to reach Master level, while others achieve it in less than 2 years.

Sunday, 26 May 2013

Our study on the lowering of
intelligence has drawn massive attention from the media, with headlines from
Brazil to Vietnam. Also thousands of reactions were posted on blogs, including
two highly relevant critical comments on the blogs of Scott Alexander and HBD
Chick. We give a response in this post. We are also pleased that our paper in Intelligence is starting a scientific
discussion on the lowering of intelligence.

Alexander (2013) advances the
argument that Galton’s sample is unrepresentative of the population of
Victorian London, and may be heavily skewed towards those with high-IQ and
faster reaction times (RTs) owing in part to the fact that Galton charged a
small fee to those wishing to participate in his data collection exercise.
Hence, these studies should not be used as the basis for comparison with more
modern studies, which, it has been argued, are in many cases far more
representative of the populations from which they are drawn. We
show here that this argument is wrong.

HBD Chick (2013) has advanced a
second argument to the effect that Galton’s sample, and other contemporaneous
19th century studies (i.e. Ladd & Woodworth, 1911; Thompson,
1903) represent ethnically homogeneous samples in comparison with more modern
ones, which are obviously less homogeneous. Given the existence of ethnic-group
differences in reaction time (RT) means (i.e. Jensen, 1998), this is proposed
as a cause of the substantially depressed means in current-era studies, thereby
undercutting our conclusion that RT has become slower for the general
population (HBD Chick, 2013). We show here that this second argument is wrong
in as much as changing population composition cannot account for the preponderance of the observed secular
trend.

In addressing the first argument,
the seminal paper of Johnson et al. (1985) which constitutes the source of
Galton’s simple visual RT data employed in both our study and that of Silverman
(2010), contains excellent data on the socio-economic and occupational
diversity of the relevant subset of Galton’s exceptionally large sample (N around 17,000 individuals, 4838 [or
30%] of whom were included in Johnson et al’s study). The paper states that “…
a sizable portion of Galton’s sample consists of professionals,
semi-professionals, and students. However … all socioeconomic strata were
represented” (p. 876). As can be seen in Tables 10 and 11 (pp. 890-891), the
male cohort could be split into seven socioeconomic groups (Professional,
Semi-professional, Merchant/Tradesman, Clerical/Semiskilled, Unskilled,
Gentlemen [aristocracy] and Student or Scholar). For females, there were six
socioeconomic groups represented in the data (Professional, Semi-professional,
Clerical/Semiskilled, Unskilled, Lady [aristocracy] and Student or Scholar). In
both the male and female sample the modal group appears to be the Student or
Scholar category; in both cases these groups exhibit the largest Ns – 1657 in the case of 14-25 year old
males, and 297 in the case of equivalently aged females. The second- and
third-largest groups amongst the males of equivalent age were
Clerical/Semiskilled (N=425) and
Semi-professional (N=414). This is
basically true of the female sample also, with Semi-professional being the next
largest group after Student or Scholar (N=104)
and Clerical/Semiskilled comprising the third largest group (N=47). Whilst it is obviously true that
the sample is skewed towards Students or Scholars in both cases, individuals
from these lower-middle/upper-working class occupations combined (see p. 888 in
Johnson et al., 1985; for a full description of how these occupational
categorizations correspond to employment type), make up a respectable
proportion of the 14-25 year old samples also (>30% in the case of the
males, and >30% in the case of the females). It is important to note that
according to Johnson et al (1985) many of the students would have been pupils
at schools accompanied by teachers on day-trips to Galton’s laboratory at the
Kensington Museum. However, a fundamental point is that Silverman’s (2010)
study uses only data for those aged 18-30 (see Table 1, p. 41 in Silverman
[2010] for full details of this subsample), and hence is much less skewed
towards school-aged students than the sample as a whole, which included a much
larger range of ages.

A careful reading of Silverman
(2010) will reveal that he was cognizant of precisely how much socioeconomic
diversity was present in Galton’s dataset. Accordingly he was very careful to
include only samples that would broadly match one or more of the categories in
Galton’s dataset (see: Silverman, 2010, Table 2, pp. 42-43 for full disclosure
of the sample background characteristics). One advantage of Silverman’s care
and meticulous attention to detail is that it permits us to make like for like
comparisons with specific socioeconomic and occupational groups in Galton’s
data, thus we can directly test the claims of Alexander (2013). Concerning the
post-Galton studies Silverman included five student samples, two of which date
from the 1940s (Seashore et al. 1941), and the remaining three of which date
from the 1970s to the 2000s (mean testing year = 1993; Brice & Smith, 2002;
Lefcourt & Siegel, 1970; Reed et al., 2004). These can be compared with the
combined Galton and Thompson 19th-century student data in a
three-way comparison as follows:

The difference between the 19th
century and the ‘modern’ male students is very similar to the
meta-regression-weighted increase in RT latency between 1889 and 2004,
estimated on the basis of all samples included in the meta-analysis (81.41 ms).
Silverman also included data from other socioeconomic groups. For example the
study of Anger et al. (1993) included a combined male + female sample of 220
postal, hospital and insurance workers from three different US cities. These
occupations clearly fall into the Clerical/Semiskilled and Semiprofessional
groups identified in Galton’s study. For both males and females in Galton’s
data, the N-weighted RT mean for
these two groups is 185.7 ms, the N-weighted
average amongst the participants in the study of Anger et al. (1993) was 275.9
ms. This equates to a difference of 90.2 ms between the 19th century
and 1993. Again, this is not dissimilar to our meta-regression-weighted
estimate of the cross-study increase in RT latency (81.41 ms).
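The N-weighting used throughout is simply a mean weighted by sample size. As a sketch of the computation, the group Ns below are the male figures quoted from Johnson et al. (1985), but the per-group RT means are hypothetical, chosen only to illustrate how the 185.7 ms aggregate and the 90.2 ms gap arise:

```python
# Sketch of an N-weighted mean. Group Ns (425 Clerical/Semiskilled, 414
# Semi-professional) are from Johnson et al. (1985) as quoted above; the
# per-group RT means are hypothetical, for illustration only.
def n_weighted_mean(groups):
    """groups: iterable of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n

galton_groups = [(425, 183.0), (414, 188.5)]  # (N, hypothetical mean RT in ms)
anger_mean = 275.9  # N-weighted mean from Anger et al. (1993), as cited above

g_mean = n_weighted_mean(galton_groups)
print(f"Galton-style weighted mean: {g_mean:.1f} ms")
print(f"difference vs. Anger et al.: {anger_mean - g_mean:.1f} ms")
```

Weighting by N prevents a small group with an extreme mean from dominating the aggregate, which matters when group sizes differ as much as they do in Galton's data.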

The results of these broadly
socioeconomically- and occupationally-matched study comparisons therefore imply
an additional degree of robustness to the findings of our more statistically
involved analysis of the overall secular trend. Furthermore, this evidences
Silverman’s contention that as an aggregate, the ‘modern’ studies have broadly
equivalent representativeness to the subset of Galton’s data employed in his
and our own analyses. Alternatively we could state that neither Galton’s nor
Silverman’s data are truly fully representative of any population, however they
are both ‘biased’ in their sampling towards broadly similar groups.

We continue with the second
concern, i.e. the lack of strict ethnic matching criterion, hypothesized to
lead to substantially depressed RT means in current-era studies. Ethnic-group
differences in performance on various elementary cognitive tasks have been
documented and are to be expected (i.e. Jensen, 1998). Substantial changes in
terms of the ethnic composition of test-takers would however be needed in order
for the magnitude of change to be solely
or even substantially a consequence
of this process. This is assuming of course that within and between
ethnic-group comparisons in terms of RT produce proportional results.

RT is related to g via mutation load (as measured using
fluctuating asymmetry; Thoma et al., 2006). Mutation load is therefore likely
to be a general source of individual differences in cognitive functioning
within populations (Miller, 2000), but not between them (e.g. Rindermann,
Woodley & Stratford, 2012), hence there is no good reason to expect
ethnic-group differences in RT means to be meaningfully comparable to
within-group differences in terms of proportionality (consistent with this is
the observation that on simple RT these differences, whilst present, are actually
quite small; Jensen, 1993; Lynn &
Vanhanen, 2002, pp. 66-67). So, indeed ethnically heterogeneous samples will
exhibit slightly slower or even faster reaction times (depending on the
populations and proportions involved), however the current proportions of
groups exhibiting slower simple RT means to Whites in Western countries are
simply too small, and the group-differences too slight to have had a
substantial effect.
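The scale of this compositional effect can be sketched with hypothetical numbers (none of the figures below are from the paper): in a two-group mixture, the shift in the aggregate mean is roughly the minority proportion times the group difference.

```python
# Back-of-envelope mixture arithmetic with hypothetical inputs: how much does
# adding a minority group with a slower mean RT shift the aggregate mean?
def mixture_shift(p_minority, group_diff_ms):
    """Shift of the aggregate mean relative to the majority-only mean."""
    return p_minority * group_diff_ms

# Even a 10% minority share combined with a 20 ms group difference moves the
# aggregate mean by only 2 ms - tiny next to the ~81 ms secular trend.
print(f"{mixture_shift(0.10, 20.0):.1f} ms")
```

To produce the full 81 ms trend by composition alone would require implausibly large proportions, implausibly large group differences, or both.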

It is also worth noting that the
weighted mean of our modern (post-1970) aggregated estimate (264.1 ms) is
actually less than Jensen’s (1993)
finding of a 347.4 ms mean of simple visual RT amongst a sample of 582 White US
pupils described as being of European descent, and also Chan and Lynn’s (1989)
finding of a 371 ms simple RT mean for over 1000 White British school children
in Hong Kong. It must be noted however that these studies were conducted on
young children – simple RT shortens until the late 20s when full neurological
maturation is achieved (e.g. Der & Deary, 2006), hence Jensen and Chan and
Lynn’s estimates are likely to be underestimates of the adult simple RT means
of these Whites, which may be somewhat closer to our sample mean of ‘modern’
(mostly White) populations in actuality.

We would like to thank Scott
Alexander and HBD Chick for their interest in our study, and for their
commentaries, however the counter-arguments, whilst thought-provoking, do not
appear to withstand scrutiny. We must therefore conclude that the secular
slowing of simple reaction time between the closing decades of the 19th
century and the opening one of the 21st has had little to do with
sampling issues.

Thompson, H. B. (1903). The mental traits of sex. An experimental
investigation of the normal mind in men and women. Chicago, IL: The
University of Chicago Press.

Woodley, M. A., te Nijenhuis, J., &
Murphy, R. (2013). Were the Victorians cleverer than us? The decline in general
intelligence estimated from a meta-analysis of the slowing of simple reaction
time. Intelligence.
doi:10.1016/j.intell.2013.04.006

Thursday, 23 May 2013

To those who knew him to any extent, he was seen as
pretty normal. Above average at school, fond of football, a fan of Tottenham
Hotspur and an amateur centre field player. Most of the accounts initially were
reasonably positive. By the following day the story was a bit more nuanced. He
had punched a girl in the face 10 years ago. He was aggressive. He was a big
guy and you wouldn’t want to get on the wrong side of him. He had a very
difficult father, who contributed nothing much to his upbringing. No-one had
any inkling he would do a thing like that, though he had become a preacher,
though not about Jesus.

With a confederate he rammed a car into an off duty
soldier, cut his head off with a meat cleaver, dismembered his body with knives
and machetes, and then for 20 minutes posed for photographs and gave interviews
in which he gave his justifications and issued threats. Some extremely brave women
passers-by (one getting off her passing bus for the express purpose of helping
the fallen man) engaged the murderers in conversation. They reported that the
main protagonist was apparently neither drunk nor on drugs, just an angry guy.

Murder is too rare a behaviour for the average
citizen to pick up warning signals. Too rare for the security services as well,
who knew of the lead protagonist, but not of his most recent Jihadist plans. With
any luck a fuller picture will emerge, though probably never a way of
distinguishing the noise of the average angry guy who converts to Islam and preaches
anger from the signal that two butchers are hunting for an English soldier walking
one afternoon on an English street.

Monday, 20 May 2013

(This nostrum, attributed to St. Francis Xavier, also works for girls and
women, though separate equations are required, because of interrupted careers).

In popular culture, in academic debate,
and in the nitty-gritty of medico-legal battles about the bright future which
might otherwise have been enjoyed by a damaged child seeking compensation,
there is much interest in what one can predict about a person’s future given
knowledge of their social class, circumstances, school performance and
intelligence at age 7. In medieval times it was only at age 7 that it seemed
pragmatic to recognise that the infant had survived the very high early life
death rates, and could be welcomed as a human being. In these gentler times parents
have no compunction about photographing their infant, secure in its survival. It
is not bad luck to register, name, photograph, film, record and display the
vulnerable neonate to the world.

A recent study has added some evidence
to these discussions, finding that maths and reading make an additional
contribution to later success in life, over and above the general factor of
intelligence. Stuart Ritchie and Tim Bates have written an elegant paper in
Psychological Science “Enduring Links from Childhood Mathematics and Reading Achievement to
Adult Socioeconomic Status”. http://pss.sagepub.com/content/early/2013/05/02/0956797612466268

Using the population
born in a single week in 1958 (National Child Development Study data held by
Institute of Education, and in my view “gold dust” for proper research) they
got the data on social class of origin, maths, reading, intelligence, academic
motivation, duration of education and attained social class.

In a nutshell, mathematics and reading achievement at age 7 affect attained
socio-economic status (SES) by age 42. Mathematics and reading ability both had
substantial positive
associations with adult SES, above and beyond the effects of SES at birth, and
with other important factors, such as intelligence. Achievement in mathematics
and reading was also significantly associated with intelligence scores,
academic motivation, and duration of education. These findings suggest effects
of improved early mathematics and reading on SES attainment across the life
span.

Of
course, readers of this blog will know the standard lament by now: many causes
interact with each other, and teasing them apart is difficult, but not
impossible. For example, in the original
study the social class of origin of the children was noted, but the intelligence
of the parents was not measured. So, we cannot assume the “influence of social
class” is from social class advantage per se. It will be a blend of material
advantage and genetic advantage, of unknown proportions. The explanatory model probably should say “a
class and genetic mixture”.

In ancient
times the data would be presented in terms of means, standard deviations, a
correlation matrix, and then perhaps a multiple regression equation. A useful
and familiar progression, but not without interpretive problems. Ritchie and
Bates are made of brighter stuff, and use an OpenMx magic box http://openmx.psyc.virginia.edu/ to
generate their structural equation models.

Personally, I
approach structural equation modelling with some trepidation, fearing a magic
lantern show which will convince me of anything, but Tim Bates
thunders: “SEM exposes all assumptions, claims, and lacunae ruthlessly: it
should be ubiquitous.” The (complicated) story is shown in their Figure 2,
which traces direct and indirect coefficients on final achieved social status.
From this it is possible to argue that, although intelligence has a strong
causal effect, there is an additional direct contribution from Maths, with a lower
direct effect from Reading. Nonetheless, there is a case for improving the
teaching of these skills so as to make an independent additional contribution
to life successes. Intelligence leads to motivation, which leads to years in
education, which leads to attained socio-economic status. The latter leads into
log income at the very end, which may be a relief to those who value cash over
social approval.

A few points: once you put in social
class of origin and housing tenure, the number of rooms in the parental home
has no effect. All other things being equal, the “bedroom tax” is unlikely to
diminish social mobility in a generation’s time.

I should like to have been able to give
you a much more detailed statistical analysis, but I was not taught maths properly when
I was seven. At about that age, or slightly older, I announced to my
grandfather, an Edinburgh engineer: “I know my 12 times table”. He looked at me with a dour expression, and
replied: “When I was a wee lad I knew my 20 times table”.

Sunday, 19 May 2013

A response to Prof Rabbitt – The Victorians were still cleverer than us

By Michael Woodley, Jan te Nijenhuis, and Raegan
Murphy

Professor Rabbitt has reacted to our interpretation of
the secular trend in simple reaction time speeds first detected by Silverman (2010),
and validated by us (Woodley, te Nijenhuis & Murphy, 2013). We would like
to thank Professor Rabbitt for his interest in our work and for being one of
the first to substantially contribute to the scientific discussion that was
started by our paper. Rabbitt makes several interesting points of criticism –
here we will show, however, that these do not constitute sufficient grounds to
reject the reality of the secular slowing of simple reaction time.

Firstly, Rabbitt argues that the level of inaccuracy
in instrumentation designed to measure simple reaction time was historically
quite high, especially in the pre-1970s era, where he argues that it was on the
order of 100 or so ms. Rabbitt then goes on to state, paradoxically, that a
reading of 200 ms might therefore fall between 200 and 299 ms, which assumes a
bias of 99 rather than 100 ms, and also that the instrumentation would
consistently ‘round down’ reaction time estimates. In actuality a bias of 100
or so ms would yield an average bias of 50 ms either way, assuming that the
error due to bias was normally distributed and that there was no tendency for
biases to be skewed in one direction rather than the other. Rabbitt does not
provide any evidence for such a tendency towards rounding down; he merely states
this as a fact, apparently based on personal experience with pre- and post-1970s
instrumentation.

Secondly, Rabbitt argues that method variance across
studies employing different instrumentation makes direct mean-wise comparison
of results problematic. He illustrates this via reference to the use of warning
signals along with the signal intensities, durations and rise-times of
different light sources (such as bulbs, fluorescent tubes, LEDs, computer
monitors, etc), and also with respect to response keys that might have been
non-uniformly ‘sticky’ across different apparatus.

Thirdly, Rabbitt argues that the presence of only two
data points from the Victorian era in our studies means that we can “… leave
aside an important question whether there is any sound evidence that creativity
and intellectual achievements have declined since the Great Victorian
Flowering”.

In addressing the first of Rabbitt’s claims, we are
skeptical about the suggested level of inaccuracy in pre-70’s era
instrumentation (such as Galton’s apparatus and the electro-mechanical Hipp
chronoscope). True millisecond resolution in measurement had been achieved far
earlier than Rabbitt claims, namely in 1908 (Haupt, 2001), with instruments
prior to that being typically accurate to at least a hundredth of a second. It
is not obvious why decent resolution (perhaps on the order of a hundredth of a
second) would not have been within the grasp of someone of Galton’s mental
stature and notoriously obsessive attention to detail (Rose & Rose, 2011).
His apparatus was described in an 1889 paper and employed a half-second
pendulum, whose duration could be estimated using very basic mathematics. Its
release occurred concomitantly with the concealing of a white paper disk, which
functioned as the stimulus - depressing a key facilitated its capture,
registering the reaction-time score. Similarly the much more sophisticated Hipp
chronoscope, with its electro-mechanical clutch-based mechanism was capable of
true millisecond resolution (Haupt, 2001). The issue of true millisecond
resolution is at any rate rendered moot in light of the fact that we are
dealing with the means of a large number of individuals measured by Galton and
others in multi-trial type experiments. Resolutions of hundredths of a second
would seem to suffice in such samples (Haupt, 2001).

These observations aside, there is a far more
substantive problem with Rabbitt’s primary claim, namely that, even assuming a
normally distributed 100 ms level of inaccuracy, the preponderance of pre-1970
studies still reveal upper bound means for simple reaction time that are
shorter in duration than the sample size weighted ‘true millisecond resolution’
mean of post-1970 studies.

Table 1

Reaction time means for five pre-1970 studies used in Woodley et al. (2013), along with estimates of error due to sub-100 ms measurement imprecision.

Reported mean (combined and N-weighted for the sexes where available) | Error range assuming 50 ms either way

184.3 ms (Galton, 1890s) | 134.3-234.3 ms

208 ms (Thompson, 1903) | 158-258 ms

197 ms (Seashore et al., 1941) | 147-247 ms

203 ms (Seashore et al., 1941) | 153-253 ms

286 ms (Forbes, 1945) | 236-336 ms

Weighted mean of post-1970 studies = 264.1 ms

Based on Table 1, assuming a normally
distributed 100 ms inaccuracy, the upper estimate falls below the post-1970
‘true millisecond resolution’ mean in four out of five cases (the exception
being the study of Forbes, 1945). The cumulative odds of this being a chance
result can easily be calculated. Let us assume a 50% chance that each
instrument would produce a mean value whose upper-bound estimate falls above
the post-1970 mean. The odds of four studies consecutively producing
means whose upper bounds fall lower is equal to 0.5*0.5*0.5*0.5, or 6.25%. In other
words, the probability that this is a chance finding is small. If we add to
this the systematic review of Ladd and Woodworth (1911), which found a mean
for 19th- and early 20th-century samples of 192 ms, and
whose hypothetical upper mean also falls below the weighted post-1970 mean (242
ms), the cumulative odds of this being a chance finding fall to 3.12%.
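The sign-test logic above can be checked in a few lines of Python. The means come from Table 1, and the 50% chance per study is the stated assumption; the study labels are only for readability:

```python
# Pre-1970 simple RT means (ms) from Table 1.
pre_1970_means = [
    ("Galton, 1890s", 184.3),
    ("Thompson, 1903", 208.0),
    ("Seashore et al., 1941", 197.0),
    ("Seashore et al., 1941", 203.0),
    ("Forbes, 1945", 286.0),
]
POST_1970_MEAN = 264.1  # ms, sample-size-weighted mean of post-1970 studies
HALF_WIDTH = 50.0       # ms, half of the claimed 100 ms inaccuracy

# Studies whose upper-bound mean (mean + 50 ms) still falls below the
# post-1970 weighted mean.
below = [name for name, mean in pre_1970_means
         if mean + HALF_WIDTH < POST_1970_MEAN]
# Four of the five qualify; the exception is Forbes (1945).

# Under the null hypothesis, each study falls below by chance with
# probability 0.5, so the cumulative odds are:
print(0.5 ** 4)  # 0.0625, i.e. 6.25%
# Adding Ladd and Woodworth (1911; mean 192 ms, upper bound 242 ms):
print(0.5 ** 5)  # 0.03125, i.e. ~3.12%
```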

Secondly, and again assuming high inaccuracy, why are
the results of the pre-1970 studies likely to be overestimates rather than
underestimates of the true values? Let’s look at the sources of bias that
Rabbitt describes. Sticky keys might require more force in order to register
a result. This was more likely to have been a problem in the case of earlier
studies employing cruder instruments, such as mechanical or hybrid
electro-mechanical apparatuses, rather than computer-based ones, for example.
This suggests that the bias would have been in the opposite direction for
earlier studies to that described by Rabbitt. Sticky keys would necessarily
lengthen rather than shorten reaction time estimates. Long-duration visual
signals, and also ones that are more intense and exhibit rapid rise-times
typically elicit faster (or maximal) reaction times (Kosinski, 2012). Galton’s
apparatus used a purely mechanical signal in the form of a paper disk, which
could be made to disappear via the operation of levers, thus triggering the
subject to depress a key and halt the swing of a half-second pendulum. The
signal duration was therefore indefinite – persisting until the point at which
the apparatus would be reset. It is hard to argue against the high visibility
of such a signal either, assuming a well-lit laboratory. Subsequent studies
employing the Hipp chronoscope such as Thompson (1903) and the studies
described in Ladd and Woodworth (1911) would have employed light sources.
Thompson (1903) for example employed a Geissler tube suspended against a black
background which was reported as producing a “flash of pale purple light” that
was “thrown out sharply” (p. 8). Geissler tubes are plasma-discharge or
fluorescence-based illumination sources. Fluorescent light sources exhibit
extremely rapid rise-times compared to filament-based incandescent bulbs, for
example (Sivak, Flannagan, Sato, Traube & Aoki, 1993).

Whilst the issue of signal duration in these early
studies employing light sources as stimuli is indeed problematic, the
suboptimal tendency is towards shorter duration signals (i.e. brief flashes),
which would lengthen rather than shorten reaction time estimates. It is
long-duration visual signals that permit the recovery of accurate maximal
reaction time latencies (Kosinski, 2012). Once again, any measurement error in
these earlier instruments would tend to skew the estimates towards higher rather
than lower latencies.

What of the issue of warning signals? As Silverman
(2010, p. 41) reports, there is very little evidence that warning signals
actually make a difference to recorded reaction time latencies, especially when
the ensuing stimulus is unpredictable, as was the case in all studies employed
in our and Silverman’s analyses. It is unlikely that Galton utilized a warning
system in his single person-single trial study. Thompson (1903), however, did
use an audio warning system in her study involving multiple trials per person.
The difference in the means between the two studies is extremely small (18.7
ms), and in the opposite direction to that predicted by the theory that the
presence of a warning signal reduces
the latency of reaction time means. This strengthens Silverman’s conclusion
that employing warning signals makes little difference.

We agree with Rabbitt, and also Jensen (2011), who
both argue that method variance between studies can be a substantial problem
when it comes to comparing between different studies, especially those using
different instrumentation. However, Rabbitt seems to have missed the point of
the meta-analytic nature of our own and Silverman’s study. Indeed, the study of
Silverman (2010) set out to explicitly address the issue of method variance
using a stringent set of seven inclusion rules (p. 41) coupled with a detailed
meta-analytic search. The rules were selected on the basis that all studies
included in the comparison set should be as closely matched with respect to
Galton’s study on as many dimensions as possible. The stringency of these rules
means that method variance across studies is substantially reduced, however the
trade-off is that the number of potentially usable studies is also massively
reduced. Our meta-regression ultimately demonstrates the power of a properly
conducted meta-analysis in this regard as we found no significant role for
moderators in explaining the secular trend of increasing simple reaction time
latency. There is scatter around the regression line, but
that is exactly what meta-analytical theory predicts. All data points being on
or very close to the regression line is an extremely unlikely outcome for a
meta-analysis (see Hunter & Schmidt, 2004).

Finally, what of the issue of sound evidence for the
greater accomplishments of 19th-century Western populations relative
to contemporary ones? This is an important issue that has been addressed
quantitatively using historiometry, which is the historical study of human progress or
individual personal characteristics, using statistics to analyze references
to geniuses,
their statements, behavior and discoveries in relatively neutral texts
(Simonton, 1984). Historiometric research into innovation rates and the
lives and accomplishments of eminent individuals (geniuses) has shown that the
per capita rate (i.e. events per billion of the population per year) of
significant innovation and also geniuses in science and technology peaked in
the late 19th century, after a long period of increase. Throughout
the 20th century there was a decline (Huebner, 2005; Murray, 2003).

What is a significant innovation? It is simply one
that is conspicuously different from anything that came before – so much so
that multiple encyclopedists and compilers of inventories of innovation are
likely to independently note it. Examples include the development of the
plough, the steam engine, splitting the atom and putting a man on the moon. The
iPhone 5 is not a significant innovation in comparison with its earlier
incarnations by contrast, and is unlikely to be considered as such by
contemporary historians of science and technology. Similarly geniuses can be
rated via the degree to which these same sources reference them. The use of a
‘convergence’ criterion based on prominence across encyclopedias not only
allows us to reasonably quantify the frequencies of significant innovation and
geniuses throughout the history of civilization, but it also allows us to rank
those same innovations and individuals in terms of importance. This
historiometric technique, like many extremely useful ideas, has its origins in
the writings of Galton (1869).

In conclusion, whilst Rabbitt’s criticisms are
interesting, they are clearly insufficient grounds for rejecting the central
claims made in our paper – namely that the secular trend in increasing simple
reaction time latency is robust and translates into a decline of 1.23 IQ
points per decade, or 14.1 points since Victorian times.

Silverman, I. W. (2010). Simple reaction time: It is not what it used
to be. The American Journal of Psychology, 123, 39–50.

Thompson, H. B. (1903). The
mental traits of sex. An experimental investigation of the normal mind in men
and women. Chicago, IL: The University of Chicago Press.

Woodley, M. A., te Nijenhuis, J., & Murphy, R. (2013). Were the
Victorians cleverer than us? The decline in general intelligence estimated from
a meta-analysis of the slowing of simple reaction time. Intelligence. doi:10.1016/j.intell.2013.04.006

Thursday, 16 May 2013

Angelina Jolie, who has the BRCA1 mutation, has
undergone a prophylactic double mastectomy and says that she has thus reduced
her risk of getting cancer from “87% to 5%”. She was faced with a most dreadful
dilemma, and has been praised for her courage and for her willingness to make
her story public. It is likely that her example will lead to more women with BRCA
mutations having both breasts removed.

Should they?

If I had a similar mutation and was offered the prospect
of reducing my cancer risk from 87% to 5% by having my testicles removed, and
I, like Angelina, had lost family members to cancer, I might go ahead. However,
I would first spend some time checking the statistics, and re-reading Gerd Gigerenzer’s
masterly “Reckoning with Risk”, published in the US as “Calculated Risks: How to
Know When Numbers Deceive You”.

The reason for my caution is that:

a) Most of us have difficulty with
statistics

b) Most of us have a particular difficulty
with percentages and

c) Most doctors have as much difficulty with
numeracy as the rest of us, but are more likely to be over-confident.

One of the main problems is that doctors and
journalists concentrate on relative risk
reduction and not absolute risk
reduction. The first compares two procedures in terms of their relative
effectiveness at reducing risk, the second shows you the overall reduction in
risk. It is not much use to reduce your relative risk if the absolute level
remains very much the same.

Consider the results from an earlier retrospective
study by Hartmann et al. (1999), which gives deaths per 100 women in the high
risk group:

Prophylactic mastectomy 1

Control (no mastectomy) 5

You can see that the rate of death in this high risk
group (with BRCA mutations) in the women without mastectomies is higher than in
those who had the double mastectomy.

The Relative Risk Reduction is 80% (4 women have been saved, and 4
divided by 5 is 80%).

The Absolute Risk Reduction is 4%
(prophylactic mastectomy reduces the number of women who die from 5 to 1 in
100, a saving of 4 women per hundred).
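The arithmetic can be made explicit; here is a minimal Python sketch using the per-100 death figures quoted above (the function name is ours, for illustration only):

```python
def risk_reductions(control_rate, treated_rate):
    """Return (relative, absolute) risk reduction for rates per 100 people."""
    absolute = control_rate - treated_rate
    relative = absolute / control_rate
    return relative, absolute

# Deaths per 100 high-risk women (Hartmann et al., 1999, as quoted above):
# 5 without mastectomy, 1 with prophylactic mastectomy.
rrr, arr = risk_reductions(control_rate=5, treated_rate=1)
print(f"Relative risk reduction: {rrr:.0%}")      # 80%
print(f"Absolute risk reduction: {arr} per 100")  # 4 per 100
```

The same inputs thus yield both headline numbers; which one is reported makes all the difference to how dramatic the result sounds.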

Now you can see why clinicians, researchers, drug
companies and journalists prefer relative risk reduction percentages to
absolute risk reduction figures. They usually look more dramatic, and make
better headlines.

Results: Breast cancer was diagnosed in two (1.9%) of 105 women
who had bilateral prophylactic mastectomy and in 184 (48.7%) of 378 matched
controls who did not have the procedure, with a mean follow-up of 6.4 years.
Bilateral prophylactic mastectomy reduced the risk of breast cancer by
approximately 95% in women with prior or concurrent bilateral prophylactic
oophorectomy and by approximately 90% in women with intact ovaries.

Conclusion: Bilateral
prophylactic mastectomy reduces the risk of breast cancer in women with BRCA1/2 mutations
by approximately 90%.

Expressed as diagnoses per 100 women, the figures are roughly:

Prophylactic mastectomy 2

Control (no mastectomy) 49

The relative risk
reduction is 47/49, approximately 96%.

The absolute risk
reduction is 49 - 2 = 47 per 100, i.e. 47%.
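As a cross-check, both reductions can be computed from the raw counts in the quoted abstract (2 of 105 women with mastectomy versus 184 of 378 matched controls), rather than from rounded per-100 figures:

```python
# Breast cancer diagnosis rates from the quoted abstract.
treated_rate = 2 / 105    # prophylactic mastectomy group (1.9%)
control_rate = 184 / 378  # matched controls (48.7%)

absolute_reduction = control_rate - treated_rate
relative_reduction = absolute_reduction / control_rate

print(f"Absolute risk reduction: {absolute_reduction:.0%}")  # ~47%
print(f"Relative risk reduction: {relative_reduction:.0%}")  # ~96%
```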

A clear result, wouldn’t
you say? This study is in line with previous studies such as Hartmann et al [4] who evaluated the efficacy of bilateral
prophylactic mastectomy in a retrospective cohort analysis of 639 moderate- and
high-risk women who had bilateral prophylactic mastectomy at the Mayo Clinic
between 1960 and 1993. Data from this study suggest that bilateral prophylactic
mastectomy is associated with a 90% reduction in breast cancer incidence and
mortality in women at high risk of breast cancer. In the only other study of BRCA1/2 mutation
carriers to date, Meijers-Heijboer et al [6] reported no post-bilateral prophylactic mastectomy
breast cancers in 76 BRCA1/2 mutation carriers after 2.9 years of follow-up,
compared with eight breast cancers in 63 mutation carriers who did not undergo
bilateral prophylactic mastectomy (P =
.003).

Well, now we can make a
number of points. Most studies concentrate on whether a woman is diagnosed with
cancer again. However, cancers are increasingly treatable, and although the
medication is thoroughly draining and unpleasant, so is a double mastectomy, and
the latter is permanent. Furthermore, for those with BRCA cancer risks there are
prophylactic medications available. Getting a diagnosis of cancer is not
identical with dying of cancer.

The Rebbeck paper does not
report on mortality figures.
These are theoretically calculated for the next 30 years, but we do not know what
improvements we may get in cancer treatment over three decades. On current
trends treatment should improve significantly. Survival rates are 93% if the cancer is
caught at the earliest stages and 88% at stage 1.

A Cochrane review in 2004
concluded: Bilateral prophylactic mastectomy should be considered only among those at very high risk of disease.

What
Angelina Jolie appears to have done is reduce her chance of getting cancer by
half, a very significant reduction, but at the cost of both breasts. She was
understandably frightened of getting cancer, but she was not doomed, and other
treatments are available.

There is
always a celebrity effect, but any woman considering a prophylactic mastectomy
should look at the data carefully, and look at the human costs and benefits of
all treatment options. Modern medicine is saving more of us from cancer, for
longer than ever before, but it still throws up the most awful dilemmas.