Last changed 28 Dec 2014. Length about 15,000 words (99,000 bytes).
(Document started ≈ 2000.)
This is a WWW document maintained by Steve Draper, installed at http://www.psy.gla.ac.uk/~steve/hawth.html.
You may copy it.

This web page began as a note on the Hawthorne effect: often mentioned, not so
easy to find a simple account of it. It now also has a significant revision
with reviews of related effects on experiments from expectation and the
experimenters: Pygmalion, placebo, and other effects. What they have in common
is that performance or other significant objective effects can come from
(non-objective) causes such as humans simply expecting something, or
responding to the social rather than material situation of the experiment.

These are notes: a starting point for others that might be helpful, because I
couldn't find an authority on this and had to put together some points for
myself. I'm not an expert on this.

Most people come to this topic wanting to address one point: the fact that
researchers are studying some human participants can itself make those
participants behave differently (thus undermining the experiment). This
effect is sometimes called the Hawthorne effect, but only on one
interpretation of the actual Hawthorne studies (see below). It also goes
under a number of other names, e.g. the novelty effect, demand
characteristics, etc.

While the Hawthorne studies, and the ensuing discussion of them, were one thing
that drew attention to this broad area, I now think they have no special or
clear place in how to think about the range of issues.
Instead, I think the following are the most important categories of factors
affecting experiments with human participants.

There are three very broad categories of factors which affect human behaviour
(whether or not we want them to in a given study):

Direct social effects e.g. legal obligations, wanting to please someone
else. (These should not be confused with the many more cases where we derive
information socially from others which changes our behaviour, but which was
not given to us in order to manipulate us.)

A large middle ground of cognitive issues, where behaviour is changed
by what we want, what we do and don't know, what we guess (i.e. beliefs with
some, but not certain, grounds). Within this probably the three most often
problematic areas in experiments are:

Participants' interpretation of the context they are in: how they
understand what is said to them, and what task is being asked of them.
Researchers frequently do not realise how each participant has understood
the meaning of what they are asked to do.

Expectancies: beliefs about the appropriateness and/or effect of doing
something in one particular way. E.g. a simple instruction "walk
across the room" is in fact massively under-specified: fast or slow?
carefully or while talking to someone else? ...

Learning effects. Humans learn all the time, whether they mean to or not.
Scientific experiments have value only if they are repeatable: clearly this
is very often in conflict with the fact of human learning, so that each person
seldom does something exactly the same way twice. The two things which most
often stimulate unintended learning within experiments are:

Feedback (learning can depend on the availability of feedback, i.e. on
knowing the consequences of one's actions)

Reflection (learning can depend on being prompted to reflect, to think
about feedback and about memories of one's actions).

The term "Hawthorne effect", coined by French (1953, p.101) in a chapter on
field experiments in an edited book on social science research methods, refers
back to a series of experiments on managing factory workers carried out around
1924-1933 in the Hawthorne works of the Western Electric Company in Chicago.
However there is no one precise meaning for the term, since the results were
puzzling to the original experimenters, and their interpretation continues to
be sporadically debated. Generally, references to the Hawthorne effect all
concern effects on an experiment's results of the awareness of participants
that they are the subject of an intervention. However there are many
different possible mechanisms, and all may be important in particular cases.
What is not disputed is that there is an important issue here, and it is clear
that there is a need for a term to refer to these issues: the term "Hawthorne
effect" has often been re-appropriated for any issue in the general area. What
is not understood is what the full range of issues is, and authors have often
(re)defined the term solely in terms of the one aspect and interpretation that
concerns them. An attempt to list some of the different mechanisms and
effects is made below. Part of the variation in meaning comes from the
different interpretations put on the original studies, part comes from the
different disciplines concerned with studies of humans (e.g. management
science, medicine, psychology, aircraft crash investigation), but underlying
it all is the absence of a comprehensive catalogue of the ways in which human
awareness sometimes affects the outcomes of experiments on human participants.

Note that "Hawthorne" is not the name of a researcher, but of the factory
where the effect was first observed and described:
the Hawthorne works of the Western Electric Company in Chicago.

One definition of the Hawthorne effect (out of a number) is:
An experimental effect in the direction expected but not for the reason
expected; i.e. a significant positive effect that turns out to have no causal
basis in the theoretical motivation for the intervention, but is apparently
due to the effect on the participants of knowing themselves to be studied in
connection with the outcomes measured.

Parsons (1974) p.930 defined it as:
"Generalizing from the particular situation at Hawthorne, I would define the
Hawthorne Effect as the confounding that occurs if experimenters fail to
realize how the consequences of subjects' performance affect what subjects do".
(However this is just an effect of motivation and learning,
and scarcely needs a new term. The universal propensity of humans to learn is
a constant threat in almost every experiment on people.)

A short way to refer to the Hawthorne effect is:

French,J.R.P. (1953) "Experiments in field settings" ch.3 pp.98-135 in
Festinger,L., & Katz,D. Research methods in the behavioral sciences
(New York: Holt, Rinehart & Winston). [This is the paper that
coined the term "Hawthorne effect" and discusses it in the context of research
methods.]

or

Mayo,E. (1933) The human problems of an industrial civilization (New
York: MacMillan) ch.3. [This is the earliest publication about it, and
the one influential in the human resource management field (as opposed to
research methods).]

or

Roethlisberger,F.J. & Dickson,W.J. (1939)
Management and the Worker (Cambridge, Mass.: Harvard University Press).
[This is the first detailed account of the actual studies, as opposed to
conclusions from them.]

or perhaps

Gillespie, Richard, (1991)
Manufacturing knowledge: a history of the Hawthorne experiments
(Cambridge : Cambridge University Press)

The longer way is:

The studies were done 1924-1933 (although the phrase "Hawthorne effect"
only appeared in 1953). Roethlisberger & Dickson (1939) give a great
amount of detail, and little interpretation. Mayo (1933) gives a shorter
account, and additionally the interpretation which has been so influential in
the management field: essentially, that it was feeling they were being closely
attended to which was the cause of the improvements in performance. French
(1953) coined the term, and is probably responsible for seeing it as a general
issue in experimental methodology.

Basically, a series of studies on the productivity of some factory workers
manipulated various conditions (pay, light levels, rest breaks etc.), but each
change resulted on average over time in productivity rising, including
eventually a return to the original conditions. This was true of each of the
individual workers as well as of the group mean.

Clearly the variables the experimenters manipulated were neither the only nor
the dominant causes of productivity. One interpretation, mainly due to Mayo, was
that the important effect here was the feeling of being studied: it is this
which is now often being referred to as "the Hawthorne effect".

1924-1927 there were 2.5 years of illumination level experiments.
In 1927 four studies began on selected small groups.
In 1932 there was a questionnaire and interview study of 20,000? employees.

Illumination studies pp.14-18 (part of ch.1) of
Roethlisberger & Dickson (1939)
Study 1a-d. a-c were experiments on whole departments.
1a) No control group, experimental groups in 3 different departments.
All showed an increase of productivity (from an initial base period), that
didn't decrease even with illumination decreases.
1b) 2 groups. The control group got stable illumination; the other got a
sequence of increasing levels. Got a substantial rise in production in both,
but no difference between the groups.
1c) Experimental and control groups. Experimental group got a sequence of
decreasing light levels. Both groups steadily increased production, until
finally the light in the experimental group was so low they protested and
production fell off.
1d) 2 girls only. Their production stayed constant under widely varying light
levels, but their stated preferences tracked the experimenter's suggestions:
(1) when the experimenter said bright was good, the brighter they believed the
light to be, the more they liked it; (2) ditto when he said dimmer was good.
And if they were deceived about a change, they said they preferred it: i.e.
what mattered was their belief about the light level, not the actual light
level, and what they thought the experimenter expected to be good, not what
was materially good.

Study 2: the relay assembly experiments (2a,b) on a group of 1+5 female operators.
2a Rest pauses and hours of work (in a separate room). Small group
piecework the only expt. var.
2b About a piecework payment system (on a separate bench, but normal
room).
2c Mica splitting test room. Like 2a: separate room, but already and
constantly on piecework rates.
2d Bank wiring: pure observation of a 14 man team. Group piecework.
Could always easily see their own rate.

Study 2a: a group of 6 experienced female workers segregated; 1 serving, 5
assembling telephone relays: a 1 min. task in good conditions. Output
carefully measured. 5 year study. Output (time for every relay produced) was
secretly measured for 2 weeks before moving them to the experimental room.
Then 5 weeks of measures; then manipulations of pay rules (group piecework
for the 5-person group); then two 5-min. breaks (after a discussion with them
on the best length of time); then two 10-min. breaks (not their preference),
which again produced improvement; then six 5-min. rests (disliked, reduced
output); then (free?) food in the breaks; shortened the day by 30 mins
(output up); shortened it more (output per hour up, but overall down); return
to earlier condition (output peaked); etc. etc. Attitudes as well as
behaviour and output were measured.

Parsons (1974) argues that in 2a,2d they had feedback on their work rates; but
in 2b they didn't. He argues that in the studies 2a-d, there is at least some
evidence that the following factors were potent:

Rest periods

Learning, given feedback i.e. skill acquisition

Piecework pay where an individual does get more pay for more work,
without counter-pressures (e.g. believing that management will just lower pay
rates).

He (re)defines "the Hawthorne effect as the confounding that occurs if
experimenters fail to realize how the consequences of subjects' performance
affect what subjects do" [i.e. learning effects, both permanent skill
improvement and feedback-enabled adjustments to suit current goals]. So he is
saying it is not attention or warm regard from experimenters, but either a)
actual change in rewards or b) change in provision of feedback on performance.
His key argument is that in 2a the "girls" had access to the counters of their
work rate, which they didn't previously know at all well.
(To see how feedback can be crucial, think bio-feedback, where people can
learn to control normally unconscious physiological states of their own body
when but only when they are given direct feedback on it.)

It is notable however that he refuses to analyse the illumination
experiments, which don't fit his analysis, on the grounds that they haven't
been properly published and so he can't get at details, whereas he had
extensive personal communication with Roethlisberger & Dickson.

Possibly this was a longitudinal learning effect. But Mayo says it is to do
with the fact that the workers felt better in the situation, because of the
sympathy and interest of the observers. He does say that this experiment is about
testing overall effect, not testing factors separately. He also discusses it
not really as an experimenter effect but as a management effect: how
management can make workers perform differently because they feel differently.
A lot to do with feeling free, not feeling supervised but more in control as
a group. The experimental manipulations were important in convincing the
workers to feel this way: that conditions were really different. The
experiment was repeated with similar effects on mica splitting workers.

Franke & Kaul (1978) offered yet another interpretation for the management
psychology field, and argued it better in Franke (1980).

When we refer to "the Hawthorne effect" we are pretty much referring to Mayo's
interpretation in terms of workers' perceptions, but the data show strikingly
continuous improvement. It seems quite a different interpretation might be
possible: learning, expertise, reflection — all processes independent of the
experimental intervention? However the usual Mayo interpretation is certainly
a real possible issue in designing studies in education and other areas,
regardless of the truth of the original Hawthorne study.

Recently the issue of "implicit social cognition" i.e. how much weight we
actually give to what is implied by others' behaviour towards us (as opposed to
what they say e.g. flattery) has been discussed: this must be an element here
too.

Clark & Sugrue (1991, p.333), in a review of educational research,
say that uncontrolled novelty effects cause on average a rise of 30% of
a standard deviation (SD) (i.e. a rise from the 50th to about the 63rd
percentile), which decays to a small level after 8 weeks.
In more detail: 50% of an SD for up to 4 weeks;
30% of an SD for 5-8 weeks; and
20% of an SD for > 8 weeks (which is < 1% of the variance).
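Those SD-to-percentile conversions follow from the standard normal CDF, assuming normally distributed scores. A minimal sketch of the arithmetic (the function name is mine, not from Clark & Sugrue):

```python
from math import erf, sqrt

def percentile_from_effect_size(d):
    """Percentile rank of the average treated participant, assuming
    normally distributed scores: 100 * Phi(d), where Phi is the
    standard normal cumulative distribution function."""
    return 100 * 0.5 * (1 + erf(d / sqrt(2)))

for d in (0.5, 0.3, 0.2):
    print(f"effect size {d} SD -> {percentile_from_effect_size(d):.0f}th percentile")
```

This gives roughly the 69th, 62nd and 58th percentiles for 0.5, 0.3 and 0.2 SD respectively, close to the "score rise" figures quoted above.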

David Carter-Tod says of the same newspaper piece:
Interestingly in the process of doing a quick search on this I came across
the following quote:
"A psychology professor at the University of Michigan, Dr. Richard Nisbett,
calls the Hawthorne effect 'a glorified anecdote.'
'Once you've got the anecdote,' he said, 'you can throw away the data.'"
A dismissive comment which back-handedly tells you something about the
power of anecdote and narrative. There is, however, no doubt that there is
a Hawthorne effect in education particularly.

Don Smith says:
I recall studying the Hawthorne Effect as an undergraduate for a
management degree years ago. At that time the message was that if a group
knew they were being studied the results may be biased.

However, I found Harry Braverman's comments in his book "Labor and Monopoly
Capital" more interesting. According to Braverman, the Hawthorne tests were
based on behaviorist psychology and were supposed to confirm that workers'
performance could be predicted by pre-hire testing. However, the Hawthorne
study showed "that the performance of workers had little relation to ability
and in fact often bore a reverse relation to test scores...".

What the studies really showed was that the workplace was not "a system of
bureaucratic formal organization on the Weberian model, nor a system of
informal group relations, as in the interpretation of Mayo and his followers
but rather a system of power, of class antagonisms".

According to Braverman this discovery was a blow to those hoping to apply
the behavioral sciences to manipulate workers in the interest of management.

My view:
What is wrong about the quoted dismissiveness is that there was not 1 study,
but 3 illumination experiments, and 4 other experiments: only 1 of these 7 is
alluded to. What is right is that a) there certainly are significant
criticisms of the method that can be made and b) most subsequent writing shows
a predisposition to believe in the Hawthorne effect, and a failure to read the
actual original studies.

The experiments were quite well enough done to establish that there were large
effects due to causal factors other than the simple physical ones the
experiments had originally been designed to study. The output ("dependent")
variables were human work, and we can expect educational effects to be
similar (but it is not so obvious that medical effects would be). The
experiments stand as a warning about simple experiments on human participants
as if they were only material systems. There is less certainty about the
nature of the surprise factor, other than it certainly depended on the mental
states of the participants: their knowledge, beliefs, etc.

Candidate causes for the observed results in the Hawthorne studies are:

Expectancies / expectation effects. Quite often it is not either
personal motivation or social pressures, but simply the individual acquiring
from others information about the speed of work which is usual, which you can
expect of yourself; e.g. when you ask someone how long it will take you to
climb this hill, or prepare this recipe.

Learning effects. People get better at everything with practice.
Counterbalancing in experimental designs can control for this. However
asymmetric learning effects can occur and undermine the counterbalancing.

Feedback: a skill can't be learned without good feedback. Simply providing
proper feedback can be a big factor. This can often be a side effect of an
experiment, and good ethical practice promotes this further. Indeed,
providing the feedback and nothing else may by itself be a powerful factor.

The attention of observers (e.g. experimenters).

As an important special case of this: specific and known expectations of
others (e.g. experimenters, observers, supervisors, oneself, ....)
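The counterbalancing mentioned above can be sketched concretely. A balanced Latin square assigns each condition to each serial position equally often, so simple practice effects cancel across groups. The condition labels here are hypothetical, not taken from the Hawthorne studies:

```python
def balanced_latin_square(conditions):
    """Balanced Latin square for an even number of conditions: every
    condition appears once per row (one group's order) and once per
    serial position, and each ordered pair of adjacent conditions
    occurs equally often."""
    n = len(conditions)
    assert n % 2 == 0, "the classic construction needs an even number of conditions"
    # Offsets 0, +1, -1, +2, -2, ... produce the standard balanced square.
    offsets = [0]
    for step in range(1, n // 2 + 1):
        offsets.append(step)
        if len(offsets) < n:
            offsets.append(-step)
    return [[conditions[(row + o) % n] for o in offsets] for row in range(n)]

# Hypothetical conditions, e.g. four rest-break schedules:
for order in balanced_latin_square(["A", "B", "C", "D"]):
    print(order)
```

Each row is one group's presentation order; across rows, every condition occupies every serial position once, so uniform learning effects average out of the group means. Asymmetric learning effects (e.g. skill transferring from condition A to B but not B to A) still defeat this.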

Parsons implies that (6) might be a "factor" as a major heading in our
thinking, but as a cause it might turn out to be reduced to a mixture
of individual, not social, effects: (2, 3, 5). That is: people might take on
pleasing the experimenter as a goal, at least if it doesn't conflict with any
other motive; but also, improving their performance by improving their skill
will be dependent on getting feedback on their performance, and an experiment
may give them this for the first time. So you would often see no Hawthorne
effect at all: one appears only when it turns out that with the attention came
either usable feedback, or information about what to expect of themselves, or
a change in motivation.

Adair (1984) warns of gross factual inaccuracy in most secondary publications
on the Hawthorne effect, and notes that many studies failed to find it,
although some nevertheless did. He argues that we should look at it as a
variant of Orne's (1973) experimental demand characteristics. So for Adair,
the issue is that an experimental effect depends on the participants'
interpretation of the situation; that this may not be at all like the
experimenter's interpretation; and that the right method is to do
post-experimental interviews in depth and with care to discover participants'
interpretations. So he thinks it is not awareness per se, nor special
attention per se: you have to investigate participants' interpretations in
order to discover if/how the experimental conditions interact with the
participants' goals (in the participants' view). This can affect whether
participants believe something, whether they act on it or don't see it as in
their interest, etc.

The research was and is relevant firstly in the "Human Resources Management"
movement. The discovery of the effects in the Hawthorne studies was most
immediately a blow to those hoping to apply the behavioural sciences to
manipulate workers in the interest of management.

Other interpretations it has been linked to are:
Durkheim's 'anomie' concept;
the Weberian model of a system of bureaucratic formal organization;
a system of informal group relations, as in the interpretation of Mayo and his
followers; a system of power, of class antagonisms.

Gillespie (1991) stresses the diversity of interpretation of the Hawthorne
experiments at the time and among the researchers involved, as well as later
and by others.

He also stresses that although workers (subjects) were extensively interviewed
at times during the trial, Mayo developed arguments that were widely accepted
for dismissing their interpretations, and imposing other interpretations.

He also points out that these researchers, and much of this field, assumes
that happier workers are more productive workers. This was not only used to
justify seeking higher productivity (as in the interests of workers as well as
management), but led to using measures of productivity directly as measures of
worker happiness.

With the advantage of hindsight, and of the wider methodological issues
explored within this web page, I would now suggest that the Hawthorne studies
are enough to dismiss as naive both (a) the simplest Taylorist view of
expecting that studying only material aspects of workers' behaviour (e.g. time,
motions, illumination) would be adequate; but equally (b) Mayo's view that
thinking only of management's relationship to the workers, and of issues of
human compassion and relationships, is adequate. Also important, and in fact
more widely important, are issues of how people govern their behaviour by
expectancies about the time, effort and quality they should aim for,
and the manifold sources they use to acquire or modify those expectancies.
Just because material factors on the one hand, and management-worker
relationships on the other, sometimes have an effect does not mean that this
exhausts or even dominates the important causal factors determining how
productive a person is. Similarly, these areas do not exhaust, nor even
predominate amongst, the issues which experimental designs with human
participants should attempt to address.

In the light of the various critiques, I think we could see the Hawthorne
effect at several levels.

At the top level, it seems clear that in some cases there is a large effect
that experimenters did not anticipate, that is due to participants' reactions
to the experiment itself. This is the analogue to the Heisenberg uncertainty
principle BUT (unlike in quantum mechanics) it only happens sometimes. So
as a methodological heuristic (that you should always consider this issue)
it is useful, but as an exact predictor of effects, it is not: often there is
no Hawthorne effect of any kind. To understand when and why we will see a
Hawthorne or experimenter effect, we need more detailed considerations.

At a middle level, I would go with Adair (1984), and say that the most
important (though not the only) aspect of this is how the participants
interpret the situation. Interviewing them (after the "experiment" part) would
be the way to investigate this, and to build in a precautionary check in
every experiment.

This is important because factory workers, students, and most experimental
participants are doing things at the request of the experimenter. What they
do depends on what their personal goals are, how they understand the task
requested, whether they want to please the experimenter and/or whether they
see this task as impinging on other interests and goals they hold, what they
think the experimenter really wants. Besides all those issues that determine
their goals and intentions in the experiment, further aspects of how they
understand the situation can be important by affecting what they believe about
the effects of their actions. Thus the experimenter effect is really not one
of interference, but of a possible difference in the meaning of the situation
for participants and experimenter. Since all voluntary action (i.e.
actions in most experiments) depends upon both the actors' goals AND on their
beliefs about the effects of their actions, differences in understanding of
the situation can have big effects.

At the lowest level is the question of what types the specific causal factors
might be. The rest of this web page elaborates on them, but a preliminary set
might be:

Material ones that are intended by the experimenter

Feedback that an experiment might make available to the participants

Changes to motivation, goals, and beliefs about action effects induced by
the experimental situation.

Expectancies e.g. what to expect about how long it takes to do something.

According to Rosenthal & Jacobson (1968), Jastrow (1900) reported a
different striking effect on workers being trained on the then new IBM
Hollerith punch card machines in the US census bureau. The first group were
expected by the inventor to produce 550 per day, and did so but had great
difficulty in improving on that. However a second group who were isolated
from the expectation were soon doing 2100 per day.

In my own practice, asking students to do the novel task of writing a
paragraph on an unexpected topic in 5 minutes led to no words being written in
the time; but the same request plus a mention that most students write about 15
lines in that time led to almost all of them meeting that quiet expectation.

Rosenthal & Jacobson (1968/1992) report and discuss at length an important
effect, which I shall call the Pygmalion effect.
Basically, they showed that if teachers were led to expect enhanced
performance from some children then they did indeed show that enhancement,
which in some cases was about twice that showed by other children in the same
class.

The biggest study was at "Oak school": a US primary school. Teachers were
deceived into believing that a set of one fifth of their class were expected to
develop much faster than the rest, as measured by IQ points. In fact, this set
was randomly selected; or rather, selected by stratified random sampling, the
better to guarantee that they were extremely similar in both mean and
variation to the rest of the class. The main measure was a kind of IQ test,
administered at the start of the school year (pretest) and at 4 months (end of
first semester), 8 months (end of second semester and of first year of
school), and 20 months (end of second school year with a different teacher).
Maximum overall effect at 8 months, but a lot of gain still present at 20
months. There was a big effect on first and second grade children by the end
of the first year. By the end of the second year, much of this had gone in
those classes, but in other classes positive effects had emerged for the first
time. Girls and boys gained in somewhat different ways (verbal vs. reasoning
subscales). The advantage showed on the pre/post IQ test. It was
also true of teacher assessments, e.g. reading grades, which showed a big
effect in third grade as well. They also did blind retesting of a sample by an
examiner who was not the teacher, and who didn't know which children were
supposed to do well, and got results showing an even greater difference.
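The stratified random selection described above can be sketched as follows: pupils are ranked by pretest score, cut into strata, and a fifth drawn at random from each stratum, which keeps the selected group's mean and spread close to the rest of the class. The pupil names, scores, and function name here are hypothetical illustrations, not the Oak School data:

```python
import random

def stratified_fifth(pupils, scores, n_strata=4, seed=0):
    """Draw roughly one fifth of pupils by stratified random sampling:
    rank by pretest score, cut into equal strata, then sample a fifth
    of each stratum at random."""
    rng = random.Random(seed)
    ranked = sorted(pupils, key=lambda p: scores[p])
    size = len(ranked) // n_strata
    chosen = []
    for i in range(n_strata):
        # Last stratum absorbs any remainder from uneven division.
        stratum = ranked[i * size:] if i == n_strata - 1 else ranked[i * size:(i + 1) * size]
        chosen += rng.sample(stratum, max(1, len(stratum) // 5))
    return chosen

# Hypothetical class of 20 pupils with pretest IQ scores:
scores = {f"pupil{i}": 85 + i for i in range(20)}
picked = stratified_fifth(list(scores), scores)
print(picked)
```

Because one pupil is drawn from each quarter of the score distribution, the "bloomers" group matches the rest of the class in both mean and variation far more reliably than a simple random draw of four would.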

Another effect was that pupils in the control group who improved against
expectation were disliked by teachers, or at least showed signs of being in
conflict.

This is the biggest and most careful study. But besides primary school
pupils, it has also been shown for algebra at the (US) Air Force Academy, and
for university students as well.

Although not of central importance here, of huge importance in educational
research in general is the issue of teacher effects.
Tim O'Shea once told me that in all studies where one of the variables was the
teacher, the effect of different teachers was always bigger than the effect of
different treatments (usually what was meant to be studied).
Basically, teachers have a huge effect but one we don't understand at all.

If we did, we could train teachers to use best practice in the sense of getting
the best effects: but we have no idea how to do that.
Assuming this is true, this is the most important effect in the whole field of
education. (Consider: if this was true in medicine, then it wouldn't matter
much what treatment you gave a patient, the most important thing would be to
get the best doctor regardless of drugs, surgery or other treatments.)
It also implies that the professionalisation of teaching does not entail
improvement in learning or in any rational basis for treating learners, though
it may from a social viewpoint or of course from the viewpoint of the benefits
to practitioners of restrictive practices and regulation to exclude the worst
practitioners. However we shouldn't be surprised. Medicine was organised into
its current professional form before there was a single scientifically
justified treatment available: in the UK, the governing professional body, the
General Medical Council, was established by law in essentially its present
form by the 1858 Medical Act. However on an optimistic view, Pasteur's rabies
vaccination, established around 1885, was the first medical treatment based on
scientific evidence; and it has been estimated that 1911 is the first year
when a patient was objectively likely to benefit from being treated by a
doctor.
(L.J. Henderson: "somewhere between 1910 and 1912 in this country, a random
patient with a random disease, consulting a doctor at random had, for the
first time in the history of mankind, a better than a fifty-fifty chance of
profiting from the encounter." as quoted in John Bunker (2001) "Medicine
Matters After All: Measuring the benefits of medical care, a healthy
lifestyle, and a just social environment" (Nuffield Trust))

Note too that all this casts doubt on the value of training teachers, apart
from giving them practice to learn for themselves: if we don't know what it is
about teachers' behaviour that has such large effects on learning, how can we
usefully train them? In the absence of this knowledge, the only measure of a
teacher's worth is the comparative learning outcomes of their students.
However neither teachers nor teacher training is usually assessed by this. So
while it is quite possible that teachers learn either by unaided practice, or by
unconscious imitation of other teachers (apprenticeship learning), there is
almost no evidence on whether formal training makes a difference.

The empirical observation of the importance of teachers has major implications
for theory. Because they are of such large importance, I prefer
Laurillard's theory of the learning and teaching
process to others since it gives equal weight to learners and to teachers, and
I regard slogans such as "learner-centered" and theories such as
neo-constructivism to be flawed because they do not acknowledge or give a
place to teachers of the prominence that they in fact have in the causation
of learning.

So given the importance of teacher effects, what is the evidence?
I need to do a proper review of this. But the Pygmalion effect is one big
demonstration of the effect of teachers, showing they can double the amount of
pupil progress in a year. Rosenthal & Jacobson (1992) also mention
briefly research that showed that 10 secs of video without sound of a teacher
allows students to predict the ratings that person will get as a teacher.
Similarly, hearing the sound without vision AND without content (rhythm and
tone of voice only) was enough too. This is powerful evidence that teachers
differ in ways they cannot easily or normally control, but which are very
quickly perceptible, and which at least in students' minds, determine their
value as a teacher. (And Marsh's (1987) work shows that student ratings of
teachers do relate to learning outcomes.)

This also brings out an essential difference between medicine and education.
In education, the teacher is supposed (except by radicals) to be a major
cause of learning; while in medicine it is supposed to be the "treatment"
regardless of who administers it.

Placebos are things like sugar pills, that look like real treatments but in
fact have no physical effect. They are used to create "blind" trials in which
the participants do not know whether they are getting the active treatment or
not, so that physical effects can be measured independently of the
participants' expectations. There are various effects of expectations, and
blind trials control all of these together by making whatever expectations
there are equal for all cases. Placebos aren't the only possible technique
for creating blindness (unawareness of the intervention): to test the
effectiveness of prayer by others, you just don't tell the participants who
has and has not had prayers said for them. To test the effect of changing the
frequency of fluorescent lights on headaches, you just change the light
fittings at night in the absence of the office workers (this is a real case).

Related to this is the widespread opinion that placebo effects exist, where
belief in the presence of a promising treatment (even though it is in fact an
inert placebo) creates a real result e.g. recovery from disease. Placebos as
a technique for blinding will remain important even if there is no placebo
effect, but obviously it is in itself interesting to discover whether placebo
effects exist, how common they are, and how large they are. After all, if
they cure people then we probably want to employ them for that.

Claims that placebo effects are large and widespread go back to at least
Beecher (1955). However Kienle and Kiene (1997) did a reanalysis of his
reported work, and concluded his claims had no basis in his evidence; and then
Hrobjartsson & Gotzsche (2001) did a meta-analysis or review of the
evidence, and concluded that most of these claims have no basis in the
clinical trials published to date. The chief points of their sceptical
argument are:

Only trials that compare a group that gets no treatment with another group
that gets a placebo can test the effect.

Most claims are based on looking at the size of the improvement measured in
placebo groups in trials which compared only placebo and experimental (active)
treatments. This is misleading since most diseases have a
substantial clearup rate with no treatment: seeing improvements doesn't mean
the placebo had an effect. (And also because of regression to the mean: for
the difference, see below.)

Nevertheless, even they conclude that there is a real placebo effect for pain
(not surprising since this is partly understood theoretically: (Wall, 1999));
and for some other continuously-valued subjectively-assessed effects e.g.
fatigue, nausea. A recent experimental demonstration was reported:
Zubieta et al. (2005) "Endogenous Opiates and the Placebo Effect"
The journal of neuroscience vol.25 no.34 p.7754-7762.
This seems to show that the psychological cause (belief that the placebo
treatment might be effective in reducing pain) causes opioid release in the
brain, which then presumably operates in an analogous way to externally
administered morphine.

A more extensive review of the overall dispute is Nimmo (2005) and another is
Woolfson (2009).
(See also
Hrobjartsson, A. & Gotzsche, P.C. (2006)
"Placebo interventions for all clinical conditions"
Cochrane Database of Systematic Reviews issue 3.)

N.B. the opposite of placebo is nocebo: something that although in fact
materially neutral, causes harm in the patient because they believe it
will harm them: see a review by Barsky et al. (2002).

Kaptchuk offers some ideas about what the key factors might be in the placebo
effect (i.e. in positively enhancing healing, but not from the material
effects of the treatment or drug). He suggests it is the effect of hope,
attention and care; and to do with social, not just personal, beliefs,
i.e. with "the healing drama". That is, as with the Pygmalion effect, if those
around you believe in the treatment, that belief may itself have an important
effect.

In turn that suggests to me these points:

It has been established that persistent stress (unlike sharp, but
passing, shocks) reduces immune system function. One service which doctors can
perform is to tell a patient that they will recover. This is often not at all
certain to the patient and their family. One of the reasons for mental
illness being frightening to most people, is they don't believe it will pass:
this is in strong contrast to flu, food poisoning, broken bones, where they
know they will recover. The assurance relieves anxiety and stress, which in
turn improves immune system functioning, and so recovery. This, if correct, is
a case where the doctor's knowledge speeds recovery without any treatment,
i.e. it is entirely consistent with an approach of scientific medicine, though
the mechanism is not a material one.

A few people are born without a working system for feeling pain (Wall,
1999). They usually die by the time they are 20 from osteomyelitis because
when they sustain everyday small injuries, they don't "favour" them as their
body tries to heal the injury: they don't treat themselves with tenderness
(shielding little cuts from continual re-opening, limping and avoiding further
strain to damaged ligaments, etc.). This turns out to be so important that it
kills them slowly but inexorably. Thus it could be that inducing a patient
(and those around them) to care for themselves is sometimes a key factor, and
the basis for real healing effects independent of material interventions. This
thought is further supported by the consideration that today in developed
countries, the leading causes of death (heart disease, cancer) are thought to
be largely controlled by lifestyle behaviours that the patient, not doctor,
controls: persuading patients to behave differently is the key to controlling
mortality.

Currently it is apparent that the placebo effect is real and important with
pain, but may not exist elsewhere. This is not grasped by some important
authors. So meta-analyses e.g. by Hrobjartsson may show no effect, but this is
probably because he averages together placebo effects across many fields; and
conversely Benedetti argues that all drug trials should be done differently
even though he has shown important effects only for pain. It also brings out
that for good pure science, you should compare no-treatment, placebo, and
treatment (and additionally seriously consider learning effects from repeated
administrations); but for applied science, the standard treatment vs. placebo
trial is mainly good enough. It will only miss how important placebo
effects are in potentiating the drugs. But still such standard trials measure
drugs in approximately valid clinical conditions, where patients believe the
treatment is probably effective.

Note too the strong distorting tendencies of different fields: in drug trials,
researchers tend to attribute the whole efficacy of the control (placebo)
condition to a placebo effect whereas much of it will be due to
spontaneous recovery; while physicians see it as like bedside manner:
something to please the patient independently of objective healing mechanisms,
whereas the neurophysiological effects of the placebo effects have now been
well established.

Benedetti has shown that some pain drugs rely in part on placebo effect: they
are real but there is a statistical interaction with patient knowledge of
getting a drug i.e. the placebo effect is necessary to potentiate the drug,
which then adds value on top of placebo alone. The same is true of diazepam's
(Valium) effect on anxiety.

Trials can be (and have been) run in which the patient does not know when they
are getting the drug; informed consent is still possible because they know they
might receive it, or will receive it at some time.

Kaptchuk and irritable bowel syndrome: Placebos did better than no treatment,
and attention and pseudo (placebo) treatment added to give an effect as big as
the best drug.

Finally: here's where you can buy one!
"Obecalp, the First Standardized, Branded and Pharmaceutical Grade Placebo is
Now Available for Sale at InventedByAMommy.com":
"Obecalp"

Both are psychological effects on the participants, causing an effect when
the material intervention has no effect.

Both are effects produced by the participants' perceptions and reactions;
but the former emphasises their response to new equipment or methods, while
the latter emphasises their response simply to being studied.

The leading suspected cause in the placebo effect is the participants'
false belief in the material efficacy of the intervention. The leading
suspected cause in the Hawthorne effect is the participants' response to being
studied i.e. to the human attention.

In both cases, the experimenter may be deceiving the participants, or may be
mistakenly sincere, or neutral with respect to the effects of the technology or
intervention. In general however, the experimenter appearing to the
participant to believe in the efficacy of the intervention, while not
essential, may be more, or more often, important to the placebo effect than to
the Hawthorne effect.

Related to placebos is evidence about how positive thinking can improve
health outcomes.
This area shows that healing and recovery is affected by various kinds of
positive thinking. This is not mostly about a placebo effect confounding an
experiment, but about better medical outcomes for physically treated patients
depending on whether they additionally have positive thinking. The mediating
causes are probably: effects on pain; on how well we look after ourselves; and
on reduction in persistent stress.

Placebos are used to create blind trials. They are not the only technique
for this, but are a common and important one.

Whether or not there is a placebo effect, placebos will remain an
important technique for this.

Recent sceptical meta-analyses of placebo effects suggest that the effect
does exist, but only in very limited contexts. The widespread claims are
mainly misplaced, based on faulty inferences.

Some interesting suggestions about how the placebo effect might work —
what mediates it — are being made; and involve unconscious actions by the
practitioner as well as the patient.

Placebos are often seen as posing ethical difficulties.
Essentially the issues are of two kinds, neither about placebos alone.

Deceiving experimental participants, or at least withholding information.
This is potentially in tension with the principle of informed consent. This
is most acute for experiments that wish not just to achieve blinding, but to
measure the effect of expectancies, and so wish to induce expectancies by
misinforming (some) participants.

Withholding treatment from patients (or education from students). The
tension here is between the greater certainty a controlled experiment will
give, versus the prior guesses of people and experts. After all, you probably
wouldn't do an experiment unless you had some reason to hope a treatment
worked; but if you do have such grounds, then should your opinion of the best
treatment be given to all patients rather than give some a placebo?
This pits authority against science: "expert opinion" i.e. ignorant but
sincere authority against evidence-based knowledge.

This section is a list of names in the literature purporting to identify
"effects".

The placebo effect in medicine, where getting an inert
(e.g. sugar) pill has a large positive effect.
Many believe that there are often large positive effects apparently simply
from the expectation created in the patient: if true, this is the placebo
effect, where the intervention in fact has no material effect, but the belief
by the participant does. Although often transmitted from the doctor's
expectancies, it may be independent of the doctor. It may show particularly
strongly in side-effects: the number and severity of side-effects may be
three times larger when patients are warned about the possibility, in both
the group that gets the active treatment and the placebo group. However as
noted above, some do not believe any such effect exists.

The Hawthorne effect (French, 1953): the effect simply of being
studied. Aspects of this suggest that the effect did not depend on the
particular expectation of the researchers, but that being studied caused the
improved performance. This might be because attention made the workers feel
better; or because it caused them to reflect on their work and reflection
caused performance improvements, or because the experimental situation
provided them with performance feedback they didn't otherwise have and this
extra information allowed improvements.

The John Henry effect (Zdep & Irvine, 1970)
is the opposite of the Hawthorne effect: it is when a supposedly control
group, that gets no intervention, compares themselves to the experimental
group and through extra effort gets the same effects or results. A kind of
counter-suggestibility.

Jastrow's effect on factory work was much bigger: here an
explicit expectation about performance was transmitted and turned out to
change output by a factor of three.
(Rosenthal & Jacobson, 1968; Jastrow, 1900.)

The Pygmalion effect or "expectancy advantage" is that
of a self-fulfilling prophecy. Teachers' expectations of pupils can strongly
affect (by about a factor of two over a year) the amount of development they
show. (Rosenthal & Jacobson, 1968)

The charisma effect.
This term is mostly used as one of the rival theories of leadership, not as
a general problem to watch out for in studies of humans.
However Flynn (2012) uses it to refer to something like the Hawthorne effect.

The halo effect.
Coined by Thorndike, this is a specific type of cognitive confirmation bias,
wherein positive feelings in one area cause ambiguous or neutral traits to be
viewed positively. In other words, as in the use of brands in marketing,
positive attitudes can spread from one aspect or thing to associated ones.
E.g. from the researcher to the task; from the use of new technology to the
amount you are learning.

The novelty effect:
the participant performs differently at first because of the novelty of the
"treatment" which may change their expectation, or simply cause them to be
more alert or otherwise perform differently. The experimenter is not
important, but a materially unjustified belief, perhaps from other social
media, may be (e.g. participants think the technology / educational
intervention is wonderful and that belief is the real cause of raised
outcomes); or else simply the novelty rather than belief matters, if it
operates through (say) attention rather than through expectancies.

Experimenter effects.
Specific expectations acquired, consciously or not,
from the researcher. Some experimenter effects have been demonstrated equally
in positive and negative directions.
Rosenthal (1966) describes experimentally tested experimenter effects in
behavioral research, which is summarised by Rosenthal & Jacobson (1992).
Prophesying a difference caused research assistants to create
an effect, and this could be done equally in either direction (i.e. can create
a positive or negative effect this way). This was done where the experimental
task being manipulated required judgements by the nominal participants.
However this was about one tenth the size of the effect prophesied, so it
would be quite wrong to describe this as "seeing what you expect": it would be
more accurate to suggest that experimenters could influence subjects on
marginal cases and so systematically bias (only) within the range of
experimental "noise". However if stooges acting as the first subjects behaved
differently, this overrode and created a more effective expectation (and
consequent effect on real subjects). Such effects have also been demonstrated
in animal experiments and on learning and IQ tests/tasks at least sometimes.

Observer effects. Much as for experimenter effects.

Trial effects. As for experimenter effects, but with the
emphasis on the activity not the person of the experimenter.

Research participation effects. Ditto: perhaps a more
general label for observer, experimenter, trial, and novelty effects.

Demand characteristics
(wikiP).
A more established term, with the effects coming from the (perceived)
requirement being placed on the participant.
After all, even conversations have strong demand characteristics (being
relevant, being polite, etc.) which strongly modify what we say; and given also
that in most human experiments, the participants are either paid or are
volunteers, it must be expected that they try hard to do what they perceive is
wanted. So interpreting any such experiment as somehow showing what the
participants themselves want or prefer is obviously problematic.

No-one knows the mechanisms behind these effects. However it is not hard to
generate speculations on how they might be advantageous and so
quasi-rational.
Note that not all conceivable effects are in fact observed. The cases where
they are and are not, may often be explained by where they are rational,
advantageous.

John Henry: act to maintain respect and status.

Hawthorne: people paying you attention makes you feel like working harder.
Aspects of the Hawthorne studies suggest that it was not that the researchers
expected a better result in every case, but that being studied caused improved
performance. This is similar to job interviews, and competitive events, where
participants often perform much better than their average performance because
they are being studied and assessed, not because they are expected to do
better.

Hawthorne: being studied prompts a person to reflect more on the task
i.e. to study their performance and themselves as well; and reflection often
causes them to improve their performance gradually because they are thinking
about it, and about improving it.

Hawthorne: Parson's argument, primarily about feedback provision, is that
learning (improving a skill) requires plenty of feedback on your performance.
If an experiment provides that (as a side effect of making experimental
measurements) where it wasn't readily available before, you may see
performance improvement due to that alone. He argues convincingly that some
of the original Hawthorne studies did provide performance feedback to workers
which they didn't previously have access to.

Jastrow, Pygmalion, Experimenter effects:
where participants are strongly affected by the expectations expressed by
their "bosses". In work, a worker donates their time to an employer in return
for pay but remains responsible for the amount of effort they make. We know
from injuries in sport and in the home (e.g. lifting things) that it is
dangerous to have unrealistic estimates of one's own capabilities, so it is
rational, and in both workers' and usually managers' interests, to get these
estimates right. Thus expectations are rightly central in regulating, and so
limiting, output; and naturally we are very likely to pay some attention to
others' informed opinion on these.

Novelty effects. Conversely if a participant normally has a zero
expectancy about something, then something new may get them to review it. In
education if a student does not believe they can improve at something then
they won't try (e.g. "I can't do maths", "Perfect pitch is an innate ability
so there is no point in me practising"), but an experiment might make them
change this assumption and so start making an effort to learn (placebo effect,
novelty effect).

Placebo effects.
These have been shown to operate on pain and on effects like nausea; but not
to heal broken bones. This makes sense because (contrary to common sense) it
has been shown that cognitive expectations have a big effect on the operation
of pain (Wall, 1999) (but not on bone growth); and also on perceptions of
fatigue e.g. when running (Lovett, 2004). There is no strong evidence of it
operating on any other medical problem. And this shouldn't surprise us, since
(contrary to our intuition) pain is not a directly sensed objective sensation
carried one-way by special nerves: in fact there are as many descending fibres
from the brain to peripheral pain receptors as ascending ones. Clearly there
is a well established, long evolved system for having the mind change pain
sensations. Given that, it should be no surprise that a variety of methods
(e.g. meditation, hypnosis) can modify pain sensation (easily by a factor of
four) via the mind.

On the other hand, as discussed above, the real life-critical aspect of the
pain system seems to be in getting us to "nurse" injuries until they heal:
which is consistent with how the mind generally cannot turn pain off
entirely, only modify its intensity depending partly on current priorities.

Demand characteristics and expectancies.
In relation to the Hawthorne effect on the rate of workers' task performance,
expectancies are probably the biggest issues, and again for good reason.
When you set out to do a "task" such as walk along a corridor, this is massively
under-specified. You can sprint, walk fast, walk slowly, etc. Even in
walking to work, my own walking speed varies by about a factor of 2 depending
on time of day, caffeine level, or whether I have a companion.
Humans in fact probably have huge adaptations to make us extremely ready to
modify such actions to align with others. Firstly: any predator which pursues
its prey, adapts its motion in fine detail depending on the moment to moment
movement of the prey. Secondly: team work requires fitting your motions in
with others' if you are all lifting or carrying one thing. Thirdly, in recent
decades research has made it clear that in conversing with another person, we
"align" i.e. converge to a single level, the speed, loudness of speech, the
length of pauses, aspects of vocabulary, syntax and semantics.
The difference between using a focus group and an interview is whether the
researcher wants the participant(s) to converge on participant-language or on
researcher-language. Both are important methods, depending on the aim.

The specific lesson, then, is that for most physical tasks there is a tradeoff
between speed and accuracy/quality/error rate. And in fact there are two kinds
of "error" rate: one for defects in the product; another for injuries to the
worker e.g. hurting your back carrying something the wrong way, or for too long,
or which is too heavy. If this tradeoff isn't specified, the participant MUST
nevertheless decide on it. If the experimenters leave this necessary decision
process to chance, they will get results which are hard to interpret.

There are many terms that have been used, in one way or another, to refer to
how the behaviour of human participants is modified by the experiment itself.
These terms suggest different attitudes to this problem area.
McCambridge et al. (2014) are largely right to conclude that:
"there is no single Hawthorne effect. ...
Consequences of research participation for [the] behaviors being investigated do
exist, although little can be securely known about the conditions under which
they operate, their mechanisms of effects, or their magnitudes. New concepts
are needed to guide empirical studies."

Actually, I think we can already do a bit better than that. It is quite wrong
to think that all these effects apply all the time, and that no useful
experiments with human participants can be done.

The placebo effect is real, but only applies to very limited cases such as
the treatment of pain, nausea and fatigue. There is no good evidence of it
operating on other medical problems. And there seem to be good reasons for
this (see previous section).

Unless the task in an experiment is unusually well specified, then
expectancies are likely to be an important factor for the reasons outlined
above. This in turn interacts strongly with how participants understand "the
task" and what is being asked of them: so their interpretation of the
situation is often another large factor.
To repeat, simply telling participants to speak, to walk, to pull, to sew on a
button is massively under-specified: they MUST somehow fill in all the missing
details especially of the desirable speed and accuracy; and are likely to try
to fill them in by imagining what you want, or what interests them, or ...

A more general lesson is about conduct of any study with human participants.
Coombs & Smith (2003) argue that these issues justify action research
rather than lab. experiments as a method, but often it may rather be that their
criticism of naive positivism is more widely applicable. Many naive
researchers say very little to participants "in order not to bias them"; but
this just means the participants all have to make an interpretation of what the
under-specified instructions mean, and will make it differently from each other.
If you want to study individual differences in interpretation (as in Rorschach
blots), then withholding a specified meaning is the way to proceed. But if you
want to study other variables, then more fully specifying the task is what is
wanted, and comparing two groups with the same interpretation of the task but
differences in some other variable (e.g. illumination level). In other words,
the solution is a better controlled, not an uncontrolled, study; which may be
either in the lab or in the field.

In the medical field, a strong adherence to the method of double and triple
blinding in trials, at least of drugs, has developed. We could also use this
as one practical, applied, behaviouristic way of classifying effects in this
area.

Single blind: concealing from the patient (subject, participant) which
"treatment" they are getting, and hence what result to expect.
Some placebo effects are "pure" in that they demonstrate that the patient's
belief independently of the doctor's can be enough to induce effects.

Double blind: also concealing from the researcher (the person administering
the treatments and measurements). Many placebo effects and the Pygmalion
effect demonstrate how researchers' expectations can have a large effect,
often because participants perceive these.
It is possible too that expectancies may cause a teacher or doctor actually
to treat the pupil or patient differently in a way that benefits or
disadvantages them; i.e. to have a direct effect independent of the primary
participants' expectancies.

Triple blind: also concealing treatment and hence expectations from
the persons making measurement judgements e.g. a lab technician classifying
cells as cancerous, a doctor assessing a patient's degree of mobility or pain
as an output (dependent) variable.

The triple blind is in fact a precaution against human psychological effects
working adversely (from the viewpoint of the validity of the experiment) not on
the participants, but on the researchers. It thus applies to "hard" science as
well: or undermines it when ignored.

Mediocre scientists are blind to this danger; the best are not.
"The first principle is that you must not fool yourself — and you are the
easiest person to fool. So you have to be very careful about that. After
you've not fooled yourself, it's easy not to fool other scientists. You just
have to be honest in a conventional way after that." Feynman (1985)

For a long period, astronomers mistakenly thought they could see canals
on the planet Mars (wikiP). Eventually better telescopes made it clear to all
that this was an error; until then, disagreements between observers were not
enough (as perhaps they should have been) to discredit this belief.

Clever Hans: around 1900 a horse, "Clever Hans", became famous for being
able to answer any question put by humans. The owner (and interpreter) was
entirely sincere, but eventually turned out to have been fooling himself.
Many histories of psychology mention this important case.

The problem is alive and well in current neuroscience:
Kriegeskorte et al. (2009).

Thus from a practical point of view, there are three classes of humans to be
managed in an experimental trial, and whose expectations have each been shown
sometimes to affect its outcome.

(I learned what is expressed in this section from Stephen Senn (2009).)

It is possible for an experiment to test for a placebo effect clearly, by
comparing those who get the placebo and those who get nothing at all. However
in standard medical trials using a placebo, the new treatment is compared to a
placebo. It is common for the placebo group to show an improvement (reduced
illness) compared to the start of the trial.
There are 3 different kinds of reason for this, and such standard experiments
cannot tell which applies:

The patient was ill (that is, showing symptoms, and would continue
indefinitely to be ill if nothing was done); but the placebo treatment made a
difference, and they are now cured for the indefinite future. This is a real
placebo effect.

The patient was ill, but spontaneously recovered (as we usually do from
most illnesses) and would have done so even if they hadn't had the placebo
intervention.

The patients' symptoms fluctuate over time (generally true of some
conditions such as depression, high blood pressure, hay fever). They were
selected because they had the symptoms, but over the time of the trial the
symptoms fluctuate downwards, so that they look cured at the end of the trial.
However in fact no real change has occurred and the symptoms will continue to
fluctuate up and down. However because only those with symptoms are selected,
this fluctuation is not balanced out by the experiment. This misleading shift
is called "regression to the mean".
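Regression to the mean can be illustrated with a toy simulation (all numbers
here are invented purely for illustration): participants are selected because
their fluctuating symptom score happens to be high at entry, and at follow-up
the same group scores lower even though nothing about them has changed.

```python
import random

random.seed(1)

# Hypothetical population: each person has a stable underlying symptom
# level, and each measurement adds independent fluctuation around it.
N = 10000
true_level = [random.gauss(50, 10) for _ in range(N)]

def measure(level):
    # One noisy symptom measurement; nothing about the person changes.
    return level + random.gauss(0, 10)

baseline = [measure(t) for t in true_level]

# Trial entry criterion: only people whose baseline score exceeds a
# severity threshold are recruited.
selected = [(b, t) for b, t in zip(baseline, true_level) if b > 60]

# Follow-up measurement: same people, same underlying levels, fresh noise.
followup = [measure(t) for _, t in selected]

mean_baseline = sum(b for b, _ in selected) / len(selected)
mean_followup = sum(followup) / len(followup)

# The selected group "improves" at follow-up although no real change
# occurred: regression to the mean, not a placebo effect.
print(round(mean_baseline, 1), round(mean_followup, 1))
```

The apparent improvement arises entirely from selecting on a noisy
measurement: people whose fluctuation happened to be upward at baseline tend
to fluctuate back down by follow-up.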

Thus many experiments look as if they show a placebo effect because there is a
group that receives a placebo and which shows significant improvement during
the trial. However in many cases this effect is not due to a placebo, but to
either spontaneous recovery or to regression to the mean of fluctuating
symptoms. (Note that this is a case where the within-subjects comparison is
LESS informative than the between-subjects one.)
The standard double blind, placebo-controlled trial cannot
discriminate between these three cases. Many published papers have asserted
quite erroneous conclusions through not understanding this.

If you want just to find causes and laws, not to achieve any useful practical
effect, then the focus is on isolating causes by controlling experiments and
avoiding things such as the Hawthorne effect. Hence, in medical research,
double blind trials etc.

Note that double blind trials (where neither experimenter nor patient know
which intervention/treatment they are getting during the trial) are quite
practicable for testing pills (where a dummy sugar pill can easily be made
that the patient cannot tell apart from other pills); but not for major
surgery, nor usually for educational interventions that require actions by the
learner: in these cases participants necessarily know which treatment they
have been given.

Double (or triple) blind trials "control for" most of the various effects
above in the sense of making them equal for all groups by removing the ability
of both experimenter and participants to even know which treatment is being
given, much less to believe they know which is the more effective.
They may tend to reduce the placebo effect since the patient knows they have
only a 50% chance that they are getting the active treatment. However they do
NOT remove the Hawthorne effect (only make it equal for all groups in the
trial), since on the contrary the experiment almost certainly makes
participants very aware of receiving special attention. This could mean that
the effect sizes measured in some groups are misleading, and would not be seen
later in normal practice. The trial would be a fair comparison between groups,
but the magnitude of effect measured would not be predictive of the effect
seen in non-experimental conditions, due to a similar "error" (i.e. effect due
to the Hawthorne effect) applying to both groups.

This could, at least in theory, matter. A case in point could be comparing
homeopathic and conventional medicine. Generally a patient will get about 50
minutes of the practitioner's attention in the former case, and 5 minutes in
the latter. It is not hard to imagine that this might have a significant
effect on patient recovery.
A standard double blind experiment (comparing just two treatments) would be
most seriously misleading in a case where both a drug and a "Hawthorne" effect
of attention were of similar magnitude, but not additive (i.e. either one was
effective, but getting both gave no extra benefit). A conventional trial would
then see similar and useful effect magnitudes in both groups, but would not be
able to tell that in fact either giving the drug or giving an hour's attention
to the patient was an effective therapy in its own right, unless there were
also a third "no treatment, no attention" control group. A thorough experiment
would need at least five groups: either 5 or 50 minutes of practitioner
attention, crossed with either conventional or homeopathic substances taken,
plus a group that got no substance and no practitioner attention.
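
The non-additive scenario can be made concrete with a toy simulation (all numbers are invented for illustration; `recovery` is a hypothetical scoring function, not data from any trial):

```python
# Toy model of the non-additive case: EITHER an active substance OR 50
# minutes of practitioner attention gives the full benefit, but getting
# both adds nothing. All numbers are invented for illustration.
def recovery(substance_active, attention_minutes):
    """Hypothetical recovery score (baseline 40, saturating benefit 20)."""
    benefit = 20 if (substance_active or attention_minutes >= 50) else 0
    return 40 + benefit

groups = {
    "active substance, 50 min":   recovery(True, 50),
    "active substance, 5 min":    recovery(True, 5),
    "inactive substance, 50 min": recovery(False, 50),
    "inactive substance, 5 min":  recovery(False, 5),
    "no substance, no attention": recovery(False, 0),
}
for name, score in groups.items():
    print(f"{name:28s} {score}")
# A two-arm blind trial (both arms getting the trial's full attention)
# would show no drug-vs-placebo difference here; only the five-group
# design reveals that drug and attention are alternative effective
# therapies.
```

Only the "inactive substance, 5 min" and "no substance" groups stay at baseline; every other cell shows the same full benefit, which is exactly the pattern a two-arm comparison cannot disentangle.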

Finally, neither medicine nor education habitually employs counter-balanced
experimental designs, where all participants get both treatments: one group
gets A then B, and the other gets B then A. This is partly because of the
possibility of asymmetric transfer effects, i.e. the effect of B (say) is
different depending on whether or not the participant had A first. For
instance, learning French vocabulary first then reading French literature is
not likely to have the same effect as receiving them the other way round.
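
The French example can be sketched with invented numbers, to show why averaging over the two orders misrepresents both of them:

```python
# Toy illustration of asymmetric transfer in a counterbalanced design.
# Numbers are invented: literature study pays off far more once the
# vocabulary is in place.
def gain(treatment, after_vocab):
    """Hypothetical learning gain from one study phase."""
    if treatment == "vocab":
        return 10
    return 12 if after_vocab else 3        # "literature"

order1 = [gain("vocab", False), gain("literature", after_vocab=True)]   # vocab then literature
order2 = [gain("literature", False), gain("vocab", after_vocab=True)]   # literature then vocab

print("vocab->literature gains:", order1)   # [10, 12]
print("literature->vocab gains:", order2)   # [3, 10]
# Averaging literature's gain over both orders gives (12 + 3) / 2 = 7.5,
# a figure true of neither order: the transfer effect is asymmetric.
```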

Shayer thinks there are distinct questions and stages to address in
applied as opposed to "scientific" research — i.e. in research on being able
to generalise the creation of a desired effect:

Study primary effect: Is there an effect (whatever the cause),
what effect, what size of effect?

Replication: can it be done by other enthusiasts (not only by the
original researcher)?

Generalisability: can it be done by non-enthusiasts? That is, can it be
transferred via training to the general population of teachers, without
special enthusiasm or skills? This is actually a test of the training
procedure, not of the effect — but that is a vital part of whether the effect
can be of practical use.

One danger is the Hawthorne effect: you get an effect, but not due to the
theory. The opposite is to get a null effect even though the theory is correct
because transfer/training didn't work. So you need to do projects in several
stages, showing effects at each.

In stage (1) you do an experiment and show there really is an effect, defensible
against all worries. But you still haven't shown what caused it: the factors
described in your theory, or the experimenter; i.e. no defence against
Hawthorne. Use one or two teachers, and control like crazy.
In (2) you show it can be done by others: so at least it is not just a Papert
charisma effect, but it still might be a learner enthusiasm effect (of novelty
or halo).
Use say 12 teachers. In (3) you are testing whether training can be done.

Note that if what you care about is improving learning and the learners'
experience, then you may want to maximise not avoid novelty, halo, and Hawthorne
effects. If you can improve learning by changing things every year, telling
students this is the latest thing, then that is the ethical and practically
effective thing to do.

Rosenthal & Jacobson (1992) have a brief chapter proposing methods to
address these effects, at least for "science" studies of primary effects.

They say firstly we should have Hawthorne controls i.e. 3 groups: control (no
treatment); experimental (the one we are interested in); a Hawthorne control,
which has a change or treatment manifest to participants but not one that
could be effective in the same way as the experimental intervention. [This is
the reply to wanting to do triple blind trials, but not being able to avoid
participants knowing something is being done; AND is a response to measuring
the size of the placebo effect as well as of the experimental effect.]

Secondly, have "Expectancy control designs":
2X2 of control/experimental X with / without secondary participants expecting
a result.
[Hawthorne effect and control groups are about subject expectancies;
expectancy controls are about Pygmalion effect i.e. teachers' expectancies.]

So, combining these, they then suggest a 2 X 3 design of {teacher expects effect
or not} X {control, experimental, Hawthorne ctrl i.e. placebo treatment}.
The point of these is not merely to avoid confounding factors but to measure
their existence and size in the case being studied.
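
As a trivial checklist (labels paraphrased from the text above; this sketch is mine, not Rosenthal & Jacobson's), the six cells of the 2 X 3 design can be enumerated:

```python
# Enumerate the six cells of the 2 x 3 expectancy-control design.
# Labels are paraphrased from the text; the code is only a checklist.
from itertools import product

expectancy = ["teacher led to expect an effect",
              "teacher led to expect no effect"]
condition = ["experimental treatment",
             "control: no treatment",
             "Hawthorne control: irrelevant treatment / placebo"]

cells = list(product(expectancy, condition))
for i, (e, c) in enumerate(cells, 1):
    print(f"group {i}: {e} / {c}")
```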

N.B. A medical trial with drug and placebo groups is most like having
experimental and Hawthorne-control groups but no pure control group. Adding
the latter would require a matched group that was monitored but given no
treatment. However participants are normally told it is a blind trial, rather
than fully expecting both treatment and placebo to be effective, so this is
not an exact parallel.

Adair (1984) suggests that the important (though not the only) aspect of these
effects is how the participants interpret the situation. Interviewing them
(after the "experiment" part) would be the way to investigate this.
This is also essential in "blind" trials to check whether the blinding is
in fact effective. Some trials which are conducted and probably published as
blind are in fact not. If the active treatment has a readily perceptible side
effect on most patients (e.g. hair falls out, urine changes colour, pronounced
dry mouth) both doctors and patients will quickly know who does and does not
have the active drug. Blinding depends on human perception, and so these
perceptions should be measured.

First party (cf. "single blind"): the pupil or patient
Second party (cf. "double blind"): the teacher or doctor or researcher
(Third party (cf. "triple blind"): a rater or lab technician who makes
observations or tests is also blind to the condition s/he is judging)

The resulting six groups, crossing 2nd party expectancy (rows) with 1st
party condition (columns):

                                             Experimental   Control:        Hawthorne control:
                                             group          no treatment    irrelevant treatment / placebo
Teacher (mis)led to expect positive result   group 1        group 2         group 3
Teacher (mis)led to expect no effect         group 4        group 5         group 6

Plus interview both first and second parties on how they see (interpret) the
situation.

My comment
We know that all the above effects can be important and unexpected in size.
So we cannot trust results that don't at least try to control for them. A
double or triple blind procedure allows a 2-group experiment to control for
them. Rosenthal's recommended 6-group approach is three times more costly.
However it doesn't merely control for these effects but measures the size of
all three (placebo, Hawthorne, and the material effect)
separately AND their interactions. If the effects aren't there, that might be
grounds for doing it more simply and cheaply in future. But if they are, then
without the larger design, we cannot know what size of effect to expect in
real life, only that there is an effect that is independent of expectations.
Thus we could see a blind trial as somewhat like Shayer's stage 1
(establishing the existence of an effect), while the larger designs also
address aspects of later practical stages.

Because placebo effects are so large and so prevalent in medicine, blind
trials have become the standard there. Nevertheless they do not give
information about the size of benefit to be expected in real life use. In
fact it may initially be greater than in the trials, because the placebo
effect will be unfettered (everyone will expect it to work after the trials),
but may decline to lower levels later. Another way of looking at it is that
blind trials test the effect of the (say) drug, but resolutely refuse to
investigate the placebo and Hawthorne benefits even though these may possibly
be of similar size and benefit to the patient. Drug companies may reasonably
stick to research that informs their concerns only; but those who claim
either to investigate all causes, or to benefit patients or pupils, have much
less excuse.

Currently we don't understand how any of these effects work. This could
probably be done, but would require some concentrated research e.g. on
uncovering how expectancies are communicated (cf. "clever Hans") unconsciously
or anyway implicitly, and what expectancies are in fact generated.

Ann Brown, a notable researcher in education and psychology, has a section on
the Hawthorne effect as a criticism of studies in her field (Brown, 1992;
p.163ff.). As in this web page, she went back to the original literature to
find a considerable difference between the original work and what is often
said about it now.

Her comments relate to several points:

A general "Hawthorne" effect just predicts "improvements". If your
intervention is meant to improve one specific aspect, and that aspect alone
improves (and you are measuring many aspects), the criticism loses plausibility.

A serious applied researcher is mainly interested in improving benefits:
educators want better learning, doctors want more healthy patients, management
consultants want more productive workers. They don't care much why the
improvement happens (only theorists care). If you have to give someone a
placebo to cure them, then surely you should; and discovering that is as
important as a "scientific" cure.

Brown then argues that in the original Hawthorne studies, there were not
always improvements. However when there were, they occurred only when the
workers perceived that there were changes; and that these were benefits
from their viewpoint; and that the workers were in control of the changes.
In her particular work, these are also central features of what she wanted to
bring about in the classroom (learner "empowerment"); and so in this sense,
the Hawthorne effect is exactly what she aimed for.

This began as my own notes; but over time I have taken stuff from others.
Particularly important contributions from Morag Nimmo, Stephen Senn, and the
various workers on the Wikipedia entry on the Hawthorne effect.

French,J.R.P. (1953) "Experiments in field settings" ch.3 pp.98-135 in
Festinger,L., & Katz,D. Research methods in the behavioral sciences
(New York: Holt, Rinehart & Winston)
Cited, with a long quotation, by Jones (1992, p.452).

Gillespie, Richard, (1991)
Manufacturing knowledge : a history of the Hawthorne experiments
(Cambridge : Cambridge University Press)
[Has an extensive bibliography of primary sources on Hawthorne.]

Jastrow (1900) Fact and fable in psychology
(Boston: Houghton Mifflin) [I haven't seen this book myself.]

Jones, Stephen R. G. (1992)
"Was There a Hawthorne Effect?"
The American Journal of Sociology
vol.98 no.3 (Nov., 1992), pp. 451-468,
from the abstract "the main conclusion is that these data show slender to no
evidence of the Hawthorne Effect"

Orne,M.T. (1973) "Communication by the total experimental situation: Why is it
important, how it is evaluated, and its significance for the ecological
validity of findings"
in P.Pliner, L.Krames & T.Alloway (eds.) Communication and affect
pp.157-191 (New York: Academic Press).

Parsons,H.M. (1974) "What happened at Hawthorne?" Science
vol.183, pp.922-932
[A very detailed description, in a more accessible source, of some of the
experiments; used to argue that the effect was due to feedback-promoted
learning.]

Carroll, Robert Todd (2001?)
The Placebo Effect
Accessed on 2004-05-19. [Part of the Skeptics Dictionary. Useful
categorisation of possible types of mechanism for the placebo effect if it
exists.]

Hrobjartsson, Asbjorn; Gotzsche, Peter C. (2001)
"Is the Placebo Powerless? An Analysis of Clinical Trials Comparing
Placebo with No Treatment"
New England Journal of Medicine
vol.344 no.21 May 2001 pp.1594-1602
[Meta-analysis, destroying most but not all of the belief that there is
evidence for a placebo effect.]

Dodes, John E. (2001?)
The Mysterious Placebo
Accessed on 2001-01-19. Originally published in the January/February 1997
issue of Skeptical Inquirer. A nice overview of the placebo effect and how it
influences the study of alternative medicines.

Woolfson, Jenny (2009) Questioning the Power of the Placebo
Given the Substantial Psychological and Physiological effects Generated by
Placebos, should Pharmacologically Inactive Medicines be considered
Ineffective or Indispensable? PDF copy.