The first thing one notices in looking at his Google Scholar record is that Dr. Guéguen is a remarkably
prolific researcher. He regularly
publishes 10 or more sole-authored empirical articles per year (this total
reached 20 in 2015), many of which include extensive fieldwork and the
collection of data from large numbers (sometimes even thousands) of
participants. Yet, none of the many research assistants and other junior
collaborators who must have been involved in these projects ever seem to be
included as co-authors, or even have their names mentioned; indeed, we have yet
to see an Acknowledgments section in any sole-authored article by Dr.
Guéguen. This seems unusual,
especially given that in some cases the data collection process must have
required the investment of heroic amounts of the confederates’ time.

It seems that some of this research is actually taken quite seriously by some psychologists. For example, it is cited
in recent work by Andrew Elliot and colleagues at the University of
Rochester that claims to show that women wear red clothes as a sexual signal (thus also providing a piece of Dr.
Guéguen’s IV/DV combination bingo card that would otherwise have been missing). The skeptical psychologist Dr. Richard Wiseman
also seems to be something of a fan of Guéguen's work; for example, in his 2009
book 59 Seconds: Think a Little, Change a Lot, Wiseman noted that "Nicolas
Guéguen has spent his career investigating some of the more unusual aspects of
everyday life, and perhaps none is more unusual than his groundbreaking work on
breasts", and he also cited Guéguen several times in his 2013 book The As If Principle.

But our concerns go well beyond the apparent borderline teenage sexism that seems to characterise much of this
research. A far bigger scientific
problem is the combination of extraordinary effect sizes, remarkably high (in some cases, 100%) response rates among participants recruited in the street (cf. this study, where every single one of the 500 young female
participants who were intercepted in the street agreed to reveal their age to the
researchers, and every single one of them turned out to be aged between 18 and
25), other obvious logistical obstacles, and the large number of statistical errors or mathematically impossible
results reported in many of the analyses.

We also have some concerns about the ethicality of some of Dr. Guéguen’s field
experiments. For example, in these two
studies, his male confederates asked other men how likely it was that a female
confederate would have sex with them on a first date, which might be a suitable
topic for bar-room banter among friends but appears to us to be somewhat intrusive. In another study, women participants were secretly filmed from behind, with the resulting footage being shown
to male observers who rated the “sexiness” of the women’s gait (in order to
test the theory that women might walk “more sexily” in front of men when they
are ovulating; again, readers may not be totally surprised to learn that this
is what was found). In this study, the debriefing procedure for the young
female participants involved handing them a card with the principal
investigator’s personal phone number; this procedure was “refined” in another study, where participants who had agreed to
give their phone number to an attractive male confederate were called back,
although it is not entirely clear by whom. (John Sakaluk
has pointed out that there may also be issues around how these women’s
telephone numbers were recorded and stored.)

It is unclear from the studies presented whether any of these protocols
received individual ethical approval, as study-specific details from an IRB are
not offered. Steps to mitigate potential harms/dangers are not mentioned, even
though in several cases data collection could have been problematic, with
confederates dressing deliberately provocatively in bars and so on. Ethical
approval is mentioned only occasionally, usually accompanied by the reference
number “CRPCC-LESTIC EA 1285”. This might look like an IRB approval code of
some kind, but in fact it is just the French national science administration’s
identification code for Dr. Guéguen’s own laboratory.

It is also noteworthy that none of the articles we have read mention any
form of funding. Sometimes, however, the expenses must have been
substantial. In this study (hat tip to Harry Manley
for spotting it), 99 confederates stood outside bars and administered
breathalyser tests to 1,965 customers as they left. Even though the breathalyser device that was used is a basic model that sells for €29.95, it seems that at least 21
of them were required; plus, as the “Accessories” tab of that page shows, the
standard retail price of the sterile mouthpieces (one of which was used per
participant) before they were discontinued was €4.45 per 10, meaning that the
total cash outlay for this study would have been in the region of €1500. One would have thought that a laboratory that could
afford to pay for that out of petty cash for a single study could also pick up
the tab in a nightclub from time to time.
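
As a rough check on that figure, the arithmetic can be laid out explicitly. This is only a sketch: the device count and the two prices are the ones quoted above, while the idea that mouthpieces can only be bought in whole packs of 10 is an assumption made here for illustration.

```python
import math

# Figures quoted in the text (prices in euro cents to avoid float rounding).
DEVICE_PRICE_CENTS = 2995      # EUR 29.95 per breathalyser
DEVICES_NEEDED = 21            # "at least 21 of them were required"
MOUTHPIECE_PACK_CENTS = 445    # EUR 4.45 per pack of 10 mouthpieces
PARTICIPANTS = 1965            # one mouthpiece per participant


def study_cost_cents():
    """Estimated cash outlay for devices plus mouthpieces, in cents."""
    # Assumption: mouthpieces are sold only in whole packs of 10.
    packs = math.ceil(PARTICIPANTS / 10)
    devices_total = DEVICES_NEEDED * DEVICE_PRICE_CENTS
    mouthpieces_total = packs * MOUTHPIECE_PACK_CENTS
    return devices_total + mouthpieces_total


total = study_cost_cents()
print(f"Estimated outlay: EUR {total / 100:.2f}")
```

Under those assumptions the total comes out at a little over €1500, with the mouthpieces costing more than the devices themselves.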

This has been quite the saga

It is almost exactly two years to the day since we started to put
together an extensive analysis (over 15,000 words) focused on 10 sole-authored
articles by Dr. Guéguen, which we then sent to the French Psychological Society
(SFP). The SFP’s research department agreed that we had identified a number of issues that required an answer
and asked Dr. Guéguen for his comments. Neither they nor we have received any
coherent response in the interim, even though it would take just a few minutes
to produce any of the following: (a) the names and contact details of any of
the confederates, (b) the field notes that were made during data collection,
(c) the e-mails that were presumably sent to coordinate the field work, (d)
administrative details such as insurance for the confederates and reimbursement
of expenses, (e) minutes of ethics committee meetings, etc.

At one point Dr. Guéguen claimed that he was too busy looking after a
sick relative to provide a response, circumstances which did not prevent him
from publishing a steady stream of further articles in the meantime. In the autumn of 2016, he sent the SFP a
physical file (about 500 sheets of A4 paper) containing 25 reports of field
experiments that had been conducted by his undergraduates, none of which had
any relevance to the questions that we had asked. In the summer of 2017, Dr. Guéguen
finally provided the SFP with a series of half-hearted responses to our
questions, but these systematically failed to address any of the specific
issues that we had raised. For example,
in answer to our questions about funding, Dr. Guéguen seemed to suggest that
his student confederates either pay all of their out-of-pocket expenses
themselves, or otherwise regularly improvise solutions to avoid incurring those
expenses, such as by having a friend who works at each of the nightclubs that
they visit and who can get them in for free.

We want to offer our thanks here to the officials at the SFP who spent
18 months attempting to get Dr. Guéguen to accept his responsibilities as a
scientist and respond to our requests for information. They have indicated to
us that there is nothing more that they can do in their role as intermediary,
so we have decided to bring these issues to the attention of the broader
scientific community.

Hence, this post should be regarded as a reiteration of our request for
Dr. Guéguen to provide concrete answers to the questions that we have raised.
It should be very easy to provide at least some evidence to back up his
remarkable claims, and to explain how he was able to conduct such a huge volume
of research with no apparent funding, using confederates who worked for hours
or days on end with no reward, and obtain remarkable effect sizes from
generally minor social priming or related interventions, while committing so
many statistical errors and reporting so many improbable results.

Further reading

We have made a copy of the current state of our analysis of 10 articles
by Dr. Guéguen available here, along with his replies (which are
written in French). For completeness,
that folder also includes the original version of our analysis that we sent to
the SFP in late 2015, since that is the version to which Dr. Guéguen eventually
replied. The differences between the
versions are minor, but they include the removal of one or two points where we
no longer believe that our original analysis made a particularly strong case.

Despite its length (around 50 pages), we hope that interested readers
will find our analysis to be a reasonably digestible introduction to the problems
with this research. There are one or two
points that we made back in December 2015 which we might not make today (either
because they are rather minor, or because we have a better understanding of
how to report these issues now that we have more experience with the
application of tools such as GRIM and SPRITE).
Most of the original journal articles are behind paywalls, but none are
so obscure that they cannot be obtained from standard University subscriptions.

08 December 2017

My e-mail address, nick.brown@free.fr, got hacked earlier this week. The hosting company "suspended" the account, which doesn't just mean I can't access it or send mails from it; it also means that if you send me a mail you will get a message that the user doesn't exist.

Their procedure for dealing with this is fairly... amazing. If you read French, it's described here. I had to send them an e-mail explaining what might have been the reason why I got hacked (virus or trojan on my PC, a password that wasn't long enough, reusing the e-mail/password combination as the login details for a site that itself got hacked, etc). Despite their disclaimer ("Il ne s'agit pas ici de distribuer les bons points et les réprimandes"), it seems pretty clear that the point of the exercise is to cause people sufficient annoyance that they take more care in future, a bit like a mildly sadistic schoolteacher forcing a student to write 500 words on "why I will not forget to bring my gym clothes in future". I posted about this in a French online forum and discovered that several other people have been victims of this too. My hosting company is screwing me over 1000 times worse than the hackers.

The account was suspended on Wednesday evening (6 December 2017 around 17:00 UTC, which is now more than 48 hours ago) and I sent the required e-mail straight away, but I haven't heard anything since. The technical support line is always busy, and in any case I don't know if they provide support for e-mail. The address to which I sent my "explanation" was abuse@theirdomain, so it is presumably in the hands of the e-mail server managers.

The problem, of course, is that for many people, losing access to their e-mail has the potential to be economically disastrous. Yes, we can all do things more securely, but the hackers only sent out a few pieces of spam; the real damage is being done by the company trying to teach me a lesson. And I have a certain amount of computer knowledge. How J. Random Customer, who just uses e-mail, is meant to respond to that list of points is beyond me.

I don't know how long this will take to sort out. For all I know it could be forever, since the suspension is presumably triggered by an algorithm and I don't know if anyone is there to read mails sent to abuse@theirdomain. This will be rather boring since I have about 30,000 e-mails in there - pretty much everything I've done for the last five or more years.

As a result of this, I'm starting to move everything over to a new Gmail address, "nicholasjlbrown". This will take a while; I estimate that I have over 150 accounts with various sites out there that use my e-mail address either as the username or the contact address or both. So if I ever lose the password to those I will be stuck; plus, if those sites send me a mail and it bounces, they might have a policy of deactivating the account. So I'm going to have a very boring weekend updating logins (and discovering which sites didn't [yet] bother to implement a mechanism to change your e-mail address; to my surprise, PubPeer is in this category).

If you have been expecting to hear back from me, I might no longer have your address. This applies in particular to people who have written to me in the last few weeks about things that have come up on this blog, so this post is to apologise in advance and invite you to recontact me at my new Gmail address.

There are four studies reported in this article; I want to concentrate on Study 4, although as you will see if you read the whole thing, there are plenty of questions one could ask about the other studies as well.

Brief summary of the study

Participants were male customers in bars. The author's hypothesis was that men would be quicker to approach a woman drinking on her own in a bar if she was wearing shoes with high (versus medium or flat) heels. A female confederate was instructed to sit on her own "at a free table near the bar where single men usually stand" (p. 2231). She was identically dressed in all conditions apart from the size of her heels, and she was told to "cross her legs on one side so that people around could clearly view her shoes" (p. 2231). Meanwhile, two male observers seated nearby timed how long it took before a man approached the female confederate. When this happened, she told the man that her friend was expected to arrive shortly, and one of the observers then "arrived" to meet her, thus ending the interaction with the participant. If no contact was made within 30 minutes, the confederates were instructed to leave the bar.

The results showed that the mean time before a male customer of the bar approached the woman was lower when her heels were higher. This difference was statistically significant for high heels (versus medium or flat heels), but not for medium heels (versus flat heels). Although it was not reported whether contact was made in every case, the degrees of freedom of the reported ANOVA imply that it was, even when the woman was wearing flat shoes.

There are a few readily apparent problems with this study.

1. The research design is inefficient and implausible

This study seems to be a very inefficient way of gathering data. You need three young volunteers (it's not exactly clear why two male confederates were necessary rather than just one) to give up their Wednesday and Saturday evenings for six straight weeks. They have to visit three bars each time, and no mention is made of funding to pay for the drinks that they would presumably need to buy in order to maintain their credibility as ordinary customers. As soon as contact is made between a participant and the female confederate, data collection ends. The three confederates leave the bar and walk to the next, taking care to spend half an hour on the walk so they don't arrive too early for the next session. (Or maybe they drive and spend 26 minutes chatting in the car. Sounds like fun.) And after all this, you get a maximum of three data points in an entire evening.

Even the choice of "time taken before someone approaches the female confederate" as the dependent variable seems strange. Let's imagine for a moment that you are the kind of man who goes to bars in the hope of meeting attractive single women. Today is your lucky day; one such individual has just come into the bar and sat down on her own, close to where you hang out with your fellow bachelor drinkers. She is wearing "a skirt and an off the shoulder tight fitting top" (p. 2231). You have to decide whether or not to approach her (presumably before anyone else does, if I may be allowed to show off my limited knowledge of what one might call "folk evolutionary psychology" for a moment). The apparent claim of the study is that the degree of sexual availability conveyed only by the height of the woman's heels will affect, not whether you ultimately decide to approach her or not, but how long you will hesitate before doing so. I don't find this very convincing.
What else are all of the single males in the bar (the number of whom, incidentally, is not reported anywhere in the article) thinking about during that time? Whether they can get a "better deal" if her identical twin appears at the next table wearing slightly higher heels? See also point 4, below.

2. Repeated use of the same bars

The study took place in each of three different bars on twelve
different nights (Wednesdays and Saturdays). The same female
confederate thus made twelve visits to each bar, in each case sitting on her own at a table "near the bar where single men usually stand" (p. 2231). You might imagine that the staff or
the regular customers of the bar might notice what was going on, as a
different man each evening attempted to make contact with the same female confederate who was always identically dressed (apart from her heels) and sitting in an area of the bar where one might not expect a woman who was waiting for her boyfriend to feel comfortable, only
to be told that her friend would be arriving shortly (which, indeed,
transpired every time). But the article describes no precautions that might have been taken to deal with this
issue, which has obvious implications for the validity of the study. After a few visits, the regular customers might have started taking bets among themselves as to who was going to try his luck this evening (perhaps trying not to giggle as he introduced himself with "Hello, I’ve never seen you here before"), only for the woman's boyfriend to show up immediately afterwards. Even if the staff were aware of the experiment, it would seem to be hard to take into account the possible range of behaviours of young single men in a bar, especially just before midnight on a Saturday evening.

3. The effect size is huge

Remember that the only difference between the conditions was the height of the woman's heels, which, even with her legs crossed as described in the article, were probably not going to be something that many people --- even single men on the lookout for some action --- would necessarily even notice. Yet, Cohen's (1988, pp. 274–277) formula gives an effect size (f) of 0.67 for the numbers in Table 4 of Guéguen's article, which corresponds (for k=3 groups) to 1.64 in the more familiar terms of Cohen's d. Such effect sizes are very rare in psychological studies, and indeed in real life (James will be covering this in his next post). It seems highly implausible that a manipulation of this kind could have such an effect.
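
For readers who want to see where numbers of this size come from, here is a minimal sketch of one route to them. It is not necessarily the calculation used above: it assumes three equal groups of 12 (consistent with the 2 and 33 degrees of freedom), starts from the corrected F of roughly 8.06 to 8.16 discussed under point 5 rather than from the Table 4 means, and converts f to d using Cohen's "minimum variability" pattern (one mean at each extreme, the rest at the midpoint). The function names are made up for this example.

```python
import math


def cohens_f_from_F(F, k, n):
    """Cohen's f from a one-way ANOVA F with k groups of equal size n.

    f = sigma_m / sigma, where sigma_m is the SD of the group means using a
    divisor of k (not k - 1) and sigma is the pooled within-group SD.
    Because F = n * [sum((m_i - m)^2) / (k - 1)] / MS_within, it follows
    that f^2 = F * (k - 1) / (k * n).
    """
    return math.sqrt(F * (k - 1) / (k * n))


def d_from_f_min_variability(f, k):
    """Convert f to d assuming Cohen's minimum-variability pattern.

    With one mean at each end of the range and the others at the midpoint,
    f = d * sqrt(1 / (2k)), so d = f * sqrt(2k).
    """
    return f * math.sqrt(2 * k)


for F in (8.06, 8.16):
    f = cohens_f_from_F(F, k=3, n=12)
    d = d_from_f_min_variability(f, k=3)
    print(f"F = {F}: f = {f:.2f}, d = {d:.2f}")
```

Both ends of the F range give an f that rounds to 0.67, and a d of about 1.64 to 1.65.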

4. The pattern of behaviour by the men is very strange

Despite my advanced age, my personal lifetime experience of hanging around in bars waiting to hit on single women is exactly zero. However, it seems to me that for individuals who list that particular activity as one of their hobbies, time is probably of the essence. If you're going to start talking to a girl who has just sat down and crossed her legs so you can see how high her heels are, you probably want to do it fairly quickly, if only to stake your claim before any of your buddies does.

So what would we expect the distribution of the waiting-time-until-contact to look like? I don't think we can apply something like queuing theory here since the behaviour of the men probably can't be assumed to be random, but I'm guessing it's likely to look like some kind of Poisson or negative binomial distribution, with a lot of guys trying their luck in the first few minutes, resulting in a big right skew.

So I decided to simulate some data. For each condition, I generated 12-item samples from a uniform distribution, with a minimum of 0 minutes and a maximum that I determined with some preliminary testing to be the largest possible time that could give the mean and SD reported in the article, plus or minus 0.05 in each case. I ran this simulation until I had 400 samples for each condition, which required about 250 million iterations per condition. Then I plotted the simulated amounts of time to make contact, to the nearest minute, from those samples:

Given that the high heels were meant to be especially irresistible, you might expect a certain number of contacts to have been made within the first minute in that condition. But you can see from the plot that in the high heels condition (blue bars), no values below 2 minutes were returned by my simulation. In fact, when I forced one of the 12 values in the sample to be 30 seconds, I didn't find a single valid sample in 100 million iterations in the high heels condition. When I set the minimum to 1 minute, I found three valid samples, but they all looked weird: the value of 1 meant that the other values were all very close to 8 minutes (i.e., when the woman was wearing high heels, if one man approached her after a minute, the other 11 would all have had to approach her after 8 minutes, plus or minus a few seconds).

You can also see in the above plot that the aggregate of the simulated values in each condition is nicely normally distributed. The most highly skewed 12-item samples were not in fact very badly skewed at all; for example, here is the most right-skewed sample out of 400 in the high heels condition:

So even here, we can see that these single men are taking a certain amount of time before talking to the woman, even though their tongues are apparently all hanging out at the height of her heels. The limiting factor here is that the standard deviations (4.87, 3.67, and 2.18 minutes, for the flat, medium, and high heels conditions, respectively) are too small, relative to the range of values allowed (0 to 30), to allow any of the 12 responses to be very far from the others (or, if one value is a little bit further away, this requires all of the others to bunch up). As we saw in James's post about dead plants and global warming, the subjects in this study all appear to be intensely moderate in their behaviour; the manipulation (increasing the size of the woman's heels) simply reduces the diversity of that moderation somewhat.
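
The brute-force search described in this section can be sketched roughly as follows. This is not the code that produced the plots; it is a minimal illustration of the same idea (draw uniform samples, keep only those whose mean and SD land within a tolerance of the targets). The function names, the use of the n − 1 standard deviation, and the toy targets in the demo are all assumptions made for the example, and none of the demo numbers come from the article.

```python
import random
import statistics


def matches(sample, target_mean, target_sd, tol=0.05):
    """True if the sample mean and (n - 1) SD are both within tol of the targets."""
    return (abs(statistics.mean(sample) - target_mean) <= tol
            and abs(statistics.stdev(sample) - target_sd) <= tol)


def find_samples(target_mean, target_sd, upper, n=12, wanted=400,
                 max_iter=1_000_000, tol=0.05, seed=None):
    """Draw n-item samples from uniform(0, upper); keep those that match."""
    rng = random.Random(seed)
    kept = []
    for _ in range(max_iter):
        sample = [rng.uniform(0, upper) for _ in range(n)]
        if matches(sample, target_mean, target_sd, tol):
            kept.append(sample)
            if len(kept) == wanted:
                break
    return kept


# Toy run with made-up targets and a loose tolerance so that it finishes fast.
demo = find_samples(target_mean=5.0, target_sd=3.0, upper=10.0,
                    n=4, wanted=3, max_iter=50_000, tol=0.5, seed=0)
print(len(demo), "matching samples found")
```

With 12 items and a tolerance of 0.05, almost every draw is rejected, which is why a search of this kind can need hundreds of millions of iterations to collect a few hundred valid samples.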

5. The reported statistics are incorrect

Readers who are familiar with some recent corrections of work from the Cornell Food and Brand Lab may have been anticipating this problem: the reported F statistic (7.18 with 2 and 33 degrees of freedom) is incorrect. With the given means and SDs, the correct F statistic should be between 8.06 and 8.16, depending on rounding. This does not change the statistical significance of the reported result, but it makes one wonder what numbers were run in order to produce the incorrect F statistic, and where those numbers came from. (Just as an aside, the standard deviations appear to be substantially different between the groups, but no indication is given in the article about whether the standard ANOVA checks for homogeneity of variance were made; however, given the context, perhaps asking for this is like criticising Donald Trump for not having his tie straight.)
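
Recomputing an F statistic from published summary statistics is straightforward when the groups are the same size. The sketch below shows the general calculation; the function name is invented for this example, and the demo uses toy numbers rather than the values in the article's Table 4, which is not reproduced here.

```python
import statistics


def anova_f_from_summary(means, sds, n):
    """One-way ANOVA F from group means and SDs, for k groups of equal size n.

    Returns (F, (df_between, df_within)).
    """
    k = len(means)
    # Between-groups mean square: n * sum((m_i - grand_mean)^2) / (k - 1).
    ms_between = n * statistics.variance(means)
    # Within-groups mean square: with equal n, the mean of the group variances.
    ms_within = sum(sd ** 2 for sd in sds) / k
    return ms_between / ms_within, (k - 1, k * (n - 1))


# Toy check with made-up data: groups [1, 2, 3], [2, 3, 4], [3, 4, 5].
F, df = anova_f_from_summary(means=[2.0, 3.0, 4.0], sds=[1.0, 1.0, 1.0], n=3)
print(F, df)
```

For three groups of 12 the function returns 2 and 33 degrees of freedom, matching the article. With the SDs quoted above (4.87, 3.67, and 2.18), the within-groups mean square is about 13.98; trying the means and SDs at the edges of their rounding intervals is what produces a range of possible F values rather than a single number.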

Conclusion

The report of this study sounds like it is describing a thought experiment for an undergraduate methods class (in a world where nobody is too concerned about crass sexist stereotypes), rather than the results of a field experiment carried out under real-world conditions. The premise is based on a pastiche of evolutionary psychology (skeptics of this subfield can fill in their own joke here), the scenario is a minefield of strange decisions, the effect size is absurdly large, the implied behaviour patterns of the participants are weird, and the statistics haven't been reported correctly. Yet, this article was the subject of uncritical pieces in Huffington Post (under the headline "High Heels Increase A Woman's Attractiveness, And For Once It's Not A Bogus Survey"), the Boston Globe, and Psychology Today (twice). It seems that there is quite a market for sexist junk science out there.