If I told you I’d have to exclude you: Do online research participants have too much experience?

When I was an undergraduate Research Assistant, I worked with two- to six-year-old kids in a developmental psychology lab. Aside from my general incapacity to get small children to cooperate, the data collection process took an excruciatingly long time. It can be hard to find people to participate in studies, especially when many of the people you’re looking for are three years old. Recruiting enough participants for a single study took a team of about six Research Assistants working for an entire semester.

Then there was the actual data collection. This required in-person “interviewing” sessions with the kids, in which we would attempt to stick to a written script so that all of the participants’ sessions would be as similar as possible. We would take this carefully orchestrated and brilliantly crafted scenario, designed to extract just the information we wanted… and then drop a three-year-old into the middle of it. It was always time-consuming, and sometimes painful. Luckily, the research I conduct now with university student participants is not usually this difficult, but it does require wrangling rabid undergraduates away from their cell phones for an hour.

Enter Amazon’s Mechanical Turk (MTurk). It’s one of the most popular platforms for online research in the social sciences—a Google Scholar search for the phrase “Mechanical Turk” yields about 122,000 results. MTurk lets you collect data from a pool of online workers who are paid small amounts of money to fill out surveys and complete other short tasks. As a researcher, I count MTurk among my favorite tools. I can post a study online and collect data from hundreds of participants in days (or even hours) using a completely automated process. MTurk allows many researchers to conduct important work that might be cost-prohibitive using other methods.

MTurk appears to offer a nearly endless supply of ready participants. It boasts a pool of 50,000 workers from around 200 countries.1 Psychologists are somewhat infamous for “convenience sampling,” meaning we tend to study 18-year-old college students, since they’re a captive audience. Thus, we often ask how well our findings generalize to other populations. Compared with student samples, MTurk participants are more representative of the general population with respect to certain demographic characteristics like age, ethnicity, and nationality.2

So, to summarize, there are many benefits to using MTurk. It provides a cost-effective platform that allows researchers to pre-program studies and save weeks or months of long days otherwise spent collecting data in windowless labs, and in the process recruit reasonably diverse participants… consider me sold. But not so fast! Even though there are great advantages to using it, MTurk presents its own unique challenges. An issue that became relevant to me recently is subject pool pollution, the idea that study results might be affected when participants have completed similar studies before.

Because it’s so easy to collect data from lots of participants quickly, MTurk gives the impression of an unlimited supply of fresh workers. But in practice, this is not necessarily how it works. The reachable worker population is actually smaller than it might seem,3 and many workers are completing a high volume of studies. These participants are often referred to as “professional participants” (and for many of them, MTurk can be a full-time job). One study found that in their sample, the average MTurk worker had completed a staggering 1,500 MTurk jobs, of which 300 were academic studies.4 Since so many participants have participated in so many studies, they’ve already seen a lot of the commonly used paradigms. These nonnaïve participants may have been debriefed about the purpose of a similar study in the past, and thus might respond differently.

For example, as an undergraduate, I participated in a study in which I read a sheet of paper purportedly describing the food preferences of other participants. I noticed that one participant strongly disliked spicy foods. I then played an online game of “catch” in which that participant appeared to exclude me by not throwing the ball my way. However, as was explained in the debriefing, this was actually deceptive—the throws were pre-determined by a computer program. Afterwards, I was supposed to determine how much of a (spicy) food the participant should sample. Since I had learned about a similar study in a class, I assumed that this was a measure of my aggression towards the participant and did not give them any of the food to taste. In contrast, the researchers probably assumed many naïve participants would behave differently after feeling excluded in the game of catch.

More recently, I was conducting my own study using a test called the cognitive reflection test (CRT), which consists of a short list of trick questions.5 It helps assess participants’ thinking styles, because some people are more likely to give an intuitive (but wrong) response, while others reflect further and arrive at the correct answer. The problem with the test is that it’s very popular, and the questions are memorable. People are more likely to get the answers right if they’ve seen the questions before. We suspected that subject pool pollution was an issue, so we administered the test on MTurk and then asked our participants whether they’d seen the questions before. In one of our samples, 94% had been previously exposed to the test—a huge number.6
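If you want to run this kind of check in your own data, the arithmetic is simple: compute the self-reported exposure rate, then compare scores between exposed and naïve participants. Here is a minimal sketch in Python; the field names and the sample data are hypothetical, not from the study described above.

```python
# Sketch: estimate prior-exposure rate and compare CRT scores between
# previously exposed and naive participants. All data here is hypothetical.

def exposure_rate(participants):
    """Fraction of participants reporting prior exposure to the test items."""
    return sum(p["seen_before"] for p in participants) / len(participants)

def mean_score(participants, seen_before):
    """Mean CRT score (number correct) within one exposure group."""
    group = [p["crt_score"] for p in participants if p["seen_before"] == seen_before]
    return sum(group) / len(group) if group else float("nan")

# Hypothetical sample: each record is one participant's self-report and score
sample = [
    {"seen_before": True, "crt_score": 3},
    {"seen_before": True, "crt_score": 2},
    {"seen_before": True, "crt_score": 3},
    {"seen_before": False, "crt_score": 1},
]

print(f"Exposure rate: {exposure_rate(sample):.0%}")            # 75%
print(f"Mean score, exposed: {mean_score(sample, True):.2f}")   # 2.67
print(f"Mean score, naive:   {mean_score(sample, False):.2f}")  # 1.00
```

A large gap between the two group means, as in this toy data, is exactly the pattern that would suggest prior exposure is inflating scores.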

This is not unprecedented. An earlier study found that more than half of their MTurk participants had previously seen two other commonly encountered experimental paradigms (the ‘prisoner’s dilemma’ and the ‘ultimatum game’). There are a lot of other reasons for concern, too, including that many savvy online participants “follow” favorite requesters (including research labs), often because they offer competitive pay rates or interesting tasks,7 so they are particularly likely to be exposed to the types of research those labs conduct. And similar labs have been found to attract overlapping participant pools.8

It is not clear yet what this subject pool “pollution” means for researchers. It is possible that some experimental paradigms aren’t effective (or aren’t as effective) if participants have completed similar studies. But it is also possible that for some paradigms, it won’t matter whether participants have seen them before. Currently, there is a small amount of evidence supporting both possibilities,9 but there has not been much systematic research on the topic. Especially for those researchers who rely on commonly used methods, this is an important area for additional investigation. It is also important to realize that “pollution” is not unique to online research methods. Many participant pools draw on students from introductory and more advanced behavioral science courses, who might also be considered nonnaïve in many studies.

Overall, there is some reason for caution in using MTurk. There are ways to screen out participants who have completed studies for you before,10 but these are limited, since you may not know what studies the participants have completed for other labs. Online research is changing the game for a lot of researchers in the social sciences, and part of that change is that it is easier to conduct large pilot studies and hastily designed studies. This provides lots of opportunities to pollute subject pools. It may be useful for researchers who rely heavily on commonly used measures to test them periodically to gauge the extent of the pollution problem, and to understand whether and how prior exposure changes responses. It is also important to recognize that some measures might best be considered an expendable resource, and that it is important to generate new versions of these measures.
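At its simplest, the within-lab screening mentioned above amounts to keeping a running list of worker IDs from your past studies and excluding them from new batches. A minimal sketch, assuming you store past worker IDs one per line in a text file (the file name and IDs below are hypothetical placeholders):

```python
# Sketch: split a new batch of worker IDs into naive workers and workers
# who already completed one of your lab's earlier studies.
# The file name and worker IDs are hypothetical placeholders.

def load_past_workers(path):
    """Read one worker ID per line from a file of past participants."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def screen(new_workers, past_workers):
    """Return (naive, previously_seen) lists, preserving batch order."""
    naive = [w for w in new_workers if w not in past_workers]
    seen = [w for w in new_workers if w in past_workers]
    return naive, seen

past = {"A1X", "B2Y"}  # normally: load_past_workers("past_workers.txt")
naive, seen = screen(["A1X", "C3Z", "D4W"], past)
print(naive)  # ['C3Z', 'D4W']
print(seen)   # ['A1X']
```

Note that this catches only your own lab’s repeat participants, which is precisely the limitation the paragraph above describes: workers polluted by other labs’ studies remain invisible to this kind of filter. (MTurk itself also supports excluding workers via custom qualifications, which accomplishes the same thing on the platform side.)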