Digital Lab for the Social Sciences at IQSS Successfully Replicates MTurk Results

February 19, 2015

Ryan Enos has successfully replicated the results of a study performed on Mechanical Turk using the Harvard Digital Lab for the Social Sciences (DLABSS). This suggests that DLABSS could be a suitable replacement for MTurk as a way of gathering data, which is great news for researchers currently using DLABSS or considering using it in the future. Enos, the Lab Director and a faculty member in the Department of Government, explains his findings in the report below:

DLABSS has successfully replicated several studies originally run on Mechanical Turk. By replicating studies (meaning the results obtained in DLABSS and Mechanical Turk were substantively the same), we are demonstrating that DLABSS volunteers can potentially substitute for Mechanical Turk subjects. Mechanical Turk (MTurk) is currently the primary source of online subjects among social science researchers, so this signals significant potential for DLABSS as a tool for researchers. In this post, I describe the details of one of those replications.

In this study, a call for volunteers was posted on DLABSS and a very similar advertisement was posted on MTurk. I paid subjects $1 to complete the study on MTurk. The study took the average subject about 9 minutes to complete. I collected similar demographics on MTurk and DLABSS, so we can compare a volunteer sample with a paid one. These basic demographics are displayed in the table below. The DLABSS sample looks largely similar to the MTurk sample; however, it is more liberal and better educated. Of course, one of the primary advantages of DLABSS is that subjects can be easily targeted, so if a researcher wants, for example, fewer college graduates, such a sample can be easily obtained.

In this particular study, I asked people to judge the appearance of faces. I was interested in whether subjects thought that these faces looked more like the face of an African American person or more like the face of a Caucasian person. I showed them groups of faces for five seconds. One particular face was highlighted and I asked subjects to judge its appearance on a 7-point scale from “Completely African American” (1) to “Completely Caucasian” (7).

The test of interest is that each subject was shown three conditions and, in each condition, asked to judge the same face. In two conditions, the faces on the screen were segregated by race (white faces separated from Black faces). The highlighted face, which the subjects were asked to judge, could be grouped next to white faces or Black faces (see image below). In the third condition, the faces were integrated by race (white faces and Black faces together).

My hypothesis is that subjects will use segregation as a heuristic in judging the faces: when the face is grouped with Black faces, subjects will say it looks more African American; when it is grouped with white faces, subjects will say it looks more Caucasian; and when the faces are integrated, subjects will be more likely to say it is evenly split in appearance.

The figure below shows that this was the result in both Mechanical Turk (the red bars) and DLABSS (the gray bars). The bars represent differences in judgments of the highlighted face between the segregated conditions and the integrated condition. Negative numbers mean the face was judged more African American, and positive numbers mean it was judged more Caucasian. The Black segregated faces were judged to be more African American and the white segregated faces were judged to be more Caucasian than in the baseline integrated condition. A t-test for a difference of means between conditions yields p < .05 in Mechanical Turk and p < .01 in DLABSS.
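For readers who want to see what a difference-of-means test of this kind looks like, here is a minimal sketch in Python. It is not the analysis code from the study. The post does not say which variant of the t-test was used, so the unequal-variance (Welch) form is an assumption, as are the function names `welch_t`, `approx_two_sided_p`, and `simulate_ratings`. The ratings are synthetic and generated only for illustration. The p-value uses a normal approximation, which is reasonable only for large samples; an exact p-value would come from the t distribution.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Return (mean difference, t statistic, Welch-Satterthwaite df)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group.
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    diff = mean(sample_a) - mean(sample_b)
    t_stat = diff / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return diff, t_stat, df


def approx_two_sided_p(t_stat):
    """Two-sided p-value from the normal approximation (large samples only)."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


def simulate_ratings(rng, center, n):
    """Synthetic 7-point ratings: rounded Gaussian noise clipped to 1..7."""
    return [min(7, max(1, round(rng.gauss(center, 1.2)))) for _ in range(n)]


# Synthetic data only: these are NOT the study's ratings.
rng = random.Random(0)
integrated = simulate_ratings(rng, 4.0, 500)
black_segregated = simulate_ratings(rng, 3.7, 500)

# A negative difference means the face was rated closer to
# "Completely African American" (1) than in the integrated baseline.
diff, t_stat, df = welch_t(black_segregated, integrated)
print(f"difference = {diff:.2f}, t = {t_stat:.2f}, df = {df:.0f}, "
      f"approx p = {approx_two_sided_p(t_stat):.4f}")
```

Running the same comparison separately on each platform's data and checking that the sign and significance of the difference agree is the sense in which the two samples are compared here.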

Of course, the average differences between conditions are not the same (-.11 in MTurk and -.24 in DLABSS), but exact replication is rare in social science. The important takeaway, though, is that a researcher using either MTurk or DLABSS would have come to the same conclusion from these data, indicating that DLABSS is a worthwhile replacement for MTurk.