20% of all surveys are based on fraudulent data?

How often do people conducting surveys simply fabricate some or all of the data? Several high-profile cases of fraud over the past few years have shone a spotlight on that question, but the full scope of the problem has remained unknown. Yesterday, at a meeting in Washington, D.C., a pair of well-known researchers, Michael Robbins and Noble Kuriakose, presented a statistical test for detecting fabricated data in survey answers. When they applied it to more than 1000 public data sets from international surveys, a worrying picture emerged: About one in five of the surveys failed, indicating a high likelihood of fabricated data.

Their method flags as potentially fraudulent any interviews in which two people have answered a high percentage of the questions in exactly the same way. The problem with this approach? There are a number of perfectly legitimate reasons why two people’s answers to a survey can wind up looking extremely similar.
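The flagging rule described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the researchers' actual test: the function names are hypothetical, and the 0.85 cutoff is an assumed value chosen for the example, since no threshold is given here.

```python
from itertools import combinations

def max_match_share(responses):
    """For each respondent, return the highest share of identical answers
    shared with any other respondent (all rows must have equal length)."""
    best = [0.0] * len(responses)
    for i, j in combinations(range(len(responses)), 2):
        a, b = responses[i], responses[j]
        share = sum(x == y for x, y in zip(a, b)) / len(a)
        best[i] = max(best[i], share)
        best[j] = max(best[j], share)
    return best

def flag_near_duplicates(responses, threshold=0.85):
    """Return indices of interviews whose best match meets the threshold.
    The default threshold is an assumption made for illustration."""
    shares = max_match_share(responses)
    return [i for i, share in enumerate(shares) if share >= threshold]

# Toy data: respondents 0 and 1 agree on 9 of 10 questions.
responses = [
    ["yes", "no", "yes", "yes", "no", "yes", "no", "no", "yes", "yes"],
    ["yes", "no", "yes", "yes", "no", "yes", "no", "no", "yes", "no"],
    ["no", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes"],
]
flagged = flag_near_duplicates(responses)
print(flagged)
```

Note that a rule like this flags both members of a matching pair, whether the similarity comes from fabrication or from coincidence.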

One reason is purely statistical: When you ask a large number of people a small number of questions, and don’t give them many answer choices (for example, a simple “yes” or “no”), it is quite common to find sets of responses that look much the same. Another reason has more to do with the nature of public opinion itself: When it comes to certain topics, certain groups of people tend to think alike. As social scientists would say, some populations are more homogeneous in their views.
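The statistical point can be checked with a small simulation. The sketch below is a rough illustration under stated assumptions: every respondent answers uniformly at random and independently (so there is no fabrication at all), the 0.85 cutoff is assumed, and the function name `share_flagged` is hypothetical.

```python
import random
from itertools import combinations

def share_flagged(n_people, n_questions, n_choices, threshold=0.85, seed=0):
    """Simulate honest, purely random respondents and return the fraction
    who have at least one near-duplicate partner.
    The threshold is an assumed value, chosen for illustration."""
    rng = random.Random(seed)
    answers = [
        [rng.randrange(n_choices) for _ in range(n_questions)]
        for _ in range(n_people)
    ]
    flagged = [False] * n_people
    for i, j in combinations(range(n_people), 2):
        matches = sum(x == y for x, y in zip(answers[i], answers[j]))
        if matches / n_questions >= threshold:
            flagged[i] = flagged[j] = True
    return sum(flagged) / n_people

# Few questions with two answer choices versus many questions with five.
short_binary = share_flagged(n_people=300, n_questions=10, n_choices=2)
long_multi = share_flagged(n_people=300, n_questions=100, n_choices=5)
print(f"10 yes/no questions:          {short_binary:.0%} flagged")
print(f"100 five-choice questions:    {long_multi:.0%} flagged")
```

With ten yes/no questions there are only 1,024 possible answer patterns, so among a few hundred random respondents nearly everyone has a close match by chance alone. With a hundred five-choice questions, chance matches at that level essentially never occur. The simulation does not model the second reason, homogeneous opinions, which would push the share of coincidental matches higher still.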