Clean and Pure Sampling
by Jerry W. Thomas

Sampling, of course, has never been clean and pure. Door-to-door, in-person interviewing was one of the main data collection methods in the United States up until about 1950. It was almost impossible to design and pull a representative sample of households because of variations in household density and in the number of people per household. Telephone interviewing replaced door-to-door interviewing during the 1950s and 1960s, but the phone posed new sampling challenges. Not all households had telephones; some households had multiple phones; some households had unlisted numbers; some demographic groups were more likely to answer their telephones; and the time of day greatly influenced whether the phone was answered and who answered it. Random-digit dialing and CATI-based sampling and callback systems addressed these problems, but they never fully solved all of the sampling issues.

The arrival of online data collection and the rise of online panels in the late 1990s ushered in the “Wild Wild West” of sampling practices. Many of the new sampling-panel companies were technology companies new to the world of marketing research. Often these new companies were good at building online panels, but they didn’t have a clue about how to pull representative samples. Decision Analyst started its first online panels in late 1995 and progressively added new panels thereafter, including international and B2B online panels. Here are the systems and practices that Decision Analyst employs in its pursuit of “clean and pure” online samples:

Recruit online panelists from many different sources using many different incentives and appeals. The goal is to give every type of person an opportunity and a reason to join Decision Analyst’s panels. Of course, all panelists must be double opt-in, and only one panel member is permitted per household.

Verify the geographic location of each participant at the time he or she joins an online panel, as well as at the time of each survey.

Keep the online panels’ encrypted data safe behind a firewall to prevent hackers from tampering with the sampling database. Encrypt all personally identifiable data. Encrypt all survey data that must be transferred over the web.

Use Icion® multivariate sampling software to pull and balance online samples so that the resulting samples mirror the U.S. population in demographics and geography (matching state and county-size population distributions).

Adjust the amount of sample released to compensate for different response rates among different demographic groups and geographic areas.
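
The arithmetic behind this adjustment can be sketched in a few lines of Python. This is only an illustration, not Decision Analyst's actual procedure: the function name, the age groups, the target completes, and the response rates are all hypothetical. The idea is simply that a group with a lower expected response rate needs proportionally more invitations to yield the same number of completed interviews.

```python
import math

def invitations_needed(target_completes, response_rates):
    """Invitations to release per group so expected completes reach the target.

    Both arguments are dicts keyed by group name (hypothetical groups here).
    """
    plan = {}
    for group, target in target_completes.items():
        rate = response_rates[group]
        if not 0 < rate <= 1:
            raise ValueError(f"response rate for {group!r} must be in (0, 1]")
        # Round up: a fractional invitation cannot be sent.
        plan[group] = math.ceil(target / rate)
    return plan

# Hypothetical targets and response rates, for illustration only.
targets = {"18-34": 300, "35-54": 400, "55+": 300}
rates = {"18-34": 0.125, "35-54": 0.25, "55+": 0.5}

plan = invitations_needed(targets, rates)
for group, count in plan.items():
    print(f"{group}: release {count} invitations")
```

In practice the response rates would come from each group's recent history on the panel, and the release would be revisited as completes come in.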

Set stratified quotas to help ensure that the final sample is representative of the target market or target group.

Use whitelisting services and email deliverability services to ensure that all survey email invitations to panelists are actually delivered—and if not delivered, to track exactly what happened.

Use the Sleuth™ system to identify high-risk participants, based on digital fingerprinting, IP vetting, and other technical variables.

Hide cheater questions and traps in the screener and questionnaire to identify any potential cheaters or bots.

Monitor each survey’s completion time so that speedsters can be identified and reviewed (most of these are deleted).
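
One simple way to flag speedsters for review is sketched below in Python. The rule used here (flag anyone who finishes in less than one-third of the median completion time) is an assumption chosen for illustration; the article does not specify a threshold. The respondent IDs and times are likewise made up.

```python
from statistics import median

def flag_speedsters(durations, fraction=1 / 3):
    """Return respondent IDs whose completion time is below fraction * median.

    durations: dict of respondent ID -> completion time in seconds.
    fraction: hypothetical cutoff, as a share of the median time.
    """
    cutoff = median(durations.values()) * fraction
    return sorted(rid for rid, secs in durations.items() if secs < cutoff)

# Hypothetical completion times in seconds.
durations = {"r1": 600, "r2": 540, "r3": 90, "r4": 660, "r5": 720}

flagged = flag_speedsters(durations)
print("Review these respondents:", flagged)
```

The flag is only a prompt for human review, consistent with the practice above of reviewing speedsters before deciding whether to delete them.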

Review all open-ended questions and remove any panelists who fail to give accurate and relevant answers to these questions. It takes a dedicated team of real people to screen and review the open-ends. This is a nearly foolproof method of screening out nonserious, inept, and deceptive respondents.

Review cross-tabulations, and if the composition of the sample is not as designed, use Iterative Proportional Fitting to weight the data.
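
For readers unfamiliar with Iterative Proportional Fitting (also called raking), here is a minimal Python sketch. It is not Decision Analyst's software; the function name, the variables, and the target shares are hypothetical. The method adjusts respondent weights one variable at a time so that the weighted sample matches each target distribution, and repeats the cycle until the adjustments become negligible.

```python
def rake(respondents, targets, max_iter=100, tol=1e-6):
    """Iterative Proportional Fitting over one or more margins.

    respondents: list of dicts, e.g. {"gender": "F", "age": "18-44"}
    targets: {variable: {category: target share}}, shares summing to 1
    Returns one weight per respondent. Assumes every target category
    appears at least once in the sample.
    """
    n = len(respondents)
    weights = [1.0] * n
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for variable, shares in targets.items():
            # Current weighted count in each category of this variable.
            totals = {}
            for w, person in zip(weights, respondents):
                totals[person[variable]] = totals.get(person[variable], 0.0) + w
            # Scale each respondent so category totals hit the targets.
            for i, person in enumerate(respondents):
                category = person[variable]
                factor = shares[category] * n / totals[category]
                weights[i] *= factor
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
        if largest_adjustment < tol:
            break  # all margins already match their targets closely
    return weights

# Hypothetical sample of 10: too many women and too few younger people.
sample = (
    [{"gender": "F", "age": "18-44"}] * 2
    + [{"gender": "F", "age": "45+"}] * 4
    + [{"gender": "M", "age": "18-44"}] * 1
    + [{"gender": "M", "age": "45+"}] * 3
)
targets = {
    "gender": {"F": 0.5, "M": 0.5},
    "age": {"18-44": 0.4, "45+": 0.6},
}
weights = rake(sample, targets)

def weighted_share(variable, category):
    """Share of total weight held by respondents in the given category."""
    hits = sum(w for w, p in zip(weights, sample) if p[variable] == category)
    return hits / sum(weights)

print(f"Weighted share female: {weighted_share('gender', 'F'):.3f}")
print(f"Weighted share 18-44:  {weighted_share('age', '18-44'):.3f}")
```

Weighting of this kind corrects the margins only; it cannot repair a sample in which a needed group is missing altogether, which is why it follows the sampling and quota steps rather than replacing them.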

All of these best practices move us closer and closer to “clean and pure” online samples and better data, but perfection is elusive. Typically, despite all these precautions and filters, we have to delete about 1% of American Consumer Opinion® participants (our consumer panel) from the completed interviews’ datafile because of quality assurance concerns. If outside panel suppliers must be used for a study, the deletion rate is typically 5% to 7%. The search for perfection continues.

About the Author

Jerry W. Thomas (jthomas@decisionanalyst.com) is President/CEO of Decision Analyst. He may be reached at 1-800-262-5974 or 1-817-640-6166.