Abstracts

Day 1, March 16th 2017

Andrew Mercer (Pew Research Center, University of Maryland) AMercer@PewResearch.org

Both in practice and in the methodological literature, there exists a widespread expectation
that nonprobability samples should have similar properties to probability-based samples
– that researchers should be able to commission a survey using a standard data
collection procedure, apply a standard set of demographic quotas or weights, and draw
reliable inferences about a wide range of topics. When such samples yield biased
estimates, this is taken as evidence that the output of the nonprobability survey
process insufficiently mimics the process of random selection. Rather than evaluate
nonprobability samples in terms of their resemblance to the probability-based ideal,
this paper argues that nonprobability sampling is better viewed as part of model
construction, where the researcher must identify confounding variables and specify their
distribution explicitly and in advance. This perspective sees the distinction between
probability-based and nonprobability survey inference as analogous to the distinction between
causal inference from randomized experiments and observational studies. We review
how this framework is guiding the Pew Research Center’s ongoing research into
the use of nonprobability methods for public opinion research, revisit past research
in a new light, and present findings from our most recent experiment comparing
alternative statistical estimation procedures across sample providers and survey
topics.

13:45–14:15 A Partially Successful Attempt to Integrate a Web-Recruited Cohort into an Address-Based Sample

A web-and-mail survey was conducted in Oregon on attitudes towards and use of
recently-legalized marijuana. Roughly two-thirds of the respondent sample was selected via a
simple random sample of addresses. Sampled individuals were encouraged to respond by web,
but about half of the respondents returned a mail questionnaire instead. Another third of the
respondent sample was nonprobability, recruited via Facebook and responding by web. Thus,
there were three cohorts: a mail cohort, a mail-to-web cohort, and a recruit cohort.
Preliminary investigations revealed that the recruit cohort did not look like the mail
cohort, but that the recruit cohort might be similar to the mail-to-web cohort. The
paper demonstrates how and why the SUDAAN procedure WTADJX was used to
calibrate the randomly-selected respondents to variable totals from the American
Community Survey while the mail-to-web and recruit cohorts were calibrated to each
other using the ACS variables and political affiliation. WTADJX was used to assess
whether differences between estimates from the mail-to-web and recruit cohorts were
statistically significant. The calibrated weights for these cohorts were then scaled so
that the population they represented was single-counted. Finally, delete-a-group
jackknife weights were developed for estimates computed from the entire respondent
sample.
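
WTADJX itself is a proprietary SUDAAN procedure. As a rough illustration of the underlying idea only, the sketch below rakes respondent weights to marginal control totals, a simpler cousin of the regression-style calibration WTADJX performs; all variable names and totals are hypothetical.

```python
import numpy as np
import pandas as pd

def rake(df, weight_col, margins, n_iter=50, tol=1e-8):
    """Iterative proportional fitting: adjust weights so that weighted
    marginal totals match external control totals (e.g., from the ACS).
    `margins` maps a column name to a {category: population_total} dict."""
    w = df[weight_col].to_numpy(dtype=float)
    for _ in range(n_iter):
        max_change = 0.0
        for col, totals in margins.items():
            for cat, target in totals.items():
                mask = (df[col] == cat).to_numpy()
                current = w[mask].sum()
                if current > 0:
                    factor = target / current
                    w[mask] *= factor
                    max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return w

# Hypothetical respondent file with base weights and two ACS-style margins.
df = pd.DataFrame({
    "age_group": ["18-34", "35-64", "65+", "18-34", "35-64", "65+"],
    "sex":       ["F", "F", "M", "M", "M", "F"],
    "base_wt":   [100.0, 120.0, 90.0, 110.0, 95.0, 105.0],
})
margins = {
    "age_group": {"18-34": 250.0, "35-64": 240.0, "65+": 180.0},
    "sex":       {"F": 330.0, "M": 340.0},
}
df["cal_wt"] = rake(df, "base_wt", margins)
```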

With increasing levels of nonresponse in household surveys, there is renewed interest in
alternatives to the traditional way of conducting such surveys. Rivers (2007) proposed the
sample matching approach and showed that, under certain assumptions, matching from a
sufficiently large and diverse web panel yields results similar to a simple random sample. In
this paper, we test Rivers' sample matching approach using a pseudo-web sample.
We use data from two different household surveys to simulate the sample matching
methodology. The study population consists of the 2011 National Household Survey
(NHS) respondents, while the Canadian Labour Force Survey (LFS) respondents are treated
as a pseudo-web sample. Different matching techniques and variables are tested,
and the robustness of the method is evaluated under various conditions. We also
briefly describe an experiment that uses a real web sample to collect data for sample
matching.
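
A minimal sketch of the matching step, assuming standardized covariates and nearest-neighbour matching with replacement (Rivers' implementation differs in detail); all data and variable choices are simulated.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical covariates (e.g., age, education, income) for a randomly
# drawn target sample and for a large web panel.
target = rng.normal(size=(200, 3))
panel = rng.normal(size=(5000, 3))

# Standardize covariates so no single variable dominates the distance.
scaler = StandardScaler().fit(panel)
nn = NearestNeighbors(n_neighbors=1).fit(scaler.transform(panel))
_, idx = nn.kneighbors(scaler.transform(target))

# The matched panel members stand in for the target sample; matching
# without replacement would additionally remove each matched unit.
matched = panel[idx.ravel()]
```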

15:00–15:30 Expanding the toolbox: inference from non-probability samples using machine learning

Social and economic scientists are currently exploring non-probability samples like big data as
an alternative to traditional survey samples. Big data generally cover an unknown part of the
population of interest. Simply ignoring this potential selection bias is error-prone. The mere
volume of data provides no guarantee for valid inference. Tackling this problem with methods
originally developed for probability sampling is possible but shown here to be limited, since
they often fail to account for the data generating process. We propose a more general
predictive inference framework, including three classes of inference methods: design-based,
model-based and machine learning techniques. The machine learning methods we studied are
k-nearest neighbor, artificial neural networks, regression trees and support vector
machines. In a simulation study, we create selective samples from real-world data
on annual mileages by vehicles, infer a population parameter using these inference
methods, and compare the method performances. Our results show that machine
learning methods can outperform the other methods in removing selection bias.
Describing economies and societies using sensor data, internet data, social media and
voluntary opt-in panels can be cost-effective and timely compared with traditional
sample surveys, but requires inference procedures that account for the data generating
process.
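
A minimal sketch of the predictive-inference idea, not the authors' exact implementation: fit a learner on the selective sample, predict the outcome for every population unit with known auxiliary variables, and average the predictions. The data, the selection mechanism, and the choice of k-nearest neighbours are all illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)

# Hypothetical population: auxiliaries X and an outcome y (annual mileage).
N = 50_000
X = rng.normal(size=(N, 2))
y = 12_000 + 3_000 * X[:, 0] + rng.normal(scale=2_000, size=N)

# Selective (non-probability) sample: inclusion depends on X, so the raw
# sample mean is biased for the population mean.
p = 1 / (1 + np.exp(-2 * X[:, 0]))
sampled = rng.random(N) < 0.05 * p

# Predict y for every population unit and average the predictions.
model = KNeighborsRegressor(n_neighbors=25).fit(X[sampled], y[sampled])
print("naive sample mean:   ", y[sampled].mean())
print("model-based estimate:", model.predict(X).mean())
print("true population mean:", y.mean())
```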

15:30–16:00 Investigation into the use of weighting adjustments for non-probability online panel samples

Dina Neiger, Darren W. Pennay, Andrew C. Ward, Paul J. Lavrakas (ANU Centre for Social Research and Methods, Australian National University; Institute for
Social Science Research, University of Queensland; NORC at the University of Chicago; Office
of Survey Research at Michigan State University) dina.neiger@srcentre.com.au, darren.pennay@srcentre.com.au, andrew.ward@srcentre.com.au, pjlavrakas@centurylink.net

Weighting is used to try to reduce total survey error in probability samples by adjusting for
selection probabilities and aligning the sample with population distributions across key
demographics.

There is no agreement on the efficacy of similar weighting adjustments for correcting the bias
of non-probability samples, given their non-probability selection methods, the enforcement of
quotas, and the proprietary mechanisms used by sample providers to ensure that their sample
resembles the population.

Alternative methods, such as blending and calibration (e.g. DiSogra et al. 2011) and
propensity-based weighting (e.g. Schonlau et al. 2003), have shown benefits, but there is
limited research comparing the impact of different methods on total survey
error.
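
As one concrete illustration of the propensity approach, the sketch below follows the general logic of Schonlau et al. (2003) with simulated data: stack a probability reference sample and the non-probability sample, model membership in the non-probability sample, and derive pseudo-weights from the estimated propensities. The odds-type weight used here is one common variant, not necessarily the one evaluated in this study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Hypothetical covariates for a probability reference sample (equal base
# weights assumed for simplicity) and a non-probability web sample.
ref = rng.normal(size=(1000, 2))
web = rng.normal(loc=0.5, size=(1500, 2))

# Model membership in the web sample within the stacked file.
X = np.vstack([ref, web])
s = np.r_[np.zeros(len(ref)), np.ones(len(web))]
p = LogisticRegression().fit(X, s).predict_proba(X)[len(ref):, 1]

# Odds-type pseudo-weights for the web respondents, normalized to the
# web sample size; variants differ in the exact functional form.
w = (1 - p) / p
w *= len(web) / w.sum()
```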

Our presentation aims to contribute to this topic through a comparative evaluation of
weighting alternatives by using data from the recent Australian Online Panels Benchmarking
study (Pennay et al. 2016). Survey items included in the study were selected to allow
comparison with many demographic, health and wellbeing benchmarks. The availability of
these official benchmarks makes it possible to evaluate a range of methods with respect to
their impact on the total survey error.

The presentation will summarise the results of our evaluation and discuss alternative
methods for weighting adjustments to non-probability samples.

16:00–16:30 A bootstrap method for estimating the sampling variation in point estimates from quota samples

Jouni Kuha, Patrick Sturgis (London School of Economics and Political Science; ESRC National Centre for Research
Methods, School of Social Sciences, University of Southampton) P.Sturgis@soton.ac.uk, j.kuha@lse.ac.uk

Measures of uncertainty in survey estimates which are derived under assumptions of
probability sampling are not directly applicable to quota samples, yet ignoring the sampling
variability in quota sample estimates is also clearly unsatisfactory. We propose a method of
calculating the precision of estimates from quota samples which better reflects their sample
design and conveniently accommodates the features of the estimation applied to the samples.
This is a bootstrap re-sampling method which involves the following steps: (i) draw
independent samples by sampling respondents from the full achieved sample, in a way which
mimics the quota sampling design; (ii) for each sample thus drawn, calculate the point
estimates of interest in the same way as for the original sample; and (iii) use the distribution
of the estimates from the samples to quantify the uncertainty in the survey estimates. We
illustrate the method and assess its performance relative to existing approaches by
application to opinion poll estimates of vote shares prior to the 2015 UK General
Election.
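
A minimal sketch of steps (i)-(iii), assuming quota cells defined by a single variable; the data, quota targets, and outcome are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Hypothetical achieved quota sample: one quota variable and a binary
# outcome (e.g., intention to vote for a given party).
df = pd.DataFrame({
    "cell": rng.choice(["18-34", "35-54", "55+"], size=1000, p=[0.3, 0.4, 0.3]),
    "vote": rng.random(1000) < 0.35,
})
quotas = {"18-34": 300, "35-54": 400, "55+": 300}  # quota targets per cell

def one_replicate():
    # (i) Mimic the quota design: resample respondents with replacement
    # within each cell until that cell's quota target is filled.
    parts = [df[df.cell == c].sample(n, replace=True,
                                     random_state=int(rng.integers(1 << 31)))
             for c, n in quotas.items()]
    # (ii) Recompute the point estimate exactly as for the original sample.
    return pd.concat(parts).vote.mean()

# (iii) Use the spread of the replicate estimates to quantify uncertainty.
estimates = np.array([one_replicate() for _ in range(500)])
print("estimate:", df.vote.mean(), " bootstrap SE:", estimates.std(ddof=1))
```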

What do the following ideas and practices have in common: unbiased estimation, statistical
significance, insistence on random sampling, and avoidance of prior information? All have been
embraced as ways of enforcing rigor but all have backfired and led to sloppy analyses and
erroneous inferences. We discuss these problems and some potential solutions in the context of
problems in applied survey research, and we consider ways in which future statistical theory
can be better aligned with practice.

Day 2, March 17th 2017

Annelies G. Blom, Daniela Ackermann-Piek, Susanne Helmschrott, Carina Cornesse, Christian Bruch, Joseph W. Sakshaug (Department of Political Science, School of Social Sciences, University of Mannheim;
Collaborative Research Center 884 ‘Political Economy of Reforms’, University of Mannheim;
GESIS – Leibniz Institute for the Social Sciences; University of Manchester, Manchester, UK)
blom@uni-mannheim.de, daniela.ackermann@uni-mannheim.de, helmschrott@uni-mannheim.de, carina.cornesse@uni-mannheim.de, christian.bruch@uni-mannheim.de, joesaks@umich.edu

Online surveys have become increasingly important in recent years. They promise faster and
cheaper data collection, enable researchers to react to societal events within days,
and, owing to their self-completion format, avoid interviewer effects and can reduce social
desirability bias. However, despite the ubiquity of the internet and email in our daily
lives, we still cannot sample individuals or households directly online, because no frames
of email addresses or internet access points are available. For probability online
surveys, we thus have to sample via initial probability face-to-face or telephone
interviews, which is costly. This lack of available sampling frames paired with the
attractiveness of the online mode has given rise to an industry of nonprobability online
surveys.

This study compares the accuracy of eight nonprobability online samples with that of
two probability online samples, and compares these to two gold-standard probability
face-to-face samples in Germany. All samples were specifically drawn to be representative of
the general population aged 18 to 70 in Germany. We compare aggregate results against
official benchmarks on socio-demographic characteristics and political participation. The
probability samples showed higher accuracy than nonprobability samples. Additional weighting
reduced differences between the samples.

9:30–10:00 Assessing the accuracy of 51 non-probability online panels and river samples: A re-analysis of the Advertising Research Foundation (ARF) online panel comparison experiment.

Survey research is increasingly conducted using online panels and river samples. With a large
number of data suppliers available, data purchasers need to understand the accuracy of the
data being provided and whether probability sampling continues to yield more accurate
measurements of populations. This paper evaluates the accuracy of estimates from a
probability sample and from non-probability survey samples created using different quota
sampling strategies and sample sources (panel versus river samples). Data collection was
organized by the Advertising Research Foundation (ARF) in 2013. We compare estimates
from 45 U.S. non-probability online panel samples, 6 river samples, and one RDD
telephone sample to high-quality benchmarks –
population estimates obtained from large-scale face-to-face surveys of probability
samples with extremely high response rates (e.g., ACS, NHIS, and NHANES). The
non-probability samples were supplied by 17 major U.S. providers. The online samples were
created using three quota methods: (A) age and gender within regions; (B) Method
A plus race/ethnicity; and (C) Method B plus education. Comparisons are made
using unweighted and weighted data, with different weighting strategies of increasing
complexity. Accuracy is evaluated using the absolute average error method. The study
illustrates the need for methodological rigor when evaluating the performance of survey
samples.
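
A minimal sketch of the absolute average error metric as it is commonly computed: the mean, across benchmark items, of the absolute difference between the survey estimate and the benchmark. The numbers below are hypothetical.

```python
import numpy as np

# Hypothetical weighted survey estimates and benchmark values (in
# percentage points) for a handful of items measured in both sources.
survey    = np.array([52.1, 14.8, 23.0, 61.5, 8.9])
benchmark = np.array([50.3, 16.2, 21.4, 63.0, 9.5])

# Average absolute error: mean absolute deviation from the benchmarks.
aae = np.abs(survey - benchmark).mean()
print(f"average absolute error: {aae:.2f} percentage points")
```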

Ever since the 1940s, the guidelines of good survey research have strongly advised applying
random sampling, as this makes it possible to generalize from the sample to the population.
If the principles of probability sampling have been applied, valid estimates of population
characteristics can always be computed. Moreover, the accuracy of the estimates can be
quantified by means of confidence intervals or margins of
error.

Developments in society confront the survey researcher with new challenges. One problem
is increasing nonresponse rates, which affect the validity of surveys. Another is
survey costs: high-quality surveys (for example, CAPI surveys) are very expensive, so
researchers are looking for cheaper alternatives. Also, for some surveys (for example,
CATI surveys) it is hard to find proper sampling
frames.

And then came the internet, which made it possible to conduct online surveys. The
advantages of online data collection (it is fast, simple, and cheap) on the one hand, and
the lack of proper sampling frames on the other, have caused many online surveys to be
self-selection surveys, a form of non-probability sampling. Self-selection surveys have
disadvantages: estimates may be invalid, and it is impossible to compute the accuracy of
the estimates.

The presentation compares surveys based on random sampling with those based on
self-selection. It also attempts to answer the question of whether a probability sample with a
substantial amount of nonresponse is any better than a non-probability sample
based on self-selection. Some examples show the perils of this type of non-probability
sampling.

11:15–11:45 An Empirical Process for Using Non-probability Surveys for Inference

While non-probability sampling (NPS) surveys are widely used in market research, their
adoption for official statistics is much more problematic. The adoption and acceptability of
NPS surveys seem linked to assurances of the quality (or accuracy) of the NPS data. To date,
most of the research involves comparisons to probability survey estimates or uses some form of
modeling derived from a probability survey to produce estimates. There has been little
research on going beyond the comparison stage to the point where an NPS stands alone and
is valid for statistical inference. This paper describes a two-step empirical method that first
compares an NPS survey, or series of surveys, from an online panel to a probability
survey. The second step proposes how, at a later date, the NPS survey can stand
alone for statistical inference. The approach also relies on defining a priori rules
allowing the data user to decide on the level of risk they are willing to accept for a
satisfactory comparison at the first step. We use two different online samples for
a large urban area: a traditional quota sample and a sample based on filling the
most problematic quotas first. In the latter design, no follow-up emails are sent and new invitations
are sent until all the quotas are filled. The key aspects of the methodology include
transparency through an a priori decision rule motivated by the ASPIRE system
developed by Bergdahl et al. (2014). For the first step we propose creating a scoring
index based on 1) overall survey estimates, 2) subgroup estimates and 3) the ratio of
coefficients of variation of the post-stratification weights from the NPS and the
probability survey. A predetermined cutoff value determines the risk of accepting or
rejecting the NPS estimates. Assuming a successful comparison at step 1, we again
define an a priori rule that compares the demographics of the online panels’ target
population to the demographics at a later time for the stand-alone conduct of the
NPS survey using the same methods. We illustrate our step 1 empirical method
by comparing data from the two NPS samples to a probability survey of the same
area.
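
Component (3) of the proposed scoring index compares the variability of the post-stratification weights in the two surveys. A minimal sketch with simulated weights; the cutoff shown is purely illustrative, not the authors' value.

```python
import numpy as np

def cv(weights):
    """Coefficient of variation of a set of weights."""
    w = np.asarray(weights, dtype=float)
    return w.std(ddof=1) / w.mean()

# Hypothetical post-stratification weights from the non-probability
# survey and from the probability reference survey.
w_nps  = np.random.default_rng(4).lognormal(0.0, 0.6, size=800)
w_prob = np.random.default_rng(5).lognormal(0.0, 0.3, size=800)

ratio = cv(w_nps) / cv(w_prob)
CUTOFF = 2.0  # illustrative a priori decision rule
print("CV ratio:", round(ratio, 2),
      "->", "accept" if ratio <= CUTOFF else "reject")
```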

Inbound call survey (ICS) methodology is based on the possibility of intercepting incorrectly-dialed calls
and replacing the curt termination message with an invitation to complete a survey. The
methodology is nonprobabilistic and open to bias but on the other hand it is extremely
inexpensive and quick. The number of such calls that can be intercepted daily in the
USA and Canada runs into the millions. Callers hear an intercept message such as “Please
take our national health survey. Your call couldn’t be completed and was redirected to this
survey”. Multiple modes can be used for inbound call surveys, including IVR (Interactive Voice
Response), a live interviewer, or redirection to a web site to complete a web-based
instrument.

We first outline the methodology and how we weight-adjust the ICS data to known
population totals using calibration. Next, we report on the methodology to compare ICS
results with established national surveys (2015 American Community Survey, 2015 National
Health Interview Survey) and the bias that was found. We quantify bias by treating the
population estimates from the ACS and the NHIS as correct and measuring the deviation
of the unweighted ICS results for demographic characteristics and of the weighted ICS results
for the health outcomes. We also examine bias in a multivariate analysis. We show
how ICS methodology can produce estimates with mean squared error comparable
to an outbound telephone survey. Furthermore, we show how these gains in mean
squared error are achieved at considerably lower cost. We also discuss the ICS as an
efficient means of screening for rare and hard-to-reach populations, as well as a tool for
bio-surveillance.
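
A minimal sketch of how bias and mean squared error can be quantified when a benchmark estimate is treated as the true value; all numbers are hypothetical.

```python
import numpy as np

truth = 31.4  # benchmark (e.g., ACS/NHIS) estimate treated as truth, in %
ics = np.array([33.0, 32.1, 34.2, 31.0, 33.5])  # hypothetical ICS estimates

bias = ics.mean() - truth
variance = ics.var(ddof=1)
mse = bias**2 + variance  # standard decomposition of mean squared error
print(f"bias={bias:.2f}, variance={variance:.2f}, MSE={mse:.2f}")
```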

We show that while the results of the comparisons are promising, more rigorous research is
needed to address potential biases, some of which are related to the timing of the survey and
the way questions are asked. We also discuss issues related to questionnaire design, sensitive
topics, informed consent, and the protection of human subjects.

The Netherlands Institute for Social Research/SCP conducts sociocultural research in the
Netherlands. This type of research, frequently based on surveys, covers a wide range of topics,
such as health, education, sociocultural integration, discrimination, etc., among a
variety of groups living within the Netherlands. Quite often the research targets
difficult-to-survey groups such as the elderly, children, ethnic minorities or sexual
minorities. Among the elderly, health or cognition can pose challenges; surveying
people in institutions is hindered by legal and coverage issues; and surveying ethnic
minorities can entail cultural difficulties. These issues, in turn, can lead to increased
coverage, nonresponse or measurement
error.

A different type of problem the SCP faces concerns so-called ‘hidden’ populations such as
the LGBT community, where the lack of a usable sampling frame can make it nearly impossible
to draw a probability based sample. All of the aforementioned problems can be further
complicated by (internal and external) demands for timely reporting of findings. As the
generalizability of results is very important for the SCP, non-probability based samples have,
in the past decades, not been a typical choice for SCP research. However, in light of the
concerns addressed above, in recent years the SCP has in some cases opted for
non-probability based samples. In these instances, a balance needed to be struck between
addressing the most urgent and relevant research questions and the wish to generalize
research findings. In our presentation we will briefly describe a few examples of studies
that made use of a non-probability based sample and discuss how we dealt with the issue
of generalizability of these results. On the basis of these examples, we formulate several
‘best practices’ for expectation management when presenting survey results from
nonprobability samples.

Information

Venue

Sciences Po, 254 Boulevard Saint-Germain, 75007 Paris, France

Conference room: Salle du Liepp (first floor)

Registration

For registration, please create an account here if you do not already have one:

Once you have created your account, please select the Non Probability
Conference, click through, and indicate whether or not you are coming to the conference
dinner. Finally, click to pay the conference fee.

The registration fee of € 50 is meant to cover the costs of coffee and refreshments and to pay
the staff at the registration desks. Costs for the conference dinner on the evening of March 16th
2017 are not included.

Public Transport

Bus, tram and subway tickets are cheaper if you buy them in books of 10 (“carnet de 10 tickets”, € 16).
There are also day passes (“Paris visite pass”); the one for two days costs € 18.95. See
http://www.ratp.fr/en/ratp/r_61584/tickets/.

Conference dinner

There will be a conference dinner on Thursday evening. Please indicate with your registration
whether you would like to participate in the dinner.