Due to the inherent sensitivity of many survey questions, a number of researchers have adopted an indirect questioning technique known as the list experiment (or the item count technique) in order to minimize bias due to dishonest or evasive responses. However, standard practice with the list experiment requires a large sample size, is not readily adaptable to regression or multivariate modeling, and provides only limited diagnostics. This paper addresses all three of these issues. First, the paper presents design principles for the standard list experiment (and the double list experiment) that minimize bias and reduce variance, and provides sample size formulas for planning studies. Additionally, this paper investigates the properties of a number of estimators and introduces an easy-to-use piecewise estimator that reduces necessary sample sizes in many cases. Second, this paper proves that data from the standard list experiment procedure can be used to estimate the probability that an individual holds the socially undesirable opinion or engages in the sensitive behavior, which allows multivariate modeling. Third, this paper demonstrates that some violations of the behavioral assumptions implicit in the technique can be diagnosed from the list experiment data themselves. The techniques in this paper are illustrated with examples from American politics.
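The basic difference-in-means estimator that the standard list experiment rests on can be sketched in a few lines; the item probabilities, true prevalence, and sample sizes below are invented purely for illustration:

```python
import random

def list_experiment_prevalence(control_counts, treated_counts):
    """Difference-in-means estimator: the treatment group sees J + 1 items
    (the J control items plus the sensitive one), so the gap in mean item
    counts estimates the prevalence of the sensitive trait."""
    mean_t = sum(treated_counts) / len(treated_counts)
    mean_c = sum(control_counts) / len(control_counts)
    return mean_t - mean_c

# Simulated respondents: 3 control items each held with probability 0.5,
# sensitive item held with true prevalence 0.30 (hypothetical numbers).
rng = random.Random(0)

def item_count(treated, prevalence=0.30):
    count = sum(rng.random() < 0.5 for _ in range(3))  # control items
    if treated and rng.random() < prevalence:
        count += 1  # an honest respondent adds the sensitive item
    return count

control = [item_count(False) for _ in range(5000)]
treated = [item_count(True) for _ in range(5000)]
estimate = list_experiment_prevalence(control, treated)
```

With 5,000 respondents per arm the estimate lands near the true 0.30, but the large variance of the item counts is exactly why the abstract emphasizes sample size formulas and variance-reducing designs.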

Many important papers studying cross-national outcomes such as political regime type or economic development exploit treatment variables generated by geological or pre-modern historical processes. A general and major problem with these treatments, however, is their heavy regional concentration. Although variables such as oil wealth or settler mortality are claimed to be exogenous because they are not caused by other variables that independently affect the dependent variable, geological and historical accident has left them highly correlated with potential confounders, which impedes causal inference. With the goal of eliminating this bias by controlling for observables, many papers studying such variables use parametric procedures that include regional dummies. While estimation techniques such as ordinary least squares (OLS) provide a seemingly straightforward methodological fix, OLS also obscures particular shortcomings of the data and imposes strong assumptions in combining information across regions. This paper takes a closer look at those assumptions and provides examples from top political science and economics journals to show how disaggregating the data can either support or severely qualify existing results.
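The pooling assumption at issue can be made concrete with a toy example: including regional dummies (equivalently, demeaning within regions, by the Frisch-Waugh theorem) forces a single common slope even when regions genuinely differ. All numbers below are invented:

```python
def ols_slope(xs, ys):
    # one-variable OLS slope
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

def demean(points):
    # subtract region means from x and y; by Frisch-Waugh this is
    # equivalent to including a region dummy in the pooled regression
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return [(x - mx, y - my) for x, y in points]

# Two hypothetical regions whose true slopes genuinely differ (1.0 vs 3.0):
region_a = [(x, 1.0 * x) for x in range(5)]
region_b = [(x, 3.0 * x + 10.0) for x in range(5)]

slope_a = ols_slope(*zip(*region_a))   # recovers 1.0
slope_b = ols_slope(*zip(*region_b))   # recovers 3.0

pooled = demean(region_a) + demean(region_b)
pooled_slope = ols_slope(*zip(*pooled))  # 2.0: matches neither region
```

The pooled coefficient is a variance-weighted average of the two regional slopes, which is precisely the kind of information loss that disaggregation makes visible.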

We analyze a natural experiment to answer the longstanding question of
whether the name order of candidates on ballots affects election outcomes.
Since 1975, California law has mandated that ballot order be randomized
by lottery, in which letters of the alphabet are drawn from a container
after vigorous shaking. Previous studies, relying overwhelmingly on non-randomized
data, have yielded conflicting results about whether ballot order effects
even exist. Using improved statistical methods, our analysis of statewide
elections from 1978 to 2002 reveals that in general elections ballot order
has a significant impact only on minor party candidates and candidates
for nonpartisan offices. In primaries, however, being listed first benefits
everyone. In fact, ballot order might have changed the winner in roughly
nine percent of all primary races examined. These results are largely
consistent with a theory of partisan cuing. We propose that all
electoral jurisdictions randomize ballot order to minimize ballot effects.
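A randomized-alphabet lottery of the kind the law describes can be sketched as follows; the candidate names are arbitrary, and the rotation of the resulting order across districts that California also applies is omitted:

```python
import random

def lottery_ballot_order(candidates, rng):
    """Sketch of a randomized-alphabet lottery: draw the 26 letters in a
    random order, then sort candidate surnames by that ordering rather
    than by the conventional alphabet."""
    letters = list("abcdefghijklmnopqrstuvwxyz")
    rng.shuffle(letters)
    rank = {ch: i for i, ch in enumerate(letters)}
    return sorted(candidates,
                  key=lambda name: [rank[ch] for ch in name.lower()
                                    if ch in rank])

names = ["Brown", "Davis", "Simon", "Checchi", "Harman"]
order = lottery_ballot_order(names, random.Random(1998))
```

Because every permutation of the alphabet is equally likely, each candidate's ballot position is random, which is what licenses the causal comparisons in the analysis.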

Lewis and King (2000) discuss difficulties in evaluating the
proximity hypothesis about issue voting versus the directional
hypothesis. In this paper, we propose a solution to this problem:
asking individuals to evaluate fictitious candidates experimentally
generated to represent particular issue positions. Such an approach
controls candidates' positions while holding other features constant,
presents these candidates to randomly assigned subjects, and
examines whether the relationship between subjects' evaluations of
these candidates and their ideological beliefs is consistent with
proximity or directional theory. Our results provide slightly more
support for proximity theory, but our data are not entirely conclusive
on this point.
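The competing predictions can be stated as simple utility functions; the positions below are hypothetical, with the neutral point set to 0 as in standard directional-theory formulations:

```python
def proximity_utility(voter, candidate):
    # Proximity theory: evaluations fall with ideological distance.
    return -abs(voter - candidate)

def directional_utility(voter, candidate, neutral=0.0):
    # Directional theory (Rabinowitz-Macdonald): evaluations reward
    # intensity on the voter's own side of a neutral point.
    return (voter - neutral) * (candidate - neutral)

# A mildly right-of-center voter (0.5) rating a moderate candidate (1.0)
# versus a more extreme same-side candidate (3.0):
voter = 0.5
prox_prefers_moderate = (proximity_utility(voter, 1.0) >
                         proximity_utility(voter, 3.0))
dir_prefers_extreme = (directional_utility(voter, 3.0) >
                       directional_utility(voter, 1.0))
```

The two theories give opposite orderings for this voter, which is exactly the divergence the experimentally generated candidates are designed to detect.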

Asymptotic results from theoretical statistics show that the linear
structural relations (LISREL) covariance structure model is robust to
many kinds of departures from multivariate normality in the observed
data. But close examination of the statistical theory suggests that
the kinds of hypotheses about alternative models that are most often
of interest in political science research are not covered by the nice
robustness results. The typical size of political science data
samples also raises questions about the applicability of the
asymptotic normal theory. We present results from a Monte Carlo
sampling experiment and from analysis of two real data sets both to
illustrate the robustness results and to demonstrate why it is unwise
to rely on them in substantive political science research. We propose
new methods using the bootstrap to assess more accurately the
distributions of parameter estimates and test statistics for the
LISREL model. To implement the bootstrap we use optimization software
two of us have developed, incorporating the quasi-Newton BFGS method
in an evolutionary programming algorithm. We describe methods for
drawing inferences about LISREL models that are much more reliable
than the asymptotic normal-theory techniques. The methods we propose
are implemented using the new software we have developed. Our
bootstrap and optimization methods allow model assessment and model
selection to use well-understood statistical principles such as
classical hypothesis testing.
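The basic nonparametric bootstrap idea, applied here to a simple sample mean rather than to LISREL parameters, looks like this (the data are invented):

```python
import random
import statistics

def bootstrap_percentile_ci(data, statistic, reps=2000, alpha=0.05, seed=1):
    """Nonparametric bootstrap percentile interval: resample the data
    with replacement, recompute the statistic each time, and read off
    empirical quantiles instead of relying on asymptotic normal theory."""
    rng = random.Random(seed)
    n = len(data)
    draws = sorted(statistic([rng.choice(data) for _ in range(n)])
                   for _ in range(reps))
    lo = draws[int(reps * alpha / 2)]
    hi = draws[int(reps * (1 - alpha / 2)) - 1]
    return lo, hi

sample = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.5, 1.7, 2.0]
lo, hi = bootstrap_percentile_ci(sample, statistics.mean)
```

For LISREL models the statistic is a fitted parameter or test statistic rather than a mean, so each resample requires a full re-estimation; that is why the authors pair the bootstrap with their own optimization software.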

Standard estimation procedures assume that empirical observations are accurate reflections of the true values of the dependent variable, but this assumption is dubious when modeling self-reported data on sensitive topics. List experiments can nullify incentives for respondents to lie to interviewers, but current data analysis techniques are limited to difference-in-means tests. I present a revised procedure and statistical estimator called listit to model the response process multivariately. Monte Carlo simulations and a field test in Lebanon explore the behavior of this estimator.

If ignored, non-compliance with a treatment and nonresponse on outcome measures can bias estimates of treatment effects in a randomized experiment. To identify treatment effects in the case where compliance and response are conditioned on unobservables, we propose the parametric generalized endogenous treatment (GET) model. As a multilevel random-effects model, GET improves on current approaches to principal stratification by incorporating behavioral responses within an experiment to measure each subject's latent compliance type. We use Monte Carlo methods to show that GET has a lower MSE for treatment effect estimates than existing approaches to principal stratification that impute, rather than measure, compliance type for subjects assigned to the control. In an application, we use data from a recent field experiment to assess whether exposure to a deliberative session with their member of Congress changes constituents' levels of internal and external efficacy. Since it conditions on subjects' latent compliance type, GET is able to test whether exposure to the treatment is ignorable after balancing on covariates via matching methods. We show that internally efficacious subjects disproportionately select into the deliberative sessions, and that matching apparently does not break the latent dependence between treatment compliance and outcome. The results suggest that exposure to the deliberative sessions improves external, but not internal, efficacy.

Methodologists (King et al. 2004; King and Wand 2007) have recently proposed a novel approach to adjusting for bias in interpersonal and cross-cultural comparisons in survey research. The method centers on the use of anchoring vignettes to allow the statistical correction of differential usage of ordinal response scales at the individual or group level. Using data from a randomized survey experiment I investigate whether analyses based on these vignettes may be vulnerable to the introduction of survey artifacts due to vignette ordering or the placement of the self-assessment item relative to the vignettes. I find several patterns of bias due to context effects. Researchers using anchoring vignettes should consider randomization or other methods to mitigate these problems.
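In the simplest case, where a respondent rates the vignettes in the correct order with no ties among them, the nonparametric anchoring-vignette recoding works roughly as follows. This is only a sketch in the spirit of King et al. (2004); the published method also handles ties and order violations with interval-valued codes:

```python
def vignette_relative_position(self_rating, vignette_ratings):
    """Place a respondent's self-assessment relative to their own ratings
    of J vignettes (ordered from lowest to highest true level), yielding
    a value on a common 1..2J+1 scale that is comparable across
    respondents despite differential scale usage."""
    c = 1
    for v in vignette_ratings:
        if self_rating > v:
            c += 2          # strictly above this vignette
        elif self_rating == v:
            c += 1          # tied with this vignette
            break
        else:
            break           # below this vignette
    return c

# With two vignettes rated 2 and 4, self-ratings 1..5 map onto the
# five relative categories:
codes = [vignette_relative_position(s, [2, 4]) for s in range(1, 6)]
```

Because the recoded value depends on the respondent's own vignette ratings, anything that shifts those ratings (such as the context effects studied here) propagates directly into the corrected comparisons.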

One consequence of the proliferation of vote-by-mail (VBM) in certain areas of the United States is the opportunity for voters to cast ballots weeks before Election Day. Understanding the ensuing effects of VBM on late campaign information loss has important implications for both the study of campaign dynamics and public policy debates on the expansion of convenience voting. Unfortunately, the self-selection of voters into VBM makes it difficult to causally identify the effect of VBM on election outcomes. We overcome this identification problem by exploiting a natural experiment, in which some precincts are assigned to be VBM-only based on an arbitrary threshold of the number of registered voters. We assess the effects of VBM on candidate performance in the 2008 California presidential primary via a regression discontinuity design. We show that VBM both increases the probability of selecting candidates who withdrew from the race in the interval after the distribution of ballots but before Election Day and affects the relative performance of candidates remaining in the race. Thus, we find evidence of late campaign information loss, pointing to the influence of campaign events and momentum in American politics, as well as the unintended consequences of convenience voting.
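A minimal version of a regression discontinuity estimator can be sketched as two local OLS fits around the threshold; the threshold value, bandwidth, and data below are invented stand-ins, not the paper's:

```python
def rd_jump(data, cutoff, bandwidth):
    """Sharp regression-discontinuity sketch: fit separate OLS lines
    within a bandwidth on each side of the cutoff and take the gap in
    fitted values at the cutoff. (Real applications choose the bandwidth
    data-dependently and use robust inference.)"""
    def fit_at_cutoff(points):
        n = len(points)
        mx = sum(x for x, _ in points) / n
        my = sum(y for _, y in points) / n
        b = sum((x - mx) * (y - my) for x, y in points) / \
            sum((x - mx) ** 2 for x, _ in points)
        return my + b * (cutoff - mx)
    left = [(x, y) for x, y in data if cutoff - bandwidth <= x < cutoff]
    right = [(x, y) for x, y in data if cutoff <= x <= cutoff + bandwidth]
    return fit_at_cutoff(right) - fit_at_cutoff(left)

# Noiseless hypothetical data: the outcome jumps by 2.0 at a threshold
# of 250 registered voters (an illustrative number only):
xs = [200 + i * 2.5 for i in range(41)]          # 200 .. 300
data = [(x, 0.1 * x + (2.0 if x >= 250 else 0.0)) for x in xs]
jump = rd_jump(data, cutoff=250, bandwidth=25)
```

Because precincts just above and just below the registration threshold are otherwise comparable, the estimated jump at the cutoff isolates the effect of VBM-only assignment.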

Principal-Agent (PA) theory has been used for over three decades to model the relationship between an information-advantaged Agent and a Principal able to issue a contract ultimatum. In its common implementation as a game, the subgame-perfect Nash equilibrium is reasonably simple but generally wrong in predicting experimental or observational data. This paper implements PA theory as two kinds of strategic statistical model, then develops methods for testing competing behavioral hypotheses. I show that subgame-perfect Nash equilibrium, risk aversion/affinity, distributive justice/fairness theories, agent error, and random utility can be observationally distinct and how they might be distinguished statistically.

To measure the levels of support for political actors (e.g., candidates and parties) and the strength of their issue ownership, survey experiments are often conducted in which respondents are asked to express their opinion about a particular policy endorsed by a randomly selected political actor. These responses are contrasted with those from a control group that receives no endorsement. This survey methodology is particularly useful for studying sensitive political attitudes. We develop a Bayesian hierarchical measurement model for such endorsement experiments, demonstrate its statistical properties through simulations, and use it to measure support for Islamist militant groups in Pakistan. Our model uses item response theory to estimate support levels on the same scale as the ideal points of respondents. The model also estimates the strength of political actors' issue ownership for specific policies as well as the relationship between respondents' characteristics and support levels. Our analysis of a recent survey experiment in Pakistan reveals three key patterns. First, citizens' attitudes towards militant groups are geographically clustered. Second, once these regional differences are taken into account, respondents' characteristics have little predictive power for their support levels. Finally, militant groups tend to receive less support in the areas where they operate.

In the experimental study of political affect two assumptions are implicit. First, scholars assume that the affective state intended by the treatment is actually invoked. Second, scholars assume that semantic prompts such as, “Has (Barack Obama/John McCain) -- because of the kind of person he is, or because of something he has done -- ever made you feel: (insert word from feeling scale),” provide an accurate reliability check on the former assumption. However, work in psychology demonstrates that the use of semantic self-reports is unreliable because participants do poorly at accurately reporting experienced emotion (Breckler 1984; Schachter and Singer 1962; Weber et al. 2007). If the presence of the prompt introduces error into the model or participants do not reliably recall their affective state then the use of semantic affective prompts is problematic. I ask:
Q1: Is the semantic affective prompt an effective check on the reliability of an emotional cue?
Additionally, I examine the use of overt anger cues versus subliminal anger cues in eliciting anger. Though most scholars use semantic self-reports as a direct test that the emotion of interest was elicited, others use subliminal primes to elicit emotional states outside of awareness (Bargh 1997). I ask:
Q2: Is there a significant difference between models that invoke emotion overtly versus subliminally?
I utilize a unique research design to tease out the effects of interest. To do so I set up treatment conditions which vary in the way the affective state is invoked (overtly/subliminally) and in the presence or absence of a semantic affective prompt. I find that challenges to the use of the semantic affective prompt are warranted: there is a mean difference between the responses of participants assigned to the semantic affective prompt condition and those of participants assigned to the no affective prompt condition.

Opinion polls taken over the past two years suggest that a majority of Americans prefer granting illegal immigrants amnesty. At the same time, however, restrictive state and local immigration laws which aim to identify, arrest and ultimately deport illegal immigrants also receive majority support. In this paper, I develop a survey experiment which employs Internet Protocol (IP) address geo-location technology in an attempt to understand this public opinion divide and to reassess the relevance of racial and cultural threat theories as explanations for immigration policy preferences. To accomplish this, skin tone and perceived proximity of a fictional illegal Mexican immigrant are manipulated using an embedded image and the location of the respondent, respectively. Within the stimulus, the respondent’s city and state are first determined using their IP address and are subsequently displayed to them. Three major findings emerge: (1) close immigrant proximity causes attitude polarization on immigration policy; (2) close immigrant proximity increases pro-immigration sentiment, but only when the immigrant has a light complexion; (3) immigrant race affects opinions on immigration policy in opposite directions depending upon where the immigrant is located: when no immigrant location information is provided, white respondents express greater pro-immigrant sentiment towards the darker immigrant. When respondents believe that the immigrant resides in their city and state, however, support for restrictive state immigration laws increases when the darker immigrant is shown.

Recent scholarship suggests that ideological polarization in Congress can ultimately be explained by changes in residential choice decisions. According to Bishop (2008), Americans have become more likely to live near others ideologically similar to themselves. These trends, in turn, have contributed to a widening partisan and ideological gap between urban and suburban areas across American cities (Bishop 2008; Hui 2011; Nall 2012). In this paper, I develop the Migration-Flight-Polarization (MFP) theory to explain how changes in diversity brought about by internal migration and immigration hold the key to understanding the connection between residential choice decisions and geographic polarization along partisan and ideological lines. Using an original agent-based modeling simulation and data collected from Houston, Texas during Hurricane Katrina migration, I demonstrate that changes in diversity and "white flight" responses to these changes are responsible for the growing partisan divide in Houston neighborhoods and the City of Houston as a whole.
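A generic one-dimensional Schelling-style flight dynamic illustrates how individual relocation rules of this kind can generate neighborhood sorting; this sketch is not the paper's MFP model, and the types, threshold, and grid are invented:

```python
import random

def unhappy(grid, i, threshold):
    """Agent at cell i is unhappy if the share of like-type agents among
    its occupied neighbors falls below the threshold."""
    me = grid[i]
    nbrs = [grid[j] for j in (i - 1, i + 1)
            if 0 <= j < len(grid) and grid[j] is not None]
    if me is None or not nbrs:
        return False
    return sum(n == me for n in nbrs) / len(nbrs) < threshold

def step(grid, threshold, rng):
    """One round: every unhappy agent relocates to a random empty cell."""
    movers = [i for i in range(len(grid)) if unhappy(grid, i, threshold)]
    empties = [i for i in range(len(grid)) if grid[i] is None]
    for i in movers:
        if not empties:
            break
        j = empties.pop(rng.randrange(len(empties)))
        grid[j], grid[i] = grid[i], None  # move agent from i to j
        empties.append(i)

rng = random.Random(7)
# Two agent types ('R'/'B') plus empty cells, initially well mixed:
grid = [c if c != '.' else None for c in "RB.RB.RB.RB.RB.RB."]
for _ in range(30):
    step(grid, threshold=0.5, rng=rng)
```

Even with a mild tolerance threshold, repeated rounds of this rule tend to produce clustered neighborhoods, the aggregate pattern the MFP theory connects to partisan geographic polarization.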

Past studies have found qualitative and quantitative evidence suggesting that international sports can increase tensions between countries and lead to military conflict. We provide a new test of this theory by taking advantage of the random assignment of countries to compete against each other in the group stage of the World Cup from 1930-2010. We first briefly explain how countries were randomly assigned to World Cup groups over this period. We then use randomization inference to show that the number of military disputes between pairs of countries that were assigned to the same group was much larger than would be expected if World Cup competition had no impact on international conflict. The results hold under many important robustness checks, including a placebo test on the previous outcome. To further illustrate this effect, we use Twitter data to show that competition on the playing field increased expressions of nationalism during the 2014 World Cup.
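The randomization-inference logic can be illustrated with a toy version of the design: under the sharp null the disputes are held fixed, and the group assignment is re-drawn many times to see how often as many disputing pairs would share a group by chance. The countries, disputes, and group sizes below are invented:

```python
import itertools
import random

# Toy setup: 8 countries, 3 disputing pairs, two groups of 4.
disputes = {frozenset("AB"), frozenset("CD"), frozenset("EF")}
countries = list("ABCDEFGH")

def same_group_disputes(groups):
    # number of disputing pairs drawn into the same group
    return sum(frozenset(pair) in disputes
               for g in groups
               for pair in itertools.combinations(g, 2))

def random_draw(rng):
    pool = countries[:]
    rng.shuffle(pool)
    return [pool[:4], pool[4:]]

# Suppose the realized draw put all three disputing pairs together:
observed = same_group_disputes([list("ABCD"), list("EFGH")])

rng = random.Random(0)
reps = 2000
p_value = sum(same_group_disputes(random_draw(rng)) >= observed
              for _ in range(reps)) / reps
```

Here the exact null probability of co-grouping all three pairs is 6/70, roughly 0.086, and the simulated p-value converges to it; the paper applies the same logic to the actual World Cup draw mechanism and dispute data.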