The various methodological techniques that fall under the umbrella description of qualitative comparative analysis (QCA) are increasingly popular for modeling causal complexity and necessary or sufficient conditions in medium-N settings. Because QCA methods are not designed as statistical techniques, however, there is no way to assess the probability that the patterns they uncover are the result of chance. Moreover, the implications of the multiple hypothesis tests inherent in these techniques for the false positive rate of the results are not widely understood. This article fills both gaps by tailoring a simple permutation test to the needs of QCA users and adjusting the Type I error rate of the test to take into account the multiple hypothesis tests inherent in QCA. An empirical application, a reexamination of a study of protest-movement success in the Arab Spring, highlights the need for such a test by showing that even very strong QCA results may plausibly be the result of chance.
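
A minimal sketch of the kind of permutation test the article proposes, in a crisp-set setting with a sufficiency consistency score; the function names and the Bonferroni-style correction are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np

def consistency(condition, outcome):
    """Sufficiency consistency: share of cases with the condition
    that also show the outcome (crisp-set version)."""
    cases = condition == 1
    return outcome[cases].mean() if cases.any() else 0.0

def permutation_pvalue(condition, outcome, n_perm=10_000, seed=0):
    """Permute outcome labels to obtain the chance distribution of
    the consistency score for this condition."""
    rng = np.random.default_rng(seed)
    observed = consistency(condition, outcome)
    null = np.array([
        consistency(condition, rng.permutation(outcome))
        for _ in range(n_perm)
    ])
    return (null >= observed).mean()

# QCA tests many candidate combinations of conditions. A simple
# Bonferroni-style adjustment compares each p-value to alpha / k,
# where k is the number of combinations examined.
```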

We apply a specialized Bayesian method that helps us deal with the methodological challenge of unobserved heterogeneity among immigrant voters. Our approach is based on generalized linear mixed Dirichlet models (GLMDMs), where random effects are specified semiparametrically using a Dirichlet process mixture prior that has been shown to account for unobserved grouping in the data. Such models are drawn from Bayesian nonparametrics and help overcome objections to handling latent effects with strongly informed prior distributions. Using 2009 German voting data on immigrants, we show that for difficult problems of missing key covariates and unexplained heterogeneity this approach provides (1) overall improved model fit, (2) smaller standard errors on average, and (3) less bias from omitted variables. As a result, the GLMDM changed our substantive understanding of the factors affecting immigrants' turnout and vote choice. Once we account for unobserved heterogeneity among immigrant voters, whether a voter belongs to the first immigrant generation or not is much less important than the extant literature suggests. When looking at vote choice, we also found that an immigrant's degree of structural integration does not affect the vote in favor of the CDU/CSU, a party traditionally associated with restrictive immigration policy.
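
The Dirichlet process mixture prior at the heart of the GLMDM can be sketched via its stick-breaking construction. The snippet below illustrates how such a prior induces latent groups among voters; the normal base measure and the truncation level are simplifying assumptions, not the authors' specification:

```python
import numpy as np

def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking construction of Dirichlet process weights."""
    betas = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * remaining

def draw_random_effects(n_voters, alpha=1.0, n_atoms=50, seed=0):
    """Draw voter-level random effects from a truncated DP mixture of
    normals. Voters sharing an atom form a latent cluster: the
    'unobserved grouping' the abstract refers to."""
    rng = np.random.default_rng(seed)
    weights = stick_breaking_weights(alpha, n_atoms, rng)
    atoms = rng.normal(0.0, 1.0, size=n_atoms)  # illustrative base measure
    assignments = rng.choice(n_atoms, size=n_voters, p=weights / weights.sum())
    return atoms[assignments], assignments
```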

Many areas of political science focus on causal questions. Evidence from statistical analyses is often used to make the case for causal relationships. While statistical analyses can help establish causal relationships, they can also provide strong evidence of causality where none exists. In this essay, I provide an overview of the statistics of causal inference. Instead of focusing on specific statistical methods, such as matching, I focus on the assumptions needed to give statistical estimates a causal interpretation. Such assumptions are often referred to as identification assumptions, and they are critical to any statistical analysis of causal effects. I outline a wide range of identification assumptions and highlight the design-based approach to causal inference. I conclude with an overview of statistical methods that are frequently used for causal inference.
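
A small simulation makes the role of identification assumptions concrete: the same difference-in-means estimator recovers the causal effect under random assignment but not under confounded assignment. All quantities below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential outcomes with a confounder u that raises both treatment
# uptake and the outcome. The true average treatment effect is 2.
u = rng.normal(size=n)
y0 = u + rng.normal(size=n)
y1 = y0 + 2.0

# Random assignment satisfies the identification assumption (treatment
# independent of potential outcomes): difference in means is unbiased.
d_rand = rng.binomial(1, 0.5, size=n)
y_rand = np.where(d_rand == 1, y1, y0)
print(y_rand[d_rand == 1].mean() - y_rand[d_rand == 0].mean())  # ~2.0

# Confounded assignment violates it: the same estimator is biased upward.
d_conf = rng.binomial(1, 1 / (1 + np.exp(-2 * u)))
y_conf = np.where(d_conf == 1, y1, y0)
print(y_conf[d_conf == 1].mean() - y_conf[d_conf == 0].mean())  # > 2.0
```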

Electoral forensics involves examining election results for anomalies to efficiently identify patterns indicative of electoral irregularities. However, there is disagreement about which, if any, forensics tool is most effective at identifying fraud, and there is no method for integrating multiple tools. Moreover, forensic efforts have failed to systematically take advantage of country-specific details that might aid in diagnosing fraud. We deploy a Bayesian additive regression trees (BART) model, a machine-learning technique, on a large cross-national data set to explore the dense network of potential relationships between various forensic indicators of anomalies and electoral fraud risk factors, on the one hand, and the likelihood of fraud, on the other. This approach allows us to arbitrate between the relative importance of different forensic and contextual features for identifying electoral fraud and results in a diagnostic tool that can be relatively easily implemented in cross-national research.
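
BART requires a dedicated posterior sampler, but the workflow described here can be sketched with an off-the-shelf tree ensemble as a stand-in. The feature names below are hypothetical, and BART itself would add full posterior uncertainty to the importances:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Hypothetical data frame: forensic indicators (e.g., a last-digit test
# statistic) and contextual risk factors, with a labeled fraud outcome.
df = pd.DataFrame({
    "last_digit_chi2": rng.random(500),
    "turnout_skew": rng.random(500),
    "election_violence": rng.random(500),
    "fraud": rng.binomial(1, 0.3, 500),
})

X, y = df.drop(columns="fraud"), df["fraud"]
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Relative importance of forensic vs. contextual features: the kind of
# arbitration the abstract describes.
print(dict(zip(X.columns, model.feature_importances_.round(3))))
```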

“Robust standard errors” are used in a vast array of scholarship to correct standard errors for model misspecification. However, when misspecification is bad enough to make classical and robust standard errors diverge, assuming that it is nevertheless not so bad as to bias everything else requires considerable optimism. And even if the optimism is warranted, settling for a misspecified model, with or without robust standard errors, will still bias estimators of all but a few quantities of interest. The resulting cavernous gap between theory and practice suggests that considerable gains in applied statistics may be possible. We seek to help researchers realize these gains via a more productive way to understand and use robust standard errors; a new general and easier-to-use “generalized information matrix test” statistic that can formally assess misspecification (based on differences between robust and classical variance estimates); and practical illustrations via simulations and real examples from published research. How robust standard errors are used needs to change, but instead of jettisoning this popular tool we show how to use it to provide effective clues about model misspecification, likely biases, and a guide to considerably more reliable, and defensible, inferences. Accompanying this article is software that implements the methods we describe.
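
The diagnostic intuition, that divergence between classical and robust variance estimates signals misspecification, is easy to see in a simulated heteroskedastic regression. This is a sketch of the signal the generalized information matrix test formalizes, not the test itself:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1_000
x = rng.normal(size=n)
# Heteroskedastic errors: a simple form of misspecification for OLS.
y = 1.0 + 2.0 * x + rng.normal(scale=np.exp(x / 2), size=n)

X = sm.add_constant(x)
classical = sm.OLS(y, X).fit()             # model-based (classical) SEs
robust = sm.OLS(y, X).fit(cov_type="HC1")  # sandwich (robust) SEs

# A large gap between the two variance estimates is the clue that the
# model, not just the standard errors, needs attention.
print(classical.bse, robust.bse)
```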

Scholars have increasingly turned to fuzzy set Qualitative Comparative Analysis (fsQCA) to conduct small- and medium-N studies, arguing that it combines the most desired elements of variable-oriented and case-oriented research. This article demonstrates, however, that fsQCA is an extraordinarily sensitive method whose results are worryingly susceptible to minor parametric and model specification changes. We make two specific claims. First, the causal conditions identified by fsQCA as being sufficient for an outcome to occur are highly contingent upon the values of several key parameters selected by the user. Second, fsQCA results are subject to marked confirmation bias. Given its tendency toward finding complex connections between variables, the method is highly likely to identify causal combinations containing even randomly generated variables as sufficient for an outcome. To support these arguments, we replicate three articles utilizing fsQCA and conduct sensitivity analyses and Monte Carlo simulations to assess the impact of small changes in parameter values and the method's built-in confirmation bias on the overall conclusions about sufficient conditions.
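
The confirmation-bias mechanism can be reproduced with a short Monte Carlo: purely random fuzzy-set conditions clear a conventional consistency threshold a non-trivial share of the time at small N, and testing many candidate combinations multiplies that rate. The threshold and sample size below are illustrative:

```python
import numpy as np

def sufficiency_consistency(x, y):
    """Fuzzy-set sufficiency consistency: sum(min(x, y)) / sum(x)."""
    return np.minimum(x, y).sum() / x.sum()

rng = np.random.default_rng(0)
n_cases, n_sims = 25, 10_000

# How often does a purely random fuzzy condition clear a common
# consistency threshold (0.80) for a random outcome?
hits = 0
for _ in range(n_sims):
    x = rng.random(n_cases)
    y = rng.random(n_cases)
    if sufficiency_consistency(x, y) >= 0.80:
        hits += 1
print(hits / n_sims)
```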

The accuracy of U.S. Social Security Administration (SSA) demographic and financial forecasts is crucial for the solvency of its Trust Funds, other government programs, industry decision-making, and the evidence base of many scholarly articles. Because SSA makes insufficient replication information public and uses antiquated statistical forecasting methods, no external group has ever been able to produce fully independent forecasts or evaluations of policy proposals to change the system. Yet no systematic evaluation of SSA forecasts has ever been published by SSA or anyone else—until a companion paper to this one. We show that SSA's forecasting errors were approximately unbiased until about 2000 but then began to grow quickly, with increasingly overconfident uncertainty intervals. Moreover, the errors are largely in the same direction, making the Trust Funds look healthier than they are. We extend and then explain these findings with evidence from a large number of interviews with participants at every level of the forecasting and policy processes. We show that SSA's forecasting procedures meet all the conditions that the modern social-psychology and statistical literatures demonstrate make bias likely. When those conditions combined with potent new political forces trying to change Social Security, SSA's actuaries hunkered down, trying hard to insulate their forecasts from strong political pressures. Unfortunately, this led the actuaries to overlook the fact that retirees began living longer lives and drawing benefits longer than predicted. We show that fewer than 10% of their scorings of major policy proposals were statistically distinguishable from random noise, as estimated from their policy forecasting error. We also show that the solution to this problem involves SSA or Congress implementing in government two of the central projects of political science over the last quarter century: (1) transparency in data and methods and (2) replacing large numbers of ad hoc qualitative decisions too complex for unaided humans to make optimally with formal statistical models.

The list experiment, also known as the item count technique, is becoming increasingly popular as a survey methodology for eliciting truthful responses to sensitive questions. Recently, multivariate regression techniques have been developed to predict the unobserved response to sensitive questions using respondent characteristics. Nevertheless, no method exists for using this predicted response as an explanatory variable in another regression model. We address this gap by first improving the performance of a naive two-step estimator. Despite its simplicity, this improved two-step estimator can only be applied to linear models and is statistically inefficient. We therefore develop a maximum likelihood estimator that is fully efficient and applicable to a wide range of models. We use a simulation study to evaluate the empirical performance of the proposed methods. We also apply them to the Mexico 2012 Panel Study and examine whether vote-buying is associated with increased turnout and candidate approval. The proposed methods are implemented in open-source software.
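
For reference, the basic list-experiment prevalence estimator that both the two-step and maximum likelihood approaches build on is a simple difference in means; the data below are simulated, with vote buying standing in as the sensitive item:

```python
import numpy as np

def list_experiment_prevalence(counts, treated):
    """Difference-in-means estimator for a list experiment: mean item
    count under treatment minus under control estimates the prevalence
    of the sensitive behavior."""
    return counts[treated == 1].mean() - counts[treated == 0].mean()

# Illustrative data: the control list has 3 items; the treatment list
# adds the sensitive item. The naive two-step approach would next plug
# respondent-level predicted sensitive responses into a second
# regression, which is where the article's ML estimator improves matters.
rng = np.random.default_rng(0)
n = 2_000
treated = rng.binomial(1, 0.5, n)
sensitive = rng.binomial(1, 0.2, n)       # true 20% prevalence
control_items = rng.binomial(3, 0.5, n)
counts = control_items + treated * sensitive

print(list_experiment_prevalence(counts, treated))  # ~0.20
```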

Digit-based election forensics (DBEF) typically relies on null hypothesis significance testing, with undesirable effects on substantive conclusions. This article proposes an alternative free of this problem. It rests on decomposing the observed numeral distribution into the “no fraud” and “fraud” latent classes, by finding the smallest fraction of numerals that needs to be either removed or reallocated to achieve a perfect fit of the “no fraud” model. The size of this fraction can be interpreted as a measure of fraudulence. Both alternatives are special cases of measures of model fit—the π∗ mixture index of fit and the Δ dissimilarity index, respectively. Furthermore, independently of the latent class framework, the distributional assumptions of DBEF can be relaxed in some contexts. Independently or jointly, the latent class framework and the relaxed distributional assumptions allow us to dissect the observed distributions using models more flexible than those of existing DBEF. Reanalysis of Beber and Scacco's (2012) data shows that the approach can lead to new substantive conclusions.
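
Both indices have simple closed forms when the "no fraud" model is a single fixed distribution, such as uniform last digits in Beber and Scacco's setting. A sketch under that assumption, with a made-up observed distribution:

```python
import numpy as np

def pi_star(p_obs, p_model):
    """Smallest fraction of observations that must be *removed* so the
    remainder fits the model exactly: 1 - min_i(p_obs_i / p_model_i)."""
    return 1.0 - np.min(p_obs / p_model)

def delta(p_obs, p_model):
    """Smallest fraction that must be *reallocated* across cells: the
    dissimilarity index 0.5 * sum |p_obs_i - p_model_i|."""
    return 0.5 * np.abs(p_obs - p_model).sum()

# 'No fraud' benchmark for last digits of vote counts: uniform on 0-9.
p_model = np.full(10, 0.1)
p_obs = np.array([.08, .09, .10, .11, .12, .10, .09, .11, .10, .10])
print(pi_star(p_obs, p_model), delta(p_obs, p_model))
```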

Theories of coalition politics in parliamentary democracies have suggested that government formation and survival are jointly determined outcomes. An important empirical implication of these theories is that the sample of observed governments analyzed in studies of government survival may be nonrandomly selected from the population of potential governments. This can lead to serious inferential problems. Unfortunately, current empirical models of government survival are unable to account for the possible biases arising from nonrandom selection. In this study, we use a copula-based framework to assess, and correct for, the dependence between the processes of government formation and survival. Our results suggest that existing studies of government survival, by ignoring the selection problem, overstate the substantive importance of several covariates commonly included in empirical models.
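
The core idea, linking the formation and survival processes through a dependence parameter, can be sketched with a Gaussian copula; this illustrates the framework in general, not the authors' specific copula or likelihood:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def gaussian_copula_logdensity(u, v, rho):
    """Log-density of a Gaussian copula at marginal CDF values (u, v).
    Coupling the formation and survival marginals this way lets rho
    capture nonrandom selection of observed governments."""
    z = norm.ppf(np.column_stack([u, v]))
    cov = np.array([[1.0, rho], [rho, 1.0]])
    joint = multivariate_normal(mean=[0.0, 0.0], cov=cov).logpdf(z)
    return joint - norm.logpdf(z).sum(axis=1)

# rho = 0 recovers independence: the copula density is 1 everywhere.
print(np.exp(gaussian_copula_logdensity([0.3, 0.7], [0.6, 0.2], 0.0)))
```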

In this article, I use joint scaling methods and similar items from three large-scale surveys to place voters, parties, and politicians from different Latin American countries on a common ideological space. The findings reveal that ideology is a significant determinant of vote choice in Latin America. They also suggest that the success of leftist leaders at the polls reflects the views of the voters sustaining their victories. The location of parties and leaders reveals that three distinctive clusters exist: one located at the left of the political spectrum, another at the center, and a third on the right. The results also indicate that legislators in Brazil, Mexico, and Peru tend to be more “leftist” than their voters. The ideological drift, however, is not significant enough to substantiate the view that a disconnect between voters and politicians lies behind the success of leftist presidents in these countries. These findings highlight the importance of using a common-space scale to compare disparate populations and call into question a number of recent studies by scholars of Latin American politics who fail to adequately address this important issue.

Over the past eight decades, millions of people have been surveyed on their political opinions. Until recently, however, polls rarely included enough questions in a given domain to apply scaling techniques such as IRT models at the individual level, preventing scholars from taking full advantage of historical survey data. To address this problem, we develop a Bayesian group-level IRT approach that models latent traits at the level of demographic and/or geographic groups rather than individuals. We use a hierarchical model to borrow strength cross-sectionally and dynamic linear models to do so across time. The group-level estimates can be weighted to generate estimates for geographic units. This framework opens up vast new areas of research on historical public opinion, especially at the subnational level. We illustrate this potential by estimating the average policy liberalism of citizens in each U.S. state in each year between 1972 and 2012.
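
The final weighting step mentioned here is ordinary poststratification: group-level estimates are averaged using population shares to produce geographic estimates. All numbers below are hypothetical:

```python
import pandas as pd

# Hypothetical group-level latent-trait estimates (from the group IRT
# stage) for demographic groups within two states, plus census shares.
groups = pd.DataFrame({
    "state":     ["CT", "CT", "OK", "OK"],
    "group":     ["white", "nonwhite", "white", "nonwhite"],
    "theta_hat": [0.40, 0.90, -0.60, 0.10],  # estimated mean liberalism
    "pop_share": [0.70, 0.30, 0.75, 0.25],   # share of state population
})

# Poststratification: the population-weighted average of group estimates
# yields the state-level policy liberalism estimate.
state_est = (
    groups.assign(w=lambda d: d.theta_hat * d.pop_share)
          .groupby("state")["w"].sum()
)
print(state_est)
```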

Mass election predictions are increasingly used by election forecasters and public opinion scholars. While they are potentially powerful tools for answering a variety of social science questions, existing measures are limited in that they ask about victors rather than vote shares. We show that asking survey respondents to predict vote shares is a viable and superior alternative to asking them to predict winners. After showing that respondents can make sensible quantitative predictions, we demonstrate how traditional qualitative forecasts lead to mistaken inferences. In particular, qualitative predictions vastly overstate the degree of partisan bias in election forecasts and lead to wrong conclusions regarding how political knowledge exacerbates this bias. We also show how election predictions can aid in the use of elections as natural experiments, using the effect of the 2012 election on partisan economic perceptions as an example. Our results have implications for multiple constituencies, from methodologists and pollsters to political scientists and interdisciplinary scholars of collective intelligence.

Spatial theories of lawmaking predict that legislative productivity is increasing in the number of status quo policies that lie outside the gridlock interval, but because locations of status quo policies are difficult to measure, previous empirical tests of gridlock theories rely on an auxiliary assumption that the distribution of status quo points is fixed and uniform. This assumption is at odds with the theories being tested, as it ignores the history dependence of lawmaking. We provide an alternative method for testing competing theories by estimating structural models that explicitly account for temporal dependence in a theoretically consistent way. Our analysis suggests that legislative productivity depends both on parties and supermajority pivots, and we find patterns of productivity consistent with a weaker, contingent form of party influence than found in previous work. Parties appear to exert agenda power only on highly salient legislation rather than strongly influencing outcomes through voting pressure and party unity.
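
The quantity the spatial theories build on is simple to compute once pivot locations are known; the history-dependence point is that this period's enactments become next period's status quos, so the status quo distribution cannot remain fixed and uniform. The pivot locations and the initial status quo draw below are illustrative:

```python
import numpy as np

def movable_status_quos(status_quos, left_pivot, right_pivot):
    """Count status quo points outside the gridlock interval
    [left_pivot, right_pivot]; spatial theory predicts legislative
    productivity increases in this count."""
    sq = np.asarray(status_quos)
    return int(((sq < left_pivot) | (sq > right_pivot)).sum())

# Illustrative starting distribution of status quos on a unidimensional
# policy space; enacted policies would replace these in the next period.
rng = np.random.default_rng(0)
sq = rng.uniform(-1, 1, 100)
print(movable_status_quos(sq, -0.25, 0.4))
```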

Self-reported measures of media exposure are plagued with error and questions about validity. Since they are essential to studying media effects, a substantial literature has explored the shortcomings of these measures, tested proxies, and proposed refinements. But lacking an objective baseline, such investigations can only make relative comparisons. By focusing specifically on recent Internet activity stored by Web browsers, this article's methodology captures individuals' actual consumption of political media. Using experiments embedded within an online survey, I test three different measures of media exposure and compare them to the actual exposure. I find that open-ended survey prompts reduce overreporting and generate an accurate picture of the overall audience for online news. I also show that they predict news recall at least as well as general knowledge. Together, these results demonstrate that some ways of asking questions about media use are better than others. I conclude with a discussion of survey-based exposure measures for online political information and the applicability of this article's direct method of exposure measurement for future studies.