Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of non-experts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended cheaply and transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers’ attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.

For over three decades, party manifestos have formed the largest source of textual data for estimating party policy positions and emphases, resting on two key assumptions: that party policy positions can be measured on known dimensions by counting text units in predefined categories, and that more text in a given category indicates stronger emphasis. Here we revisit the inductive approach to estimating policy positions from party manifesto data, demonstrating that there is no single definition of left-right policy that fits well in all contexts, even though meaningful comparisons can be made by locating parties on a single dimension in each context. To estimate party positions, we apply a Bayesian, multi-level, Poisson-IRT measurement model to category counts from coded party manifestos. By treating the categories as “items” and policy positions as a latent variable, we are able to recover not only left-right estimates but also direct estimates of how each policy category relates to this dimension, without having to decide these relationships in advance based on political theory, exploratory analysis, or guesswork. Finally, the flexibility of our framework permits numerous extensions, designed to incorporate models of manifesto authorship, coding effects, and additional explanatory variables (including time and country effects) to improve estimates.
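
As a rough illustration of what a Poisson-IRT model for category counts can look like, the following is a generic, Wordfish-style form. It is an assumption for exposition only: the abstract does not give the specification, and the symbols, the priors, and the multi-level structure of the actual model are not stated here.

```latex
% Assumed generic form; all symbols are illustrative, not taken from the paper.
% y_{ij}    : number of text units in manifesto i coded into category j
% \theta_i  : latent policy position of manifesto i
% \beta_j   : how strongly (and in which direction) category j relates to the dimension
% \alpha_i  : manifesto-level effect absorbing overall length
% \psi_j    : baseline frequency of category j
y_{ij} \sim \mathrm{Poisson}(\lambda_{ij}), \qquad
\log \lambda_{ij} = \alpha_i + \psi_j + \beta_j \theta_i
```

Under a form like this, the sign and size of each \beta_j are estimated rather than fixed in advance, which is the sense in which the category-to-dimension relationships are recovered from the data.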

Economic crisis and the resulting need for austerity budgets have divided many governing parties in Europe, despite the strict party discipline exercised over the legislative votes to approve these harsh budgets. Our analysis attempts to measure divisions in governing coalitions by applying automated text analysis methods to scale the positions that MPs express in budget debates. Our test case is Ireland, a country that has experienced both periods of rapid economic growth and a deep financial and economic crisis. Our analysis includes all annual budget debates from 1983 to 2013. We demonstrate that government cohesion as expressed through legislative speeches decreased significantly as the economic crisis deepened, the result of government backbenchers speaking against the painful austerity budgets introduced by their own governments. While ministers are bound by the doctrine of collective cabinet responsibility and hence always vote for the finance ministers’ budget proposal, we find that party backbenchers’ position-taking is systematically related to the economic vulnerability of their constituencies and to the safety of their electoral margins.

Quantitative methods for scaling latent political traits have much in common with supervised machine learning methods commonly applied to tasks such as email spam detection and product recommender systems. Despite these commonalities, however, the research goals and philosophical underpinnings are quite different: machine learning is usually concerned with predicting a knowable or known class, most often with a practical application in mind. Estimating political traits through text, by contrast, involves measuring latent quantities that are inherently unobservable through direct means, and where human “verification” is unreliable, prohibitively costly, or otherwise unavailable. In this paper we show not only that the Naive Bayes classifier, one of the most widely used machine learning classification methods, can be successfully adapted to measuring latent traits, but also that it is equivalent in general form to the “Wordscores” algorithm of \cite{lbg:2003} for measuring policy positions. We revisit several prominent applications of Wordscores reformulated as Naive Bayes, demonstrating the equivalence but also revealing areas where the original Wordscores algorithm can be substantially improved using standard techniques from machine learning. From this we issue some concrete recommendations for future applications of supervised machine learning to scale latent political traits.
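
To make the connection concrete, here is a minimal sketch of the basic Wordscores recipe, written so that the word-scoring step is visibly a class posterior P(reference | word) under equal priors, the same quantity a Naive Bayes classifier would compute for a single word. The function names `word_scores` and `score_text` are hypothetical, and the sketch omits smoothing, rescaling, and uncertainty estimates; it is not the reformulation developed in the paper.

```python
from collections import Counter


def word_scores(ref_texts, ref_positions):
    """Score each word from tokenized reference texts with known positions."""
    # Relative frequency of each word within each reference text, P(w | r).
    rel_freqs = []
    for tokens in ref_texts:
        counts = Counter(tokens)
        total = sum(counts.values())
        rel_freqs.append({w: c / total for w, c in counts.items()})

    vocab = set().union(*rel_freqs)
    scores = {}
    for w in vocab:
        f = [rf.get(w, 0.0) for rf in rel_freqs]
        denom = sum(f)
        # P(r | w) with equal priors over reference texts.
        posterior = [x / denom for x in f]
        # Word score: posterior-weighted average of the reference positions.
        scores[w] = sum(p * a for p, a in zip(posterior, ref_positions))
    return scores


def score_text(tokens, scores):
    """Score a new text as the frequency-weighted mean of its word scores."""
    # Words never seen in the reference texts carry no score and are dropped.
    counts = Counter(t for t in tokens if t in scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text shares no words with the reference texts")
    return sum(c / total * scores[w] for w, c in counts.items())
```

For example, with two reference texts placed at -1 and +1, a word that appears only in the first text receives a score of -1, while a word used by both is pulled toward whichever text uses it relatively more often.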

Well-established methods exist for measuring party positions, but reliable means for estimating intra-party preferences remain underdeveloped. Most efforts focus on estimating the ideal points of individual legislators based on inductive scaling of roll call votes. Yet in most parliaments, roll call data suffer from two problems: selection bias due to unrecorded votes, and strong party discipline, which tends to make votes strategic rather than sincere indications of preference. In contrast, legislative speeches are relatively unconstrained, since party leaders are less likely to punish MPs for speaking sincerely as long as they vote with the party line. This conventional wisdom remains essentially untested, despite the growing application of statistical analysis of textual data to measure policy preferences. Our paper addresses this lacuna by exploiting a rich feature of the Swiss legislature: on most bills, legislators both vote and speak many times. Using these data, we compare text-based scaling of ideal points to vote-based scaling from a crucial piece of energy legislation. Our findings confirm that roll call votes underestimate intra-party differences, and vindicate the use of text scaling to measure legislator ideal points. Using regression models, we further explain the difference between roll call and text scalings with energy policy preferences at the constituency level.

Several methods have now become popular in political science for scaling latent traits, usually left-right policy positions, from political texts. Following a great deal of development, application, and replication, we now have a fairly good understanding of the estimates produced by scaling models such as “Wordscores”, “Wordfish”, and other variants (e.g. Monroe and Maeda’s two-dimensional estimates). Less well understood, however, are the appropriate methods for estimating uncertainty around these estimates, which are based on untested assumptions about the stochastic processes that generate text. In this paper we address this gap in our understanding on three fronts. First, we lay out the model assumptions of scaling models and how to generate uncertainty estimates that would be appropriate if all assumptions were correct. Second, we examine a set of real texts to see where and to what extent these assumptions fail. Finally, we introduce a sequence of bootstrap methods to deal with assumption failure and demonstrate their application using a series of simulated and real political texts.
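
As a hedged illustration of the general idea behind bootstrapping text, the sketch below resamples the sentences of one document with replacement and recomputes a text statistic on each resample. The choice of the sentence as the resampling unit, the helper names `bootstrap_replicates` and `share_of`, and the toy statistic are all assumptions for exposition; the abstract does not specify which bootstrap schemes the paper introduces.

```python
import random
import statistics


def bootstrap_replicates(units, statistic, n_boot=200, seed=0):
    """Recompute `statistic` on n_boot resamples of `units`, drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(units)
    replicates = []
    for _ in range(n_boot):
        # Draw n units with replacement (here: sentences of one text).
        sample = [units[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(sample))
    return replicates


def share_of(word):
    """Toy statistic: share of all tokens in the resampled text equal to `word`."""
    def stat(sentences):
        tokens = [t for s in sentences for t in s]
        return tokens.count(word) / len(tokens)
    return stat


# A text represented as a list of tokenized, non-empty sentences.
sentences = [["cut", "tax"], ["raise", "tax"], ["spend", "more"], ["tax", "less"]]
reps = bootstrap_replicates(sentences, share_of("tax"))
std_error = statistics.stdev(reps)  # bootstrap standard error of the statistic
```

In practice the statistic would be a position estimate from a scaling model rather than a word share, and the spread of the replicates would serve as the uncertainty estimate.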