Current Research

Borrowing from automated “text as data” approaches, we show how statistical scaling models can be applied to hand-coded content analysis to improve estimates of political parties’ left-right policy positions. We apply a Bayesian item-response theory (IRT) model to category counts from coded party manifestos, treating the categories as “items” and policy positions as a latent variable. The model also produces direct estimates of how each policy category relates to left-right ideology, without having to decide these relationships in advance based on out-of-sample fitting, political theory, assertion, or guesswork. It not only prevents the misspecification endemic to a fixed-index approach, but also works well even with items that are not specifically designed to measure ideological positioning.

Quantitative methods for scaling latent political traits have much in common with supervised machine learning methods commonly applied to tasks such as email spam detection and product recommender systems. Despite these commonalities, however, the research goals and philosophical underpinnings are quite different: machine learning is usually concerned with predicting a knowable or known class, most often with a practical application in mind. Estimating political traits through text, by contrast, involves measuring latent quantities that are inherently unobservable through direct means, and where human “verification” is unreliable, prohibitively costly, or otherwise unavailable. In this paper we show not only that the Naive Bayes classifier, one of the most widely used machine learning classification methods, can be successfully adapted to measuring latent traits, but also that it is equivalent in general form to Laver, Benoit, and Garry’s “Wordscores” algorithm for measuring policy positions. We revisit several prominent applications of Wordscores reformulated as Naive Bayes, demonstrating the equivalence but also revealing areas where the original Wordscores algorithm can be substantially improved using standard techniques from machine learning. From this we issue some concrete recommendations for future applications of supervised machine learning to scale latent political traits.
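
To make the connection between the two methods concrete, the following is a minimal sketch of the basic Wordscores computation, not the paper's own implementation. It assumes NumPy, small count matrices held in memory, and no smoothing or rescaling of the resulting scores; the function names wordscores_fit and wordscores_predict are hypothetical. The quantity P(r | w) computed in the first function is the posterior that a multinomial Naive Bayes model with uniform class priors would assign to reference text r after observing the single word w, which is one way to see the link described above.

```python
import numpy as np


def wordscores_fit(ref_counts, ref_positions):
    """Score each word from reference texts with known positions.

    ref_counts: (R, V) array of word counts, one row per reference text.
    ref_positions: (R,) array of assumed-known positions for those texts.
    Returns (scores, seen): a (V,) array of word scores and a boolean mask
    marking the words that occur in at least one reference text.
    """
    ref_counts = np.asarray(ref_counts, dtype=float)
    ref_positions = np.asarray(ref_positions, dtype=float)

    # F[r, w]: relative frequency of word w within reference text r, P(w | r).
    F = ref_counts / ref_counts.sum(axis=1, keepdims=True)

    # Words never seen in any reference text cannot be scored.
    col_totals = F.sum(axis=0)
    seen = col_totals > 0

    # P[r, w]: P(r | w) under a uniform prior over reference texts.
    P = np.zeros_like(F)
    P[:, seen] = F[:, seen] / col_totals[seen]

    # Word score: expected reference position given the word.
    scores = ref_positions @ P
    return scores, seen


def wordscores_predict(counts, scores, seen):
    """Score a new text as the frequency-weighted mean of its word scores.

    Unscored words are dropped before frequencies are computed. A text that
    contains no scored words yields NaN in this sketch.
    """
    kept = np.asarray(counts, dtype=float)[seen]
    return float((kept / kept.sum()) @ scores[seen])


if __name__ == "__main__":
    # Toy data (invented for illustration): two reference texts at -1 and +1.
    refs = [[8, 2, 0], [2, 8, 0]]
    scores, seen = wordscores_fit(refs, [-1.0, 1.0])
    print(scores)
    print(wordscores_predict([5, 5, 3], scores, seen))
```

Because every word score is a weighted average of the reference positions, raw text scores from this sketch are pulled toward the middle of the reference scale. The sketch makes no attempt to correct for that.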