Over at Andrew Gelman’s blog, there’s an interesting discussion going on about a new paper by Buttice and Highton assessing the accuracy of MRP estimates of state-level opinion. (MRP is short for Multilevel Regression with Poststratification, sometimes called “Mister P.”) MRP is used to build state-level estimates of public opinion from national surveys, and these estimates are typically more accurate than the raw means of the state samples. The advantage is largest when, as is usually the case, the number of observations in any one state is small.
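
To make the poststratification half of MRP concrete, here is a minimal Python sketch. It assumes the multilevel model has already produced a predicted level of support for each demographic cell in a state. The cell labels, predictions, and population counts are made-up numbers for illustration, and `poststratify` is a hypothetical helper, not a function from the mrp package.

```python
def poststratify(cell_predictions, cell_counts):
    """Return the population-weighted average of cell-level predictions for one state."""
    total = sum(cell_counts[cell] for cell in cell_predictions)
    if total == 0:
        raise ValueError("state has no population in the listed cells")
    weighted = sum(cell_predictions[cell] * cell_counts[cell]
                   for cell in cell_predictions)
    return weighted / total


# Hypothetical model output: predicted share supporting a policy in each
# (age group, education) cell of a single state.
predictions = {
    ("18-44", "no college"): 0.40,
    ("18-44", "college"): 0.60,
    ("45+", "no college"): 0.30,
    ("45+", "college"): 0.50,
}

# Hypothetical census counts for the same cells in that state.
counts = {
    ("18-44", "no college"): 300_000,
    ("18-44", "college"): 200_000,
    ("45+", "no college"): 350_000,
    ("45+", "college"): 150_000,
}

estimate = poststratify(predictions, counts)
print(round(estimate, 3))
```

The point of the weighting is that the state estimate reflects the state's actual population mix rather than whoever happened to land in the survey sample.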

I highly recommend reading the Buttice and Highton paper if you’re thinking about using MRP in an analysis. My own take is that MRP is a very useful tool, and that the paper is a caution against using it indiscriminately, without carefully assessing whether the necessary conditions are met. As Jeff Lax and Justin Phillips say:

But one cannot blindly run MRP and expect it to work well. Users must take the time to make sure they have a reasonable model for predicting opinion. Indeed, one way to read the BH piece is that if you randomly choose a survey question from those CCES surveys and throw just any state-level predictor at it (or maybe worse, no state-level predictor), the MRP estimates that result will not be as good as those you have seen used in the substantive literature invoking MRP. Indeed, they point out that only one published MRP paper (Pacheco) fails to follow their recommendation to use a state-level predictor.

Jeff and Justin also note the existence of a new software package to perform MRP:

MRP, the package. Use the new MRP package, available using the installation instructions below and to be available more easily soon. For now, use versions of the blme and lme4 packages that predate versions 1.x. Using the devtools package, the following commands will install the latest versions of mrpdata and mrp:

Buttice and Highton, for their part, write:

Suppose our analyses and results are deeply flawed and deserve to be disregarded completely, a supposition that we recognize may reflect the views of some readers. Consider the question that motivated our article: How confident should a researcher who only has a single national survey sample of 1,500 (or even 3,000) respondents be in the MRP estimates of state opinion produced with it? Setting aside our article, the only other published studies that assess MRP performance with samples like these are Lax and Phillips (2009) and Warshaw and Rodden (2012). The former assess MRP performance for two opinions and the latter do it for six. And, Warshaw and Rodden (2012) do find what we would call nontrivial variation in average MRP performance across items (look at the MRP entries for the six opinion items when N=2,500 in their Figure 5; we highlighted the relevant entries.) On the basis of Lax and Phillips (2009) and Warshaw and Rodden (2012), then, we would not draw the inference that MRP will consistently and routinely perform well across different opinions or even for the same opinion at different points in time. And, even when MRP has worked well, we are unsure how the researcher can verify its performance. The investigations of MRP performance that preceded ours – and ours, too – all assess the quality of the estimates by comparing them to “true” values. In the absence of knowing the true values we do not see how the researcher could determine how “good” or “bad” the MRP estimates are, and we would therefore hesitate to use them. That said, developing a validation technique for MRP estimates when “true” values are unknown appears to be an issue that LP are working on, and we look forward to reading the next version of their MPSA paper.
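
The kind of check described here, comparing MRP estimates with known “true” values, can be sketched in a few lines of Python. The numbers are made up, and mean absolute error and Pearson correlation are assumed as the accuracy measures purely for illustration; the papers under discussion may rely on different ones.

```python
from math import sqrt


def mean_absolute_error(estimates, truth):
    """Average absolute gap between estimated and true state opinion."""
    return sum(abs(e - t) for e, t in zip(estimates, truth)) / len(truth)


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical "true" state opinion (e.g. from a very large sample) and
# hypothetical MRP estimates for the same five states.
true_opinion = [0.35, 0.42, 0.50, 0.58, 0.66]
mrp_estimates = [0.38, 0.40, 0.53, 0.55, 0.63]

mae = mean_absolute_error(mrp_estimates, true_opinion)
r = pearson_correlation(mrp_estimates, true_opinion)
print(f"MAE = {mae:.3f}, correlation = {r:.3f}")
```

A check like this only works when the true values are available, which is exactly what an applied researcher with a single national sample usually lacks.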

5 Responses to Being Careful with Multilevel Regression with Poststratification

I’d just highlight another point: when one adjusts for the demonstrated degree of reliability in the estimate of truth, MRP performance is higher than BH report (by a third) and its variation is reduced. All that said, even granting our corrections to BH, there is still variation. BH are right to point out that an applied researcher is still left not knowing how well he or she is doing. As BH point out, this is the very problem at which our LP paper is targeted, showing benchmarks and diagnostics for Mister P... Dr. P, if you will. We may ultimately be unsuccessful in this, as BH say they were, but we do already have some findings that are of use to practical researchers and will present more soon. MRP performance is better and less variable than BH reported in their paper, but they are absolutely right that variation remains and that performance is imperfect. We hope that no one forgets that MRP estimates are still only estimates of true state opinion and that one must do MRP with attention and care (the newly revised MRP package and our forthcoming diagnostics should help with this).