Friday, December 26, 2014

"Unskewing" Polls of the 2015 Baseball Hall of Fame

Want to know who'll win an election? Take a poll. That's what dedicated election watchers like Darren Viola and Ryan Thibs do, in effect, for the Baseball Hall of Fame every year around this time. Their methodology is simple: find and record every Hall of Fame ballot released by a BBWAA writer on Twitter or explained by them in a column. The end results are Viola's toplines (which, like a political poll, list each candidate's simple overall percentage) and Thibs's crosstabs (which break down how specific voters voted).

The results of this poll are a great starting point for predicting who will make the Hall every year, but they're not perfect. Like any poll, this one has a margin of error. In politics, you would never go around quoting raw polling data as final; the results must be weighted to correct for an unrepresentative sample, creating a final snapshot that reflects the whole voting pool. It turns out the same adjustments are necessary for polls of the Hall of Fame.

For the past few years, I've calculated these adjustments and used them to project final Hall of Fame vote totals. As it turns out, ballot aggregators consistently over- and underestimate certain candidates (i.e., players) by predictable margins. This makes sense: the polling sample is self-selected, and the kinds of voters who value transparency and choose to release their ballots (thus opening themselves up to all sorts of vitriol on Twitter and in comments sections) are very different from those who clam up. Generally, writers who make their ballots public skew more progressive, overstating support for steroid-tainted candidates (e.g., Barry Bonds and Roger Clemens) as well as those with more subtle, sabermetrics-based cases for induction (e.g., Tim Raines and Mike Mussina). The casters of private ballots, on the other hand, are more likely to be conservative voters: less likely to be on Twitter (a major medium for sharing ballots) and less likely to even still be covering baseball (it's hard to explain your ballot when you no longer have column inches to devote to it). This explains why private ballots will give old-school candidates like Lee Smith and Don Mattingly a significant boost in the final results as compared to the polls.

We know this because we've seen this public/private divergence year in and year out, and we can estimate each player's typical deviation from the polls by looking at those historical results. The gory details can be found at the bottom of this post, but in short, all we have to do is calculate a player's historical public-versus-private disparity, add it to or subtract it from the player's current polling numbers to estimate how private voters will vote, and combine the public ballots we know about with the private ballots we're expecting. Voilà: a more accurate forecast of the final vote totals that will be unveiled on January 6.

Below are my final Hall of Fame projections as of January 6, 2015. As of that date, 204 public ballots had been polled out of my projected final turnout of 570. Currently, I'm projecting Pedro Martínez, Randy Johnson, John Smoltz, and Craig Biggio to be elected to the Hall. Mike Piazza currently looks to fall just short despite a 76.0% showing in the polls, thanks to a negative adjustment factor, although he's so close that a small error in my calculations could easily put him in. Meanwhile, I expect Sammy Sosa, Nomar Garciaparra, and Carlos Delgado to all (unfortunately) fall off the ballot.

I'll update this page daily with new, up-to-the-minute projections as more public ballots become known, so check back often.

If you're still with me, here is my methodology for all of the above. To find each player's adjustment, I compared 2014, 2013, and 2012 Hall of Fame polls from Viola and Twitter user @leokitty to the final results released by the BBWAA in order to figure out how private ballots voted. (For these numbers year by year, see this Google spreadsheet.) I took a simple average of each player's difference between public and private ballots in those three years. (Last year, multiple people suggested to me that I should weight more recent data more heavily in calculating my adjustments; this was a good idea, but when I went back and calculated it in a postmortem of my projections, a straight average was actually more accurate.) Then I applied that average deviation to players' polling numbers this year and extrapolated a projection for the final vote. I assumed that final turnout would be 570 BBWAAers (it has been very consistent at 571, 569, and 573 the past three years) and weighted public and private ballots proportionally, so as more and more public ballots are released, they will assume a greater and greater share of the final projection, and private ballots will matter less and less. This will also reduce the error in my forecasts; obviously, as public ballots approach 100% of the total ballots, the effect of private ballots will shrink toward zero. (In other words, in some magical land where every BBWAA voter announces his or her ballot in advance, there would be no polling error because every vote has been pre-counted.)
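
For readers who think better in code, the arithmetic described above can be sketched roughly as follows. This is a minimal illustration and not the actual spreadsheet behind the projections. The function names and the sample inputs are invented for the example, and it assumes the average deviation is applied to the estimate for private ballots only before the two groups are blended by ballot count. The clamp to the 0-100 range is an extra guard that the post does not mention.

```python
def private_share(final_pct, public_pct, public_n, total_n):
    """Back out how private ballots voted in a past year, given the final
    result and the pre-announcement poll (all percentages on a 0-100 scale)."""
    private_n = total_n - public_n
    return (final_pct * total_n - public_pct * public_n) / private_n


def average_deviation(history):
    """Simple (unweighted) average of private-minus-public percentage.

    history: list of (final_pct, public_pct, public_n, total_n), one per year.
    """
    diffs = [private_share(f, p, n, t) - p for f, p, n, t in history]
    return sum(diffs) / len(diffs)


def project(public_pct, public_n, deviation, turnout=570):
    """Blend known public ballots with estimated private ballots.

    turnout=570 is the assumed final turnout stated in the post.
    """
    private_n = turnout - public_n
    # Assumption: clamp so the private estimate stays a valid percentage.
    private_pct = min(100.0, max(0.0, public_pct + deviation))
    return (public_pct * public_n + private_pct * private_n) / turnout


# Hypothetical one-year history, for illustration only (not real poll data):
# a candidate who polled 70% on 100 public ballots and finished at 60% of 500.
deviation = average_deviation([(60.0, 70.0, 100, 500)])

# Projection for a candidate polling at 76.0% with 204 public ballots counted.
projection = project(76.0, 204, deviation)
```

Note how `project` behaves at the extremes: with no public ballots it returns the private estimate alone, and once `public_n` equals `turnout` the deviation drops out entirely, which is the "magical land" case described above.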

For some players, 2015 is their first year on the Hall of Fame ballot, so there was no historical deviation to calculate. After two fancy attempts to guess first-timers' adjustment factors fell flat in 2013 and 2014, I decided to keep it simple this year. The one overriding pattern for first-timers is that public ballots tend to overstate them by a few points—specifically, they did an average of 5.1 points worse on private ballots than on public ballots the last two years. Therefore, this year, I docked each rookie candidate 5.1 points from the polls.
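
The first-timer rule amounts to a fallback: use a player's own historical deviation when one exists, and the flat 5.1-point penalty otherwise. Here is a small sketch of that lookup. The names and sample numbers are hypothetical, and it assumes the penalty is applied to the private-ballot estimate in the same way as a returning candidate's deviation.

```python
# Flat adjustment for first-time candidates, in percentage points.
# The 5.1 figure comes from the post; the sign convention (private minus
# public) is an assumption of this sketch.
ROOKIE_DEVIATION = -5.1


def deviation_for(player, historical_deviations):
    """Return the player's average historical deviation, or the flat rookie
    adjustment when the player has no history on the ballot."""
    return historical_deviations.get(player, ROOKIE_DEVIATION)


def estimated_private_pct(player, public_pct, historical_deviations):
    """Estimated private-ballot percentage: poll number plus the deviation."""
    return public_pct + deviation_for(player, historical_deviations)


# Hypothetical inputs for illustration only; not figures from the post.
history = {"Returning Candidate": -3.0}
rookie_private = estimated_private_pct("First-Timer", 90.0, history)
veteran_private = estimated_private_pct("Returning Candidate", 50.0, history)
```

Keeping the rookie adjustment as a single named constant makes it easy to revisit once another year of first-timer results is available.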