Frequently asked questions (FAQ)

1. About the project

1.1 What is the goal of the PollyVote project?

The PollyVote project uses the high-profile application of predicting U.S. presidential election outcomes to demonstrate advances in forecasting research. When the PollyVote was first launched in 2004, the original goal was to demonstrate the benefits of combining forecasts. Since then, the PollyVote team has expanded its focus by analyzing the value of new forecasting methods such as expectation surveys and index models.

1.2 Who is behind the PollyVote?

The PollyVote was launched in 2004 by forecasting expert J. Scott Armstrong and the political scientists Alfred Cuzán and Randy Jones. In 2007, Andreas Graefe joined the project. For more information, see the team page.

1.3 What's up with the parrot?

Polly the parrot is the mascot of the PollyVote project. In many respects, Polly is an ordinary parrot. She is completely apolitical and knows nothing about politics or voting behavior, and merely parrots what she hears from others. That said, Polly is special in her knowledge about evidence-based forecasting methods and procedures, and she can calculate simple averages. Therefore, Polly is able to apply one of the simplest and yet most effective principles of forecasting, namely combining forecasts. Polly thus illustrates that you don't have to be a domain expert in order to be able to create accurate forecasts.

1.4 Where do I get more information about the PollyVote?

The best sources are the research publications, which describe the PollyVote method and its historical accuracy. For any remaining questions, contact one of the team members.

1.5 For which countries is the PollyVote available?

PollyVote is currently forecasting the U.S. presidential elections and the German federal election. That said, Polly is always looking for new challenges. If you are interested in using PollyVote, contact one of the team members.

1.6 How is PollyVote funded?

The PollyVote project received no outside funding for forecasting the three U.S. presidential elections from 2004 to 2012 but was operated by the team members in their free time.
For forecasting the 2013 German election, the project was financially supported by the Center for Advanced Studies at LMU Munich.
For the 2016 U.S. presidential elections, we received funding from the Tow Center at Columbia Journalism School and the Volkswagen Foundation to provide data-driven coverage based on the PollyVote data.

1.7 What about the name "PollyVote"?

The first part of the name "PollyVote" refers to our mascot Polly the parrot. In addition, it is derived from the words "poly" – for the many forecasts that we use in the combination – and "political" – for the context of the forecast.
The name's second part reflects the fact that our forecasts focus on vote shares rather than winning probabilities. We believe vote-share forecasts are more honest, as the accuracy of winning probabilities is difficult to evaluate for one-off events such as major elections.

2. PollyVote method

2.1 How does the PollyVote work?

The PollyVote is based on the principle of combining forecasts. That is, the PollyVote combines forecasts from different forecasting methods, the so-called component methods, each of which relies on different data. The PollyVote forecast is calculated in two steps. First, we average the forecasts within each component method. Second, we average the resulting forecasts across the component methods. In other words, all forecasts receive equal weight within their component method, and all component methods receive equal weight in the final combination.
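The two-step combination can be sketched in a few lines of code. This is a minimal illustration with hypothetical numbers, not actual PollyVote data; the component names and vote shares are made up for the example.

```python
# Hypothetical forecasts of the incumbent's two-party vote share (in
# percent), grouped by component method. Numbers are invented for
# illustration only.
forecasts = {
    "polls":              [51.2, 50.8, 51.5],
    "prediction_markets": [50.9],
    "econometric_models": [52.0, 51.1, 50.5, 51.8],
}

# Step 1: equal-weighted average within each component method.
within = {method: sum(vals) / len(vals) for method, vals in forecasts.items()}

# Step 2: equal-weighted average across the component averages.
pollyvote = sum(within.values()) / len(within)

print(round(pollyvote, 2))  # combined two-party vote-share forecast
```

Note that each component contributes exactly one number to the second step, no matter how many individual forecasts it contains.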

2.2 Why does combining forecasts work so well?

One intuitive explanation as to why combining improves accuracy is that it enables forecasters to use more information, and to do so in an objective manner. Moreover, bias exists both in the selection of data and in the forecasting methods that are used. Often the bias is unique to the data and the method, so that when various methods using different data are combined to make a forecast, the biases tend to cancel out in the aggregate.

While combining is useful whenever more than one forecast of the outcome is available, the approach is particularly valuable if

Many forecasts from evidence-based methods are available.

The forecasts draw upon different methods and data.

There is uncertainty about which method is most accurate.

These conditions apply perfectly to election forecasting. First, there are many evidence-based methods for predicting election outcomes. Second, these methods often rely on different data. Third, in most situations, it is difficult to determine a priori which method will provide the best forecast at a given time in an election cycle.

2.5 Why combine within and across component methods?

The rationale behind combining forecasts first within and then across component methods is to equalize the impact of each component method, regardless of whether a component included many forecasts or only a few. For example, while there is only one prediction market that predicts the national popular vote, there are forecasts from numerous econometric models. In such a situation, a simple average of all available forecasts would over-represent models and under-represent prediction markets, which we expect would harm the accuracy of the combined forecast. Another advantage of this approach is that it allows for comparing the accuracy of the different component methods.
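A small numerical sketch shows how a pooled average over-represents the larger component. The figures below are hypothetical, chosen only to make the effect visible: one prediction market forecast sits alongside four econometric model forecasts.

```python
# Hypothetical two-party vote-share forecasts (in percent): one
# prediction market vs. four econometric models.
market = [50.0]
models = [52.0, 52.4, 51.6, 52.0]

# Pooled average of all five forecasts: the four model forecasts dominate.
pooled = sum(market + models) / len(market + models)

# Two-step average: each component method counts equally.
two_step = (sum(market) / len(market) + sum(models) / len(models)) / 2

print(round(pooled, 2), round(two_step, 2))
```

The pooled average sits close to the models' consensus, while the two-step average gives the lone market forecast the same influence as the entire models component.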

2.6 Why use equal weights to combine the forecasts?

A widespread concern when combining forecasts is the question of how best to weight the components, and many scholars have proposed different methods for doing so. However, an early review of more than two hundred studies from different fields concluded that the question of how to combine forecasts does not seem to be critical to forecast accuracy. In fact, it was found that the simple average (i.e., assigning equal weights to the components) often provides more accurate forecasts than complex approaches to estimating "optimal" combining weights (Clemen, 1989). Empirical research since then has repeatedly confirmed these findings (Graefe et al. 2015). For reasons why equal weights work so well, read our piece on combining forecasts.

2.7 How often does PollyVote update its forecast?

The forecasts are updated whenever new information becomes available, which is often several times a day.

2.8 Why does the PollyVote ignore the third-party vote?

The PollyVote predicts the two-party vote. It does not provide vote share forecasts for third-party candidates. Instead, third-party votes are allocated proportionally to the major parties.
The main reason is that four of the six component methods – namely prediction markets, citizen forecasts, index models, and econometric models – do not provide forecasts for third parties. Therefore, it is simply not possible to directly predict the third-party vote with the PollyVote method of combining forecasts within and across components.
For now, Polly stays the course, at least for 2016. That said, the effects of third-party candidates could potentially be large in future elections.
We thus plan to consider ways of handling the problem and perhaps experiment with alternative approaches to estimating third-party votes. In fact, the PollyVote has already been used in Germany to predict the vote of seven parties.
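The proportional allocation mentioned above can be sketched as follows. The raw vote shares are hypothetical and chosen only for illustration; the allocation amounts to renormalizing the major-party shares so they sum to 100.

```python
# Hypothetical raw vote shares, in percent (not PollyVote data).
dem, rep, third = 48.0, 46.0, 6.0

# Give each major party a slice of the third-party vote proportional to
# its own share of the major-party vote.
dem_two_party = dem + third * dem / (dem + rep)
rep_two_party = rep + third * rep / (dem + rep)

# Equivalently: renormalize the major-party shares to sum to 100.
assert abs(dem_two_party - 100 * dem / (dem + rep)) < 1e-9

print(round(dem_two_party, 2), round(rep_two_party, 2))
```

After allocation, the two-party shares sum to 100 by construction, which is the quantity the PollyVote forecasts.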

3. Forecast accuracy

3.1 How accurate is the PollyVote?

The PollyVote published forecasts prior to each of the three U.S. presidential elections. In addition, one ex post analysis tested how the PollyVote would have performed for the three elections from 1992 to 2000. Across the last 100 days prior to Election Day, the PollyVote provided more accurate forecasts than each of the component methods (Graefe et al. 2014b). Error reductions were large. For example, compared to single polls, the PollyVote reduced the forecast error by 59%. Comparisons have also been made with other methods. For example, forecasts of the 2012 election were also substantially more accurate than the closely watched forecasts at FiveThirtyEight.com (Graefe et al. 2014a).

The 2004 PollyVote was launched in March 2004 and forecast a victory for President Bush over the 8 months that it was making forecasts. The final forecast, published on the morning of the election, predicted that President Bush would receive 51.5% of the popular two-party vote, an error of 0.3 percentage points (Cuzán et al. 2005).

The 2008 PollyVote was launched in August 2007 and forecast a victory for Barack Obama over the 14 months that it was making daily forecasts. On Election Eve, it predicted that Obama would receive 53.0% of the popular two-party vote, an error of 0.7 percentage points (Graefe et al. 2009).

The 2012 PollyVote was launched in January 2011 and forecast a victory for President Obama over the 22 months that it was making daily forecasts. On Election Eve, it predicted that Obama would receive 51.0% of the popular two-party vote, an error of 0.9 percentage points (Graefe et al. 2014a).

3.2 Why is PollyVote so accurate?

PollyVote strictly adheres to a well-established principle in forecasting research, which is to mechanically combine forecasts from different methods that use different information. Hundreds of studies have demonstrated the benefits of combining forecasts in different fields.

4. Project history

4.1 When was the PollyVote first implemented?

The PollyVote was first launched in March 2004 to predict the outcome of that year's U.S. presidential election. Until Election Day in November, the project team collected data from 268 polls, 10 quantitative models, and 246 daily market prices from the Iowa Electronic Markets vote-share market. In each of the last three months prior to the election, the team also administered a survey with a panel of 17 experts on US politics, asking them for their predictions. The forecasts were first combined within each component method: averaging recent polls, averaging the IEM prediction market prices from the previous week, and averaging the predictions of the quantitative models. Then, the researchers averaged the forecasts across the four component methods. The resulting forecast was named the PollyVote. From March to November, the forecasts were updated weekly at first and later twice a week. The forecasts were published at forprin.com.

4.2 What has happened with the PollyVote since 2004?

For predicting the 2008 election, the general structure of the PollyVote remained unchanged; the PollyVote combined forecasts within and across the same four component methods as in 2004. However, some changes were made at the level of the component methods. Instead of averaging recent polls themselves, the PollyVote team used the RCP poll average by RealClearPolitics as the polls component. In addition, the advantage of the leading candidate was discounted (or damped) using the approach suggested by Jim Campbell. The first PollyVote forecast for the 2008 election was published in August 2007, 14 months prior to Election Day, and was updated daily (Graefe et al. 2009).
For forecasting the 2012 election, a fifth component called index models was added to the PollyVote. This component captured information from quantitative models that use a different method and rely on different information than the traditional political economy models. In particular, the index models capture information about the campaign, such as the candidates’ perceived issue-handling competence (Graefe & Armstrong 2012; 2013), their leadership skills (Graefe 2013), their biographies (Armstrong & Graefe 2011) or the influence of other factors such as whether the incumbent government faced some scandal (Lichtman 2008). The first forecast for the 2012 election was published on January 1, 2011, almost two years prior to Election Day. As in 2008, the forecasts were updated daily, or whenever new information became available (Graefe et al. 2014a).
In 2013, the PollyVote was launched in Germany to predict the German federal election of the same year.

5. Perception

5.1 How is the PollyVote perceived in the media?

The PollyVote predictions have rarely been cited in the popular press. Possible reasons are that (1) people have difficulty understanding the benefits of combining, (2) people are not interested in combining because they prefer a forecast that suits their preferences or wrongly believe that they can identify the best forecast, and (3) people think that the method of calculating averages is too simple (Graefe et al. 2014b). Another possible reason is that the PollyVote predictions are very stable and rarely change, whereas election observers and journalists are interested in excitement and newsworthiness.

7. Automated news

7.1 What is automated journalism?

Automated journalism refers to the process of using software or algorithms to automatically generate news stories without human intervention – after the initial programming of the algorithm, of course. Once the algorithm is developed, it allows for automating each step of the news production process, from the collection and analysis of data to the actual creation and publication of news. Automated journalism – also referred to as algorithmic or, somewhat misleadingly, robot journalism – works for fact-based stories for which clean, structured, and reliable data are available. In such situations, algorithms can create content on a large scale, personalized to the needs of individual readers, faster, cheaper, and potentially with fewer errors than any human journalist.