BP Unfiltered

Beware of Bias in Predicted Team Win Totals

Sam Miller and I recently interviewed 28 Baseball Prospectus 2014 authors as part of the Effectively Wild season preview podcast series. At the end of each episode, we asked our guest to predict the 2014 win total for the team we’d just talked about. Listener Jeffrey A. Friedman sent us the following unsolicited submission about bias in these predicted win totals, which we decided to publish with his permission. Beware of bias in your own predictions! —Ben Lindbergh

I enjoyed listening to Effectively Wild’s preseason team previews, and I like that you asked guests to predict team wins numerically. This gives us an opportunity to examine forecasting patterns, and in particular, to analyze how and why your guests were consistently optimistic about their clubs. On average, your guests predicted that their teams would win 84 games this year.[1] Only four of your 30 guests predicted that their teams would underperform their PECOTA projections.[2] You can view all 30 predicted win totals here.

I find this interesting, because while we expect fans to overrate their teams, it’s not clear why your well-informed guests would do this so consistently as well. They spent 20 minutes discussing their team’s strengths and weaknesses in detail and showed little evidence of bias in doing so. But when the time came to make their predictions, they were consistently overoptimistic. Regardless of how much you believe in PECOTA as a baseline, it’s just not possible for the average team to win 84 games: across the league, wins and losses must balance, so the average team wins 81.

Why were your guests so optimistic? Though we only have 30 data points to work with, a few patterns appear.

For example, your guests were much more likely to overrate good teams. If a team finished below .500 in 2013, your guests were only slightly overoptimistic about them for the coming season, predicting that they would be only 0.9 wins better than their PECOTA projections, on average. By contrast, your guests predicted that teams who finished above .500 in 2013 would beat PECOTA by an average of 4.8 wins this year.

This could be because it's easier to envision better teams overachieving, and thus easier to engage in wishful thinking about those teams. Another explanation I find more plausible is that people find it difficult to predict regression from one season to the next. Things like injury risk, aging, and BABIP are the kinds of abstract, probabilistic factors that PECOTA handles much better than individuals do. I suspect that many of your guests were simply not giving these factors enough weight.

Consistent with this idea, only eight of your 30 guests predicted that their teams would win fewer games than they did in 2013. Moreover, of the four guests who thought their teams would underperform PECOTA this year, two (Adam Sobsey/Blue Jays and Mike Curto/Mariners) were specifically concerned with what they saw as above-average injury risks, and a third (Ken Funck/White Sox) thought that his team might hold a firesale. So when your guests had firm reasons to expect that key players would be taken out of commission, they did work this into their forecasts. I suspect your guests (like many analysts and fans) were simply much less sensitive to the kinds of randomly occurring, low-probability injuries and regressions that are widespread and important, yet much harder to envision precisely because they are unpredictable.[3]

Another interesting pattern that emerged was what psychologists might call a “priming effect,” which I noticed whenever you or your guests mentioned a team’s PECOTA projections around the time you asked them to make a forecast. When the PECOTA projection was mentioned, guests predicted that their teams would outperform it by an average of 1.4 wins, but when PECOTA wasn’t mentioned, guests predicted their teams would beat it by an average of 3.9 wins.[4] Since these instances of priming weren’t randomly assigned, we can’t draw too much of a conclusion here. But it is interesting to note that priming correlates with much more reasonable projections.

Two final notes. First, I checked whether your guests were more likely to overrate teams that PECOTA projected to be better on offense versus defense. Interestingly, the coefficients on these variables were nearly identical. (I had expected that your guests would find it easier to envision a club overachieving if it had a great offense). I think this lends support to the idea that your guests are fairly objective and unbiased on the whole, but that they find it difficult to foresee random regression across the board.

Second, I suspect that there is another bias going on here that I did not know how to test. A lot of your discussions about each team revolved around new players coming up to the big leagues, young players improving, new coaches making an impact, or any number of “little things” that make teams better from one year to the next. The problem from a forecasting standpoint is that almost all teams benefit from marginal improvements like these. In order to factor these things into your forecast, you’d really need to say that your team is making more marginal improvements than the average team. But since your guests know most about the teams that they’re discussing, they are not necessarily in a great position to say that. Thus, they may have a tendency to give their teams too much of a boost based on the “little things” that they’re not in a position to observe elsewhere.

[1] It's actually more than that. Adam got off the hook without making a numerical prediction about Toronto. But he said they would come in last, so I gave them an estimated win total of 75, which is what PECOTA says it will take to come in last in the AL East. If you take this synthetic prediction out, your guests predict their teams will outperform PECOTA's projections by an average of 3.2 wins.

[2] I used the PECOTA projections released on February 4, just before the preseason podcasts started.

[3] An additional factor supporting this argument is that your guests’ overoptimism was totally uncorrelated (correlation=-.01) with a team’s projected wins for this year. So there is a difference between saying that your guests overrate good teams and saying that they overrate teams that were good last year. The latter is what we find in these data, which is consistent with the idea that your guests are not expecting enough regression.

[4] A two-tailed t-test indicates that this difference is statistically significant at the 10 percent level (p < .10).
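The test in this footnote can be sketched as a two-sample (Welch's) t-test on each guest's overshoot (predicted wins minus PECOTA wins), split by whether PECOTA was mentioned before the prediction. The per-guest overshoots below are hypothetical stand-ins, since the underlying data aren't reproduced here; only the group means (1.4 and 3.9) are taken from the letter.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic (no equal-variance assumption)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

# Hypothetical overshoots (predicted wins minus PECOTA wins);
# only the group means, 1.4 and 3.9, match the article's figures.
primed = [0, 1, 2, 3, 1]        # PECOTA was mentioned before the prediction
unprimed = [2, 4, 5, 3, 5.5]    # PECOTA was not mentioned

t = welch_t(primed, unprimed)
print(round(t, 2))  # a large negative t: primed guests overshot less
```

With a real dataset of 30 guests, the same statistic (compared against the t distribution with the Welch–Satterthwaite degrees of freedom) yields the p-value the footnote reports.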

I recall many of these writers admitting they favor the teams they were writing about, even saying "we", as I recall. Isn't associating your wishes and expectations with a team the surest way to submit to bias? Does BP even expect or request unbiased writing, at least on an intellectual level? I think most of these writers would admit their lives would be a smidge better if they could follow a winning team this summer, and would do so with no regret or shame. It seems the nature of sports to start "liking" the team you experience often, so finding somebody who watches a team a lot who also does not have a hometown bias would be very difficult and rare. The flip side is the fairly frequent naysayers who dump on their own team and predict doom. I find them dreary and avoid their company!

In my case, I mostly think the A's should end up higher than PECOTA because I don't trust PECOTA to project Bob Melvin's platooning skills correctly.

A couple of examples:
PECOTA projects Derek Norris to hit .165/.268/.242 vs. RHPs and .295/.380/.528 vs. LHPs. On the team projections, he is listed as getting 325 PAs for a combined line of .224/.316/.394.

In order for those splits to add up to that combined line, Norris would have to hit with a platoon advantage about 45% of the time. But last year, he hit with a platoon advantage about 56% of the time.

And then there's Brandon Moss. PECOTA projects him to hit .225/.283/.396 vs. LHPs and .269/.350/.558 vs. RHPs. He's listed as getting 314 PAs for a combined line of .242/.314/.443.

I don't get at all how Moss's combined totals add up. He'd have to somehow get more PAs against LHPs than against RHPs for those splits to add up to that combined total; that is, getting the platoon advantage in less than 50% of his PAs. Last year, he hit with a platoon advantage in 82% of his PAs.
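The arithmetic behind both comments can be checked directly: if the combined rate is a weighted average of the two split rates, you can invert it to recover the implied share of plate appearances taken with the platoon advantage. A minimal sketch using only the batting-average components quoted above (strictly, AVG and SLG weight by at-bats while OBP weights by PAs, which this sketch ignores):

```python
def implied_share(rate_adv, rate_disadv, combined):
    """Solve combined = w * rate_adv + (1 - w) * rate_disadv for w,
    the implied share of PAs taken with the platoon advantage."""
    return (combined - rate_disadv) / (rate_adv - rate_disadv)

# Derek Norris (right-handed): platoon advantage vs. LHPs.
norris = implied_share(0.295, 0.165, 0.224)
print(round(norris, 2))  # ~0.45, vs. ~0.56 observed in 2013

# Brandon Moss (left-handed): platoon advantage vs. RHPs.
moss = implied_share(0.269, 0.225, 0.242)
print(round(moss, 2))  # ~0.39, vs. ~0.82 observed in 2013
```

So the combined line does imply Moss getting the platoon advantage in well under half of his PAs, which is the commenter's puzzle.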

Perhaps they were looking at other projection systems, like ZiPS or Steamer. I know that those two systems have vastly divergent projections for, say, the Reds' rotation when compared to each other.

It's pretty clear WHY the article addresses only PECOTA, but other projection systems take BABIP regression, aging, and injuries into account as well, frequently with very different results. PECOTA, obviously, isn't the be-all and end-all of projections, and finding places where projections frequently diverge is revealing about the player in question. The differences are a feature, not a bug.

By the way, there was talk of a comparison article between PECOTA and other projection systems. Did that ever materialize? Would be/have been a fascinating read, probably.

Excellent. Validation and assessment is a somewhat neglected part of analytics. At least, it doesn't get as much attention; it's treated somewhat like coming in to bayonet the wounded after the main battle is over. The truth is that if you don't validate, assess, and use feedback to refine the process, improvement doesn't take place.

BP does that, and some of that gets into the methodology and discussion part of articles. I'd like to see more of it.