In Part 1 of this article, I looked at the ability of individual projection systems to forecast hitter performance. The six projection systems considered are Zips, CHONE, Marcel, CBS Sportsline, ESPN, and Fangraphs Fans, and each is freely available online. It turns out that once we control for bias in the forecasts, the systems perform, on average, about the same. Here, I show that the Fangraphs Fan projections and the Marcel projections contain the most unique, useful information. I also show that a weighted average of the six forecasts predicts hitter performance much better than any individual projection.

Forecast encompassing tests can be used to determine which of a set of individual projections contain the most valuable information. Based on the forecast encompassing test results, we can calculate a forecast that is a weighted average of the six forecasts that will outperform any individual forecast.

The term “forecast encompassing” sounds complicated, but it’s a simple concept. The idea is that if one projection doesn’t contain any unique information helpful to forecasting compared to another projection, then that projection is said to be “forecast encompassed” and it can be discarded. When we are left with a group of forecasts that don’t encompass each other, each must contain some unique, relevant information.
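As a rough illustration of the idea (a sketch, not the routine used to produce the results in this article), one common form of the test regresses the errors of forecast 1 on the difference between forecast 2 and forecast 1. If the slope is indistinguishable from zero, forecast 2 adds nothing beyond forecast 1. The function name and the synthetic data below are assumptions made up for the example.

```python
import numpy as np

def encompassing_t_stat(actual, f1, f2):
    """Slope and t-statistic for: actual - f1 = lam * (f2 - f1) + error.

    A slope near zero suggests f2 adds nothing beyond f1,
    i.e. f1 "encompasses" f2.
    """
    actual, f1, f2 = (np.asarray(a, dtype=float) for a in (actual, f1, f2))
    e1 = actual - f1             # errors of the first forecast
    d = f2 - f1                  # where the second forecast disagrees
    lam = (d @ e1) / (d @ d)     # OLS slope, no intercept
    resid = e1 - lam * d
    sigma2 = (resid @ resid) / (len(e1) - 1)  # one estimated parameter
    se = np.sqrt(sigma2 / (d @ d))
    return lam, lam / se

# Synthetic data (made up): the outcome depends on two independent signals.
rng = np.random.default_rng(0)
signal_a = rng.normal(size=500)
signal_b = rng.normal(size=500)
actual = signal_a + signal_b + rng.normal(scale=0.5, size=500)

f1 = signal_a                                     # sees only the first signal
f2 = signal_b                                     # sees only the second signal
f3 = signal_a + rng.normal(scale=0.5, size=500)   # noisy copy of f1

lam_12, t_12 = encompassing_t_stat(actual, f1, f2)  # f2 has unique information
lam_13, t_13 = encompassing_t_stat(actual, f1, f3)  # f3 has none

print(f"f1 vs f2: slope={lam_12:.2f}, t={t_12:.1f}")
print(f"f1 vs f3: slope={lam_13:.2f}, t={t_13:.1f}")
```

In this toy setup the second forecast carries information the first lacks, so its slope is large and clearly nonzero, while the noisy copy gets a slope near zero and would be discarded.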

Table 1 shows the optimal forecast weights after forecast encompassing tests have eliminated the forecasts with duplicate or irrelevant information. One thing that we see is that the Fangraphs Fan projections contain a large amount of unique information relevant for forecasting in each statistical category. Marcel projections are relevant in four categories. ESPN and CHONE projections are only useful in two categories, Zips in one, and the CBS projections have no unique, useful information in them according to these metrics.

Table 1. Optimal Forecast Weights

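| System | Runs | HRs | RBIs | SBs | AVG |
| --- | --- | --- | --- | --- | --- |
| Marcel | 0.22 | 0.53 | 0.25 | 0.38 | |
| Zips | 0.30 | | | | |
| CHONE | | | 0.44 | | 0.44 |
| Fangraphs Fans | 0.19 | 0.47 | 0.31 | 0.29 | 0.55 |
| ESPN | 0.29 | | | 0.33 | |
| CBS | | | | | |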

Using these weights, we can compute a forecast for each statistic that is a weighted average of these six publicly available forecasts. Table 2 shows the Root Mean Squared Forecasting Errors (RMSFE) of this composite forecast versus the other six forecasts. Here, we see that the weighted average performs substantially better than any individual forecast.
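To make the mechanics concrete, here is a minimal sketch of how a composite forecast and its RMSFE could be computed. The player numbers are made up for illustration, the helper names are hypothetical, and the HR weights are assumed to be 0.53 for Marcel and 0.47 for the Fans, as read from Table 1.

```python
import math

def rmsfe(forecasts, actuals):
    """Root mean squared forecasting error over a set of players."""
    squared = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(squared) / len(squared))

def composite(forecasts_by_system, weights):
    """Weighted average of several systems' forecasts, player by player."""
    n_players = len(next(iter(forecasts_by_system.values())))
    return [
        sum(weights[name] * forecasts_by_system[name][i] for name in weights)
        for i in range(n_players)
    ]

# Hypothetical HR projections for four players (made-up numbers).
hr_forecasts = {
    "Marcel": [20.0, 30.0, 10.0, 25.0],
    "Fans": [28.0, 34.0, 18.0, 25.0],
}
hr_actual = [24.0, 30.0, 15.0, 23.0]

# Assumed HR weights: Marcel 0.53, Fangraphs Fans 0.47.
hr_weights = {"Marcel": 0.53, "Fans": 0.47}

blend = composite(hr_forecasts, hr_weights)

for name, values in hr_forecasts.items():
    print(f"{name}: RMSFE = {rmsfe(values, hr_actual):.2f}")
print(f"Weighted average: RMSFE = {rmsfe(blend, hr_actual):.2f}")
```

The toy data are built so that the two systems miss in opposite directions, which is exactly the situation where averaging helps; it shows the mechanism and is not evidence about the real projections.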

Table 2. Root Mean Squared Forecasting Error

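| System | Runs | HRs | RBIs | SBs | AVG |
| --- | --- | --- | --- | --- | --- |
| Marcel | 24.43 | 7.14 | 23.54 | 7.37 | 0.0381 |
| Zips | 25.59 | 7.47 | 26.23 | 7.63 | 0.0368 |
| CHONE | 25.35 | 7.35 | 24.12 | 7.26 | 0.0369 |
| Fangraphs Fans | 29.24 | 7.98 | 32.91 | 7.61 | 0.0396 |
| ESPN | 26.58 | 8.20 | 26.32 | 7.28 | 0.0397 |
| CBS | 27.43 | 8.36 | 27.79 | 7.55 | 0.0388 |
| Weighted Average | 21.74 | 6.62 | 21.71 | 6.77 | 0.0338 |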

Even when we correct for the over-optimism of the six base projections, the weighted average forecast still does better in every category, though not by as much.
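The exact correction from Part 1 is not reproduced here, so the following is only a sketch under a simple assumption: the bias of a system is its average forecast error, and the corrected forecast subtracts that average before the error is measured. The run totals and function names are invented for the example.

```python
import math

def mean_error(forecasts, actuals):
    """Average of (forecast - actual); positive means over-optimistic."""
    return sum(f - a for f, a in zip(forecasts, actuals)) / len(forecasts)

def rmsfe(forecasts, actuals):
    """Root mean squared forecasting error."""
    squared = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(squared) / len(squared))

def bias_corrected_rmsfe(forecasts, actuals):
    """RMSFE after removing the system's average over- or under-shoot."""
    bias = mean_error(forecasts, actuals)
    adjusted = [f - bias for f in forecasts]
    return rmsfe(adjusted, actuals)

# Hypothetical run projections that are too optimistic by 10 on average.
runs_forecast = [90.0, 80.0, 70.0, 100.0]
runs_actual = [80.0, 75.0, 55.0, 90.0]

print(f"bias: {mean_error(runs_forecast, runs_actual):.1f}")
print(f"raw RMSFE: {rmsfe(runs_forecast, runs_actual):.2f}")
print(f"bias-corrected RMSFE: {bias_corrected_rmsfe(runs_forecast, runs_actual):.2f}")
```

Note that the bias here is estimated from the same season it is applied to, so the corrected figure is an after-the-fact measure of how much error remains once a uniform overshoot is removed.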

Table 3. Bias-corrected Root Mean Squared Forecasting Error

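| System | Runs | HRs | RBIs | SBs | AVG |
| --- | --- | --- | --- | --- | --- |
| Marcel | 23.36 | 6.83 | 22.81 | 7.28 | 0.0348 |
| Zips | 22.98 | 7.02 | 23.52 | 7.59 | 0.0341 |
| CHONE | 22.96 | 6.85 | 22.33 | 7.24 | 0.0341 |
| Fangraphs Fans | 23.24 | 6.88 | 23.53 | 7.08 | 0.0340 |
| ESPN | 23.03 | 7.27 | 23.62 | 7.14 | 0.0357 |
| CBS | 22.91 | 7.29 | 23.90 | 7.27 | 0.0347 |
| Weighted Average | 21.74 | 6.62 | 21.71 | 6.77 | 0.0338 |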

So what is the takeaway from this two-part series comparing six of the freely available sets of hitter forecasts?

1) Without correcting for the over-optimism (bias) of the forecasts, the mechanical forecasts, Marcel, CHONE, and Zips, outperform the others.

2) When correcting for the biases, no set of forecasts is any better than another.

3) A weighted average of the forecasts performs much better than any individual forecast.

4) Forecast encompassing tests indicate that the Fangraphs Fan projections and the Marcel projections contain the most unique and relevant information in them compared to the other forecasts considered.

“3) A weighted average of the forecasts performs much better than any individual forecast.” Well, duh. We shouldn’t expect a model to outperform all other models in all five categories, which is necessary for this statement to have any chance of being untrue. Otherwise, you could just weight the models that are best in each category at 1.00 and make the same assertion. What you really want to know, though, is not whether a blended model would have performed better after the season, but whether a blended model can consistently outperform the individual models, with the weights having been set…

@Marver: Easy there! You’re exactly right. That’s why I’m working on a set of forecasts for 2011 that are in part based on the weights from 2010. Then we’ll see if the approach works or not.

8 years ago

Guest

Everett

Wouldn’t it be a good idea to run this sort of analysis back several years to see if there’s any consistent weighting of the different systems, or if it’s just noise?

@Everett: I haven’t had time to dig up all of the old forecasts, but that would be good to do. The problem is that the Fangraphs Fan projections, which are among the top performers in my analysis, are only 1 year old. For older forecasts and forecast systems, check out Nate Silver’s work at http://www.baseballprospectus.com/unfiltered/?p=564

8 years ago

Member

Marver

@Will…then just include fangraphs’ projections with much smaller weights for 2011, or build two models: one with and one without. Ultimately, the exercise you’re trying to do will prove very difficult due to year-to-year variations in ideal weights, plus the fact that many projection systems incur tweaks to the logic, coding, etc. that further distort their year-to-year weights. I ran this analysis last year on a few sources and came to the conclusion that the weights were unstable year-to-year, producing an edge that was negligible and ultimately not worth the time/assets that went into it. That’s not to say it…

@Marver: You raise some good points. Weights will vary from year to year; that is certain. I also am sure that the methods of a particular forecast (ESPN and CBS especially) will vary from year to year. Those are shortcomings, for sure. However, there is quite a bit of formal study on forecast averaging and this is the general result:

1) forecast averages computed using previously optimal weights are better than
2) forecast averages computed using a simple average of other forecasts, which are better than
3) any single forecast

Again, this is something that we should be investigating, and…

8 years ago

Member

Marver

I absolutely agree; they are better. The problem is that it is certainly more true in some fields than others, and baseball projecting is relatively untested in comparison to other fields in which projection models are prevalent, like weather. I’ve done basically the exact same thing you’re about to replicate and while I found that the result is a better projection system than any of its constituent parts, the difference was small in terms of added applied value to fantasy baseball teams. The difference was especially small when comparing the time put into developing/grading the system to other studies…

8 years ago

Guest

Jeremiah

@Will I think your articles are very interesting. I don’t have a statistics background and I’ve wondered for years why people didn’t take the useful (unique?) data from the various projection systems to develop a weighted “super” system.

Do you plan on looking at pitchers as well? What about expanding the hitter categories (K’s, BB’s, XBH’s)?

@Marver: Isn’t fangraphs awesome? We wouldn’t be having this conversation on any other site. Maybe we should be doing some work together. @Jeremiah: Thanks!! There are 2 things that I’d like to do now that these articles are out. First, I want to get some hitter forecasts on record for next year so I can see if this whole idea works in practice. Second, I’d like to do for pitchers what I’ve done here for hitters. I don’t have plans right now to expand the hitter categories, but that’s a pretty natural extension of what I’ve done here if…

8 years ago

Guest

Brett

For the 2010 season, I used a simple average of forecasts (Zips, Marcel, Chone and ESPN) and it worked rather well. For the 2011 season I plan on adding a simple weighting to my forecasts. 1) What do you feel is a good way to weight the various projections? I had initially thought of ranking the 6 projections, giving 6 to Marcel, 5 to FG Fans, 4 to Chone etc. The denominator would be the sum, 21, so Marcel would be weighted 6/21 and CBS 1/21. Is this too simple? 2) Once I had created my projections,…

@Brett: Your intuition was right in doing a simple average of each of those forecasting systems. That’s usually pretty tough to beat. You’re also on the right track trying to weight the forecasts by the ones that have historically performed the best and have the most useful information in them. My article here says that you should use different weights depending on the category. For example, when you want to forecast HRs, it’s best to do about 50% Marcel and 50% Fangraphs Fans and ignore the other systems because they don’t add anything beyond those two. For SBs, it’s best…

8 years ago

Guest

Brett

Will,

I saw your weighting by source in the article above. I was under the impression that these weights were for 2010. Is there any reason to believe that any system is better at projecting a given category from year-to-year?

@Brett: see ^^^^^^^:

“However, there is quite a bit of formal study on forecast averaging and this is the general result:

1) forecast averages computed using previously optimal weights are better than
2) forecast averages computed using a simple average of other forecasts, which are better than
3) any single forecast”

I plan on doing the same thing for pitcher forecasts in the next couple of months.

8 years ago

Guest

Jeremiah

@Will – since CHONE is off the free market, how would you suggest weighting a simple average of these three systems: Fangraphs Fans, ZIPS and Marcel?

Thanks!

Jeremiah

8 years ago

Member

jaywrong

@jaywrong: Marver had some good points. It’s your comment that has no place here. @Marver: keep ’em coming! Do you have any forecasts for 2011? I think we should compile forecasts somewhere and do a big comparison at the end of the year. Maybe like 20 or so, including the main ones, then a bunch of personal forecasts from different people trying different things (average, weighted average, subjective, etc). @fangraphs crew: Is there any way to upload forecasts en masse as opposed to manually entering individual stats for individual players? Then, is there any way to get access to the individual…

@Jeremiah: It’s not as simple as just removing the weight from CHONE and splitting it all amongst the remainder, because CHONE is potentially duplicating info in the encompassed projections. I’d have to re-specify and re-run my routines in order to get CHONE-less weights. In the absence of this, I’d just do a 1/3, 1/3, and 1/3 average between Fans, ZIPS, and Marcel, or 50/50 Fans and Marcel.

8 years ago

Guest

Brian

Will, how does this help you in any way, shape or form? Pick any of the RMSFE numbers for runs as an example. Knowing that the projections are going to be plus or minus 20-some runs doesn’t help a whole lot, does it? That means if a player is predicted to score 90 runs, he could score anywhere from less than 70 to more than 110 runs. Sure, now you’ve done the statistical analysis to know how accurate the projections are and you have basically shown that all of the projections have a big enough error that they really can’t…

@Brian: I’ve gone through periods where I ask the same question. What it comes down to, though, is that any sort of projection, ranking, draft order, etc. is going to be uncertain. The real question is, despite this uncertainty, can we rank players based on their expected performance? The answer to this question is yes. While we may have difficulty getting any single particular player right, we can do much better, on average, by having a solid draft list constructed using solid projections. Any increase in the accuracy of our forecasts will make our draft lists better. If by bias…

8 years ago

Guest

Brian

@Will: I’m totally convinced, that sounds like a great idea to compile a bunch of forecasts and create your own to gain that advantage over everyone else because in the end, isn’t that what we are all looking for? A way to dominate our friends so we can boast about being the best.

Are you going to have these projections somewhere to share so that the rest of us can see them, or are you just describing a way for us to do our own, more accurate projections?

@Brian: Both. I’m putting a website together where I’ll gather the main online projections and allow users to submit their own projections. Then, when the season is over, we can see what systems did the best.

If you have any other ideas (or anyone else!!!) let me know what you’d like out of a site like that.

8 years ago

Member

Matt Goldfarb

I found that using ~2/3 Bill James and ~1/3 Marcel produced the highest Pearson correlation and the lowest RMSE against actual results for HITTERS.

I did this a little over a year ago with the 2009 and 2008 stats. I’m not really good enough with SQL to go back any further.

@Matt: I’m surprised you found that. I also ran Bill James’ numbers for hitters (and pitchers, for that matter) and I found it to be a pretty poor performer, and not adding any new information beyond the 6 freely available systems. What stats were you using to compare?

8 years ago

Member

evo34

Will, I’m just not sure you understand sample size, and in-sample data vs. out-of-sample data. You cannot try to find optimal weights using one year of data. There is way too much variance in baseball performance to think that the stat-specific weights you mention (below) are anything but noise. The only way to prove otherwise is to generate your optimized weights on a set of training data, and check the performance on a (completely different) set of test data. So to say you “should do” or “it’s best to do” various stat-specific system weightings based on this extremely limited study…

@evo34: Evo, I’m very clear about the limitations of my work. This article is looking at 2010 hitter forecasts and is thus a purely ex post analysis. Weighted averages using historic weights are useful in forecasting in other areas, so I’m presenting the hypothesis that this weighted forecast will be better than an average forecast. Your hypothesis that “there is way too much variance in baseball performance to think that the stat-specific weights you mention (below) are anything but noise” is testable as well. Before you make bold statements about my work (such as “this extremely limited study can only…

8 years ago

Guest

Joel

Well done, Will.

Obviously there will be year-to-year deviation, but since the factors underlying the mechanical predictions should remain consistent, a historical accumulation of projection data should be helpful. Even the fan projections should be consistent on some level…

7 years ago

Member

evo34

Will, if you’ve done any prediction work whatsoever (stocks, sports, weather, anything), you should know that you CANNOT optimize parameters of a system on the same data you are using to test said system, and expect it to be successful. This is Data Mining 101. What you have done in this article is describe what has occurred in the past. By itself, that would be fine. Not very useful, but fine. But you take the reckless step of claiming that you have found the best system to use to predict the future: “My article here says that you should use…

Evo,

Both of ours are testable hypotheses. We will see after this season. If my weighted forecast averaging approach is worthless, we will be able to clearly see it in the data!

–Will

7 years ago

Guest

evo34

This is exactly the same mentality that led to your original erroneous conclusions. You cannot test any model creation hypothesis on one season of data.

It’s critical to understand this when you are in the business of prediction.