10 Lessons I Have Learned about Creating a Projection System

Editor’s Note: This is the third post of “10 Lessons Week!”

How did ZiPS come about? The genesis of what later became ZiPS stems from conversations I had over AOL Instant Messenger with Mets fan, SABR social gadfly, and pharmaceutical chemist Chris Dial during the late(ish) 1990s. I knew Chris from Usenet, a now mostly-dead distributed internet discussion system.

Usenet was my introduction to the wider sabermetrics community, which was full of names you would recognize, like Keith Law, Christina Kahrl, Voros McCracken, Sean Forman, and scads of others. Chris and I talked about making a basic projection system whose results the public could freely access and that did 95 percent as well as the projections hidden behind paywalls. The concept is similar to what Tom Tango later developed independently and named Marcel.

Nothing came of that at the time. I didn’t revisit the idea of a projection system until after the turn of the millennium, when I was regularly writing transaction analysis for Baseball Think Factory, a startup by Jim Furtado and Sean Forman that I had been involved with since its conception in 2000. While I majored in math back in college, I was never much motivated by it unless it could be put to use making me money or analyzing sports.

I had financial flexibility at the time due to the former preferred application of math, so I had the time and ability to put together a projection system. There wasn’t any eureka moment that led to the creation of ZiPS–I didn’t fall asleep at a game until a baseball fell on my head from a Barry Bonds tree–it just seemed like a practical thing to have when analyzing transactions.

What started as a basic projection system ended up as something much more complicated. I had the idea to incorporate some of McCracken’s DIPS research into the mix, which is why I named it ZiPS, in its honor. I actually intended to call it ZiPs, because CHiPs was my second-favorite show as a child (behind The Dukes of Hazzard), but I mistyped it as ZiPS when it finally debuted at Baseball Think Factory.

Jay Jaffe had noticed it and gave it a plug at the time–the first mention of ZiPS in media–as ZiPS, so ZiPS it remained. It originally was going to be SiPs, but that kind of sounded like some Scandinavian bottled water company or some kind of juice package for kids that is impossible to open, like those stupid Capri Suns. Note, none of this story is made up; I really am that ridiculous.

The initial build of ZiPS was still fairly simple relative to today’s version. It used only basic stats and had generic aging factors. As time went on, it became more complex. Studies yielded more information on modeling stats like BABIP, and the generic aging factors evolved first into 12 aging factors (representing general player archetypes) and finally into aging curves estimated on the fly using cluster analysis.
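To make "aging curves from cluster analysis" concrete, here is a minimal sketch of one generic way it could work. It is not the ZiPS method, which isn't spelled out here. It assumes each player is described by a hypothetical feature vector (say, walk rate, strikeout rate, and isolated power), groups players into archetypes with plain k-means, and then averages one-year changes in a rate stat at each age within a cluster (a simple "delta method"). The function names, features, and data shapes are all illustrative assumptions.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's-algorithm k-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct players
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each player joins the nearest archetype.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster where it was
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def cluster_aging_curve(careers, labels, cluster):
    """Average one-year change in a rate stat at each age, within one cluster.

    careers: one dict {age: rate} per player, in the same order as labels.
    """
    deltas = {}
    for career, lab in zip(careers, labels):
        if lab != cluster:
            continue
        for age, rate in career.items():
            if age + 1 in career:
                deltas.setdefault(age, []).append(career[age + 1] - rate)
    return {age: sum(d) / len(d) for age, d in sorted(deltas.items())}
```

Calling `kmeans(features, k=12)` would yield twelve archetypes, loosely echoing the 12 aging factors above. A real system would also need to weight by playing time and deal with survivor bias, which this sketch ignores.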

ZiPS isn’t the product of genius; it’s the product of a lot of work at the right time. One of the really smart guys in sabermetrics, people like Sean Forman or Chris Long, could’ve put together a ZiPS-like projection system much more quickly. When someone asks me how I got to write about baseball for a living (such a cool job), I’m almost embarrassed to admit that I connected with the right group of baseball nerds in the right place at the right time.

When I was a kid, my grandfather bought me all the Baseball Abstracts, the Elias Baseball Analysts, and a subscription to Sports Illustrated starting when I was six. (I was always a baseball nut; most of the pictures of me up to age five or six had me wearing an Orioles cap.)

I was smart enough to realize very early on that I was not destined to be a major league player, so I grew up wanting to be Bill James or Peter Gammons. Now, nobody could actually be Bill James or Peter Gammons other than the originals, but I feel very fortunate and blessed. Along the way, I’ve learned a few things. Let’s jump in, shall we?

Lesson #1: Developing a Projection System is a Lot of Work

Edison once commented that genius was one percent inspiration and 99 percent perspiration. Putting together a projection system is about 110 percent perspiration, leading to negative inspiration, and totally breaking how the math works. At a fundamental level, you’re putting together an insane amount of data in order to come up with an objective estimate.

While a lot of people start with Marcel as the base and make changes from there, ZiPS predates Marcel, so for many of the questions, such as which data have predictive value and what weights to assign to various factors, I had to do original research. ZiPS is essentially the sum of literally hundreds of mini-studies I’ve done on various issues.
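For readers who want to start where many people do, here is a minimal Marcel-style baseline (explicitly not ZiPS). It weights the last three seasons and regresses toward league average by blending in a fixed number of league-average plate appearances. The 5/4/3 weights and 1,200 PA of regression are the values commonly cited for Marcel, but treat them as assumptions here. The real Marcel also includes refinements such as an age adjustment that this sketch skips, and the function name is hypothetical.

```python
def marcel_style_rate(seasons, league_rate, weights=(5, 4, 3), regress_pa=1200):
    """Weighted, regressed per-PA rate from recent seasons (most recent first).

    seasons: list of (events, plate_appearances) tuples.
    weights / regress_pa: assumed Marcel-like defaults, not ZiPS values.
    """
    weighted_events = 0.0
    weighted_pa = 0.0
    for w, (events, pa) in zip(weights, seasons):
        weighted_events += w * events
        weighted_pa += w * pa
    # Regression to the mean: blend in regress_pa PA of league-average performance.
    return (weighted_events + regress_pa * league_rate) / (weighted_pa + regress_pa)
```

For example, a hitter with 30, 25, and 20 home runs in 600 PA each (most recent first), against a league rate of 0.03 HR/PA, gets (310 + 36) / (7200 + 1200), or about 0.041 HR/PA. A player with no history falls back to the league rate.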

Lesson #2: People Overrate the Odds of a Player Improving

Even among many who are into the sabermetric side of baseball, there’s a belief in a neat, tidy aging curve for players. It’s nowhere near that simple. While you see this pattern in the aggregate, especially for hitters, nothing comes that easily for individual players. Many minor leaguers, even prominent talents, simply don’t improve past where they are at 21 or 22, even at the higher minor league levels.

People also have an idea that a superstar at 22 is going to be even better at 27, but again, that’s not true, at least not to the extent it may be true for a 22-year-old still putting his skills together. While a random 21-year-old is preferable to a random 25-year-old of similar abilities, the very young high achievers tend to plateau once they hit stardom.

Willie Mays never was significantly better than he was at age 23. Alex Rodriguez didn’t have a traditional 27-ish peak. Neither did Mickey Mantle or Ted Williams, and so on and so on. Mike Trout’s going to be an unreal player when he hits 27, but he’s unlikely to be in a different tier of craziness than he is/was from 2012 to 2014.

Lesson #3: Historical Data Sucks

We have all sorts of new, exciting information that has been collected about baseball players over the last 30 years, but unfortunately, we’re unable to go back in time and get this kind of data for past players. We can make estimates of data, such as Sean Smith’s awesome Total Zone, which is one of the best attempts to kludge some good data from a time when good data wasn’t available, but so much information about players is lost to history.

As someone who spends a lot of time trying to wring some understanding out of data, it kills me how much we don’t know about the past and will never truly be able to know. And even data that theoretically should be easy to collect over the years is frequently sullied by poor record-keeping. Want to include height and weight in your projections? A lot of those numbers are fiction. Does anyone believe John Kruk played at 204 pounds (FanGraphs) or 170 pounds (Baseball-Reference)?

We can make a not-completely-terrible model of how fast players were through some of the primary and secondary statistics available, but being able to model reasonable guesses as to what the data are is not the same thing as having the actual data in your hand.

Lesson #4: There’s a Lot We Still Don’t Know About Advanced Statistics

We have a lot of new data from the various f/x incarnations, and while these are cool, we are still very early in understanding the predictive value of some of these new data. We can get some things out of the statistics, such as the general effect of fastball velocity on a pitcher’s expectations, but there are a lot of lessons in there that we just don’t know yet and won’t know for another 10 or 20 years. Much of the predictive value we attribute to things like swinging-strike percentage is still a set of educated guesses at this point.

Lesson #5: We Could Do a Lot More With Better Minor League Data

While the amount of data available on major leaguers has been improving continually, especially over the course of the post-Moneyball era, there’s still a lot of information about minor leaguers that isn’t regularly available to the public. The state of proprietary data on minor leaguers is a little better, but it’s still not at the level of what’s available for major leaguers.

Keeping some of the good data proprietary essentially blocks out some of the next generation of big baseball thinkers from making the next breakthrough. Even something as basic as minor league splits was really difficult to come by on a widespread level until Jeff Sackmann developed minorleaguesplits.com several years ago with scripts that data-mined play-by-play logs.
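The scripts behind minorleaguesplits.com aren’t described here, but the core aggregation step of turning play-by-play events into splits can be sketched simply. This example assumes a hypothetical, already-parsed record of (batter, pitcher handedness, outcome code) and invents its own outcome codes. Real logs need far more parsing, and things like sacrifice flies and errors are ignored.

```python
from collections import defaultdict

# Hypothetical outcome codes for an already-parsed play-by-play record.
HITS = {"1B", "2B", "3B", "HR"}
AT_BAT_OUTCOMES = HITS | {"OUT", "K"}


def platoon_splits(events):
    """Aggregate (batter, pitcher_throws, outcome) records into platoon splits.

    pitcher_throws must be "L" or "R". Returns
    {batter: {"L": (hits, at_bats), "R": (hits, at_bats)}}.
    """
    tally = defaultdict(lambda: {"L": [0, 0], "R": [0, 0]})
    for batter, throws, outcome in events:
        if outcome not in AT_BAT_OUTCOMES:
            continue  # walks, HBP, etc. don't count as at-bats
        cell = tally[batter][throws]
        cell[1] += 1
        if outcome in HITS:
            cell[0] += 1
    return {b: {hand: tuple(v) for hand, v in splits.items()} for b, splits in tally.items()}
```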

The state of public defensive data for minor leaguers is even worse, with the defensive stats provided no more advanced than MLB fielding stats in the 1950s. Sean Smith and I have systems that parse minor league play-by-play logs to get some rudimentary defensive data, and I’ve taken the step of text parsing keywords in scouting reports to get a nudge one way or the other, but this isn’t a substitute for better data.
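As an illustration of "text parsing keywords in scouting reports to get a nudge," here is a minimal keyword-scoring sketch. The vocabulary, the weights, and the cap are all invented for the example and are not the terms or values used in ZiPS. The cap reflects the idea that text should only nudge an estimate, never drive it.

```python
import re

# Hypothetical keyword weights, made up for illustration only.
KEYWORD_WEIGHTS = {
    "plus arm": 1.0,
    "soft hands": 1.0,
    "good range": 0.5,
    "stiff": -1.0,
    "fringy": -0.5,
    "below-average": -1.0,
}


def defensive_nudge(report, cap=2.0):
    """Sum keyword weights found in a scouting report, clipped to [-cap, cap]."""
    text = report.lower()
    score = sum(
        w * len(re.findall(r"\b" + re.escape(k) + r"\b", text))
        for k, w in KEYWORD_WEIGHTS.items()
    )
    return max(-cap, min(cap, score))
```

A naive keyword match like this misses negation ("not a plus arm"), which is one reason it can only ever be a small nudge.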

Lesson #6: People Get Really Mad at Algorithms

No matter what you do, no matter how well you explain your projection system, no matter how clearly you lay out the basic design principles, people will get furious with your projections and accuse you of all sorts of biases and agendas in putting them together. I received my first death threat in 2005, someone telling me that they hoped my house would burn down (though I guess that is more properly classified as a death wish than a direct threat). It wasn’t my last one.

Explaining basic probability to a group of people is, at best, futile. For example, ZiPS gives very few batters a mean projection of a .300 batting average or better in a season. For 2014, only three hitters were projected to have a .300 BA or better as the mean outcome.

ZiPS actually projects that 23 players (on average) will end up at .300 or better. The difference is that we don’t expect everyone to play to their mean projection: we expect about 10 percent of players to hit at a BA they had only a 10 percent chance of reaching, and so on. Think you’re going to successfully explain this to a critic who may very well never have taken a probability class in high school or college? You’re not.
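A small simulation makes the point that a handful of .300 mean projections and a couple dozen .300 finishers are perfectly consistent. This is a sketch under loudly stated assumptions, not how ZiPS computes its count. The roster is hypothetical, each season is drawn from a normal approximation to the binomial around the player’s projected mean, and uncertainty in the projection itself is ignored, so real-world spreads would be wider still.

```python
import math
import random


def expected_300_hitters(projected_avgs, at_bats=550, trials=500, seed=1):
    """Average number of hitters who finish at .300+ per simulated season.

    Each hitter's season BA is drawn from a normal approximation to the binomial
    around his projected mean. Only sampling noise is modeled; uncertainty in the
    projection itself (true talent) is ignored, so real spreads are wider.
    """
    rng = random.Random(seed)
    finished_300 = 0
    for _ in range(trials):
        for p in projected_avgs:
            sd = math.sqrt(p * (1 - p) / at_bats)
            if rng.gauss(p, sd) >= 0.300:
                finished_300 += 1
    return finished_300 / trials


# Hypothetical league: 3 hitters projected at .305, 40 at .285, 60 at .265.
roster = [0.305] * 3 + [0.285] * 40 + [0.265] * 60
print(sum(p >= 0.300 for p in roster))  # hitters with a .300+ mean projection: 3
print(expected_300_hitters(roster))     # average number who finish at .300+, well above 3
```

The gap comes almost entirely from the 40 hitters projected at .285: none of them is "supposed" to hit .300, but a meaningful fraction of them will.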

Lesson #7: Having a Programming Background is Very Useful

One of my regrets in developing a projection system is that, while I understand the underlying math and have excellent skills in Excel and Statistica, my general programming skills are quite rudimentary. Turning ZiPS into a program rather than a bunch of gigantic, interlocking spreadsheets, as PECOTA was for the Baseball Prospectus crew, would make things run far more smoothly. Unfortunately, I took only the required computer science courses in college and did just enough to get a passing grade.

If I had to do it over again, a better programming background and much deeper database knowledge would make a lot of ZiPS far more elegant and easier for me to implement new things. I’ve never even used R, which makes me kind of a weird old relic in the sabermetric community, despite the fact that I turn 36 in June.

Lesson #8: Results are “Stickier” In-Season than Season-to-Season

When I developed in-season projections, it quickly became evident that the model that worked for season-to-season projections needed quite a bit of adjustment to work in-season. Simply put, in-season stats showed significantly less regression toward the mean than their sample sizes would suggest, relative to season-to-season stats.

BABIP was one notable example: in the context of in-season projections, BABIP overperformance tended to stick more than the heavier season-to-season regression would suggest. That .400 first-half BABIP may be doomed next year, but players retain a surprisingly large share of that bounce within the same season.
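One common way to express "less regression in-season" is a shrinkage estimate: add a fixed number of "ballast" balls in play at the prior rate to the observed line. A smaller ballast lets observed results stick more. This is a generic sketch, not the in-season ZiPS model, and both ballast values below are made up purely to show the direction of the effect.

```python
def regressed_rate(successes, trials, prior_rate, ballast):
    """Shrink an observed rate toward a prior by adding `ballast` trials at the
    prior rate. A smaller ballast means observed results 'stick' more."""
    return (successes + ballast * prior_rate) / (trials + ballast)


# Hypothetical first half: 80 hits on 200 balls in play (.400 BABIP), .300 prior.
# The two ballast values are invented to illustrate the direction of the effect.
rest_of_season = regressed_rate(80, 200, 0.300, ballast=400)  # lighter in-season regression
next_season = regressed_rate(80, 200, 0.300, ballast=800)     # heavier season-to-season regression
```

With these made-up numbers, the rest-of-season estimate is about .333 and the next-season estimate is .320. The same .400 half keeps more of its bounce when the horizon is the same season.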

Lesson #9: Sometimes, Simple is Better

Going back to in-season projections for a minute, they provide a solid example of why, sometimes, simpler is better. While we strive to have our models as accurate as possible and our conclusions as precise as we can make them, that stance sometimes can get in the way of conveying information. That’s not a small issue, since something that’s 99 percent accurate and can’t be communicated to other people easily may not be as intelligent a choice as something that’s 98 percent accurate and people can easily understand.

My original model for in-season ZiPS was significantly more complicated than the one updated every morning on FanGraphs, and it’s slightly more accurate. (It’s still the one I use when I need to calculate in-season projections for a player or two rather than a large group.) The problem with the more complicated model is that it wasn’t one that could be updated easily or daily.

A simpler model, the one that’s updated every morning, isn’t quite as accurate as the more complex one, but it has the benefit of constant updates for every major leaguer. What’s the use of great data if it isn’t accessible? This is also a lesson I would include in 10 Lessons on Being a Nerdy Baseball Writer (it applies to things like OPS, which I still use when it’s practical), but that’s a different article.

Lesson #10: Don’t Let the Projections Affect Your Rooting Interests

One of the toughest things about projections is walking away from them. When I’ve done all of the projections for the year, I’m done with them, no sweating over who’s exceeding their projections or falling short, a policy that took me several years to value completely. If you start following individual player lines obsessively and start rooting for specific players to match their projections rather than win or lose games, you’ll slowly drive yourself insane.

ZiPS gave Josh Hamilton quite a low projection last year, the lowest of any of the projection systems (and he underperformed even that modest ZiPS projection). But there’s something almost soul-sapping about rooting for a player to play poorly so that you look smart, rather than because you want your favorite team to defeat that player’s team.

You always validate your results at the end of the year, but until then, just enjoy the games. Baseball is fun, after all.

Dave, I haven’t written up anything formally, but when I developed the in-season weightings (I used every player’s line through every game since 1970), I found, for example, that seasonal BABIP carried more “this season” predictive weight than I expected from season-to-season data. I got the same result using BABIP over and under xBABIP. The differences are subtle but real.

Very interesting. Sort of related: I was arguing with someone the other day about the meaningfulness of the first start of a season for a pitcher; they were saying that since the first start is the first time you see the pitcher that year, it tells you a lot more about the pitcher than starts in the middle or end of the year. I was pushing back, saying that seasons are sort of arbitrary endpoints, etc, but what you said here makes me doubt that a little. I wonder, for example, what the difference is in how you weight the last start of a season vs the first start of the season with regards to the next year’s performance.

One of the best examples that I know of on the pitching side is James Shields – BABIP by year, starting in 08: 29%, 32%, 35%, 26%, 30%, 30%. I had him on a fantasy team for a few years, ending with the 35% year, and remember watching too many great outings turn bad, with a little single followed by a walk followed by a HR (for example). I kept thinking his luck would eventually change, but it didn’t until the following season.

The in-season “stickiness” might just be the phenomenon of being “in a groove” or “in a zone”: within the relatively short sample size of one season, a player can stay focused on repeatable mechanics, a sustained approach, etc.

Then the off-season “breaks” that groove or that zone simply by interrupting it, and repeating something you did six months ago is harder than repeating something you did yesterday, and the day before, and the day before that, etc., without disruption.

Maybe it’s a “Groove” metric, who knows? As always, I look forward to any of your future thoughts and output on it – great work as always, Dan!

“Predicting Josh Hamilton for a down year isn’t the same as rooting for said down year”

“But there’s something almost soul-sapping about rooting for a player to play poorly so that you look smart rather than because you want your favorite team to defeat that player’s team.”

It’s hard to tell the difference sometimes.

I forget which specific Pirates player I got into this about, maybe one of the pitchers pitching over his head for a while last year (Locke?), but even some fans of the team spent much of the year warning about the Regression Monster and stating with absolute certitude that said player couldn’t keep it up. I get all that. I get that minor-league numbers are very good indicators of MLB performance, that it’s highly unlikely for a player to make a performance jump outside of his projections at the MLB level. It’s really tough to learn new skills at the top.

But there’s something depressing about hearing your own team’s fans … not exactly root for a guy to do worse so they can say they were right, but a certain … I dunno, maybe “smugness” is the word. “We all know he’s not that good/can’t keep it up/his projections say …” And there’s something depressing about the notion that no player anywhere ever can get any better at the MLB level, that no one can surprise us with a leap. Might as well just plug our projections into a game simulator and let us know how the season’s going to turn out before anyone plays a game. What’s the point of watching if there’s no chance of being surprised?

So while the adult fan in me absolutely gets the great value of projection systems, in the heart of the 8-year-old fan in me, I kind of don’t want to hear about it. I’d rather just watch the season play out and see what happens.

I guess I can still do that, because I largely ignore the preseason projections some fans enjoy bandying about. That’s their fun, I suppose. That it doesn’t seem like fun to me sure doesn’t make it wrong.

I had this same issue as a Tigers fan with Brennan Boesch in 2010. At the beginning of the season, Boesch was a disappointing draft pick who, at age 24, finally started hitting for power in AA but still had a K/BB rate that precluded MLB success. Then he mashes in AAA for a two-week stretch, Carlos Guillen gets injured, Boesch is called up, and he hits like a superstar for a couple of months.

Anyone with a rudimentary understanding of player projection knew Boesch wasn’t going to keep it up. His swing was too long to catch up to top velocity, he still wasn’t controlling the zone, he didn’t have good bat control, and there was nothing in his career to suggest stardom. But of course, people got excited about the next big thing, and you couldn’t tell anyone otherwise without being a hater.

I found that it’s important to frame the conversation in terms of addressing specific claims. Simply saying “this won’t last” every time he hits a homer is annoying, and not very productive. I rooted for him to get a hit every time, and I would have loved him to beat the odds and make it as a star. And let’s be honest, players DO blow away their projections on occasion. So there’s really nothing to be gained bringing it up in the general context.

Where the projection data should be employed is in conversations about specific conclusions. “Should the Tigers lock him up to a big deal now?” Obviously not, there’s no reason to think he’ll keep this up. “Should the Tigers try to trade him?” Probably not, no other teams are going to value him highly. “Is he the next Norm Cash?” Please. I think if you keep the conversation framed in those terms, you’re getting use out of the data without robbing the game of its charm.

Yes. This is the “great value” I referred to. Especially in light of teams now trying to lock up players to extended contracts earlier and earlier, to the point where I’ve seen some Pirates fans suggest extending Gregory Polanco before he plays his first MLB game. If lower-budget teams are going to go this direction, then they absolutely have to run their projections and make hugely important decisions based on them.

But fans have sort of taken this over and used it to … “bully” is too strong a word, but to harp on those who might prefer to be open to the surprising possibilities the game still occasionally presents. I’m usually OK if they use words like “likely” or “unlikely,” but it bugs me when people start stating things as absolute certainties.

Fantastic article! I’m greatly interested in projection systems; from how they’re built, to the math behind it. I was wondering if you could point a beginner in the right direction to begin finding out more in depth info? Maybe some books or something along those lines that I could look into. Eventually I hope to even get into coding a bit.

Fascinating read. Thanks for sharing your thoughts. As a quantitative researcher by trade and fantasy baseball player by hobby, I’ve considered dabbling in my own projection system. I will now stop considering that and just use ZiPS.

Great article. Thank you for sharing your insights.
Some questions about projections:
1. Playing time. Are all aggregate counting stats (e.g., HR, SB, R, RBI, etc) dependent on the accuracy of your playing time projection?

2. Is it possible to include some kind of uncertainty statistic at the player level or the stat level? That is, if certain kinds of players, like rookies for example, are really hard to get right, why not include an estimated accuracy based on prior year(s) success or lack thereof? Similarly, if, for example, batting average is significantly more difficult to predict than, say, home runs, would including a confidence interval or some other type of uncertainty statistic be possible?

3. Are you suggesting in your point about age that it has a non-linear effect on performance? Do you model it that way?

I just wanted to say that this was a terrific article. A little peek behind the curtain at one of the guys who’s producing the numbers. I will never frown skeptically at someone’s ZiPS projection quite the same way again.

It’s not that historical data sucks; it’s that back then they didn’t realize they needed to collect all the data that is needed now. Not only that, even now there is more data to collect that has not been collected on a consistent basis. This is a matter of education over time. You don’t know what you don’t know. This is the age of big data, and which data points are collected, and what they are worth, is in the eye of the beholder. It depends upon one’s perspective, the purposes the data are put to, and one’s analytical viewpoint.