What’s in a projection system?

You might have noticed throughout our 2010 season preview series that we’ve been combining various projection systems at the end of each article. Each system handles projections differently, so we wanted to get an average, just to see if we can weed out some more bias. Yet the same caveat still applies to the averaged projection: it’s just a projection. It’s not saying that this is how Player A will hit in 2010. It’s saying that, based on the methodology, this is the best idea we have of the player’s potential production.
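To make the averaging step concrete, here is a minimal sketch of one way to combine several systems into a single line. The system names and slash lines below are made-up placeholders, not real projections, and a plain unweighted mean is an assumption about the method rather than a description of any published consensus.

```python
from statistics import mean

# Hypothetical slash lines (AVG, OBP, SLG) for one player from three
# made-up systems. These are placeholders, not real projections.
projections = {
    "system_a": (0.290, 0.360, 0.450),
    "system_b": (0.280, 0.350, 0.430),
    "system_c": (0.300, 0.370, 0.470),
}

def average_projection(systems):
    """Average each rate stat across systems, rounded to three decimals."""
    columns = zip(*systems.values())  # regroup by stat: all AVGs, all OBPs, all SLGs
    return tuple(round(mean(col), 3) for col in columns)

consensus = average_projection(projections)
print("AVG/OBP/SLG: %.3f/%.3f/%.3f" % consensus)
```

A straight mean treats every system as equally trustworthy and ignores projected playing time. That keeps the sketch simple, but a more careful blend would probably weight by plate appearances.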

As John Sterling often says, you just can’t predict baseball. There are so many moving parts, so many variables, so many unknowns that predictions simply cannot take them all into account. You can compensate for the unknown, but you can’t factor it into predictions and projections. Thankfully, projection systems aren’t trying to predict anything. Instead, they’re taking the available data and putting it through a process which outputs its best idea of a player’s future performance. But, because of all the factors it cannot consider, these projections are often inaccurate.

Then why have them? Because it’s better than assuming a player will repeat his numbers from the previous season. Few players put up the same numbers year after year. Production fluctuates. Players get unlucky and players go on hot streaks. A pitcher can throw a perfect game and then allow five runs in his next start. Projection systems try to smooth this out, taking all available data and processing it in order to give us an idea of a player’s next-year production.

Projection systems have their biases, too. PECOTA, for example, hammers older players. That does make a degree of sense, because production tends to decline as a player ages. Not every player, though, declines along the same path. So when PECOTA projects a rough year from Jorge Posada, it’s just using the data available as it relates to the player, a 38-year-old catcher. That doesn’t mean Jorge will necessarily decline along the same lines.

All this is to back up Rob Neyer’s disbelief in the projections for a few Yankees veterans. There’s an article, written by a notorious pot-stirrer, that basically says, “here are the PECOTA numbers, the Yankees could be in trouble.” It’s pretty benign, really. PECOTA projections have been out since late January, so we’ve all had a chance to look them over (or at least those of us with a BP subscription). We know that other projection systems aren’t bullish on the veteran Yankees. Yet I’m with Neyer when he says:

I don’t know enough about the guts of PECOTA to rail against it. Instead I’ll just say that I don’t believe that Jeter is going to steal 10 bases this season, and that I don’t believe Mariano Rivera will save only 22 games. I will say, too, that if your system says those things, it’s probably worth checking under the hood just in case one of the belts is running a little loose.

In other words, projection systems use general principles to project individual players. While there’s certainly merit in the exercise, it sometimes can’t nail down the outliers. That might be what’s at play here.

In a sport where the more info you accumulate, the better you can infer things, I love the fact that there are multiple projection systems out there. The fact that it is impossible to predict a player, especially based only on past performance, is irrelevant. It is a guide to go with when making decisions. One small part of many.

http://mystiqueandaura.com Steve H

You’ve done it now………

Stephen in Iguazu

so you want mryankee to come out then?

http://mystiqueandaura.com Steve H

Thankfully, projection systems aren’t trying to predict anything.

Repeated for extreme emphasis, pre-emptively.

Chris

Everyone says that, but that’s just a cop out. Projection systems do try to predict things. They’re not perfect, but they’re still better than the alternative.

http://mystiqueandaura.com Steve H

I’m repeating it more in the sense that it’s completely data related. It’s not “I don’t like the Yankees, so I’m predicting regression for Mariano Rivera.” Rivera likely has been predicted for regression for the last 5 years, and they don’t adjust based on him, he’s the outlier.

To the model, Mariano Rivera is not a pitcher for the Yankees, he’s just another faceless number in their system.

http://www.pinstripepalace.blogspot.com Brien Jackson

I think Mo is basically the exception that proves the rule; projection programs do a very good job given a large amount of data to work with. They don’t necessarily do so well with extremely unique cases like Mo.

http://mystiqueandaura.com Steve H

Agreed. You can look at about 95% of Hall of Famers and say the same thing, the absolute best of the best will not fall within the normal data. That’s why they are the best of the best.

This is something that has always (mildly) bothered me: a projection system that compares modern-day players, who are massively ahead of old-time players in training, diet, medical care, etc., seems to me to have a bit of a built-in incorrect bias (using “bias” in the statistical sense).

Just think about how players used to have to work day jobs in the off-season and came to camp out of shape, versus the modern-day player who takes a few weeks off in November, but otherwise works out all year long.

Although there may be a correcting bias from the steroid years, I fear it’s too small to make a difference to my point: modern players have the potential to last several more productive years than their past peers, which should skew these projection systems.

Has anyone looked at this in a scientific way? Is this accounted for in a way I am unaware of?

Chris

Does anyone know the players used in the projections in order to estimate the performance for older players? Using players from past decades/eras does not seem reliable given the better training, fitness level, and sports medicine in practice today. I’m not saying they have no relevance, but players today can lengthen their careers through better prevention, treatment, nutrition, and training.

Wouldn’t comparison of actual outcomes vs previous projections be useful in judging the value of these projections in general?
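As a rough sketch of what that comparison could look like: score a system by its mean absolute error against what actually happened, then score the naive "he'll repeat last year" guess the same way. Every number below is a hypothetical placeholder, and mean absolute error is just one reasonable choice of yardstick, not the way any particular system is officially graded.

```python
def mean_absolute_error(projected, actual):
    """Average absolute gap between projected and actual values."""
    if len(projected) != len(actual):
        raise ValueError("need one actual value per projection")
    return sum(abs(p - a) for p, a in zip(projected, actual)) / len(projected)

# Hypothetical OBP figures for four players (placeholders, not real data).
projected_obp = [0.350, 0.330, 0.360, 0.340]
actual_obp = [0.370, 0.320, 0.330, 0.340]

# Naive baseline: assume each player repeats last season (also made up).
last_year_obp = [0.320, 0.360, 0.380, 0.310]

system_error = mean_absolute_error(projected_obp, actual_obp)
naive_error = mean_absolute_error(last_year_obp, actual_obp)
print(f"projection MAE: {system_error:.3f}, repeat-last-year MAE: {naive_error:.3f}")
```

With invented inputs the output proves nothing about any real system. The point is only the shape of the test: same players, same stat, projection versus baseline.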

In reality, I’m not sure how much a projection system can tell you if it’s just calculated on a stand-alone basis vs. similar players in the past. The value would be increased, imo, if the situation is calculated. For instance, a given player is going to do better, I’d guess, if he has 2 batters in front of him who see 12 pitches, rather than 3 or 4. Nick Johnson, for example, probably will see more fastballs than normal, with Tex & ARod behind him. So that’s going to factor into his contact rates, etc.

Accent Shallow

What you’re describing is far more intensive than the current projection systems, which is fine. Unfortunately, the more moving parts in a system, the more that can go wrong. Simply contrast the Marcels (which are a weighting of a player’s three previous seasons) with PECOTA (which uses comparables, ages, etc etc) — the results are closer than you would think.
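For a sense of how little machinery that kind of weighting needs, here is a minimal sketch. The 5/4/3 weights (most recent season first) are the ones usually quoted for Marcel hitters, but treat them as an assumption here. As I understand it, the full method also regresses toward league average and adjusts for age, both of which this sketch leaves out. The OBP values are made up.

```python
def weighted_three_year(rates, weights=(5, 4, 3)):
    """Weighted average of three seasons, most recent season first."""
    if len(rates) != len(weights):
        raise ValueError("need one rate per weight")
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)

# Hypothetical OBPs for the last three seasons, most recent first.
recent_obp = [0.340, 0.370, 0.360]
projected_obp = weighted_three_year(recent_obp)
print(round(projected_obp, 3))
```

Weighting raw rates like this ignores how many plate appearances each season had. A closer approximation would weight the underlying counting stats instead.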

http://mystiqueandaura.com Steve H

Until we can get projections for those instances, I just don’t see how we’re going to get the game played only on spreadsheets. ;)

Accent Shallow

Neyer’s a pretty smart guy, for someone who likes flannel so much.

king of fruitless hypotheticals

That’s BS. I predicted baseball last year.

Ask any of my friends–told them all the Yanks were gonna win it.

Too easy.

Geek

If the projections systems had validity then handicapping horse races would be a snap.

Accent Shallow

Actually, I was looking at some neural net software awhile back as a new way to do projection systems, and one of the major application examples was handicapping horse racing.

So maybe it is, and you’re right ;)

WIlliam

Found this touring the interweb.

PECOTA projected the following for Jeter and Posada in *2009*:
Jeter: .288/.353/.383
Posada: .249/.336/.406 (7HR and 33RBIs)

‘Nuff Said

/Wildly oversimplified and selective examples

WIlliam

But still, that does show that while PECOTA can be good for 30ish players, not everyone above 35 collapses, which brings PECOTA down.

Accent Shallow

But those look like reasonable projections after their weak 2008s.

Well, “reasonable” in that “if further decline occurs.” Since both players were playing through injury, it’s not unreasonable to expect them to bounce back after either surgery (Posada) or rest (Jeter).

Since PECOTA doesn’t know that the poor seasons are due to injury, it’ll just think that this is the end of the line.

WIlliam

Posasa didn’t decline that much that a sib-2ho avg and 6 homers were expected.

WIlliam

noot 2ho, 250

WIlliam

wow, spelling fail today

http://Nytimes.com Gardimentary

PECOTA is useless trash.

Instead of this geek crap, just use what you know of the game and it should be pretty easy to determine that Mariano will save 40 games.