Overthinking It

The Players PECOTA Has Missed

Among the things a Baseball Prospectus subscriber might like to know, as we approach the midway point of the season, are the names of the players who’ve roundly beaten (or fallen short of) their preseason PECOTA projections, and the names of the players who will continue to do so. The first list of names is much easier to provide than the second. In Russell Carleton’s article today, he alludes to some relevant research by Mitchel Lichtman, who recently studied the subject of breakouts. Here’s how Russell explains what Lichtman did:

He identified hitters who had significantly outperformed their projections in April and then looked to see how well they did from May to September. He found that as a group, their subsequent performance was much closer to their projection than it was to their early-season hot streak, which held true even if he looked at longer stretches of overperformance to start the year. He found roughly the same for hitters who underperformed relative to their projections, and then found roughly the same for pitchers. His conclusion: Don’t get too wrapped up in an early season hot or cold streak. The player will most likely regress.

As Russell observes, while Lichtman’s findings apply to whole populations of players, certain individuals are able to continue to defy their projections for reasons other than random variation:

If we made a list of players who have exceeded expectations this year (pick whatever definition of that you want), most will probably revert to form, but some really are emerging from their chrysalis and have become beautiful butterflies. Let’s say that 10 percent of them are real breakouts (just picking a number). Saying “small-sample fluke” all the time will be correct 90 percent of the time. And only minimally useful.
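To put rough numbers on Russell's point, here is a minimal sketch of the arithmetic, assuming his admittedly made-up 10 percent breakout rate and a hypothetical list of 100 overperformers. The function name and the list size are illustrative assumptions, not anything from his article.

```python
def always_fluke_scorecard(n_players, breakout_rate):
    """Score a forecaster who labels every overperformer a small-sample fluke.

    Both inputs are illustrative assumptions: the 10 percent rate is the
    number Russell says he is "just picking", and n_players is arbitrary.
    """
    real_breakouts = round(n_players * breakout_rate)
    flukes = n_players - real_breakouts
    # Every "fluke" call on an actual fluke is correct; every real breakout is missed.
    accuracy = flukes / n_players
    breakouts_found = 0  # this forecaster never calls anyone a breakout
    return accuracy, breakouts_found, real_breakouts


accuracy, found, real = always_fluke_scorecard(100, 0.10)
print(f"accuracy: {accuracy:.0%}, breakouts identified: {found} of {real}")
```

The forecaster is right nine times out of ten and still identifies none of the players anyone actually wants to find, which is the "only minimally useful" part.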

The trick is to distinguish between the players who have achieved a true talent level significantly higher or lower than their updated projections indicate, and the players whose hot or cold streaks are mirages. Have I proved that I can do so consistently? Nope. Is there any reason to think that I can? Not really! But in the interest of diverting you from more tiresome concerns for a few minutes, I’ll give it my best shot regardless.

The following are the hitters (min. 200 PA) and pitchers (min. 50 IP) who have made PECOTA look silly so far (overperformers first, followed by underperformers). For each one, I’ll list the player’s actual True Average or ERA (through Tuesday), along with his preseason PECOTA and his rest-of-season PECOTA (which is updated daily and displayed on his player cards). Then I’ll tell you whether I’m taking the over or the under on that rest-of-season projection, with a brief explanation of my pick. At the end of the year, we’ll see how bad at this I am, or more accurately, how bad at this I was in one inconclusive sample. (If you like listening to things, you can hear me and Sam Miller performing much the same exercise on today’s episode of Effectively Wild.)
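For the curious, the selection step for hitters can be sketched in a few lines. This is a rough illustration under stated assumptions, not the actual query behind the lists: the `HitterLine` record, the `biggest_misses` function, and the sample rows are all hypothetical, and the numbers are made-up placeholders rather than real players or real PECOTA projections. Only the 200 PA minimum and the idea of a difference between actual and preseason True Average come from the article. Pitchers would need the sign flipped, since a lower ERA is better.

```python
from dataclasses import dataclass


@dataclass
class HitterLine:
    name: str
    pa: int               # plate appearances to date
    actual_tav: float     # True Average to date
    preseason_tav: float  # preseason projected True Average


MIN_PA = 200  # hitter threshold stated in the article


def biggest_misses(hitters, min_pa=MIN_PA):
    """Return (name, diff) pairs for qualified hitters.

    diff is actual minus preseason projection, sorted so the biggest
    overperformers come first and the biggest underperformers last.
    """
    qualified = [h for h in hitters if h.pa >= min_pa]
    diffs = [(h.name, round(h.actual_tav - h.preseason_tav, 3)) for h in qualified]
    return sorted(diffs, key=lambda pair: pair[1], reverse=True)


# Made-up placeholder rows, not real players or real projections.
sample = [
    HitterLine("Hitter A", 310, 0.320, 0.270),
    HitterLine("Hitter B", 150, 0.350, 0.260),  # dropped: under 200 PA
    HitterLine("Hitter C", 280, 0.230, 0.285),
]

ranked = biggest_misses(sample)
for name, diff in ranked:
    print(f"{name}: {diff:+.3f}")
```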


Watching Valbuena pretty much every day, I think the BABIP might not be too much of an issue going forward. Almost every at bat he takes is solid at a minimum, and I believe his line drive % is pretty high. I could see him easily putting up a .260/.350/.440 line for the rest of the season and really giving the Cubs a long-term option at 2B - depending on how Alcantara advances and where Baez ends up.

Do the Carlos Gomez and Lincecum ROS projections give credence to the view that PECOTA places too much weight on older data? Have there been backtests done to verify how much older data to consider and how much to weight each year?

Leave it to Jean Segura (http://mlb.mlb.com/news/article.jsp?ymd=20130420&content_id=45334346&c_id=chc)...

It appears there was a typo, with Segura's value being entered along with Mauer's in his slot: ".306.260" - the values after Mauer were all bumped up one player, so the DIFFs were different. They've been shifted down a row now, and DIFFs should be okay.