Overthinking It

PECOTA Through the Looking-Glass

As backronyms go, “Player Empirical Comparison and Optimization Test Algorithm” is squarely on the intimidating side. That’s why it’s somewhat comforting that our BP-branded projection system also has a human face, even if that face couldn’t muster a very convincing mustache. For those who’ve joined us late, PECOTA’s name is jack-of-all-forecasts Nate Silver’s nod to Bill Pecota, a light-hitting utility infielder of the 1980s and ’90s whose scrappy play earned him entrance to even the most hardened of baseball analysts’ hearts, despite questionable artistic taste and the kind of stat lines that normally invite ridicule from the sabermetric set (especially when they’re associated with someone wearing a Kansas City uniform).

Odes to Pecota the player have already been written, so I’ll refrain from presenting a complete history, but we all know the type. Pecota played as many positions as he did seasons, putting Willie Bloomquist to shame in terms of positional flexibility (if not value) by spending time everywhere on the diamond that it’s possible to appear, with the possible exception—at least in some seasons—of the basepaths. Although he retired a year before the offseason that produced our first annual, robbing us of the ability to put something snarky about him on paper, we likely would have pointed to his utter lack of patience and power as compelling reasons not to employ him. Still, with 21st-century hindsight, we can say with some confidence that Pecota’s glove may have made him worth playing at times, since our new-and-improved implementation of FRAA gives him credit for 36 runs in the field, albeit with a margin of error of 23.

That said, it’s hard to deny that this action shot perfectly captures Pecota in his offensive element; all too often, his swings came up empty. PECOTA’s Wikipedia page tells us, “The acronym was actually based on the name of journeyman major league player Bill Pecota, who with a lifetime batting average of .249 is perhaps representative of the typical PECOTA entry.” That first clause is undeniably true; the second seems like supposition, especially with no citation to support it. Still, there’s something to that notion: While PECOTA wouldn’t have lasted as long as it has if it spit out Bill Pecota–like batting lines for Albert Pujols, on occasion its output comes closer to that lowly baseline than our instinctive (and often overly optimistic) internal algorithms might lead us to expect.

PECOTA’s pesky insistence on regressing to the mean would have made light of even its namesake’s occasional achievements had the system been around during his playing days. In Colin Wyers’ introductory article yesterday, he mentioned a set of simplified PECOTAs that had been used to assess the system’s historical performance. I asked him to pluck Bill Pecota’s retrojections out of that pile for me, in hopes of making a point. Take a look at a five-year snapshot of Pecota’s career, with 1992 retrojection included:

Like Michael Dukakis, Pecota tanked in 1988; the following season, he showed a bit of pop for the first and only time (in an extremely limited sample), and in 1990, he managed a league-average performance. In 1991, Pecota got off to a solid start and was unexpectedly installed as the Royals’ starting third baseman (over Kevin Seitzer, a far superior player) by manager Hal McRae, K.C.’s third skipper that season. He responded by posting career highs in every offensive category. Following that performance, it might have been tempting to forecast more of the same from the erstwhile utility man: Not only was he riding a three-year trend of improvement, but he’d undergone a change in role that could conceivably have boosted his confidence and better suited his game.

In retrospect, PECOTA wasn’t buying it, quite reasonably expressing some pessimism about the likelihood of a marginally talented player on the wrong side of 30 having established a new performance plateau. As it turns out, Pecota outstripped the projected decline, sinking all the way back to near-1988 levels of futility. As Pecota’s Icarus-like 1991 season suggests, it’s easy to convince ourselves that we know more about a player than a projection system without a mind of its own, especially when we have some personal investment in his success, and (as is often the case) subjective factors can be enlisted to bolster our argument.

Sure, maybe his manager’s vote of confidence or a more prominent role had spurred Pecota to new on-field heights; maybe his exodus from the only organization he’d ever known after the ’91 season contributed to his decline. Such things are impossible to rule out, but equally impossible to prove: Even if we came to some empirical conclusion about the typical response to a change in scenery or role, we couldn’t be sure that it would apply to any particular player. Given the difficulties involved, it may seem like the only responsible thing to do is to make like PECOTA and ignore the human element, but it’s hard (and perhaps ill-advised) to disregard unquantifiable factors entirely.

Last Friday, Tommy Bennett penned a paean to the period of anticipation before a PECOTA release, when anything still seems possible and no cold, hard numbers have yet come forward to deflate our fondest hopes for certain players. The truth is that sometimes, much as we may value its point of view, PECOTA can be a bit of a buzzkill, the equivalent of the guy at the party who brings the proceedings to a screeching halt by asking the host to turn down the music, if not the neighbor who calls the cops. Regardless of the percentage of satisfied customers, each release is accompanied by at least a few assertions that we’ve underestimated certain players or shortchanged the efforts of the league as a whole, and this year has been no exception. In some past cases, we’ve found that we had overlooked something and that there were indeed tweaks to be made, which is why we welcome your feedback; it’s much like the internal testing we conduct prior to publication, but on a much larger scale. In others, though, PECOTA is simply doing what PECOTA does, with little regard for our feelings.

If you scan the PECOTA leaderboards of years past, you’ll notice that regardless of category, PECOTA injects statistical lithium into the coming year’s totals. For instance, here’s what PECOTA had to predict about the most prolific home-run hitters of the past three years:

Even though PECOTA underwent tune-ups under the hood in each of these years (if not quite as drastic as this offseason’s engine overhaul), the system stayed consistent in calling for 13 players per season to reach the 30-HR plateau. The ranks of the actual 30-HR clubs were considerably more swollen, although an offensive drought last season made things fairly close. We can see the same effect at work when looking at the best projected and actual individual home-run performances:

As Tommy pointed out, PECOTA’s conservative figures don’t mean that outliers won’t appear; they simply mean that PECOTA can’t see them coming. (In some senses, that’s a blessing, since it makes the season more interesting and saves us the trouble of freshening up Colin’s nutrient bath.) The numbers you see in the spreadsheets are weighted-mean projections, which PECOTA considers the most likely outcomes. We’ll soon be presenting percentile forecasts that offer some idea of a player’s chances of exceeding (or failing to meet) his baseline projection, but even then, we won’t be able to tell you with certainty which players will defy PECOTA’s expectations by laying waste to the league.

In some cases, players get lucky. In others, they simply cease to be unlucky, and in still others, their true talent level takes an unanticipated step forward. Once those seemingly anomalous seasons take place, PECOTA incorporates them into its projections for the following year and revises its estimates upward, but rarely anticipates a repeat performance, barring a favorable spot on the aging curve.

That doesn’t stop us from identifying players whom PECOTA might like more than the prevailing opinion, but where does it leave us with a few of this year’s trickiest test cases? Take Javier Vazquez (please). As someone whose ERA has routinely failed to match his peripherals (or more accurately, the peripherals we generally expect to predict ERA), Vazquez has come with plenty of baggage even at the best of times. Nonetheless, after Vazquez bought another ticket out of New York with an abysmal performance last season, PECOTA foresees a rebound to a sub-4.00 ERA and a healthy strikeout rate in Florida. Meanwhile, 2010 super-slugger Jose Bautista is projected to shed nearly half of his homers (which would still leave him with his second-best season to date).

In the case of each player, we can do more than simply throw up our hands and attribute last year's surprising performance to divine dice rolls. Vazquez experienced a sizeable velocity drop (whose effects can be quantified); Bautista made well-publicized changes to his stance and swing. PECOTA doesn’t know those things, but you and I do, even though we might not know their precise significance. Given the increasing granularity of baseball data capture, perhaps the passage of time and future additions to PECOTA’s code will make it possible to adjust the forecasts not only according to what numbers were produced, but to a greater degree, how they were produced. For now, feel free to indulge your inner PECOTAs, but remember to forecast responsibly.

Shouldn't you be able to aggregate the individual percentile forecasts to come up with better estimates for these expected leaderboards? The problem, as you have laid out, is that aggregating the PECOTA means implicitly assumes no variance, so it will always underestimate the tails (which is what any leaderboard represents). But the most differentiated attribute of PECOTA is that it forecasts variance, so BP is really in a unique position to make leaderboard forecasts. Maybe you don't think it's worth the time, but it would make for an interesting article. If nothing else it would address the "PECOTA thinks the HR leader will only hit 40 HR - that's crazy!" comments, which aren't accurate (PECOTA would say it is 95% confident that nobody will hit more than X HR, and X would be a much higher number than the highest mean projection). Just a thought.

I'm not sure I understand what you're looking for here. I mean, I could look at the leader board for the past five seasons and figure out a decent estimate for the HR hit by whoever happens to lead the league in HR. And I could then look at PECOTA forecasts and go "these are the 10 guys who are most likely to be that guy." But there's simply too much variance in baseball to be able to predict which of those ten guys is going to be THE guy and assign the league-leading forecast to him.

Basically, if the weighted mean for 20 players is 29 HR, PECOTA isn't saying that none of these 20 players will hit 30+ HR. It is saying (more or less) that it would take the under on each of these players individually. But if 10% of the time the HR prediction is exactly right, 45% of the time it is too high, and 45% of the time it is too low (for sake of argument), then the 20 players projected at 29 HR really means that PECOTA expects 9 of those 20 players to hit 30+ HR. It just thinks we can't know which 9.
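To make that arithmetic concrete, here is a minimal sketch. The 20 hitters and the 45% figure are the for-the-sake-of-argument numbers from the comment above, not anything PECOTA publishes, and treating each hitter's outcome as independent of the others is an added assumption.

```python
# Hypothetical inputs taken from the comment above, not from PECOTA itself.
n_players = 20      # hitters who share a 29-HR weighted mean
p_exceeds = 0.45    # assumed chance that a given hitter beats his projection

# Expected number of those hitters who finish with 30+ HR.
expected_30_plus = n_players * p_exceeds

# Chance that not one of them gets there, assuming independent outcomes.
p_none = (1 - p_exceeds) ** n_players

print(f"expected 30+ HR hitters: {expected_30_plus:.0f}")
print(f"chance that none reach 30: {p_none:.6f}")
```

Under those assumptions the expected count is 9, and the chance that all 20 fall short is well under one in ten thousand, even though no individual projection reaches 30.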

If you take the top 10 guys PECOTA predicts, for example, on average one of them should hit his 90th-percentile number. So if the smallest 90th-percentile number from the top 10 guys (say 48 HR) is bigger than the highest weighted-mean number (say 45 HR), then even though PECOTA has no one projected to a weighted mean of more than 45, it may actually be predicting that the major-league leader has 48.
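A quick check of the "one in ten" reasoning, as a sketch only: it assumes each hitter has exactly a 10% chance of reaching his own 90th-percentile total (which is what the percentile means if the forecasts are well calibrated) and that the ten outcomes are independent.

```python
n_hitters = 10
p_top_decile = 0.10   # chance a hitter reaches his own 90th percentile, by definition

# Expected number of the ten who reach their 90th-percentile total.
expected_hits = n_hitters * p_top_decile

# Chance that at least one of the ten does so, assuming independence.
p_at_least_one = 1 - (1 - p_top_decile) ** n_hitters

print(f"expected hitters at or above 90th percentile: {expected_hits:.1f}")
print(f"chance at least one gets there: {p_at_least_one:.1%}")
```

The expected count is one, and the chance that at least one of the ten gets there comes out to about 65 percent, so a league leader above every weighted mean is the likely outcome rather than a surprise.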

In other words, simulate the season based on the PECOTA distributions and note things like number of players with 30+ HR and most HR and then publish those numbers.
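A rough sketch of what such a simulation could look like. Everything specific here is an assumption made for illustration: the function name `simulate_leaderboard`, the normal distribution around each weighted mean, the spread of 8 HR, and the independence of players. PECOTA's actual percentile distributions would replace the normal draw.

```python
import random

def simulate_leaderboard(mean_hr, sd=8.0, threshold=30, n_seasons=10_000, seed=0):
    """Simulate many seasons from per-player mean HR projections.

    Returns (average number of players at or above `threshold`,
             average league-leading HR total).
    The normal spread `sd` is a stand-in for real percentile forecasts.
    """
    rng = random.Random(seed)
    total_over = 0
    total_max = 0
    for _ in range(n_seasons):
        # Draw one HR total per player; totals cannot go below zero.
        season = [max(0, round(rng.gauss(m, sd))) for m in mean_hr]
        total_over += sum(hr >= threshold for hr in season)
        total_max += max(season)
    return total_over / n_seasons, total_max / n_seasons

# Hypothetical league slice: 20 hitters who all carry a 29-HR weighted mean.
means = [29] * 20
projected_30_plus = sum(m >= 30 for m in means)   # what the means alone imply
avg_30_plus, avg_leader = simulate_leaderboard(means)

print(f"30+ HR hitters implied by the means alone: {projected_30_plus}")
print(f"average 30+ HR hitters per simulated season: {avg_30_plus:.1f}")
print(f"average league-leading total: {avg_leader:.1f}")
```

Counting from the means alone gives zero 30-HR hitters, while the simulated seasons produce several per year and a leader well above 29. That gap is the apples-to-oranges effect in comparing mean projections with actual leaderboards.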

This is helpful. I'm still struggling with this question: what does the reduction in the number of hitters with high "breakout" rates indicate? I recall a year in which Adam Dunn had something like a 44% chance of improving by more than 20% over his 3-year performance. It doesn't concern me that the weighted means might be low. It concerns me that indicators of potential outliers seem to be much more limited than in the past.

Something that would make for awesome reading and fun analysis would be a series of articles about the 90's and the 10's - players who surpassed the 90 percentile projection or failed to hit the 10% line in the previous season. Look for where the dividing line between fluke and breakout is, and likewise whether it is injury, old age or simply sucking at baseball at the other end.

Reply button isn't working for me, unfortunately. But Colin, I realize that it would be a little abstract in that your leaderboard projections wouldn't be tied to specific players. In fact they wouldn't really be leaderboards at all, but they'd be exactly the charts in the article, only on an apples-to-apples basis (the charts in the article are apples vs. oranges b/c you are comparing forecasts w/ variance = 0 vs. actuals w/ realized variance; just clarifying my point as I realize that you, of all people, don't need that explained to you). I just think a lot of people's first reaction when looking at PECOTA is that it's pessimistic, and such an analysis would do a good job highlighting that it's not, while also being an interesting check on PECOTA's percentile forecasts (by virtue of comparing the expected league-wide distribution to past results). So it's not really that I would hope to have a better forecast than the next guy on how many guys will hit more than 40 HR (unless someone wants to take a prop bet :-)), but just that it could help illuminate the PECOTA forecasts for people who don't understand this issue and be a fun article to read for those that do.

This is my first look at PECOTA analysis and I have to admit that I am a bit confused. How does this take into account a breakout year based on increased playing time or a new situation? For example, this year's PECOTA lists Neftali Feliz as accounting for 5 W, 3 L and 6 S. Is that because of the uncertainty of him being closer, or is that because up until last year he had zero closer statistics to base the projections on? It's a similar situation with CJ Wilson, who is projected to save 9 games this year. I see no situation in which that happens; where does that projection come from?

Hi Daren, PECOTA just does the projections based on what similar players have done. PECOTA doesn't "know" that Feliz is most likely going to stay the closer, so sometimes for things like saves the numbers are a bit wonky. The depth charts would be a better place to check out what the expected saves totals will be for Feliz and other players.