Overthinking It

Ranking Rivera

Recently, Mariano Rivera revealed that 2013 would be his final season. It wasn’t unexpected news, in that Rivera is 43 years old and coming off a serious injury that caused him to consider retirement in 2012. But the report, however predictable, hit many fans hard. Not only is Rivera respected and beloved both inside and outside of New York (a relative rarity for a big, bad Yankee), but he’s shown so little erosion in his skills that it’s possible to picture him throwing his cutter until he turns 50. Most players go through a decline phase, which gives us time to get used to the idea that it’s about to be over. Rivera really hasn’t, except in the sense that he’s less durable than he once was.

Rivera’s announcement inspired many written responses, one of which was an email to me from a reader named David Greene. “Rivera’s true ranking among pitchers all-time,” the subject line said.

I can't get my arms around the idea that 60 (or so) starting pitchers in the history of baseball are "greater" than Rivera, as career WARP stats would say. … So maybe the real answer to my question is how many relievers relative to starters ought to be included in any all-time team of 25 or 30 players? Is that a question for analysis or only for opinion?

How good is Rivera, really? And is it possible to compare him to baseball’s best starters?

Last season, Aroldis Chapman was probably the best reliever in baseball. (“Craig Kimbrel” is also an acceptable answer.) PWARP put him at 2.6 wins, which made him the 27th-most-valuable pitcher in baseball by that metric. He was so good as a reliever, in fact, that he nearly placed out of the bullpen, briefly becoming a candidate to start this season.

In 1996, Mariano Rivera was worth almost twice what Chapman was last year. That season, AL pitchers allowed 1.21 home runs per nine innings, the highest rate ever. Rivera allowed one home run in 107 2/3 innings, the lowest rate of any AL pitcher in the DH era. He posted the highest strikeout rate and the lowest FIP of his career, and he pitched almost 30 more innings than he has in any season since. Then he added 14 scoreless frames in the postseason. If you count those October innings, Rivera’s 1996 was the most valuable season ever by a pitcher who didn’t make a single start.*

*If you don’t count them, it was the second-best such season, behind Dick Radatz’ 1964. That year, Radatz, who’d converted to the bullpen after experiencing a sore arm as a starter, pitched 157 innings, all in relief, with a 2.29 ERA and 10.4 strikeouts per nine innings (more than one strikeout per nine higher than any other pitcher who pitched at least 150 innings in ’64). Radatz’ 1962-1964 campaigns were three of the six most valuable relief seasons ever, worth a combined 13.2 wins. But like most successful relievers, he burned out after a few years. Supposedly, an attempt to add a sinker in 1965 irreparably altered his mechanics, hobbling his fastball. It’s always something. Well, almost always. We’re getting to that.

Rivera hasn’t come close to equaling the value of his 1996 season since. Here are the zero-start pitchers with the highest WARPs in each of the past 16 seasons:

According to WARP, Rivera wasn’t the most valuable reliever in baseball in any season from 1997-2012 (although he tied Lidge to the first decimal place in 2008). Granted, FRA (and, consequently, WARP) can be a bit myopic when it comes to pitchers who reliably allow low BABIPs, as Rivera does. And WARP treats a scoreless ninth the same as a scoreless first, so it gives Rivera no extra credit for his high-leverage outings. So Rivera may have been, and probably was, better than the next guy in at least a couple of those seasons, October aside. But there was no uncertainty in 1996: Rivera was leaps and bounds better than the next guy. His brilliance since has been less about being the absolute best in any one season than it has been about never, ever being bad.

That’s not really Rivera’s fault. After the 1996 season, Yankees closer John Wetteland became a free agent and signed with Texas, and Joe Torre decided that he’d rather Rivera record saves than outs. Rivera never pitched more than 80 2/3 innings (in the regular season) again. In retrospect, 1996 is especially tantalizing: imagine how different bullpens might look today had Rivera remained in the 100-inning relief ace role and inspired copycats rather than staying behind the bullpen door in non-save situations. And imagine what he might have been worth to his team: he would have been immensely valuable as long as he sustained his success. But that’s the thing: we don’t know whether he would have lasted this long had he not been coddled as closer. In the ’96 Annual, we said Rivera looked “way too skinny to be durable,” and we probably weren’t the only ones who thought he’d have trouble repeating that season’s performance.

*Another interesting fact about the 1997 season, when Rivera became closer: that was also the year when he discovered his cutter. Which means that his best season—just about the best season—came without the cutter, the signature, semi-mystical pitch that gets the bulk of the credit for his success. If only Mariano Rivera hadn’t been cursed with a cutter!

Pure speculation: if he’d never come up with the cutter, Rivera might have had higher strikeout rates and been just as good or better for at least a few years. But no way would he still be pitching, let alone counted on to close, at age 43. His control could have compensated for some velocity loss, but the strain of throwing thousands of pitches harder than he had to with the cutter would have taken its toll.

Given the sample sizes involved in a typical relief season, you’d think Rivera would have had at least one year when a bunch of bloopers fell in, or a few extra balls flew out of the park, and his superficial stats suffered. But he hasn’t, except (sort of) for 2007, when he had a .322 BABIP and finished with the worst full-season ERA of his career: 3.15. (Jeff Reardon, Troy Percival, and Randy Myers, who rank seventh, eighth, and ninth, respectively, on the career saves leaderboard, all have career ERAs higher than Rivera’s ERA in his worst season.) Stephen Loftus took a look at Rivera’s consistency last week at Beyond the Box Score and found that no matter what metric he used to assess it, Rivera’s performance was less variable than anyone else’s. Great relievers come and go, but only Rivera endures.*

*That said, Rivera's retirement isn't a reason for Yankees fans to despair, given how sparingly he's been used in recent seasons. While it might be impossible to replace Rivera with a reliever as good, it's not that hard to find someone who can keep the fallout from his absence to a minimum, as Rafael Soriano did last season. Earlier this month, The Star-Ledger's Steve Politi wrote that "it is hard to comprehend a bigger challenge in sports history" than replacing Rivera, which seems a bit extreme, especially with David Robertson already on the roster. If the best reliever in baseball is a two-to-three win player, how big can the dropoff from him to, say, the 15th-best reliever be? Replacing Robinson Cano would be a much bigger challenge.

Rivera’s career overlaps almost perfectly with the 18-edition run of the BP Annual. It’s fascinating, now, to look back and see what we said before it became clear that Rivera was an outlier, not just a guy on a good run. Here are a few excerpts from his player comments, starting with Baseball Prospectus 1998—which, remember, was published after Rivera had followed the best relief season ever with another excellent one.

1998: “I think he needs a better second pitch and more work. … Without one, I think he’ll decline further this year.”

1999: “Something very bad is happening here. … I think Rivera is experiencing a loss in effectiveness, one that is going to start showing up on the scoreboard this season.”

2000: “…Can a pitcher survive on one pitch if that pitch is perhaps the best in baseball? I am still skeptical…”

2001: “The career paths of many top closers have included a high, relatively brief peak not unlike Rivera’s last five seasons. … At this point, Mariano Rivera doesn’t look much different than Gregg Olson after 1993 or John Wetteland after 1998, and no one is offering those names up for immortality.”

It wasn’t until 2004, after Rivera had been in the bullpen for eight full seasons, that we felt comfortable enough to say “expect more of the same.” And by then, Rivera was 34, old enough for us to start worrying in subsequent comments about when his body would break down. When we look back, it seems like a given that the Yankees could count on 70 or so dominant regular-season innings from Rivera, year in and year out, and that the cutter would never wear thin. But consistency is kind of like clutch: some players might have it, and you might think you can tell which ones, but it’s tough to know for sure until a lot of time has elapsed. And at that point, it’s too late to act on the information.

Predictable or not, Rivera’s consistency has been worth quite a bit: he’s the only pitcher since 1950 with at least 30 career PWARP and fewer than 269 career starts. (Sorry, Goose; you don’t quite cut it.) But 57 starters, from Roger Clemens to Camilo Pascual, rank ahead of Rivera on that list. It’s not hard to see why: including October, Rivera has pitched a total of 1360 2/3 innings in his 18 seasons. Even for a latter-day ace like Justin Verlander, that’s six seasons’ worth of work. You’d have to give Rivera a ton of extra credit for the timing of his appearances to make up for that difference in workload.

Maybe it’s because he didn’t have the durability or the arsenal to handle a heavier workload; maybe it’s because the era in which he pitched (and the managers he played for) artificially imposed a cap on how much he could contribute. Regardless of the reasons, Rivera’s career value can’t compare to that of even a very good starter. Nor can his peak value: even in 1996, when Rivera was the best a reliever could be, his PWARP placed behind those of several starters. But David asked a different question: Does Rivera (or any reliever) deserve a spot on an all-time team?

Let’s say this team would look roughly like most teams do today, with a five-man rotation and a bullpen seven or eight strong. And let’s say we’d select the five best starters in history—Cy Young, Walter Johnson, Roger Clemens, Randy Johnson, and Greg Maddux, maybe—to reprise their roles. How would we want to stock the rest of the staff, knowing we need the rest of the pitchers to go only an inning or two at a time? Would we want Nolan Ryan and Tom Seaver setting up for Pedro Martinez, with Lefty Grove and Steve Carlton coming in to attack tough lefties? Or would there be room for some real relievers?

To answer this question, we’d need to know how well all those starters would have pitched in short bursts. And we can’t know that, not really; we never saw them do it, so it’s impossible to say with certainty how they’d respond. But we can take what happens to the typical pitcher making the transition to relief and apply it to the potential members of our all-time team bullpen.

In a 2006 two-part series on pitchers who moved from the bullpen to the rotation (and vice versa), Nate Silver laid out this rule of thumb for estimating post-conversion performance:

The typical pitcher will have an ERA about 25% higher when pitching in a starting role than when pitching in relief. That is, if you take a given reliever with a 3.00 ERA, your best guess, all else being equal, is that his ERA as a starter would be 3.75.
…
If you take a starter whom you know nothing else about, you can expect him to knock off about 25% of his ERA when he pitches in relief.

Rivera has a 2.21 career ERA, pitching in a hitter’s park in a good offensive era, in a difficult division and against the DH. Factor in his 11 earned runs in 141 postseason innings (against even tougher competition), and his career ERA falls to 2.06. If any starting pitcher could match that, it would be the one with the best park- and era-adjusted ERA other than Rivera’s (min. 1000 innings): Pedro Martinez, who, fortunately for us, happens to have pitched at essentially the same time.
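The postseason adjustment is a weighted average of the two samples, not a simple average of two ERAs, so it can help to see the arithmetic. The sketch below is illustrative only: it assumes 1219 2/3 regular-season innings (the 1360 2/3 career total minus the 141 postseason innings) and back-computes regular-season earned runs from the rounded 2.21 ERA, so the result is approximate. The function names are invented for this example.

```python
def era(earned_runs, innings):
    """Earned run average: earned runs allowed per nine innings."""
    return 9.0 * earned_runs / innings


def earned_runs_from_era(era_value, innings):
    """Invert the ERA formula to estimate earned runs.

    The estimate is only approximate when era_value has been rounded.
    """
    return era_value * innings / 9.0


# Figures quoted in the article. The regular-season innings total is an
# assumption: 1360 2/3 total innings minus 141 postseason innings.
regular_innings = 1360 + 2 / 3 - 141
regular_era = 2.21
postseason_innings = 141
postseason_earned_runs = 11

# Estimated, not official: back-computed from the rounded ERA.
regular_earned_runs = earned_runs_from_era(regular_era, regular_innings)

# Pool the runs and the innings, then recompute the rate.
combined_era = era(
    regular_earned_runs + postseason_earned_runs,
    regular_innings + postseason_innings,
)
print(f"Combined ERA: {combined_era:.3f}")
```

Because the inputs are rounded, the output lands within about a hundredth of a run of the 2.06 figure rather than matching it to the last digit.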

Pedro’s career ERA is 2.93. Cut a quarter from that, and you get 2.20—essentially identical to Rivera’s career mark in the regular season. But plug in both pitchers’ postseason innings, and the fact that Martinez pitched over half his innings in the NL, and Rivera retains a clear lead. Using Nate’s rule of thumb, then, we wouldn’t expect any starter in history to match Rivera’s career ERA in relief under identical conditions. And in that case, Rivera—and probably Rivera alone, among relievers—deserves to make the all-time team. Other consistently low-ERA relievers like Hoyt Wilhelm, Trevor Hoffman, and Dan Quisenberry don’t come close to competing with the best of the “converted” all-time-team starters. (Billy Wagner has the best case, among non-Rivera relievers, but he barely topped 900 innings.)
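Applying the rule of thumb is a one-line calculation in each direction. This is a sketch of the arithmetic only, not of Nate's underlying study; the constant and function names are made up for illustration, and the calculation ignores the park, league, and postseason adjustments discussed above.

```python
# Rule-of-thumb adjustment as quoted: about 25 percent in either direction.
ROLE_ADJUSTMENT = 0.25


def reliever_to_starter(relief_era):
    """Estimate a reliever's ERA in a starting role (about 25 percent higher)."""
    return relief_era * (1 + ROLE_ADJUSTMENT)


def starter_to_reliever(starter_era):
    """Estimate a starter's ERA in a relief role (about 25 percent lower)."""
    return starter_era * (1 - ROLE_ADJUSTMENT)


# The example from the quoted rule: a 3.00 ERA reliever moved to the rotation.
print(f"3.00 reliever as a starter: {reliever_to_starter(3.00):.2f}")

# Pedro Martinez's 2.93 career ERA, converted to relief.
print(f"2.93 starter as a reliever: {starter_to_reliever(2.93):.2f}")
```

Note that the two directions, as quoted, are not exact inverses: a 25 percent increase followed by a 25 percent cut does not return the original number. That is one more reason to treat the output as a rough estimate.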

Of the many starters with higher career WARPs than Rivera, it’s likely that some of them would be better in the bullpen than he was, but it’s hard to know how many, or which ones. Nate’s findings suggest that the ones with the best out pitches would have an edge, but it’s tough to top Rivera’s cutter.

We can try to test this empirically. With the help of Ryan Lind and Andrew Koo, I identified every starter from 1982-2012 who made a regular-season or postseason relief appearance in a season in which he pitched at least 150 innings, started at least 90 percent of his games, and had a FRA+ of at least 120. This limited the sample to the best of the best: a handful of starters each season. Last year, only five pitchers would have satisfied all the standards: Verlander, Stephen Strasburg, Max Scherzer, Chris Sale, and Gio Gonzalez.

I ended up with a group of 19 pitchers, two of whom qualified in more than one season. The list reads like a who’s who of starting pitchers of the last three decades: Randy Johnson four times; Roger Clemens twice; Pedro Martinez, Bret Saberhagen, and Tim Lincecum during Cy Young seasons; Curt Schilling, Nolan Ryan, CC Sabathia, Roy Oswalt, Mike Mussina, and more. With standards so high, the result was a very small sample: 29 appearances and 63 innings, or roughly as many as Rivera pitched in 2011 alone.

In those relief outings, the aces walked 4.3 batters per nine, struck out 12.4, and posted a combined 2.00 ERA. Essentially, they were almost exactly as effective as, well, Mariano Rivera. Keep in mind that we’re comparing starters at the height of their powers to Rivera’s entire career, and that those aces who did pitch out of the bullpen were likely the ones who were expected to take to it well. Considering that and the small sample, if my all-time team needs three outs, I’m still signaling for Rivera.

On the first episode of ESPN’s Behind the Dish podcast, Joe Sheehan told Keith Law, “If you had a choice between—just to pick a teammate—Andy Pettitte’s career or Mariano Rivera’s career, I think you take Andy Pettitte’s career.” The value stats agree; WARP gives Pettitte almost a 20-win edge. But when the two teammates hit the Hall of Fame ballot, perhaps simultaneously, Rivera will almost certainly be inducted immediately, while Pettitte’s candidacy could linger until his years of eligibility are up. Both pitchers, despite the different ways that they’re used, have essentially the same job—to get batters out—and Pettitte has retired many more of them. It’s not necessarily fair that Rivera will waltz into the Hall while Pettitte watches and waits, but no one will object. Rivera, after all, is the one with the case for the all-time team.

So, back to David’s question. How many relievers ought to be included in any all-time team of 25 or 30 players? Just one, I think—and it’s exactly the one you’d expect. Enjoy watching him while you can.

Thanks to Andrew Koo, Ryan Lind, and Colin Wyers for research assistance.

The words "discovered his cutter" in the article link to a story about how Rivera has failed to pass on his cutter to anyone else. His kind of control can't be taught, and evidently his kind of cutter can't either.

Am I right in thinking that WARP is more-or-less based off ERA? This being true, the offhand remark that "WARP treats a scoreless ninth the same as a scoreless first, so it gives Rivera no extra credit for his high-leverage outings" would seem to dismiss a critical factor.

Would it make sense to multiply pitchers' WARP by the leverage value for the situation they inherit?

"And WARP treats a scoreless ninth the same as a scoreless first, so it gives Rivera no extra credit for his high-leverage outings."

Well, that would suggest that WARP alone is a flawed way to make the "value" evaluation. Rivera's leverage adjusted WARP will compare favorably to starters who pitch, on average, in dramatically lower leverage situations.

There's a philosophical argument about how much extra value is Rivera's there. For example, if the starter had prevented a run in the 1st (middle-to-low leverage) it would have had the same impact on the score as Rivera preventing one in the 9th.

It also gives the player credit/blame for something beyond his control. If the Yankees adopted a bizarre bullpen setup where Mariano Rivera always pitched the 6th inning if the game was close, he still would have to get 3 outs. Presumably, he would strike guys out, induce weak contact and keep his low BABIP, and suppress HR rates. How much "less great" would he be only because of his manager's odd decisions?

Should it? I'm not sure. If you go by FanGraphs WAR, the difference is almost 30 wins.

Rivera has pitched 1219.7 regular-season innings; Pettitte has pitched 3130.7. That's an enormous difference. Pettitte has pitched over 2 1/2 Mariano Rivera careers! Shouldn't that matter more than the fact that Rivera's innings mostly came toward the end of the game?

Maybe. But if his innings are highly-leveraged, say, twice as valuable as Pettitte's, then they're the equivalent of 2440 neutral-leverage innings. If Rivera allows runs at 56% the rate Pettitte does, and pitched the equivalent of 77% as many innings, it's not implausible that Rivera is in the same value ballpark as Pettitte.

It seems to me that the alternative is to "half count" Rivera's innings. He pitches only a fraction of the innings that Pettitte does precisely *because* they're high leverage. A closer can't possibly rack up the innings a starter does but the value (in terms of wins gained) of 80 high-quality high-leverage innings can quite plausibly be as great as 200 almost-as-high-quality medium-leveraged innings. Moreover, there's nothing to say that the guy providing the former couldn't as easily have provided the latter.

If the Yankees win 5-4 and Rivera pitches a clean ninth, his contribution to the win is significant, but no more so than that of most of his teammates - everyone who scored a run, stopped a run from scoring or drove someone in contributed similarly, as well as the starting pitcher and other relievers as well.

As a thought experiment, imagine Joe Girardi brings Rivera in to pitch a scoreless first against the Angels. Then, a 'setup' guy pitches a scoreless second. Finally, Andy Pettitte comes in, gives up 3 ER over 7 IP, and the Yankees win 4-3.

Here, everyone did the same job as in a 'traditional' 4-3 victory (*), but you're assigning less value to Rivera because his inning came first. Suddenly, Pettitte has a much higher 'leverage' despite throwing the same pitches and getting the same results. That's the disconnect.

Further, ask yourself if Rivera is more or less important than the 'setup' guy in this hypothetical game.

Now, there are real differences in pitching the 9th. The offense can alter their strategy to 'play to the score'. The offense can deploy pinch-hitters and pinch-runners aggressively. They can make substitutions with minimal concern for the resulting defensive implications. But I don't think this adds up to "twice as important" when the runs are added up at the end of the game.

(*): Actually, Rivera would have been forced to pitch to the top of the lineup, so he'd have a harder-than-average task. In a close game, this is an argument for having Rivera pitch the 8th if the #1 or #2 hitter is leading off, and letting the 'setup' guy handle the 9th against weaker hitters. Nobody does this, but it would help.

Has someone proven that coming into the ninth inning with no one on base automatically makes it a high-leverage inning? Or is that just an unchallenged assumption here?

If this has already been established please point me to the evidence.

I thought that coming into the seventh inning with one out and men on second and third would make it a much higher leverage inning. So would many other situations, which are far more likely to see a lead overturned than getting to start the ninth inning with a clean slate, which has a known success rate of 95%.

Everyone agrees there is such a thing as high leverage and low leverage situations. Everyone also agrees that WARP places no value on the difference. Until someone can conclusively argue there is no difference between high and low leverage innings in value, we can certainly say WARP is a flawed metric in evaluating relievers who have a large portion of their perceived value wrapped into the value of leverage.

We can discuss the relative value of leverage, which is underexplored, but to rely on a stat that implicitly assumes the value to be 0 seems to be severely flawed, again, unless you can prove pitching the 3rd inning of any random game facing the 7-8-9 hitters has exactly the same value as facing the final 3 hitters of a game when the opposing manager has every incentive to use the three most favorable, available, matchups against you, every time.

Adding to the problem is the simple fact that not every one of Rivera's outings, nor every one of his save situations, was high leverage. The 9th is probably, on average, the inning in which most high-leverage pitching occurs, but it is certainly not the only one.

By not assigning value to leverage, the designers of WARP are essentially saying leverage averages out. I doubt this is actually the case, but I can imagine calculating a pitcher's total Leveraged Innings Pitched Score would be extremely difficult to do.

It would be interesting to see the Median and StdDev TAv against for Rivera and Pettitte for batters they faced during their careers, which could be enlightening and is probably possible in you guys' datasets.

You say the contribution of leverage to WARP is 0. That means either leverage is meaningless or that on average it works out to be the same for all pitchers (or the variance is negligible). If leverage is meaningful and it does not average out for all players, then there is a deficiency in WARP.

I suspect for starters, the influence of leverage pretty much averages out. However since relievers enter games under a wider range of situations and pitch fewer innings, I suspect the variance is greater.

When calculating WARP for pitchers, do you use different replacement level baselines for starters and relievers? Taken a step further, do you use different baselines for relievers who are primarily closers vs. non-closers?

Come on, people. We know that the idea of double and triple counting higher leverage innings just because of the way that modern bullpens are constructed is nonsense. Rivera wouldn't have been a significantly worse pitcher if he'd always pitched the 6th, 7th or 8th inning. Relievers are less valuable than starters because they pitch so many fewer innings - it's really that simple.

How has John Smoltz's name not entered this discussion? He was to a great extent contemporary with Rivera and was both an elite starter and an elite reliever.

Rivera was probably a slightly better closer than Smoltz, so if you don't mind extrapolating from one data point, you'd guess that Rivera would translate to a slightly better starter.

Of course, this whole analysis (mine as well as the one above) is predicated on the assumption that "starter" and "reliever" are the same job. To me, that's like saying "outfielder" and "shortstop" are the same job because they both come to the plate and hit the ball. A 2.50 ERA from a starter is different from a 2.50 ERA from a reliever in the same way that an .850 OPS from a 1B is different from an .850 OPS from a SS. For position players, we make adjustments to their WARP based on their position. It seems that we haven't found an intelligent way to make a similar adjustment for pitchers, though.

Excellent point. I might expand on it and say, "Why hasn't Eck entered the conversation?" (Aside from the post below, where it's pointed out he's in the HOF.) Eck was a very mediocre starter before he became a HOF reliever.

A starter pitches three times as many innings as a closer, and "leverage" can't possibly make up for that because starters face more leverage too. Roger Clemens once said the two most important keys to winning were shutting down opponents in the first inning and shutting them down right after your team scores. Getting ahead early sets the tone for the rest of the game, so the first inning is just as important as the ninth inning. Same for the inning after scoring, as momentum is a key to victory. Then there are those games where the other team's most dangerous hitters come up in the seventh or eighth inning as opposed to the ninth. There's a BP book with a chapter that points out that this inning, rather than the ninth, is the better place for your best reliever. That's three leverage situations more likely to go to a starter than a closer, so starters not only face three times the innings, but three times the leverage.

Starters can't throw as hard because they need to last six innings, and they need a third pitch because getting the same hitter out three times a day is more than three times harder than getting him out once. Closers go one inning so they can go all out and usually only need two pitches. Saying the ninth inning is more important than the other eight innings is like saying home runs in the ninth count more than homers in other innings.

Eckersley was a very good starter, especially early in his career. He finished 4th and 5th in WAR, leading the league among pitchers. He had ERA+ of 144, 139, and 149. The problem is it's a little hard to figure out how big an effect drinking had on his performance after that. His relief career came after he was sober; it's hard to know whether he might have also been a Hall of Fame starter had he stopped drinking at 26 instead of 32.

The nice thing about putting Rivera in, as opposed to Sutter, is that it doesn't open the door for anyone else. No other reliever could convincingly claim that because Rivera is in, he deserves to be too.

I saw Dick Radatz pitch when he was with the Red Sox from 1962-1966. His "greatest year" was 1963, not 1964. Radatz began to be more hittable in 1964 (and I don't just mean John Callison's All-Star Game-winning homer off of the "Monster"). We are talking about a much different era in baseball history, although by '64 most teams had at least one pitcher to close out games. The great thing about Radatz was his control and his pitching motion, which made it difficult for hitters to pick up his fastball.

It may be a little simplistic but Rivera has the second highest career WPA of any pitcher, behind only Clemens. The rest of the top twelve goes: Johnson, Maddux, Martinez, Mussina, Smoltz, Halladay, Schilling, Hoffman, Gossage, Eckersley. Seems on its face to be a pretty fair ranking.

I think you'd have to compare WPA to a "closer" baseline, rather than a generic reliever baseline. That is, how much better was Rivera than a replacement-level reliever would have been in the same high-leverage situations?

I'd like to find out what the Yankees' record is going into the 9th in a save situation over the Rivera era compared to other teams. Then we'd have a better idea just how monumental the task of replacing him is.

WAR is a statistic that aggregates all contributions over the course of the year. For most players, however, actions they take in the course of a game do not contribute to a real victory. If a batter, say, goes 4-for-4 with 2 home runs and the team loses, the fractional WAR value from that game contributed by that player is unrelated to an actual victory.

This is where closers are different. By definition they are being brought into a game in a situation where ANY fractional WAR they collect (either positive or negative) contributes directly to a win or loss.

This is why Rivera was so devastating. He is/was a magnificent pitcher AND those very valuable innings were being focused directly where they counted.

Yeah, but given that closers are relatively replaceable, it doesn't make sense to give them huge credit for getting the last three outs. If you take any average or better MLB pitcher, tell them they only have to go one inning and see what happens, you have a pretty good chance of getting a scoreless inning out of them.