Prospectus Perspective

Racing for the Cy

It’s hard to believe that the regular season is almost over, as it feels like yesterday that Ubaldo Jimenez had his sub-1.00 ERA and Roy Halladay tossed his perfect game. The year has flown by, and as we enter the final week of the regular season it is impossible to avoid discussions of potential award winners. Today, we will focus on the Cy Young Award, as this was the “Year of the Pitcher” after all, and in addition to a couple of top-notch candidates in both leagues, there is a bevy of pitchers whose numbers might have merited more serious consideration in another year. Unfortunately, discussions of the award tend to veer off in different directions because the award itself is perceived as ambiguous. Before getting into who will or who should win the award, we need to ask: who is the award supposed to honor?

Realistically, the award is used to honor the best pitcher in his particular league in a given season. Now, how do we define “best”? Cue the convolution. For the most part, voters tend to use wins and earned run average. The former doesn’t tell us anything of tangible value, while the latter is merely a useful tool in shaping the resume of a pitcher, not the be-all and end-all. Using a statistic like wins above replacement isn’t entirely valid either, as it includes batting and fielding components; does the difference in the batting lines of Adam Wainwright and Roy Halladay have anything to do with their pitching attributes? Yes, Wainwright might stay in games longer because he can handle the bat, but his opponents’ TAv is what is of interest, not his own.

Beyond WARP or similar all-encompassing metrics, is it better to gauge value using actual run-prevention measures or stats with more predictive value, like SIERA? Again, this question is integral to determining who is the best pitcher, but nowhere near clear-cut. If we go too far in either direction, we run the risk of caring too much, or too little, about the defenses behind these hurlers, as well as the luck associated with their lines. Luck comes in varying degrees, but right now we just do not know with real certainty how to tell the different forms apart. For instance, Francisco Liriano has terrific peripherals, but his ERA is four-tenths of a run higher than his SIERA. His numbers are inflated by a high batting average on balls in play (BABIP), but how do we properly account for that?

We don’t know if he is being penalized by bloops and ducksnorts on solid pitches, or if the opponents are just taking advantage of his mistake pitches at a higher rate. While both scenarios result in added hits, the former scenario is probably less likely to persist. Completely eliminating the results of the inflation doesn’t seem right, nor does including everything. Lastly, too many people get caught up in who will as opposed to who should win the award. Why spend time trying to get into the minds of voters? Is it really interesting to figure out which pitchers will have their wins overvalued? Personally, I don’t think so, and for that reason I decided against running a regression in order to determine which three or four variables have the strongest relationship to an award win.
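For readers less familiar with BABIP, the figure behind Liriano’s inflated line is conventionally computed as hits minus home runs, divided by at-bats minus strikeouts minus home runs plus sacrifice flies. A minimal Python sketch of that standard formula follows; the season line in it is hypothetical, not Liriano’s actual numbers.

```python
def babip(h: int, hr: int, ab: int, so: int, sf: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - SO - HR + SF)."""
    balls_in_play = ab - so - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical line allowed by a pitcher (illustration only, not real data).
print(round(babip(h=180, hr=10, ab=700, so=200, sf=5), 3))  # 0.343
```

The formula only counts outcomes; it cannot tell bloops apart from hard-hit mistake pitches, which is exactly the distinction raised here.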

Discussing the Cy Young should involve discussing the best pitchers. If the “will” and “should” happen to be one and the same, wonderful, but they aren’t always intertwined. To that end, I chose seven statistics that summarize the performance of a pitcher, and ran the pitchers in each league through a simple ranking system based on where each one ranked in every category. The lowest score identifies the best pitcher in each league, as he would have been closer to the league lead across the categories than anyone else. The only pitchers I excluded were those with fewer than 150 innings, as I wanted to restrict the sample to starting pitchers who have pitched for the entire season.
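A ranking system like this one can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the author’s actual method: the article does not say how ties within a category were handled, so here tied pitchers share a rank, and each pitcher’s score is assumed to be the sum of his ranks across the seven categories used (innings pitched, ERA, FRA, SIERA, WHIP, SNLVAR, SNWP). Lower is better for ERA, FRA, SIERA and WHIP; higher is better for the other three. The function name `rank_sum`, the pitcher names and the stat lines are all hypothetical placeholders.

```python
# Sum-of-ranks sketch: rank every qualifying pitcher in each category,
# then add the ranks; the lowest total marks the "best" pitcher.

MIN_IP = 150.0  # qualifying cutoff used in the article

# True if a higher value is better for that category.
HIGHER_IS_BETTER = {
    "ip": True,       # innings pitched
    "era": False,
    "fra": False,
    "siera": False,
    "whip": False,
    "snlvar": True,
    "snwp": True,
}


def rank_sum(pitchers: dict[str, dict[str, float]]) -> list[tuple[str, int]]:
    """Return (name, total rank) pairs, lowest (best) total first."""
    pool = {name: s for name, s in pitchers.items() if s["ip"] >= MIN_IP}
    totals = {name: 0 for name in pool}
    for stat, higher in HIGHER_IS_BETTER.items():
        for name, line in pool.items():
            # Rank = 1 + number of pitchers strictly better; ties share a rank.
            if higher:
                better = sum(1 for other in pool.values() if other[stat] > line[stat])
            else:
                better = sum(1 for other in pool.values() if other[stat] < line[stat])
            totals[name] += 1 + better
    return sorted(totals.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical stat lines for illustration only.
pitchers = {
    "Pitcher A": {"ip": 240, "era": 2.50, "fra": 2.60, "siera": 2.90,
                  "whip": 1.05, "snlvar": 7.5, "snwp": 0.640},
    "Pitcher B": {"ip": 225, "era": 2.40, "fra": 2.55, "siera": 3.10,
                  "whip": 1.10, "snlvar": 7.0, "snwp": 0.630},
    "Pitcher C": {"ip": 200, "era": 3.20, "fra": 3.30, "siera": 3.40,
                  "whip": 1.20, "snlvar": 5.0, "snwp": 0.580},
    "Pitcher D": {"ip": 120, "era": 2.00, "fra": 2.10, "siera": 2.80,
                  "whip": 0.95, "snlvar": 4.0, "snwp": 0.650},  # under 150 IP
}

print(rank_sum(pitchers))
```

With these made-up lines, Pitcher D is dropped by the 150-inning cutoff and the function returns `[('Pitcher A', 9), ('Pitcher B', 12), ('Pitcher C', 21)]`. Two pitchers with equal totals would simply tie at the top.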

The statistics of interest were innings pitched, ERA, FRA, SIERA, WHIP, SNLVAR, and SNWP. Yes, there are other stats that could have been included, but I felt this septuplet got the job done. Quantity doesn’t imply quality, but if I had the choice between two pitchers with identical SNWPs, with one having thrown 40 more innings, my selection would be very easy. The triumvirate of run-prevention marks gives a broader scope of skill than just ERA, and for those unfamiliar, FRA is essentially ERA with an adjustment for inherited/bequeathed runners. This way, pitchers aren’t under- or over-credited based on the skills of the succeeding relievers. Lastly, the support-neutral metrics paint a picture of skill independent of the run support received while adjusting for the strength of the opposition. Here are the results for the National League:

Starting with the senior circuit, the numbers confirm what everyone likely expected: it is Halladay, Wainwright, and then everyone else. Halladay and Wainwright are tied for first, with each pitcher ranked near the top in all seven categories. Interestingly, Wainwright has a smaller deviation in his ranks; he finished second, third, or fourth, while Halladay finished anywhere from first through fifth in the categories. Without any type of context, this looks to be a wash. In cases like that, qualitative tiebreakers are fair game, and Halladay performed this well for a division champion in dire need of consistent value with all of its injuries, while also throwing a perfect game. Either one of these pitchers deserves the award, but Halladay seems to deserve it just a bit more.

One of the most interesting tidbits on the National League list is the third-place finisher: Roy Oswalt. Raise your hand if you thought he would be third. OK, you in the back, stop lying. Oswalt has been sensational since joining the Phillies, and while the differences between him, Tim Hudson, and Josh Johnson are relatively small, it’s great to see a strong rebound after 2009’s injury-plagued mediocrity. Johnson fell victim to the injury bug but still secured the ERA title; had he remained healthy, he might have been able to give Halladay and Wainwright a run for their money for the Cy Young.

The rest of the list isn’t really that unexpected. Cole Hamels and Matt Cain have been brilliant. Jimenez has stayed fantastic for the entire season. Mat Latos is already an ace, regardless of innings restrictions. And if not for the divorce of their owners and their pitiful season, the Dodgers’ Clayton Kershaw would be garnering much more attention for his efforts. At the end of the day, my vote for the best pitcher in the National League this year would go to Halladay, but I wouldn’t exactly lose any sleep if Wainwright took home the hardware a year after he was arguably the best pitcher in the league and lost by a hair. Incidentally, Halladay and Wainwright represent a meeting of both sides, as they top everyone else in both basic and advanced metrics. Everyone agrees these two have been the best.

The American League results bring up one of the most debated questions of the last few weeks: Felix or CC? A quick look at the data not only indicates that Felix Hernandez has been, far and away, the best pitcher in the league, but that CC Sabathia isn’t second on the list. He isn’t third on the list. Heck, he isn’t even fourth. Now my system here wasn’t formulated by NASA engineers, but it is a decent proxy for success, and what is undeniably clear is that Sabathia ranks first in only one category: wins. He leads in a category that is about as overhyped as the movie Donnie Darko, and that’s it. Now, his numbers are not poor, but they are nowhere near as great as Hernandez’s, and giving him the award would be reminiscent of Bartolo Colon beating Johan Santana in 2005. In fact, check this out:

By all accounts, Colon’s award was a cheap one, given that Santana had a superior season in every conceivable way aside from wins. I would argue that the gap between Santana and Colon is smaller than the one between Hernandez and Sabathia, and SNLVAR agrees. I haven’t weighed in on this subject in a while, either here or on Twitter, but I have to say that, hyperbole aside, the vote for this year’s AL Cy Young award has the potential to be one of those monumental moments in baseball. If Hernandez wins the award, which he clearly deserves without any debate, it will mark a true changing of the way people think about value in baseball which is, for lack of a better description, incredibly cool.

When I first started writing, my goal wasn’t to be smarter than anyone else or to make fun of the mainstream media. My goal was to help anyone interested understand the best ways to assess value. Wins, very clearly, are nowhere near the top of that list. And while I still think it looks good when someone wins 20 games or sports a 19-7 record, I understand that the perceived value is much greater than the actual value. The only quantitative advantage Sabathia has on Hernandez is in the wins department. Everything else is qualitative, such as his playing for a contender.

A win by Hernandez with a .500 or worse record (I actually hope he pitches another eight-inning, one-run gem of a loss so he finishes 12-13) would show that more and more people are coming to terms with the fact that receiving run support from an offense in no way makes a pitcher better or worse than another. Hernandez has had a much better season than Sabathia and should be honored as such. For those who argue that, if not Sabathia, it should be David Price, well, I would agree, if the contest were to name the fourth- or fifth-best pitcher in the league this year.

Unlike the NL, the American League race represents a parting of the ways, as the advanced and basic stats disagree. Fortunately, everything other than wins is so overwhelmingly in favor of Hernandez that to ignore his qualifications would be akin to admitting sheer ignorance. I am not expecting every writer in the country to suddenly start using SNWP or FRA in their stories, but I have yet to read or hear any argument favoring Sabathia that does not hinge on his “being a winner” or other such garbage. It has become so tough for me to understand why certain people will force that puzzle piece to fit, just to try and keep the “wins” stat afloat when every other piece of evidence points them in a much different direction.

At the end of the day, the best pitchers in the National League are Halladay and Wainwright. I would vote for Halladay first, but Wainwright is tremendous as well. In the American League, I would vote for Hernandez (shocker!), and my second-place vote would go to Jered Weaver, who, ironically, is excellent behind a 13-11 record. Sabathia might go fifth on the list. This isn’t a Halladay-Wainwright situation at all. We will have to wait and see how things shake out, but if Hernandez does not take home junior circuit hardware, we’re going to have one sad Seidman on our hands.

Eric Seidman is an author of Baseball Prospectus.

Buchholz had more than 150 IP at the time of this article (173 now); is there a reason he isn't included? I imagine he would rank in the 5-7 range for the AL even though his SIERA is terrible. He's in the top 5-7 in every other category...

I can't even read anything about this non-debate anymore without thinking of http://deadspin.com/5645269 and cracking up. "You are just staring at a pipe and telling me it isn't a pipe. Which makes you a great surrealist, but also a dummy."

Where are you getting "If you think about level of competition, I think it's fairly clear that Lester has been better than Hernandez" from?

If you look at BREF's neutralized statistics, they are close. Even with the hard data, Felix pitched 134 IP against teams over .500, whereas Lester threw 130 (for whatever that is worth, which is not much). I wouldn't penalize Felix for throwing an additional 40 innings against weaker competition.

Personally, I'd give the nod to Felix due primarily to those additional 40 innings, and his better K/BB (3.34 to 2.82) pushes it well into Felix's favor, for me at least.

Lester's win total could still wind up as bright and shiny as Sabathia's, and if the writers must find a compromise candidate with lots of wins instead of voting for the real best pitcher in the league, he would at least be a better option for them than Price or Sabathia.

I usually rail against mainstream Cy Young voting. But this year I actually see the case for CC, or maybe Lester. The AL East is so superior to the AL Central and West that the difficulty of the opposition faced by pitchers in that division needs to be a factor when you consider how to vote.

The problem is that "division pitched in" is a poor substitute for "difficulty of opposition." If you want to make that argument, I think you actually need to go through and compare the actual strength of the opposition in each game pitched.

BP calculates the Opp_Qual_OPS of Felix's opponents at .728.

For Lester, it's .731.

For Sabathia, it's .719.

For Price, it's .742.

Price has clearly faced the best opposition using this metric, and the difference between Hernandez and Lester is negligible.

Does the .01 difference in quality of opposition faced really negate the clear superiority of Hernandez's numbers? I think not.

I agree that the argument that Felix has faced lesser competition falls apart upon investigation. The leading run-scoring team in the AL is the Yankees. Felix has started against them 3 times (and dominated them each time). Obviously, CC has started against the Yankees 0 times.

I wonder if Lester will benefit in the Cy Young race for the same reason Francona has been benefiting in the Manager of the Year race: without them, the Red Sox would have been rendered irrelevant in July, rather than still clinging, even today, to a hair's-breadth possibility of playing beyond Sunday.

I would like to know what the park adjustments are, though. I recall that Bill James, a few weeks back, had this season's park factors at 1.2 for Yankee Stadium and .8 for Safeco, a humongous difference that for James put CC just barely behind Felix, with Lester actually a smidge in front of both.

Reading BP's definitions of the stats, I don't get any indication that they are park-adjusted. The SIERA definition states that park effects are eliminated by virtue of the fact that it is based only on strikeout, walk, and ground-ball rates. But the glaring error there is that strikeout and walk rates are still affected by park.

Actually yes. Foul balls that are caught shorten at-bats. Foul balls that reach the seats lengthen at-bats, and some end up as K or BB. Likewise, fly balls that are caught bring fewer batters to the plate, ones that fall in or go over the fences bring more batters to the plate.

I don't know the math to estimate how big an effect that is, but I've seen that it is at least measurable and reasonably persistent.

My biggest problem here and with the whole AL side of this discussion in general is this statement:

"If Hernandez wins the award, which he clearly deserves without any debate, it will mark a true changing of the way people think about value in baseball which is, for lack of a better description, incredibly cool."

The stat community seems to have adopted Felix as their poster child this season, partly because if he wins they will feel validated. The problem with this is that it ignores other very real stats and some context as well. I am always a bit leery of using sheer numbers in any analysis. They are a good tool. Some of the newer stats are very good tools, and some are a lot better than others. My problem comes in knowing that all raw numbers are generated in a particular context. Sometimes the new stats and those who develop and use them ignore this context, and sometimes they simply don't understand it, because context is not something that has a button on the calculator. I'm not trying to demean the newer stats by saying that, just trying to show that there is always something more than sheer numbers no matter how you try to combine them.

One small example is that we are told not to use wins against Felix because he pitched for the Mariners, while we are then asked to consider him the leading candidate (without debate) using stats that are at least in part a function of his pitching for the Mariners. His ERA and WHIP are both functions of his home park and of the fact that his team was built this season around defense.

Also, Cliff Lee did not have that much trouble winning for the Mariners this season. This is one aspect of the debate that I have not seen anywhere else. Lee was 8-3 at the time he was traded, not even halfway through the season, after having missed some time at the start of the year.

The fact is that, as a number of previous posters have pointed out in bits and pieces, there are a significant number of AL CY candidates and that if we look at all of the numbers and try to synthesize all of the old and new stats, the race is quite close. Any of the guys listed in the article are pretty legitimate candidates and all have some set of stats that recommends them. Some even have also won a fair number of games.

Bottom line, I'd go with Lester as the guy who best put it all together, but not by any large margin. Weaver, Price and Felix are the next tier, with Verlander, Liriano and CC in the group after that.

I agree that giving the award to CC simply because of the wins would be a travesty, but it seems disingenuous to hold up Felix as the only truly worthy candidate, and continuing to do so, especially if he does not win, will likely do even more to expand the chasm between those who disdain the newer stats and those who embrace them.

Well, you haven't seen the 'Lee won with the Mariners' aspect of the debate anywhere else because it's ridiculous. Lee didn't have any special ability to make his team score more or play lesser opponents when he pitched.

I think this response emphasizes my problem with the whole approach these days. We have such a proliferation of stats now that it is so much easier to pick and choose those that make the case you are trying to make. If there are not enough to do that, then you can simply create some new stat to do the job. If both those approaches fail, you can simply pretend that the facts are different than they actually are.

In the time that both Hernandez and Lee pitched for the Mariners in 2010, the team scored 4 or more runs in 7 of Lee's 13 starts (54%). He was 7-0 in those games. The team scored 4 or more runs in 11 of Hernandez's 18 starts (61%), and he was 6-0.

The Mariners scored fewer than 3 runs in 5 of Lee's starts (38%), and he was 1-2 in those games. They scored fewer than 3 in 6 of Felix's starts (33%), and he was 0-4.

Let's review. The team scored enough runs to win more often for Hernandez yet he won less. They provided Lee with poor run support more often and yet he found a way to win one of those games.

Lesser opponents? Lee faced Texas (twice), Tampa (twice), San Diego (twice), Minnesota, Cincinnati and the Yankees in those 13 Mariner starts. Felix faced Texas (three times), San Diego (twice), Minnesota, Cincinnati and the Yankees in his 18 starts during that time. Lee faced quality opponents (eventual playoff teams) in 69% of his Mariner starts. Over the same span, Felix faced equal quality in just 44% of his starts. Lee was 8-3 in 13 starts. Felix was 6-5 in 18 starts.

So you are right. Lee did not have a special ability to make his team score more or play lesser opponents because neither happened during that time.
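The percentages in the comment above follow directly from the counts it gives, which a few lines of Python can confirm. The opponent tallies are copied from the comment, and the helper name `pct` is just for illustration. Note that the 69% and 44% "quality opponent" figures come out only if San Diego, which narrowly missed the 2010 postseason, is counted alongside the listed playoff teams.

```python
# Opponent counts for each pitcher's Mariners starts, as listed in the comment.
LEE_OPPONENTS = {"Texas": 2, "Tampa Bay": 2, "San Diego": 2,
                 "Minnesota": 1, "Cincinnati": 1, "NY Yankees": 1}
FELIX_OPPONENTS = {"Texas": 3, "San Diego": 2, "Minnesota": 1,
                   "Cincinnati": 1, "NY Yankees": 1}
LEE_STARTS, FELIX_STARTS = 13, 18


def pct(part: int, whole: int) -> int:
    """Share of starts as a whole-number percentage."""
    return round(100 * part / whole)


# Run-support splits quoted in the comment.
print(pct(7, LEE_STARTS), pct(11, FELIX_STARTS))  # 4+ runs of support: 54 61
print(pct(5, LEE_STARTS), pct(6, FELIX_STARTS))   # fewer than 3 runs: 38 33
# "Quality opponent" starts, counting every listed opponent (San Diego included).
print(pct(sum(LEE_OPPONENTS.values()), LEE_STARTS),
      pct(sum(FELIX_OPPONENTS.values()), FELIX_STARTS))  # 69 44
```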

Scott, you can break it down like that and come to the conclusion that 'winning' for the Mariners isn't that difficult. But let's dig a bit deeper...you write:
In the time that both Hernandez and Lee pitched for the Mariners in 2010, the team scored 4 or more runs in 7 of Lee's 13 starts (54%). He was 7-0 in those games. The team scored 4 or more runs in 11 of Hernandez's 18 starts (61%), and he was 6-0.

All right, so what happened in Felix's no-decisions? Well, in the first two, the Mariners scored 2 runs (in a 5-3 win) and 3 runs (in a 4-3 win) in the 9th inning, after Felix was out of the game. In a 6-5 loss, Felix saw a reliever give up 5 runs in the 8th. In another 6-5 loss, the Mariners' pen gave up 3 runs for the loss. And in a 6-4 loss, the pen gave up 4 runs. So much for all those NDs.

Also, Felix has given up 2 or fewer ER in 25 of 34 starts. He's 13-5 in those games, giving him 7 NDs in those starts. CC has 17 starts (out of 34) of 2 or fewer runs and has gone 14-1, with 3 NDs.

Give Felix that same ratio, while leaving the rest of his record untouched, and he'd be 21-8. Just saying.
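The 21-8 figure can be reproduced as a quick rate calculation. This sketch takes the comment's numbers at face value and assumes Felix's overall record was 13-12 (the mark cited elsewhere in this thread), so he was 0-7 outside his 25 starts with 2 or fewer ER. CC's 14 wins and 1 loss are converted to per-start rates, scaled to 25 starts, and rounded. Because 14-1 plus 3 no-decisions actually sums to 18 starts rather than 17, the hypothetical `projected_record` helper also accepts a different start count.

```python
from fractions import Fraction

# Figures quoted in the comment, taken at face value.
CC_WINS, CC_LOSSES = 14, 1          # CC's record in starts allowing 2 or fewer ER
FELIX_GOOD_STARTS = 25              # Felix's starts allowing 2 or fewer ER
FELIX_GOOD_W, FELIX_GOOD_L = 13, 5  # his record in those starts
# Assumption: Felix's overall record of 13-12, as cited elsewhere in the thread.
FELIX_TOTAL_W, FELIX_TOTAL_L = 13, 12


def projected_record(cc_starts: int = 17) -> tuple[int, int]:
    """Felix's record if his 2-or-fewer-ER starts went at CC's per-start rates."""
    # Record in his remaining starts stays untouched (0-7 with the figures above).
    other_w = FELIX_TOTAL_W - FELIX_GOOD_W
    other_l = FELIX_TOTAL_L - FELIX_GOOD_L
    # Scale CC's wins and losses per start to Felix's 25 starts, then round.
    wins = round(Fraction(CC_WINS, cc_starts) * FELIX_GOOD_STARTS)
    losses = round(Fraction(CC_LOSSES, cc_starts) * FELIX_GOOD_STARTS)
    return other_w + wins, other_l + losses


print(projected_record())    # (21, 8)
print(projected_record(18))  # (19, 8)
```

With 17 starts the projection is 21-8, matching the comment; with 18 it comes out closer to 19-8.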

You're leaving out some important details. Lee essentially only pitched May and June for the Mariners. May was Felix's worst month. June was Lee's best month.

Or see it like this: Felix had four games this year where he gave up 4 or more ER. Three of them came in the stretch you've selected: May 1, May 7, and June 8. Lee gave up 4 or more ER in ten games, only two of them in Seattle. (And just to note, Felix did so in two road games; 4 of Lee's 8 4+ ER games were on the road.)

The logic you're using is similar to Jon Paul Morosi's a few weeks ago, when he said that Felix didn't deserve the Cy Young because he didn't win on May 1, when the M's were 1/2 game out of first, at the start of an 8-game losing streak. Of course, he leaves out that the M's were also under .500 (as was the whole division) and that the complete lack of offense was already killing the team. That Cliff Lee had his second-best month in May couldn't save them. (And I love how Morosi conveniently leaves out that Lee had one of his worst games of the season on May 5, extending the losing streak.)

Is Felix the only candidate worthy of the sabermetric darlingship? Of course not. Lester had a slightly better xFIP, Lee is .2 better on fWAR, Liriano has some good arguments. But Felix represents everything wrong with the most useless of pitching stats. No CC-lover could argue against Lee or Lester or Liriano having good seasons, but Felix only being 13-12, well, he clearly isn't in the same league as CC.

Saying it will "expand the chasm" is also silly. You hear a lot of writers actually coming around because they see the clear difference.

The one point I really agree with in frugalscott310's comment is one that seems a bit glossed over: we're giving Hernandez A LOT of credit for statistics that do not take home park into account. There is (obviously) much debate about how much home park affects a pitcher's results, but the more you accept park effects, the better the case gets for Lester or Sabathia versus Hernandez (or someone else). As Richie pointed out in another comment, Bill James uses very heavy park factors in his analysis, enough to knock Hernandez out of the prime position.

The point I'm making is that I don't think Hernandez winning is a clear issue without debate. I think he's a fantastic candidate and easily one of the best pitchers in baseball, but I don't think it's such a foregone conclusion that he was the best pitcher in the AL this season. Every year our analysis becomes more refined and advanced: ten years ago, players like Carl Crawford, Andres Torres or Brett Gardner, who derive a lot of value from defense and baserunning, would have been underrated because those aspects of the game weren't evaluated as effectively as they are today, whereas a player like Adam Dunn would have been a bit overvalued by most analysis because not enough emphasis was put on his defensive and baserunning issues (which isn't to say anything negative about Dunn; he's a damn fine offensive player). Maybe we'll find a superior replacement for SNLVAR or SIERA. Who knows whether, in another ten years, we'll have a clearer picture of the impact of defense, home park, opposition faced, injuries, and even a pennant race on a pitcher, one that makes us look back at this year and wonder why we thought Hernandez was a foregone conclusion instead of, say, Cliff Lee or someone else?

I think speaking in absolutes tends to upset the mainstream baseball community, because an award like the Cy Young or MVP is not meant to be an xFIP, SIERA or WAR title. I think there needs to be room for doubt in the analysis, or else it should be a pure math award like an ERA title, a batting title or a home run crown. I hate cliches about players being 'winners' or 'grinders' as much as the next BP reader, but we have to be careful about going too far in the opposite direction. I'm concerned about calling Hernandez far and away the best pitcher when well-educated analysis can come to a different conclusion.

All that said my ballot would probably be: Hernandez, Lester, Sabathia, Lee, Price.

Is it just me or does this whole article feel like it's been jury-rigged to undermine Jon Lester?

On WARP, which was disregarded for seemingly insufficient reasons (especially for AL pitchers), Lester is a very close second to Felix.

Instead of WARP, a series of highly correlated measures are used, and then added together, a process which magnifies Felix's lead. Under this metric Lester's still third -- but never mentioned. Instead, the question is framed as Felix v. CC. Where are these mythical sportswriters clamoring for CC to win the award but ignoring Lester?

Felix has a strong claim on the award, but the obvious question to debate is Felix v. Lester, not Felix v. CC.