And Sandy Koufax

A few years ago, I wrote an article called The Best Pitchers of All Time. It was an admittedly grandiose title, one intended to get lots of hits from search engines and that sort of thing.

The article itself was a vehicle for rolling out a variation of Bill James’ Win Shares system I had created, called Win Shares Above Bench. Previously, I had rolled out a couple of articles ranking the best hitters of all time, but the titles of those pieces were pretty boring.

Evidently, my titling strategy worked, because I still get emails about that pitching piece (I never get emails about the hitter rankings). Mostly, though, I still get emails about the article because I left Sandy Koufax off my list of the 40 greatest pitchers ever.

The response is understandable. From 1962 through 1966, Koufax was 111-34 with a 1.95 ERA. He finished first in ERA all five years and won three Cy Young awards. He won one MVP and finished second in MVP voting twice. He was also terrific in the postseason, with a 0.95 ERA. He was dominant.

Still, I left the guy off my top 40 list, and I feel I owe people an explanation. Instead of responding to emails individually, I’ve decided to just post my reasoning here and refer to it in the future. It will save me a lot of typing.

So…where to begin? Well, one commenter told me I never would have left him off the list if I had actually seen him pitch. Let me assure you that I am an old man. I saw Koufax pitch. He was indeed a great pitcher in the early 1960s (when I was at my most impressionable, to boot), but my article wasn’t about impressions. It was about a new metric that I was pretty proud of (still am, in fact). And in Win Shares Above Bench, Koufax not only doesn’t rank in the top 40, he’s 91st.

Crazy, you say? Well, maybe. But other, comparable systems also rate Koufax below the top 40. Sean Smith’s WAR rates him 61st all time. In the 2007 THT Annual, David Gassko rates him 64th. If you want to argue that WSAB is wrong and Koufax should be 30 places higher, I can accept that. But 50 places higher? Nope. Here’s why.

One. Sandy Koufax pitched in a very pitcher-friendly environment. Here is a graph of the average number of runs scored per game throughout baseball history, with Koufax’s five prime years highlighted:

Teams scored an average of 4.09 runs per game in Koufax’s prime. The overall major league average has been 4.39. Plus, he pitched in Dodger Stadium, a notorious pitcher’s park at the time, with a pitching park factor of .91. So the effective run environment in which Koufax pitched was .91 times 4.09, or 3.72 runs per game—more than half a run less than the historical major league average.
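As a quick check on that arithmetic, here is a minimal Python sketch that uses only the three figures quoted above. The variable names are mine, and applying the full park factor to every game is the same simplification the paragraph makes.

```python
# Figures quoted in the paragraph above.
LEAGUE_RPG_PRIME = 4.09   # average runs per team-game, 1962-1966
HISTORICAL_RPG = 4.39     # overall major league average
PARK_FACTOR = 0.91        # Dodger Stadium pitching park factor

# Effective run environment: the league average scaled by the park factor.
effective_rpg = PARK_FACTOR * LEAGUE_RPG_PRIME

# Gap between the historical average and Koufax's effective environment.
gap = HISTORICAL_RPG - effective_rpg

print(f"Effective runs per game: {effective_rpg:.2f}")
print(f"Gap to historical average: {gap:.2f}")
```

The gap comes out at roughly two-thirds of a run, which is the "more than half a run" in the text.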

To hammer home the point, Koufax’s ERA in Dodger Stadium was 1.37. Not for one year, for his career. You could even go so far as to say that the ballpark was primarily responsible for the first of his five superb years (1962).

So the baseline, the essential comparison point, for Koufax’s fantastic five years is different from that for most pitchers. As a final consideration, look at Koufax’s ranking in ERA+, a measure that accounts for run environments. Koufax is 37th in all-time ERA+. Yes, that’s in the top 40. But I’m just getting started.

One(A). ERA+ is flawed in a way that helps pitchers in low run environments. In fact, this is one of my pet statistical peeves. However, it’s not a big deal and that’s all I’ll say about it.

Two. Koufax pitched in the aftermath of major league baseball’s first big expansion. On a percentage basis, the hitting talent may have been more diluted from the previous years than in any other time in history. Koufax posted a 1.90 ERA against the Astros and a phenomenal 1.44 ERA against the Mets. The next lowest team-specific ERA he posted was 2.44 (Cubs). I don’t think it’s entirely a coincidence that Koufax’s five best years occurred immediately after the league expanded and the Dodgers moved into Dodger Stadium.

Three. Koufax only had five great years. Now, five is a lot and what’s more, he had them consecutively. That is extremely impressive. In fact, if you rank pitchers by something I called All-Star Win Shares (which only considers years in which players were above average and doesn’t discount the below-average years), Koufax is 31st all time. Koufax is justifiably revered for those five fantastic years.

But he had very few “pretty good” years before that and, of course, his career ended abruptly due to arthritis. Outside of the big five, he wasn’t an above-average pitcher overall. If you don’t want to count those years against him, perhaps he should rank in the top 40. But when you do include them, it is very difficult to say that he was a top 40 pitcher.

Now, don’t get me wrong. I love Sandy Koufax. He was an amazing pitcher and, from all accounts, he’s a terrific person. But I’m the type of person who ranks people according to stats and not according to impressions. I also believe in ranking players fairly, taking care to incorporate as much context as possible and including their entire careers—not just their peaks. And that’s why Sandy Koufax didn’t make my list of the greatest pitchers of all time.

Just nitpicking here, but if Dodger Stadium’s park factor was 0.91, and we assume Koufax pitched half his games there, then shouldn’t you correct by 0.955 (halfway between 0.91 and 1) to take road games into account?

The correction may be slightly more since Dodger Stadium’s 0.91 should not be included in the mean value for the road parks.

Regardless of where Koufax ranks all time, he’s at the top of my list of pitchers I wish I had seen.

I thought Baseball Reference’s Park Factors were already calculated so that you could apply them in a straightforward manner, making the ratio of RS+RA in Dodger Stadium to RS+RA on the road roughly equal to 0.8.
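The competing readings of the 0.91 figure in this exchange can be put side by side in a small Python sketch. Everything here is a simplification: the function names are hypothetical, the 10-park league size is my assumption for the 1962-1966 National League, and the real Baseball Reference calculation includes adjustments that are not modeled.

```python
def blended_park_factor(home_factor, road_factor=1.0):
    """Average a home-park factor with a road factor, assuming half the games are at home."""
    return (home_factor + road_factor) / 2


def road_factor_excluding_home(home_factor, n_parks):
    """Mean factor of the other parks, assuming all n_parks factors average to 1.0."""
    return (n_parks * 1.0 - home_factor) / (n_parks - 1)


def implied_home_road_ratio(blended_factor, road_factor=1.0):
    """Home/road run ratio implied if the published factor is already a half-home blend."""
    return (2 * blended_factor - road_factor) / road_factor


# Reading 1: 0.91 is a raw home factor, so blend it with neutral road parks.
halved = blended_park_factor(0.91)

# Reading 2: same, but the road parks exclude Dodger Stadium (assumed 10-park league).
road = road_factor_excluding_home(0.91, n_parks=10)
adjusted = blended_park_factor(0.91, road)

# Reading 3: 0.91 is already blended, so back out the home/road ratio.
ratio = implied_home_road_ratio(0.91)

print(f"halved: {halved:.3f}, adjusted: {adjusted:.3f}, implied ratio: {ratio:.2f}")
```

Under the third reading the implied home/road ratio is about 0.82, in line with the "roughly 0.8" mentioned above.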

I have always loved baseball because of the ability to measure different eras with stats. Stats are great but if I needed one pitcher to pitch one game, the 7th of the World Series, I’m not sure who I would pick over Koufax. If you saw him pitch, you measure his won-lost, you can argue it, but it’s a tough argument. A minor point is that Koufax pitched for notoriously weak hitting teams. He won every way you needed to win. Goes to my first point.
Stats people love to talk about bookends but to me, when you say the greatest ever, you’re talking about the guy you want out there when it counts. Great discussion.

I have to agree with Dave. For the first 7 years of his career, Koufax was basically a .500 pitcher with a walk rate of 5-6 per nine innings. He was certainly dominant the last 5 years when he found the strike zone, but a 5-year string of dominance does not completely wipe out 7 years of mediocrity.

Now, if he’d been able to maintain the dominance of his last 5 years for another 5 years, we might not be having this conversation…

Two things. First, I wasn’t offering an impression; I was trying to make the point that some 5-year careers transcend longer careers because of their sheer uniqueness and dominance. I know this is extremely arguable, but I’d take those 7 years of mediocrity to get those 5 years. And I would take that over most more consistent pitchers. Second, and perhaps more important, was Koufax under pressure. That is a different statistical measurement and to me, if I were a manager, more germane to the question of who is the greatest. He rarely worked with much of a lead and yet he won and won. And he won with dominance when it mattered. Ridiculous winning pct. given that. So, to conclude my final point: pressure performance creates greatness, and while I don’t know if that makes him the greatest, seeing how others stack up against that, rather than against the length of a career, would be a more significant measurement of who is the greatest.

Tim, regarding the tradeoff between his mediocre years and prime years—unless you’re willing to quantify the difference, it’s an impression, or perhaps a judgment call, but not a quantifiable difference.

I don’t know what you’re saying about pitching under pressure. Are you saying that his won/loss record is even better than his runs allowed would indicate? Do you know something about his run support?

He DID pitch very well in high-leverage situations, which is no surprise, really. Is that what you’re referring to?

So I did a little more digging at Baseball Reference. Based on the run support Koufax received during his five prime years and the number of runs he allowed, he has a pythagorean winning percentage of .770. His actual winning percentage was .766. Seems pretty much spot on, which implies that his ERA fully reflects his pitching value.

BTW, the Dodgers averaged 4.2 runs per Koufax start during those five years, which was higher than the overall major league average at the time.

Having said that, it does appear that he pitched to his run support to some degree, and he was awesome in high-leverage situations. So there may be something to what you’re saying, but not enough to fundamentally change his standing, I don’t think.
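For anyone who wants to redo the Pythagorean comparison above, here is a rough Python sketch. It assumes the classic exponent of 2, which may not be the exponent behind the .770 figure, and the function names are hypothetical. Only the 111-34 record, the 4.2 runs of support and the .770 figure come from the discussion. The runs-allowed number is backed out from them, not looked up.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and allowed (exponent is an assumption)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def implied_runs_allowed(runs_scored, win_pct, exponent=2.0):
    """Runs allowed per game that would produce win_pct at the given run support."""
    return runs_scored * ((1 - win_pct) / win_pct) ** (1 / exponent)


# Actual winning percentage from the 111-34 record, 1962-1966.
actual_pct = 111 / (111 + 34)

# Runs allowed per game implied by 4.2 runs of support and a .770 Pythagorean mark.
implied_ra = implied_runs_allowed(4.2, 0.770)

print(f"Actual winning pct: {actual_pct:.3f}")
print(f"Implied runs allowed per game: {implied_ra:.2f}")
print(f"Round trip: {pythagorean_pct(4.2, implied_ra):.3f}")
```

With these inputs, a .770 mark corresponds to roughly 2.3 runs allowed per game in his starts.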

As to this debate about mediocre/dominant seasons vs. more consistency, I remember reading an article contrasting the careers of Don Drysdale and Don Sutton. The conclusion was that their careers were of similar value (if anything, Sutton’s was more valuable overall), yet the way they distributed that value made a difference: Sutton had more consistent value over a long period of time, while Drysdale concentrated most of his value in a few years (like Koufax). If the overall goal is to win the World Series, the article concluded, then having a few years with really high value is more helpful than having a lot of years with decent value. I don’t think it’s a coincidence, then, that those 60s Dodgers teams, which had two such Peak Value guys, won two titles and made it to the Series another time in the five-year window we’re talking about. So, since I want my team winning championships, I’m taking Koufax, because he’ll win me a few titles, even though there will be other years where he contributes nothing, over a consistency/longevity guy who has me contending year after year but not winning it.

Good article, but I feel that ERA+ actually favors pitchers in higher run environments. Unlike OPS, which has no upper limit, an earned run average can only get so low. In 1966, for Sandy Koufax to have an ERA+ that matched what Pedro Martinez had in 2000 (291), he would have needed an ERA of 1.13. While it is possible to manage an ERA that low in a 300-inning season, it is a pretty unrealistic expectation. It’s just too rare for a team not to be able to manufacture a couple of runs, no matter how depressed the run environment is.
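The 1.13 figure can be reproduced with a short Python sketch. It treats ERA+ as 100 times the park-adjusted league ERA divided by the pitcher's ERA, which is a simplification. The 1.73 ERA and 190 ERA+ for Koufax in 1966 are inputs I am assuming here, since this comment does not state them. The function names are hypothetical.

```python
def era_plus(era, adjusted_league_era):
    """Simplified ERA+: 100 means league average, higher is better."""
    return 100 * adjusted_league_era / era


def era_needed(current_era, current_era_plus, target_era_plus):
    """ERA required to reach target_era_plus in the same run environment."""
    # The adjusted league ERA is fixed, so ERA scales inversely with ERA+.
    return current_era * current_era_plus / target_era_plus


# Assumed inputs: Koufax's 1966 line (1.73 ERA, 190 ERA+); target is the 291 quoted above.
needed = era_needed(1.73, 190, 291)
print(f"ERA needed for a 291 ERA+: {needed:.2f}")
```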

Whoops, I just realized I got involved in the wrong discussion. I checked your original story about the Best Pitchers of all time and saw Bert Blyleven. While there were a few other names, quite a few, that stuck out, I almost fell out of my chair when I saw that. Over Koufax? Right.

Well yes, and I don’t agree with your conclusion. The fact that it was more difficult to post an ERA+ of 130 in the 60s than it is today makes a 130 ERA+ in the 60s more impressive, not less so. Also, when comparing pitchers with an equal ERA+, I would favor the pitcher with the lower ERA. We chalk up era differences to dynamic playing conditions, and that is true to a large extent. But it is also possible that there were just a larger percentage of talented pitchers in the 60s than there are today, and we shouldn’t penalize Koufax etc. for that.

I’d say there is a lot of doubt about that, for all the reasons listed above.

It is highly subjective, but there are ways to compare pitchers’ stats with the league they played in, etc.

I don’t believe the way I’ve approached it is subjective at all. I’m sure the approach can be improved upon, but I wouldn’t call it subjective.

“But it is also possible that there were just a larger percentage of talented pitchers in the 60s than there are today, and we shouldn’t penalize Koufax etc. for that.”

Of course we should. Who knows what drives offense/defense up or down, but in low-offensive eras, good offensive players are more valuable than good defensive players. If you focus on value instead of how “good” a player actually is, you can come up with real answers to your questions. Otherwise, you tie yourself up with Gordian Knots.

That’s the same reason a high ERA+ is less valuable in a low-run environment era. The fact that you can’t allow less than zero runs means that there is a limit on your value in that type of environment.

Bottom line in all of these discussions: focus on value, not on “true talent” or whatever you want to call it. The former can be objectively defined, albeit imperfectly. The latter is truly unknowable.

I am focusing on value. What you are doing is treating your variables incongruously.

Our big objective in comparing across eras is normalizing variables. We want to know how Koufax etc. would have performed under neutral conditions so we can compare them on an even keel.

Koufax had a huge advantage in playing in a pitcher’s park with an illegal mound height, so we penalize him for that. He had an advantage in playing during a pitcher’s era, so we penalize him for that. ERA+ accomplishes both of these provisions to a large extent (though you could certainly argue that it does not penalize him enough for the park effects given his home/road splits).

Now where Koufax was at a disadvantage is in the fact that his ERA can only be so much better than his peers due to the lower limit of ERA. So your solution is to penalize him yet again, using a different definition of the word “value” than you did previously.

Do you see what has happened? We’ve taken Koufax out of his park and league context for the purposes of his ERA+, but then you’ve left him in his league’s context for the purposes of mERA. If we are trying to neutralize league and park effects to gauge a player’s value, you can’t leave him in his league’s context for anything.

Now if you use the word “value” to mean a pitcher’s value entirely in the context of his time, then your mERA adjustment makes sense, but then the original league and park adjustments for ERA+ do not. Frankly, if this is your definition of value, then win-loss record is a great metric to use. Like the lower threshold for earned run average, run support is something beyond the pitcher’s control, yet something that affects his value to his team.

I wouldn’t judge a pitcher’s value that way, but at least in doing so it would be internally consistent.

I’d love to see a list that compared (for both hitters and pitchers) Wins Above All Star and Wins Above Bench (or Replacement) and which players see the biggest differences.

In the hitters article, I talk a lot about the relative differences. Not so much in the pitching article. One of the biggest jumps from WSAB to All Star WS is Smokey Joe Wood, who goes from 134th to 22nd. Dizzy Dean also has a big jump, slightly more than Koufax’s.

“Our big objective in comparing across eras is normalizing variables. We want to know how Koufax etc. would have performed under neutral conditions so we can compare them on an even keel.”

…is where we disagree. To me, it is impossible to neutralize conditions. How do we account for steroids, blacks not playing ball, etc.? I don’t think we can or ever will.

When we “normalize” a player’s environment, we get a better handle on his value. A 1.73 ERA is worth less in a low-run environment. That is a fact and stats like ERA+ help us quantify that fact.

I think a pitcher’s won/loss record is a poor way to quantify his impact because I don’t believe in giving a pitcher credit for the value of the offense of his team. However, I’m one of those people who doesn’t have a problem using WPA (WPA/LI for starting pitchers), once it has been properly set up for the run environment.

Other issues that should be addressed are the height of the mound Koufax and his peers pitched from versus those after 1968, and the number of innings they pitched relative to today. No one from that era achieved 300 wins. Also, the practice of having valuable starters pitch in relief stopped in 1960 for some reason. Ford, Spahn, etc. all had 8 to 12 relief appearances a year. Once Koufax became a full-time starter in 1958, he had 14, 12, 11 and 7 relief appearances each year, which was not uncommon. From 1961 on, his great years, he had a total of 5. Big change.

As to wins, over a period of time there seems to be some value. After all that is what they played for. Comparing the greats of that era in terms of wins over their 6 best consecutive years:
Koufax 129
Gibson 119
Jenkins 127
Marichal 133
Spahn 126 (who obviously won over 300 in this era)
A lot of National Leaguers on different teams, teams up and down, with similar 6-year win totals.

In my mind, Dave has the rating correct. Koufax was comparable to his contemporary greats for 6 years, worse for six years, and they had much longer and more successful careers. They all won games in the environment they were faced with. Top 100 all time is still pretty darn good.

@Keith, I was thinking about my reply and I don’t know that I addressed your concern directly enough. Let me give it one more try, for old time’s sake.

Think of ERA as a value statement. A 2.43 (or whatever) ERA has an implied value to a casual observer. ERA+ is also a value statement—one that takes into account the player’s run environment. It doesn’t “normalize” a variable (at least not to my way of thinking). It’s a separate value statement that has a consistent continuum across different eras of baseball.

Really, it’s just like linear weights. Linear weights vary by era, depending on the underlying run environment. You multiply a batter’s stats (which also have their own implied value) by the appropriate linear weights to get a consistent value continuum across eras.

The ERA+ formula, just like the linear weights, has to apply appropriately to the player’s actual run environment. If it doesn’t, then the comparisons don’t work.

Here’s the key, pulling from your comment. We don’t “pull him out of his run environment” for purposes of his ERA+. We want ERA+ to express his value, in his run environment, in a manner that can be consistently compared to ERA+‘s from other eras. Pulling his stats out of his run environment is exactly the wrong thing to do.

I wanted to make it clear that I’m not treating variables incongruously. My notion of the process is different from yours.

I think the analogue you are looking for with regard to linear weights is not the multiplier for the value of an out; that is basically the same thing that ERA+ does by setting the league average to 100 (with batting or pitching runs, zero). Instead, I believe the analogue to using mERA over ERA+ is using batting wins in favor of batting runs.

I do realize I am in the minority of preferring batting runs to batting wins; maybe I am looking at player comparisons across eras differently than most. At the same time, I think if you asked the majority who would pitch better under the same conditions, the pitcher with the 2.50 ERA and 130 ERA+ or the one with the 3.50 ERA and 130 ERA+, they would select the former, even though mERA and pitching wins tell us the opposite is true.