Prospectus Idol Entry

Babe Ruth the Pitcher

One of my favorite essays on any topic is Nate Silver's "Is Barry Bonds Better Than Babe Ruth?" from Baseball Between the Numbers. Rather than rehashing the same tired arguments about how much harder it was to hit a home run in Ruth's time or how much better the competition against Bonds was almost a century later, Silver uses a variety of metrics to demonstrate how the two players would actually perform on a level playing field.

The two players are very close until Ruth's pitching is taken into account. While his hitting is examined in great detail, his pitching is only valued via NRA. NRA has nothing to do with rifles; it actually stands for Normalized Runs Allowed, which basically takes Ruth's run prevention skills from the Teens and tells us what they would look like in a more typical "all-time" Major League season, adjusting for his park and league along the way. An NRA looks like an ERA in a normalized environment. An average pitcher will have an NRA of about 4.50. While I don't find fault with any of Nate's conclusions, NRA does have some inherent issues that could make it a faulty metric when we want to see the quality of a pitcher's work. It adjusts the runs to more recent times, but doesn't take the player's peripheral stats into account. In other words, if a player had incredibly good luck piled on top of good defense, NRA would like him a lot more than it should. All of this got me thinking about Babe Ruth and how he would really fare against not just a level playing field, but also the players of today.

In order to know how Ruth would fare we have to first become acquainted with Davenport Translations, or DTs. If you've ever walked into a sports bar, you probably heard some drunk guy shouting about how steroids are ruining the "statistical integrity" of the game. Minding your own business, you knew with certainty that the game never had the type of statistical integrity the fellow was shouting about. A home run in 1920 was different than one in 1960, which in turn is different from one hit today. DTs account for just that. They convert the numbers for all players in all time periods to tell us how they would perform in a neutral environment.

There is a DT for ERA, but it is very similar to NRA in that it doesn't account for the pitcher's peripheral numbers. Fortunately, those peripheral numbers are also included in the DTs. K/9, BB/9, and HR/9 are the central stats that the pitcher has control over. They are also the stats we need to look at to find out how Ruth would have pitched today.

We'll concern ourselves with the years 1915 through 1919 when Ruth did the vast majority of his pitching. He had 3.70 K/9, 3.22 BB/9, and 0.09 HR/9. Anyone who's looked at a box score of a game that Mike Pelfrey has pitched knows a lot about these numbers. Having a similar number of walks and strikeouts is bad, but on days when the pitcher in question doesn't allow any home runs, they look like some kind of all right.

Ruth didn't allow homers because of the era he pitched in, but a look at the Davenport Translations of these numbers from Ruth's DT page tells us that he wasn't as bad of a pitcher as the Ks and BBs seem to suggest. His average DT rates for the five years we are concerned with were as follows: 5.56 K/9, 3.46 BB/9, and 0.99 HR/9.

Of course, the homers shoot up because the neutral environment reflects how much power has gone up in the years since Ruth pitched, and strikeouts come up as well. Ruth pitched at a time when pitchers didn't get high K rates. One reason is that they were expected to throw hundreds and hundreds of innings. For example, the Babe threw 326 in 1917. We couldn't expect him to keep up a high K rate any more than we might expect Mariano Rivera to keep up his if he had to throw 6 innings every time he came out.

The DT rates exist on a level playing field. It is a sort of imaginary "all-time" season where the average pitcher has rates of 6.00 K/9, 3.00 BB/9, 1.00 HR/9, and a 4.50 ERA. We can use these numbers to compare whichever pitcher we're looking at. In this case, Ruth has walk and K rates that aren't bad but still aren't as good as an average pitcher's rate. Although the average DT rates represent those of a typical environment, I'm not sure where we can find a real season exactly like it. Nevertheless, rates in the last ten years of Major League Baseball aren't entirely dissimilar: 6.40 K/9, 2.91 BB/9, 1.04 HR/9 and a 4.17 ERA (I am only focusing on starting pitchers who logged at least 150 innings here since their job is more similar to Ruth's). Clearly, the numbers aren't that far off, except maybe the ERA, which doesn't concern us since we'll figure out our own from the peripheral stats.

The first thing we need to do is convert Ruth's numbers from the neutral setting to the contemporary setting, i.e. the last ten years. For the mathematically interested: the ideal way to do this is through the use of standard deviation. But in order to do that we need a season very similar to the neutral setting, and I don't have one on hand. That said, the neutral stats are similar enough to the actual stats that the quick and dirty method of "splitting the difference" will suffice and give us numbers that aren't far off. For example, we saw that the average K rate changed from 6.00 in the neutral DT environment to 6.40 in the last ten years. We can assume that hitters strike out a bit more often these days (see Howard, Ryan), but pitchers are also probably more talented and Ruth's level of talent would remain the same. So we'll split the difference and bump up his K rate by 0.20, half of the full 0.40.

Ruth gets an altered line of 5.76 K/9, 3.41 BB/9, and 1.01 HR/9. These numbers aren't sexy and the Babe certainly wouldn't be confused with a Cy Young candidate, but pitchers like Jeff Suppan and Miguel Batista have made careers out of lines like this, and it is certainly better than replacement level.
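For anyone who wants to check the arithmetic, the split-the-difference conversion can be sketched in a few lines of Python. This is only a rough illustration: the function and variable names are made up for the example, and the inputs are simply the rates quoted above.

```python
# Rough sketch of the "split the difference" conversion described above.
# All names here are illustrative; the numbers are the ones quoted in the text.

NEUTRAL = {"K/9": 6.00, "BB/9": 3.00, "HR/9": 1.00}  # DT "all-time" averages
RECENT = {"K/9": 6.40, "BB/9": 2.91, "HR/9": 1.04}   # 1999-2008 starters, 150+ IP
RUTH_DT = {"K/9": 5.56, "BB/9": 3.46, "HR/9": 0.99}  # Ruth's average DT rates, 1915-1919


def split_the_difference(player, neutral, recent):
    """Shift each rate by half the gap between the recent and neutral averages."""
    return {
        stat: player[stat] + (recent[stat] - neutral[stat]) / 2
        for stat in player
    }


ruth_modern = split_the_difference(RUTH_DT, NEUTRAL, RECENT)
for stat, value in ruth_modern.items():
    print(f"{stat}: {value:.3f}")
```

The strikeout rate moves up by 0.20 and the home run rate by 0.02, while the walk rate moves down by 0.045, which lands at about 3.415 before rounding to the 3.41 used here.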

Let's take an even closer look to see what kind of ERA the Sultan of Stock Pitching would have in the years 1999 to 2008. Our typical ERA is 4.17 so we'll start there. Last week, I went over the value of preventing home runs. Ruth's home run rate is not only estimated without the benefit of groundball rates, but it's also so close to average that it's not going to make much difference, in terms of accuracy, to apply it here. However, in the spirit of being thorough while assessing a pitcher's ERA, we'll apply his home run rate and find that he comes down to 4.13.

The methodology I used to find how much a good HR/9 lowers an ERA may also be used to find how much it is affected by Ks and walks. Using the pitchers in the last ten years that we are comparing Ruth against, each K/9 below or above average tended to respectively increase or decrease an ERA by 0.17. For instance, if Ruth had 5.40 K/9, exactly one less than the average of 6.40, we'd raise his ERA by 0.17. As it stands, he has 0.64 K/9 fewer than the typical starting pitcher, so his ERA is raised by 0.11 to 4.24.

Each BB/9 will tend to increase an ERA by 0.29. Ruth would walk 0.50 hitters more than average according to our conversion, so he gets a final lift of 0.15. Ruth's ERA stands at 4.39. The Bambino might have been what we like to call an innings eater: he eats up innings and vomits runs, but at least he makes it through enough of them to give the bullpen a rest.
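The ERA build-up over the last few paragraphs can be sketched the same way. This is a minimal illustration rather than the exact method: the function name is invented, the 0.17 and 0.29 weights are the ones quoted above, and the home run step is taken as the stated result (4.17 down to 4.13) because the per-homer weight isn't spelled out here.

```python
# Rough sketch of the ERA estimate from peripherals described above.
# Names are illustrative; the weights and rates are the ones quoted in the text.

ERA_AFTER_HR = 4.13            # 4.17 baseline after applying Ruth's HR rate (as stated)
AVG_K9, AVG_BB9 = 6.40, 2.91   # 1999-2008 starters, 150+ IP
RUTH_K9, RUTH_BB9 = 5.76, 3.41  # Ruth's converted rates


def era_from_peripherals(start_era, k9, bb9, avg_k9, avg_bb9,
                         era_per_k9=0.17, era_per_bb9=0.29):
    """Add an ERA penalty for strikeouts below average and walks above average."""
    k_adjust = (avg_k9 - k9) * era_per_k9       # fewer Ks than average raises ERA
    bb_adjust = (bb9 - avg_bb9) * era_per_bb9   # more walks than average raises ERA
    return start_era + k_adjust + bb_adjust, k_adjust, bb_adjust


estimated_era, k_adjust, bb_adjust = era_from_peripherals(
    ERA_AFTER_HR, RUTH_K9, RUTH_BB9, AVG_K9, AVG_BB9
)
print(f"K adjustment:  {k_adjust:+.4f}")
print(f"BB adjustment: {bb_adjust:+.4f}")
print(f"Estimated ERA: {estimated_era:.4f}")
```

Carried without rounding, the two adjustments are about 0.109 and 0.145 and the total is about 4.38. Rounding each adjustment to two decimals before adding, as done above, gives 0.11 and 0.15 and the 4.39 figure.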

Of course, we looked at all Ruth's pitching years as a whole. Let's see how he would have done as a contemporary pitcher in each of the individual seasons he pitched:

Looking at him on a year-to-year basis, a clear pattern comes into focus. Ruth had a nice year in 1916, but the following years don't reflect a positive trend. Even with a healthy conversion rate, his strikeout rate dips into the Carlos Silva danger zone. Meanwhile, the walks and home runs don't do him any favors either.

Earlier, I mentioned Ruth's NRA and DT ERA and how they convert his actual ERA, while ignoring peripheral stats. Here, we find that the above ERAs, based on his translated peripherals, are higher than the two former metrics suggested, even despite the lower baseline ERA in the last ten years. The fact that Ruth's translated earned run averages come up better than they should indicates that Ruth's ERA was better than it should have been in the years he actually pitched. In other words, his pitching was somewhat overrated even in his own time because of good luck and defense.

Ruth famously quit pitching so he could hit every day, but his would not be a memorable pitching career whether or not it had continued. If he pitched like this in the twenty-first century, he'd have a job, but by his fifth year, he'd be borderline and would probably be looking at the end of a shorter career than he might have liked. In his day, he would have had a longer pitching career that stretched into the twenties, but he wouldn't be a very notable figure in sports history. Good thing he could hit a little bit too.

I really like how easily Brian broke down some really heavy statistical concepts into chunks. It took him a while to explain NRA, but by doing so slowly, methodically, and (!!) entertainingly, I was able to not only follow along, but barely notice that I was absorbing this. He gets a bit too dense for me (and yes, I'm dense on subjects like this) in the middle, but he couldn't keep going as slow as he had without ending up at about 5,000 words. There's some wobble from the strong start but he ends with a strong conclusion that isn't just strong, but supported well.

Jun 20, 2009 22:11 PM

Steven Goldman

BP staff

Good for Brian for having the guts to wade into the murky waters of Deadball Era translations, where even Clay Davenport sometimes fears to tread. Through no fault of his own, he's handicapped a bit by not having doubles and triples allowed for the Babe, as with these numbers he could have extrapolated a bit more about Ruth's pitching tendencies. I would have liked to see some effort to confirm his assumptions about the defenses that played behind Ruth on the Red Sox, even if it was simply citing the defensive efficiency numbers for those teams (in 1916, the year Ruth had a 1.75 ERA, the Sox led the league). At first I was a little unsure of Brian's conclusions given just how dominant Ruth was that year, but the more I think about it, the more I think he's right that Ruth was on a downward trajectory as a pitcher. Either way, as Will said, he did a fine job of making the methodology transparent.

Jun 21, 2009 00:00 AM

Christina Kahrl

BP staff

Brian delivered a thoughtful exposition on data interpretation while once again flashing quality writing skills. While a lot of the attention has been given to Matt and Tim for their week-to-week delivery of outstanding work, points to Brian for doing as he did last week, taking things up a notch and quietly making a case that he's not someone to sell short. Because of my own reservations about the dangers of getting overly worked up on performances achieved in small, white leagues with debilitating problems in terms of competitive balance, I was a little concerned initially with the topic, but then he wound up going somewhere I'd call a happy place in terms of demystifying that performance a bit.

Jun 21, 2009 08:38 AM

Kevin Goldstein

BP staff

Concept AND execution, which is something I wish we had more of this week. I know people read these in a random order, but this was the last of the week for me, and the two Brians really saved the week for me.

This really gave a clear explanation of Ruth's effectiveness as a pitcher. I was under the impression he was much better than this. Brian did a super job of clearing up that misconception for me and I assume many others.

Now, that's interesting! That should have been the point of your article. If you made that case, I would have given you an A. If you did, I missed it, because I couldn't care enough about how precisely good Ruth was as a pitcher to trudge through your long explanation.

My comments:
Wow. After hitting a home run last week with his delightful Brett Cecil article, Brian completely lost my interest around the mid point of this article. I couldn't finish it.

Content: C - way too much about so little. Apparently, Brian came up with a well calculated conclusion about Ruth's performance as a pitcher. I didn't have the patience to get there.

Really? What good are translations if the 40th best pitcher in the league (maybe giving Suppan or Batista too much credit?) in the 2000s would be the best pitcher in the league in the 19teens? Any way you slice it, Babe Ruth was one of the 5 best pitchers in the AL in 1916-1918.

Granted, you did the best with a very tough area to translate, my only problem is the conclusion drawn. That you have convinced many posters here that Ruth wasn't that good of a pitcher is a little troubling to me, because I'm not sure that is correct.

Because translations are based on relative performance. What Tim's talking about is the time machine problem -- teleport Jeff Suppan back to the 1910s in the condition he is now, and his conditioning, repertoire, velocity, command, strategy, etc. would likely be lightyears ahead of anybody else's.

Of course it's not really fair. Take a young Walter Johnson to the present day and give him a modern ballplayer's upbringing and he could be the same Cy Young-winning legend. Or perhaps he'd never get the feel for a change-up and wash out of Rookie ball. It's impossible to say. But that's what this article was trying to do -- take a stab at what a pitcher with Babe Ruth's (pitching) talents might look like today. It's not necessarily terribly meaningful, but I thought it a fun exercise.

What we know is that he was one of the most dominant pitchers of his day. How does his dominance compare to Koufax's, or Santana's? That's what translations are for.

Which isn't to say that I necessarily disagree with your overall point, that you're not sold on the translation. I was referring to your first paragraph and what Tim meant by his comment. Sorry for the double post.

Second to last article for me to read this week (with Ken's being the last), but my favorite so far. I am sure someone can find something to nitpick here or there about some assumption, translation or calculation that should've been done differently, but I liked the adventure itself. Where, in prior weeks, Brian had been vague, he does a great job of drilling down here and I thought his topic choice was pretty daring.

Well done, Brian - You have consistently given us something unique in your look. I'm not sure I buy the translation 100%, but like Will, numbers aren't my strong suit. Nonetheless, big points for looking at a time-honoured assumption (Babe's pitching prowess) and deconstructing it.

I have made it a point to not leave negative comments on any BPI articles (if this is considered negative I apologize in advance) but this week's crop didn't dazzle me at all. Having said that, I appreciate this author's take and his attempt to delve into that (to some) scary era and bring it on home to the 00's. He will get a thumbs up from me just based on that - considering there isn't anything else this week that makes me go 'wow'.

I will say this though - I couldn't write even one structurally sound or interesting piece - so I give mega-kudos to all of the contestants. There have been some very good works by the remaining writers in previous weeks, thank you all for some good reads.

Great choice of topic -- and that's an important skill -- but I think you bobbled the execution.

Paragraphs 2 through 9 dragged on and on; there has to be a more concise way to say that. Especially since, as you nearly said explicitly, you simply can't translate Ruth's K/9 or HR/9 to a translated modern equivalent and make comparisons to modern pitchers with those rates.

Modern pitchers are effective by minimizing baserunners and HR, because crooked numbers lurk in every PA. Dead-ball pitchers had different primary worries, and were successful in different ways. Was Ruth declining in his ability to do the things that made pitchers successful in 1917? It would have been nice to learn the answer to that.

I agree. This was the first article I clicked on because I found the subject matter very interesting. I think you spent too much time trying to explain the statistical conversions, particularly since the eras are so different as to make comparisons very sketchy. I would have liked to see you spend more words around your conclusion and perhaps even relate the Babe's situation to Micah Owings, who has generated a lot of chatter lately primarily due to the Reds' offensive woes. Still, I voted thumbs up since I was interested in the topic and learned something I didn't know before about the Babe's being unlikely to be a standout if he continued pitching full time.

That was really cool, excellent article! Honestly, I wasn't a fan of your first few articles, Brian, but the last two have really won me over.

Also, I must disagree with the idea that the discussion should have been significantly condensed. It probably could have done without the little rehash of what DTs are for -- we're at BP, we know that -- but that's nitpicking. When you're doing your own translation, as Brian is here, you need to be really explicit about what you're doing and why. He managed to do it thoroughly without getting caught up in the minutiae. Superb job.

Brian's best effort to date, by a considerable margin. His work continues to improve. In fact, it is the first time he has earned my vote.

However, given the way his conclusion flies in the face of conventional wisdom, it would be stronger if he included some translations of others from Ruth's pitching era, be it good pitchers, like Stan Coveleski and Hippo Vaughn, or inarguable greats, like Walter Johnson and Pete Alexander.

It would be interesting to see if Johnson graded out as a number 3 starter. If he did, then I think, though the methodology is mathematically sound, further tweaking to compensate for the strikingly different era is warranted.

As Christina's comment implies, translating stats from an era of whites-only, dead-ball baseball is pointless in and of itself. That said, as someone who's read countless books and articles on the deadball era, and the great Red Sox teams of the teens in particular, the conclusions Brian draws make sense to me. Unfortunately, this brings to light a problem I've had with the 3 articles I've read so far - little to no effort made to actually research contemporary sources for the eras in question. I know there are deadlines and all, but it shouldn't be that difficult to find contemporary sources to back up your data. It seems to me that just using modern data metrics to apply to historical concepts is the easy way out.

Still, this was overall a fine piece of writing, as Brian does seem to have kicked his writing up a few notches these past 2 weeks.

In a low-HR, low-K era, is H/9 really an ignorable statistic, entirely defense dependent? Is it OK for the article to make that implicit assumption without discussion?

I don't think it is OK, given that the Babe finished 2, 1, 3, 4 in H/9 in his four full seasons of pitching. He's much better than teammates at preventing hits in '16-'18 (15-20% better than the rest of the team, and none of the other starters were bad pitchers). So, the more I look at it, the more I have to question why it's not at least addressed that his record is built by avoiding hits, whether by luck or skill. I mean, how else does a pitcher with average HR, BB and K rates get so much better results than the other pitchers on the staff?

I don't understand the shift from league averages (via DT) to a sample that only includes 150+ innings hurlers...that's definitely not kosher. Maybe there's no difference between this 150+ IP sample and league average, but you don't say that.

And it has to be worth a word on the HR translation being a bit silly because we're talking about 8 homers in 1200 innings. And 0 HR in 1916 translates into 0.72 HR/9 somehow...I understand the idea, but a few caveats seem in order...obviously these translations break down at certain extremes, and HR rates from the deadball era are one of them. For example, if he'd allowed 1 HR that year, he would have had a MUCH higher translated ERA for you, right?

The writing is clear and concise, though more slangy than I like it. I certainly never doubt what you're trying to say, which is a solid plus.

I don't think you've proved your point, and I get the feeling that the stuff you've left out was left out because it doesn't fit your conclusion...and that's no good.

Could the Babe limit hits of opposing batters in his era? Maybe. His hit translations come up to around 8 H/9. That could just as likely be due to luck and defense. In either event, it is unlikely that the ability would convert to today's game so I needed to omit it in order to eliminate luck and defense from the equation, which, of course, was part of the point of the article.

As far as using starting pitchers in my comparison, the DTs make him into a starter and convert his innings. Furthermore, his original stats as a starter were converted. That is to say that relief pitchers have different data attached to their rate stats because of coming in in the middle of an inning and so forth. Consistently comparing him to SPs seemed like the way to go.

"For example, if he'd allowed 1 HR that year, he would have had a MUCH higher translated ERA for you, right?" That is, perhaps, a better question for Davenport, but for my part, it seemed appropriate that they translated close to league average. I assume that there is a lot of reversion to the mean going on there. I would have loved some ground ball data to use, but I couldn't even find anecdotal evidence of whether he was killing worms. You do the best you can with what God or the record-keepers of the time give you.

I don't get your response on H/9. I can't see how it would be explained by team defense given his huge margin of outperformance of his teammates. But my main point - maybe the only point that's really fair, given the time constraints you had - is that you can't just ignore this thing that happened.

On the translation - thanks for clarifying...I had assumed that the DT was done using straight league averages.

On the HR/9, you could do the DT with 1 HR vs his actual 0 HR in 1916, just as an exercise to measure the sensitivity. Since the league average is so low, each HR has a disproportionate effect on his translated stats and translated ERA. It really does call into question the whole thing, in a 'why bother' kind of way.

Nice article. I have a couple concerns, but they are not so much criticisms, as just suggestions to go beyond the intended scope of this piece.

First, I have concerns about how well DTs work when stretched into an era SO different from the one we are all concerned about. For example, HR/9 can be converted to what it would be, relative to the era, in a "neutral" year, but what we haven't converted for is the relative importance of the stat. I mean, you can figure out how to convert HR/9, but you need to take account somehow of the fact that it was a much less meaningful or important stat in 1917 than it is today, perhaps comparable to triples/9 innings for pitchers today, something that is not really tracked and where pitcher-to-pitcher differences are probably virtually random (maybe why Babe was so close to average on his converted numbers).

Second, the entire discussion focuses on rate stats, rather than counting stats, and thus robs the Babe of credit for pitching so many innings (which was mentioned in passing). Surely a pitcher with the same rate stats is contributing a lot more to his team if he maintains that pace over 1/4 of his team's innings or more (per 1917) as compared to a typical-for-today more like 1/6 or 1/7 of his team's innings.

Just with regards to the last paragraph, I was trying to illustrate that his translated rate stats improve partly because he would be pitching fewer innings. You can see the counting stat translations on his DT page (including IP) and they illustrate the same point. Sorry if that was not made clear in the article.

I got into a fun discussion about Ruth's pitching about four years ago over at BaseballHQ ... the arguments for his being worthy of praise as a pitcher were his ERA+, his Win Shares, and his age (very young). But when the talk got around to projecting him as a clear (would be) All-Star and potential HoF-caliber pitcher with comparisons of his 1916-1917 seasons to other contemporary and future greats like Walter Johnson's '16-'17, Koufax's '65, Gibson's '68, and Guidry's '78, I went to the translated peripherals, as you did here, to put a clamp down on that argument.

Ruth's 1916 numbers translated to 7.2 K/9 and 2.2 K/BB, with a 26% hit rate (BABIP) and 74% strand rate. His 1917 numbers translated to 5.9 K/9 and 1.9 K/BB with a 26% hit rate and 73% strand rate. Just Walter Johnson's lines -- from a peer in the exact same seasons -- showed the gap between Ruth and a true all-time great. Johnson's 1916 numbers translated to 9.1 K/9 and 6.7 K/BB, but with a 28% hit rate and ugly 65% strand rate, and his 1917 numbers translated to 8.7 K/9 and 5.1 K/BB with a relatively "normal" 29% hit rate and 71% strand rate.

As your article suggested, Ruth was the beneficiary of good luck/defense in addition to being a good-but-not-great pitcher.

We can't just stipulate that preventing hits in the deadball era was all luck and defense. The fact that Ruth delivered hit rates 10-22% lower than his teammates for each of the 4 years (1034 innings) suggests that it was neither luck nor defense.

In 1919, he started 15 games, and simply didn't have it - he gave up 10 hits/9 with the highest BB/9 and lowest K/9 rates of his career. Maybe because he got 500+ ABs. Or maybe Ruth had peaked as a pitcher in 1916 - his K/9 and K/BB rates were falling fast from that peak. But it's generally attributed to him not dedicating himself to pitching - he got PH ABs in 1916-1917, but didn't appear in the field until 1918, when he started 19 games on the mound and played in 72 in the field (and got his second ring in three seasons).

Anyway, I would suggest that the burden of proof is on those of you who say that Ruth's low H/9 rates are luck and defense. You both seem to have it as an article of faith.

I definitely identify with the postmodern camp where it is okay to blend some pop culture with your academia. It seems like they shouldn't go together at first, but they really make a tasty concoction, sort of like maple syrup and sausage.

As a sim-baseball player, I find Ruth to be over-valued as a pitcher, generally. Definitely better options to choose from in his peers, as you point out.

Question for you, however, about one of the details of the translations from the deadball era in regards to home runs allowed. Since HR/9 is a key component of your analysis, and correctly so, understanding how the DT handles HRs seems to be critical, including park factors. Do the DTs take into account the park effects for the pitcher's HR rate? If so, how?

More specifically, in his best season (1916), Ruth allowed zero home runs over 323 innings, with theoretically half these games at Fenway. Yet the DT for that season has him projected for 22 HRs over 274 innings. Why?
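As a quick sanity check on those figures (a minimal sketch, with a helper name made up for the example), 22 home runs over 274 innings works out to the roughly 0.72 HR/9 quoted earlier in the thread:

```python
# Quick per-nine-innings check on the translated 1916 line quoted above.
# per_nine is a hypothetical helper, not part of any DT tooling.

def per_nine(count, innings):
    """Convert a counting stat into a rate per nine innings."""
    return 9 * count / innings


translated_hr9 = per_nine(22, 274)   # translated 1916: 22 HR in 274 IP
actual_hr9 = per_nine(0, 323)        # actual 1916: 0 HR in about 323 IP

print(f"Translated HR/9: {translated_hr9:.2f}")
print(f"Actual HR/9:     {actual_hr9:.2f}")
```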

At any rate, the fact that I want to know more about the details of the analysis means I'm giving you another thumbs up.

While I am not privy to the exact DT calculations, my understanding is that they take season, era, ballpark, and pitcher into account and then do a good deal of regressing to the mean. In other words, 22 HRs is about as well as you could expect Ruth to do if you played that season over again in a different era.

PECOTA might take a first year professional player who didn't allow any HRs in A-ball and project a poor HR rate for the majors in the following year. It is using the same types of principles (not counting ground ball rate).