Wednesday, September 18, 2002

Pitchers for the Hall of Merit

Let’s start discussing the pitchers here. I don’t have any adjusted numbers to post yet, but there’s no reason we can’t get the discussion cranking.

I take that back. I went through season by season a ways back and came up with pythagorean W-L records for each pitcher, based on his ERA vs. the park-adjusted league average (season by season), adjusting to an average number of decisions in each season (based on his career IP-per-decision ratio). Those numbers will be in the extended text.
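That description can be sketched roughly as follows; the classic exponent of 2 and all helper names here are my assumptions, not necessarily the exact method used.

```python
# A rough sketch, not the exact method: exponent of 2 and names are assumptions.

def pyth_season_record(era, lg_era, season_ip, career_ip_per_decision):
    """Estimate a season W-L from ERA vs. the park-adjusted league ERA."""
    decisions = season_ip / career_ip_per_decision
    win_pct = lg_era ** 2 / (lg_era ** 2 + era ** 2)   # classic pythagorean form
    return win_pct * decisions, (1 - win_pct) * decisions

# e.g. a 3.00 ERA in a 4.00 league over 250 IP, at 9 IP per decision:
w, l = pyth_season_record(3.00, 4.00, 250, 9.0)
```

With those inputs the sketch yields a .640 winning percentage spread over about 28 decisions.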

Reader Comments and Retorts


So no 20th-century Ps with 4000+ IP and better than a 110 ERA+ are missing from the Hall of Merit.
Jack Quinn's career numbers of 114 ERA+ and 3920 IP are close, but he's within 500 IP only of Bunning (also 114 ERA+) on the ERA+ list just above.

In "2009 Ballot Discussion",
244. Chris Cobb Posted: October 19, 2008 at 09:31 PM (#2988410)
>>
Although the pitcher OPS+ vs. ERA+ debate is kind of interesting, in my view neither of these measures is as useful as ones that adjust systematically for a pitcher's defensive support.

Here are some data from one such measure: DERA from Clay Davenport's WARP1.

WARP1 adjusts pitcher's runs allowed in two ways. First, it normalizes runs allowed to a 4.50 r/g environment. This turns a pitcher's RA into NRA--normalized runs allowed. Second, it adjusts NRA for team defensive efficiency to show what a pitcher's rate of runs allowed would be with league average defensive support: this is DERA.
<<

Just now I have uploaded some career data for 400-odd pitchers at yousendit.com
>>
We've stored your files on our server and sent your recipient(s) an email with instructions for retrieving it. Your files will be available for 7 Days.

Here is the link for the file you uploaded:
http://www.yousendit.com/download/Y2ovS3d1K3hJMHMwTVE9PQ (link)
<<

That file is in csv format, 438 x 14 with field names or column headers in the first row:

There are ten fields of quantitative performance data (bold) of which seven are from the player 'DT cards' at baseballprospectus.com (underline) and three are familiar.

[time] represents mlb debut decade as an integer -3 to 9 meaning 1870s to 1990s. (I see that K-Rod is in the table with time=10. In general a post-1999 debut is represented by time=9.) I use the time variable to account for pitcher batting. The difference in OPS+ (pitcher batting) is about five times the difference in time, or five points per decade. That favors 1870s and 1880s pitchers because the decline in average pitcher batting was then about ten points per decade.

For the allocation of credit between fielding and pitching, according to Davenport, I use RAA and PRAA with XIP as the measure of playing time. So RAA-PRAA is the career bulk allocation to team fielding and (RAA-PRAA)/XIP is the rate measure while PRAA and PRAA/XIP are the career bulk and rate allocations to pitching. Those units are normalized runs and normalized runs per normalized inning.
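Paul's allocation reads directly as code (the function and field names are mine):

```python
def allocate_credit(raa, praa, xip):
    # bulk and rate allocations, exactly as described: RAA-PRAA to fielding,
    # PRAA to pitching, with XIP as the playing-time denominator
    fielding_bulk = raa - praa
    return {
        "fielding": (fielding_bulk, fielding_bulk / xip),
        "pitching": (praa, praa / xip),
    }
```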

For one-shot incorporation of pitcher batting, I assign the pitcher 1/12 of team plate appearances. The calculation is not simple and I never do it by hand.

[lahmanID] is the unique numerical identifier for the pitcher in the baseball-databank, useful if you will use this with one of its offspring baseball databases such as those provided by Sean Lahman's Baseball Archive.

[time] represents the mlb debut decade, see just above

[name] is a version of the pitcher's name, surname first and sometimes alone

PRAA is a "fudge factor" inserted into WARP to prevent old-time pitchers from accumulating huge WARP totals. Although no one knows how it's calculated, its basic premise is that pitchers who use their defense more (i.e., have a lower strikeout rate) deserve less credit for their outs than those who use their defense less. This is just not the way baseball works--zillions of pitchers have fashioned successful careers by relying on their defenses. The proper approach (in the absence of a modern PBP metric that shows how easy to field the balls they allowed were, like PZR) is a slightly modified version of the NRA-DERA adjustment, where the team's overall fielding rate above or below league average is backed out of the pitcher's runs allowed (but with the allocation done by balls in play rather than innings pitched). But anyone who wants to tell me that Warren Spahn deserves less credit for the outs he got than Randy Johnson deserves for his is gonna have a heck of a time convincing me.
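A rough sketch of the adjustment proposed here — backing the team's fielding runs above average out of a pitcher's rate, allocated by balls in play rather than innings. The argument names and the sign convention are my assumptions:

```python
# Hypothetical sketch; names and sign convention are mine, not a known metric.
def bip_adjusted_ra(ra9, pitcher_ip, pitcher_bip, team_fielding_raa, team_bip):
    # the pitcher's share of team fielding runs above average, by balls in play
    share = team_fielding_raa * (pitcher_bip / team_bip)
    # good fielding deflated his runs allowed, so add those runs back per 9 IP
    return ra9 + 9 * share / pitcher_ip
```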

As for XIP, it's just a really ghetto attempt at a leverage index, based on meaningless stats like wins and saves rather than the actual leverage information we have available for much of baseball history.

Hmm, I think you're right. Just looking at the 2002 Boston Red Sox, there's the same 8-run gap between Pedro and Lowe's RAA and PRAA, despite the fact that Pedro's K rate was a zillion times better. I stand corrected. Forgive me.

306. Chris Cobb Posted: October 20, 2008 at 03:49 PM (#2990285)
Dan R: do you mean PRAR rather than PRAA?

As far as I know, PRAA--pitching runs above average--is calculated directly from DERA.

I don't know which is calculated from which but the relation between them incorporates XIP. If XIP is no good then PRAA alone is no good.

But DERA is equivalent to PRAA/XIP. The latter is a measure of normalized runs saved by the pitcher per inning; multiply by 9 and subtract from 4.50 to express that on the NRA and DRA scale.

Perhaps NRA bears the same relation to RAA/XIP as does DERA to PRAA/XIP. I don't have NRA except by that calculation.
RAA and PRAA seem more useful to me because the units need no interpretation and the difference also has a useful interpretation.
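The identity described above between DERA and PRAA/XIP, in code (function names mine):

```python
def dera_from_praa(praa, xip):
    # runs saved per inning, times nine, subtracted from the 4.50 baseline
    return 4.50 - 9 * praa / xip

def praa_from_dera(dera, xip):
    # the inverse relation
    return (4.50 - dera) * xip / 9
```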

His BPro NRA is 2.44, his DERA is 2.68, meaning he pitched in front of good defenses.

The league in 1955 averaged 4.48 R/9 IP and Comiskey (adjusted for not facing White Sox hitters too) was neutral.

Using these numbers (4.48 and 2.19), I get a pythagorean exponent of 1.72, which works out to a .774 WPct.

I then take the 2.19 and 4.48 and use the Excel Solver function to solve for the same .774 WPct at 4.50 R/G league average, to normalize across all eras (and to put it on the same scale as NRA and DERA). The adjustment is small here, just bumps Pierce to 2.20.
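Both steps can be reproduced in a few lines. The Pythagenpat-style exponent RPG^0.285 is my assumption, but it reproduces the 1.72 exponent and .774 WPct quoted above, and a simple bisection stands in for Excel's Solver:

```python
def pyth_exp(rpg):
    # Pythagenpat-style exponent; the 0.285 form is my assumption, but it
    # reproduces the 1.72 quoted above for 2.19 vs 4.48
    return rpg ** 0.285

def wpct(ra9, lg9):
    x = pyth_exp(ra9 + lg9)
    return lg9 ** x / (lg9 ** x + ra9 ** x)

target = wpct(2.19, 4.48)        # ~= .774

# stand-in for the Excel Solver: find the RA giving the same WPct in a
# 4.50 R/G league (wpct falls as ra9 rises, so bisection converges)
lo, hi = 0.5, 9.0
for _ in range(60):
    mid = (lo + hi) / 2
    if wpct(mid, 4.50) > target:
        lo = mid                 # mid is too good; the RA must be higher
    else:
        hi = mid
equivalent_ra = (lo + hi) / 2    # ~= 2.20, matching the small bump above
```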

I then take the 2.20 and add .17 for the AL league adjustment (1955 AL was a very easy league for pitchers). This adjustment is also normalized to a 4.50 environment.

So this bumps Pierce to 2.37. To this 2.37 I then add the difference between his DERA and his NRA, in this case, .24, bringing his final R/9 to 2.61.

Pierce pitched 189 innings as a starter in 1955 and 16.7 in relief.

The starter IP are adjusted for league-leader norms. Here I make the adjustment based on the average of a range of the league leaders. The range is simply: (Teams in League * .25) + 1 through Teams in League * .75. So in a 16-team league, I use pitchers 5-12. In an 8-team league it's 3-6. This allows a guy like Wilbur Wood to get full credit for his innings. If you base it off the league leader or the top X in the league, outliers throw off the averages.

I then adjust this so the average pitcher in the range is at 258.3 IP. Since BPro sets the top 5 pitchers in a league (no matter the league size) to 275, this works out to an equivalent number across all years, but isn't as impacted in individual years by outliers.

In the 1955 AL, the average pitcher in this range threw 237 innings. So Pierce's starter innings get bumped from 189/237*258.3 to 206 IP.
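The range rule and the innings translation, as I read them (helper names are mine):

```python
def leader_range(n_teams):
    # ranks (1-indexed by IP): from n*.25 + 1 through n*.75
    return int(n_teams * 0.25) + 1, int(n_teams * 0.75)

def translate_starter_ip(ip, range_avg_ip, norm=258.3):
    # rescale so the average pitcher in the range sits at 258.3 IP
    return ip / range_avg_ip * norm

pierce_1955 = translate_starter_ip(189, 237)   # ~= 206, as in the post
```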

The estimated LI for his 16.7 relief innings is 1.21. That comes to 20.3 relief innings.

206 + 20.3 = 226.3 IP for Pierce.

2.61 adjusted R/9.

I have replacement level set to 5.48 R/9. This works out to .406 WPct.
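The .406 also falls out of a Pythagenpat-style calculation; the RPG^0.285 exponent form is my guess at the shape of the formula being used:

```python
def wpct(ra9, lg9):
    # Pythagenpat-style exponent; the 0.285 form is my assumption
    x = (ra9 + lg9) ** 0.285
    return lg9 ** x / (lg9 ** x + ra9 ** x)

replacement_wpct = wpct(5.48, 4.50)    # ~= .406
```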

I would also say that anyone ranking pitchers should absolutely include their hitting relative to their peers (which has changed greatly across time). I use RCAP from the Sabermetric Encyclopedia, as it's simple and easy and close enough. I do apply the same (starting, not leverage) innings adjustment to this as well.

Oh, and the 72.3 PRAR is divided by 9.558 (the number of runs it takes to flip one win in a 4.50 R/G league) and works out to 7.6 WARP. That converts to .123 pennants added (although that fluctuates depending on which years are included in the PA calc - my current sheet uses 1876-2005).

We should all be indebted to Joe for his thoughtful and thorough work on pitcher valuation, which has helpfully informed our deliberations on tons of candidates. However, I have a few very strong criticisms of his approach, which I will take the opportunity to register here:

1. Translation of relief innings thrown by starters: While Joe translates the innings thrown by starting pitchers *as starters* to a context-neutral environment, he makes no adjustment at all to relief innings thrown by starters, even though of course standards for relief innings thrown by starters have changed vastly over time (from being a frequent occurrence in the old days to a curiosity now). This biases results towards pitchers who padded their IP totals with relief appearances that are not translated (and, if they were translated on the same basis, would of course be translated to a big fat zero relief IP thrown by starters in the modern era). I think the translation should be performed on total innings pitched, with no distinction between starting and relief innings.

2. Leverage calculation: The prior effect is compounded by his use of raw leverage index as a multiplier for relief innings. Simply multiplying the wins-above-replacement total accumulated in relief innings by the leverage index for those innings is wrong, wrong, wrong. I do NOT see this as a matter of opinion or debate, I see it as empirical fact, because no replacement pitcher has ever thrown a high-leverage inning. What actually happens when an ace closer goes down is NOT that the AAA callup suddenly becomes the closer. Instead, he becomes the mopup man, and the mopup man becomes a middle reliever, and another middle reliever becomes the setup man, and the setup man becomes the new closer. If you are going to introduce leverage into your calculations as you compare them to replacement level, you *have* to incorporate this *reality* (often called "chaining") into your framework, or your results will be wildly and misleadingly favorable towards relievers (and towards starters who relieve), as I believe Joe's are.

3. Replacement level for relief innings: A third problem with Joe's treatment of relief innings is that he uses the same replacement level for starters and relievers, which is simply incorrect. Replacement relievers (who again, replace the mopup man, not the closer) have an ERA I think 90 points higher than replacement starters (I can look up the exact figure if someone's curious).

4. Seasonal vs. career innings translation: OK, enough about relief innings. Joe adjusts starter innings based on *seasonal* IP norms, but makes no similar correction for career length. Pre-1920 pitchers' career innings totals are fairly comparable to those of modern pitchers; they just were packed into fewer seasons. By dinging those pitchers for high seasonal IP norms but not crediting them for low career length norms, Joe unfairly hoses short-career workhorses like McGinnity and rewards long-career guys with lower seasonal totals (like, say, Jamie Moyer).

5. Standard deviations: To my knowledge, Joe has not yet released a version of his numbers incorporating standard deviations for pitchers, which vary even more than they do for hitters. A 150 ERA+ in the 2003 AL is NOT the same as a 150 ERA+ in the 1923 NL, or anything close to it--a quick check of Ink scores should tell you that. Thus, pitchers from high-stdev eras (particularly post-1993) are overrated, and those from low-stdev eras (the 20s and 50s in particular) are underrated.

I don't think the 90 points higher applies across time, that's a modern phenomenon because closers today throw on a very regular schedule and generally for just one inning.

When firemen were pitching 1-3 innings an appearance, ERAs were closer to that of starters. Any adjustment would have to take this into account.

Regarding point 5, I don't adjust for standard deviations, but I do adjust for league quality. I generally take the difference between NRA adjusted for all-time and NRA adjusted for season for the top 5 pitchers in the league in IP. I use this to adjust for things like expansion and the AL vs. the NL in any given season being weaker or stronger. I don't use it as a 'timeline'; once expansion washes out (generally 3-8 years, depending on the size of the expansion), all seasons are created equal, save the AL/NL difference.

Regarding point 2; IIRC the chaining effect isn't huge. But I could probably add it pretty easily.

Regarding point 1, I think you miss the point. Those relief innings are in addition to innings pitched as a starter, and the adjustment was added because people (correctly) said that if I was giving leverage credit for relievers, starters as relievers should also get that credit.

I don't believe those innings need to be translated the same way as starter IP, because they are 'supplemental' innings; and again, all I'm doing is giving credit for his time as a fireman.

I also don't adjust reliever innings (starter or reliever) for any era in terms of norms because as innings went down, ERA+ went up. There's a direct relationship, as managers traded off bulk for effectiveness. So Gossage gets credit for his huge IP numbers, but at a lower ERA+. Rivera gets credit for his huge ERA+, but in lower IP. In the end it balances.

I also disagree that no replacement level pitcher has ever thrown a high-leverage inning. Heck, Tom Gordon had 2 saves for the Phillies this year; Jason Hammel had two for the Rays. And Troy Percival had 28 saves for the Rays with a 96 ERA+.

I think the foundation is good, but some of the adjustments you suggest are needed. I'm definitely interested if you can help me take those steps to adjust further. I'm especially curious as to how you'd attack #4.

Guys, I just want to say thanks for all this discussion. I've been in the process of rebuilding my pitcher system, and I haven't obtained fully satisfactory results yet. I'll probably incorporate elements from Joe's methods now.

1 (on my point 3): Well, is the right approach to use no adjustment at all (as you currently do), or to use the correct adjustment? It wouldn't be hard to track the starting vs. relief ERA gap over time to use as the basis for a floating replacement level (as I do for position players).

2 (on my point 5): As I've said time and again, standard deviations and league quality are most definitely not the same thing. This effect is completely independent of league quality, and it's BIG: Roger Clemens's 222 RA+ in the 1997 AL translates to just a 174 RA+ in Adolfo Luque's 1923 NL, while Luque's 188 RA+ in the 1923 NL counts for the same as a 256 RA+ in the 1997 AL. If you don't adjust for seasonal IP norms, the effect is SOMEWHAT (but by no means completely) mitigated by the fact that the long-term trend in seasonal IP has been down, while that in standard deviation has been up. But if you adjust for one and not the other (as you do), then you're going to be grossly overrating modern pitchers.

3 (on my point 2): Oh yes the chaining effect is big. I have a bunch more work to do on the subject before I weigh in definitively, but my initial research shows that to adjust for chaining you multiply LI by .61 and add .07 (e.g. a 2.00 LI becomes 1.29 after accounting for chaining). I would note that this preliminary result lines up perfectly with Tangotiger's statement that ace reliever innings are worth 20% more than starter innings (not the 75% that "raw" LI would suggest).
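The preliminary chaining fit as a one-liner:

```python
def chained_li(li):
    # preliminary fit quoted above: effective leverage after bullpen chaining
    return 0.61 * li + 0.07
```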

4 (on my point 1): But you are not being fair to all eras, because those are 'supplemental' innings that deadball pitchers could throw with regularity and modern ones can't. The correct approach (in my view) would be to take the relief component of a pitcher's innings, adjust it both for leverage and chaining (so typically multiplying by something like 1.15), add the result onto his starters' innings, and then use the modified sum of the two as the basis to do your innings translation. E.g., the #5-#12 pitchers in year X averaged 280 innings as a starter and 25 as a reliever with a LI of 1.75, while the historical average is 255 innings as a starter and 5 as a reliever with a LI of 1.75. So the relief innings in year X are adjusted for leverage and chaining as 25*((.61*1.75)+.07) = 28.4, plus 280 for starter innings make 308.4, and the historical average relief innings are adjusted for leverage and chaining as 5*((.61*1.75)+.07) = 5.7, plus 255 for starter innings make 260.7, so the final innings multiplier for year X is .845.
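The year-X arithmetic in that example checks out in code (numbers exactly as given in the post):

```python
# numbers exactly as given in the post
li = 1.75
chain = 0.61 * li + 0.07              # chaining-adjusted leverage = 1.1375
year_x_ip = 280 + 25 * chain          # ~= 308.4 effective IP in year X
history_ip = 255 + 5 * chain          # ~= 260.7 historical-average equivalent
multiplier = history_ip / year_x_ip   # ~= .845
```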

5 (on my points 1 and 5): Well, if you start adjusting for standard deviations (as I think you should), then you can't rely on the lower IP totals to wash out the effect. You say that "in the end it balances," and my response is, "not always." Yes, the overall trends counteract each other, but they have not remotely moved in lockstep. The reason why we have a dearth of seemingly HoM-worthy 1980's starters is, in my view, that IP norms had already come down heavily off the levels of the early 1970's, but standard deviations had not yet risen to compensate. If you properly correct for both factors, you'll be fair to the 80's generation; otherwise, you'll underrate them. There are numerous other years/periods in the game's history where this is the case.

6 (on my point 2): OK, maybe NEVER is a bit too strong, but that's only if a bullpen is in REALLY dire straits. And in that case, it's not just the closer who's gone down, it's also his setup man and a pair of top middle relievers, so the "credit" needs to be divided among all of them. Moreover, saves are not always such a good proxy for leverage index, if they are of the three-run one-inning variety.

7 (on my point 4): Trying to come up with a suitable career length adjustment for starters drove me absolutely insane last year. I spent a whole week on it and got nowhere. I have tons of data I can share with you, and maybe we can try to figure something out together.

I guess Ayala is pretty close to replacement level at this point; he was acquired for virtually nothing. But again, that's only because Wagner got hurt AND Heilman collapsed AND other arms like Sánchez and Feliciano proved to be ineffective. A real perfect storm, for which the credit/debit needs to be distributed across all of the relevant pitchers, and not attributed exclusively to Wagner (which is what metrics like BP's WXRL or Joe D's wins-above-rep-times-leverage do).

I have added to my WARP archive a spreadsheet with my deadly-accurate team defense scores for 1987-2005 (based on PBP and Retrosheet metrics), along with their equivalents in terms of a NRA-DERA adjustment for voters who use DERA. I hope the group finds this useful in its assessment of modern pitchers.

#3175 (on my points 1 and 5): Well, if you start adjusting for standard deviations (as I think you should), then you can't rely on the lower IP totals to wash out the effect. You say that "in the end it balances," and my response is, "not always." Yes, the overall trends counteract each other, but they have not remotely moved in lockstep. The reason why we have a dearth of seemingly HoM-worthy 1980's starters is, in my view, that IP norms had already come down heavily off the levels of the early 1970's, but standard deviations had not yet risen to compensate. If you properly correct for both factors, you'll be fair to the 80's generation; otherwise, you'll underrate them. There are numerous other years/periods in the game's history where this is the case.

"standard deviations had not yet risen to compensate"
Why not continue thus? "because no one had performed at rates someone should have attained, given the lower numbers of IP."

In practice a crucial question is the timespan for the estimate of standard deviation (in raw WARP, which I understand you use, but the point is the same for someone who is thinking about s.d. in ERA+). For example, measure standard deviations during 11 seasons centered on the current one: 1998-2008 for the 2003 standard deviation.

That's all, at least until I understand what this means, given that you do not measure s.d. at all but predict it using a regression that covers more than one hundred years.

Well, I will try to explain it then. There is a lot of noise in the standard deviation statistic, due both to random fluctuation and to real changes in the talent pool ("star gluts" or droughts). If you use the actual standard deviation of a league as the basis for your adjustment, you are not measuring how easy it was to dominate but rather how much players happened to dominate it, which is not the same thing. This problem does not go away at all by using a moving average. First, you'd still unfairly penalize 1920's AL players for having the misfortune to compete with Babe Ruth, and unfairly reward 1910's NL players for not having to compete against Cobb/Collins/Speaker. Second, you'd wind up glossing over true changes in ease of domination that occur within your time frame, most obviously expansions and changes in the ball.

If you want to measure how easy a league truly was to dominate, then you have to ask yourself what makes a league easy to dominate or not. This is where the regression comes in. By exploring the relationships between various league factors (expansion and run scoring being the two most important) and observed standard deviations, we can determine what *really* makes leagues easy or difficult to dominate, *regardless of the performances in a given league-season.* The regression tells us that it is harder to put up a high OPS+/WS/WARP mark in a low-scoring league far removed from expansion (like the 1980's NL) than it is in a high-scoring league close to expansion (like the 1998 NL). That doesn't mean that there will never be a bunch of great seasons in a difficult-to-dominate year (the 1992 NL certainly had many), or that there will always be a bunch of great seasons in an easy-to-dominate year (the 1926 NL had none). But if we are doing our job right, we will recognize that there just happened to be a number of terrific performances in the 1992 NL despite the obstacles, and they should all be considered outstanding, and that there just didn't happen to be any in the 1926 NL, despite how easy it was to accomplish, and that the already-low WARP marks for that year should still be penalized further.

This is why sunnyday's old claim (I don't know if he still holds it to be true) that my system got the arrow of causality wrong is incorrect. The adjustment I make for standard deviation doesn't "know" who's playing in a given league-season or in surrounding seasons, or how they performed. All it knows is that in a league scoring A runs per game, B years removed from expansion, C games long, with a strikeout rate of D, etc., *with a historically average distribution of true talent*, the league leader in ERA+ will come in at a score of around E. In the 1980's, this figure was still fairly low by historical standards, while seasonal IP norms had already declined sharply off their 1970's highs. As a result, I argue that the apparent dearth of great pitchers in the 1980's (as measured by ERA+*IP or WS or BP WARP) is illusory; not a true "star drought" but simply a reflection of the league context. This is why I think Gooden's 1985 has a strong claim as the best pitching season ever.
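The two-step idea above — predict a league's stdev from its context, then re-scale performances — can be illustrated with toy code. Every coefficient and the linear form here are invented for illustration; they encode only the direction of the claims (stdev rises with run scoring and with proximity to expansion), not the actual fitted regression:

```python
# Coefficients and linear form are made up for illustration only.
def predicted_stdev(runs_per_game, yrs_since_expansion):
    return 0.05 + 0.055 * runs_per_game - 0.004 * min(yrs_since_expansion, 8)

def translate_era_plus(era_plus, stdev_from, stdev_to):
    # re-scale a performance from one league's dominance scale to another's
    return 100 + (era_plus - 100) * (stdev_to / stdev_from)
```

So a 150 ERA+ in a league twice as easy to dominate as another would translate to 125 in the harder league, which is the shape of the Clemens/Luque comparison above.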

All of that until the capital E in paragraph three fits your method as you have taught it previously.
(and I think it is clearly explained here, too, although I now know some of it too well to be a good judge)

But the next two sentences, penultimate and antepenultimate, seem to me special pleading in favor of 1980s pitchers.
>>
In the 1980's, this figure was still fairly low by historical standards, while seasonal IP norms had already declined sharply off their 1970's highs. As a result, I argue that the apparent dearth of great pitchers in the 1980's (as measured by ERA+*IP or WS or BP WARP) is illusory; not a true "star drought" but simply a reflection of the league context.
<<

Why? Return to the crucial lines quoted immediately preceding (emphasis mine).
>>
The reason why we have a dearth of seemingly HoM-worthy 1980's starters is, in my view, that IP norms had already come down heavily off the levels of the early 1970's, but standard deviations[*] had not yet risen to compensate. If you properly correct for both factors, you'll be fair to the 80's generation; otherwise, you'll underrate them.
<<
[*] standard deviations = standard deviation norms, predicted by linear regression
Why should s.d. norms rise to compensate for low IP norms? That isn't one of the things that drives s.d. norms.
Statistical theory explains why
- standard deviation (not s.d. norm predicted by linear regression)
- of an average taken over innings, such as ERA or one of its sibling baseball stats,
- will increase when innings decrease
- if there is no contrary underlying change (here, no decline in pitching performance at the inning level).
Given the relatively short pitcher seasons measured by inning, there should have been some very high averages taken over innings, such as ERA and its siblings at season or career level.
We don't see that because there was a decline in pitching performance at the inning level.

Yes, ceteris paribus, we'd expect to see higher observed standard deviations if individual pitchers' innings totals declined. But this is NOT ceteris paribus--the low run scoring and long time since expansion meant that stdevs remained low even when they "should have" increased in line with decreasing pitcher workloads.

This is stimulated by my recent reading of some 2009 preliminary ballots (Cobb to Menckel tonight) and by the coverage of 60 eligible pitchers that Chris Cobb has interspersed with prelim ballots, chiefly during "2009 Ballot Discussion" page 4.

Does anyone else still count players by fielding position?
Howie Menckel counts career shares, I know.
Here is the integer version, every member counts 0 or 1, limited to pitchers.

Pitchers (Ward no, Dihigo yes)

Hall of Merit
62 of 234 members

Hall of Fame
71 of 228 players
+ Wright no
+ Spalding yes
+ Griffith yes
+ Jackson no
+ Rose no
= 73 of 233
for purpose of comparison with HOM, but I am off by one :-(

So the difference is 11 or 12 more pitchers in Cooperstown, 17 to 19%. Right?


325. Paul Wendt Posted: November 03, 2008 at 09:22 PM (#3002239)
Personal Halls of Merit
Many HOM veterans have maintained PHOMs that now have 234 members. How many pitchers do you all have?

Though I am a first-time voter, I have taken the liberty of traveling back in time to construct a Personal Hall of Merit of my own, in which I have chosen ~65 pitchers, excluding Monte Ward but including Martin Dihigo and Bob Caruthers.

HOM not PHOM: Rollie Fingers (not real close, prefer Lee Smith), Pud Galvin (I can see the arguments for induction, possibly better than Caruthers from that era), Joe McGinnity (HOVG, I think Vic Willis was a little better).

PHOM not HOM: David Cone (comfortably above in/out line), Dwight Gooden (The 1980's were quite difficult to dominate, but he did for a stretch, just in), Dutch Leonard (Last pitcher in, nice career value from 40's era, I could be convinced that he is undeserving), Don Newcombe (with the extra credit and potentially unfair segregation, I see him above the line, ahead of Gooden), Rick Reuschel (best available pitcher), Urban Shocker (comfortably in, just below Cone), and Virgil Trucks (similar argument to Leonard).

Virtually no one's case is similar to Gooden's. He has arguably the best "one-year peak" of any pitcher in history, and then a bunch of icing on the cake. For those of us who value each marginal win above replacement more than the last, that's enough to bring him into the consideration set and, in my case, onto the ballot. For those who simply look for a high level of dominance over a given time period (the Al Rosen or Albert Belle voters, and I'd put Kiner there as well), he's probably not close.

Users of the Microsoft Office XP and 2003 programs Word, Excel, or PowerPoint—please install all High-Priority updates from Microsoft Update before downloading the Compatibility Pack.

Thank you for your interest in obtaining updates from our site.
To use this site, you must be running Microsoft Internet Explorer 5 or later.
To upgrade to the latest version of the browser, go to the Internet Explorer Downloads website.

If you prefer to use a different web browser, you can obtain updates from the Microsoft Download Center or you can stay up to date with the latest critical and security updates by using Automatic Updates.
--

For me these links are slow now. Perhaps all of you are there ;-) or millions of MS customers are reading their coverage of the election and the Presidency.

Evidently I mixed up some italic and underscore tags but it's adequate, I think.
--

Bleed,
Evidently you rank Shocker much above and Luque significantly above Wilbur Cooper. Or you goofed while writing #329 or I goofed while searching it. (sighs) I put all three propositions above 20% each.

HOM not PHOM: Rollie Fingers (not real close, prefer Lee Smith), Pud Galvin (I can see the arguments for induction, possibly better than Caruthers from that era), Joe McGinnity (HOVG, I think Vic Willis was a little better).

My understanding is that JoeD and DanR respectively persuaded some of us to knock McGinnity and Willis down a bit. That is from my wetware compendium of HOM careers for player reputations. The Iron Man was promptly elected, before the career of his reputation was really underway. I suspect that he was good and durable enough for several seasons, and we have enough peak & prime oriented voters, that he would be elected again today.

Did I bash Willis? I have no recollection, and no particularly strong opinion on him either.

Joe Dimino's distaste for McGinnity represents what I think are two principal flaws in his system. The first is its reliance on BP's defensive adjustments (a weakness that, for the moment, my pitcher numbers share for seasons before 1987). The 1903-04 Giants allowed an extremely low batting average on balls in play. BP attributes a great chunk of that to the team's fielders, implicitly arguing that Mathewson and McGinnity were the beneficiaries of terrific defense. I *highly* suspect the inverse is true--that Mathewson and McGinnity, who combined for an enormous portion of the teams' innings, induced an inordinately high number of easy-to-field balls, making their fielders (Dahlen aside) look better than they were statistically. Michael Humphreys has sent me some team DRA numbers that do not suggest the 03-04 Giants had particularly strong defense as BP asserts.

The second is that it adjusts for seasonal innings norms over time but not average career lengths, so guys like McGinnity get absolutely crunched on their enormous in-season IP totals but are not compensated for the extra longevity they likely would have had in subsequent eras. Waddell is another, less extreme, example of this.

These are both *extremely* difficult issues to resolve, so this isn't meant to be a criticism of Joe in particular--I don't have any easy answers for either of them. But my instinct is that McGinnity was much more Meritorious than he appears to be in Joe's system, and I would vote for him again for the HoM if I could.

Bleed,
Evidently you rank Shocker much above and Luque significantly above Wilbur Cooper. Or you goofed while writing #329 or I goofed while searching it. (sighs) I put all three propositions above 20% each.

If you look at #329 closely, Shocker is listed as slightly below Cone, but in my PHOM, while Luque was listed in the ~10 other guys I could be convinced to give a future PHOM spot. His case rests on solid seasons from 24-26 where we have no MLE's though.

Joe McGinnity (HOVG, I think Vic Willis was a little better).

I was a bit rough when I said Hall of Very Good for McGinnity. I see him as much more meritorious than Bender, Hoyt, Joss, and Sutter, whom I deemed HOVG.

McGinnity would place within the Top ~20 pitchers I have not elected to the HOM.

I am highly interested in learning more about Michael Humphreys's study of team DRA for McGinnity's time period, and for any other candidate.

McGinnity, if his high number of easy-to-field balls was more skill than defense, would seem like a borderline candidate, worthy of the bottom quartile.

Dan, how is your study progressing on pitchers?

Pitchers are always going to be a challenge to place, no matter what system, but the electorate has done an outstanding job of avoiding mistake choices. No Jesse Haines here!

Bleed quoted me and replied:
>>
Bleed,
Evidently you rank Shocker much above and Luque significantly above Wilbur Cooper. Or you goofed while writing #329 or I goofed while searching it. (sighs) I put all three propositions above 20% each.
<<

If you look at #329 closely, Shocker is listed as slightly below Cone, but in my PHOM, while Luque was listed in the ~10 other guys I could be convinced to give a future PHOM spot. His case rests on solid seasons from 24-26 where we have no MLE's though.

not Cone, Wilbur Cooper. Evidently you rank Shocker >> Cooper and Luque > Cooper, for you do not mention the latter.

In "2009 Ballot Discussion" #320, Chris Cobb
http://www.baseballthinkfactory.org/files/hall_of_merit/discussion/2009_ballot_discussion/P300/#3000219
listed eligible major league pitchers since 1893 who merit consideration along with Bucky Walters --merit without consideration of high peaks and career gaps, as may make the cases for Dizzy Dean and Sal Maglie.

Including Walters, that’s 60 pitchers. To my knowledge, no one with anything like a serious HoM case has been omitted. (We can safely let the cases of George Mullin, Earl Whitehill, Paul Derringer, Larry French, Freddie Fitzsimmons, Rube Marquard, and so on, lie unconsidered.)

A few days ago I asked whether Chris overlooked Ned Garver.
Now I have looked at this using similar career rates and quantities. In my desktop database, there are now 62 pitchers marked 'm' for merit and 60 marked 'c' for cobb, in one field because the groups do not overlap.

Among all the other pitchers who worked mainly 189x-198x, Al Orth, Ned Garver, and Claude Passeau show up with Walters at the top, then a gap. They are among the leaders almost whatever I do, although by ignoring the DERA (Davenport team fielding) analysis entirely and using ERA+ to measure pitching skill, Walters then leads with Nig Cuppy, Doc White, Ed Reulbach, and Harry Brecheen rather than with the trio I have named here. (Those are three turn-of-the-century guys and another WWII guy like Walters. Of course they enjoyed the support of great fielding teams, like Walters's champion Reds: Cleveland and Boston in the 1890s, two Chicagos in the 19-aughts, St Louis in the 1940s. That is why they show up by pure ERA+.)

Using DERA with my incorporation of batting (pitcher gets 1/12 of team plate appearances, which have uniform influence on team scoring) yields these career leaders, displayed in rank order with "gaps".

runs
350 Orth

330 Garver
328 Walters
323 Passeau

303 Lucas Red
299 White Doc
298 Dickson Murry
297 Brecheen

(280 and below, Bill Dinneen, Schoolboy Rowe, and others)

Passeau, Dickson, and Brecheen were average or avg+ batters. The other five were excellent.
(There is no below average batter, not to mention a weak one, for he wouldn't show up among the leaders, by some kind of career bulk, within a group selected this way, as maybe-overlooked.) (Skimming down the list, the first really weak batter is rank 13, Paul Derringer 265 runs, and the next is rank 19, Milt Pappas 253 runs. And neither one was so weak at bat as the "other five" above were strong.)

Ned Garver suffered with just about the worst team fielding ever, on average for his career. Orth and Passeau played with exceptionally weak team fielding, Lucas and Dickson very weak (about 0.10 runs per game, one-third what DERA says Garver teams tossed and booted away). Meanwhile Walters, White, and Brecheen enjoyed very strong team fielding, on average for their careers (about 0.15 runs per game the other way).

The effect of career workload and replacement-level

Oh, Brecheen pitched about 1907 innings, Garver and Lucas 2500, Passeau 2700, Walters, White, and Dickson about 3100, and Orth 3355. The career sum reported takes 1.1 x league-average runs as the point of reference, or 91 on the ERA+ scale. So Orth, White, and Dickson are the ones who will stick with Walters if one progressively decreases the reference point or replacement level. Alternatively, Garver & co. will pull away from Walters if one increases the point of reference. Here are the career runs above average, that is, taking league-average as the point of reference, and the net+ rate from which it is calculated. There are two, eight, or fifteen leaders, again rather well defined by gaps.
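The sensitivity to the reference point can be made concrete with a quick sketch. The rates and innings below are hypothetical, chosen only to illustrate the pattern (loosely Orth-shaped and Brecheen-shaped careers); 4.50 runs per game is DERA's normalized environment:

```python
# Career runs relative to a reference rate. `rate` and `reference` are on
# the league-average-runs scale (1.00 = average, 1.10 = 10% more runs
# allowed, i.e. 91 on the ERA+ scale). 4.50 r/g is DERA's normalized
# run environment, so 0.5 runs per inning at the league-average rate.

def career_runs(rate, ip, reference):
    return (reference - rate) * (4.50 / 9) * ip

# Hypothetical careers: a long one at a modest rate vs. a short one at a
# strong rate (illustrative numbers, not anyone's actual rates).
long_low = dict(rate=0.89, ip=3355)
short_high = dict(rate=0.79, ip=1907)

at_avg = (career_runs(reference=1.00, **long_low),
          career_runs(reference=1.00, **short_high))  # short career leads
at_91 = (career_runs(reference=1.10, **long_low),
         career_runs(reference=1.10, **short_high))   # long career leads
```

Lowering the reference point (raising the runs multiplier) adds value in proportion to innings, which is why the 3000+ inning careers stick with Walters while the shorter, higher-rate careers pull ahead only as the reference approaches league average.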

That covers everyone(?) with 1600 innings pitched, career centered about 1895-1995, where the index net+ for rate of runs above average is derived from DERA and OPS+. Skimming down the ranks there are only four pitchers with "net+" rate better than Cuppy, all in the high teens: Cooper, Tudor, Maloney, and Mungo.

The point of 1600 innings is to eliminate career relief and "mainly relief" pitchers, a class that does not include Shantz and Dickson, and Johan Santana. This analysis credits Mariano Rivera with 388 runs, yes, even when I make him the worst batter around and give him 1/12 of team plate appearances as a batter. Hoffman, Wagner, and Lee Smith also surpass Ned Garver. With careers centered no later than 1985, the leading relief pitchers are relief aces Hiller and Quisenberry at 172 runs, essentially tied with Walters in sixth place.

So what?
Passeau pitched through WWII and put up two of his better seasons, by this runs measure, in 1944 and 1945. (Walters also pitched through the war.)

Orth worked at the turn of the century and just after, so his candidate value is KOd by the AL expansion. He should have electoral value by casting his shadow over or even near some other marginal candidates: ouch, my guy was another Al Orth in sum (albeit with better arm and worse bat).

Garver? Does the 1950s AL discount cover pitchers as well as everyone else? Certainly not quite so well. With only ~2500 innings Garver slips down the ranks as one increases that credit for league-average work.

Alternative recommendations
Add Orth, Garver, and Passeau to the list, making 63 here but a nice round 125 with the 62 HOM pitchers.

Add Orth and Garver to the list, making 62 here which is a nice match for 62 HOM pitchers.

The arguments are essentially aesthetic and marred because the 62 HOM pitchers include several from before 1893, several from the Negro Leagues, and a few relief specialists.

Substitute Ned Garver for Johnny Sain, retaining the sexagesimal value of 60.
Let the shadow of Al Orth fall upon Jack Morris.
And so on.

I have team DRA (albeit I believe somewhat outdated numbers) for all team-seasons since 1893. They put the Giants at 33 runs *below* average in 1903, and 42 above in 1904 (compared to +26 and +119 according to BP).

Not sure which study in particular you're referring to, Bleed the Freak. I can send you my pitching numbers if you'd like, but they're just based on DERA, whose flaws we have just been discussing.

Now in "2009" we have four pitchers among the top 15 incumbents, or ranks #4 to #18 in the 2008 election:
Bucky Walters, Dick Redding, Luis Tiant, David Cone in election rank order.

Here is the pitcher subset from the first six ballots cast, down to rank 30 if provided, quoting in full the comment on the highest-ranked pitcher.

2. Bridges: Tommy Bridges - Best rate production of the pitchers available, maintained that rate for a long period of time despite (perhaps because of) innings pitched numbers that aren't overwhelming. There are fewer pitchers elected from the WWII era than any other. It wasn't easy to pitch in the AL of the 1930s and early 1940s. Deserves some war credit. Looks even better in the standard deviation adjusted numbers. Incredibly strong PWAA - even with Walsh, Lyons, Saberhagen, Bunning. Best pitcher currently eligible.
4. Tiant
6. Cone
10. Lee Smith
11. Shocker
---
16. Jim McCormick
19. REDDING, pre-NeL
20. Quinn
22. Kevin Appier, new eligible 2009

That is 9 pitchers in ranks 1-25 (where complete rankings end).

2. Joss: I’m now [unspecified past] even more convinced I missed him earlier, and that adjusting innings down for dead ball pitchers is illegitimate. 2327 IP at an ERA+ of 142. 160-97 by age 30. If you assume the rest of his career would have been 1800 IP, 120-90 with an ERA+ of 110 (somewhat conservative, assuming you boost his last sick season, though pitchers didn’t last as long as they did later) then 50% credit would put him at 3227 IP, 220-142, with ERA+ of 130. 25% credit puts him at 2777 IP, 190-120, with ERA+ of 136. Substantially better than Koufax. OPS+ 20. Electorate needs to take him more seriously.
3. Cicotte
8. Leever
9. John
10. Mays
(11. Elmer Smith, primarily LF but his adjusted career includes "about 1400IP at an ERA+ of 113 and a W/L of about 96-72")
13. Mickey Welch
---
(16. Van Haltren, primarily CF and "significantly below Elmer Smith, either as hitter or pitcher.")
19. Cone
21. Tiant
22. Willis
23. Lee Smith
26. Reuschel

2. Redding PHoM- 1975. Great peak years between 1914 and 1919, including an estimated 2.14 ERA in 321 innings for Chicago in 1917 (according to i9). Lost a half a year in each of ’18 and ’19 due to military service. Even so, his career MLEs of 234-174 put him in the neighborhood of, if not ahead of, contemporaries like Coveleski, Faber and Rixey.
4. Newcombe
7. Bridges
11. Grimes
13. Willis
---

With little confidence I have anticipated a year-on increase in the number of pitchers earning votes in the 2009 election. First, the Hall of Merit includes only 62 of 234 pitchers, or 26.5%; that share is about as low as anyone advocates (the Cooperstown share is 31.5%). With the passage of one year some voters might find time to align their practice with their values. With the passage of one year there would be high turnover in the electorate too, presumably with regression to a more representative population. Second, Chris Cobb recently identified 60 leading pitcher candidates. Especially in the absence of any corresponding effort concerning another class of HOF members, the attention effect would evoke more support for pitchers.

After six ballots cast, there may be some weak evidence for my anticipation. Rickey Henderson is a given at the top of the ballot, expected to be a unanimous winner. Career pitchers fill 29 of the 84 remaining ballot positions on the first six ballots cast.

A point for our "non-methodical"/"eyeball"/"subjective" voters on pitcher evaluation: How many of you take into account pitcher hitting? Not just in the cases where it's obviously valuable, like Ferrell or Lemon, but in those where it's extremely *not* valuable?

Here's an example: who was the most valuable pitcher in the 1966 NL? In one corner, we have Sandy Koufax, who allowed 74 runs in 323 innings, in front of a +5 team defense, in a 91 PF ballpark. In the other, we have Juan Marichal, who allowed 88 runs in 307 innings, in front of a +11 team defense, in a 103 PF ballpark. The league average was a 4.10 RA.

So, we take Koufax's 74 runs allowed, divide them by 0.91, and add 5*323/1458, to get an adjusted total of 82.4 runs allowed. A league average pitcher would have surrendered 147.1 runs, making Koufax 64.7 runs above average. Add on 20.83 1966 NL-runs per 200 IP for replacement level, and Koufax is 98.3 runs above replacement.

Now, the same for Marichal. 88 RA, divided by 1.03, plus 11*307/1458 makes 87.8 runs allowed. Against a league average pitcher giving up 139.8 runs, that's 52 runs above average and 84.0 runs above replacement.

So, Koufax by a comfortable 14.3 runs, rrright? Wrong.

Marichal had a very nice offensive year for a pitcher, hitting .250/.263/.321 in 114 PA, which comes out to 6.0 runs below league average. By contrast, Koufax had a monumentally putrid year at the plate, "hitting" .076/.113/.102 (9-for-118, with 3 doubles, 5 walks, and 57 strikeouts) over 124 PA, which is a ghastly 20.3 runs below average. Now, a replacement pitcher, being an average-hitting pitcher, would have mustered 12.1 runs below average in Marichal's PA, making the Dominican Dandy a further 6.1 runs above replacement with the bat, bringing him to a total of 90.1 runs above replacement. In Koufax's PA, the replacement pitcher would have been 13.2 runs below average, so Koufax was 7.1 runs *below* replacement with the bat, dropping him to 91.2 total runs above replacement.
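The arithmetic above can be reproduced step by step. Here is a sketch; all inputs are taken from the example, but the function names and structure are mine:

```python
# Reworking the 1966 Koufax vs. Marichal comparison. League totals:
# 4.10 RA per 9 innings, 1458 team innings, replacement level of
# 20.83 runs per 200 IP, all as stated in the post.

def pitching_rar(ra, ip, pf, team_def, lg_ra9=4.10, lg_ip=1458,
                 rep_runs_per_200=20.83):
    """Runs above replacement from pitching alone."""
    adj_ra = ra / pf + team_def * ip / lg_ip   # park and defense adjustment
    lg_runs = lg_ra9 * ip / 9                  # average pitcher, same innings
    return (lg_runs - adj_ra) + rep_runs_per_200 * ip / 200

def batting_rar(runs_below_avg, rep_below_avg):
    """Batting vs. a replacement pitcher, who hits like an average pitcher;
    both arguments are runs *below* the overall league average."""
    return rep_below_avg - runs_below_avg

koufax_pitch = pitching_rar(74, 323, 0.91, 5)       # ~98.3
marichal_pitch = pitching_rar(88, 307, 1.03, 11)    # ~84.0

koufax_total = koufax_pitch + batting_rar(20.3, 13.2)      # ~91.2
marichal_total = marichal_pitch + batting_rar(6.0, 12.1)   # ~90.1
```

The 14-run pitching gap shrinks to about one run once both hitting lines are charged against the average-pitcher baseline.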

OK, fine, Koufax was still better. But there's a big difference between leading by 1.1 runs and leading by 14.3. My fundamental point is that you can get to a quite sizable 13-run/1.5-win gap on pitcher offense without either one putting up an OPS+ above 60--which I imagine would be the bare minimum for a pitcher's hitting to get noticed by an "eyeballer."

I count pitcher offensive contribution, but I discount the value relative to their pitching value. If I didn't, AL pitchers post-DH would benefit too much. I think pitcher offensive contributions are deleveraged compared to other hitters - lots more bunts, and pinch hits in crucial situations.

AL pitchers post-DH should be HURT by your counting pitcher hitting, not helped. As a group, pitchers will by definition hit at the pitcher average, so the average pitcher in a non-DH league will have exactly the same score as an average pitcher in a DH league (0 batting wins above positional average and replacement in both cases). However, the *standard deviation* of pitcher WARP in a non-DH league will be higher than in a DH league, since the presence of the hitting statistics creates added variance around that average. Assuming that variance is distributed randomly--or really, as long as it's not *inversely* correlated with pitching quality--then it should produce higher observed WARP2 for the above-average half of the distribution.

Point taken on the deleveraging; but it would be good to calculate it empirically rather than guesstimate it. Fangraphs can give you exact numbers.
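The variance point lends itself to a toy simulation (entirely my own sketch, not anyone's published method): give every pitcher identical pitching value, add zero-mean batting noise only in the non-DH league, and the top of the non-DH distribution rises even though the league average does not:

```python
# Toy illustration: batting noise leaves the average unchanged but
# inflates the top of the distribution.
import random

random.seed(1)
dh_league = [0.0] * 10000                                 # no batting component
non_dh = [v + random.gauss(0, 0.5) for v in dh_league]    # batting noise added

mean_non_dh = sum(non_dh) / len(non_dh)                   # still ~0
top_decile_non_dh = sorted(non_dh)[-1000:]                # inflated by the noise
top_decile_dh = sorted(dh_league)[-1000:]                 # all zeros
```

The noise scale (0.5 wins) is arbitrary; any nonzero spread produces the same asymmetry for the above-average half.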

To provide a quick frame of reference to back up Dan R’s comment, here’s a table of career RCAP totals for pretty much all the pitchers currently under consideration. First are the bad to mediocre hitters, from worst to least bad.

(RCAP is runs created above position: these values are from Lee Sinins’ encyclopedia)

As you can see, there’s a 110-run difference between the best and the worst here, and there are many cases here where including batting value will make a big difference in head-to-head comparisons of pitchers.

You can calculate pitcher RCAP (well, WCAP) from my starting pitcher WARP spreadsheet. The pitching component of the Rep2 column is always 2.1 wins per 200 translated IP. So batting wins above positional average will be equal to BWAA2 - Rep2 - .0105*TransIP. For example, Walter Johnson in 1925 would be 0.7 - (-2.7) - .0105*190.6 = +1.4 (or 1.8 if you don't want to translate his PA as you translate his IP). Aaron Harang in 2005 (2-for-74!) would be -2 - (-3.3) - .0105*214 = -0.9.

That said, be careful with this, because it is comparing a guy to the pitchers' offensive average *just for that league-season*, and there is a lot of noise in that figure. It would probably be better to compare them to a 5-year moving average, but I haven't done that research yet.
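A minimal sketch of that spreadsheet arithmetic, assuming the column meanings as described above (BWAA2, Rep2, TransIP); the function name is mine:

```python
# Batting wins above positional average from DanR's spreadsheet columns.

def batting_wins_above_position(bwaa2, rep2, trans_ip):
    # The pitching component of Rep2 is a constant 2.1 wins per 200
    # translated IP (0.0105 per inning); removing it isolates batting
    # relative to the positional (pitcher) average.
    return bwaa2 - rep2 - 0.0105 * trans_ip

# Worked examples from the post:
johnson_1925 = batting_wins_above_position(0.7, -2.7, 190.6)   # ~ +1.4
harang_2005 = batting_wins_above_position(-2.0, -3.3, 214)     # ~ -0.9
```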

Pitcher's batting runs created are normally tricky to calculate because of the prevalence and value of a successful sacrifice (the usual weighting of sacrifices as compared with other batting events in the runs created formulae should not apply because of the average pitcher's batting ineffectiveness). In Koufax' case, it appears that he was such a bad hitter that the sacrifice doesn't help him. In 260 PAs with a runner on first (probably about 170 with less than two outs), he successfully sacrificed 32 times. And he came to the plate with the bases loaded 31 times, walked twice (can you imagine the steam coming from the opposing manager's head?), had a solitary single, two sac flies, fifteen strikeouts and three GIDPs.

I wonder if his ineffectiveness at the plate provided some kind of bizarre psychological motivation for him on the mound...he did do very well in low-scoring games. A historical WPA analysis, from both the batting and pitching perspective, might be fun.

Replacement level hitting for a pitcher is the hitting of an average pitcher. No one gets selected to pitch based on his hitting ability.

So I take RCAP (Runs Created Above Position) from the Sabermetric Encyclopedia and use that as the starting point. I adjust it for a few things, as you can see in the spreadsheet, but that's the basic starting point.

It also has the nice effect of treating AL pitcher hitting since 1973 as a zero, neither a plus, nor a minus for those guys (except for the few AB they get during interleague play).

I also try to 'strip out' the pitcher hitting and position player hitting for pitchers that also played the field or PH in a season.

Red Ruffing's pinch hitting (or the % of his PA that came as a PH, more appropriately), for example is compared to that of the replacement level hitter in a league, while his pitching PA are compared to that of an average pitcher in the league.

When I was looking at Ruffing and Ferrell as PH, I was thinking of a hypothetical 4th/5th OF. While there are always some teams with a very good hitter on the roster trapped behind their starting OF/1B, and occasionally a team has had a Smokey Burgess/Gates Brown type, I think that for most teams it tends to average out as a little less than a league-average hitter. The guys who hit at league average or better tend to find their way off the bench one way or another.

As an aside, I remember that the team of my youth, the 67-68 Cardinals, pinch hit for Maxvill (SS/#8 hitter) a lot. Looking over those teams' bbref pages, I realize that quite a bit of the pinch-hitting must have come from infielders (Gagliano) or catchers (Ricketts, Edwards) rather than outfielders, and overall, I'm underwhelmed by how good the bats on the bench were.

Against that background, I decided that Ferrell, who was a near league average hitter (better in some years), and who provided something specific as a HR hitter, did provide positive value as a PH. (There are certainly particular situations in which HR rate is the most important attribute you want in your PH; there are other situations in which you're looking for OBP and yet others in which you're looking for BA.) But for Ferrell, who was the best of the type, the amount of the benefit was rather small over his career - at best a game or two of value. And Ruffing contributed less than that. The big impacts of Ruffing's and Ferrell's hitting (and Walters, and Newcombe) came when they were pitching, and in those cases you can compare them to other pitchers. (I think I set the baseline for that too low, however.)

=============

An off-the-wall question about PH: Suppose you were playing in some sort of fantasy league, and you were allowed to add one player to your lineup with the rule that he could only ever be used as a PH (and subsequent baserunner), once in a game. If the three players available to you were Rickey Henderson, Tony Gwynn, or Mark McGwire, which one would you want?

Gwynn is much closer to the classic PH stereotype, including his left-handedness. There are a few narrow situations, like having the winning run on third with two outs (and a RHP on the mound) when you'd like to see the batting average. But even then, it's going to matter who's coming up next as to whether the IBB is a viable strategy, and if the bases are already loaded, then we're back to looking at OBP.

What is the classic PH stereotype?
--meaning pinch-hit appearance, not pinch-hitter

How many classics are there?
Two or three, including one where a team bats for the pitcher who has neither pitched very well(*) nor been knocked out of the game? This middling, commonly middle-inning, pinch-hit situation must constitute an important share of all pinch-hit appearances. There is no tendency for high leverage in the base-out situation or the run score. The pinch-batter "never" provokes a pitching change, so he suffers or enjoys the platoon (dis)advantage as the opposing pitcher happens to throw with the same or the opposite hand.

That's worth approx 15 points of ERA, or about 7 ERA+ points in relation to Koufax, enough to give him an ERA+ advantage over Koufax.

That is very close to my calculation that uses a linear approximation for pitcher batting skill as a function of time and a constant pitcher share of batting workload. (I have interpreted it as a workload, but it may be interpreted as leverage.)

For example,
Gibson gains 3 points on the ERA+ scale and Koufax loses 5 points on the same scale.

This coincidence is no credit to me and little check on JWPF13's or Lee Sinins' arithmetic, because Gibson and Koufax played mainly as leaguemates (the linear approximation is irrelevant) and they both completed "most" of their games (with normal batting shares for career starting pitchers).

Regarding the pitcher data that I posted and stax re-posted (#303-304 and #316),
I wrote in #303: [time] represents MLB debut decade as an integer -3 to 9, meaning 1870s to 1990s. (I see that K-Rod is in the table with time=10. In general a post-1999 debut is represented by time=9.) I use the time variable to account for pitcher batting. The difference in OPS+ (pitcher batting) is about five times the difference in time, or five points per decade. That favors 1870s and 1880s pitchers because the decline in average pitcher batting was then about ten points per decade.

That is the linear approximation. Using this table of data it is not even a linear approximation but a stepwise linear one with ten-year steps. Even so it may be "better" than anything exact, on the grounds DanR mentioned --high variance in the single-league-season averages for pitcher batting.
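A sketch of how that stepwise approximation might look in code. The anchor value is hypothetical; only the five-points-per-decade slope, the steeper ten-points-per-decade slope before the 1890s, and the time=9 cap come from the description above:

```python
# Approximate average pitcher OPS+ by debut-decade index:
# time = -3 (1870s) through 9 (1990s); post-1999 debuts capped at 9.

def pitcher_batting_baseline(time, anchor=20.0):
    """`anchor` is a hypothetical 1940s (time=4) baseline, not a real
    figure; only the slopes between decades matter here."""
    time = min(time, 9)            # post-1999 debuts treated as time=9
    if time >= -1:                 # 1890s onward: ~5 OPS+ points per decade
        return anchor - 5 * (time - 4)
    # 1870s-1880s: the decline was steeper, ~10 points per decade
    return anchor - 5 * (-1 - 4) - 10 * (time - (-1))
```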

352. Chris Cobb Posted: November 26, 2008 at 01:52 PM (#3016422)To provide a quick frame of reference to back up Dan R’s comment, here’s a table of career RCAP totals for pretty much all the pitchers currently under consideration. First are the bad to mediocre hitters, from worst to least bad.

(RCAP is runs created above position: these values are from Lee Sinins’ encyclopedia)

Suppose that we
- accept DERA (based on something fancier than earned/unearned runs) as an improvement over ERA, but hedge by giving 1/3 weight to the official measure of pitching rate.
- incorporate batting skill measured by OPS+, using my simple method indicated above.
- give no credit for fielding.

Suppose that we
- give no credit for league-average pitching-batting; that is, use league-average "net pitching-batting" as the point of reference for pitcher contributions in runs

Then the peak pair Frank Hahn and Jerome Dean rank 5-6 by career runs above baseline(average). Four other pitchers gained more runs with their arms and bats.

Dizzy Trout worked through WWII and Urban Shocker missed one season during WWI.

Among the sixty pitchers, there are three others in the top twenty by this measure of career runs who worked fewer career innings than Shocker: Addie Joss and Nap Rucker rank #10 and 12 based on about 2300 innings; Eddie Rommel ranks #17 based on about 2500 innings.

--
Relative to baseline 0.95, or 5% below league-average runs (better than average pitching-batting), Hahn and Dean rank 2-3 between Trout and Shocker. If you discount Trout's work during WWII and give Shocker no credit for 1918 (or no more than four net runs), that is just enough to put Hahn and Dean at the top of the heap. Joss, Rucker, and Rommel rank 7, 11, and 14; there is a big gap between 11 and 12.

--
Relative to baseline 1.05, or 5% above league-average runs (below average pitching-batting), Hahn and Dean rank 10-11, behind the same four re-ordered plus five others who worked a lot more than 2000 innings: Cicotte #5, Bridges #6, Cooper, Cone, and Leonard. Now Joss and Rucker rank 14-16.

--
Relative to baseline 1.20, or 83 on the ERA+ scale (an average batter who yields 20% more than average runs), Joss and Rucker now rank 34-35, a little below the median among the sixty pitchers. Hahn and Dean rank 38 and 41. Rick Reuschel is still in the top spot with Tiant 4, Trout 11, and Shocker 14.

Cell values represent rank order among all 60 pitchers, by career runs with respect to the benchmark rate (column). The pitchers (rows) are ordered by rank w/r benchmark 1.05 (column three). It happens that the same 24 pitchers hold the first 24 ranks when the benchmark is 1.00 (column two) or 1.05 (column three). That is indicated by whitespace separating the first 24 from the rest.

Eight pitchers have negative career runs w/r benchmark 0.95 (column one). That is, their career pitching-batting rates were not so good as 5% better than league average.
: Lolich, Friend, Hunter, Tanana, Blue, Hough, Martinez, Morris
Three fall short of career league-average rate and thereby have negative career runs w/r benchmark 1.00 (column two).
: Hough, Martinez, Morris
All sixty pitchers were better than 5% worse than league average by career pitching-batting rate. With respect to benchmark 1.05, Johnny Sain ranks #58 ahead of Dennis Martinez and Jack Morris only. He slips to #59 behind Martinez w/r benchmark 1.10, barely holds that position w/r benchmark 1.15, and dislodges Jack Morris from the bottom of the barrel w/r benchmark 1.20 (column six).

Sain takes the bottom spot from Morris at benchmark 1.16 and yields it to innings trailer Mel Parnell at benchmark 1.33 (75 on the ERA+ scale).
Tommy John takes the top spot from Reuschel at benchmark 1.24 (about 80 on the ERA+ scale).
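Career runs with respect to a benchmark b are linear in b, with slope proportional to innings, so any two careers cross at exactly one benchmark. A sketch of solving for that crossing point; the rates and innings below are hypothetical, picked only to mimic a Sain/Morris-style crossing near b = 1.16:

```python
# Runs w/r benchmark b, at DERA's 4.50 r/g environment (0.5 runs/inning).
def career_runs(rate, ip, benchmark, runs_per_ip=0.5):
    return (benchmark - rate) * runs_per_ip * ip

def crossing_benchmark(rate1, ip1, rate2, ip2):
    # Solve (b - rate1) * ip1 == (b - rate2) * ip2 for b.
    return (rate1 * ip1 - rate2 * ip2) / (ip1 - ip2)

short_good = dict(rate=0.943, ip=2100)   # short career, decent rate
long_poor = dict(rate=1.040, ip=3800)    # long career, weak rate

b_cross = crossing_benchmark(0.943, 2100, 1.040, 3800)   # ~1.16
```

Below the crossing benchmark the long weak-rate career holds the bottom spot; above it, the short career does, which is the pattern described for Sain and Morris.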

Suppose that we
- accept DERA (based on something fancier than earned/unearned runs) as an improvement over ERA, but hedge by giving 1/3 weight to the official measure of pitching rate.
- incorporate batting skill measured by OPS+, using my simple method indicated above.
- give no credit for fielding

See #374 for top ten among 60 Cobb pitchers, by those methods.

But #372-373 consistently put
- 2/3 weight on DERA and on Clay Davenport's "something fancier" measure of innings pitched (XIP), in combination denominated in runs;
- 1/3 weight on ERA and on official innings pitched (IP), in combination denominated in runs.

The alternative benchmarks represented by six columns in table #372, or six places in angle brackets #373, simply provide different methods of combining the rate and the playing time, and thereby denominating in runs.

Using combined statistics in #372-73, "downstream" from the data table that I have posted here (#303-304 and 316), I forgot that I had incorporated the two different measures of playing time. It is easy enough to re-do but I don't plan to re-do it here unless/until I have more to offer.

The difference is not great because none of the 60 pitchers worked much in relief. Davenport's XIP is close to IP for starting pitchers.
. . . Jim Kaat worked about 30% of his pitching games in relief but without great leverage, I infer, because D credits him with XIP=4475, less than the official IP=4530. Jim Perry worked about 30% in relief; D credits him with XIP=3288 only slightly greater than the official IP=3285.

Among the 60 pitchers,
Wilbur Wood, Eddie Rommel, and Charlie Hough alone worked about equal numbers of games in relief and starting roles. Davenport credits Rommel with about 1% more than official innings pitched and Hough with about 1% less than official. (XIP/IP ~1.01 or ~0.99).

For twelve of the pitchers, XIP/IP > 1.02, meaning more than 2% extra credit for work in high-leverage situations, all according to the estimate implicit in XIP.

The three leaders were contemporaries and worked about 40% of their career games in relief, as did the contemporary Leonard and the earlier Quinn. Luque and Shawkey worked 30-35% relief games; Bender, Uhle, Dean, and Harder 25-30%. Among the twelve, only Wilbur Wood pitched after the 1940s/50s, perhaps because specialist relievers have taken a share of every team's high-leverage innings since then.

Thirty-three of the 60 pitchers show career XIP/IP > 1 but only two beside Wilbur Wood --Bob Friend 1.0055 and Larry Jackson 1.0012-- debuted so late as the 1950s. Meanwhile only four of the subset with career XIP/IP < 1 debuted before the 1940s and the measure is trivially less than one for all of them: Joss 0.9995, Leever 0.9988, Hahn 0.9960, Breitenstein 0.9956.

Let me close with the trailers, whose workload is most discounted by using that fancy XIP.

I voted in some of the special elections and prepared ballots in others.

Insofar as XIP is incorporated in other statistics, I use it, sometimes deliberately and sometimes by accident.

The probability is high that I will both vote in the special election for pitchers next month, and that my vote will be influenced in part by XIP analysis such as it is. I don't yet know the order of calculation for these statistics (none of us yet knows?), which is a residual of which, and I probably won't learn that this month.

To peanut gallery:
1.
He gives me too much credit. I meant "Can you explain the differences?" rather than "Do you understand they are different, or do you think they are the same thing, bananas for brains?"
2.
Can anyone explain the differences? Why does XIP diminish Jim Kaat's official career IP by 56 inns, about 1%? Why does the translation diminish his career IP by 152 inns, about 3%? --or diminish his career XIP by a further 96 inns, about 2%, if that is more illuminating.

I think my method for translating innings is better than that on Prospectus. The spreadsheet I posted earlier has the info, I can resend if needed.

Basically, the BPro numbers translate the league's top 5 pitchers so that they average 275 IP.

I take the 'middle half' of league leaders (ranks 5-12 in a 16-team league, for example; 3-6 in an 8-team league). Then I equate them to 258.333 IP.

The numbers are directly comparable to BPro's (end up on the same scale), except that my numbers give an outlier like Wilbur Wood in the early 70s, Robin Roberts in the 1950s, a fair shake historically. The BPro numbers limit those guys, by including them in the sample that sets the standard.

Also being top 5 in a 16 team league is much different than top 5 in an 8 team league.
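The scheme Joe describes can be sketched as follows; this is my own reconstruction from the description (middle half of the league's IP leaders equated to 258.333 IP), and his spreadsheet may differ in detail:

```python
# Translate a league-season's IP so the "middle half" of the IP leaders
# (ranks 5-12 in a 16-team league, 3-6 in an 8-team league) averages
# 258.333 translated innings.

def translate_ip(ip_by_pitcher, n_teams, target=258.333):
    leaders = sorted(ip_by_pitcher, reverse=True)
    lo, hi = n_teams // 4, 3 * n_teams // 4          # middle half of leaders
    middle = leaders[lo:hi]
    factor = target / (sum(middle) / len(middle))    # league scaling factor
    return [ip * factor for ip in ip_by_pitcher]
```

Because the scaling group excludes the very top ranks, an outlier season (a Wilbur Wood 1972 or a 1950s Robin Roberts) does not drag down its own translation factor, which is the stated advantage over anchoring on the top 5.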

Joe, as you know, I think chaining is a MAJOR issue. (See my NYT column today at http://www.nytimes.com/2008/12/07/sports/baseball/07score.html). The actual effect of leverage is less than one-half of what it appears: a reliever who is 1.1 wins above average with a LI of 1.8 (so a WPA of 2.0) in 70 innings only increases a bullpen's aggregate WPA by 1.65 compared to a .470 WPCT/-.2 WPA per 60 IP replacement reliever. (The effective leverage is thus 1.65/(1.1+(.2*70/60)) = 1.24, nowhere *close* to 1.8). I can send you the spreadsheet with the work-through examples if you'd like.
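Dan's effective-leverage arithmetic can be re-derived directly; the inputs below are his, and only the function is my sketch of the stated calculation:

```python
# Effective leverage after chaining: the bullpen-level WPA gain divided
# by the unleveraged gain over a replacement reliever.

def effective_leverage(bullpen_wpa_gain, reliever_waa, rep_wpa_per_60, ip):
    # Unleveraged gain = the reliever's wins above average plus the
    # replacement reliever's (negative) WPA over the same innings.
    unleveraged = reliever_waa - rep_wpa_per_60 * ip / 60
    return bullpen_wpa_gain / unleveraged

# 1.1 WAA in 70 IP at LI 1.8, vs. a -0.2 WPA per 60 IP replacement:
eff = effective_leverage(1.65, 1.1, -0.2, 70)   # ~1.24, far below the 1.8 LI
```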

Also, you know I don't think it's fair for you to not include relief innings accumulated by starters in your innings translation, since old-time pitchers were given the opportunity to accumulate them in substantial quantities, while modern hurlers are not.

Also, you know I don't think it's fair for you to not include relief innings accumulated by starters in your innings translation,

I do include them.

I still think you have to account for them. The old-timers accumulated more value this way - heck starters were still occasionally used in relief into the 1970s.

The modern pitchers get years added onto their careers instead, since they are used more regularly. In the end, I think it's a fair tradeoff, once I adjust for chaining.

Also, I believe chaining was FAR less of an issue in previous eras, because bullpens were not nearly as deep as they are now. Until this is accounted for, I'm uncomfortable changing my leverage adjustment.

357. Mike Green Posted: November 26, 2008 at 04:07 PM (#3016543)
Pitcher's batting runs created are normally tricky to calculate because of the prevalence and value of a successful sacrifice (the usual weighting of sacrifices as compared with other batting events in the runs created formulae should not apply because of the average pitcher's batting ineffectiveness). . . .

358. Bleed the Freak Posted: November 27, 2008 at 10:26 AM (#3016811)
Joe D, does your PA for pitchers fully factor in Pitcher Hitting Value, or do you have to subjectively adjust for each pitcher[?]

Replacement level hitting for a pitcher is the hitting of an average pitcher. No one gets selected to pitch based on his hitting ability.

So I take RCAP (Runs Created Above Position) from the Sabermetric Encyclopedia and use that as the starting point. I adjust it for a few things, as you can see in the spreadsheet, but that's the basic starting point.

It also has the nice effect of treating AL pitcher hitting since 1973 as a zero, neither a plus, nor a minus for those guys (except for the few AB they get during interleague play).

476. Chris Cobb Posted: December 08, 2008 at 06:24 PM (#3023374)
[quoting DanR]
>>This is a good ol' chicken-and-egg question: did 1970-75 hurlers accumulate huge IP totals because era conditions made it easy to do so, or did it seem that it was easy to do so because you just happened to have a crop of incredibly durable pitchers? With the standard deviation issue, I've been able to "solve" this conundrum via multiple regression,
<<

So what is the solution, as you see it? You have probably mentioned it before, but I confess I don't remember what you have concluded.

477. David Concepcion de la Desviacion Estandar (Dan R) Posted: December 08, 2008 at 06:34 PM (#3023384)
For stdevs, the solution is to use a regression-projected stdev rather than the actual one. But I don't quite see how the same approach could be used for IP.

479. DL from MN Posted: December 08, 2008 at 10:26 PM (#3023505)
I think you have to look at usage patterns and opportunity. How many starts did starters average? How many innings were typical per start? Then consider how many innings a pitcher "should" throw in that particular season. Individual numbers are going to vary too much based on health and managerial preferences.

480. Juan V, posting on behalf of Juan V. Posted: December 08, 2008 at 10:41 PM (#3023516)
[after quoting #479:]
I'm not sure if that solves the chicken-egg question. I think one variable could be league-wide run scoring, but that doesn't seem like it would be enough, and I don't have any other ideas.

481. Chris Cobb Posted: December 08, 2008 at 10:51 PM (#3023525)
[after quoting #479:]
I am working on a study of patterns of this sort for the 1900-09 decade. I've gathered most of the data I need for the NL, but I haven't touched the AL yet. I hope to be able to post some numbers and analysis by the time we start working on pitcher rankings.

All I can say for now is that it is pretty clear that usage patterns are the result of interaction between manager preferences and pitcher ability. Even if managers are ready to use a pitcher for more than 40 starts in a season, they usually do so only when they have a pitcher who is an established workhorse. On the other hand, a manager whose preference appears to be to avoid heavy use of any starter will use a workhorse pitcher more frequently than his managing pattern would permit with any other pitcher. For example, McGraw gives a pitcher 40+ starts seven times from 1903-09 with the Giants, but five of those seasons are McGinnity and Mathewson, so although McGraw is interested in pitchers throwing 40 starts, he doesn't always have them, and he doesn't even use his two horses that way every season. He needs a particular kind of pitcher to follow this usage strategy. On the other hand, from 1900-05, Fred Clarke never gave a single starter 35 or more starts in a season. From 1906-09, he gave Vic Willis, and only Vic Willis, 35 or more starts every season. He never runs Willis out for 40+ starts, as Willis's Boston managers had done, but he adjusts his usage pattern to take advantage of Willis's capacity as an innings-eater.

Those are the two clearest stories I have read out of the data so far, and this work may lead mostly to narratives rather than to statistical tools for adjusting pitcher innings. But maybe something of that sort will turn up.

482. Chris Cobb Posted: December 08, 2008 at 10:54 PM (#3023526)
As an addendum, finding a way to appropriately adjust pitcher innings for different eras feels to me quite similar to working out playing time estimates for MLEs, and that is a process that I don't think can be accomplished by any formula, though one can use some formulas as part of the process.

483. DL from MN Posted: December 08, 2008 at 11:36 PM (#3023555)
Yeah, but innings/start could be a better way to adjust. If pitchers typically go 7 innings, then the guy who goes 9 gets more credit. This explains somewhat why I like Tommy Bridges and Billy Pierce - when the manager gave them the ball, they pitched deep into the games. Pierce was used in relief which limited his starts. Bridges did some relief work but also the Tigers had a pretty deep rotation. I'm not going to give either player demerits for not being a workhorse when they performed well given the opportunity.
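The innings-per-start idea in the post above can be written down simply: credit a starter for innings beyond what a league-typical starter would log in the same number of starts. The function name and figures are illustrative assumptions, not an established metric.

```python
# A minimal sketch of the innings-per-start adjustment suggested above.
# Illustrative only; the numbers are invented.

def durability_credit(pitcher_ip, pitcher_gs, league_ip_per_start):
    """Innings above the league-typical workload for the same starts."""
    return pitcher_ip - pitcher_gs * league_ip_per_start

# a starter averaging 8 IP over 30 starts, in a league where starters
# typically go 7 innings, banks 30 extra innings of credit
extra = durability_credit(240, 30, 7.0)
```

Measured this way, a Pierce or Bridges who pitched deep into his starts gets credit even if relief usage or a deep rotation limited his start total.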

#368-75
simple career rate-ing with replacement/reference-level as a parameter (esp. end of #372)
: I have privately returned to that project using IP rather than XIP thruout (optionally, by making "IP or XIP" another parameter). I will not reproduce tables in this forum.
: In other words I have addressed the problem explained in #375. The pitchers named at the end of #375 do not vault so skyward as you may suppose, friends of Cone, Gooden, and Guidry, because I have been regressing DERA-cum-XIP toward ERA-cum-IP anyway.
: I don't say that I have resolved the problem. I am not happy using DERA in conjunction with IP because I still suspect that Davenport somehow estimates PRAA and XIP directly, and derives DERA. If so then any improvement upon "earned runs accounting" is embodied primarily in PRAA, and we risk that gain by using DERA in conjunction with the official measure IP.

#343-44
Chris Cobb's 60 candidate pitchers, major league starting pitchers post-1892
I have returned to this. Recall my observations regarding whom the list of 60 overlooks.

Al Orth, Claude Passeau, and Ned Garver stand out. Roughly they compare with Bucky Walters. They are much better than some of the sixty.
>>[quoting myself] Among all the other pitchers who worked mainly 189x-198x, Al Orth and Ned Garver and Claude Passeau show up with Walters at the top, then a gap. They are among the leaders almost whatever I do, [among pitchers not in the Hall of Merit and not in the Cobb 60].<<

: Red Lucas, Doc White, Murry Dickson, and Harry Brecheen. By hook or by crook they are clearly better than some of the sixty, if not much better.

Now what? Well, I was using a pitching replacement/reference level (110% of average scoring) much closer to average than those favored by Chris Cobb and Dan R (125% of average) or Joe Dimino (123.3% of average). And I was using XIP in part. I take up that question and provide some report here.

At 125% (credit to the pitcher for every run saved below 125% of league average scoring; that is, above 80 on the ERA+ scale), Al Orth now stands out so far that he needs a close look.(*) Orth pitched more innings than Walters, and Walters pitched more innings than half of the Sixty pitchers, including Passeau, Garver, and all of the others whom I identified as "the better of the rest" using reference level 110%. Orth should get a close look from everyone who gives moderate to high credit for innings pitched.

a close look?

Quoting myself from #344: Orth worked at the turn of the century and just after, so his candidate value is KOd by the AL expansion. He should have electoral value by casting his shadow over or even near some other marginal candidates: ouch, my guy was another Al Orth in sum (albeit with better arm and worse bat).
That was not a close look ;-)

From this distance, whatever Al Orth achieved was no more a product of expansion than what Powell, Willis, Tannehill, Leever, Phillippe, Joss, and Hahn achieved, and they are all among the Sixty.

Chris Cobb defines the Sixty ("2009 Ballot Discussion" #320)
>>
320. Chris Cobb Posted: October 30, 2008 at 11:35 PM (#3000219)
OK. Here’s the first post with actual data following my attempt to lay out a pitcher consideration set that included every pitcher worth giving a serious look. It’s a study looking at pitchers’ career runs above replacement.

Background

Dan R has been arguing lately for the importance of rating pitchers based on value above replacement, if replacement level can be established. . . .

. . .
RCAP is taken from Lee Sinins’ Baseball Encyclopedia. Some RCAPs have been estimated: the old Windows laptop on which I accessed my encyclopedia copy died recently (the Encyclopedia is not Mac-compatible), so I couldn’t get the data for some pitchers recently added to the consideration set. Using WARP’s BRAA as a benchmark, I have been able to estimate trustworthy RCAP totals for these pitchers based on the RCAP/BRAA ratio for pitchers who are their contemporaries.

. . .
Note that two elements that go into career RAR—PRAA and FRAA—are normalized stats, but RCAP is not. This may make for some small inaccuracies, especially for pitchers with very high or very low RCAP values whose run environments deviated a great deal from 4.50 r/g. But as the sorting I am doing here is just a first effort to group the candidates and not a basis for final rankings, I won’t sweat this detail.

Al Orth is one of those for whom the details in assessment of pitcher batting are likely to be crucial. He beats all sixty by raw OPS+, and he is among the contenders for second best in relative batting skill -- behind Don Newcombe, who achieved OPS+ 85 in the 1950s.

(I do not yet know much about RCAP or BRAA or any other particular measure, and I would not yet be confident in my judgment anyway. Briefly, I am skeptical regarding the use of league-season aggregates in contrast to MLB aggregates and multi-year moving aggregates. I don't have a settled view about methods on this point, and I would want to look at the details of Lee Sinins's execution anyway. Wait 'til next year. Wait forever?)

. . .
For purposes of comparing the rest of the consideration set to Walters, I have sorted the post-1892 MLB pitchers into four groups:
[ sizes 14 + 14 + 26 + 5 = 59 + Walters = 60 ]
. . .
Including Walters, that’s 60 pitchers. To my knowledge, no one with anything like a serious HoM case has been omitted. (We can safely let the cases of George Mullin, Earl Whitehill, Paul Derringer, Larry French, Freddie Fitzsimmons, Rube Marquard, and so on, lie unconsidered.)

There were 5 pitchers in this group: Mays, Gooden, Phillippe, Newcombe, Sain

Here there was some mistake by Chris Cobb. Gooden belongs in group 3 with DERA lower/better than Walters. Langston belongs in that group too --although he may have been a sabrmetric tie recently.
At the moment: DERA Walters 4.14, Langston 4.13, Gooden 3.97
At the moment that is 27 pitchers with DERA lower/better than Walters, with Langston the highest/worst of them.

Now quoting #342 in the other thread. And now for the largest table –

Group 3: Pitchers who top Walters in DERA but not in IP
. . .

. . .
Group 4: Pitchers with fewer IP and a higher DERA than Bucky Walters

One might expect that no pitchers with fewer IP and a higher DERA than Walters, whose value is supplemented significantly by hitting and fielding and whose merit lies more in his peak than his career, would actually merit serious consideration, but, surprisingly, that is not the case.

. . .
Gooden does not do badly here, but unless one values his stratospheric 1985 season hugely indeed, he doesn’t push in among the top candidates [as perhaps Mays and more likely Newcombe may do].

Gooden appears out of place in the table for group 4, with his stand-out 3.97 DERA intact, so it seems to me likely that there is nothing in the bottom line for Chris Cobb to undo.

Orth and Walters would be two of several closely matched contenders for second-best batter in the consideration set, behind Newcombe.

Orth would probably be the single best match for Walters in relative pitching and batting skill (rate statistics, no fielding): better than Carl Mays in group 4, whose close match Chris Cobb mentioned; better than Lon Warneke in group 3; better than George Uhle in Orth's own group 2; maybe a better match than Dolf Luque and Wilbur Cooper in group 1.

(Cooper and Orth pitched about 10% more than Walters; they worked about twenty and forty years earlier, respectively. Dolf Luque is the best match for Walters if we include career innings or the timespan in the comparison. Still no accounting for fielding.)

By one simple rate-ing that uses reference 125% (the method of #373 corrected for #375),
Orth would rank 15/14 with David Cone among the now-61 pitchers in the consideration set. Walters would rank 23.

Orth would rank 49 among Hall of Merit pitchers (major leagues including pre-1893), ahead of Pierce, Saberhagen, Ferrell, Spalding, Lemon, Stieb, Koufax, and relief pitchers Wilhelm, Gossage, and Fingers. Walters would rank 52, behind Pierce, Saberhagen, and Ferrell. Paul Derringer is the best of the rest, consideration-appropriate but outside the Sixty. He and five others whom I have named would rank 53, behind Spalding; ahead of Lemon, Stieb, Koufax, and the relief trio alone.

Al Orth rates more than 10% above Paul Derringer and everyone else who is not under consideration. That is a big gap like the one Chris Cobb sees between Rick Reuschel and the rest of the consideration set. Beyond the five listed with Derringer just above, all of the following pitchers rate closer to Derringer than Derringer to Orth.

I'm planning to add Orth, Passeau, and Garver to my own consideration set, but I haven't had time to run the numbers. Orth especially seems like a candidate I ought to take a look at, if I can, before the balloting closes today.

Al Orth in the preceding articles is not primarily a figment of the 125% reference level (or replacement level 80 on the ERA+ scale) in conjunction with his relatively high career innings pitched (3355).

Suppose that I extend the consideration set by 12, adding Orth and eleven other leaders. Then he ranks 15 among the 72 by the simple rate-ing with reference level 125% (covered at length just above). But he ranks 16 at 110% (91 on the ERA+ scale, my reference level in #372-373) and he ranks 20 at 100% (zero credit for league average pitching-batting).

Walters ranks 25, 24, and 24 among the same 72-man extended consideration set, using reference levels 100%, 110%, and 125%. Of course he ranks behind Orth with any reference level, because Orth provides the best match in the set for his rates and Orth pitched almost 10% more innings. In order to rank Walters ahead of Orth, not to mention some others who rank ahead of Orth here, one must depart from the simple rate-ing with credit to Walters for fielding, or peak, or the World Series, or quality of league-season competition.
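The reference-level device running through these rankings can be written down in a few lines. This is my reading of "credit for every run saved below X% of league average scoring," not the actual spreadsheet; the function and sample figures are illustrative.

```python
# My reading of the reference-level credit discussed above: a pitcher
# earns credit for every run saved below some multiple of league-average
# scoring. Illustrative sketch, not the actual spreadsheet.

def runs_saved(pitcher_ra9, league_ra9, ip, reference=1.25):
    """Runs below the reference level. reference=1.25 credits down to
    125% of league scoring (80 on the ERA+ scale); 1.10 -> ERA+ 91;
    1.00 -> zero credit for league-average pitching."""
    return (reference * league_ra9 - pitcher_ra9) * ip / 9

# a 3.50 RA/9 pitcher in a 4.50 league over 300 IP:
at_125 = runs_saved(3.50, 4.50, 300)                 # (5.625 - 3.50) * 300/9
at_100 = runs_saved(3.50, 4.50, 300, reference=1.0)  # average-baseline credit
```

Raising the reference adds credit for every inning pitched, which is why a high-innings pitcher like Orth climbs the rankings at 125% and slips back toward the pack at 100%.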

2. Rick Reuschel
Yep, Joe is right about him. Superficially similar to Tiant--both threw 3,500 innings with a 114 ERA+--but (1) Reuschel was hurt by his fielders while Tiant was helped by his, and (2) while Tiant rode the wave of massive pitcher seasons around 1970, Reuschel pitched half of his career when the 300-IP season was a thing of the past. (3) Plus he has that One Big Year (1977) I like to see.

Thruout this note I will rely on the DERA analysis of Clay Davenport (see player DT cards at baseballprospectus). After the first two words of the next paragraph, I will stop saying so.

By DERA, Luis Tiant suffered with below-league average team fielding (on average for his career) and Rick Reuschel suffered with far-below-average team fielding. The norm for a good career pitcher such as we consider here is above-league average team fielding, so Reuschel and Tiant are unusual among pitchers we consider. They stand out even more in the Hall of Merit consideration set as I write, early in the winter of 2009, because for 110 years we have elected pitchers partly by heavy reliance upon ERA+ with little attention to other accounting systems. That is, the official distinction between earned and unearned runs has (with a little help) generated our demarcation between Hall of Merit member and nonmember pitchers.

RAA PRAA
201 273 Reuschel
238 257 Tiant

For Reuschel's career, poor team fielding cost 72 of the 273 runs that he "saved" relative to league-average pitching. Alternatively, the team yielded 201 runs below average while he pitched 3548 innings. Credit the team fielding with -36% and Reuschel's pitching with 136% of those 201 runs. That poor team fielding cost -0.18 runs per nine innings pitched, relative to league-average run scoring.

For Tiant, poor team fielding cost 19 of 257 runs that he saved. The team yielded 238 runs below average while he pitched 3457 innings. Credit the team fielding with -8% and Tiant's pitching with +108% of those runs. The poor fielding cost -0.05 runs per nine innings.
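The Reuschel and Tiant arithmetic in the two paragraphs above can be redone directly from the RAA and PRAA figures quoted there; the helper function is my own restatement of the calculation.

```python
# Redoing the Reuschel/Tiant fielding-support arithmetic from the
# paragraphs above, using the RAA and PRAA figures quoted there.

def fielding_support_per9(raa, praa, ip):
    """Runs the fielders saved (+) or cost (-) per nine innings:
    team runs-above-average minus defense-adjusted pitching runs."""
    return (raa - praa) * 9 / ip

reuschel = fielding_support_per9(201, 273, 3548)  # about -0.18 per 9 IP
tiant    = fielding_support_per9(238, 257, 3457)  # about -0.05 per 9 IP
```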

Among the 72 pitchers in my consideration set, having extended Chris Cobb's 60 by adding twelve this noon, Tiant ranks 49 of 72 and Reuschel ranks 68 of 72 in team fielding support by the per-9-inning measure.

Tiant ranks 44 and Reuschel ranks 58 among the 60 pitchers in Chris Cobb's set -- relatively worse than they rank in the 72-man extended set, partly because I relied on this measure, among others, to identify the candidate pitchers I added.

The four pitchers who suffered support worse than Reuschel's include two knuckleballists, Leonard and Wood. That suggests some bias in the measure.

Team fielding support per 9 innings, relative to league average: top 6 and bottom 6 of 72 pitchers

Among 550 pitchers for whom I have convenient access to the data, Carl Mays ranks 19 (second behind Johnny Murphy given debut after the 19-aughts). Ned Garver ranks 544 or 7 from the bottom (second ahead of Dick Radatz given debut after the 1890s).

Paul, I use NRA - DERA to come up with fielding support (negative numbers mean bad fielding), and I have Reuschel at -.05 for his career, Tiant at +.03.

I do make some adjustments; the most significant one is weighting based on translated innings. I doubt that would make a huge difference though . . . also I have not updated my numbers in a while; if Clay went and changed the formulas, that could be the reason we are seeing different things.

I definitely agree with your general premise that ERA+ is not the answer. It is a nice eyeball metric, for sure. But that is it. It really should not be used as the sole basis of any serious ranking of pitchers.

"I definitely agree with your general premise that ERA+ is not the answer. It is a nice eyeball metric, for sure. But that is it. It really should not be used as the sole basis of any serious ranking of pitchers."

I refer to this stat a lot, but I totally agree with this.
Any serious ranking has to factor in career length and in-season durability relative to the league, for starters.

There's also the question of whether to penalize a player if they are recalled before they're ready, or allowed to keep pitching beyond their usefulness.