Thursday, November 25, 2004

Pennants Added 2.0 (updated for the 1949 ballot)

I’ve updated the Win Shares version of Pennants Added. For an explanation of the methods, try this thread.

I’m going to list players by position. If someone is missing that you’d like to see added, just let me know.

One other fix for 2.0 - I’ve updated the team games to team decisions. Since we are using Win Shares, it doesn’t matter if a team played 158 games; if they went 82-70, the player’s total should be based on 152 decisions prorated to 162.
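For anyone who wants to apply the same adjustment to their own spreadsheets, here's a minimal sketch of the proration as I read it (the function name is mine, and it assumes a straight linear proration of decisions to 162):

```python
def prorate_win_shares(ws, team_wins, team_losses, target=162):
    """Prorate a player's Win Shares to a 162-decision schedule.

    Uses team decisions (wins + losses) rather than games played,
    since Win Shares flow from team wins: an 82-70 team has 152
    decisions no matter how many games were actually scheduled.
    """
    decisions = team_wins + team_losses
    return ws * target / decisions

# The 82-70 example from the post: 152 decisions prorated up to 162.
print(round(prorate_win_shares(30, 82, 70), 2))
```

A team with a full 162 decisions passes through unchanged, which is the sanity check you'd want.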

There will be a separate thread for pitchers. Click into the discussion for the results.

Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

Thanks for the comments guys. I'm visiting family right now, and don't have access to everything to update the list right this minute. I'll add any players you ask for by Monday morning at the latest.

jschmeagol - that's just the way the numbers fall. For players that truly had a significantly great peak - guys like Ross Barnes, Hughie Jennings, Home Run Baker, etc. - you can see the difference.

Barnes for example has the same number of WS as Hardy Richardson, but 33% more Pennants Added. Look at Baker and Lave Cross as another example. Compare Sam Rice to Harry Heilmann. Jennings has 65 fewer WS than Maranville, but more Pennants Added.

Also, this is only based on a context through 1939. There was more spread between the teams the further back you go in history - so a great peak didn't have nearly the pennant impact that you'd think. The tighter the teams, the more impact a big year has. When the spread is huge one of two things usually happens - 1) Your team would have won anyway; 2) Your big year wouldn't be enough to close the gap anyway.

So as we go through history and more team-seasons get added into the calculation - more team-seasons that are tighter with lower standards for winning a pennant - the players with more big years will move up faster than those without.

Well, this is not good news for us peak voters as Schmeagol suggests in #2. Obviously a very very high peak like Hughie Jennings' does not amount to a whole lot of PA. You need some longevity too.

OTOH I have some questions. Thanks Joe for the info, BTW - I had been agitating for it from time to time. This is conceptually a very elegant measure, and I plan to take it very seriously, despite the fact that it doesn't help some of my faves like Hughie and Charley Jones. (OTOH it does help other faves like Pike (not even a complete career there), and it very much justifies players I've supported in the past who are now HoMers, from Barnes and G. Wright and McVey and Start.)

But like I said I have a couple of questions.

1. Joe, it sounds like each player/season was matched only with other teams for the same season, is that correct? You said you've now gone only through 1939. There would be no logic to matching all previous seasons (only) but not future seasons. IOW if you match 1895 with 1929 or 1939, why not 1949? So I assume Jennings' 1895 was only matched up to the rest of the NL 1895. Is that correct?

2. And we still have timeline issues, right? (Not that I timeline. Maybe right now, however, we need that reverse timeline I was once accused of using.) IOW Player A and Player B have the same season, one in 1895 the other in 1935. Player A (1895) is at a disadvantage because of the happenstance (beyond the player's control, IOW) that all but 2 teams finished 20 GB or more, so there is a structural bias against Player A creating any PAs. Whereas in 1935 (let's just say, e.g.) all but 2 teams are within 20 GB, so there is a structural bias in favor of PAs. Is that what you are saying? How would one construct a timeline that would fairly compare eras?

But we see that players in the earliest days have some huge PA scores. That I assume is because of the very low replacement levels from a primitive player pool that makes it easy to dominate. So the structural disadvantage you mentioned appears only to affect, say (as of today) about 1890 through...when? Could one infer that Childs was probably better than Doyle--i.e. close enough despite the structural disadvantage? Is Browning really better than Roush and Carey? Maybe Sam Thompson is really better than Sam Rice after you adjust for this structural bias?

3. Also is there any merit to developing a rate for this? i.e. Jennings earned .737 in probably just 5 seasons for .14 per season. Is that more valuable than Herman Long's .770 in perhaps 10-12 above average seasons or .07 per season?

4. This goes to the next question or point of clarification. Presumably each player's total is the sum of a set of seasonal totals, right? So are 5 .14s really more valuable than 10 .07s, especially when you equalize for the unusually large spreads that characterized Jennings' era? (This may be a bad example because Long was more or less contemporary. But say Jennings vs. Sewell: Jennings' .14s would have had a bigger impact in 1930, while Sewell's .07 would have less impact in 1895. Is this a comparison worth thinking about--that is, again, the rates?)

Great stuff but hoping for some clarification and discussion. Not sure yet if this will reflect in my 1940 ballot, but it will soon.

BTW, I know that my word choice in question #4 in post #7 is awkward. By Jennings' .14 year, I mean to refer to his raw aWS. Plug those into 1930 and you get a different number than .14 PA; from Joe's comments above I infer that you get maybe .16 or .18 or .20 - IOW, more than the .14 that that season was worth in 1895.

Also, if you put Jennings' 1895 on the other 11 rosters in 1895 and play 'em out, what does .14 pennants mean? I would think it means that out of 100 seasons equally distributed among those 11 teams (9+ each), 1.4 of them would win the pennant with Jennings at SS instead of an average (?) SS, while the Orioles meanwhile got an average SS in Jennings' place. Or is it 14 out of 1000, or whatever.

That is several questions in one.

Finally, what is the ceiling on single season PAs? Looking at Barnes it must be at least .25, or looking at Collins and Wagner it must be no more than about .15 in the 20th century. Is that right? Can we get a list of the best single seasons?

3. Also is there any merit to developing a rate for this? i.e. Jennings earned .737 in probably just 5 seasons for .14 per season. Is that more valuable than Herman Long's .770 in perhaps 10-12 above average seasons or .07 per season?

Trouble here is that Hughie didn't play 5 seasons; he played about 12. Plus, I think Pennants Added was designed not to be a rate stat.

We've already had access to the rate numbers and the career numbers. The more peak-oriented of us also bring in Top-3 or Top-5 Consecutive (or some variation).

Pennants-Added is one way to give peak bonuses to players' career values and also lessen the contribution of below-average seasons. It eliminates issues with the definition of peak (how long, consecutive seasons or not)... these issues were often problematic, as certain players scored better than others depending on how you defined peak. This method is nice in that it's published.

This is just one number for us to add into each of our mixes, but I'm grateful for all the work that went into this. Thanks, Joe! The WSaR numbers are pretty cool as well.

Joe - are you still using Chris Cobb's adjusted numbers for the Win Shares? The reason I ask is because compared to the old ones, they all seem to be about 5-7 Win Shares higher, with one glaring exception: George Van Haltren*, who's about 50 higher. Any idea why that would be the case?

*based on the ones on my spreadsheet, which just has active candidates.

I only have a few minutes guys, but let me see what I can touch on - I'll be much more thorough once I get home tomorrow night.

Marc - players are evaluated using all seasons 1876-1939 in terms of pennant impact. I use the cutoff of the year we are voting in (since we are voting in January 1940, I use 1939 as the last season). But all seasons are weighted equally.

Things that aren't accounted for - timeline, league quality (one reason I included the WARP3 numbers). Also unaccounted for - defensive spectrum shifts - aside from WS accounting for the 2B/3B shift.

I would certainly give 1B a boost pre-1920, probably pre-1925 (to account for the few years it took to change attitudes, etc.).

Also catchers - like relief pitchers will be in the future, they just don't get enough on their own, either because of manager tendencies or wear-and-tear - so unless you don't want many in, I think you need to give them a boost. I boost them 33% (if all they did was catch). I boost pre-1920 1B about 10%.

WS loves CF, they tend to be a touch overrated because they get too much credit for overall OF defense - so I drop CF about 10% and LF/RF about 5%.

Devin - Van Haltren is higher because his pitching WS are included. I didn't use Chris Cobb's numbers, I figured them on my own - and I've revised my sheet to use team decisions, not games, because WS are only based on team wins, not games. So if a team played 159 games with 6 rainouts, for WS it's not fair to adjust using 159, you need to use 153. For WARP, I believe 159 is the correct number, but I didn't do anything with WARP, those numbers are right off the site.

One other caveat - pitching replacement level is significantly higher - I've got it set to where 220 IP = 1 full-time regular season in terms of what I subtract from replacement level.

I wouldn't use pennants added/seasons to come up with a rate stat - I'd use pennants added/win shares or WSaR. That would be a good way to see which players had higher peaks. But as David said it isn't designed to be a rate stat.

I think it's entirely reasonable to give him two more seasons of 30 WS, 24 WSaR (his 1883), which is about .17 more Pennants Added and puts him at 347 WS in basically a 12-year career. That's pretty conservative. Total is .985 PA.

If you give him 35 WS, 28 WSaR (his 1885), you are looking at 357 WS, and another .21 PA. That puts him over 1.000 PA (1.025). At that point it's reasonable to compare him to a guy like Harry Heilmann. That's where I have Jones, a little bit short of Heilmann.

I don't remember, but is there any reason to give him credit for anything pre-1875?

"But we see that players in the earliest days have some huge PA scores. That I assume is because of the very low replacement levels from a primitive player pool that makes it easy to dominate."

Replacement level is considered the same throughout history. The players from the earliest days had some huge PA scores because they had some monster years - a short schedule contributed somewhat, but a pennant is a pennant.

"Is Browning really better than Roush and Carey?"

No way - for one, WS can't dock him enough for his defense, which was truly abysmal. Also it doesn't account for the poor quality of the early AA, which Browning dominated. I guess you only meant that because of a structural bias (which doesn't exist, as I said in post 13).

"Also is there any merit to developing a rate for this? i.e. Jennings earned .737 in probably just 5 seasons for .14 per season. Is that more valuable than Herman Long's .770 in perhaps 10-12 above average seasons or .07 per season? "

Just to elaborate on what I mentioned earlier - NO! This is that stat. Hughie Jennings' 257 WS were worth .737 PA, Herman Long's 313 WS were worth .770 PA. So if you assume WS is a perfect metric, Long's longevity (no pun intended) was worth more than Jennings' peak.

But, Jennings did have 78% of the WS and 96% of the PA, so the gap was significantly closed. Closed enough that if one could make a compelling argument that Jennings is underrated or Long overrated for some reason (besides Jennings' peak, we've already accounted for that!) you could flip-flop them.
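Since pennants added per win share is the rate stat suggested above, here's that arithmetic on the two careers being compared (figures straight from the post; this is just the division, not a recalculation of the PA method itself):

```python
# Career Win Shares and Pennants Added as quoted in the thread.
players = {
    "Hughie Jennings": (257, 0.737),
    "Herman Long":     (313, 0.770),
}

# Pennants Added per Win Share: a crude "pennant density" rate.
rates = {name: pa / ws for name, (ws, pa) in players.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.5f} PA per WS")
```

Jennings comes out around .00287 PA/WS to Long's .00246 - his win shares were more pennant-dense - but Long's career total PA is still higher, which is the point being made above.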

"Can we get a list of the best single seasons?"

I don't have that handy right now - but basically just find the top single-seasons for WS/162 games and that will be close (since PA are based on WSaR). A few examples:

So two 19's are .134, one 38 is worth .158 - Jennings gets an 18% bonus for having the big year.

If you look at raw WS, the boost is bigger. Jennings had just 45 WS as opposed to 52. So despite having just 87% of the raw WS, he gets 118% of the PA. So a 45 WS season gets an 18% boost over two 26 WS seasons.

The gain would be even bigger over two 22.5 WS seasons. Sherry Magee had a 22.5 WS season in 1918, that was worth .058 PA. Jimmy Sheckard's 1899 was 22.5 WS and .055 PA and Speaker's 1927 was .057. So a 45 WS season gets a 40% boost over two 22.5 WS seasons. I don't see how anyone could argue that this metric doesn't give enough credit to a big peak.
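To make the superlinearity concrete, here are the quoted data points run through the comparison (these are the figures published above, not a recalculation of the PA formula):

```python
one_big  = 0.158       # one 38-WSaR (45 raw WS) season
two_19s  = 0.134       # two 19-WSaR (26 raw WS each) seasons, as quoted
two_22s  = 2 * 0.057   # two ~22.5-WS seasons at roughly .057 PA apiece

peak_bonus_vs_26s  = one_big / two_19s   # the ~18% bonus
peak_bonus_vs_22s  = one_big / two_22s   # roughly the 40% boost
print(f"{peak_bonus_vs_26s:.0%}  {peak_bonus_vs_22s:.0%}")
```

In other words, splitting the same value across more seasons always costs you PA under this metric, and the cost grows as the seasons get smaller.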

I'd say that the standard at C is low for obvious reasons. I'd say a player is a serious candidate at .600 PA if he caught full time - Bresnahan is tricky because of not catching full time - his bonus isn't as high.

3B is a little lower than the rest, .800 making you a serious candidate instead of .900 - not sure why, since WS gives extra defensive credit for playing 3B. Maybe we just overrated Groh and Collins? WARP3 sees them (and Lave Cross) equal to Baker and Leach not sure who is right there . . .

I'm also trying to think of why I was missing on Leach. I guess his skills were a little more subtle. He needs a bit of a docking, due to playing CF for 1/2 of his career - but at worst that could lower him to the Collins-Groh level.

Either Thompson is underrated by WS, or we overrated him (I lean towards the former), but either way, I don't believe .777 should be the standard for RF, I believe .900 (as with LF/CF) is obviously what makes sense.

Oh - first base - I'd say .800 is a reasonable standard pre-deadball, since there was much more defensive responsibility than WS gives the players credit for (generally between 1-3 WS per season).

Beckley is the only tweener candidate (I guess Sisler too), but it's been tough, because each career is so one-sided - in terms of the peak-career value spectrum. If Beckley had the same PA, but 320 WS instead of 369, he'd have been in a long time ago (because he would have won over at least a few of the heavy peak guys).

If Sisler had 350 WS instead of 317, he'd have a few more of the career voters, etc. . . . We haven't had a balanced tweener candidate at 1B, so the standard really hasn't been set yet . . .

Benny Kauff and Gavy Cravath are interesting calls too, depending on how much extra credit they should get. Kauff is especially tough since he had 41 and 36 WS seasons in the Federal League that need to be discounted . . . Cravath is probably going to make my ballot, though I'm not sure where. Cravath gets extra credit on the front end of his career, Kauff gets it on the back end - interesting to say the least . . .

One more attempt to clarify something about the pennants added metric:

The pennants added metric measures a player's total contributions over the course of his career. It is, therefore, by definition a career value metric as "career value" has traditionally been defined. What the metric does is produce a total value for all of a player's contributions over the course of his career that takes into account the fact that units of value (WS, WARP, etc.) make different contributions to the ultimate end of winning a pennant depending on the time and place that they are earned.

There are two debates we need to have about the pennants added metric (and one uncontestable point we need to acknowledge):

(1) Is pennants added a better way to measure career value (total value under the line) than aggregating season-by-season value above some replacement level? This debate turns on whether one buys the assumptions built into the pennants added metric.

(2) If we can accurately measure the total additional value a player's high peak brings to his teams, should we stop there? Or should a high peak or high prime or long prime count further in our evaluation process? Joe and many others (even some who claim to be "peak" or "prime" voters) ASSUME that the answer is obviously the former. Perhaps they are right. But, as Bill James noted in the first historical abstract, many, many people in evaluating the greatest baseball players of all-time either assume or argue that we should be answering a very different question, not who brought the greatest value to his teams over the course of his career but who established the highest level of play (with very different definitions of how long one has to play at that level to have "established" it). I started out essentially trying to balance the two questions equally in constructing my ballot. I have moved (or been pushed by the group) to valuing the "total value under the line" question more (measuring in a sophisticated, career-shape considering way not unlike the PA calculation), but still count the "highest established level of play" assessment as something like 1/3 of my system. The choice is, of course, up to each of you.

(3) (The uncontestable point): Pennants added are only as good as the underlying statistic used to calculate them. Therefore, treat Joe's numbers with however much respect you treat Win Shares, no more, no less.

3B is a little lower than the rest, .800 making you a serious candidate instead of .900 - not sure why, since WS gives extra defensive credit for playing 3B. Maybe we just overrated Groh and Collins? WARP3 sees them (and Lave Cross) equal to Baker and Leach not sure who is right there . .

The reason probably is that third basemen break down faster than many of the other positions which is not going to be highlighted by WS. Also, does WS give extra credit for the pre-Mathews third basemen?

As the only FOLC, I would like to do a little advocating at this time. While his PA numbers don't scream for him to be on the ballot, they do ask for consideration. Also, note that he has the highest WARP3 of the 3rd basemen listed, and that his "other" position is more than 2 FULL seasons of catcher. Probably more like 3 typical catcher seasons. If you are giving catchers bonuses, make sure you pro-rate out some of it to Lave.

Also catchers - like relief pitchers will be in the future they just don't get enough on their own, either because of manager tendencies or wear-and-tear - so unless you don't want many in, I think you need to give them a boost. I boost them 33% (if all they did was catch). I boost pre-1920 1B about 10%.

WS loves CF, they tend to be a touch overrated because they get too much credit for overall OF defense - so I drop CF about 10% and LF/RF about 5%.

It's not clear whether the Pennants Added numbers reflect these boosts, or they are just your personal preferences.

(3) (The uncontestable point): Pennants added are only as good as the underlying statistic used to calculate them. Therefore, treat Joe's numbers with however much respect you treat Win Shares, no more, no less.

Win Shares doesn't do a very good job adjusting the pitching/fielding split for changing conditions. I think even the most ardent Win Shares supporters agree with that. Exhibit A for this assertion is the pitcher Win Shares during the 1870's/1880's. Pitcher Win Shares are also out-of-whack, but by smaller amounts, during the 1890's and probably the deadball era also.

The corollary to this is that the fielding Win Shares for this period are also out of whack, because whatever was mistakenly given to the pitchers should have gone to the fielders. This means that the Pennants Added numbers are also depreciated for those who played key defensive positions (C,SS,3B,2B), and the farther back we go, the more the PA numbers have been depreciated.

This affects Bennett more than Bresnahan and Bresnahan more than Schang.

This affects Long and Jennings more than Sewell.

This affects Dunlap more than Childs, Childs more than Evers, and Evers more than Doyle.

This affects Williamson more than J.Collins, Collins more than Leach, and Leach more than Groh.

IOW, when there are close calls at a key defensive position, this flaw in Win Shares acts as a reverse timeline, because the farther we go back, the more important the fielding actually was.

OTOH, if you really believe in the 1880's Pitching Win Shares numbers, then there's another dozen 1870's/80's pitchers available whose 3-yr/5-yr peaks will put to shame any of these players from the 20's/30's.

Joe is right. It would appear that, with only a couple of exceptions, we've done a very good job sorting out the best position players. About .900 seems to be the PA between in and out.

Bennett's peak is far enough back that it is probably seriously underrated by the WS fielding/pitching split flaw mentioned above. If that were somehow corrected for, I'm confident he would move ahead of Bresnahan, though probably not quite to .900.

For the same reason, I think that Dunlap would move up the 2B list. Whether he'd move over the .900 mark is open to debate, but is a serious possibility.

Leach appears to be an oversight; Williamson too (1880's fielding, again); Collins and Groh are the exceptions to the .900 rule.

Thompson certainly doesn't look good. With a real 1890's CF glut we've rejected them all (so far) in favor of Carey (who may be questionable).

The base premise behind these calculations is where the WS replacement level lies. If I'm understanding correctly, Joe, you're using either 6.5 or 7 WS for a full season. This may be too low, as 10 Win Shares is what an average regular earned on the 1962 Mets. A higher replacement level would drop all PA numbers, but hurts long careers with low peaks (e.g. Van Haltren, Beckley) more than short careers with high peaks (Flick, Kelley).

I definitely agree with what you are saying on the reverse timelining Jim. If two players are even, the benefit of the doubt, based on this metric, should go to the player who played earlier, especially if it were at a key defensive position (or 1B, which is underrated by WS pre-Ruth).

Win Shares gives 48% of the shares to the offense, 17% (.52x.325) to fielding, and 35% to pitching. When players are average, this breaks down as 8 full-time position players, each getting 8.1% of the shares, and 4.33 pitchers, each also getting 8.1% of the shares. This "full-time" pitcher pitches about 330 IP. (For estimates, I usually approximate this as 12 full-time players.) In actuality, these are 12.33 full-time positions, and individuals are allocated some portion of the playing time at these positions; few these days are truly full-time.

A .250 team (like the 1962 NYMets) would go 40.5-121.5, and accumulate 121.5 Win Shares. Each of the 12.33 positions would then get 9.9 Win Shares (assuming they were all equally bad, and that Win Shares portions them out evenly - I don't believe it actually does, though I would have to work through the math).

If you believe that those Mets represent replacement level, then this provides a benchmark for full-time replacement level. The replacement level amount for a .500 team may be somewhat lower due to Pythagorean distortion, but I can't see it being much below 9 (though I could be wrong). If you believe that those Mets are above replacement level, then the Win Shares replacement number is correspondingly lower.
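The allocation arithmetic above can be spelled out directly (the 48/17/35 splits are the standard Win Shares proportions quoted in the post, and 12.33 is the full-time-position count used there):

```python
offense  = 0.48          # hitters' share of team Win Shares
fielding = 0.52 * 0.325  # ~17% to fielding
pitching = 0.52 * 0.675  # ~35% to pitching

# Position players collect offense + fielding; pitchers collect pitching.
per_position_player = (offense + fielding) / 8  # 8 full-time position players
per_pitcher = pitching / 4.33                   # 4.33 "full-time" (~330 IP) pitchers
print(round(per_position_player, 3), round(per_pitcher, 3))  # both ~8.1%

# A .250 team (the 1962 Mets) over 162 decisions: 40.5 wins -> 121.5 WS,
# spread across 12.33 full-time slots.
mets_ws = 0.250 * 162 * 3
per_slot = mets_ws / 12.33
print(round(per_slot, 1))
```

Both shares land at about 8.1% of the team's Win Shares, and the Mets benchmark works out to just under 10 WS per full-time slot, matching the figures in the post.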

All of the examples you cite are from "glove" positions. Win Shares is biased against them, with a higher replacement level for fielding/pitching than for hitting. Also note that the DH messes with the calculations because it steals OWS from the other positions (2+ per position). Find some NL 10's at 1b/corner-of and see what your impression of their quality is.

Bret Boone this year had 9 WS in 148 G (basically 10 per 162 G). He had an OPS+ of 98 while playing 2B.

Something's funny there. An average offense on an average team gets 116 Win Shares to spread around, which is 13 average at 9 hitting positions (including DH). A DH with a 98 OPS+ playing 90% of the time should get 10-11 Win Shares. (Seattle missing their Pyth-proj by 6 games didn't help either.)
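For what it's worth, the 116/13 figures above follow directly from the standard splits (this shows only the pool arithmetic; OPS+ doesn't map linearly onto batting Win Shares, which is part of why Boone's 9 looks low rather than impossible):

```python
avg_team_ws  = 81 * 3                # a .500 team over 162 decisions earns 243 WS
offense_pool = 0.48 * avg_team_ws    # ~116.6 batting WS on an average team
per_hitter   = offense_pool / 9      # 9 hitting slots with a DH -> ~13 WS each
print(round(offense_pool, 1), round(per_hitter, 1))
```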

To hazard a guess, for the past 100 years (200 pennants), I suppose we would see about 250 "pennants added" in total. Which would mean if all players in MLB history were put in a dollar auction, immortals like Ruth (2.5 pennants) would be worth 1% of the total budget.

Based on the above numbers and some common sense, I would say that we should treat 8-9 WS and 1-1.5 WARP as replacement value (which of course means that WARP isn't actually Wins Above Replacement at Position--but we knew that).

"Corollary: The total sum of Pennants Added for a league for a season would be approximately 1"

I would think not, but I don't know for sure.

For one thing, league size matters - if a player has a 25 WS season in the 1963 NL (10-team league) or the 2004 NL (16-team league) they would be weighted the same, get the same pennants added, etc. But in a 16-team league there are more teams, more team-seasons, etc.

Interesting thing to mull over at least . . .

Okay for replacement level, I've been convinced, I'll bump it to 8 WS/162 games. But Jim - are you saying I should be giving pitchers a replacement level of only 8 per 330 IP? That 330 IP = a full season as a hitter? I can't fathom that.

For one, that's about 1300 BF or ~double the number of PA in a full-season.

Let me try the math . . .

.675*.52*81*3 = 85.293 WS for an average staff.

An average staff pitches 1458 innings. That works out to 12.87 WS for an average pitcher in 220 innings.

We know that WS has a built in replacement level of about .200. That's what 0 WS means, that you are a .200 player.

So if you want to set replacement level in the middle (.350 W%) you'd get about 6.435 WS per 220 IP, right?
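The pitching math in the last few paragraphs, spelled out step by step (same assumptions as the post: 35.1% of team WS to pitching, a 1458-inning staff, and 0 WS corresponding to roughly a .200 player):

```python
# Average pitching staff: 0.675 * 0.52 = 35.1% of a .500 team's 243 WS.
staff_ws = 0.675 * 0.52 * 81 * 3       # 85.293 WS
staff_ip = 162 * 9                     # 1458 innings
avg_per_220 = staff_ws * 220 / staff_ip  # WS per 220 IP for a .500 pitcher

# A .350 replacement pitcher sits halfway between the .200 zero-point
# and .500 average, so he earns half the average pitcher's WS.
repl_per_220 = avg_per_220 * (0.350 - 0.200) / (0.500 - 0.200)
print(round(avg_per_220, 2), round(repl_per_220, 3))
```

That reproduces the 12.87 WS average and the 6.435 WS replacement figure per 220 IP.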

But here's my problem with all of this . . .

I think a replacement team would be worse than the 1962 Mets. First, the 1962 Mets were 10 games under their Pythagorean record. They were a 50-110 quality team that was really unlucky. The next season, they were 51-111 with a 50-112 Pythagorean record.

So I think a .250 team with a .313 Pythagorean is above replacement level, but I think a replacement player is roughly a .350 player. How do these two thoughts peacefully co-exist?

Take an offense of replacement level players, the team will play .350 baseball - with an average pitching staff. If you put replacement pitchers into the mix, now you've got a .350 offense and a .350 pitching staff and the team plays significantly worse than .350 ball.

A team that scores 700 runs (league average) and gives up 962 plays .350 baseball. A team that scores 494 runs and gives up 700 (league average) plays .350 baseball.

But a team that scores 494 runs and gives up 962 - they play .223 baseball.
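Those three scenarios can be checked with the Pythagorean formula. One caveat: with the classic exponent of 2 the numbers come out a bit below the .350/.223 quoted above (an exponent nearer 1.8, which some analysts prefer, lands closer), so treat the exponent as an assumption:

```python
def pyth_wpct(rs, ra, exp=2.0):
    """Pythagorean expected winning percentage for runs scored/allowed."""
    return rs ** exp / (rs ** exp + ra ** exp)

LEAGUE_AVG = 700
print(round(pyth_wpct(494, LEAGUE_AVG), 3))  # replacement offense, average pitching
print(round(pyth_wpct(LEAGUE_AVG, 962), 3))  # average offense, replacement pitching
print(round(pyth_wpct(494, 962), 3))         # replacement on both sides
```

Whatever exponent you use, the shape of the argument holds: stacking replacement-level offense on replacement-level pitching drops the team well below .350.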

So maybe we should look at the WS of a .223 team to figure out where replacement level should be?

A .223 team isn't that much worse than the .250 team that Jim used, but let's run the numbers anyway real quick.

Dividing the batting and fielding WS among the 8 position players, it works out to 8.8 WS/162 games; for pitchers it works out to 5.7 WS/220 IP, or 8.6 for the 330 IP pitcher Jim used (I still have trouble with that, though).

Okay, are these the figures I should use to recalculate the numbers? Are we theoretically correct here?

A team that scores 308 and allows 1367 plays .052 baseball (8-154) - you can't get much closer to zero, but it just helps me confirm that we are on the right track here theoretically, that's all.

Under Win Shares, a 336 IP average pitcher is equivalent to a play-every-inning average regular. I'm not quite sure how this translates to replacement level in real life, but theoretically the equivalency has some validity.

Pitcher's hitting screws this analysis up to some extent, though it's not clear to me how much. In general, 20th century pitchers haven't hit enough to earn positive Win Shares, though specific individuals do. Pinch hitters also can earn positive OWS ("stealing" them from the position players). These two factors would lower the 336 equivalency number somewhat (don't know how much, but adding a full-time DH lowers it to 300 IP, so it can't be less than that).

There are going to be all sorts of issues when the DH comes into play Jim. Major issues.

I think it's not out of line to say that WS underrates AL hitters by as much as 11% on average - possibly more.

Before we get to considering players whose careers took place after 1972 I (or someone) needs to figure out exactly what an average DH hits. Retrosheet's breakdowns will be a big help here.

My theory is that there is only one way to handle it. You have to add offensive WS to all AL teams (or more specifically offensive players). I realize that this goes against the concept of everything 'adding up' but there is no other way that I can see to handle the problem. The AL artificially adds offense to their game - in essence it doesn't add up there either. A player with the same batting events (assuming equivalent parks, etc.) gets fewer WS in the AL for no other reason than that the offensive credit is divided 9 ways instead of 8.

I've talked to Studes about this and he disagrees - but I really think that it's because I haven't had the time to explain it well. I don't see any other way around it, but maybe I've got tunnel vision on the issue.

There is no subsequent adjustment needed for AL pitchers because they are still being compared to league averages (they are already adjusted for the tougher 'degree of difficulty'). But the AL adds offense, so I think the only way to account for this is to add WS to the mix, even if it violates the everything must add up rule.

A player with the same batting events (assuming equivalent parks, etc.) gets fewer WS in the AL for no other reason than that the offensive credit is divided 9 ways instead of 8.

I've talked to Studes about this and he disagrees - but I really think that it's because I haven't had the time to explain it well. I don't see any other way around it, but maybe I've got tunnel vision on the issue.

Joe, I believe your thinking is correct, but your conclusion may be wrong?!

It will take more offense in the AL to produce a "WIN" than in the NL, so the same batting events in the AL SHOULD result in fewer win shares, as they are LESS VALUABLE than if they were produced in the NL.

What you seem to be trying to adjust for is not value, but ABILITY. If you're trying to get to ABILITY, then you'd need some type of adjustment.

On top of this, the "problem" is made worse I think by the silly zeroing out of "negative" batting win shares, which are primarily created by NL pitchers....

KJOK - I sort of agree - but here's the thing . . . when you are comparing these guys to players that play in the NL - or to previous years (pre-DH) I think you have to level the playing field. I guess it is an ability question but in this rare instance, I think it's appropriate. The only other alternative is to put all AL hitters from 1973-present at a significant disadvantage in the comparison.

The only other alternative is to put all AL hitters from 1973-present at a significant disadvantage in the comparison.

I admit I'm sorta on the fence on this one. Basically, the AL is playing with 10 players vs. the NL with 9, so Win Shares is not going to be the ideal measuring tool in this case since both NL and AL teams have the same number of wins to divide up, unless you also had "loss shares," which hasn't really been invented correctly yet...

3B - John McGraw (was .691) moved ahead of Billy Nash and Lave Cross. Nash (was .710) moved ahead of Cross as well (was .729). The gap is close enough that I would nudge Cross up to about equal with Williamson, due to his catching.

Looking at these numbers and adjusting for WS's flaws, for league quality, for peak and prime (which I independently value, not just as a multiplier when calculating career value), and for position, I think these numbers very much reinforce the conclusions I have reached on all the eligible position players save one: Tommy Leach. I'm going to reconsider him thoroughly before the next election.

My theory is that there is only one way to handle it. You have to add offensive WS to all AL teams (or more specifically offensive players). I realize that this goes against the concept of everything 'adding up' but there is no other way that I can see to handle the problem.

This approach strikes me as conceptually similar to hypothetically adding more pitching Win Shares to all teams after the introduction of the 5-man rotation to allow for direct comparison to pitchers from 4-man rotations. I'm not sure what I think about that.

I think, in general, Joe, you are attempting to use a favored statistic for some measure that it was not intended for. There are statistics that measure ability rather than value. If it is important to measure ability, then use one of those instead of Win Shares -- rather than warping WS for one purpose, but not for all the others in which WS boosts or shortchanges a player due to non-ability-related factors.

Joe: There are going to be all sorts of issues when the DH comes into play.

Complete agreement.

I think KJOK has it right when he says that offense is less valuable in the modern AL. Simple supply and demand; surplus vs scarcity. I think this is best handled with a league quality adjustment, though it only affects BWS. (I have no idea how WARP deals with this issue, not having looked that closely at the modern players.)

KJOK: On top of this, the "problem" is made worse I think by the silly zeroing out of "negative" batting win shares, which are primarily created by NL pitchers....

Disagreement. Adding negative batting WS to the NL makes the problem worse. When compared with James' WS, this in effect takes pitching WS and converts them to batting WS; the smaller group of NL hitters divvy up a larger pool of NL batting WS, accentuating the difference wrt the AL.

jimd: About .900 seems to be the PA between in and out.

This appears to be about .750 after the revisions (with a .700 exception for 3B).

Again, Joe, thank you very much. I know it was a lot of work, but the results are very interesting.

Disagreement. Adding negative batting WS to the NL makes the problem worse. When compared with James' WS, this in effect takes pitching WS and converts them to batting WS; the smaller group of NL hitters divvy up a larger pool of NL batting WS, accentuating the difference wrt the AL.

Agreement. You're correct, the zeroing out of negative batting win shares LESSENS the problem, but as Joe has discovered, it's still a problem.

I think, in general, Joe, you are attempting to use a favored statistic for some measure that it was not intended for. There are statistics that measure ability rather than value. If it is important to measure ability, then use one of those instead of Win Shares -- rather than warping WS for one purpose, but not for all the others in which WS boosts or shortchanges a player due to non-ability-related factors.

Agree here also. Win Shares is simply not a good tool for doing "HOM" evaluation - low replacement level, incorrect position adjustments over time, incorrect pitching/fielding splits over time, etc., etc.

Please let me know if I missed anything in my detective work. As best I can tell, all of these numbers are set to 154-game seasons, so the Win Shares numbers will need a 5% boost to be on par with those for their white counterparts.
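That schedule-length proration (and the one Joe describes in the intro, scaling team decisions to 162) can be sketched like this; the function name and the sample inputs are illustrative, not from the thread:

```python
# Sketch of the schedule-length adjustment: prorate a player's Win Shares
# from his team's actual decisions to a common 162-game basis.
def prorate_win_shares(ws, team_decisions, target=162):
    """Scale Win Shares to a common season length."""
    return ws * target / team_decisions

# A 154-game season gets roughly a 5% boost (162/154 is about 1.052):
print(round(prorate_win_shares(30.0, 154), 1))  # 31.6
```

The same function handles the intro's example of a team with 152 decisions in a 158-game season: only the decisions matter, not the games played.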

"Dividing the batting and fielding WS up between 8 position players it works out to 8.8 WS/162 games and for pitchers it works out to 5.7 WS/220 IP. Or 8.6 for the 330 IP pitcher"

I'm going to use these numbers unless someone speaks up and tells me I shouldn't.

I'm not sure whether this confirms your decision or not (my first thought is that it does) to use a significantly lower replacement level for pitchers than for hitters. If anything, perhaps replacement level for pitchers may need to be even lower.

It suggests and provides some evidence (for the 21st century at least) for what I have long suspected -- that for pitchers, much more than for other positions, showing up (and not sucking) is much of the battle.

The conclusion:

"[The replacement level for e]very position comes in between 60% and 70% except starting pitching. The replacement level for starting pitchers is around 40% [38%] -- just slightly more than half that of all other positions. This finding reinforces the commonly heard complaint that Win Shares undervalues starting pitchers."

In other words, if my starting leftfielder goes down, I can probably replace about 70% of his value with a fairly readily available backup, but if my starting pitcher goes down, then I might not be able to replace much more than a third.
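To make the 70%-vs-38% contrast concrete, here is a hypothetical sketch; the positions, the 20-unit player values, and the percentage table are assumptions for illustration, with the percentages taken from the quoted study:

```python
# Assumed position-specific replacement levels, as fractions of an
# average player's production (per the study quoted above).
REPLACEMENT_PCT = {"LF": 0.70, "SP": 0.38}

def value_over_replacement(player_value, avg_value, pos):
    """Value above the readily available replacement at a position."""
    return player_value - REPLACEMENT_PCT[pos] * avg_value

# Two exactly-average players (both worth 20 "units") clear their
# replacement baselines by very different margins:
print(round(value_over_replacement(20, 20, "LF"), 1))  # 6.0
print(round(value_over_replacement(20, 20, "SP"), 1))  # 12.4
```

Under this accounting an average starting pitcher is worth roughly twice as much over replacement as an average left fielder, which is the complaint about Win Shares undervaluing starters.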

Joe: could you put Ruth and Hornsby on the chart? Obviously it won't affect anyone's vote, but just for the fun of it, I'd like to see what ridiculous number gets put next to Ruth's name. (Of course, Bill Terry should go on there because it might affect votes.)

I will try to get to this tonight OCF - I'm heading off on vacation this morning, and it kind of slipped my mind. Worst case, I'll print the relevant info before I leave work tonight, and do it on the plane - I assume the hotel has a way for me to hop online with my laptop, even if it's just dialing up through a local #.

In other words, if my starting leftfielder goes down, I can probably replace about 70% of his value with a fairly readily available backup, but if my starting pitcher goes down, then I might not be able to replace much more than a third.

There's one important variable missing here, which is TIME.

For players just injured and missing today's game, there is one (very low) replacement level.

For players injured and about to miss several games (giving the team time to call up someone, or trade for someone, etc.) then there will be a higher average replacement level.

For players injured in early spring training and about to miss the whole season, the replacement level will be even higher.

For players retiring in October, the replacement level will be even higher, etc.

What all of these models fail to account for is that the overall level of the league declines when a player is injured.

This is true, but the question is what value we are trying to evaluate -- a player's value to his team, or a player's value to the league as a whole. I think most of us think of value as the former. Babe Ruth is great because of what he did to help the Yankees win pennants, not because he improved the overall quality of the American League. If that is the case, then the relevant replacement level is the quality player the Yankees are likely to get, not the inevitable scrub who would trickle up into a backup role for the Indians or Browns.

What would you suggest I use as my replacement level? I think it's already pretty low - 8.8 WS per 337 IP.

Of course, we'd need to see more information on how the 1930s replacement level compared to the post-1945 level, but assuming that they are similar, then using jimd's numbers for post-45, if an average pitcher on an average team (77-77 record) pitching 330 innings earns 8.1% of 231 win shares (77*3), then he would earn 18.7 Win Shares per year. If we assume that replacement level for pitchers is 38% of the average level, then we get 18.7*0.38, or 7.1 Win Shares per 330 innings.
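The arithmetic in that paragraph can be checked directly; all inputs are the thread's assumed figures (77-77 team, 8.1% share for a 330-inning pitcher, 38% replacement level):

```python
# Reproducing the replacement-level arithmetic above.
team_ws = 77 * 3                         # a .500 team earns 231 Win Shares
avg_pitcher_ws = 0.081 * team_ws         # ~18.7 WS per 330 IP for an average pitcher
repl_pitcher_ws = 0.38 * avg_pitcher_ws  # ~7.1 WS per 330 IP at replacement
print(round(avg_pitcher_ws, 1), round(repl_pitcher_ws, 1))  # 18.7 7.1
```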

I don't think long-term replacement level approaches anything near average, as some have suggested.

The Lake Wobegon replacement-level. If replacement-level was average over any time period, then practically all teams would be above average.

This concept confuses expectation with reality. 20 years from now, your expectation at each position is average, but that's not the replacement level; you have to expend resources to get to average. If you don't expend the resources and just drift along, you become the 1918-1948 Phillies, or the Browns 1930-1953, etc.

If that is the case, then the relevant replacement level is the quality player the Yankees are likely to get, not the inevitable scrub who would trickle up into a backup role for the Indians or Browns.

A concept worth exploring and related to Brent's (I think it was his, but I could be misremembering) championship level.

But please don't call it "replacement level". It's different from what is usually meant by that term (which is a league-wide concept). And we have enough trouble communicating our thoughts sometimes without deliberately using the same name for different concepts.

The Lake Wobegon replacement-level. If replacement-level was average over any time period, then practically all teams would be above average.

I'm not sure I follow this. The replacement level of starting players is actually slightly ABOVE league average long-term...

20 years from now, your expectation at each position is average, but that's not the replacement level; you have to expend resources to get to average.

Resource expenditure is moving more towards an economic evaluation. Plus you have to spend resources to even be the old Phillies or Browns, including the resource of a roster spot, etc.

But please don't call it "replacement level". It's different from what is usually meant by that term (which is a league-wide concept). And we have enough trouble communicating our thoughts sometimes without deliberately using the same name for different concepts.

Well, this is exactly what some of us are arguing against. Measuring "above replacement level" by using the absolute worst player in a league to measure HOM worthiness. A better measure would be the level that the "average" player doing the replacing would be performing at.

I've argued this at length elsewhere, but replacement level is equal to the level of the best non-major leaguer in a world where all teams are of equal intelligence, all teams have equal financial resources, there are no rules as to when and how you can trade players, there are no psychological factors involved in making player moves, etc. Under those artificial conditions, to replace a major leaguer, you should be able to acquire the best non-major leaguer for resources totaling the difference between that player and the worst major leaguer who fills a similar role.

However, those conditions are fictional. Not all teams are smart. Not all teams have the resources to be on the market for talent at all times. There are rules on when and how you can trade. Teams stick with below replacement level guys out of loyalty or inertia. Etc.

As a result, when the smart, resource-heavy teams need to replace a player (and they are the teams we care about if we are focusing on winning pennants) the market is structured such that they don't pay full price (in $ and talent) for their acquisition. As a consequence, they can usually obtain a player who is somewhat better than the best non-major leaguer at little or no cost or a player who is fairly good at a discounted cost. I have absolutely no idea the magnitude of this effect, and am fairly certain it fluctuates over time (due to different distributions of resources and savvy, different rules, etc.) but am certain that it exists at all places and times.

The replacement level of starting players is actually slightly ABOVE league average long-term...

This can only be true if you DEFINE replacement level as an average. The definitions I've seen define replacement level as the minimum available. If you change the definition, then you're talking about a different concept, which is fine, but it's not "replacement level".

IOW, if you take the average value of all replacements 20 years from a given date, then that value will be slightly above league average. But that's not replacement level, that's average replacement value, and they're not the same thing.

A better measure would be the level that the "average" player doing the replacing would be performing at.

I'm not sure that it would be "better", but it would be a different measure. Count as positive all value above average, count as zero all value below average. This just means that you are applying a higher standard. Zeroing the negative is important, because again you shouldn't penalize the guy for playing - making a living because he has value to some team even if not to the best teams - just because he doesn't meet the higher standard. It's kind of like time in the minors, usually just ignored, not actively penalized.
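A minimal sketch of the "count above-average, zero out below-average" measure jimd describes; the seasonal numbers are made up for illustration:

```python
# Per-season value relative to average, with below-average seasons
# floored at zero rather than counted as a penalty.
def career_value_above_average(seasonal_values, seasonal_averages):
    return sum(max(0.0, v - a) for v, a in zip(seasonal_values, seasonal_averages))

# Three seasons vs. an average of 18: the below-average year (12) is
# zeroed out, not subtracted.
print(career_value_above_average([25, 12, 20], [18, 18, 18]))  # 9.0
```

Note the design choice jimd is defending: the player's below-average year contributes nothing, but never drags his career total down.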

KJOK :A better measure would be the level that the "average" player doing the replacing would be performing at.

jimd :I'm not sure that it would be "better", but it would be a different measure. Count as positive all value above average, count as zero all value below average.

EEP! My head is spinning...

I believe KJOK meant "average replacement" here not "average". The point here is that due to fluctuations in farm system strength, who is on waivers, poor management, etc, the "replacement level" for a team at a given position can vary. Thus, there is almost guaranteed to be several "below replacement" players in the league at any given time. So, the hypothetical "replacement level" should not necessarily be defined by the worst players in the league, but a little bit higher than that.

Comparing to the absolute average instead of the replacement is another debate.

I think a great point was made in this conversation, though whether it is really germane to the HoM I'm not sure. But the concept of replacement value is highly conceptual. Clearly, replacement value for the New York Yankees circa 1997-2004-2005 is a little different than it is for other clubs, even if you're talking about short-term replacement due to a temporary injury situation. But as soon as you're talking about a longer-term replacement, the threshold goes way up.

This is just another reason why I take value to be a zero-based phenomenon. Every player who takes the field has zero value until the first pitch is thrown and then their value goes up (not down) from there. A fielder makes an error, a pitcher gets rocked, a batter takes a donut, that's an absence of positive value.

The calculation of value against any other baseline just looks too hard, too conceptual, too much guesswork to me.

Yes. But he also said "The replacement level of starting players is actually slightly ABOVE league average long-term", so to him they are close to interchangeable.

So, the hypothetical "replacement level" should not necessarily be defined by the worst players in the league, but a little bit higher than that.

No quarrel with that, but that's not what KJOK is saying.

sunnyday2: whether it is really germane to the HoM I'm not sure.

Methinks any discussion about the way we evaluate players is germane.

Clearly, replacement value for the New York Yankees circa 1997-2004-2005 is a little different than it is for other clubs

This may be true, but it does not lead to a workable definition of value for a project such as the HOM. A team-by-team definition leads to evaluations where good players on bad teams are more valuable than good players on good teams simply because they play for bad teams, while using a league-wide replacement level in the same case would have preferred the good players on the good teams.

KJOK:
I'm not attacking your player rankings, please understand that. I think that peak value is more important (and valuable) than many here do; I just don't have a good handle on quantifying it. So incorporating "value over average" as part of a player evaluation system seems reasonable to me (just no penalty for "value below average"). I hope to get my old BP's out of the attic and look at Pennants Added more closely some weekend soon, to understand it better.

I also think that using WS or WARP to evaluate careers has problems, because much of that total value represents value under replacement, value awarded simply for playing time, not value above replacement (using the worst regular definition).

Yes. But he also said "The replacement level of starting players is actually slightly ABOVE league average long-term", so to him they are close to interchangeable.

So, the hypothetical "replacement level" should not necessarily be defined by the worst players in the league, but a little bit higher than that.

No quarrel with that, but that's not what KJOK is saying.

I apologize for being ambiguous, especially on a subject where you need to be very precise.

What I meant to say was the average replacement level for STARTING players IS slightly above ".500" OVER THE LONG TERM. In the short term, if you've got to replace your starting SS 10 minutes before gametime, the "average" replacement in this situation is obviously much less than a .500 player.

I think the basic difference in what I'm proposing "should" be the method is assigning 'value' is that "WARP" basically measures from the 25th man on the roster of the Arizona Diamondbacks while I would propose measuring from the average "10th man" (11th in AL I guess) with average calculated across the Red Sox/Cardinals/Yankees all the way down to the Brewers/Diamondbacks, etc.

Just wanted to point out that over on the Mike Matheny signing thread, someone reposted a post from Clay Davenport admitting that the way Baseball Prospectus does their 'replacement level' results in the replacement level being at the level of an average AA ballplayer.

I think I'd much rather use Major League average than AA average for any measurement....

What I meant to say was the average replacement level for STARTING players IS slightly above ".500" OVER THE LONG TERM.

Again, this doesn't make sense to me. Everybody who is playing a "below .500" player is ignoring an above .500 player who is out of work?

It just means that the "average" starting player is around a .514 player. So, if a player retires, the most likely performance level of his replacement on the team the following year would be a player who is slightly above the major league average.

So it is an "average" based replacement-value system, as opposed to a "minimum" based replacement-value system

Yes, because unless you're the Diamondbacks or the old Browns or Phillies, that is the more realistic replacement IMO.

Do the players who are below the "average replacement value" get zero value or negative value?

I know people get hung up on this 'negative value', but each game begins with both teams theoretically having a 50% chance of winning. So if a player over the course of the year was less than a .500 player, he would receive negative value for that year, as he moved his team toward losses more than toward wins.
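The .500-baseline accounting described here, as a sketch; the win percentage and games played are invented numbers:

```python
# Value relative to a .500 baseline: every game starts 50/50, so a
# player's value is how far he moves his team from .500. A below-.500
# player accrues negative value under this definition.
def value_vs_500(player_win_pct, games):
    return (player_win_pct - 0.500) * games

# A .450 player over 150 games:
print(round(value_vs_500(0.450, 150), 1))  # -7.5
```

This is exactly the point the replies below push back on: the same player has positive value under a lower (replacement) baseline and negative value under this one.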

Let's say, Player X comes to my team. He is a young free agent, and an "above minimum replacement player", but he comes with something extra: a crystal ball that lets you know exactly how he will perform over the next 15 years. I look in his crystal ball, and see that he will perform at exactly one game below average for his position for every one of the next 15 years. Complete statistical certainty! The only thing he asks for is a guaranteed 15-year contract, with a no-trade clause (his kids like the school system).

I confer with my stat guys to decide whether to accept his terms. They tell me my only immediate replacement is a "minimum replacement" guy, but there are some free agents who might be available and who might be average (but don't come with crystal balls, and are asking for 3-year, $21 million contracts), and a couple of guys in AA and AAA who certainly look like they could add to the team in the future. Of course, they could all crash and burn too.

They run the numbers and tell me, "If you sign this guy, you'll be one game below average at that position for the next 15 years. If you don't sign him, you'll probably give that starting job to between three and six different guys, some of whom are only 12 years old today. If you add up their combined performance, they should give you about one game above average over the next 15 years."

Oh, I decide. Sign this guy for one game below average, or go into the unknown for an expected one game above average. I tell Player X to take his ball and go elsewhere, I'm not interested.

If you disagree with the above -- to show this idea in starker terms -- consider that Player X brings 24 of his friends with him. They all have their own crystal balls, and they ALL show they will perform exactly one game below average for the next 15 years. You own an expansion team with no one currently under contract. They all want 15 year guaranteed contracts with no-trade clauses. Do you sign them all? They all have "value," don't they?

I know people get hung up on this 'negative value', but each game begins with both teams theoretically having a 50% chance of winning.

The problem is, theoretically, if some idiot owner had allowed Lefty Grove to pitch for his team until the day Grove died, Lefty's value would have wound up deep in the negatives. It doesn't make any sense to me that the player should be penalized for this stupidity. If anybody should be given a negative, it should be the GM and owner.

(90) Extending it to 25 players makes no sense. If you had another team of 25 players, each at 1 game above average for the next 15 pennants, you would win 106 games every year and probably 14 pennants (the '98 Yankees or '01 Mariners would steal one from you). Yet the gap between the two types is not that great. In today's market, the crystal ball guy is worth a 15-year contract for about $1m-1.5m a year, and you'll take 4 but no more of his buddies (assuming they play appropriately different positions).
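A quick check of the "25 players at +1" arithmetic in that reply, assuming a 162-game schedule:

```python
# An average team wins half its games; 25 players each worth one win
# above average push the total to 106.
games = 162
baseline_wins = games / 2           # 81 wins for a .500 team
team_wins = baseline_wins + 25 * 1  # 25 players at +1 win apiece
print(int(team_wins))  # 106
```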

Only a moron would sign them all, but that still doesn't mean that a below-average player doesn't have some value in the real world. Maybe not a lot, but not negative either.

History is filled with teams that would have loved to have one of those "-1" guys to fill a hole and push them over the top.

I'm afraid I'm torn on this whole debate. I acknowledge that a low replacement level is good in showing that the "-1" guys are better than an above-average September call-up or even John Paciorek. The lower baseline is also great for single-season debates where in-season durability is an issue (Ripken vs. Larkin, Mantle's '62 MVP). In a single season, it's logical to assume your replacement will be fairly low.

Still, the higher baseline is nicer for measuring "greatness" over a long career. This gives higher relative weight to peak seasons. Also, I agree with KJOK that as time passes, your "next-year replacement" will regress fairly quickly back to the mean. So when comparing the 10-year player to the 15-year player, there are many advantages to using a higher baseline.

Now, if I could only figure out how to combine those two. :-) Somehow rewarding the peaks of a Barry Larkin or a Willie Stargell, yet acknowledging that each spent a good chunk of nearly every season on the DL. I'll just look at both numbers for now.

The guy who is "1-game below .500" has positive real value; he represents an upgrade for at least 1/3rd of the league, probably more. Nobody is likely to want him for his whole career, but he will play for some organization until they acquire a better player, and he will then move on to another organization. He will not be out of work until he retires, and he will never be near being the worst player at his position in the league. There is no question that he has positive real value.

The HOM dilemma is when such a player has an unusual career shape: no peak combined with unusual durability. The "1-game below .500" full-time player is worth about 16+ Win Shares or 5+ WARP a year, and 20 years of that is over 330 career Win Shares or 100 career WARP, benchmarks that usually indicate HOM worthiness, at least to the career voter. If such a player manages to play 20-25 years while very rarely being above average, is he a HOMer?

Beckley's not a -1 player for 20 years, he's more of a +1 player for 20 years, which wins you the pennant if you have 25 of them. Opposition pitchers get completely dizzy and disoriented with all the triples!

I'm thinking of Charlie Hough. He'd give you 200 innings on league-average knuckleballs pretty much every year. He retired at age 46, after 25 years and almost 4000 innings. I don't think he'll get any HoM votes.

But what if he were still pitching today? And had started at age 12?

What if, in 2014, he retires at age 66, so instead of 216-216, he was 516-516? He's now the major league leader in wins (and losses), he's kept up that 95-99 ERA+ he had in his 40s, so when he retires his career number is exactly 100.

But, except for a few scattered years in the 1980s, he was never actually that good.