
Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

It’s just that I’d neglected the possibility that there was another factor besides natural ability and luck that was working in Bonds’ favor.

No, apparently it's that you don't understand probability. Or regression to the mean, apparently (or maybe that's just poor writing).

Even if we assume Bonds was on steroids and they were effective, why would this make him immune from regression to the mean?

Nor is he particularly good at math. The first 88 games are more than half the season -- presumably that was the AS break, but if you're going to apply statistical principles, you shouldn't use baseball's definition of first and second halves. Also, 2/3 of 39 is 26, and that would give Bonds 65.

In 1998, McGwire had 37 HR after 81 team games, added none before the AS break, and hit 33 in the second half: not really any difference. Bonds' second half matched McGwire's -- perhaps unlikely, but obviously not unprecedented. In 98, Sosa had 32 in the first 81 games, added none before the break, and hit 34 in the second half.

Last year, Davis had 28 in the first 81 games and so 25 in the rest. He did add 8 between game 81 and the AS break, but that points to one problem with using the AS break: last year the AS game came after game 96, about 60% of the games. So for a guy who finished with 53 HR, we'd have expected about 32 before the AS break at an even pace, and he had 36 ... not a huge effect.

And if 60% of the games happen before the AS break, we would expect the guy to hit only 2/3 as many in the second "half" even if he kept the same pace. (I assume 96 games before the AS break is unusually high.)
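Under the simplest possible assumption, that a hitter's home runs are spread evenly across team games, the before/after-break expectation in the last two paragraphs is just a proportional split. A minimal sketch (the function name and the 162-game default are illustrative, not from any tool):

```python
def constant_pace_split(total_hr, games_before_break, season_games=162):
    """Split a season HR total into (before, after) the break at an even per-game pace."""
    share_before = games_before_break / season_games
    before = total_hr * share_before
    return before, total_hr - before


# Davis last year: 53 HR, All-Star game after team game 96.
before, after = constant_pace_split(53, 96)
print(f"expected before break: {before:.1f}, after: {after:.1f}")
print(f"after/before ratio with the break after game 96: {after / before:.3f}")

# With a rounded 60/40 split, the same pace gives 0.4 / 0.6 = 2/3 as many after the break.
print(f"after/before ratio at 60/40: {0.4 / 0.6:.3f}")
```

The exact 96/162 share gives about 31.4 before the break, close to the 32 from the rounded 60%, and at a 60/40 split the second "half" comes out at 2/3 of the first without any change in pace.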

Nor is he particularly good at math. The first 88 games are more than half the season -- presumably that was the AS break, but if you're going to apply statistical principles, you shouldn't use baseball's definition of first and second halves. Also, 2/3 of 39 is 26, and that would give Bonds 65.

Are you pinching him at both ends here, snot? What'd Bonds have after 81, and can you add 2/3 to that and get 61?

39. He didn't hit any home runs between team game 81 and the All-Star break.

As to how you get to 61, it's pretty straightforward. Through 88 games, Bonds had 39 HR, or 0.443 HR/game. If his pace for the rest of the season (74 games) was 2/3 of that, it'd be 0.295 HR/G times 74 games = 22 HR (21.864 if you don't round at any point in the process). And 39 HR + 22 HR = 61 HR.
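The arithmetic above, along with the "2/3 as many" reading that produced 65, can be checked in a few lines. This is a sketch: the helper name and its 162-game default are illustrative assumptions, and it treats every remaining team game as equally productive.

```python
def project_season(hr_so_far, games_so_far, season_games=162, pace_factor=2 / 3):
    """Current HR total plus the remaining team games at pace_factor times the current rate."""
    rate = hr_so_far / games_so_far          # HR per team game so far
    remaining = season_games - games_so_far  # team games left
    return hr_so_far + rate * pace_factor * remaining


# 2/3 of the rate, break after team game 88 (74 games left): the 61.
by_rate_88 = project_season(39, 88)
# 2/3 of the rate, equal 81-game halves (Bonds also had 39 through game 81).
by_rate_81 = project_season(39, 81)
# 2/3 as many home runs, ignoring how many games are left: the 65.
by_count = 39 + 39 * 2 / 3

print(f"rate, 88/74 split: {by_rate_88:.1f}")
print(f"rate, 81/81 split: {by_rate_81:.1f}")
print(f"count:             {by_count:.1f}")
```

With equal 81-game halves the rate reading and the count reading agree at 65; the gap of about four home runs comes entirely from the break falling after game 88 instead of game 81.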

The issue was whether Bonds would break the record (70). His expected home run total, according to this guy, was 61, which would leave him 9 short of the record. Hence this guy's prediction that Bonds wouldn't break the home run record. That Bonds's expected total equaled Maris's old record is just a coincidence.

Historically, typical league-leaders only hit two-thirds as many home runs in the second half as they did in the first.

"two-thirds as many ..." 2/3 of 39 is 26. If it's 2/3 of the rate, then "2/3 as many" is not quite right and it's rather important to know whether "second half" is 81 games or 74 games. I'd also be curious how many actually hit them at 2/3 the rate vs. how often they continued to hit them at about the same rate (or a sufficient rate) but got hurt -- which would still lead you to predict he'd come up short.

It doesn't. But it does change what you think the mean is.

Possibly (we have no evidence steroids were the main culprit for the increase ... we don't really have any reliable evidence they had any effect) ... but if he hadn't noticed by 2001 that the mean had changed ...

And no matter what the mean is ... if the mean is higher, then yes, the 39 HR are not as far out in the distribution, so the regression toward the mean would be less severe ... but the statement was about the "typical HR leader" ... are we saying that Bonds' 39 was fewer standard deviations above the new mean than the "typical HR leader's" total? That seems unlikely.
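The mechanics can be sketched with the textbook shrinkage form of regression toward the mean: estimate = mean + r × (observed minus mean), where r between 0 and 1 stands in for how much of the observed gap is expected to persist. Both assumed means and r = 0.5 below are made-up values chosen only to show the direction of the effect; they are not estimates for 2001.

```python
def regress_toward_mean(observed, mean, reliability):
    """Shrinkage estimate: keep `reliability` of the gap between observed and mean."""
    return mean + reliability * (observed - mean)


# Illustrative numbers only (NOT real data): the same 39 HR half and the same
# reliability, under two hypothetical population means.
for assumed_mean in (20.0, 25.0):
    est = regress_toward_mean(39.0, assumed_mean, reliability=0.5)
    print(f"mean {assumed_mean:.0f}: estimate {est:.1f}, pulled back {39.0 - est:.1f}")
```

A higher mean means a smaller pull-back for the same 39, which is the "less severe" regression described above; whether Bonds' 39 was actually fewer standard deviations out than a typical leader's is a separate empirical question this sketch doesn't answer.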

Regardless, statistics is about probability. Therefore the statement would be that Bonds was unlikely (with a specific percentage if you have an estimator) to break the record. Even if he was "wrong" about the mean (bias), the main reason he was "wrong" was that the roll of the dice went against him. If you don't believe in the roll of the dice, you don't believe in statistics and you don't really believe in "regression to the mean."
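To put a number on "unlikely," here is a rough estimator sketch. It assumes (a big simplification, not the article's method) that Bonds' home runs over his last 74 games follow a Poisson distribution whose mean is the faded 2/3 pace from the 61-HR projection, and it ignores injuries and everything else. Breaking the record means at least 71, i.e. 32 more after the 39 he had.

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    pmf = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += pmf                # add P(X = i)
        pmf *= lam / (i + 1)      # step to P(X = i + 1)
    return 1.0 - cdf


# Assumption: HR over the last 74 games ~ Poisson with a mean set by the pace.
faded_mean = (39 / 88) * (2 / 3) * 74  # 2/3 of the first-88-game pace
same_pace_mean = (39 / 88) * 74        # no fade at all
needed = 71 - 39                       # 32 more to pass McGwire's 70

print(f"P(record | faded pace) = {poisson_tail(needed, faded_mean):.3f}")
print(f"P(record | same pace)  = {poisson_tail(needed, same_pace_mean):.3f}")
```

Under these assumptions the faded-pace probability lands in the low single digits of percent, while holding his first-half pace makes 32 more better than a coin flip. Either way the output is a probability, not a verdict, which is the point about the roll of the dice.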

My issue isn't with applying statistical principles to conclude that it was highly unlikely Bonds would get the record; it's with looking for an explanation -- and offering one without evidence. It's fine to look at big residuals to see if you can detect something missing from your model, but it's pretty much impossible to do from a single outlier ... although, as I note, both Mac and Sosa seem to have violated this principle in 98 ... and of course you should test your new model.

Now, I find 18 seasons where a player had at least 32 HR in the "first half" -- defined the way b-r defines it, by the AS break, although I'm not sure how it's defined for Ruth from 1921 to 1930. I am going to skip Frank Howard because the AS game that year (1969) came after his team's 101st game. (The A's had only 93 games by that point, so I'll keep Reggie in.)

Of these 17, 6 happened between 1998 and 2001, and that does not include Sosa's 2001. Pujols and Davis have done it since 2001, so through 2001 it was 6 of 15. Note that three guys had also done it in 94.

Of those 6, 2 ended with 70 or more and 2 with over 60. Griffey and Gonzalez faded. Of those 4 ... Bonds hit 39 in the first "half" and 34 in the second; Mac 37/33; Sosa 33/33 and 32/31. Sosa added a 29/35 in 2001 and McGwire a 28/37 in 1999.
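Taking the first/second "half" counts quoted just above at face value, a quick tally shows none of them fell to "two-thirds as many." This compares raw counts only, since game counts for these halves aren't given here, and the two Sosa seasons are labeled by their splits because their years aren't stated.

```python
# First/second "half" HR counts as quoted in the comment above.
seasons = {
    "Bonds 2001": (39, 34),
    "McGwire 1998": (37, 33),
    "Sosa (33/33)": (33, 33),
    "Sosa (32/31)": (32, 31),
    "Sosa 2001": (29, 35),
    "McGwire 1999": (28, 37),
}

# Seasons where the second half fell below 2/3 of the first-half count.
faded = [name for name, (first, second) in seasons.items() if second < first * 2 / 3]

for name, (first, second) in seasons.items():
    print(f"{name:<14} {first:>2}/{second:<2}  second/first {second / first:.2f}")
print(f"fell below 2/3 as many: {len(faded)} of {len(seasons)}")
```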

There have been 17 seasons with a second "half" of 30 or more HR, 7 of those between 1998 and 2001 -- three Sosa, 2 Mac, one Bonds, one Belle. ARod, Howard and Bautista have done it since 2001.

Obviously, drawing conclusions from sample sizes of 6 or 13 or 18 or 35 is discouraged. But the basic model doesn't seem to have been working very well from 98 to 01. Unless you want to pretend that "HR first-half leader" is a different population than "big first-half HR totals," the basic model was wrong about 6 times over 4 seasons, 4 times in 98-99.

It's not clear the model was working in 94 either. Williams had 33 HR through 89 games, then another 10 in 26 games, the same pace. Griffey and Thomas were fading at about the "expected" rate. We'd still expect Williams to miss 62 (he was only on pace for 60.5 as it was), but he was a good bet not to fade to 2/3 of his first-half rate (to match that faded projection of about 51, he needed only 8 more in the remaining 47 games).
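The Williams figures can be reproduced directly. This sketch assumes a 162-game schedule and applies the 2/3 fade to his first-half rate over the 73 games after game 89, which is one reading of the comparison in the paragraph above.

```python
first_hr, first_games = 33, 89   # through the break
final_hr, final_games = 43, 115  # when the strike ended the season
season_games = 162               # assumed full schedule

# Full-season pace implied by 43 HR in 115 games.
on_pace = final_hr / final_games * season_games
# 2/3-fade projection: first-half total plus 2/3 of that rate for the other 73 games.
faded_total = first_hr + (first_hr / first_games) * (2 / 3) * (season_games - first_games)
still_needed = faded_total - final_hr   # HR left to reach the faded projection
games_left = season_games - final_games

print(f"full-season pace: {on_pace:.1f}")
print(f"2/3-fade projection: {faded_total:.1f}, i.e. {still_needed:.0f} more in {games_left} games")
```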

In 95, the season started late ... Sosa had 15 HR in 69 games at the break and 21 in 75 games after it. In 96 he had 27 in 87 games (I think he was the leader but don't know for sure ... a damn fine total regardless), then hit 13 in 37 (a higher pace) before a HBP broke his hand. In 1996, McGwire missed the first 18 games of the season but still had 28 HR in 69 team games at the break. He had 24 in 74 games after that, a slower pace but about 80% of his first-half rate. Still, he got into only 130 games, with 548 PA and 52 HR ... all he needed was a full season. In 1997, he had 31 HR in 89 team games and then 27 in 75 team games (he picked up 2 team games in the trade), a slightly higher pace. Griffey was bombing them out in 97 too, with 30 in 87 games followed by 26 in 75, the same pace. Sosa led the league in 2000 with "just" 50 ... he hit 23 in 86 games followed by 27 in 76.
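Because the paragraph above gives team games for each half, these seasons can be compared on pace (HR per team game) rather than raw counts. A small sketch using exactly those figures, with the article's "typical leader" 2/3 fade as the threshold:

```python
# (HR, team games) before and after the break, as quoted in the paragraph above.
splits = {
    "Sosa 1995": ((15, 69), (21, 75)),
    "Sosa 1996": ((27, 87), (13, 37)),
    "McGwire 1996": ((28, 69), (24, 74)),
    "McGwire 1997": ((31, 89), (27, 75)),
    "Griffey 1997": ((30, 87), (26, 75)),
    "Sosa 2000": ((23, 86), (27, 76)),
}


def pace_ratio(first, second):
    """Second-half HR per team game divided by first-half HR per team game."""
    (hr1, g1), (hr2, g2) = first, second
    return (hr2 / g2) / (hr1 / g1)


for name, (first, second) in splits.items():
    ratio = pace_ratio(first, second)
    verdict = "faded to 2/3 or worse" if ratio <= 2 / 3 else "held above 2/3"
    print(f"{name:<13} pace ratio {ratio:.2f}  {verdict}")
```

Only McGwire's 1996 comes in below 1.0, at roughly 0.80, matching the "about 80%" above; none of the six reaches the 2/3 fade.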

So we can add Mac 97, Griffey 97, Sosa 2000 and possibly Mac 96, Sosa 96 and Sosa 95 to the list of "wrongs." That could be due to a radically shifted mean, one where a "typical" big HR hitter would hit one every three games or so, but regardless, there was no reason not to notice by 2001 that the "typical" adjustment wasn't working.

It's possible that applying the "typical leader" rate to atypical HR hitters is a mistake.

(Injuries, etc. certainly count towards prediction ... i.e. p(injury) is sort of automatically adjusted for by focusing on season totals rather than HR/PA or HR/games played rates and such ... which is fine.)

Note: the split finder doesn't seem to let you save a set of players and then check stats only for them ... or at least I'm not sharp enough to do it. I'm too lazy to check all 17, but obviously most of the rest, other than Ruth, faded. I also don't know how to get p-i to give me AS-break (or 81-game) HR leaders.

Therefore the statement would be that Bonds was unlikely (with a specific percentage if you have an estimator) to break the record.

Oh yeah, absolutely. What I wrote (if this isn't clear, I'm the author of the linked article) was meant to say that Bonds was unlikely to break the record, not that it was in any way IMPOSSIBLE for Bonds to break the record. Here's what I wrote later about this in Slate:

"Many people have written me about my assertion in July that "Barry Bonds isn't going to hit 72 home runs," and asked what went wrong with my analysis. Answer: Nothing. In July, it was extremely unlikely that Bonds would break the home run record. One great thing about baseball is that players sometimes accomplish the unlikely. (Ask Tony Womack.) If you bet a hundred bucks at the All-Star Break that Bonds would hit 73 home runs, you made a dumb bet. Now you've got a hundred bucks; it was still a dumb bet."