“That doesn’t mean that I am going to change the same tone I have been using for 30 years. What the hell is the difference? If you guys can’t separate tone from substance, that is your problem not mine. Stop being such whiners about tone.” –MGL

Monthly Archives: November 2011

Yesterday, Brandon Warne took a trip down memory lane, reminiscing on his favorite deaths in baseball history. He couldn’t quite remember them all. But the commenters at FanGraphs were helpful, as always.

To thwart those hoping to call me a hypocrite, I’ll cut straight to the chase. Matt Lentzner, in his most recent Baseball ProGUESTus column, wastes several hundred words and a whole article to express what I can do in 26: Sample size is important; please include it. And wouldn’t it be cool to have the sample size this stat stabilizes at (according to Pizza Cutter), too?

Wait, I know I can do better. sample size rulez. context iz cool 2. That’s seven words and only 38 characters, leaving plenty of room for sweet hashtags, #imisspizzacutter. If this is your topic for a column on likely the biggest stage in sabermetrics, just sit on it and try for something more interesting and original.

As always, the BP commenters were right on point. skyojohnny chimed in first:

This is one of the most important articles I have ever read in BP and I bet that it will also be one of the most overlooked.

#smh

By the way, it’s bad enough BP couldn’t come up with anything more clever than “Proguestus” for their guest writers’ column. That has to be literally the first idea they came up with. To spend time brainstorming names and ending up with something so banal would make me sad. But to capitalize the “guest” is insulting. Because I’m so dumb, I wouldn’t get it otherwise. Thanks, guys.

Apparently, Major League Baseball had their postseason last month. I was travelling in Treasure Island, Ontario for work and without internet access, so I missed it. A lot of SABR drama flared up while I was gone–too much for me to properly deal with here, unfortunately. The best I can do is publish this compilation of MGL’s best comments from October, 2011, but without any of the snappy backtalk you’ve come to expect from Praiseball Bospectus.

They say that legends are born in October, but only one man can reign over it. MGL easily surpassed 40,000 words (or enough for a short novel) written in comment threads last month. And I’m sure there are even more great MGLian comments not included here that I missed (especially if not posted on The Book Blog). I’d like to add them if you’d be kind enough to post a link in the comments. So, please help me out.

If I could give a manager a piece of paper with the answer to all of these decisions, I would be correct 90%+ of the time and a lot more than the manager would. A lot. I would miss some of the intangibles for sure, but those would pale in comparison to the “numbers” behind my decisions. I would add at least one win a team’s WE, thus I should be paid 5 mil or more…

#5 and #6, and because you think it, that makes it right? You want to accept that bet also?

I’ll bet you won’t accept that bet. That is because when people who have little expertise on a matter have an opinion on that matter and those opinions are not supported by evidence, they never take those bets. I wonder why?

You see, anything that I posture on this blog, I will always stand by it, because it is almost always based on evidence or my experience, knowledge, or expertise which has been gleaned by evidence. I learned a long time ago that my opinions without a solid base aren’t worth jack…

So, in the original thread, did I say anything egregious or is the entire post egregious enough to warrant the vitriol on BBTF and even here?

I made the point about Punto, which I think is correct, at least according to my sim, and a 1.5% WE is pretty big. And I have not heard anyone refute that with evidence other than stupid batter/pitcher matchups which we have already discussed (and hopefully put to rest), or the proverbial, “The manager must know something that you don’t know.”

I made the point about the two bunts, which I think are correct, and, again, I have not heard any refutation with evidence. A few people excoriated me for saying that Carp would have been safe anyway, which is debatable (or not) but completely irrelevant to my argument. I simply said that with him running, the bunt win expectancy is going to be very poor compared to the WE from batting. That is because a bunt is always marginal. Throw in a poor/slow runner on the bases such that he is going to get forced a significant percentage of the time, and the bunt is not likely to be correct, even against a very good pitcher. Any arguments there?

I said that bunting with a 2-0 count, when the bunt at the outset of the PA was probably bad, is an egregious error, and I am pretty confident it is, and I have not heard any refutations on that either. Problems with that?

I said that not pinch hitting for Carp in the 8th was bad, but I did not harp on that. I still think the numbers will show that was bad. I admit that probably no manager would have done that – although that does not make it correct. I am strictly speaking of mathematically correct things, and not what would make the manager look good or bad. That is not my job to determine that.

And finally (the 4th or 5th thing I criticized), I said that bringing in Motte was correct. It looks like that may be a tossup, but with Carp not being a top tier starter (according to my projections and others – see ZIPS, Oliver, Pecota, Steamer, etc.) and with Motte being a very good closer, I think that bringing in Motte IS the correct choice, but perhaps marginally so. Again, whether a manager “should” do that is not my business. I am talking strictly numbers.

So why the universal hate, condemnation, criticism, mockery, etc.?

Someone please explain what I did to deserve that? And I am talking about substance and not tone. If someone wants to criticize me for my tone, so be it. I don’t give a hoot about that. Those are all ad hominem arguments anyway for people who sadly have nothing substantive to contribute…

It should be fairly easy to figure out what is happening in the 9th, but I can’t do that now.

McCoy, you can do all the speculation and “thinking” you want, and you might be right about conclusions, but without any numbers, they are meaningless. Sorry. Either one strategy or the other yields a greater win expectancy, or it is close in which case I have no problem yielding to the gut, experience, instinct, etc. But, unfortunately, you can’t figure out the answer without “running the numbers.”

What I find especially outrageous, almost scandalous, is that someone could actually write, as in #72, that I said, “TLR is a terrible manager because he let Carpenter pitch the 9th,” when the title of the post was hyperbolic, as titles or headlines often are, and that the last example of several I gave of Tony’s mistakes was leaving Carp in to pitch the 9th. I suppose a loose characterization of my post is, “TLR is a terrible manager because (insert anything I happened to mention in the post),” but in my opinion that is classic spin, mischaracterization, taking words out of context, etc., in order to launch an ad hominem attack and deflect attention from the issues at hand and is otherwise totally uncalled for.

Circle, your “English” explanation is useless. Do you really think that a manager’s “experience, intuition, and expertise” can figure out the right answer? If you do, then you are dreaming. I mean managers make ridiculous, silly mistakes all the time, believing in erroneous things like the hot hand, and batter/pitcher matchups, and other small sample nonsense. Do you think that they magically become genius savants when it comes time to figure out when to take out their starter and when to bring in a reliever?

I certainly agree that if it is a tossup, you probably want your starter in order to save your reliever for a possible extra inning game or for tomorrow.

But, if your starting pitcher comes to bat late in a close game, and it is not an obvious bunt situation (with no outs), and especially with runners on base or leading off an inning, since the pitching aspect alone is a tossup (presumably), then it is a no-brainer as far as pinch hitting is concerned since there is always a large difference in WE and RE between a pitcher hitting and a pinch hitter hitting in a high leverage situation.

All the nonsense about, “The manager believes that the starter can shut them down, and the other team is not making good contact, and the team is ahead in the game anyway,” is just that – nonsense. All those “English” explanations will not help to facilitate the right answers in any way, shape or form.

Anyway, I have not read Max’s article yet, but I agree that there needs to be a lot more investigation and controlling of all the variables before we declare it a tossup…

As I said, I love to be proven wrong, since that gives me an opportunity to learn something. However, after all the bashing I incurred at BBTF, I think it only fair for them to see this new research. Not that I wouldn’t get bashed again. After all, it is BBTF.

As well, I think I learned a lot from this research, even though in the end, I think I was vindicated, which is kind of a silly notion anyway. As I have always said, and Guy put it aptly, my opinions are almost always informed. Sometimes they are specifically supported by evidence and sometimes they are not. They are never, however, out of my a**, as most “lay” opinions on sports are. After all, I am an expert in the field of sports analysis. You would think I was a lay journalist opining on sabermetrics, like Jayson Stark, Buster Olney, or Murray Chass, if you read the comments on BBTF.

I also encourage other people to do similar research. For example, why is it that wOBA is so much higher when the game is close? I’m sure we can speculate, but without looking at the components and perhaps even the pitch f/x data, I don’t think we can be too sure of anything in that regard.

I would also like to see how pitching with runners on base comes into play. The starters obviously always started the 9th, but the reliever data I looked at was anytime during the inning. It could be when they started the 9th or when they came in in the middle of the inning, often with runners on base.

As well, although I didn’t adjust for platoon issues, the relievers definitely faced more same-handed batters, especially when the pitching team was losing, suggesting that they were brought in specifically to face same-handed batters at some point in the inning. This needs to be looked at too.

So I don’t think that the story is over, although I think we found a very significant factor that was affecting the data in the prior research…

DavidS, because of the small samples in each category for the starters, it is almost inevitable that there will NOT be a smooth transition and pattern, which is one reason why I broke it down into only 2 categories at the end.

#5, is this the Don Malcomb of Big Bad Baseball? My, you’ve come a long way down. Do you enjoy attacking me for little reason? Is it professional jealousy?

As always, if you don’t have anything substantive, intelligent, or otherwise valuable to say in my house, I’d prefer you stay out…

5 out of 9 continuing to mock me while adding nothing to the discussion.

And one with this:

“I don’t need to see any research to show me that good starting pitchers who have pitched strongly and efficiently through the first eight innings should be kept in to try and complete the game. I suspect that almost every serious baseball fan in America knows this instinctively.”

And if anyone makes any reference to the outcome or result of a certain strategy in terms of evaluating or even mentioning it (as a mistake), you are going to be banned for life!

You have no idea how many people in BBTF said something like, “You are an idiot for suggesting that Carpenter should not have batted in the 8th or pitched in the 9th,” because he got a hit and retired the side in the 9th. Of course calling me an idiot on BBTF is nothing new…

I was curious about Jay batting second. Andrus batted second all season long. Surely not a good choice but at least it was not an example of a manager “doing something different” for a bad reason, which happens all the time, I guess to show that they actually have a difficult job (in terms of lineups and in-game managing), which they don’t, in my opinion. I think I can train a 12 year old to manage a baseball game. Oh, wait, I forgot about the “double switch” in the NL. Too tricky for a 12 year old…

Some of you guys are going to be surprised at actually how many mistakes a manager can make in a game or series, especially in the post-season when managers think they have to do “something.” I have been mentally noting these mistakes for 25 years. Maybe that is why I am so ornery…

I guess you have not read all the work I did in the other thread! You still think that you can tell whether a pitcher is “pitching well” enough to continue. Managers can’t do that. You should be a manager!

Right, hitters occasionally do that to fool the umpire, but in this case there was no way that was acting.

“Pujols has been IBB’d 4 times in the playoffs. All 4 times the inning has ended with Matt Holliday and no runs scored.

Regardless of the good/bad decision aspect to it, it will likely continue until Holliday makes them pay.”

That is true. That is one reason that managers make so many bad decisions…

Circle, you’re everything that a manager is, which is not necessarily a bad thing. SOME of your insight, knowledge, and experience is valuable on this blog.

However, a big part of sabermetrics, at least a vestige of it, is to show all the things that managers (and most people in general) believe that are simply not true. Rather than keep digging your heels in, learn something. You must be on this site for a reason other than to keep telling us that conventional wisdom is right and we are wrong. Or, more along the lines of your tone, “Yeah, you guys might be right, but…”

Your “buts” HAVE TO HAVE EVIDENCE for anyone to take them seriously. If you don’t have the expertise to do the research, then cite research from someone who does. If not, and I mean this teasingly, “Shuddupp!”

You say, “If it were me, I would pitch him at home. The splits are so large they must mean something.”

How about, “I think the world is flat and that we, as human beings, were spawned by aliens who landed here a long time ago. And we never landed on the moon and 9/11 was a conspiracy by the U.S. government?”

These are opinions same as yours. Evidence? Nah. Don’t pay them any mind and I won’t pay yours. Deal?

Tango, #6, I never thought it was that difficult to plot a pitch by eye, despite what some of the pitch f/x guys say. Especially if you watch a lot of games and you mentally adjust for the slightly offset camera angle (some broadcasts are more centered than others)…

I thought Moreland was traded to the Cardinals before the game started?

I didn’t really watch Pujols on that fly ball, but of course he should be running it out in the freaking WS! He is one of those guys that almost never runs out balls that are likely to be outs. I don’t care how good he is or how much money he makes. If I am the manager, he either runs them out or doesn’t play. If nothing else, it is a poor model for the younger guys.

So why was Napoli batting so low early in the season? He was an excellent hitter going into the season…

Westbrook is a terrible starter, but as a reliever, maybe he is 1 run better. Only TLR knows that. As you said, Phil, you gotta choose the guy who has the most K, you don’t mind walks. You want a guy who misses bats.

You bring in a high K/BB guy, and if Young walks, you bring in Westbrook, the sinker-baller.

Ah, I should have been a manager or pitching coach!

Remember I said, There would probably be a mistake before the game ends. There were maybe 5 mistakes.

Circle, that throw was terrible. Probably 15 feet off line. I don’t even think that Pujols touched the ball…

Not thin air. Nothing I say is out of thin air. You can disagree of course. But everything I say is based on 25 years of sabermetric research, mine and others’.

I have Feldman projected as a 4.32 starter (average starter is around 4.08), which is actually not that bad. I take back what I said about him being “terrible”, although I do think he is worse than that as a starter based on what I have seen of him (but I don’t intend anyone to take that seriously) and ZIPS, Oliver, Steamer, and Pecota have him as around a 4.62 which is very poor. So the consensus is probably that he is a mediocre starter at best.

As a reliever, we usually just subtract around 1 run per 9, although I subtract .82 (from the research I have done). And my projection is based on his starts and relief appearances, with each one adjusted.
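The arithmetic in that last comment can be sketched in a few lines of Python. This is only an illustration, assuming the starter-to-reliever adjustment is a flat subtraction in runs per nine innings. The function name is made up here, and applying it straight to the 4.32 figure is a simplification, since MGL says his projection already blends adjusted starts and relief appearances.

```python
def as_reliever(starter_runs_per_9, adjustment=0.82):
    """Estimate runs per 9 in relief by subtracting a flat adjustment.

    The 0.82 default is the figure MGL quotes from his own research;
    the flat-subtraction model itself is an assumption for illustration.
    """
    return starter_runs_per_9 - adjustment


feldman_as_starter = 4.32  # MGL's projection quoted above

# MGL's 0.82 adjustment versus the usual "around 1 run" rule of thumb
print(round(as_reliever(feldman_as_starter), 2))
print(round(as_reliever(feldman_as_starter, adjustment=1.0), 2))
```

Under those assumptions the two adjustments differ by only 0.18 runs per nine, so the choice between them matters less than the starter projection you begin with.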

Thanks! With that attitude, you would never be allowed on BTTF (Bash The “The Book” Fan).

BTW, swapping Napoli with Young is more advantageous than swapping Andrus with Beltre. The former generates an extra 7 runs per season, while the latter only adds .5 run per season. So maybe having Andrus at #2 is not all that bad.

If we just switch Andrus with Napoli though (G-d forbid we bat a slow power hitter in the 2 hole), and kill 2 birds with one small stone, we get 12 extra runs a season or 1.6 wins a year…

I’m confused. We (saberists) get criticized all the time for not taking into consideration (including in our models/formulas) things like, “Hammy is hurt and may not be able to turn or catch up to Motte’s fastball.”

Yet, LaRussa, arguably one of the best at seeing and utilizing things like that, takes Motte out.

Wow. Without actually applying some NUMBERS to each of those options, it is impossible to know which one is correct (yields the highest WP for the Cards). Each of us can have an “opinion” on which one is correct, but without NUMBERS, I am afraid that opinion isn’t worth much.

I’ve been doing these kinds of analyses for 25 years and I have no idea which one is correct. I suspect that D might be, but you/I would have to figure out how much the bad “D” in the field costs in WP as well as removing another player (decent chance for a tie game and extra innings). It is really complicated to figure all this out, but it can be done (approximated at least, so we have SOME idea as to which option might have been correct).

I am completely agnostic as far as walking the bases loaded. Normally you never do that with 0 outs (other than perhaps in the bottom of the 9th in a tie game), but here, I don’t know. I don’t think (no, I KNOW) that Dave or anyone else knows without “running the numbers.”

I’m also not sure why Dave is obsessed with the Cards trying to make Hamilton hit the ball on the ground. I have no idea whether that would be better or worse than a fly ball. Lots of fly balls don’t score the runner at third (short ones of course) and lots and lots of fly balls don’t move the runner to third. Same thing with ground balls. Some are base hits, some move both runners, etc. I don’t remember if the IF was playing in or not (probably not), but even if it were, I’m STILL not sure whether a fly ball or ground ball would be better. IOW, I’m not sure whether I want a GB or FB pitcher to pitch, everything else being equal.

Also, I am very uncomfortable when an analyst gets to choose which sample he wants to present to support his point or his opinion. This year only? Last 2 years? 3 years? Career? Lately, as in last half season? You should not be allowed to do that, for obvious reasons (cherry picking your evidence makes your arguments intellectually dishonest, or misleading at best).

For example, Dave said this:

“While Hamilton’s strikeout rate against LHPs jumps to 22.1%, Rhodes K% against LHBs this year was just 16.1%. His career numbers are much better, but he’s not the same pitcher he was a few years ago, and Hamilton had hit an outfield fly against him the night before.”

Yes, he is not the same pitcher, but if this year his K% was higher than his career numbers, Dave would obviously be quoting his career numbers (heck, I would too if I had the choice!). The analyst should NOT have the choice. He should always be quoting a projection which is some kind of weighted career average!

And for the last part of that last sentence, about Hammy hitting a fly ball the night before, David should get immediately thrown into the MGL jail. I can’t believe he even said that in that context. Shame on you Dave!

I think there is about a zero chance of Hamilton squeezing just because Motte is playing third. That should not even be in the analysis unless you want to use it as a tie breaker in a dead heat.

Finally, Dave didn’t even mention one of the most important parts of the puzzle, and one that made LaRussa’s decision likely awful. Lynn is a terrible pitcher! I don’t care how he has done lately (has it been good?). I and other projection experts have him as near replacement level! If you know that he is going to bring in Lynn, as opposed to say, Dotel, then it is a no-brainer not taking out Motte (or putting him in the field).

So while David definitely brought up most of the relevant facts in order to determine which option was best, I don’t think that any of us is any closer to the answer….

Have you ever known anyone in “real” life who does a lot of unconventional things, even when some of them are incorrect, just because they think they are smarter than everyone else and in order to prove that, they have to do things differently? I do.

That is LaRussa in a nutshell. I’ve said this for many years. When I worked for the Cards and I met him for the first time, he basically laughed me out of the room (he has no use for sabermetrics or sabermetricians – none at all)…

People still don’t get it. If you have little or no predictive value in the population, as we found in the book, then, no matter what you THINK you know, and no matter what SEEMS to make sense, the sample size barely matters.

We can talk about this until we are blue in the face, and we can explain it in detail in The Book, but we still get this:

Oh yeah, I understand, but:

“The match-up numbers became large enough to tell something.”

Tango, I don’t like that you undersell the lack of predictive value we found with batter/pitcher matchups. I mean, you even went so far as to try and increase the size of the samples by using “pitcher families” (OK, not quite the same thing as facing one particular pitcher), and STILL we found nothing.

Can we please put this to rest? No number of PA gives us any more than de minimis value (at most – it still might be NOTHING – given that we found nothing). Use it as a tie breaker if you want – I don’t care. But please don’t use it otherwise as a decision maker no matter how many PA it is based upon (obviously you can’t have hundreds of PA and even then, there is ZERO evidence that that would have predictive value).

Guys: If 50 or 100 PA have any practical predictive value, then we would find it in 20 or 30 PA. We didn’t. We found nothing. Nothing. N-O-T-H-I-N-G.

So, please, pretty please, stop telling us about Earl Weaver and your grandmother who are so smart that they wait for 20 or 30 PA. We looked at lots of guys who had 20 or 30 PA and found zilch. Nada.

We tell you (in The Book) how much to regress clutch (a lot) given a certain sample size. We tell you how much to regress pitcher BABIP. We tell you how much to regress windup/stretch splits. We tell you how much to regress RHB platoon splits. All of these are a lot even with a fairly large sample size.

We didn’t tell you how much to regress batter/pitcher matchups. Do you know why? Because we found NO predictive value, i.e., no “skill”, i.e., all the various unusual results you see are likely due to random fluctuation, at ANY sample size.

And again, for the 10 thousandth time, you cannot use the argument, “You found nothing at 30 PA, but at 50 PA, there IS something.” If you find nothing at 30 PA, as long as your sample of players is reasonably large, then there is nothing or almost nothing at 50 PA or 100 PA.
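One way to see why “nothing at 30 PA” implies “almost nothing at 100 PA” is a standard shrinkage model, sketched below in Python. This is an assumption made for illustration, not the analysis from The Book: it supposes the weight you can put on an observed sample is n / (n + k) for some constant k, and the 1% weight at 30 PA is a made-up stand-in for “nothing.” The function names are hypothetical.

```python
def implied_k(n, reliability):
    """Regression constant k implied by a given weight on the sample at n PA,
    under the assumed model reliability = n / (n + k)."""
    return n * (1 - reliability) / reliability


def reliability_at(n, k):
    """Weight on the observed sample at n PA for regression constant k."""
    return n / (n + k)


# Hypothetical: suppose the matchup sample deserves only 1% weight at 30 PA.
k = implied_k(30, 0.01)

for pa in (30, 50, 100):
    print(pa, round(reliability_at(pa, k), 3))
```

Under this model the weight grows slowly with sample size, so a weight near zero at 30 PA is still near zero at 50 or 100 PA, which matches the “tie breaker at most” conclusion.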

Do we know all of this for a fact – i.e. with 100% certainty? No! We know virtually nothing for a fact when it is based on inferences derived from sample data, which almost all sabermetric tenets are…

Bill, I appreciate the input. I also happened to have played high stakes poker for many years – quite successfully as you might imagine. So I kind of know what I am talking about when it comes to poker! ;-)

I have not RTFA yet, but I love the idea of estimating true talent from observed data for anything, especially pitch type and location data.

I have thought about this for a long time. It is especially important for advanced scouting and I think that many teams ignore this fact. For example, if two batters are observed to have exactly the same hot and cold patterns, and they are quite far from league average, but one has 400 PA and the other has 4,000, you would pitch each one quite differently (the one with 400, you would pitch him much closer to that of the league average player). I’m not sure teams do that. Same for defending a batter based on spray charts. Those spray charts have to be regressed!
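The 400 PA versus 4,000 PA point can be sketched with a simple regression-toward-the-mean estimate in Python. Everything numeric here is hypothetical: the league rate, the observed rate, and especially the regression constant K are invented for illustration, and the n / (n + K) weighting is a common shrinkage form assumed here rather than anything MGL specifies.

```python
def regress(observed, league_avg, n, k):
    """Shrink an observed rate toward the league average.

    n is the sample size in PA; k is a regression constant
    (hypothetical here) controlling how quickly the sample earns trust.
    """
    weight = n / (n + k)
    return weight * observed + (1 - weight) * league_avg


LEAGUE = 0.330    # hypothetical league-average rate in some pitch zone
OBSERVED = 0.420  # hypothetical "hot zone" rate, same for both batters
K = 1200          # hypothetical regression constant

for pa in (400, 4000):
    print(pa, round(regress(OBSERVED, LEAGUE, pa, K), 3))
```

With identical observed patterns, the 400 PA batter’s estimate lands much closer to league average than the 4,000 PA batter’s, which is why the two would be pitched (or defended) differently.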

So basically what we have already said a million times before: “Use it as a tie-breaker and nothing else.”

Yes, I would like to see how much to regress the batter-pitcher results toward the expected results (for a given number of PA of course) or how much to regress the expected results toward the batter-pitcher sample.

Colin, one thing: I assume (again, I have not RTFA yet) you did not control for GB/FB platoon. If you do that, I suspect that the entire predictive value might disappear. Can you perhaps tell us in the extreme cases on both ends what the average GB/FB ratio is for the pitchers and batters?