Thursday, September 1, 2016

Hitter-Controlled/Pitcher-Controlled and Component Top 5's/Bottom 5's

While I was making renovations on the FanDuel spreadsheet, I was researching something - I have no idea what, now - and I stumbled upon sabr.org's archives of The Baseball Analyst, including the introduction to the October 1987 issue, written by Bill James.

In it, James observed that the spread of homeruns hit by hitters was greater than the spread of homeruns allowed by pitchers. The worst homerun hitter (in 1986) hit 0 homeruns per 1,000 plate appearances, while the most prolific hit 59.6 per 1,000 - a range of zero to 59.6. Pitchers, meanwhile, allowed between 10.5 (the stingiest) and 44.4 (the most homer-prone) per 1,000 batters faced. In other words, pitchers allowed "a substantially tighter shot pattern", or a smaller standard deviation, of homeruns per plate appearance than hitters hit.
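
To make those ranges concrete, here is a minimal Python sketch of the per-1,000 rate and the width of each range. The function names are my own, and the 30 HR in 600 PA line is a made-up example, not a real player.

```python
def per_1000(events, opportunities):
    """Rate of an event per 1,000 opportunities."""
    return 1000 * events / opportunities


def spread(lowest, highest):
    """Width of the range between the lowest and highest rate."""
    return highest - lowest


# Hypothetical stat line: 30 homeruns in 600 plate appearances.
example_rate = per_1000(30, 600)

# The 1986 extremes quoted above, in homeruns per 1,000 PA (or BFP).
hitter_spread = spread(0.0, 59.6)
pitcher_spread = spread(10.5, 44.4)

print(f"example rate: {example_rate:.1f} per 1,000")
print(f"hitters: {hitter_spread:.1f}, pitchers: {pitcher_spread:.1f}")
```

The hitters' range works out to 59.6 and the pitchers' to 33.9, which is the "tighter shot pattern" in raw numbers. A range only looks at the two extremes, which is why James moves on to standard deviation.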

James continued:

"Calculating this for each pitcher and each hitter, you would find that the standard deviation of home runs/1000 plate appearances was something for hitters and something less for pitchers--let's say, at a guess, 6.0 for hitters and 4.0 for pitchers. If the shot pattern (or the standard deviation) was exactly the same for both pitchers and hitters, then one would have to say that the occurrence was 50% controlled by the hitter, and 50% controlled by the pitcher. In this situation, we might say that the home run was 60% controlled by the hitter, and 40% controlled by the pitcher.

"I have thought of doing this for years, but one problem was always getting stable data."

So I thought I would test this with my eight binary components, and fortunately, nearly 30 years later, "getting stable data" is no longer a problem. (Thank you, internet.) But as Bill James pointed out, using single-season data is problematic, because

"...a great percentage of the spread of occurrence over the course of a season is actually random
variance....If you study spreads of occurrence over the course of a single season, then, you're going to be comparing a pitcher with 1100 BFP against a hitter with 600 PA, and you're going to get very different variance in the two samples simply because one of them is much larger than the other."
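
A rough way to see the sample-size problem is to treat each plate appearance as an independent coin flip with a fixed chance of a homerun (a binomial model, which is my simplifying assumption, not something James spells out). The 25-per-1,000 "true" rate below is a made-up illustrative number.

```python
import math


def random_sd_per_1000(true_rate_per_1000, opportunities):
    """Standard deviation of an observed per-1,000 rate from chance alone.

    Assumes a binomial model: every opportunity is independent and has
    the same probability of producing the event.
    """
    p = true_rate_per_1000 / 1000
    return 1000 * math.sqrt(p * (1 - p) / opportunities)


# Same hypothetical true rate, different sample sizes from the quote.
hitter_noise = random_sd_per_1000(25, 600)    # hitter with 600 PA
pitcher_noise = random_sd_per_1000(25, 1100)  # pitcher with 1100 BFP

print(f"hitter noise:  {hitter_noise:.2f} per 1,000")
print(f"pitcher noise: {pitcher_noise:.2f} per 1,000")
```

Under those assumptions the noise alone is about 6.4 per 1,000 for the hitter and about 4.7 for the pitcher, even though the two have identical underlying ability. Part of the gap in single-season spreads would come from nothing but the difference in opportunities.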

To counteract this, I'll look at statistics for the last ten years - 2006 to 2015. For each component, I'll look at the top 200 batters and top 200 pitchers, ranked in order of opportunities. But "opportunities" means something different depending on the component. For the first component, $BB, opportunities are simply PA (or BF). For the next one, $SO, opportunities are PA/BF minus walks and HBP (or, plate appearances that didn't result in a walk or a hit batter). $SB opportunities are SB + CS (stolen base attempts).
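
Here is a minimal Python sketch of that selection step, covering only the three opportunity definitions given above. The dictionary keys, the function names, and the two sample players are all hypothetical; for pitchers, the "PA" key would hold batters faced.

```python
def opportunities(component, line):
    """Opportunities for one component, given a stat line as a dict.

    Only the three components defined in the text are covered.
    """
    if component == "$BB":
        return line["PA"]
    if component == "$SO":
        # Plate appearances that didn't end in a walk or a hit batter.
        return line["PA"] - line["BB"] - line["HBP"]
    if component == "$SB":
        # Stolen base attempts.
        return line["SB"] + line["CS"]
    raise ValueError(f"no opportunity definition for {component}")


def top_by_opportunities(players, component, n=200):
    """The n players with the most opportunities for a component."""
    ranked = sorted(
        players,
        key=lambda line: opportunities(component, line),
        reverse=True,
    )
    return ranked[:n]


# Made-up ten-year totals for two imaginary batters.
sample = [
    {"name": "Slugger", "PA": 6000, "BB": 600, "HBP": 40, "SB": 4, "CS": 6},
    {"name": "Speedster", "PA": 3000, "BB": 200, "HBP": 10, "SB": 150, "CS": 40},
]

print([p["name"] for p in top_by_opportunities(sample, "$BB", n=1)])
print([p["name"] for p in top_by_opportunities(sample, "$SB", n=1)])
```

Because the ranking is redone for every component, a player can lead one group and miss another entirely: the imaginary slugger tops the $BB list but drops out of $SB.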

This should give us a stable data set for each component for hitters and for pitchers. For example, Adrian Gonzalez had the most plate appearances in baseball between 2006 and 2015, but since he attempted only ten stolen bases in that time (tied for 601st-most), he should not be included in the data set for $SB. Also, using floating groups of 200 players should even out the spread of opportunities for hitters and pitchers.

The "Opp. Spread" shows the range of opportunities for each data set: the number of opportunities by the batter or pitcher with the 200th most opportunities, followed by the number of opportunities by the batter or pitcher with the most. (Remember, opportunities = PA or BF for $BB.) As you can see, the spread for each component is somewhat similar for batters and pitchers, but wider at each end for pitchers (until you get to $SB).

As it turns out, Bill James' "guess" of a 60/40 split is almost dead-on for $BB, $SO, $H, and $XBH. Ironically, the example he used, homeruns, is far more lopsided - more than 3-to-1 in favor of the hitter (although $HR does remove walks and strikeouts from the equation, whereas Bill just used raw HR per PA). $HR is the most hitter-determined component, while $SB is the only component more pitcher-controlled than hitter-controlled. Once a runner takes off, his success depends even more on the pitcher's ability to prevent base-stealing than on his own skill at stealing bases.

So all of this might be useful...if I were planning on creating my own batter and pitcher projections and regressing them to the mean. Luckily, since I'm still partially sane, I'm perfectly happy to trust the good folks at Steamer and the projections they've created.

However, this study did show which batters and pitchers of the last ten years have been the best and worst at each component. I found it interesting, so I thought I'd share.

Ranking players by component allows us to see clearly a batter's strengths. In Joey Votto's case, it mostly confirms what we already knew. His greatness as a hitter comes from totally dominating two components - $BB (3rd) and $H (1st). That, along with very good power (28th in $HR), makes up for his below-average contact (146th in $SO).