Experiment: SCOUT+ Batting Leaderboards
For the past year-plus, I’ve frequently published in these pages what I’ve called the “SCOUT leaderboards” for winter leagues and (recently) spring training. I’m quoting myself when I write that “SCOUT represents an attempt to derive something meaningful from small samples” and is the average of a player’s standard deviations from the league mean (that is, his z-scores) either in regressed strikeout and walk rate (for pitchers) or regressed home-run rate, walk rate, and strikeout rate (for hitters). SCOUT builds off of work done by Pizza Cutter on when samples for different stats become reliable. By taking a batter’s strikeout rate, for example, after X plate appearances and figuring in the league-average strikeout rate for the remaining plate appearances, up to the reliable sample size for strikeout rate, we’re able to reach a conservative estimate of what that batter’s “true talent” strikeout rate is.
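For the spreadsheet-averse, here is a minimal Python sketch of the regression and z-score steps described above. It is an illustration under stated assumptions, not the actual SCOUT workbook. The function names, the pooled league rate, the sign flip on strikeouts (so that fewer strikeouts score higher for a hitter), and the reliable-sample thresholds in `RELIABLE_PA` are all assumptions made for the example; the thresholds in particular are placeholders and should be replaced with the figures from Pizza Cutter’s research.

```python
from statistics import mean, pstdev

# Placeholder reliability thresholds in plate appearances.
# ASSUMPTION: illustrative values only, not taken from the post.
RELIABLE_PA = {"hr": 300, "bb": 200, "k": 150}


def regress_rate(events, pa, league_rate, reliable_pa):
    """Pad the observed sample with league-average plate appearances
    until it reaches the reliable sample size, then take the rate."""
    if pa >= reliable_pa:
        return events / pa  # sample is already "reliable"; no padding
    padded_events = events + league_rate * (reliable_pa - pa)
    return padded_events / reliable_pa


def z_scores(values):
    """Standard deviations from the mean of `values`."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def scout_batting(players, reliable_pa=RELIABLE_PA):
    """Average of each hitter's z-scores in regressed HR, BB, and K rate.

    ASSUMPTIONS: the league rate is the pooled rate of the sample, and
    the strikeout z-score is negated so that higher is always better.
    """
    total_pa = sum(p["pa"] for p in players)
    columns = []
    for stat in ("hr", "bb", "k"):
        league_rate = sum(p[stat] for p in players) / total_pa
        regressed = [
            regress_rate(p[stat], p["pa"], league_rate, reliable_pa[stat])
            for p in players
        ]
        z = z_scores(regressed)
        columns.append([-v for v in z] if stat == "k" else z)
    return [mean(col[i] for col in columns) for i in range(len(players))]


# Hypothetical spring lines, invented for illustration.
sample = [
    {"name": "Hitter A", "pa": 40, "hr": 4, "bb": 6, "k": 5},
    {"name": "Hitter B", "pa": 40, "hr": 1, "bb": 3, "k": 10},
    {"name": "Hitter C", "pa": 40, "hr": 0, "bb": 1, "k": 15},
]
for player, score in zip(sample, scout_batting(sample)):
    print(f'{player["name"]}: {score:+.2f}')
```

A batter with 10 strikeouts in 40 plate appearances, a league strikeout rate of 20%, and a 150-PA threshold is credited with 10 + 0.20 × 110 = 32 strikeouts in 150 PA, or about 21.3%, rather than his raw 25%.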

Yesterday, I wondered aloud whether it made sense to continue including walk rate in SCOUT for hitters. “There seems to be,” I suggested, “a significant-enough population of hitters who’re able to post high-ish walk rates against minor-league (and, presumably, spring-training) pitching based largely on selectivity, but whose walk rates decline considerably when they face more talented major-league pitchers.” If walk rates dry up upon reaching the major leagues, it follows that they shouldn’t be included in a metric designed to make some kind of comment on a player’s future.

A couple of readers suggested that this might not be the case, at all, however — and, in fact, a recent (and excellent) study by Chris St. John at the Platoon Advantage reveals that minor-league walk rate is actually a useful tool in attempting to analyze a prospect’s chances for major-league success. (Note that I say prospect. There are still likely older minor leaguers who, by virtue of experience, are able to post comparatively high walk rates in the high minors.) St. John’s work has convinced me that SCOUT should include walk rate.

Simultaneous to this, I’ve wondered less aloud whether it might make sense to weight the various elements of SCOUT. Walk rate might be important, but home-run rate is surely more important. And yet, per SCOUT, a batter at 0.5 standard deviations above the league mean in expected walk rate would be valued as highly as a player at 0.5 standard deviations above the league mean in expected home-run rate.

In response to that, I submit this experiment: a version of SCOUT called “SCOUT+.” SCOUT+ builds off of work by Bradley Woodrum from last August. In a piece called “Defensive Independent Hitting, Or ShH” (the last bit standing for “Should Hit”), Woodrum found that, using the three variables found in FIP plus expected BABIP, one could predict a batter’s “true talent” wRC+ with some accuracy.

True-talent BABIP is, of course, not something that we can reasonably predict for winter leagues or spring training; the other elements of ShH, however, have already served as the basis for SCOUT.

Accordingly, I’ve used my limited spreadsheeting skills to calculate what we’ll call SCOUT+ for the moment. SCOUT+ is essentially a heavily regressed estimate of what a player’s wRC+ should be (again, minus BABIP). By definition, this will undervalue players who are capable of sustaining higher BABIPs and overvalue players whose “true talent” BABIP is lower than league average. Put another way, SCOUT+ will likely undervalue players who hit the ball hard and/or are fast, while overvaluing players who are either slow or possess an extreme fly-ball approach.

To calculate SCOUT+, I’ve used a simplified form of Woodrum’s equation, specifically the one proffered by Tom Tango: (12.3*xHR% + 3*xBB% - 2*xK%) * 92, where xHR%, xBB%, and xK% stand for expected home-run, walk, and strikeout rate. To that result, I’ve added a constant that sets the average for all players in the sample at 100. The results seem reasonable, and SCOUT+ appears to account for the different values between home runs and walks and strikeouts in a way that plain SCOUT did not.
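The calculation above can be sketched in a few lines of Python. This is a rough illustration rather than the spreadsheet itself. The function names are invented for the example, and it assumes the expected rates are entered as fractions per plate appearance (0.03 rather than 3), which is the scale on which the formula yields wRC+-sized numbers.

```python
def scout_plus_raw(x_hr, x_bb, x_k):
    """Simplified ShH formula as quoted in the post, before the constant.

    ASSUMPTION: rates are fractions per plate appearance (e.g. 0.03).
    """
    return (12.3 * x_hr + 3 * x_bb - 2 * x_k) * 92


def scout_plus(batters):
    """Shift every raw score by one constant so the sample averages 100.

    `batters` is a list of (x_hr, x_bb, x_k) tuples of regressed rates.
    """
    raw = [scout_plus_raw(x_hr, x_bb, x_k) for x_hr, x_bb, x_k in batters]
    constant = 100 - sum(raw) / len(raw)
    return [r + constant for r in raw]


# Hypothetical regressed rates, invented for illustration.
sample = [
    (0.045, 0.100, 0.170),  # more power, more walks
    (0.030, 0.080, 0.180),
    (0.020, 0.070, 0.200),  # less power, more strikeouts
]
for rates, score in zip(sample, scout_plus(sample)):
    print(rates, round(score))
```

Because the constant is the same for everyone, it changes no one’s rank; it only recenters the scale. The 12.3 : 3 : 2 weights are what let a point of home-run rate count for roughly four times a point of walk rate, which is the weighting that plain SCOUT lacks.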

Below is the SCOUT+ batting leaderboard for spring training, for the 149 batters who’d recorded at least 22 spring-training at-bats as of Thursday afternoon. Note, of course, that the samples in question are very small and that the results should be regarded with due restraint.

What is that meaningful thing, performance or projection? Since these games don’t count, it seems like it must be the latter. Is it too soon to go back and see whether SCOUT excellence correlates with MLB excellence?

The project started as a means of providing an alternative to a standard slash-line leaderboard. With regard to the AFL, in particular, it’s pretty frequent that a beat writer or blogger will refer to a prospect’s slash-line as an indication that he’s doing well. Of course, because the AFL is generally hitter-friendly, it’s the case that almost every player’s slash-line is excellent. SCOUT is a means of comparing the players against each other, to get a sense of how well each is actually performing.

That’s all I intend by “meaningful,” really — that it’s an improvement upon other leaderboards you’ll see for leagues (like the winter leagues, like spring training) that are rife with small sample sizes.

Insofar as we have years of data telling us that Hector Luna is only slightly better than a replacement-level hitter, it’s wise not to conclude much from his 26 spring PAs. On the other hand, despite his .833 spring OPS and the attendant fanfare, Brandon Wood appears to be hitting exactly like Brandon Wood so far. And for the prospects, SCOUT helps to highlight a guy like Jefry Marte, maybe, who’s been around long enough to have already fallen off some prospect lists, but is still just entering his age-21 season and showed good underlying skills this past fall.

Thanks, Carson. That’s a measured, intelligent response. My opinion is that you’re claiming SCOUT(+) does exactly what it actually does. This is a fruitful direction for future sabermetric research, in my opinion: small-bore stats that provide incremental improvement on what already exists rather than attempts to find global narratives in single numbers.

I really appreciate the attempt at this. This kind of problem seems to make most sabermetricians wave their hands dismissively, say “small sample size” and move on. I think the way you’re defining this, and how it should be used, is very well explained.

I do wonder if this controls enough for the variance in pitchers that the hitters will see, though. Between raw new guys, veteran retreads trying to prove something, guys working on specific pitches and approaches without caring about the outcomes, etc. the quality of opponent seems to be all over the map.

If one’s frequent spring training partner is the B-squad AA/AAA guys from the Padres, then total spring training results might be a bit more skewed than if one is playing most games against Phillies starters (ok, bad example, Halladay is off to a rough spring) or something.

Also, some guys might be looking better than they should because the in-game strategy isn’t the same. Inasmuch as a team uses scouting reports in the regular season, they may not be using them as much in spring training, in favor of working on specific pitches. Say Felix wants to reintroduce his slider into his repertoire (because in a recent interview, he says he does). He’s probably throwing it more often, in more counts, to different guys, than he normally would. He’s trying to spot the slider and repeat mechanics, not pitch with intent to attack that specific hitter’s weakness on the “right” count. So it doesn’t mean as much if the hitter guesses right or hits one that straightens out a bit.

In other words: Don’t the small sample size and the wide, wide variance in talent and game approach skew the level of pitching competition that hitters see?

Having listened to my first Podcast last night, I think I understand Scout+, or at least its place here at Fangraphs.

Horticulture is hard and the most beguiling species are the hardest. The best gardeners nurture patiently and flexibly; aerating deep tap roots that produce, say, fruit of calculus, or patiently stringing drip lines along lateral roots and then benignly accepting crypto-occult mystic brambles, or lovingly misting aeroponic roots and then bemusedly chuckling over the ugly bloom of alchemy. Flexibility, because the best gardeners know they aren’t in charge, the species’ innate process dictates whether you get delicious fruit or teosinte. And, like all of us, the best gardeners know that waste fertigates.

Fangraphs is the gardener and Carson is the species. Scout+ is something.

(Incidentally, the only species as broadly “spacey-but-together” as Carson on the podcast are, in my experience, (1) St. John’s (Maryland, New Mexico) graduates, (2) Classicists, (3) advanced degree holders who took lots of acid, and (4) professionals multi-tasking through conference calls but earning their exorbitant rates nonetheless.)