Minor issues in the air

If you’ve followed my articles over the course of the winter, you probably noticed a theme or two: minor league pitching and batted ball outcomes. When we last left off, I was in search of some cases (or cohorts) to help tease out the changes in batted ball outcomes between minor league levels. From the comments:

One of the approaches I’m toying with, certainly not the only, is to start with pitchers who made MLB debuts in 2009 and work back. See if and how they differentiated from peers, and how those peer groups changed, as they progressed. And to repeat the entire cycle with hitters.

I lied. Well, I got sidetracked.

Before going any further with any case or cohort studies, I wanted to understand a bit more about the potential stringer effects in minor league parks. The issue came up back in 2006 at SoSH, as pointed out by a reader. Knowing that Cory Schwartz and the folks at MLBAM work hard to improve these things, I started looking for stringer bias. Some folks expected me to find bias between levels of play (or leagues within levels), but I didn’t. Still, I shot Schwartz a note to make sure training was standard across levels. He assured me it is, but also agreed to look at some numbers I dug up. I want to share those numbers with you.

Searching for bias

Before the end of 2009, I started calculating the stringer effects on batted ball classifications in the major leagues. In short, there’s some substantial variance by park. For example, the stringer(s) in Anaheim (or Los Angeles, wherever the Angels play) are the least likely to score a ball as a line drive, tending to classify balls as fly balls 25 percent more often than expected, while in Texas, there is an equally large tendency to tag things as line drives. Maybe. Add that to the partially completed work portfolio.

In any case, there does appear to be some bias (subtle or otherwise) in tagging. This is not a surprise. As a matter of fact, before our colleague Colin Wyers left Hardball Times for Baseball Prospectus, he published an important study on press box heights. The height of the press box has a measurable impact on batted ball classifications.

LD_RATE = 0.253415 + PB_HEIGHT * 0.00157926

As you can see, the pieces of a park correction for batted ball types are coming together. I also expect something like HITf/x to come along, providing better information, before this puzzle is solved.
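To get a feel for the size of the press box effect, here is a minimal Python sketch that simply evaluates the linear formula above. The two coefficients are copied from the formula; the function names are my own, and the height units are whatever the original study used, which I am not restating here.

```python
# Coefficients copied from the LD_RATE formula quoted in the article.
INTERCEPT = 0.253415
SLOPE = 0.00157926


def expected_ld_rate(pb_height: float) -> float:
    """Evaluate LD_RATE = INTERCEPT + PB_HEIGHT * SLOPE.

    pb_height is in the same units as the original study (an assumption:
    the article does not state them).
    """
    return INTERCEPT + pb_height * SLOPE


def ld_rate_shift(height_a: float, height_b: float) -> float:
    """Difference in expected LD_RATE between two press box heights."""
    return SLOPE * (height_b - height_a)


if __name__ == "__main__":
    # Hypothetical heights, chosen only to illustrate the arithmetic.
    for h in (0, 10, 20):
        print(h, round(expected_ld_rate(h), 4))
```

Because the model is linear, the difference between two parks depends only on the gap in press box height, not on the heights themselves.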

The three years of minor league data I’ve been working with (all published after the SoSH study on bias) may also contain press box effects and other sources of bias. Rolling up the fly ball-to-line drive ratios for each minor league team and league since 2007, I found the following spread of results:

Fly ball-to-line drive ratios by minor league, 2007-2009

As you can see, the levels (in gray) show the basic pattern of more line drives as you advance, as previously discussed. But there are some interesting variances between leagues at some levels. Up to this point, my focus was at the three-year aggregated level. After looking at this grid, Schwartz suggested I go deeper. I had already calculated park-by-park ratios, but it was year-by-year that interested him.
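For anyone who wants to reproduce this kind of roll-up, the sketch below shows one way to do it in Python. It is not the code behind the grid. The row layout, field names and sample counts are all made up for illustration, and it assumes the ratio is computed from summed counts (total fly balls over total line drives) rather than by averaging team ratios.

```python
from collections import defaultdict


def fb_ld_ratios(rows, key_fields):
    """Sum fly balls and line drives per group, then divide.

    rows: iterable of dicts with hypothetical fields "fb" and "ld"
          plus whatever grouping fields are named in key_fields.
    Groups with zero line drives are skipped to avoid dividing by zero.
    """
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        totals[key][0] += row["fb"]
        totals[key][1] += row["ld"]
    return {k: fb / ld for k, (fb, ld) in totals.items() if ld > 0}


# Made-up counts; the team and league names are placeholders.
rows = [
    {"level": "AA", "league": "League A", "team": "Team 1", "year": 2007, "fb": 300, "ld": 150},
    {"level": "AA", "league": "League A", "team": "Team 2", "year": 2007, "fb": 330, "ld": 150},
    {"level": "AAA", "league": "League B", "team": "Team 3", "year": 2007, "fb": 280, "ld": 160},
]

by_league = fb_ld_ratios(rows, ("level", "league"))
by_team_year = fb_ld_ratios(rows, ("team", "year"))

if __name__ == "__main__":
    for key, ratio in sorted(by_league.items()):
        print(key, round(ratio, 2))
```

Changing `key_fields` is all it takes to move between the three-year league view and the year-by-year, park-by-park view.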

There was an expected result—less variance between parks as time progresses. This makes sense, as MLBAM is continually improving and refining its products. For the most part, my results were supporting that notion. But I did find something somewhat alarming, and MLBAM will surely be addressing the issue in 2010. The alarming issue? In a word, Huntsville. In a sentence, the Huntsville stringers stopped producing line drive tags early in 2009, using fly ball and ground ball only. Either Huntsville has some spectacular pitchers and horrific hitters, or something in the data chain broke down.

Fly ball-to-line drive ratios by Southern League teams, 2007-2009

Huntsville’s 2009 value was 14.34—way off the charts. Otherwise, with some exceptions, there appears to be some convergence. If you’re curious about the Southern League—or any other—you can play around with this spreadsheet. Feel free to publish your findings/musings. Just link back here with appropriate credit.

The numbers in the Excel file are raw data; no park adjustments have been made.
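If you do play with the raw numbers, a quick screen for Huntsville-style breakdowns is easy to put together. The Python sketch below is one rough approach, not anything MLBAM or I actually used: it flags any team whose fly ball-to-line drive ratio is more than a chosen multiple above, or below, the league median for that season. The factor of 3 is an arbitrary assumption, and every value in the sample except Huntsville’s 14.34 is a placeholder.

```python
from statistics import median


def flag_outliers(ratios, factor=3.0):
    """Return teams whose ratio is far from the league median.

    ratios: dict of team name -> fly ball-to-line drive ratio for one
            league-season.
    factor: hypothetical cutoff; a team is flagged when its ratio is
            more than factor times the median, or less than the median
            divided by factor.
    """
    mid = median(ratios.values())
    return sorted(
        team
        for team, ratio in ratios.items()
        if ratio > mid * factor or ratio < mid / factor
    )


# Huntsville's 14.34 is from the article; the other teams and values
# are placeholders, not real Southern League numbers.
sample_2009 = {
    "Huntsville": 14.34,
    "Team B": 2.1,
    "Team C": 1.9,
    "Team D": 2.3,
    "Team E": 2.0,
}

if __name__ == "__main__":
    print(flag_outliers(sample_2009))
```

Using the median rather than the mean keeps a single broken park from dragging the baseline toward itself.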

So what?

My first attempt to make sense of the year-by-year and park-by-park differences is in progress. It will be published elsewhere, so look for a THT Live hit from me on the matter sometime this spring. I’ll also follow up in this space as the minor league season progresses. And, before Opening Day, I plan to have a Q&A with Cory Schwartz about Gameday 2010, Bloomberg and, of course, stringers.