Introducing: CAPS Road Park Factors

Everyone knows Dan Haren is a great pitcher, but could he be even better than we think? (Icon/SMI)

As I’m sure all of our regular readers know (and all the ones Rob Neyer sent over here), a couple of weeks ago I unveiled a new stat that I called CAPS (Context Adjusted Pitching Statistics). If you aren’t familiar, I’d definitely suggest checking out that article, but to briefly summarize, CAPS adjusts a pitcher’s peripheral numbers based on a number of different contexts to give us a better idea about what that pitcher should be expected to do going forward.

Up until now, CAPS adjusted for home ballpark, quality of batters faced, and any league change. Today, I’d like to add one more adjustment to the mix: road ballpark factors.

As we know (and as David Gassko covered thoroughly in this article) ballparks can have a significant effect on just about every stat we fantasy players look at: everything from runs and home runs to strikeouts, walks and ground balls. For the majority of baseball players, we tend to ignore these effects because they remain on the same team from one year to the next. The context remains exactly the same, so it has no bearing on our expectations.

Have you ever considered, though, what the effects might be of all of the games that are played on the road? While a player may play all of his home games in the same environment, his mix of road stadiums will undoubtedly differ from year to year. When most everyone talks about ballpark factors, they talk about the home ballpark impacting the numbers, working under the assumption that the road side is completely neutral. This, however, is simply not true.

A pitcher who happens to throw a disproportionate number of times in PETCO and AT&T Park will be helped in the home run department simply as a matter of context, just as a pitcher who throws too often in Coors and Chase Field will be hurt. So with the help of Retrosheet, I’ve calculated an individualized “Road Park Factor” for each pitcher, using his exact blend of road ballparks (and the time spent in each), for every year back to 2004 and for every stat we care about. I then neutralized each line and applied a 2009 factor based on the exact 2009 road schedule for every team.

Method

Only read this if you’re interested in hearing a little more about exactly what I did. If you’re not interested, you know all you need to and can skip to the next section.

The method used is pretty intuitive, but to elaborate just a bit further, I calculated each pitcher’s road park factor by weighting each park he played in depending on the number of opportunities he had to accumulate each stat. To come up with the strikeout factor, for instance, I looked at every non-HBP batter faced. For all types of hits and batted balls, I looked at all fair, contacted balls. (If it makes it easier, think about it in terms of Pizza Cutter’s flow chart.) Once I arrived at the factor, I simply applied it to each pitcher’s road stat line.
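For anyone who thinks better in code, the weighting described above can be sketched roughly as follows. This is only an illustration, not the actual CAPS code: the park names, the factor values, and the convention that 1.0 means neutral (with neutralizing done by division) are all assumptions made for the example.

```python
def road_park_factor(opportunities_by_park, park_factors):
    """Opportunity-weighted average of the park factors for one stat.

    opportunities_by_park: {park: chances the pitcher had to record the stat there}
    park_factors: {park: factor for that stat, where 1.0 is assumed to mean neutral}
    """
    total = sum(opportunities_by_park.values())
    if total == 0:
        return 1.0  # no road opportunities, so treat the road side as neutral
    weighted = sum(park_factors[park] * n for park, n in opportunities_by_park.items())
    return weighted / total


def neutralize(road_stat, factor):
    """Remove the park effect from a road total (assumes a multiplicative factor)."""
    return road_stat / factor


# Hypothetical strikeout example: 60 non-HBP batters faced in one park, 40 in another.
batters_faced = {"Park A": 60, "Park B": 40}
k_factors = {"Park A": 1.05, "Park B": 0.90}  # made-up values

k_factor = road_park_factor(batters_faced, k_factors)  # (63 + 36) / 100, about 0.99
neutral_road_ks = neutralize(80, k_factor)             # slightly more than 80
```

Weighting by opportunities rather than by games is the important part: a pitcher who faces 30 batters in one park and 12 in another should have the first park count more heavily.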

The one other note I need to make deals with batted-ball types (ground balls, infield flies, etc.). Because Retrosheet classifies these differently than Baseball Info Solutions does, I wasn’t able to apply these factors to the pitcher’s road stat line. Instead, for batted balls, I had to cut the pitcher’s full-year line in half, applying the road factors to one half and the home factors to the other half. It shouldn’t make that much difference, but it does need to be noted.
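The half-and-half workaround for batted balls might look something like this. Again, this is a sketch under assumptions: the function name is hypothetical, the factors are treated as multiplicative with 1.0 as neutral, and the numbers are invented.

```python
def neutralize_batted_balls(full_year_count, home_factor, road_factor):
    """Split a full-year batted-ball total evenly and neutralize each half.

    Assumes half of the total came at home and half on the road, which is
    the approximation described above, not a measured split.
    """
    half = full_year_count / 2
    return half / home_factor + half / road_factor


# Hypothetical: 300 ground balls, a home park that inflates them by 4 percent,
# and a road mix that suppresses them by 2 percent.
neutral_gb = neutralize_batted_balls(300, home_factor=1.04, road_factor=0.98)
```

The approximation is only as good as the 50/50 assumption, so a pitcher with a lopsided home/road innings split would be off by a little.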

After coming up with these factors and neutralizing the player’s line for each year, I then took each team’s 2009 road schedule and combined the ballparks appropriately. I then applied this factor to every year we’ll look at to put all of the numbers into the context of 2009, which is what we care about.
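Put together, the neutralize-then-recontextualize step could be sketched like this. The schedule, the factor values, and the choice to weight the 2009 schedule by road games are assumptions for illustration only; the article does not spell out the exact weighting.

```python
def weighted_factor(weights_by_park, park_factors):
    """Weighted average of park factors (weights here are scheduled road games)."""
    total = sum(weights_by_park.values())
    return sum(park_factors[p] * w for p, w in weights_by_park.items()) / total


def to_2009_context(road_stat, past_road_factor, road_factor_2009):
    """Strip out the past road context, then apply the 2009 road context."""
    neutral = road_stat / past_road_factor
    return neutral * road_factor_2009


# Hypothetical 2009 road schedule (games per park) and made-up home run factors.
schedule_2009 = {"Park A": 9, "Park B": 6, "Park C": 3}
hr_factors = {"Park A": 1.10, "Park B": 0.95, "Park C": 1.00}

factor_2009 = weighted_factor(schedule_2009, hr_factors)
# A pitcher who allowed 12 road homers with a past road factor of 0.96:
hr_in_2009_context = to_2009_context(12, 0.96, factor_2009)
```

The same team-level 2009 factor is applied to every past season, which is what puts all of a pitcher's years on a common footing.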

CAPS: Where we’re at

To summarize, the CAPS numbers you’ll be seeing going forward take all of the following into account:

Past home ballpark

2009 home ballpark

Past road ballparks

2009 road ballparks

Past quality of opponents (neutralized)

League switch adjustments

Ground balls adjusted for league average line drive rate (called xGB)

How large is the road ballpark impact?

As I noted earlier, baseball analysts have long ignored road park factors, assuming these things are neutral. While logically we know this isn’t true, could it just be that the effects are so small that this is a fair assumption to make? Let’s take a look at the leaders and trailers for 2008 and find out.

Note: The “a” before each stat in the third column stands for “adjusted.” This is what the player’s stat would look like if it were neutralized for road park. Also, because more strikeouts are good while more walks, homers, and hits are bad, the tables are arranged so that the five unluckiest are always on top and the five luckiest are always on the bottom, regardless of stat.

Looking at our four leaderboards (the one on the bottom left represents all singles, doubles and triples, if it isn’t clear), we can see that the effects aren’t huge, but they are there. Obviously the biggest raw differences are seen with strikeouts and hits because they are more numerous to begin with, but these effects are pretty large even in a relative sense.

With 4.5 more strikeouts, Gil Meche’s K/9 would have jumped 0.2 points, from 7.8 to 8.0. Two-tenths of a point of previously unaccounted-for K/9 is huge. In terms of walks, the effects are much smaller, with Felix Hernandez’s BB/9 falling from 3.59 to just 3.55 and Miguel Batista’s from 6.18 to 6.11. Even Ian Snell’s would have risen only 0.08 points.
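To see how a handful of extra strikeouts turns into a rate change, here is the per-nine arithmetic as a quick sketch. The innings total is an assumed round number for a full-season starter, not a figure taken from this article.

```python
def per_nine(count, innings):
    """Convert a counting stat into a per-nine-innings rate."""
    return 9 * count / innings


# Assumed workload of 210 innings (hypothetical round figure).
extra_k_per_nine = per_nine(4.5, 210)  # roughly 0.19, which rounds to 0.2
```

Over a typical starter's workload, every five or so strikeouts is worth about two-tenths of a point of K/9.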

Looking at home runs, though, we see some big changes. Aaron Harang’s HR/FB would have fallen from 15.3 to 14.7, which explains a sizable portion of his unlucky-looking HR/FB this year. It’s very nice to be able to write it off to a specific cause instead of simply to “bad luck” (although it wouldn’t really be wrong to do so).

Of course, we’re dealing with the extremes, but you can see that the assumption that road effects are neutral is simply not true. Also, while these effects won’t be very large for many players, the whole point is to add this onto our current CAPS system. When we combine all of the different effects—even if any one is small in isolation—we can see some big differences in value. And that, I believe, is what fantasy leaguers care about. If this can highlight for us just a few undervalued players or help us to avoid a few overvalued ones, this becomes a powerful, powerful tool.

Also of interest (albeit perhaps more to the non-fantasy crowd) are the groupings, which some of you may have picked up on. If you notice, all five of the luckiest in walks are Pirates. Three of the unluckiest with walks are Brewers. The unluckiest with hits are all Red Sox and Angels. Two of the unluckiest with strikeouts are Royals and two are Rockies. As this started as an exercise to determine “divisional park effects” (the inspiration for which came from commenter Nick on the original CAPS article), it’s not surprising to see players from the same team appear on the lists together.

Nothing of much interest here. Lowe’s adjustments are minimal, as will be the case with a lot of players. I normally try writing about the players who are more interesting, but as Lowe is a guy I’m sure many of you have been wondering about … ta-da! As I said earlier, though, the value in the CAPS system lies not in the guys it rates the same, but in the guys for whom it sees a big difference. Check out our next player for a case like that.

Lowe is obviously an extreme groundball pitcher, though he does manage to strike out about as many batters as a league-average pitcher. This has value in fantasy leagues, as does his mid-3.00s ERA. Overall, taking Lowe in the round 12-to-15 area of a traditional, 12-team mixed league should get you a fine player. He seems to struggle a little with home runs, but he keeps his BABIP pretty low, and the move from the Dodgers to the Braves could help. The UZR difference between the two teams was 4.8 per 150 last year.

As you can see, Haren has had some terrible luck for a few years now. In terms of his strikeout rate, he’s probably the unluckiest pitcher in baseball over the past three years.

This bad luck wasn’t quite as pronounced in past years (and part of those past years’ numbers are due to the league change, so he really shouldn’t have been expected to post them with the A’s), but in 2008 Haren really deserved much better. His strikeout rate was almost a point too low, and his QERA was a ridiculous 2.76. 2008’s actual QERA leader was C.C. Sabathia’s Brewers stint at 2.89, to put things into perspective.

We shouldn’t expect him to post identical numbers in 2009, but he has steadily risen four years in a row, will be 28 years old, and should have a good deal of luck catching up with him. His current Mock Draft Central ADP is 57.39, which would put him toward the end of the fifth round in a 12-team league, though I have seen him go in the sixth. I’m not a fan of taking pitchers that early, but if Haren falls into the eighth or ninth round, I don’t imagine I’ll be passing him up. If the strategy you’re employing allows you to take starters earlier than that, Haren seems like a very good choice.

Concluding thoughts

As I said in the original CAPS article, if you guys have any ideas for further things we could adjust for, feel free to contact me. If you have any questions about CAPS or anything fantasy baseball related, don’t hesitate to ask those either.

Errata

In the original CAPS article, I accidentally applied the home ballpark factors to Javier Vazquez’s entire line instead of just the home side. This has been fixed, and the new CAPS numbers (with road ballpark adjustments included) are displayed below. As you can tell, very little changes, and my evaluation remains the same; Vazquez makes a great fantasy pick this year.

Comments

This is really great stuff. Have you thought about applying this to any projection systems? Ideally, we could apply this stuff to get a projection of a player in a completely neutral environment (adjusting for home park factor, road park factor, quality, etc.) and then take that projection and adjust for those various things.

I’d always thought that it was important to know what road parks a player was playing in. When Matt Holliday was traded this offseason, everyone cited his significant home-road splits in arguing that he could not maintain a high level of performance outside of Colorado. While this is certainly possible, and while there also is a great deal of variability in 200-300 PA samples, I think the fact that Holliday played a significant portion of his road games in pitcher’s paradises like Los Angeles, San Francisco, and San Diego factors in as well.

Great piece, Derek. Do you still plan on writing about that experts mock draft from last week? I was just curious about a few selections and also wanted to know if any picks were guys you would not have taken in a real draft.

Andrew,
I’d like to if I can find time. Which selections were you curious about? I could address them individually for you, just in case. Also, I played that draft pretty straight. I was trying out bits of a new-ish strategy, but I don’t think I took anyone who I wouldn’t have taken in a real draft.

Andrew B,
You’re absolutely right that this kind of stuff is important. Also, I’ll probably be doing this for hitters sometime in the near future.

I was curious about the following picks: Doumit, Ibanez, and Joba. I’m interested because all of those players are on my keeper league roster, and I’m in the midst of considering off-season trades. Do you think Doumit’s skills from last year were legit? Do you like Ibanez this year in hitter-friendly Philly? Do you see a big year from Joba? Thanks, Derek. I figured you took the draft seriously; I just wanted to make sure.