You can’t wander too far into any of the analytically inclined baseball sites these days without encountering park factors. They are used to characterize hitting and pitching environments, so the impact of different park dimensions and other factors can be quantified and compensated for in key statistics, allowing better evaluations and comparisons of player performance.

There are different equations for park factors, but all are based on comparing the results achieved by the home team at the park in question with the home team’s results in other parks. One typical calculation for the park factor for home runs (HRPF) would be as follows (source: ESPN):

HRPF = 100 × [(HR hit at home + HR allowed at home) / home games] ÷ [(HR hit on road + HR allowed on road) / road games]

The premise of this calculation is to include per-game stats for a team in its home park and on the road, and compare the two, with the difference being attributed to the team’s home park.
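For concreteness, here is a minimal Python sketch of that home/road comparison, assuming the commonly used ESPN form: the per-game home run rate at home (homers hit plus homers allowed) divided by the same rate on the road, times 100. The function name and the season totals are hypothetical.

```python
def traditional_hrpf(home_hr_hit, home_hr_allowed, home_games,
                     road_hr_hit, road_hr_allowed, road_games):
    """Home-vs-road home run park factor on a 100 = neutral scale.

    Counts homers by both the home team and its opponents, so the road
    rate describes the same team's games played in other parks.
    """
    home_rate = (home_hr_hit + home_hr_allowed) / home_games
    road_rate = (road_hr_hit + road_hr_allowed) / road_games
    return 100 * home_rate / road_rate


# Hypothetical season totals, for illustration only.
print(round(traditional_hrpf(90, 85, 81, 80, 95, 81)))  # 100
print(round(traditional_hrpf(70, 62, 81, 92, 99, 81)))  # 69
```

A team that hits and allows homers at the same per-game rate everywhere gets exactly 100; any home/road gap, whatever its cause, is credited to the park.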

What’s wrong with the current HRPF calculation?

Perhaps this can be answered with a single example: in 2002, the HRPF for Chase Field (then Bank One Ballpark) was 48, and then the very next year it rose to 116. Need I say more?

This is an extreme case, but many more examples of large year-to-year swings can be found. No sane person could believe that it was really the park that changed; obviously something else is at work here. Perhaps the Diamondbacks pitchers (or the visiting pitchers) as a group surrendered fewer fly balls in 2002; perhaps the weather in 2003 was more favorable for homers (unlikely, though, since the roof at the BOB was closed for the majority of games there that year); maybe the baseballs used at the BOB were significantly livelier in 2003 than in the previous year (more on this later).

However, the most likely explanation for huge year-to-year swings in HRPF is random variation of the performances of the pitchers and hitters. The equation counts individual events authored by human beings who perform differently from one day to the next; if enough teams play enough seasons you will eventually end up with a one-year swing like the one at Bank One Ballpark in 2002-03, even without any other factors that might skew the park factor.
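To get a feel for how far a single-season HRPF can wander on chance alone, here is a small Monte Carlo sketch in Python. It assumes a park that is exactly neutral, home and road homer totals that follow independent Poisson distributions, and roughly two homers per game (both teams combined) over 81 home and 81 road games; those rates are illustrative assumptions, not measured values.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Draw a Poisson variate using Knuth's multiplication method (adequate for lam of a few hundred)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_neutral_park(seasons=2000, hr_per_game=2.0, games=81, seed=1):
    """Traditional HRPF of a truly neutral park, recomputed for many simulated seasons.

    Assumes home and road homer totals (both teams combined) are independent
    Poisson counts with the same mean.
    """
    rng = random.Random(seed)
    lam = hr_per_game * games
    pfs = []
    for _ in range(seasons):
        home = poisson(lam, rng)
        road = poisson(lam, rng)
        pfs.append(100 * home / road)
    return pfs


pfs = simulate_neutral_park()
swings = [abs(b - a) for a, b in zip(pfs, pfs[1:])]
print(f"mean {statistics.mean(pfs):.1f}, sd {statistics.stdev(pfs):.1f}")
print(f"share of year-to-year moves of 20+ points: {sum(s >= 20 for s in swings) / len(swings):.0%}")
```

Under these assumptions the single-season spread comes out near 11 points (roughly 100 × √(2/162)), so moves of 20 points or more between consecutive seasons are not rare, even though nothing about the park has changed.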

Unfortunately there are LOTS of uncontrolled noise factors that influence the current park factor calculations. Here are a few:

Atmospherics: conditions will vary from park to park due to differences in climate, and the weather varies from month to month and day to day at the same park.

Roster makeup: a team’s lineup might be heavily skewed towards either right- or left-handed power hitters, magnifying the impact of asymmetry in their home park.

In-season roster changes, injuries or discretionary days off: these might lead to a slugger playing more games at home than away, or vice versa.

Ball characteristics: baseballs stored in the dry heat of Phoenix will go farther than balls stored in higher-humidity environments (naturally, as at Dolphins Stadium, or artificially, as at Coors Field).

There are two other factors that are controllable (by the league), but are currently configured to introduce significant differences between home and away home run totals:

Interleague play: because the designated hitter is allowed only in AL parks, AL teams play with a weaker lineup in about one-eighth of their away games than they use at home, while NL teams play with a stronger lineup in about one-eighth of their away games than they use at home.

The unbalanced schedule: because teams play more games against their division rivals than against other clubs in their league, their home/away home run numbers may be distorted. This effect is most pronounced when all of a team’s division rivals play in parks on the opposite side of 100 from the team’s own park on the HRPF spectrum (as was the case in 2006 for Boston, Chicago (AL), Texas and Philadelphia). That team’s HRPF (calculated the traditional way) will then be pushed even further from 100. For example, Boston’s low HRPF is made even lower because its AL East rivals all play in high-HRPF parks.
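The schedule effect can be seen with a short Python sketch. The park factors and game counts are hypothetical: a park with a true HR factor of 90 whose team either visits its homer-friendly division rivals (rated 110) for a third of its road games or for two-thirds of them. The sketch assumes home and road homer rates simply track each park’s true factor, so the road rate behaves like a schedule-weighted average of the parks visited.

```python
def road_environment(road_schedule):
    """Schedule-weighted average HR factor of the parks a team visits.

    road_schedule: list of (park_factor, road_games_there) pairs.
    """
    total_games = sum(games for _, games in road_schedule)
    return sum(pf * games for pf, games in road_schedule) / total_games


def traditional_estimate(true_home_pf, road_schedule):
    """Traditional HRPF, assuming home and road HR rates track each park's true factor."""
    return 100 * true_home_pf / road_environment(road_schedule)


# Hypothetical 81-game road schedules: division rivals' parks rate 110, all others 95.
light_division = [(110, 27), (95, 54)]  # road environment averages exactly 100
heavy_division = [(110, 54), (95, 27)]  # road environment averages 105

print(round(traditional_estimate(90, light_division)))  # 90
print(round(traditional_estimate(90, heavy_division)))  # 86
```

The same arithmetic runs the other way for a homer-friendly park whose rivals play in pitcher-friendly parks: the road environment drops below 100 and the traditional HRPF is inflated.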

Multi-year Park Factors: Still Flawed

A much better way of judging a park’s propensity for home runs is to use a multi-year average. A multi-year park factor washes away many of the problems listed above, but not all of them.

The extreme individual performances (e.g. Cody Ross’ three-homer game on Sept. 11, 2006 at Dolphins Stadium) will largely balance out, but the roster makeup factor (left-handed sluggers at Yankee Stadium, right-handed sluggers at Fenway Park) will typically still be there, as will the ball characteristics factor. And of course, interleague play and the unbalanced schedule are still significant factors.
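A back-of-the-envelope Python sketch of why averaging fixes some problems but not others. It assumes Poisson home run counts of about 162 at home and 162 on the road per season and uses the standard delta-method approximation for the spread of a ratio of two counts; the 5% persistent distortion stands in for something like roster makeup or the unbalanced schedule and is purely hypothetical.

```python
import math


def pooled_pf(years, true_pf=100.0, bias=1.0, home_hr_per_season=162):
    """Expected value and approximate noise SD of a pooled multi-year HRPF.

    Assumes independent Poisson home and road counts of similar size, so the
    relative SD of their ratio is about sqrt(1/n_home + 1/n_road) (delta method).
    `bias` models a distortion that repeats every season.
    """
    n = years * home_hr_per_season
    center = true_pf * bias
    noise_sd = center * math.sqrt(1 / n + 1 / n)
    return center, noise_sd


# Hypothetical: a neutral park whose factor is distorted 5% every season.
for years in (1, 3, 5):
    center, sd = pooled_pf(years, bias=0.95)
    print(f"{years}-year HRPF: centered near {center:.0f}, noise SD about {sd:.1f}")
```

The noise shrinks roughly with the square root of the number of seasons, but the center stays at 95: averaging cannot remove a distortion that shows up every year.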

Weather effects will typically be reduced, but not entirely removed; day-to-day variation will be smoothed out, but some weather patterns run on cycles longer than a season. Finally, the building of numerous new ballparks and recent fence changes at existing parks (Citizens Bank Park, PETCO Park, Miller Park, Comerica Park, Kauffman Stadium, Minute Maid Park) render some of the multi-year park factors problematic for the near-term future.

Overall, we should expect and demand more of a park factor. So, if we’re thinking about improving the HRPF metric, what would we change? Some thoughts:

HRPF should not change if the park itself hasn’t changed.

HRPF should describe the impact of the park itself, not the performances of the players who reside in it or visit it.

HRPF should incorporate atmospheric factors such as wind and temperature that change from game to game.

HRPF should include individual sub-factors for different hit directions (LF, CF, RF) rather than different hitter types (LH, RH).

One possible approach would be to use a carefully calibrated hitting machine to “hit” balls in each park, much as the PGA’s “Iron Byron” is used to test golf equipment with a completely consistent swing. Such a device would be an improvement, but apart from the time and expense of traveling to all 30 MLB parks to do this testing, a physical test would still leave the noise factors of temperature and wind uncontrolled.

A better idea would be to conduct the tests virtually, where the noise factors can be eliminated, leaving only the “signal” we seek. This can be done if we have just three things:

A fully controllable baseball trajectory simulator that includes atmospheric conditions.

Accurate scale models of all 30 MLB parks.

Someone with a blatant disregard for his own social life and circadian rhythms to operate it.

HRPF Calculation using Hit Tracker

Hit Tracker in its usual form uses observations of hit outcomes (landing point, time of flight) to derive the hit’s initial parameters (Horizontal Launch Angle or HLA, Vertical Launch Angle or VLA, and Speed off Bat or SOB, with spin assumed to be a function of these factors). But, with a few lines of code added, it becomes “Hit Whacker,” using HLA, VLA, SOB and atmospheric inputs to generate a hit’s outcome. With this capability, we can create a procedure for assessing how easy or hard it is to hit homers in any park.
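Hit Tracker’s own flight model is not reproduced here, but the forward (“Hit Whacker”) direction can be illustrated with a deliberately simplified Python sketch: a two-dimensional, drag-only flight with a constant drag coefficient, air density estimated from altitude and temperature, and no spin or lift. Every constant (drag coefficient, contact height, atmospheric scale height) is an assumption, and because backspin lift is ignored the carry distances come out well short of real fly balls; only comparisons between conditions are meaningful. HLA is not modeled directly; it would only select which fence distance and wall height apply.

```python
import math

G = 9.81                             # gravity, m/s^2
MASS = 0.145                         # baseball mass, kg (approximate)
AREA = math.pi * (0.0737 / 2) ** 2   # cross-section of a ~7.37 cm ball, m^2
CD = 0.35                            # assumed constant drag coefficient (no spin, no lift)
FT_PER_M = 3.28084
MS_PER_MPH = 0.44704


def air_density(altitude_ft, temp_f):
    """Dry-air density in kg/m^3: exponential pressure falloff (assumed 8.4 km scale height) plus ideal gas law."""
    pressure_pa = 101325 * math.exp(-(altitude_ft / FT_PER_M) / 8400)
    temp_k = (temp_f - 32) * 5 / 9 + 273.15
    return pressure_pa / (287.05 * temp_k)


def trajectory(sob_mph, vla_deg, altitude_ft=0.0, temp_f=70.0, dt=0.001):
    """Yield (distance_ft, height_ft) points of a 2-D drag-only flight until the ball lands."""
    k = 0.5 * air_density(altitude_ft, temp_f) * CD * AREA / MASS
    speed = sob_mph * MS_PER_MPH
    vx = speed * math.cos(math.radians(vla_deg))
    vz = speed * math.sin(math.radians(vla_deg))
    x, z = 0.0, 1.0                  # assumed contact point 1 m above the ground
    while z > 0.0:
        v = math.hypot(vx, vz)
        ax = -k * v * vx             # quadratic drag opposes the velocity
        az = -G - k * v * vz
        vx += ax * dt
        vz += az * dt
        x += vx * dt
        z += vz * dt
        yield x * FT_PER_M, z * FT_PER_M


def carry_ft(sob_mph, vla_deg, **conditions):
    """Horizontal distance (ft) at which the ball comes back to the ground."""
    distance = 0.0
    for distance, _ in trajectory(sob_mph, vla_deg, **conditions):
        pass
    return distance


def clears_fence(sob_mph, vla_deg, fence_ft, wall_ft, **conditions):
    """True if the ball is higher than the wall when it reaches the fence distance."""
    for distance, height in trajectory(sob_mph, vla_deg, **conditions):
        if distance >= fence_ft:
            return height > wall_ft
    return False


print(round(carry_ft(105, 30)), round(carry_ft(105, 30, altitude_ft=5200)))
```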

To cover the range of possible batted balls that could become homers, I created a “test set” of trajectories, representing 45 different HLA’s (every two degrees from foul line to foul line), 41 different VLA’s (15 to 55 degrees) and 26 different SOB’s (95 to 120 mph). That’s 47,970 different fly ball paths! I ran this complete test set in each park, at that park’s actual altitude and at the park’s average game-time temperature from 2002-06, with no wind (I’ll describe how to account for different winds shortly). The trajectories were evaluated as “home run” or “not home run”, and the results were compiled.
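The size of the test set follows directly from the three grids. Here is a Python sketch of building it; the exact HLA endpoints (-44 to +44 degrees, with 0 as straightaway center) are an assumption, since only the count and spacing are given, and the per-park home run rule is a hypothetical placeholder for the trajectory simulator.

```python
from itertools import product

# Assumed grid: HLA every 2 degrees from -44 (toward LF) to +44 (toward RF), 0 = dead center.
HLAS = range(-44, 45, 2)   # 45 horizontal launch angles
VLAS = range(15, 56)       # 41 vertical launch angles, 15-55 degrees
SOBS = range(95, 121)      # 26 speeds off the bat, 95-120 mph

test_set = list(product(HLAS, VLAS, SOBS))
print(len(test_set))  # 47970


def count_home_runs(trajectories, is_home_run):
    """Tally homers in one park; is_home_run(hla, vla, sob) stands in for a park-specific simulator."""
    return sum(1 for hla, vla, sob in trajectories if is_home_run(hla, vla, sob))


def toy_rule(hla, vla, sob):
    """Toy placeholder, NOT a physical model: 110+ mph balls launched at 20-40 degrees leave the park."""
    return sob >= 110 and 20 <= vla <= 40


print(count_home_runs(test_set, toy_rule))  # 10395
```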

To most accurately characterize the overall difficulty of homering in a given park, it was necessary to weight the test set of trajectories so their directions (VLA and HLA) matched the distribution of balls hit by MLB hitters. For VLA I used a combination of Hit Tracker data and the league-average GB/LD/FB percentages available on the Hardball Times site. At this point the park factors by field (RF, RCF, CF, LCF, LF) were calculated. Then I used the distribution of batted balls to those different fields, from analysis of Hit Tracker data, to generate weighting factors for horizontal direction, and used these factors to arrive at an overall HRPF for each park on a “100 is average” scale.
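The weighting and normalization steps might look like the following Python sketch. It is one plausible reading of the procedure, not the author’s code: each field’s rate is a weighted share of homers in the test set, per-field PFs are scaled so the average park is 100, and the overall PF combines fields using batted-ball direction weights before rescaling to 100. All rates and weights shown are hypothetical.

```python
FIELDS = ("LF", "LCF", "CF", "RCF", "RF")


def weighted_hr_rate(outcomes):
    """Weighted share of test trajectories that became homers.

    outcomes: (is_home_run, weight) pairs, where weight reflects how often MLB
    hitters produce that launch condition (e.g. from a VLA distribution).
    """
    total = sum(weight for _, weight in outcomes)
    return sum(weight for hr, weight in outcomes if hr) / total


def park_factors(rates, field_weights):
    """Turn weighted HR rates into per-field and overall PFs, with 100 = average park.

    rates[park][field] is a weighted HR rate; field_weights[field] is the share of
    batted balls hit toward that field.
    """
    parks = list(rates)
    field_avg = {f: sum(rates[p][f] for p in parks) / len(parks) for f in FIELDS}
    by_field = {p: {f: 100 * rates[p][f] / field_avg[f] for f in FIELDS} for p in parks}
    raw = {p: sum(field_weights[f] * rates[p][f] for f in FIELDS) for p in parks}
    raw_avg = sum(raw.values()) / len(parks)
    overall = {p: 100 * raw[p] / raw_avg for p in parks}
    return by_field, overall


# Hypothetical inputs for three parks; real values would come from the simulation.
rates = {
    "Park A": {"LF": 0.20, "LCF": 0.12, "CF": 0.08, "RCF": 0.12, "RF": 0.20},
    "Park B": {"LF": 0.16, "LCF": 0.10, "CF": 0.06, "RCF": 0.14, "RF": 0.24},
    "Park C": {"LF": 0.24, "LCF": 0.14, "CF": 0.10, "RCF": 0.10, "RF": 0.16},
}
field_weights = {"LF": 0.3, "LCF": 0.15, "CF": 0.1, "RCF": 0.15, "RF": 0.3}

by_field, overall = park_factors(rates, field_weights)
for park, pf in overall.items():
    print(park, round(pf))
```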

Finally, I multiplied all the numbers by 0.97 to account for the difference between no wind and average wind. I derived this 0.97 factor by re-running the entire simulation with average wind in each park. Essentially, it means that the cumulative effect of wind in MLB parks, compared with calm weather, is to increase homers by about 3%, so calm conditions produce roughly 1/1.03 ≈ 0.97 times as many homers as average conditions.

The highest overall rating belongs to Coors Field, driven by its higher altitude. The simulation used the same baseball characteristics for all 30 parks, but this might not be appropriate here, due to Colorado’s infamous humidor. We don’t have recent data on un-humidified balls at Coors Field, but if the humidor does as advertised and restores the baseballs to a “normal” state (rather than an extra-soggy state), then the calculated PF here should be accurate.

The easiest individual field for homers is RCF at Coors, with a rating of 145, while the most difficult field is center field at Comerica Park, with a PF of 35. St. Louis Cardinal Chris Duncan’s May 20, 2007 homer over the center field wall in Detroit was the first to clear that fence since at least 2005 (that is, since before Hit Tracker was in operation); home runs to that field are even less common than the 35 rating suggests, probably because Tigers hitters adjust their swings to avoid hitting fly balls to center field. Duncan missed that memo, obviously…

Chase Field “plays” easier for home runs than its calculated HRPF suggests; Hit Tracker data may explain why. So far in 2007, the average standard distance (which is independent of atmospheric effects) for home runs hit at Chase Field is 394.3 feet, while the league average standard distance is 391.4 feet. For Arizona hitters, the effect is even more striking, with D-Backs hitters averaging 395.7 feet at home and 383.5 feet on the road, an amazing 12-foot difference.

Perhaps such an effect might come about due to random chance on a small sample size, but here’s one statistic that is quite convincing: Arizona’s top ten longest homers by standard distance so far this year were ALL hit at Chase Field. The odds of that happening by chance are more than 1,000-to-1 against. So, we can either believe that the Diamondbacks just happen to slug like Babe Ruth at home for no particular reason (e.g. Tony Clark and Eric Byrnes, who own the two longest MLB homers so far in 2007, both hit on May 8th at Chase), or we can believe that the ball is livelier in Phoenix, probably due to the hot, dry conditions in the Valley of the Sun…
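The “more than 1,000-to-1” figure matches a simple coin-flip calculation: if each Arizona homer were equally likely to have been hit at home or on the road, independently of its distance, the chance that the ten longest all came at home is 0.5^10 = 1/1024. A short Python sketch, with a hypergeometric variant for a fixed home/road split; the 30/30 split is hypothetical.

```python
from math import comb


def p_all_home_independent(n, home_share=0.5):
    """P(the n longest homers were all hit at home), treating each homer as an
    independent draw that is a home homer with probability home_share."""
    return home_share ** n


def p_all_home_fixed_split(total_hr, home_hr, n):
    """Same probability when exactly home_hr of total_hr homers came at home and
    distance is unrelated to location (hypergeometric)."""
    return comb(home_hr, n) / comb(total_hr, n)


print(1 / p_all_home_independent(10))  # 1024.0
# Hypothetical split of 30 home and 30 road homers:
print(round(1 / p_all_home_fixed_split(60, 30, 10)))
```

With a finite pool of homers the event is even less likely than the coin-flip figure. If Arizona actually hit more than half its homers at home, `home_share` should be raised accordingly, which makes the event somewhat less surprising.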

How to Account for Temperature and Wind

The HRPF’s have been calculated with no wind at average park temperatures, but changes in the atmospherics (wind and temperature) obviously have to be considered when trying to accurately describe how easy or hard it is to hit one out of a park. Unfortunately, there are nearly infinite combinations of wind and temperature that can exist during a game, and since each simulation run takes about 30 minutes, simulating them all would be prohibitively time-consuming. Instead, I ran a series of “sensitivity” runs to gauge the impact of temperature and wind conditions on the calculated park factors, and was able to derive average adjustment factors for the HRPF’s.

For temperature, add or subtract 0.26 for each degree of difference between the game-time temperature and that park’s average game-time temperature (see the data table for the average park temperatures). For wind blowing straight in or out, add or subtract 1.9 for each mph of wind. For wind blowing diagonally 22.5 degrees in or out, add or subtract 1.7 for each mph; at 45 degrees, 1.4 per mph; and at 67.5 degrees, 0.5 per mph. These angles are measured between the wind direction and the field being adjusted.

Example: Wrigley Field has an average game-time temperature of 70.0 degrees, and HRPF’s of 86/122/61/98/86 for LF/LCF/CF/RCF/RF, respectively, with a total HRPF (no wind) of 89.

On a 40-degree day with the wind blowing in from CF at 10 mph, the HRPF’s will be adjusted as follows:

For temperature, adjust each PF by -30*0.26 = -7.8, rounded to -8.

For wind, adjust the CF PF by 10*-1.9 = -19. Adjust the RCF and LCF PF’s by 10*-1.7 = -17. Adjust the LF and RF PF’s by 10*-1.4 = -14.

The adjusted PF’s will be 64/97/34/73/64, for an overall HRPF of 67.

On a 90 degree day with the wind blowing out to RF at 20 mph, the HRPF’s will be adjusted as follows:

For temperature, adjust each PF by 20*0.26 = +5.2, rounded to +5.

For wind, adjust the RF PF by 20*1.9 = +38. Adjust the RCF PF by 20*1.7 = +34. Adjust the CF PF by 20*1.4 = +28. Adjust the LCF PF by 20*0.5 = +10. The LF PF is essentially unchanged by the wind blowing out to RF.

The adjusted PF’s will be 91/137/94/137/129, for an overall HRPF of 115.
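The adjustment rules and both Wrigley examples can be reproduced with a short Python sketch. The angle convention (each field’s direction in degrees from the center-field line, with wind described by the field it blows toward or from) and rounding each adjustment to a whole number are assumptions chosen to match the worked examples. The overall HRPF is not computed because it depends on field weights not listed here.

```python
TEMP_PER_DEGREE = 0.26
# PF points per mph, keyed by the angle (degrees) between the wind and the field.
WIND_PER_MPH = {0.0: 1.9, 22.5: 1.7, 45.0: 1.4, 67.5: 0.5, 90.0: 0.0}
# Assumed convention: each field's direction in degrees from the center-field line.
FIELD_ANGLE = {"LF": -45.0, "LCF": -22.5, "CF": 0.0, "RCF": 22.5, "RF": 45.0}


def adjust_field_pfs(base_pfs, avg_temp_f, game_temp_f, wind_mph=0.0, wind_toward="CF"):
    """Apply the temperature and wind rules to field-by-field HRPFs.

    wind_mph > 0: wind blowing out toward `wind_toward`;
    wind_mph < 0: wind blowing in from that field.
    Each adjustment is rounded to a whole number, as in the worked examples.
    """
    temp_adj = round((game_temp_f - avg_temp_f) * TEMP_PER_DEGREE)
    wind_angle = FIELD_ANGLE[wind_toward]
    adjusted = {}
    for field, pf in base_pfs.items():
        offset = abs(FIELD_ANGLE[field] - wind_angle)
        wind_adj = round(wind_mph * WIND_PER_MPH[offset])
        adjusted[field] = pf + temp_adj + wind_adj
    return adjusted


wrigley = {"LF": 86, "LCF": 122, "CF": 61, "RCF": 98, "RF": 86}
# 40 degrees, wind blowing in from CF at 10 mph
print(adjust_field_pfs(wrigley, 70.0, 40.0, wind_mph=-10, wind_toward="CF"))
# {'LF': 64, 'LCF': 97, 'CF': 34, 'RCF': 73, 'RF': 64}
# 90 degrees, wind blowing out to RF at 20 mph
print(adjust_field_pfs(wrigley, 70.0, 90.0, wind_mph=20, wind_toward="RF"))
# {'LF': 91, 'LCF': 137, 'CF': 94, 'RCF': 137, 'RF': 129}
```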

For simplicity’s sake, I have created one temperature factor and one set of wind factors to be used in all stadiums. The temperature adjustment factor is dependable, but the wind adjustment factors should be considered approximate, because of the different shielding effects of the grandstands in the different parks.

In parks where the outfield is fairly low and open (e.g. Wrigley Field, Fenway Park, Kauffman Stadium), the wind factor will be most accurate, while for well-shielded stadiums (e.g. Rogers Centre, McAfee Coliseum, Rangers Ballpark in Arlington, any of the retractable-roof parks when open), the standard, one-size-fits-all wind factor may not describe the wind impact accurately. In particular, Rangers Ballpark in Arlington has become known for swirling winds that aid batted balls at low levels even as the high-level winds howl in from the outfield most of the summer.

HRPF’s generated by Hit Tracker should remain valid for the current MLB parks as long as the fences don’t move, and in the future, this park evaluation method will be used on new stadiums in Washington D.C., the Bronx, Queens and elsewhere.