Baseball ProGUESTus

How Far Did That Fly Ball Travel?

Most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.

Alan Nathan is Professor Emeritus of Physics at the University of Illinois at Urbana-Champaign. After a long career doing things like measuring the electric and magnetic polarizabilities of the proton and studying the quark structure of nucleons, he now devotes his time and effort to the physics of baseball. He maintains an oft-visited website devoted to that subject: go.illinois.edu/physicsofbaseball.

In my line of work, I get asked questions from all sorts of different people, such as reporters, kids doing science fair projects (and their mothers), and diehard baseball fans. Some recent examples: How much farther will a fly ball travel in Denver? What’s the deal with those BBCOR bats? Is the baseball juiced? Should the Red Sox trade Jacoby Ellsbury? Okay, I confess that no one has (yet) asked my opinion on that last question. However, one question that I get asked quite often is the following: Can we predict the landing point of a fly ball just after it leaves the bat? That’s what I want to talk about in this article.

Let me try to sharpen up that question a bit. Suppose we have data telling us the velocity of a fly ball just after leaving the bat, so that we know the batted ball speed, vertical launch angle, and horizontal spray angle. How well does that information determine the landing point? Such a question might arise, for example, in a batting cage situation. You measure the batted ball velocity—perhaps with a portable HITf/x or TrackMan system—and immediately tell the batter that he just hit a 385-ft home run, without the ball ever leaving the batting cage. But is this really possible? In a simpler, gravity-only world—I like to refer to it as the “Physics 101” world—it most definitely is possible. Under such conditions, once the initial velocity is known, the ball follows a trajectory that is completely predictable, landing in a location that can be calculated precisely with no more knowledge than one learns in the second week of Physics 101. End of discussion, right?
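To make the "Physics 101" case concrete, here is a minimal Python sketch of the gravity-only calculation. The function name, the unit constants, and the 100 mph, 26-degree example are assumptions chosen for illustration; they are not taken from any of the tracking systems mentioned above.

```python
import math

G_FT_S2 = 32.174                 # gravitational acceleration, ft/s^2
MPH_TO_FPS = 5280.0 / 3600.0     # miles per hour -> feet per second


def vacuum_range_ft(speed_mph, launch_deg, height_ft=0.0):
    """Horizontal distance in a gravity-only world (no drag, no lift).

    The ball is launched from height_ft and lands at field level (height 0).
    """
    v = speed_mph * MPH_TO_FPS
    theta = math.radians(launch_deg)
    vx = v * math.cos(theta)
    vz = v * math.sin(theta)
    # Time of flight: positive root of height_ft + vz*t - g*t^2/2 = 0
    t_flight = (vz + math.sqrt(vz**2 + 2.0 * G_FT_S2 * height_ft)) / G_FT_S2
    return vx * t_flight


if __name__ == "__main__":
    # Hypothetical batted ball: 100 mph at a 26-degree launch angle
    print(round(vacuum_range_ft(100.0, 26.0)))
```

With those assumed inputs the gravity-only answer comes out to roughly 527 ft, far longer than real fly balls hit at that speed travel.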

Wrong! Our real world is much more complicated because the ball experiences the additional forces of drag and lift as it interacts with the surrounding air. The drag, or more simply air resistance, slows the ball down. The lift, or Magnus force, acts on a spinning baseball to deflect it in a direction that depends on the spin axis. In particular, a fly ball hit with backspin will have a Magnus force that is primarily in the upward direction, opposing gravity. Still, in an ideal world, we could predict the landing point from the initial velocity, although to do so would involve a more complicated calculation and would require that we have complete knowledge of the drag and lift forces, the latter requiring that we know the spin rate and axis. Unfortunately, we don’t know these things perfectly well, so that will lead to some uncertainty in our predicted landing location. The big question now is how much uncertainty.
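A rough sketch of what adding drag and lift does to the calculation is given below. It is not the model used in this article: it works in two dimensions only (no sidespin, no wind), treats the drag coefficient `cd` and lift coefficient `cl` as constants over the whole flight, and uses assumed values for the ball mass, ball radius, air density, and launch height. The function name and all numerical inputs are illustrative assumptions.

```python
import math

MPH_TO_MS = 0.44704
M_TO_FT = 3.28084
G = 9.81          # m/s^2
RHO = 1.2         # air density in kg/m^3 (assumed)
MASS = 0.1453     # ball mass in kg, about 5.125 oz (assumed)
RADIUS = 0.0369   # ball radius in m, about 9.125 in circumference (assumed)
K = 0.5 * RHO * math.pi * RADIUS**2 / MASS   # aerodynamic factor, 1/m


def fly_ball_range_ft(speed_mph, launch_deg, cd, cl, height_m=1.0, dt=0.001):
    """Distance to field level for a ball with constant drag and lift coefficients.

    Backspin is represented only through cl; the lift force is perpendicular
    to the velocity and points upward when the ball moves horizontally.
    """
    theta = math.radians(launch_deg)
    v0 = speed_mph * MPH_TO_MS
    x, z = 0.0, height_m
    vx, vz = v0 * math.cos(theta), v0 * math.sin(theta)
    while True:
        v = math.hypot(vx, vz)
        # Drag opposes the velocity; lift is the velocity rotated by 90 degrees
        ax = -K * cd * v * vx - K * cl * v * vz
        az = -G - K * cd * v * vz + K * cl * v * vx
        x_new, z_new = x + vx * dt, z + vz * dt   # simple Euler step
        vx, vz = vx + ax * dt, vz + az * dt
        if z_new <= 0.0:
            # Interpolate linearly to the point where the ball reaches the ground
            frac = z / (z - z_new)
            return (x + frac * (x_new - x)) * M_TO_FT
        x, z = x_new, z_new


if __name__ == "__main__":
    for cd, cl in [(0.0, 0.0), (0.35, 0.0), (0.35, 0.2), (0.40, 0.2)]:
        print(cd, cl, round(fly_ball_range_ft(100.0, 26.0, cd, cl)))
```

Running it for a few assumed coefficient pairs shows the qualitative behavior described above: drag shortens the flight considerably relative to the gravity-only case, and lift from backspin gives some of that distance back.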

In principle, the question I have posed is an easy one to answer, given the availability of new technologies in MLB ballparks for tracking the baseball. One simply looks at the distribution of landing points for balls hit within a narrow range of batted ball speeds and angles. Such an analysis would be particularly well suited for venues in which TrackMan is installed, since the initial velocity vector, landing location, and hang time are readily provided by that system. The analysis is also well suited for venues with FIELDf/x installed, so that HITf/x provides the initial velocity and FIELDf/x provides the landing point and hang time. Unfortunately the data from neither system is publicly available, so I must resort to another technique, which I will now describe.

My technique uses data from two different sources. First I have the precise landing location and hang time for every home run hit in the majors during the 2009 and 2010 seasons, courtesy of Greg Rybarczyk’s ESPN Home Run Tracker. Second, for 8803 of these home runs, I have the location of the ball-bat impact and the initial velocity vector, courtesy of Sportvision’s HITf/x. Armed with the initial position and velocity, an aerodynamics model is fine-tuned to reproduce the landing point and hang time, with the result being that the entire trajectory can be reconstructed to a high level of accuracy. The trajectory can then be extrapolated to find the total distance the ball would have traveled had it eventually reached field level. It is the extrapolated distance—call it R—that I now want to investigate to see how well it is determined by the initial velocity vector.

But first a digression to bloviate a bit about this technique. There are three parameters in my model that are adjusted to fit the data: an average drag coefficient Cd plus two components of spin, wb and ws. The drag coefficient governs how rapidly the ball is slowing down as it moves through the air. The sidespin ws determines how much the ball deflects horizontally from its initial spray angle (hook or slice) during its flight. The backspin wb acts along with gravity to determine the hang time. Given three pieces of information (the x, y, z coordinates of the ball at the hang time) plus the initial velocity vector, it is always possible to find a solution, i.e., the corresponding Cd, wb, and ws. Once those parameters are known, the full trajectory can be calculated to a very high level of accuracy, including the extrapolation to field level to find the total distance R.

The validity of this technique has been verified with dedicated experiments I conducted a few years ago using a portable TrackMan device. Since TrackMan measures the full trajectory, one can compare it directly with the one determined by the technique I have described. It works remarkably well. Simulations confirm that finding and show that, once the initial velocity, the landing point, and hang time are specified, there is very little wiggle room left over for determining the rest of the trajectory. The technique is very powerful and one that I have utilized many times for baseball analysis.

Okay, end of my digression and back to the analysis. The first figure is a histogram of distances R for a small subset of home runs with an initial speed between 99 and 101 mph and an initial launch angle between 24° and 28°. I will further restrict the analysis to homers for which the air density, which depends on temperature and elevation and affects the air resistance, is confined to a ±2% range. The blue-hatched plot includes all 281 home runs satisfying my conditions. There is considerable breadth to the R distribution, extending from below 380 ft to above 430 ft, with a mean of 403.9 ft and a standard deviation of 16.1 ft. There is nothing very special about the speed or launch angle range I chose for the analysis, except that they were the most common values in the data set. Suffice it to say that my conclusion does not depend on those values and persists throughout the full data set. We are forced by the data to the following conclusion:

The initial velocity poorly determines the landing point.

That is the primary conclusion of my investigation. The remainder of this article will focus on three possible reasons why R is so poorly determined by the initial velocity:

variation in wind, drag coefficient, and backspin.

First let’s talk about wind. In my analysis, wind was not included in the fitting procedure. Instead I looked at events in covered stadiums, including retractable roof stadiums with the roof closed, for which wind is not an issue. Restricting the analysis to these stadiums reduces the sample size to 49 events, for which the distribution is given by the red histogram. Statistically speaking, this distribution looks identical to the previous one, with a mean of 405.5 ft and an identical standard deviation of 16.1 ft. I conclude that wind is not the primary contributing factor to the breadth of R.
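One quick way to check that the two distributions are statistically indistinguishable is to compare the difference in means with its standard error. The sketch below uses only the means, standard deviations, and sample sizes quoted above. It treats the two samples as independent, which is a simplifying assumption, since the covered-stadium events are a subset of the full sample. The function and variable names are illustrative.

```python
import math


def se_of_mean_difference(sd_a, n_a, sd_b, n_b):
    """Standard error of the difference between two sample means.

    Assumes the two samples are independent (a simplification here).
    """
    return math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)


all_parks = {"mean": 403.9, "sd": 16.1, "n": 281}   # full sample
covered = {"mean": 405.5, "sd": 16.1, "n": 49}      # roof closed or domed

se = se_of_mean_difference(all_parks["sd"], all_parks["n"],
                           covered["sd"], covered["n"])
diff = covered["mean"] - all_parks["mean"]
print(f"difference = {diff:.1f} ft, standard error = {se:.1f} ft, "
      f"ratio = {diff / se:.2f}")
```

Under that assumption, the 1.6 ft difference in means is well under one standard error (about 2.5 ft), so the shift is too small to distinguish from sampling noise.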

Now let’s take a look at the variation in the other two factors by referring to the second figure, a scatter plot of Cd vs. wb for the 281 homers satisfying the initial conditions. The points are color coded by total distance R, with red, black, and blue corresponding to R<395, 395≤R≤415, and R>415 ft, respectively.

There is a wealth of information contained in this plot, and I will now summarize the essential features and what they teach us.

For a given value of wb, R decreases as Cd increases. This certainly makes good sense physically, since drag is expected to reduce the distance. Furthermore, the spread of Cd values for a given wb suggests a possible variation in drag coefficient from one baseball to another. This is certainly new information and something we would like to know more about. I’ll save that for another day.

For a given value of Cd, R increases as wb increases. Again this makes sense, since larger backspin keeps the ball in the air longer so that it travels farther.

Contours of constant R are roughly parallel diagonal lines on this plot, extending from lower left to upper right, with the higher R lines lying lower. As one climbs along one of these lines of constant R, both Cd and wb increase. However, the tendency of the increased Cd to reduce R is counterbalanced by the tendency of wb to increase R, thereby keeping R constant.

There is a moderately strong positive correlation between Cd and wb, suggesting that the drag on a baseball increases with increasing spin, all other things equal. While such an effect is well known for a golf ball, it has only been speculated for a baseball. While I would not characterize the evidence here as being a smoking gun, it certainly is suggestive. Once again, this is something we’d like to understand better.

So, let’s summarize what we have learned. We have found that the initial velocity vector is not sufficient to determine the total distance traveled, and that is the primary conclusion of this study. We have also found that wind is not the major cause. We have further shown that variation in both the drag coefficient and the backspin accounts for the spread of distance values, although some of the effects—particularly the correlation between drag and spin—are quite subtle. There are suggestions of a spin-dependence as well as a ball-to-ball variation to the drag on a baseball.

I don’t want anyone to get the impression that this is a completed piece of research. It is not. There are some annoying puzzles in the data that I have not completely sorted out yet. For sure, the conclusion about the initial velocity not determining the landing point is very firm and not likely to change with additional data and analysis. However, my conclusions about the reasons are still tentative, and I suspect there is a lot more to be gleaned from data such as these. In particular, it would be very nice to use the techniques I described here to analyze a larger data set that is not restricted only to home runs. I look forward to continuing this research.

Thinking a bit more about the suggestion of ball-to-ball variance in the drag of a baseball, some thoughts / questions:

1) Presumably that variance would result from variations in the height of the seams? Or from differences in the surface of the ball itself? (E.g. rougher ball, more drag)

2) A pitcher would typically gravitate, if given the choice, towards a ball with heightened seams ... but any benefit the pitcher derives from that (e.g. increased break) could be, presumably, counteracted by the additional distance that same ball, when well hit, might travel?

Actually I think it has more to do with variation in the surface roughness rather than the seams. There is some additional evidence for ball to ball variation in drag from pitchf/x and TrackMan data on pitched balls. I discussed this briefly at the 2011 pfx summit. The research is still in progress but it might be a good topic for a future article.

Excellent point. Certainly it is true that the ball really gets mashed when colliding with the bat. And although the ball recovers to its spherical shape shortly after leaving the bat, the resulting damage to the surface of the ball may have an effect on its aerodynamics. But I suspect that is not what is actually happening, given that we see a similar effect with pitched baseballs. My PITCHf/x summit talk from 2011, when I talked about variation in Cd for pitched balls, can be found here: http://webusers.npl.illinois.edu/~a-nathan/pob//ppt/NathanBattedBallAnalysis.ppt, with the relevant stuff on slides 31-35.

I'd love to see that addressed, too. I think it's outside the realm of this study but maybe Alan knows the answer anyway. If delivered at the same velocity, the only difference should stem from whether or not the pitch type impacts the spin rate off the bat, holding the launch angle constant. I suspect that there's enough friction between the bat and ball at impact (on solid contact, anyway) to arrest the incoming spin altogether such that it has no impact on the outgoing spin rate, and thus no effect on distance for a given velocity and launch angle. But that's just a hunch and I've never seen a conclusive answer. Would love to hear what Alan has to say.

Actually, this is a topic that I have investigated experimentally. An account (albeit a technical one from an academic journal) can be found in a paper I wrote last year:
http://webusers.npl.illinois.edu/~a-nathan/pob/ProcediaEngineering34Spin.pdf. I also talked about this topic at the 2010 PITCHf/x summit. You can see a video of the talk here: http://baseball.sportvision.com/summit/archive/2010 (click on my name in the right sidebar).

But here's the bottom line, surprising as it may be:

The spin of a batted ball depends very little on the spin of the pitched ball.

Someday when I get more time, I'll try to put together a less technical article and one more in the style of BPro.

I don't suppose there is a way to check for the freshness of the ball. Balls are usually pretty fresh. After all, a ball is going to have a different outer surface after a grounder than when it was first introduced into play.

What matters for the drag is primarily the density of the air. Viscosity does not directly play a role. The way I like to think about it is that the ball collides with many air molecules, losing a little bit of energy with each collision as it transfers energy to the molecule. When you work out the details, you find that it is the mass density of the air that matters. The air density depends on temperature, elevation, pressure, and relative humidity. The dependence on relative humidity is very small. You can investigate these things for yourself with my "trajectory calculator", http://webusers.npl.illinois.edu/~a-nathan/pob/trajectory-calculator.html. In my analysis, I restricted the home runs I looked at to those with an air density only within +/-2% of a nominal value.

Note: As you said, a higher relative humidity reduces the drag, since a heavier air (nitrogen) molecule is displaced by a lighter water molecule (therefore, lower density).
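For readers who want to see how small the humidity effect is, here is a rough Python sketch of moist-air density from the ideal gas law. It is not the code behind the trajectory calculator. It takes the local (station) pressure as an input, so elevation enters only through that pressure, and it uses the Tetens approximation for the saturation vapor pressure. The function names and example conditions are assumptions for illustration.

```python
R_DRY = 287.05      # specific gas constant of dry air, J/(kg K)
R_VAPOR = 461.495   # specific gas constant of water vapor, J/(kg K)


def saturation_vapor_pressure_pa(temp_c):
    """Tetens approximation to the saturation vapor pressure, in pascals."""
    return 610.78 * 10 ** (7.5 * temp_c / (temp_c + 237.3))


def air_density(temp_c, pressure_pa, rel_humidity):
    """Density of moist air in kg/m^3; rel_humidity is a fraction from 0 to 1."""
    temp_k = temp_c + 273.15
    p_vapor = rel_humidity * saturation_vapor_pressure_pa(temp_c)
    p_dry = pressure_pa - p_vapor
    # Water vapor is lighter than the dry air it displaces, so density drops
    return p_dry / (R_DRY * temp_k) + p_vapor / (R_VAPOR * temp_k)


if __name__ == "__main__":
    dry = air_density(20.0, 101325.0, 0.0)
    humid = air_density(20.0, 101325.0, 1.0)
    warm = air_density(30.0, 101325.0, 0.0)
    print(f"dry, 20 C:      {dry:.4f} kg/m^3")
    print(f"saturated 20 C: {humid:.4f} kg/m^3 ({100 * (humid / dry - 1):.2f}%)")
    print(f"dry, 30 C:      {warm:.4f} kg/m^3 ({100 * (warm / dry - 1):.2f}%)")
```

At 20 C and sea-level pressure, going from completely dry to fully saturated air changes the density by less than 1% in this approximation, while a 10-degree temperature change moves it by a few percent.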

Side spin should not affect the distance in any serious way. It does affect the hook/slice, which affects the spray angle of the landing point. But it has only a small effect on the distance of the landing point from home plate.

Ah, I misunderstood your original question. In the article, I speculated that the drag increases with backspin. If that really is true, then the drag almost surely increases with the total spin, which is the Pythagorean sum of sidespin and backspin. I did not fully investigate this dependence (but will). What I can say is that, generally speaking, the backspin is much larger than the sidespin, so the total spin is probably only a little larger than the backspin.
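As a quick numerical illustration of that last point, the sketch below combines a backspin and a sidespin in quadrature. The 2000 rpm and 500 rpm values are hypothetical, picked only to show the size of the effect when the backspin dominates.

```python
import math


def total_spin_rpm(backspin_rpm, sidespin_rpm):
    """Total spin as the Pythagorean sum of the two perpendicular components."""
    return math.hypot(backspin_rpm, sidespin_rpm)


# Hypothetical fly ball: backspin much larger than sidespin
total = total_spin_rpm(2000.0, 500.0)
print(f"{total:.0f} rpm, {100 * (total / 2000.0 - 1):.1f}% above the backspin alone")
```

With these made-up numbers the total spin is only about 3% larger than the backspin, even though the sidespin is a quarter of the backspin.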

Enjoyed the article. A question, though: Do we know how accurate the data is? For example, when a home run is hit, the distance is always announced - but I wonder what the +/- is on the measurement. Could error in the data be what's making it hard to predict total distance traveled based upon initial velocity of the ball off the bat?

Good question. Certainly, the validity of the analysis depends on the accuracy of the data, including the landing point, hang time, and initial velocity vector. I will defer to Greg R. on the landing point and hang time data, since I got that information from him. I suspect the HITf/x data is good to about 1-2 mph, corresponding to a variation of distance by 5-10 ft, which is much smaller than what is observed.
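To put rough numbers on that comparison, one can ask how much of the observed 16.1 ft spread would remain after removing a 5-10 ft contribution from the speed measurement. The sketch below assumes the two contributions are independent and add in quadrature, and it treats the 5-10 ft figure as if it were a standard deviation; both are simplifying assumptions, and the function name is illustrative.

```python
import math

OBSERVED_SD_FT = 16.1   # spread of R quoted in the article


def remaining_sd_ft(observed_sd_ft, measurement_sd_ft):
    """Spread left over after removing an independent measurement contribution."""
    return math.sqrt(observed_sd_ft**2 - measurement_sd_ft**2)


for measurement_sd in (5.0, 10.0):
    left = remaining_sd_ft(OBSERVED_SD_FT, measurement_sd)
    print(f"measurement {measurement_sd:.0f} ft -> remaining spread {left:.1f} ft")
```

Even at the pessimistic end of the range, more than 12 ft of spread would remain under these assumptions, so measurement error in the initial speed does not by itself explain the breadth of the distribution.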