Evaluating defense using HITf/x

This is a look at what’s possible, not a serious attempt at a defensive evaluation metric. We’ll get there someday (and hopefully by someday I mean “some day this month”), just not today.

Our own Harry Pavlidis has the best look I’ve seen so far at the sheer depth of the preview HITf/x data we’ve been given courtesy of Sportvision. It’s the most data that I’ve seen made available to the public about what happens to a batted ball after it leaves the bat. But how do we get from there to an evaluation of defense?

What we knew about batted balls before HITf/x

The answer is, not very much. Typically data providers put a batted ball into one of four buckets:

Ground ball

Line drive

Fly ball

Pop-up

This is simply not very descriptive for our purposes, as I’ve stated before. And if that’s not bad enough, different data providers often don’t agree on the difference between a fly ball and a line drive. For example, is a Texas Leaguer over the infield into shallow right a fly ball or a line drive? What if the outfielder is able to race up and snag it? As best we can tell, the former is more likely to be called a line drive and the latter a fly ball, even if they follow the exact same flight path.

What we want to know about a batted ball

That’s simple. To evaluate the play of an outfielder, we would preferably know the following about batted balls hit to the outfield:

What direction the ball is hit.

How far the ball is hit.

How fast it gets there.

Can we get there from what we have available to us via HITf/x?

Right now, the answer is: sorta. I took a look earlier in the week, and what we have right now is the angle (horizontal and vertical) as well as the speed off the bat of each batted ball. What we don’t have is spin. How important is the spin? Here’s an example of the path of a batted ball, launched at 35 degrees with an initial velocity of 95 mph:

The blue line is the path the ball would take if there was no spin; the red line is the path the ball would take if there was 2000 rpm of backspin. With spin, the ball travels almost 50 additional feet, and stays in the air about a second and a half longer. That’s a significant difference.

(This of course only takes into account the spin of the ball along the flightpath, ignoring any spin to the sides. Sidespin is of course very important to the path of a batted ball—picture a long, deep drive that you just know would be a home run, if it wasn’t slicing into the stands and ending up as a much less exciting foul ball.)
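For anyone who wants to experiment with the effect of backspin, here is a minimal sketch of a two-dimensional flight model in Python. It is not the model behind the chart above: the constant drag coefficient, the air density, the one-meter contact height, and the bilinear lift formula in lift_coefficient are all assumptions chosen for illustration, and the function names are hypothetical. It handles backspin only, not sidespin.

```python
import math

G = 9.81             # gravity, m/s^2
MASS = 0.145         # baseball mass, kg
RADIUS = 0.0366      # baseball radius, m
RHO = 1.2            # air density, kg/m^3 (assumed)
CD = 0.35            # drag coefficient (assumed constant)
AREA = math.pi * RADIUS ** 2
K = 0.5 * RHO * AREA / MASS   # shared factor in the drag and lift terms
MPH_TO_MS = 0.44704
M_TO_FT = 3.28084


def lift_coefficient(spin_rpm, speed):
    """Assumed bilinear lift model in the spin factor S = r * omega / v."""
    if spin_rpm == 0 or speed == 0:
        return 0.0
    omega = spin_rpm * 2 * math.pi / 60      # rpm -> rad/s
    s = RADIUS * omega / speed
    return 1.5 * s if s <= 0.1 else 0.09 + 0.6 * s


def simulate(speed_mph, launch_deg, backspin_rpm, y0=1.0, dt=0.001):
    """Return (distance_ft, hang_time_s) for a flight with drag and backspin lift."""
    v0 = speed_mph * MPH_TO_MS
    vx = v0 * math.cos(math.radians(launch_deg))
    vy = v0 * math.sin(math.radians(launch_deg))
    x, y, t = 0.0, y0, 0.0
    while y > 0.0:
        v = math.hypot(vx, vy)
        cl = lift_coefficient(backspin_rpm, v)
        # Drag opposes the velocity; backspin lift is perpendicular to it.
        ax = -K * CD * v * vx - K * cl * v * vy
        ay = -G - K * CD * v * vy + K * cl * v * vx
        vx += ax * dt
        vy += ay * dt
        x += vx * dt
        y += vy * dt
        t += dt
    return x * M_TO_FT, t


for rpm in (0, 2000):
    dist, hang = simulate(95, 35, rpm)
    print(f"{rpm:>4} rpm: {dist:.0f} ft, {hang:.2f} s")
```

Comparing the two printed lines should show the ball with backspin carrying farther and hanging longer, though the exact gap depends heavily on the assumed coefficients.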

Can we estimate spin? There has been some helpful progress made in this regard, almost entirely by people who aren’t me. (I hope to learn more at this weekend’s PITCHf/x Summit.) Until then, we’re left with an imperfect picture of the flight of a batted ball.

What we can tell from an imperfect picture

First, we’ll look at the effect of flight time on DER. (This differs from the chart earlier in the week in that 2000 rpm of backspin were included in the estimates.)

Time (s)    DER
0.0         0.839
0.5         0.674
1.0         0.527
1.5         0.545
2.0         0.466
2.5         0.196
3.0         0.109
3.5         0.412
4.0         0.640
4.5         0.718
5.0         0.881
5.5+        0.964

And from another point of view, we’ll look at distance in feet travelled:

Distance (ft)    DER
0                0.851
50               0.761
100              0.704
150              0.640
200              0.467
250              0.609
300              0.695
350              0.601
400              0.353

Obviously there is some substantial overlap between the two; the correlation between time and distance is very robust.
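Tables like these come from sorting balls in play into buckets and taking the share converted into outs in each bucket. Here is a minimal sketch in Python, assuming each ball in play is a dict with hypothetical hang_time, distance, and is_out fields. The sample rows are made up for illustration and are not HITf/x data.

```python
from collections import defaultdict


def der_by_bucket(balls, key, width):
    """Bucket balls in play on `key` in steps of `width`.

    Returns {bucket_floor: (DER, count)}, where DER is outs / balls in play.
    """
    outs = defaultdict(int)
    total = defaultdict(int)
    for ball in balls:
        bucket = (ball[key] // width) * width   # floor to the bucket's lower edge
        total[bucket] += 1
        if ball["is_out"]:
            outs[bucket] += 1
    return {b: (outs[b] / total[b], total[b]) for b in sorted(total)}


# Made-up sample rows, for illustration only.
sample = [
    {"hang_time": 0.3, "distance": 40.0, "is_out": True},
    {"hang_time": 0.4, "distance": 45.0, "is_out": False},
    {"hang_time": 2.6, "distance": 210.0, "is_out": False},
    {"hang_time": 2.9, "distance": 240.0, "is_out": False},
    {"hang_time": 4.6, "distance": 310.0, "is_out": True},
    {"hang_time": 4.8, "distance": 330.0, "is_out": True},
]

for bucket, (der, n) in der_by_bucket(sample, "hang_time", 0.5).items():
    print(f"{bucket:>5.1f}  {der:.3f}  (n={n})")
```

The sketch returns the count in each bucket alongside the rate, since a DER built on a handful of balls means very little. It does not fold the longest flights into an open-ended top bucket like the 5.5+ row above.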

What we still need

So how do we get from here to a defensive metric? The first thing we need is the direction the ball is hit laterally, which HITf/x helpfully provides. The next thing we need is an idea of who was on the field when each batted ball is struck. This can presumably be parsed from the Gameday XML data that is freely available.

Probably the biggest thing we are missing is just more HITf/x data. That’s necessary to establish a baseline to compare a fielder against: the more data we have, the more finely we can slice it and the more precision we get in each slice.
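To put a rough number on that, one can treat each ball in play in a bucket as an independent trial with the same chance of becoming an out (an assumption that real batted balls will not satisfy exactly) and use the binomial standard error. A minimal sketch in Python, with a hypothetical function name:

```python
import math


def der_standard_error(der, n):
    """Binomial standard error of an out rate `der` observed over `n` balls in play."""
    return math.sqrt(der * (1.0 - der) / n)


# How the uncertainty on a 0.700 DER shrinks as a bucket fills up.
for n in (25, 100, 400, 1600):
    print(n, round(der_standard_error(0.7, n), 3))
```

Under that assumption, cutting the uncertainty in half takes four times as many balls in play in the bucket.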

And of course, as mentioned above, our estimates of the flight path of a batted ball can improve. But we now are a lot closer to having that sort of information than we ever were before.

Simply having better data won’t solve this problem by itself, but it will give us a powerful new set of tools in at least finding the right questions to ask. I’m very, very excited about this, and I hope you are, too.

I am still learning an awful lot about this myself, and I plan to learn a lot more this weekend. Hopefully I’ll be recovered enough by this time next week to pass on what I’ve learned. We may not have a defensive metric that uses HITf/x yet, but we’re very close, and I’m confident we will soon.

As an aside that only a few of you will care about – I do believe I’ve figured out how to parse the required data about who was playing what position from the Gameday XML data provided. I am not, however, certain of this. As of this writing, the final query to put the data all together has been running for a solid hour and likely will not be done for a while longer. If I am correct and the data checks out I will be more than happy to share it with interested parties.

Comments

Colin – I am glad you are going to make it to the Summit this year. I look forward to meeting you. Last year I worked with the Sportvision raw footage for 2 games in an attempt to manually calculate the Hit f/x parameters and also to try and estimate spin and hit ball landing location from those parameters. I was successful in showing that Hit f/x parameters could be calculated from the existing footage, a result that I presented at last year’s Summit. I was unsuccessful at the latter task, my conclusion being that there was too much variation in the spin characteristics and not enough input information to be able to make an accurate estimate. Dr. Nathan and Sportvision have come to the same conclusion. But everyone is still trying, so there may be some further news this weekend.

You are on the right track with your effort to establish the limits that different spin rates might reasonably put on a hit ball. However, last year when I calculated spin rates on hit balls by matching their input parameters and the best guess of their landing location using an average of MLB Gameday, STATS, my observations from video, and Greg Rybarczyk’s observations from video, I found spin rates that were over 3000 RPM, which extends the possible landing points of your graph considerably. I concluded that until we get full path tracking of hit balls from Sportvision’s proposed future Fielding f/x, we will not be able to pinpoint a landing location from the Hit f/x parameters that is any more accurate than that given by Gameday’s hit locations.

That does not mean that the Hit f/x parameters cannot be used to improve existing fielding metrics. My talk for the Summit will be specifically on that subject and I will publish a follow up Skill Based Fielding Metric to my Skill Based Batting and Pitching Metrics that will include the improvements that I will discuss on Saturday.

Just curious, why can’t you just use a metric of where the ball lands and how long it takes to get there? Wouldn’t that take all the guesswork out of the whole classification of fly balls, liners, etc.?

It seems to me that with the advanced tracking systems we have we can plot every point on a given ballpark, and also calculate pretty accurately how long a ball takes to land on that area, and from that we should be able to determine how often such a ball becomes a single, double, out, etc. From there it can easily be determined the average value of a ball hit with such a placing/timing ratio and therefore the expected value to the hitter/pitcher/fielder depending on the actual result.

Why worry about spin? Why not just break it down to the basics and go from there?

Actually, I’ll have quite a bit to say about spin at the summit on Saturday. By combining hitf/x and hittracker data, I can back out the two spin components (backspin and sidespin). I am finding backspin values for home runs looking more or less like a normal distribution, centered at about 2000 rpm and with an rms of about 600 rpm. I was not successful in finding a simple relationship between the backspin and the initial velocity magnitude or direction. The backspin more or less increases with launch angle, but there is a lot of scatter about the general trend. I found some very interesting results for the sidespin. The sidespin distribution is very dependent on the spray angle, as you might expect and there is much less scatter about the general trend than I find for the backspin. More about all this on Saturday.

Peter, for some reason I remember reading an article lately that I thought said that obstacle had been overcome and some company was actually tracking that information. If I come across it, I’ll post it here.

jedlovec3 – I can’t speak to the BIS data specifically. When it comes to hit location data, all I’ve ever had direct access to is the Retrosheet and Gameday data. That said, I may have mis/overstated my claims there, even as regards that data.

Re: DaninPhilly—From my perspective, the issue is how well we can predict hang time and landing point from hitf/x data alone. As Colin pointed out, to do that requires some knowledge of the spin of the batted ball. Of course, if we have the full trajectory or even just the landing point and hang time, the issue of spin is a moot one from the point of view of baseball analysis (although still an interesting issue from the point of view of baseball physics).

DaninPhilly – No one is currently mapping where a ball lands. MGL has a project to determine hang time but the information will not be publicly available. Eventually we will have access to both pieces of information, perhaps in the next year or two, and, as you suggest, that will be all we need to know. What people are talking about here is how to do the best we can with what information we have now.

I have suspected that the Rays (and possibly some other teams) have used vectorizing of the flight of the ball, which the game tracking information provides, and not hit charts to realign their outfield defenses. By measuring both the eventual landing site of a hit and time of flight, from multiple years of data at the Trop, they have calculated the position for the outfielders which maximizes the probability of catching any fly ball, weighted by how much value the hit would have should it drop in. This has allowed them to bring their outfielders in to catch the more frequently occurring, quickly falling single and thus spend less time defending against the less frequently occurring long fly balls. [It also helps that air resistance is quite significant at the sea-level sitting Trop.] While this is fine-and-dandy, the downside is what happened earlier in the year, when some very hard line drives were hit over Upton’s head during key moments near the end of the game. Of course the team was crucified in the papers because “that’s not the way we do things in baseball.” Kind of like not having a closer or why NFL coaches are too conservative on 4th down.

Based on watching a large number of Rays games, nearly every game there is at least one very long fly ball. It looks like it is significantly over Upton’s head and will fall behind him, but the time of flight is long enough that he tracks it down. It helps to have a fast outfielder.

I’m pleased to see baseball start using physics solutions to an applied physics problem.