BP Unfiltered

More Thoughts on Robo-Umps

Perfection is not an attainable goal with any ball-strike detection system. We're going to fall short whether we use human umpires or technology. That simple fact sometimes gets lost when discussing balls and strikes. It's worthwhile to remind ourselves of that up front.

I appreciate some of the questions and comments on the original post about PitchTrax errors in Game 6 of the American League Championship Series. I would like to elaborate a little on what those errors mean for the technology of calling balls and strikes, and for the possibility of alternative technologies. The question is how close we can get to a perfect ball-strike detection system, at reasonable cost, without too much disruption to the game.

First of all, we should not be too quick to throw umpires out the window. Of course, human umpires are going to be subject to biases and mistakes. I'd like to see a thorough program to understand the nature and causes of those biases and mistakes; this is one of my major areas of ongoing research. If it's possible, and I believe it well could be, I’d like to see a training program to help umpires improve: not simply grading them, but giving them feedback in the form most useful for improvement. Bruce Weber discussed this in As They See ‘Em.

One of the reasons I oppose ruling out human umpires a priori as the best solution is that they are less prone than technology to major mistakes. When technology goes wrong, it tends to go wrong spectacularly, as we saw in this game. Human umpires may be a little bit wrong a lot more often than PitchTrax, but they're rarely off by 5-6 inches for a whole game.

As to other possible technological solutions, there may well be some. However, I can't think of any that would address the (mostly minor) problems we have with PITCHf/x without introducing more major problems of their own.

With sensors at home plate, the question is what kind of sensor. One that might be feasible is an optical sensor, i.e., a camera. Putting it at the plate would significantly reduce one challenge PITCHf/x cameras face: knowing the camera's location relative to home plate or some other fixed field reference. However, in order to improve on or aid PITCHf/x accuracy, the camera's alignment would need to be known and accurate to within 1-2 degrees. A camera placed where players are going to step on it and slide into it is bound to have trouble maintaining such tight alignment. Moreover, a camera in that location would have no clear reference point for calibration in its field of view.
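To put the 1-2 degree figure in perspective, here is a quick sketch of the geometry. A pointing error of θ degrees shifts an object's apparent position by about distance × tan(θ). The distances used below (the ball passing roughly 2-3 feet above a camera mounted in the plate) are illustrative assumptions, not measurements.

```python
import math


def position_error_inches(distance_in: float, misalignment_deg: float) -> float:
    """Apparent lateral shift, in inches, of an object distance_in inches away
    when the camera points misalignment_deg degrees off its assumed direction."""
    return distance_in * math.tan(math.radians(misalignment_deg))


if __name__ == "__main__":
    # Assumed distances from a plate-level camera to the ball: 2 and 3 feet.
    for distance_ft in (2, 3):
        for deg in (1, 2):
            err = position_error_inches(distance_ft * 12, deg)
            print(f"{distance_ft} ft away, {deg} deg off: {err:.2f} in")
```

Under those assumptions, 1-2 degrees of misalignment corresponds to roughly 0.4 to 1.3 inches of error, so any more drift than that quickly costs more than an inch.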

Tracking radars would not fit under the plate. In any case, with radars as well as cameras, both the 3-D view and the timing information from tracking the whole trajectory are very valuable, and getting them requires a radar or a camera located farther away from home plate. Having the whole pitch trajectory reduces the error in the first place, not least by helping us identify when it is a pitched ball crossing the plate and not some other object or a baseball traveling in a different direction. It also helps identify errors and calibration problems when they occur, and even farther down the data consumption path, it aids in removing or correcting spurious data.
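As a rough illustration of how the full trajectory helps catch spurious data, the sketch below fits a constant-acceleration curve, z(t) = z0 + vz·t + ½·az·t², to a series of tracked heights and repeatedly drops the worst-fitting point while its residual exceeds a threshold. PITCHf/x trajectories are commonly summarized with a constant-acceleration fit, but the threshold, units, sample data, and drop-the-worst loop here are illustrative assumptions, not Sportvision's actual procedure.

```python
from typing import List, Sequence


def fit_quadratic(ts: Sequence[float], zs: Sequence[float]) -> List[float]:
    """Least-squares fit of z(t) = c0 + c1*t + c2*t**2 (c2 = az/2 under a
    constant-acceleration model), solved via the 3x3 normal equations."""
    s = [sum(t ** k for t in ts) for k in range(5)]
    b = [sum(z * t ** k for t, z in zip(ts, zs)) for k in range(3)]
    # Augmented matrix [A | b] with A[i][j] = sum(t**(i + j)).
    m = [[s[i + j] for j in range(3)] + [b[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coeffs = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        acc = sum(m[r][c] * coeffs[c] for c in range(r + 1, 3))
        coeffs[r] = (m[r][3] - acc) / m[r][r]
    return coeffs


def flag_spurious(ts: Sequence[float], zs: Sequence[float],
                  threshold: float = 0.1) -> List[int]:
    """Indices of samples rejected as inconsistent with one smooth trajectory."""
    keep = list(range(len(ts)))
    flagged = []
    while len(keep) > 3:
        c0, c1, c2 = fit_quadratic([ts[i] for i in keep], [zs[i] for i in keep])
        resid = {i: abs(zs[i] - (c0 + c1 * ts[i] + c2 * ts[i] ** 2)) for i in keep}
        worst = max(resid, key=resid.get)
        if resid[worst] <= threshold:
            break  # every remaining point fits the trajectory well enough
        keep.remove(worst)
        flagged.append(worst)
    return sorted(flagged)


if __name__ == "__main__":
    ts = [i * 0.04 for i in range(11)]                  # seconds (made-up pitch)
    zs = [6.0 - 2.0 * t - 16.0 * t ** 2 for t in ts]    # heights in feet
    zs[5] += 1.0                                        # one spurious detection
    print(flag_spurious(ts, zs))  # prints [5]
```

A single isolated measurement has no such context, which is the advantage of tracking the whole flight rather than only the crossing point.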

TrackMan radar or similar systems could theoretically complement the pitch-tracking data we currently get from PITCHf/x, though the additional cost is difficult to justify on the marginal improvement in plate-location data alone.

Tango and others have mentioned the FoxTrax hockey puck. It was developed by Sportvision, the same company that does PITCHf/x. They put infrared-emitting diodes in the puck and tracked it with infrared sensors placed around the hockey arena. Emitters in the ball (or on the players, for that matter) probably wouldn't gain us any resolution or reliability over what we currently have. That’s my understanding based on conversations with the engineers at Sportvision. Even if emitters did provide additional resolution, placing them in the baseball would be a challenge.

We currently do quite well tracking the ball from the mound to the plate with cameras and tracking the batted balls with radar. Both of those technologies can, and hopefully will, be refined. Whether they will ever be good enough to replace umpires in real time, I don't know.

Mike Fast is an author of Baseball Prospectus.

I'm not sure that we have the technology, but could a strike zone made of a laser grid be possible, at least in theory? For example, a pitch would be a strike if it intersected the grid at some point on its way to the plate.

Are you thinking about something akin to the garage door sensors, where if the ball broke the beams, it would register a strike?

I can see two big obstacles/challenges with that approach. One would be where to put the emitters and sensors. The other would be how to prevent false strikes from registering due to things other than a pitched ball passing through the detection grid.
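One way to attack the false-strike problem is to require two beam planes, one at the front edge of the plate and one at the back, to be broken in the right order and at an interval consistent with a pitched ball. The sketch below is purely hypothetical: the BeamBreak structure, the two-plane layout, and the 60-105 mph speed window are assumptions for illustration, and it ignores harder cases such as a bat breaking the beams at pitch-like speed.

```python
from dataclasses import dataclass

PLATE_DEPTH_FT = 17 / 12          # home plate is about 17 in from front edge to back point
FT_PER_S_PER_MPH = 5280 / 3600    # 1 mph = 1.4667 ft/s


@dataclass
class BeamBreak:
    plane: str     # "front" or "back" (hypothetical two-plane layout)
    t: float       # time of the break, in seconds
    in_zone: bool  # whether the broken beams lie within the batter's zone


def is_pitched_strike(front: BeamBreak, back: BeamBreak,
                      min_mph: float = 60.0, max_mph: float = 105.0) -> bool:
    """Accept a strike only for a front-then-back crossing at pitch-like speed."""
    if front.plane != "front" or back.plane != "back":
        return False
    dt = back.t - front.t
    if dt <= 0:
        return False  # wrong order: something moving toward the mound
    speed_mph = PLATE_DEPTH_FT / dt / FT_PER_S_PER_MPH
    if not (min_mph <= speed_mph <= max_mph):
        return False  # too slow or too fast to be a pitched ball
    return front.in_zone or back.in_zone
```

The speed window does most of the filtering here; a real installation would still face the first obstacle, finding somewhere to mount the emitters and sensors.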

How good a job have we done at using the data we have to evaluate umpires? I've seen some metrics like overall percentage of correct calls, but that doesn't take into account that the close calls are the difficult ones. That's sufficient for comparing umpires to each other, but not for establishing an absolute level of performance.

I ask because I wonder whether something like the system used in tennis would work, where the machines indicate the result for close calls, and people continue to handle the routine calls and can overrule the machine if it is obviously wrong. But to know if that makes sense, we'd need an idea of the error rates of both the machines and the umpires on close calls.
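To make the close-call idea concrete, here is a minimal sketch that scores called pitches overall and then only within a band around the zone edge. It assumes pitch locations in inches relative to the center of the plate, a fixed rectangular zone, and a 2-inch "close" band; all of these, and the sample pitches, are illustrative choices rather than an established metric.

```python
import math
from typing import NamedTuple, Optional, Sequence


class Pitch(NamedTuple):
    x: float             # horizontal location at the plate, inches from center
    z: float             # height at the plate, inches
    called_strike: bool  # the umpire's call


def edge_distance(p: Pitch, half_width: float, bottom: float, top: float) -> float:
    """Signed distance in inches to the zone boundary: positive inside, negative outside."""
    dx = abs(p.x) - half_width
    dz = max(bottom - p.z, p.z - top)
    if dx <= 0 and dz <= 0:
        return min(-dx, -dz)  # inside: distance to the nearest edge
    return -math.hypot(max(dx, 0.0), max(dz, 0.0))  # outside: distance to the rectangle


def call_accuracy(pitches: Sequence[Pitch], half_width: float, bottom: float,
                  top: float, band: Optional[float] = None) -> Optional[float]:
    """Fraction of correct calls, optionally only for pitches within `band` inches of an edge."""
    scored = correct = 0
    for p in pitches:
        d = edge_distance(p, half_width, bottom, top)
        if band is not None and abs(d) > band:
            continue  # not a close call; skip it
        scored += 1
        if p.called_strike == (d >= 0):
            correct += 1
    return correct / scored if scored else None
```

With the band set, an umpire who gets every routine call right but misses borderline pitches no longer looks nearly perfect, and the same close-call rate could be computed for a machine to compare error rates directly.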

We've not yet done anywhere close to all we can do with the data we have for evaluating umpires.

Dan Brooks has great data available with his strike zone map tool at BrooksBaseball.net. Unfortunately, 99% of the time I see that data used, it's simply to count up the ball and strike calls inside and outside of the drawn box.

I suppose that's a mostly legitimate way to evaluate umpires in one sense, but it strikes me as being very divorced from the reality of how the game is played.

Has there been any discussion of the height of the strike zone? Specifically, how well does PITCHf/x (or any technological solution) deal with the variance in batter heights and stances? It seems that whenever I watch the FoxTrax box, it's always the same height, regardless of who is batting.

While I am pretty confident that we can use a device to measure whether a pitch crossed the plate, I'm not sure we are as good at determining whether it was high or low. How is that calibrated from batter to batter, and how much judgment/error is involved in the process?

The quick answer is that for PITCHf/x, the operator watches the center field TV camera view and sets two lines as the batter settles into his stance. The first line is at the hollow of the back knee, and this is used for the bottom of the zone. The second line is set at the belt, and four inches are added to this to get the top of the zone.
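In code, the operator procedure just described might look like the sketch below. The zone bottom is the knee-hollow line and the top is the belt line plus four inches, as described above; the 17-inch plate width is the rulebook dimension. The ball radius, the use of inches, and the rule that any overlap between ball and zone counts as a strike (judged in a single front-view plane) are simplifying assumptions.

```python
import math
from typing import Tuple

PLATE_HALF_WIDTH_IN = 17 / 2
BALL_RADIUS_IN = 1.45  # approximate: regulation circumference is 9 to 9.25 in


def zone_from_operator_lines(knee_hollow_in: float, belt_in: float) -> Tuple[float, float]:
    """Bottom at the hollow of the back knee; top four inches above the belt line."""
    return knee_hollow_in, belt_in + 4.0


def is_strike(x_in: float, z_in: float, bottom_in: float, top_in: float) -> bool:
    """True if any part of the ball (center at x_in, z_in) overlaps the zone rectangle."""
    dx = max(abs(x_in) - PLATE_HALF_WIDTH_IN, 0.0)
    dz = max(bottom_in - z_in, z_in - top_in, 0.0)
    return math.hypot(dx, dz) <= BALL_RADIUS_IN


if __name__ == "__main__":
    bottom, top = zone_from_operator_lines(knee_hollow_in=18.0, belt_in=37.0)
    print(bottom, top, is_strike(9.5, 30.0, bottom, top))
```

Any error in where the operator places the two lines passes straight through to the top and bottom of the zone, which is why those limits are the least reliable part of the system.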

How much judgment/error is involved in the process? A lot. The top/bottom limits produced by this method are not very reliable. There's a better answer to this question that takes more explanation. I'll try to get to it some time.

The other thing to notice is that the top line is not being set according to the rulebook. It's set closer to where the umpires actually call the top of the zone, around 3.4 feet for a typical batter. The rulebook top line is more like 3.7-3.8 feet for a typical batter.

I'm guessing that most umps already know what areas they struggle with, or at least where they differ from their peers. The hard part, I have to believe, is translating that knowledge into what they're seeing on the field.

It would seem we could pretty easily provide umpires real-time feedback via a simple LED indicator built into the device they currently use to track outs and pitch counts.

It would be used solely for their own edification, helping them to correct any systematic biases. As with all professionals, they will resist feedback systems which are (seemingly) punitive rather than instructive.

I think a lot of good could be done by simply taking steps to make PITCHf/x a tool that helps them do their jobs better rather than a way to show fans when they screwed up.

I wonder, though, to what extent the umpires know what areas they struggle with and what causes the struggles. I imagine they know a lot, and I'd love to pick their brains on the subject. Nonetheless, having specific, quantitative records of one's performance can often be more helpful than just a general idea of where one is good or bad. Moreover, we could evaluate across the population of umpires whether certain umpiring techniques are good or bad.

Analyzing and making that sort of information available to the umpires, combined with something like you suggest with an LED indicator for feedback could be a very powerful tool for improvement. I forget the name of the gentleman who advocated for this approach with the Questec data, but his story is covered in Weber's book. Unfortunately, his approach didn't win out at that time.

I agree that an LED or other feedback to the umpires would be advisable. I imagine a system where the umpires have small lights built into their glasses, their mask, or something similar that turn red for definite strikes, green for definite balls, and yellow for too close to call. The umpires would see the lights just after the pitch but wouldn't necessarily have to go with the system's decision, and they would remain available to make the call if there is a problem, such as the system missing a pitch entirely. MLB could evaluate umpires whenever they deviate from the system, determine whether it was a good call (e.g., the system was misconfigured and 5-6 inches off) or a bad call, and follow up with the umpires.
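The red/green/yellow idea maps naturally onto a measured distance from the zone edge plus an uncertainty margin. A minimal sketch, assuming the tracking system reports a signed distance (positive inside the zone) and that a one-inch margin is a reasonable stand-in for its measurement error; the names and the default margin are hypothetical.

```python
from enum import Enum


class Light(Enum):
    RED = "definite strike"
    GREEN = "definite ball"
    YELLOW = "too close to call"


def light_for_pitch(edge_distance_in: float, uncertainty_in: float = 1.0) -> Light:
    """edge_distance_in: signed distance (inches) of the ball from the zone edge,
    positive inside the zone, negative outside. uncertainty_in: assumed tracking error."""
    if edge_distance_in > uncertainty_in:
        return Light.RED
    if edge_distance_in < -uncertainty_in:
        return Light.GREEN
    return Light.YELLOW
```

Widening the margin when the system is less sure of itself would push more pitches into yellow, leaving them to the umpire's judgment.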

I imagine you'd have an off-field umpire manning the machine, doing things like setting the strike zone height and making sure the system is configured correctly and working. I also imagine that over time we could get better technology to make the cameras work better, too. Something like a stand of 5 posts that projects the plate up five feet, with markers at each foot, could be used to configure the cameras. If you know where each marker is supposed to appear in each camera's view, the umpire and/or grounds crew could recenter the cameras before the start of the game, and periodically between innings if needed (sort of like sweeping the plate, except to keep the cameras working).
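The marker check could be automated along these lines: compare where each marker is expected to appear in a camera's image with where it is actually detected, estimate the average shift, and flag the camera for recentering if the shift exceeds a tolerance. This sketch is hypothetical: the pixel coordinates and the 2-pixel tolerance are assumptions, and it models only a pure shift, so it also reports how much of the disagreement a shift fails to explain (a hint of rotation or a zoom change).

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates


def mean_shift(expected: List[Point], observed: List[Point]) -> Point:
    """Average (dx, dy) between expected and detected marker positions."""
    n = len(expected)
    dx = sum(o[0] - e[0] for e, o in zip(expected, observed)) / n
    dy = sum(o[1] - e[1] for e, o in zip(expected, observed)) / n
    return dx, dy


def check_camera(expected: List[Point], observed: List[Point],
                 tolerance_px: float = 2.0) -> Tuple[bool, Point, float]:
    """Return (needs_recentering, shift, worst marker error left after removing the shift)."""
    dx, dy = mean_shift(expected, observed)
    worst = max(math.hypot(o[0] - e[0] - dx, o[1] - e[1] - dy)
                for e, o in zip(expected, observed))
    return math.hypot(dx, dy) > tolerance_px, (dx, dy), worst
```

A large leftover error after removing the shift would suggest the camera needs a full recalibration rather than a simple recentering.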

"Perfection is not an attainable goal with any ball-strike detection system. We're going to fall short whether we use human umpires or technology."

To see how counter-productive this statement is, consider the following: "Perfection is not an attainable goal with any forecasting system. We're going to fall short whether we use statistical methods or astrology."

Perfectly true statements, but they add (at best) nothing to a discussion of the relative methods of statistical forecasting and astrology.

If there is an argument in favor of human umpires, it does not begin this way.

"The question is how close we can get to a perfect ball-strike detection system, at reasonable cost, without too much disruption to the game."

No. The question is how much better than human umpires we can get, at reasonable cost, without too much disruption to the game. We measure success by how much better off we are, not by how close to perfection we get.

@Fantasyking (and reply): The question of strike zone adjustment to batter height and stance is usually raised as a barrier to automated ball and strike calls, but I never hear anyone mention that there's very little evidence about how well (or poorly) human umpires do this. The fuzziness in the top boundary of the strike zone is well known, and I see no evidence (with the possible exception of Rickey Henderson) that umpires ever take into account a hitter's normal batting stance in their calls.

Finally, a question: I'm clearly not understanding why calibration is considered such a problem. Any three cameras can triangulate a position. Home plate doesn't move. How hard is it to sight on reference objects placed at the corners of the plate, at known heights, just prior to game time? That would correct for any changes in camera orientation since the last calibration. If the positional assessment is accurate to an inch or two, but the orientation is a concern, wouldn't that fix the problem? What am I missing here?
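For reference, the geometry of triangulation is simple once each camera's position and pointing direction are known: each camera turns the ball's image into a ray, and the ball sits where the rays (nearly) cross. Two rays are enough to locate a point; extra cameras add redundancy. The sketch below returns the midpoint of the closest approach between two rays along with the gap between them; a gap that grows over a game would be one symptom of the orientation drift that marker-based calibration is meant to correct. The coordinates are made up, and a real system would also have to handle lens distortion, timing, and other effects not modeled here.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def triangulate(p1: Vec, d1: Vec, p2: Vec, d2: Vec) -> Tuple[Vec, float]:
    """Closest-approach midpoint of rays p1 + s*d1 and p2 + t*d2, plus the gap between them."""
    w0 = tuple(a - b for a, b in zip(p1, p2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = tuple(p + s * k for p, k in zip(p1, d1))  # closest point on ray 1
    q2 = tuple(p + t * k for p, k in zip(p2, d2))  # closest point on ray 2
    mid = tuple((u + v) / 2 for u, v in zip(q1, q2))
    return mid, math.dist(q1, q2)
```

With perfectly calibrated cameras the gap is essentially zero; if one camera has been bumped, its ray misses the other and the gap reveals it, even though a midpoint is still returned.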

I frankly don't see any difference between "how much better than human umpires" and "how close to perfection." The answer should be exactly the same either way.

I'm not arguing in favor of human umpires. I'm arguing that everything needs to be compared against the same baseline, which is how they perform in practice. People seem to want to castigate human umps because they're not 100% perfect. The minute a technological solution, e.g., PITCHf/x, is shown to be less than 100% perfect, people are looking for replacements. I'm trying to remind everyone that the question is which reasonably applicable system is best over time, not which one never makes a mistake.

I've done quite a bit of investigation into how well both the umpires and the PITCHf/x operators do at the top and bottom lines. I have more investigation to do before I'm ready to present those results. However, my sense is that the umpires generally do better than the PITCHf/x operators. Umpires definitely call zones that take into account the vertical size of the hitter's zone based on his stance. I understand skepticism of my claims on this until I present my evidence.

I don't believe calibration of the PITCHf/x system is an intractable problem. Sportvision has a procedure for doing it that involves something similar to what you suggest: putting markers on the field. For practical reasons, that's not something they currently do before every game. I imagine if their system were being used to replace the umpire, the money, personnel, and unfettered access to the field would be available to make that happen on a daily basis. Right now that happens at best once per homestand, and usually less often than that.

I certainly agree that everything needs to be compared against how umpires really perform in practice. We're on the same page there. Can't wait to see your research.

I do still see a difference between "how much better than X is Y for our purposes?" and "how much closer to perfection is X than Y?". The ranking of X and Y is the same for both questions, but the importance of the absolute difference (or the perception of it) is not. People tend to rate the importance of a difference as a percent of the scale. If X falls 82% shy of perfection and Y falls 79% shy, that looks like a pretty small difference -- until you realize that you're talking about slugging average, and perfection is a 4.000 ...
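Spelled out, the slugging example works like this, taking a perfect slugging average as 4.000:

$$
\text{SLG}_X = 4.000 \times (1 - 0.82) = 0.720, \qquad \text{SLG}_Y = 4.000 \times (1 - 0.79) = 0.840
$$

A three-point gap on the percent-of-perfection scale is a 120-point difference in slugging average.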