Baseball ProGUESTus

Scouting with PITCHf/x

Believe it or not, most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.

Adam is the founder of Project Prospect, a scouting and statistical analysis website. He has been writing about baseball since 2006, when he began covering college baseball and conducting quantitative analysis of minor-league prospects.

Data is one of baseball’s purest byproducts. It’s interlaced with the past, present and future. It provides a platform for discussion. And just as the game has an effect on it, it has an effect on the game.

PITCHf/x data, which is part of a new breed of baseball statistics, can be intimidating and overwhelming. But thanks to amazing efforts by MLB Advanced Media, Sportvision (the creator of PITCHf/x), and a growing pool of analysts, PITCHf/x has made a mark on the game. That said, its roots are still shallow, and relatively few players, coaches and on-air personalities have fully embraced it.

"The only time I hear about that stuff is through the media," Tim Lincecum recently told me. "Reporters came to me early this season and said that I'd been throwing about 17% sliders. I hadn't thrown one slider up to that point."

Madison Bumgarner, Lincecum’s teammate, said his biggest concern with PITCHf/x is analysts aggregating data and masking the situational adjustments a pitcher must make.

As a primer for people seeking applications for PITCHf/x, I’ve detailed a few of my findings about how PITCHf/x can be utilized to improve your scouting eye. But first, let’s take a closer look at Lincecum’s concern about pitch categorization.

How PITCHf/x categorizes pitches
MLB.com’s Gameday application delivers near-live data that includes pitch type, speed, and movement information. Pitch types are defined by mathematical models that are built around velocity, spin, and movement. It’s a constantly evolving, sophisticated system.

“When we first started doing real-time classifications, we had one generic neural net [or mathematical model] for all pitchers, but we learned pretty quickly that wouldn’t work because one pitcher’s fastball can approximate another’s changeup,” Cory Schwartz, VP of Stats for MLB.com, explained in an email. “Ultimately, we built a custom neural net for each pitcher and now have one for over 1,100 pitchers.”

In addition to rookie pitchers and their unique arsenals, MLB.com’s models must also be adjusted for pitchers introducing new pitches and tweaking others, which happens regularly. In Lincecum’s case, he cut his slider out of his arsenal for a while, and MLB.com’s mathematical model still thought it saw some.

“It’s an extremely labor-intensive process, but we recognize the importance of accurate classifications, for fans, clubs and industry partners alike, and have invested literally hundreds of man hours into building the most accurate system possible,” Schwartz wrote. “While some pitchers do throw a very distinct repertoire that can be easily classified, many throw multiple pitches that blend together and are extremely difficult to differentiate from other pitch types.”
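Schwartz's point about why one generic model falls short can be illustrated with a toy sketch. This is not MLB.com's neural net: it swaps in a simple nearest-centroid rule, and the pitcher labels, the two features (velocity and vertical movement), and every number are invented for illustration. The idea is only that a soft-tosser's fastball can sit closer to a pooled "changeup" profile than to a pooled "fastball" profile, while a model built for that pitcher alone labels it correctly.

```python
import math

# Hypothetical pitch profiles as (velocity in mph, vertical movement in inches).
# Every value here is made up for illustration, not taken from PITCHf/x.
CENTROIDS = {
    "hard_thrower": {"fastball": (97.0, 10.0), "changeup": (88.0, 4.0)},
    "soft_tosser": {"fastball": (88.0, 8.0), "changeup": (80.0, 3.0)},
}


def nearest_label(pitch, centroids):
    """Return the pitch type whose centroid is closest to the pitch."""
    return min(centroids, key=lambda label: math.dist(pitch, centroids[label]))


def generic_centroids(per_pitcher):
    """Average each pitch type's centroid across all pitchers (one shared model)."""
    pooled = {}
    for arsenal in per_pitcher.values():
        for label, point in arsenal.items():
            pooled.setdefault(label, []).append(point)
    return {
        label: tuple(sum(axis) / len(axis) for axis in zip(*points))
        for label, points in pooled.items()
    }


# A slightly flat fastball from the hypothetical soft-tosser.
pitch = (87.0, 6.0)

generic_guess = nearest_label(pitch, generic_centroids(CENTROIDS))
custom_guess = nearest_label(pitch, CENTROIDS["soft_tosser"])

print("generic model:", generic_guess)
print("per-pitcher model:", custom_guess)
```

With these invented numbers the shared model calls the pitch a changeup and the per-pitcher model calls it a fastball, which is the failure Schwartz describes.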

Harry Pavlidis, founder of Pitch Info LLC, has devoted considerable time to formulating his own PITCHf/x classifications, which now appear at Brooks Baseball. Thanks to the efforts of Schwartz, Pavlidis, and others, pitch classifications have improved dramatically, and I expect them to continue to improve.

A couple of PITCHf/x findings
One of my biggest PITCHf/x projects to date has been creating an algorithm that grades a pitcher’s offerings on the 20-80 scouting scale. The first step of this project was gathering and combing through PITCHf/x data to study variables and compare them with visual data. A handful of scouts have also provided input to my study, particularly with which variables they’d focus on and which they wouldn’t. My objective has been to figure out what makes a pitch a swing-and-miss offering. And I’ve walked away from my initial study with three strong variables.

I’ll get the first one out of the way: velocity. The harder a pitcher throws, the more swing-throughs he tends to get. Glad the data agrees there.

The second is also pretty logical. The variable with the single strongest correlation coefficient—stronger than velocity—for what makes a pitch a swing-and-miss offering is the frequency with which the pitcher throws the pitch. Pitchers with good fastballs tend to throw them a lot. Pitchers with below-average fastballs use them more sparingly. Simple enough. Now let’s get to the juicy finding.

I’ve discussed quick-twitch ability, as it pertains to pitchers, with a number of people in baseball. (Hitters with quick-twitch ability are known for being able to generate elite bat speed.) A major-league pitching coach told me he thought pitcher quick twitch could be measured by spin rates, with faster arms imparting elite spin. This would then be anticipated to result in elite “life” that might not show up in raw velocity. I put his hypothesis to the test.

To my surprise, my research showed virtually no relationship between PITCHf/x spin rates and swing-through percentage. I was later cautioned by a front office member about the analytical value of PITCHf/x’s current spin rates.

But I discovered something stimulating and unexpected nonetheless: the correlation coefficient for vertical fastball movement is very similar to the correlation coefficient for fastball velocity.

Could vertical fastball movement be a way to roughly quantify fastball life? Do fastballs that remain on a relatively linear path get more swing-throughs than fastballs that suffer the effects of gravity more strongly on their way to the plate?

I don’t know how one pitcher could throw a fastball that decelerates less than others of the same velocity on its way to the plate, but maybe it is quick twitch. And perhaps our tendency to privilege starting versus finishing fastball velocity (out of the hand instead of over the plate) is a roadblock in the way of a deeper understanding of the data.

PITCHf/x and scouting
I’ve been researching prospects for the last six years, mixing quantitative data and first-hand scouting to further my understanding of the game. PITCHf/x has helped me create a template of major-league pitchers that I can use to evaluate prospects.

Paired with video, PITCHf/x can be a great tool to learn to recognize pitches. When I’m first studying up on a big-league pitcher, I’ll watch him while I have Gameday and its pitch classifications open. It’s a quick and easy way to learn to identify his pitches and compare him to his counterparts. Remembering what the best pitches in baseball and their supporting data look like makes it easier to know what to look for from prospects and amateurs.

I also checked in with a few scouts—who have the luxury of reviewing minor league PITCHf/x data—to see how they use PITCHf/x in their scouting.

“Anything that provides supplemental information to blend with what we see is valuable,” the first scout said. “We're constantly comparing players to what ‘major-league average’ is, and PITCHf/x data for prospects can be no different.”

“It's a useful tool to obtain objective information on a pitcher to supplement the info we have from our scouts,” said the second scout. “It’s one of the first steps to help objectively measure pitchers in the way a scout would subjectively. As it gets put into more and more minor-league parks, the more valuable the information will be.”

The idea of objectifying major-league average is at the core of the 20-80 scouting scale and similar efforts. PITCHf/x gives fans and scouts alike an opportunity to quantify scouting. As the PITCHf/x database continues to grow and more information from it is studied, templates to objectively evaluate pitchers with data—like the algorithm I’m working on—will be written. The data is too good for it not to head in that direction.
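As a rough illustration of what objectifying major-league average could look like in code, here is a minimal sketch that places one measurement on the 20-80 scale. It is not the algorithm described in this article, whose details are not given here. It assumes the common convention that 50 is major-league average and every 10 points is one standard deviation, and it rounds to the nearest 5. The function name `grade_20_80` and the sample velocities are hypothetical.

```python
from statistics import mean, pstdev


def grade_20_80(value, league_values):
    """Map a measurement onto the 20-80 scale.

    Assumption: 50 = league average and 10 points = one standard deviation,
    with grades clamped to [20, 80] and rounded to the nearest 5.
    """
    mu = mean(league_values)
    sigma = pstdev(league_values)
    if sigma == 0:
        raise ValueError("league_values must not all be identical")
    raw = 50 + 10 * (value - mu) / sigma
    clamped = max(20.0, min(80.0, raw))
    return int(round(clamped / 5.0) * 5)


# Invented average fastball velocities (mph) standing in for a league sample.
league_velocities = [88, 90, 92, 94, 96]

for velo in (88, 92, 96):
    print(velo, "mph ->", grade_20_80(velo, league_velocities))
```

A fuller version would combine several variables, but even a single-variable grade makes the comparison to major-league average explicit.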

Correlation between two variables does not imply causation. In the case of pitch frequency and swing-throughs, visual and anecdotal evidence, in addition to the data, lead me to believe there is a positive relationship between the two.

"To my surprise, my research showed virtually no relationship between PITCHf/x spin rates and swing-through percentage. I was later cautioned by a front office member about the analytical value of PITCHf/x’s current spin rates."

I would imagine there is no relationship because there is so much that affects why a hitter swings and misses. The previous pitches in the at bat. The location of previous and current pitches. How the pitcher has set up the at bat. The count.

My study included thousands of pitchers and compared specific pitch types. Three variables had correlation coefficients of 0.4 or higher when compared to swing-through percentage (frequency, velocity and vertical movement). Spin had a correlation coefficient of almost exactly zero. Yes, there are many variables and situations to account for, but I don't think there's enough noise in the data, after all the attention that has been put into it, to completely dilute any one variable. If the spin rates that Pitchf/x generates correlate with a pitcher's swing-through rates, I'd expect to see some sign of that.

Has anyone tried a similar study, with or without Pitchf/x spin rates, and gotten different results?
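For anyone who wants to try, the correlation coefficients discussed above are straightforward to compute. Below is a minimal sketch of a Pearson correlation between one pitch variable and swing-through percentage. The helper name `pearson_r` and the sample numbers are made up for illustration and are not the study's data; a real comparison would use one row per pitcher and pitch type.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return covariance / (spread_x * spread_y)


# Invented sample: average fastball velocity (mph) and swing-through
# percentage for six hypothetical pitchers.
velocity = [89.5, 91.0, 92.3, 93.8, 95.1, 96.4]
swing_through_pct = [5.1, 6.0, 5.4, 7.2, 6.8, 8.3]

print(f"r = {pearson_r(velocity, swing_through_pct):.2f}")
```

Keep in mind that a coefficient near zero only rules out a linear relationship between the two columns as measured; it says nothing about whether the measurement itself is sound.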

Adding to SaberTJ's comment, I agree that it is important to remember what the data can tell us and what it can't. Two more things that immediately come to mind are distance to home plate at release point and how well a pitcher hides the ball.

Those are both excellent examples of data that Pitchf/x does not capture, Randy.

I almost included Kyle Davies as an example of a pitcher with above-average fastball velocity and vertical movement who still struggles, with my hypothesis for why being his lack of deception -- I posted a Tweet about that when I first started my study with Brooks Baseball data.

There's absolutely more to a fastball, or any pitch, than what's captured by Pitchf/x. But with the data we have now, we're still turning a corner where merely referencing an MLB pitcher's average or peak fastball velocity and using that information alone to deem a pitch above or below average is not utilizing relevant, public information to make an informed judgment.

And translating our thinking to draft prospects and the minors, we're learning more about how to quantify the attributes of a pitch that make it special, beyond raw velocities. We have video to study those attributes. And we can apply some of our knowledge to subjects, even if we don't have Pitchf/x data on them.

A pitch that has a lot of vertical movement necessarily has a lot of spin. It is the backspin that leads to the vertical movement. So, I am puzzled that you see a correlation with vertical movement but none with spin.

Now, I will admit that vertical movement is more likely to lead to swing-and-miss than horizontal movement, given the dimensions of the bat and ball.

"There's absolutely more to a fastball, or any pitch, that what's captured by Pitchf/x."

Not much more. The actual spin is not measured nor is the actual release point. Everything else you would ever want to know about the pitch IS measured (i.e., the full trajectory). The trick is finding the right way to look at the data from a scouting point of view.

Pitchf/x would be hard pressed to record data on deception. How well a pitcher hides the ball, how quickly his arm accelerates, his arm slot, what the rest of his body does during his delivery and how he finishes all play a role in his deception. It's a major aspect of throwing pitches.

Also, we're not going to be able to quantify how close a pitcher was to his target. Some catchers put up early targets and we can approximate how close the pitch was, but others establish a target after the pitch is thrown or signal for a pitcher to throw in a general area. How well a pitcher can locate his pitches plays a huge role in his performance.

And just to clarify my comment about the spin not being measured: It is inferred from the movement, not directly measured. A model is used to make the association between spin and movement. No "gyro" component of the spin is determined from this technique. Whether or not a gyro component is a useful piece of information has yet to be determined (at least not by anyone who is willing to talk about it).

See my new comment below. I had always thought that there is more value in the "pseudo-spin" inferred from the movement. You have said that your MLB contact says there is more value in the total spin (from Trackman, presumably). Hard to argue with someone who has all the data (and I don't).

Having pitched at a high level, I find this very interesting. Unfortunately, there will never be an exact answer, as you are dealing with human beings. There is likely more correlation to strike percentage than spin rate. I like the statistical side of everything, and hate to be old school, but some pitchers just can't make the right pitch in the right location all the time. How many times have you seen a guy with great stuff that always gives up a big hit or walk in a big situation?

The other system is probably TrackMan, which measures the total spin (including any gyro component). The TrackMan people have argued that the total spin is well correlated with swing-and-miss rate for curveballs (see http://sportsillustrated.cnn.com/2011/writers/tom_verducci/04/12/fastballs.trackman/index.html). The comment of your front-office contact that there is more value in the total spin than in the "pseudo-spin" measured by PITCHf/x is very interesting. It would be nice to see the analysis that shows that, but unfortunately the TrackMan data are not publicly available.

One more point: I wrote a little article last year for my web site on using the combination of movement and TrackMan spin to obtain the spin axis. It's a bit technical. But for what it's worth, here's the link:
http://webusers.npl.illinois.edu/~a-nathan/pob//SpinAxis.pdf

Interesting stuff about TrackMan. The findings Verducci talks about with spin and Trackman align with what I expected. This makes me question spin rates for Pitchf/x even more. The article touches on spin rates with fastballs, too (toward the end).

One side note: It's a little surprising that Trackman would spend time conducting studies that use ERA and batting average to evaluate hitters and pitchers. Maybe it was a quick thing someone put together for SI, but you'd hope someone with a data analysis product that's being implemented in baseball would home in on some better variables.

As I said earlier, I regard the spin determined from PITCHf/x to be "pseudospin". It is the spin derived from the movement. I believe the movement that PITCHf/x determines. To use that to determine the pseudospin requires some relationship between movement and spin. I talk about that in the article I referred to earlier. The relationship is known only to about +/-20%. Even if it were known perfectly, this technique only determines the "transverse" spin (i.e., the components of spin perpendicular to the direction of motion). The component along the direction of motion (the "gyro" component) is not determined. I question the value of the latter. I am not saying it is not valuable. I am only saying that its value has not yet been demonstrated to me.
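To make the transverse versus gyro distinction concrete, here is a small sketch that splits a spin vector into the part along the direction of motion (gyro) and the part perpendicular to it (transverse). It is plain vector projection, not the PITCHf/x or TrackMan procedure. The coordinate convention, the function names, and the sample numbers are all assumptions made for illustration.

```python
import math


def magnitude(vector):
    """Euclidean length of a 3-component vector."""
    return math.sqrt(sum(c * c for c in vector))


def split_spin(spin, velocity):
    """Split spin into (transverse, gyro) parts relative to the velocity direction."""
    speed = magnitude(velocity)
    if speed == 0:
        raise ValueError("velocity must be non-zero")
    unit = tuple(v / speed for v in velocity)
    # Signed size of the spin component along the direction of motion.
    along = sum(s * u for s, u in zip(spin, unit))
    gyro = tuple(along * u for u in unit)
    transverse = tuple(s - g for s, g in zip(spin, gyro))
    return transverse, gyro


# Assumed convention: the ball travels along -y toward the plate.
velocity = (0.0, -1.0, 0.0)
# Invented spin vector in rpm: mostly backspin about x, some spin about y.
spin = (1800.0, 600.0, 0.0)

transverse, gyro = split_spin(spin, velocity)
print("total spin:     ", round(magnitude(spin)), "rpm")
print("transverse spin:", round(magnitude(transverse)), "rpm")
print("gyro spin:      ", round(magnitude(gyro)), "rpm")
```

In this invented case only the 1,800 rpm transverse part would show up as movement, so a spin figure inferred from movement would miss the remaining 600 rpm that a total-spin measurement includes.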