The sample includes every fastball of 85 mph or faster that was swung at during the 2008 season, grouped by velocity. This yielded a sample of 38,108 events. There were a number of interesting trends, particularly the correlations between fastball velocity and swing-and-miss percentage, and between velocity and foul ball percentage.

To me, the foul ball percentage was the most interesting, but I’ll let you decide.

Below is a description of the data: velocity is in the first column, followed by the percentage of all swings at each velocity that ended as a swing and miss, a foul, a foul tip, or a ball in play, and finally the total number of events at each velocity.
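For anyone who wants to build the same kind of breakdown, here is a minimal sketch of the tallying step. The event list and the outcome labels are hypothetical placeholders I made up for illustration, not the actual 2008 data or the way the original numbers were produced.

```python
from collections import Counter, defaultdict

# Hypothetical swing events as (velocity in mph, outcome) pairs.
# These are made-up rows for illustration, not the 2008 sample.
swings = [
    (90, "swing_miss"), (90, "foul"), (90, "in_play"), (90, "in_play"),
    (95, "swing_miss"), (95, "foul"), (95, "foul"), (95, "in_play"),
]

def outcome_rates(events):
    """Return {velocity: ({outcome: percent of swings}, total swings)}."""
    by_velocity = defaultdict(Counter)
    for velocity, outcome in events:
        by_velocity[velocity][outcome] += 1

    table = {}
    for velocity, counts in sorted(by_velocity.items()):
        total = sum(counts.values())
        rates = {outcome: 100.0 * n / total for outcome, n in counts.items()}
        table[velocity] = (rates, total)
    return table

for velocity, (rates, total) in outcome_rates(swings).items():
    print(velocity, rates, total)
```

Each row of the result carries the same pieces as the table described above: the velocity, the share of swings ending in each outcome, and the number of events at that velocity.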

Part 1: Velocity versus Swing and Miss Percentage

This one was no big surprise. In essence, the higher your velocity is, the more swings and misses you get.

The correlation coefficient for this data set was 0.89, so there is a strong linear relationship between velocity and the percentage of swings and misses. One thing to notice, however, is that the graph is not completely linear. As the velocity gets above 95 mph, especially with the point at 98 (which, granted, has a small sample size), the graph bends upward in what looks like an exponential relationship. In other words, each additional mile per hour makes it increasingly harder to make contact.

This curvature also lowers the correlation coefficient, even though the graph has a clear upward trend. Remember, a correlation coefficient is a measure of linear relation. When the relationship is exponential, the coefficient understates how tightly the two variables are related.
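To see why curvature drags the coefficient down, here is a small sketch that compares a perfectly linear series with a perfectly exponential one. Both series are synthetic and the growth rates are arbitrary assumptions; neither is the swing-and-miss data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

velocities = list(range(85, 99))

# Synthetic examples only: one straight line, one exponential curve.
linear = [10 + 0.5 * (v - 85) for v in velocities]
curved = [10 * math.exp(0.15 * (v - 85)) for v in velocities]

r_linear = pearson_r(velocities, linear)
r_curved = pearson_r(velocities, curved)

print(round(r_linear, 3), round(r_curved, 3))
```

Both series rise at every step, yet only the straight line reaches a coefficient of 1. The exponential series comes in lower purely because of its shape.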

Still, no surprises, as this was expected.

Part 2: Foul Ball Rates

This graph was particularly surprising. Maybe it's because I had never really given it much thought, but I didn’t think I would find such an interesting trend. Here’s the graph:

For the rate of foul balls per swing, there is a clear upward, linear trend until 95 mph, where the graph falls at a pretty steep rate.

Foul balls are one of the last unexplored realms of baseball statistical analysis. Hopefully Hit F/X will be able to give us some useful data, but until then, I’ll be waiting. Also, why do we only measure foul balls when they are caught by a fielder? Otherwise, they wouldn’t even be counted as a ball in play. There’s a lot we can learn about the batter-pitcher interaction by foul balls, but there is very little information out there. It would be a great leap forward if there were some good studies on foul ball data.

But, back to the graph. Over the full range, there isn’t a strong linear trend because of the graph's parabolic shape. However, the correlation coefficient between 85 and 95 mph is 0.97949, which is an incredibly strong correlation.
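The effect of restricting the range can be sketched the same way. The foul rates below are invented to rise steadily through 95 mph and then fall. They are an assumption for illustration, not the measured rates.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

velocities = list(range(85, 99))

# Invented foul rates: a steady climb through 95 mph, then a steep drop.
foul_rate = [30 + (v - 85) if v <= 95 else 40 - 4 * (v - 95) for v in velocities]

# Correlation over the whole range versus the 85-95 mph portion only.
r_full = pearson_r(velocities, foul_rate)
sub = [(v, f) for v, f in zip(velocities, foul_rate) if v <= 95]
r_sub = pearson_r([v for v, _ in sub], [f for _, f in sub])

print(round(r_full, 3), round(r_sub, 3))
```

The restricted coefficient is much higher than the full-range one, which is the same pattern as the 85 to 95 mph figure quoted above.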

This is a very important point when analyzing the success of soft-tossing pitchers. By getting fewer fouls, pitchers who throw at low velocities are essentially giving away free strikes. The swings that would have been fouls become balls in play instead, while against pitchers at higher velocities, the batter now has one additional strike on them, with a great chance for a strikeout. On top of their low swing-and-miss totals, these low-velocity pitchers have fewer strikes in their favor.

As to why there is a sharp downward trend in the data after 95 mph, I’m not totally sure, though I do have a hypothesis: think of the graph not in terms of foul or non-foul, but in terms of being late on a pitch. While some of these fouls are going to be pulled, the fact that the foul rate tracks velocity suggests that the fouls affected by velocity are the ones the batter is late on. As the velocity goes up, the batter will be late on the pitch to a greater degree. As a result, when the pitch gets beyond 95 mph, batters are no longer late and fouling off the pitch; they are late enough to swing and miss. This probably has something to do with the exponential increase in swings and misses at high velocities.

This may not change the end result of the at-bat too much, as a strike is still a strike whether it's a whiff or a foul. However, higher velocities will produce more 2-strike swings and misses (for a K), while lower velocities will produce longer 2-strike at-bats, since a foul keeps the at-bat alive. The lower velocities will probably have more foul-outs as a result.

Part 3: Ball In Play Rate

This last graph shows the rate of balls in play per swing at each velocity. Again, the data are about where we’d expect, as it's harder to put a ball in play at a higher velocity: the faster the pitch, the fewer balls in play per swing. This speaks volumes as to why low-velocity pitchers struggle in the majors: if batters can put your stuff in play more often, there are more chances for hits and fewer for free outs (strikeouts).

The graph follows a very consistent linear trend from 85 to 97 mph, then drops suddenly at 98+ mph. It is difficult to say why there is a sudden drop, as it could be due to small sample size or to the fact that pitches at those speeds are just so hard to make contact with. It may very well be a mix of both, though the fact that even 97 mph is within the linear trend makes me believe there is a significant sample size component to this issue.

From 85 mph to 97 mph, the correlation coefficient is -0.977, which is another very, very strong correlation. The fact that there is a correlation is not surprising, though the strength of the correlation is quite shocking. I didn’t expect there to be such a substantial correlation.

This study brings out some very interesting trends, as these correlations are very strong. In particular, the relationship between velocity and foul balls (which is probably causal, velocity causing foul ball percentage for the reason explained) is particularly interesting, especially because the issue is rarely discussed. I think this could give us some insight as to the relationship between velocity and pop-up rate, as pop-ups are generally thought to be the result of being late on a pitch, particularly on inside pitches, where it's hard to get the bat head to the ball on time.

In the end, the data seem to back up the reasons why it is so hard to succeed in MLB without fastball velocity: low velocity means fewer Ks and more balls in play. I’ll do more research on this, and I hope to post more next time.

Thanks to TheHardballTimes.com for their contributions to this article.

Mike Silver recently completed his requirements for the Sport Management Major at THE University of Massachusetts-Amherst, where he is a brother of Theta Chapter of Theta Chi Fraternity, the best house in the country. He is a huge Red Sox and Bruins fan, and longs for the days of the REAL Boston Garden, Cam Neely, and the ultimate Dirt Dog Trot Nixon. Aside from StatSpeak, you can find Mike at TheHardballTimes.com and FireBrandAL.com. If you have any questions, you can reach him at mjasilver@gmail.com. Have a good night readers, and know that Mike hopes to hear from you soon. If you quote Mike in an article, please let him know. He’d love to hear it.

I decided to go back to researching pitcher strikeout correlations today, which has been one of my favorite topics of research lately. Now that I have MINITAB, I can generate this stuff a little easier. The interface is nice as well. So, here are a few graphs and correlations. Again, same rules as before: qualified pitchers from 2008.

Today we’ll look at some strong correlations and the regression equation I produced for projecting strikeouts.

It was generated using Fangraphs.com’s plate discipline statistics. There are a few interesting points. The two I liked most were the highest expected strikeouts (C.C. Sabathia: .2659 exp. K percentage, versus .2455 actual K%) and the highest actual strikeouts (Tim Lincecum, .2399 expected K%, .2858 actual K%). Lincecum’s outlandish strikeout totals make him an easy pick for an outlier or a player with substantial error.
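As a rough illustration of the mechanics, here is a one-variable least-squares sketch. The contact and strikeout rates used for the fit are made-up sample points, and the single-predictor model is my simplification, not the regression equation produced in MINITAB. The two residuals at the end use the expected and actual K% figures quoted above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical (contact rate, strikeout rate) pairs, for illustration only.
contact = [0.74, 0.78, 0.80, 0.83, 0.86]
k_rate = [0.27, 0.22, 0.20, 0.17, 0.13]
slope, intercept = fit_line(contact, k_rate)
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))

# Residual = actual K% minus expected K%, using the figures quoted in the post.
reported = {
    "C.C. Sabathia": (0.2659, 0.2455),  # (expected, actual)
    "Tim Lincecum": (0.2399, 0.2858),
}
residuals = {name: actual - expected for name, (expected, actual) in reported.items()}
for name, resid in residuals.items():
    print(name, round(resid, 4))
```

A positive residual means the pitcher struck out more batters than the model expected, which is the case for Lincecum. Sabathia's is negative.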

However, I have a suspicion that the regression line for this problem would fit better as a non-linear relation.

Here are some other correlations that were pulled out of the data. Each correlation is the R value between the given variable and actual strikeout percentage.

Contact Percentage: -0.869

This one took the cake… and not surprisingly, either. If you miss bats, you will get lots of strikeouts. No surprise here.

Swing Percentage: 0.177

This one was a little surprising. I expected the correlation to be much stronger. If you swing more, you will make contact in more at-bats. The correlation is still there, but it is very weak. I would like to investigate this one a little more.

Zone Percentage: 0.065

To me, this one was shocking. I expected there to be at least some meaningful correlation between zone percentage and strikeout percentage. However, it seems that, for the range of zone percentage thrown among MLB pitchers, there is no correlation. Of course, a pitcher who never throws strikes will never strike anyone out, but, for the range in which MLB players throw strikes, it makes no difference. If you want to avoid BBs, pound the zone. If you want strikeouts, I guess it doesn’t matter much.

O-Swing: 0.323

This was another surprising development. I expected this to be much higher: if you can get a batter to chase pitches, he’ll miss more often. The logic is certainly true, but, again, for the range of values among MLB pitchers, there is a weak correlation. Don’t get me wrong, it does matter, just not as much as I had expected.

Z-Swing: -0.052

Another surprise. This may have further implications for the ability of pitchers to get called strikes. How often a hitter swings at pitches in the zone has a negligible relationship with strikeouts. Wouldn’t it seem that a hitter who swung a lot in the zone would make contact more and strike out less? Guess not. Funny how things are sometimes.
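One way to put the five R values above on a common footing is to square them, which gives the share of the variance in strikeout percentage that a straight-line fit on each variable alone would account for. The sketch below only restates the numbers already listed. The variable names are mine.

```python
# R values against actual strikeout percentage, as listed above.
r_values = {
    "Contact Percentage": -0.869,
    "Swing Percentage": 0.177,
    "Zone Percentage": 0.065,
    "O-Swing": 0.323,
    "Z-Swing": -0.052,
}

# R squared: share of variance a one-variable linear fit would explain.
r_squared = {name: r ** 2 for name, r in r_values.items()}

# Print the variables from strongest to weakest relationship.
for name, r2 in sorted(r_squared.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: R = {r_values[name]:+.3f}, R^2 = {r2:.3f}")
```

Seen this way, contact percentage stands far above the rest, while zone percentage and Z-Swing account for well under one percent each.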

That’s all for now. Next time, we’ll continue this study of plate discipline statistics.

Thanks to Fangraphs.com for their contributions to this article.


A reader asked me to see if Jeff Kent is a HOF player (even though he retired way back in January). So of course, I obliged.

To do this, I compared Jeff Kent to all the 2B currently in the Hall. Below are tables that feature WAR, WAR/150, OBP, SLG, wOBA, and EqA. Each table also shows the average of the stat for the selected pool of players, NOT INCLUDING Kent.

Player               WAR
B. Mazeroski        27.0
R. Schoendienst     40.3
N. Fox              44.6
B. Doerr            48.0
T. Lazzeri          48.1
J. Evers            48.3
J. Gordon           54.9
B. Herman           55.5
B. McPhee           57.8
R. Sandberg         61.8
J. Robinson         63.0
F. Frisch           74.7
R. Carew            79.3
C. Gehringer        80.9
J. Morgan          103.5
N. Lajoie          104.1
E. Collins         125.7
R. Hornsby         127.7
Average             69.2
J. Kent             59.4

Player            WAR/150
B. Mazeroski        1.9
R. Schoendienst     2.7
N. Fox              2.8
B. Doerr            3.9
T. Lazzeri          4.1
J. Evers            4.1
B. McPhee           4.1
B. Herman           4.3
R. Sandberg         4.3
R. Carew            4.8
F. Frisch           4.9
C. Gehringer        5.2
J. Gordon           5.3
J. Morgan           5.7
N. Lajoie           6.3
E. Collins          6.7
J. Robinson         6.8
R. Hornsby          8.5
Average             4.8
J. Kent             3.9

Player              OBP
B. Mazeroski        0.299
R. Schoendienst     0.337
R. Sandberg         0.344
N. Fox              0.347
B. McPhee           0.355
J. Evers            0.356
J. Gordon           0.357
B. Doerr            0.362
B. Herman           0.367
F. Frisch           0.369
T. Lazzeri          0.380
N. Lajoie           0.380
J. Morgan           0.392
R. Carew            0.393
C. Gehringer        0.404
J. Robinson         0.409
E. Collins          0.424
R. Hornsby          0.434
Average             0.373
J. Kent             0.356

Player              SLG
J. Evers            0.334
N. Fox              0.363
B. Mazeroski        0.367
B. McPhee           0.372
R. Schoendienst     0.387
B. Herman           0.407
J. Morgan           0.427
R. Carew            0.429
E. Collins          0.429
F. Frisch           0.432
R. Sandberg         0.452
B. Doerr            0.461
J. Gordon           0.466
T. Lazzeri          0.467
N. Lajoie           0.467
J. Robinson         0.474
C. Gehringer        0.480
R. Hornsby          0.577
Average             0.433
J. Kent             0.500

Player              wOBA
B. Mazeroski        0.293
N. Fox              0.324
R. Schoendienst     0.334
J. Evers            0.344
B. McPhee           0.354
R. Sandberg         0.355
B. Herman           0.362
R. Carew            0.370
B. Doerr            0.377
J. Gordon           0.377
F. Frisch           0.377
J. Morgan           0.382
T. Lazzeri          0.386
N. Lajoie           0.399
C. Gehringer        0.404
J. Robinson         0.412
E. Collins          0.414
R. Hornsby          0.459
Average             0.374
J. Kent             0.366

Player              EqA
B. Mazeroski        0.244
N. Fox              0.253
R. Schoendienst     0.258
B. McPhee           0.258
J. Evers            0.269
F. Frisch           0.274
B. Doerr            0.278
T. Lazzeri          0.284
B. Herman           0.284
R. Sandberg         0.284
J. Gordon           0.287
C. Gehringer        0.289
R. Carew            0.301
J. Robinson         0.308
N. Lajoie           0.309
E. Collins          0.311
J. Morgan           0.314
R. Hornsby          0.337
Average             0.286
J. Kent             0.292

According to Tom Tango, the odds of someone making the HOF when their WAR is in the 50s are 49%. Jeff Kent is sitting on a WAR of 59.4. If you vote based on precedent, Kent is obviously a HOF’er, as Mazeroski and Schoendienst are in the HOF. If you vote on production, Kent should fall just short. In reality, he has a 50/50 chance, as Tango’s odds show. While he compares favorably to the pool with respect to SLG and EqA, he is below average when it comes to WAR, WAR/150, wOBA, and OBP.
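For readers who want to check the comparison themselves, here is a short sketch that recomputes the pool's WAR average from the values in the first table and lines Kent up against the listed averages. All numbers are copied from the tables. The variable names are my own.

```python
# Career WAR for the 18 Hall of Fame second basemen, from the WAR table.
pool_war = [27, 40.3, 44.6, 48, 48.1, 48.3, 54.9, 55.5, 57.8,
            61.8, 63, 74.7, 79.3, 80.9, 103.5, 104.1, 125.7, 127.7]
war_average = sum(pool_war) / len(pool_war)

# Pool averages (Kent excluded) and Kent's figures, as listed in the tables.
pool_average = {"WAR": 69.2, "WAR/150": 4.8, "OBP": 0.373,
                "SLG": 0.433, "wOBA": 0.374, "EqA": 0.286}
kent = {"WAR": 59.4, "WAR/150": 3.9, "OBP": 0.356,
        "SLG": 0.500, "wOBA": 0.366, "EqA": 0.292}

# Stats where Kent beats the pool average, and how many inductees trail him in WAR.
above = [stat for stat in kent if kent[stat] > pool_average[stat]]
hof_below_kent = sum(1 for war in pool_war if war < kent["WAR"])

print("Pool WAR average:", round(war_average, 1))
print("Kent above average in:", above)
print("Inductees with lower WAR than Kent:", hof_below_kent, "of", len(pool_war))
```

Kent clears the average only in SLG and EqA, but nine of the eighteen inductees have a lower career WAR than he does, which fits the coin-flip reading above.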

The HOF is a museum where baseball greats are recognized (which is why I feel Bonds, Rose, etc. should be allowed in). Considering Kent retired as the all-time leader among 2B in HRs, he should be voted in and recognized for that accomplishment, even if his production is not enough to get in.