    # Inspect the data: pitches tracked over a 2-month stretch of the 2013 MLB season.
    baseball = pd.read_csv('./data/baseball-pitches-clean.csv')
    print(baseball.shape[0], "pitches were tracked.")
    baseball.head()

OK, so from what we see above, that 49 MPH pitch is definitely an error: there's no way a guy who's throwing 90 MPH on average is going to throw a 49 MPH pitch.

In [69]:

    print(len(baseball[baseball['start_speed'] < 60]), 'pitches are under 60 mph')
    # R.A. Dickey is a knuckleballer, one of the only ones in the entire league
    dickey = baseball[baseball['pitcher_name'] == 'R.A. Dickey']
    print('R. A. Dickey has', len(dickey[dickey['start_speed'] < 60]), 'under 60 mph')

7 pitches are under 60 mph
R. A. Dickey has 0 under 60 mph

If Dickey, who's a knuckleballer, isn't throwing anything under 60 MPH, then it's pretty safe to say these pitches under 60 MPH are outliers.

In [75]:

    # Keep only pitches thrown at 60 MPH or faster
    over_60 = baseball['start_speed'] >= 60
    baseball = baseball[over_60]

Now that we've cleaned up the dataset a little, let's start visualizing it.

Knuckleballs look to have the most variance. This isn't that surprising, since a knuckleball is thrown with almost no spin and its movement is notoriously erratic.

Changeups appear to be located mostly in the bottom half of the zone. This makes intuitive sense: a changeup is meant to look exactly like a fastball out of the hand, but it arrives slower, which throws off the hitter's timing.

Because the changeup starts on the same trajectory as a fastball but travels slower, it spends more time in the air and gravity has longer to act on it, so the pitch ends up lower in the strike zone.
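To put a rough number on the gravity argument, here is a back-of-the-envelope sketch. It assumes a constant speed over the full 60.5 ft from the rubber to the plate and ignores drag and spin, so the absolute drops are only illustrative. The 90 and 80 MPH speeds are example values, not numbers taken from the dataset.

```python
# Rough estimate of how far gravity alone pulls a pitch down on its way to the plate.
# Assumptions (not from the dataset): constant speed, no drag, no spin,
# and a flight distance of 60.5 ft (pitching rubber to home plate).
G_FT_PER_S2 = 32.174                 # gravitational acceleration in ft/s^2
DISTANCE_FT = 60.5                   # pitching rubber to home plate
MPH_TO_FT_PER_S = 5280.0 / 3600.0    # unit conversion


def gravity_drop_ft(speed_mph):
    """Drop in feet due to gravity alone for a pitch at the given speed."""
    flight_time_s = DISTANCE_FT / (speed_mph * MPH_TO_FT_PER_S)
    return 0.5 * G_FT_PER_S2 * flight_time_s ** 2


fastball_drop = gravity_drop_ft(90)  # example fastball speed
changeup_drop = gravity_drop_ft(80)  # example changeup speed
print(round(fastball_drop, 2), round(changeup_drop, 2))
print(round(changeup_drop - fastball_drop, 2), "ft of extra drop")
```

Under these simplifications the slower pitch drops a bit under a foot more than the faster one, which is in line with changeups clustering lower in the zone.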

OK, so I watch baseball and I've literally never heard of the Eephus pitch. From the graph it looks really unpredictable, but there also isn't much data on it. Let's take a look at the actual counts.
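A minimal sketch of how the counts can be pulled, assuming the `pitch_name` column used elsewhere in this notebook. The tiny DataFrame is made-up stand-in data so the snippet runs on its own; on the real data the same call would go on `baseball['pitch_name']`.

```python
import pandas as pd

# Made-up stand-in for the real dataset, just to make the snippet runnable.
toy = pd.DataFrame({
    'pitch_name': ['Fastball', 'Fastball', 'Slider', 'Eephus', 'Fastball', 'Slider'],
})

# Number of pitches of each type, most common first.
pitch_counts = toy['pitch_name'].value_counts()
print(pitch_counts)
```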

This rules out my suspicion that the Eephus pitch is similar to the Knuckleball. It's surprising that the Knuckleball distribution is centered where it is, in the high 70s. Traditionally, Knuckleballs are high-60s pitches. This might be due to R.A. Dickey being the dominant Knuckleball user in today's game; his are known to be faster than most.

In [120]:

    # Let's see how many of these Dickey throws
    knuckles = baseball[baseball['pitch_name'] == 'Knuckleball']
    dickey = knuckles[knuckles['pitcher_name'] == 'R.A. Dickey']
    # Multiply before dividing so the share is not truncated by integer division
    print('Percentage of Knuckleballs belonging to Dickey', round(100.0 * len(dickey) / len(knuckles)))

Percentage of Knuckleballs belonging to Dickey 100

It turns out all the Knuckleballs in our dataset are thrown by R.A. Dickey! That confirms the suspicion about the Knuckleball speeds.

We saw previously that it was pretty difficult to gain much insight into pitch types aside from general differences. This might be more meaningful if we analyzed a specific pitcher. Let's do Yu Darvish.

Darvish is known for having a wide array of pitches at his disposal and is one of the best current pitchers in baseball, so he's a solid choice.

Apart from his slider, there's no drastic change in pitch speeds. Further, if we take a look at his top 3 pitches (fastball, cut fastball and slider), we see that the distribution of pitch speeds stays consistent throughout the entire game.

If a hitter's hope was that Darvish would become weaker over the course of a game, it looks like they're out of luck.
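One way to check this kind of consistency numerically is to summarize speed per inning. The sketch below assumes an `inning` column, which is a hypothetical name not shown in this notebook, alongside the `pitcher_name`, `pitch_name` and `start_speed` columns used above. The helper `speed_by_inning` and the toy rows are made up for illustration.

```python
import pandas as pd


def speed_by_inning(pitches, pitcher, pitch_type):
    """Mean, standard deviation and count of start_speed per inning
    for one pitcher and one pitch type."""
    mask = (pitches['pitcher_name'] == pitcher) & (pitches['pitch_name'] == pitch_type)
    return pitches[mask].groupby('inning')['start_speed'].agg(['mean', 'std', 'count'])


# Made-up rows; 'inning' is an assumed column name.
toy = pd.DataFrame({
    'pitcher_name': ['Yu Darvish'] * 6,
    'pitch_name': ['Fastball'] * 6,
    'inning': [1, 1, 2, 2, 3, 3],
    'start_speed': [93.0, 94.0, 93.5, 93.5, 93.0, 94.0],
})

summary = speed_by_inning(toy, 'Yu Darvish', 'Fastball')
print(summary)
```

A flat `mean` column and a small, stable `std` column across innings would correspond to the consistency described above.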

Already we can see that Verlander is a drastically different pitcher from Darvish: fastballs make up 55% of his routine, versus 36% of Darvish's. Verlander throws each of his 3 other pitches about equally often.

It's interesting to note that 94% of Darvish's routine was made up of his fastball, cut fastball and slider.
Verlander's 2nd and 3rd pitches are Darvish's 4th and 5th: Verlander throws them ~32% of the time, Darvish only ~6%.
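Percentages like these can be tabulated side by side with a row-normalized cross-tabulation. This is only a sketch on made-up rows with placeholder pitcher names; it assumes the `pitcher_name` and `pitch_name` columns from earlier in the notebook.

```python
import pandas as pd

# Made-up rows with placeholder pitchers, just to make the snippet runnable.
toy = pd.DataFrame({
    'pitcher_name': ['Pitcher A'] * 4 + ['Pitcher B'] * 4,
    'pitch_name': ['Fastball', 'Fastball', 'Slider', 'Curveball',
                   'Fastball', 'Slider', 'Slider', 'Slider'],
})

# normalize='index' turns each pitcher's row into shares that sum to 1;
# multiplying by 100 expresses them as percentages.
pitch_mix = pd.crosstab(toy['pitcher_name'], toy['pitch_name'], normalize='index') * 100
print(pitch_mix.round(1))
```

Each row is one pitcher's mix, so two pitchers can be compared by reading down a column.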

Verlander's fastball becomes faster over the course of the game and his changeup slower. We can also see that Verlander isn't as consistent with his pitch speeds as Darvish. He's more consistent during the middle innings.

This makes sense intuitively, since in the first couple of innings the pitcher is finding their "groove" and in the later innings fatigue starts to set in.

I found it weird that Verlander's fastball gets faster over the course of the game, so I decided to compare it to the norm.
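One possible way to frame such a comparison is to fit a straight line of speed against inning for one pitcher and for all pitchers combined, then compare the slopes. This is a sketch, not necessarily the comparison used here: "the norm" is taken to mean every fastball in the data, the `inning` column is an assumed name, and the helper `speed_trend_mph_per_inning` and the toy rows are made up.

```python
import numpy as np
import pandas as pd


def speed_trend_mph_per_inning(pitches):
    """Slope (MPH per inning) of a least-squares line of start_speed vs. inning."""
    slope, _intercept = np.polyfit(pitches['inning'], pitches['start_speed'], 1)
    return slope


# Made-up rows with placeholder pitchers; 'inning' is an assumed column name.
toy = pd.DataFrame({
    'pitcher_name': ['Pitcher A'] * 3 + ['Pitcher B'] * 3,
    'pitch_name': ['Fastball'] * 6,
    'inning': [1, 2, 3, 1, 2, 3],
    'start_speed': [93.0, 94.0, 95.0, 92.0, 91.0, 90.0],
})

fastballs = toy[toy['pitch_name'] == 'Fastball']
pitcher_trend = speed_trend_mph_per_inning(fastballs[fastballs['pitcher_name'] == 'Pitcher A'])
overall_trend = speed_trend_mph_per_inning(fastballs)
print(round(pitcher_trend, 2), round(overall_trend, 2))
```

A positive slope for one pitcher next to a flat or negative overall slope would indicate a pitcher who speeds up while the typical pitcher does not.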