Diamond Dollar$http://vincegennaro.mlblogs.com
Insights and analysis from SABR President Vince Gennaro
Sat, 01 Sep 2018 04:47:12 +0000 en
Opening Day, Mike Trout and 2014 Season Expectationshttp://vincegennaro.mlblogs.com/2014/03/31/opening-day-mike-trout-and-2014-season-expectations/
http://vincegennaro.mlblogs.com/2014/03/31/opening-day-mike-trout-and-2014-season-expectations/#commentsMon, 31 Mar 2014 19:52:20 +0000http://vincegennaro.mlblogs.com/?p=385Happy (full) Opening Day. First a comment on the Mike Trout contract, also known as the “perfect deal” for both sides. The Angels and Mike Trout nailed it. Six years at $144.5 million is the dream scenario for both sides. The length and terms of the deal are optimal. For those who think the deal should be longer, I would argue that each of the next two years would be priced at $40 million per year, a price that the Angels were well advised to avoid. The reason the Angels could get Trout to agree to a deal at this relatively modest commitment is because they are giving him the opportunity (actually gift) of re-pricing himself at the young age of 29. For a player of Trout’s caliber (or at least what most people expect Trout to be), hitting the free agent market at 29 vs. say 31 years old, could be worth $100 million or more to Trout. I’m not suggesting he will necessarily get $50 million per year, but he is much more likely to get a longer term deal as a 29-year old versus as a 31-year old. In the end, the Angels maximize the return on their investment by keeping it to six years. Yes, Trout may walk after this contract ends and some may argue that the Angels lost the opportunity to make him a lifelong Angel. I would argue that the price of doing so would not be worth the risk.

Here are my picks for the 2014 season:

Division Winners–Angels, Indians, Rays in the AL; Dodgers, Cardinals, Nationals in the NL

Wild Cards–Royals and Red Sox in the AL; Giants and Braves in the NL

World Series–Cardinals over the Angels

Cy Young Winners–David Price and Madison Bumgarner

MVPs–Mike Trout and Joey Votto

Top storyline of the year–a big bounce-back by the Angels, driven by another MVP year by Trout, a resurgence by Pujols and a real contribution from Josh Hamilton

2nd biggest storyline–The Cardinals’ young pitching dazzles the NL. The Cardinals could win 105 games this season on the strength of their great young arms

]]>http://vincegennaro.mlblogs.com/2014/03/31/opening-day-mike-trout-and-2014-season-expectations/feed/3bball21The Broken Hall of Fame Voting Processhttp://vincegennaro.mlblogs.com/2013/12/30/the-broken-hall-of-fame-voting-process/
http://vincegennaro.mlblogs.com/2013/12/30/the-broken-hall-of-fame-voting-process/#commentsMon, 30 Dec 2013 15:39:44 +0000http://vincegennaro.mlblogs.com/?p=365The Hall of Fame has a math problem—too many deserving candidates and too few ballot slots to elect them. This is not a new problem, although it is likely coming to a head in the soon to be released class of 2014 voting results on January 8th. The problem has been building for years, as either the voting rules (maximum ten votes per ballot) or the attitude of the voting body (unwillingness to expedite the election of unambiguously qualified candidates), has resulted in a backlog of “should be” Hall of Famers, still stuck on the Hall of Fame ballot. Couple this reality with the pool of super star candidates who are suspected of using performance enhancing drugs and the problem is compounded. Players like Barry Bonds and Roger Clemens, who would have been quick honorees are lingering on the ballot, likely to consume votes year-after-year until voters ultimately sort out their fate. The controversy over PEDs is like a tax on votes. By my count, over 30% of last year’s votes went to those who have been suspected by some of using steroids, drastically reducing the number of votes available to other worthy candidates.

The 2014 ballot introduces non-controversial HOF candidates—Greg Maddux, Tom Glavine, Frank Thomas, and Mike Mussina—who are added to the list of clear cut HOFers—Biggio, Bagwell, Piazza, Raines, Alan Trammell, and Edgar Martinez. In addition, there are players who are on my HOF list, although I concede they will not be on everyone’s list—Curt Schilling, Fred McGriff and Larry Walker. Next, let’s add in the “absolutely HOFers, if not for steroid allegations”—Bonds and Clemens. Finally, I haven’t even mentioned Mark McGwire, Sammy Sosa, Rafael Palmeiro, Don Mattingly, Jack Morris, Lee Smith, and newcomer Jeff Kent who are likely to garner votes. The math simply does not add up.

I’m not writing to make the case for HOF candidates. We’ll save the merits or de-merits of various HOF candidates for another day. I want to focus on the election process. Let’s briefly look at the ballot history and make some assumptions about the 2014 ballot. In the past twenty years, we have seen as many as 581 ballots (2011) turned in by the eligible voters—the baseball writers who are members of the BBWAA. Over the same timeframe we have seen the average ballot contain six names, with a high of 6.7 names per ballot in 1999, the year that Nolan Ryan, George Brett and Robin Yount were voted into the Hall of Fame. Over the twenty years, we saw 28 players elected, an average of 1.4 per year.

For the upcoming 2014 election results, let’s assume that the number of ballots is at its all-time high of 581, which is similar to the number of ballots submitted the last three years (569, 573, 581). This would require a player to reach 436 votes to achieve the 75% threshold for election. Let me create a best case scenario. In 2013 we saw an average of 6.6 names per ballot, but in our best case scenario let’s say that each 2014 ballot has an average of 8.6 names. This is a generous assumption (which is why it’s a best case scenario), since the last time we saw an average of 8.6 names per ballot (or greater) was 1960, 54 years ago. Next, let’s assume that voters are consistent year-to-year and that most players who were on last year’s ballot receive the same number of absolute votes in 2014, with several notable exceptions:

In his second year on the ballot, let’s say that 3000-hit Craig Biggio ups his percentage by four percentage points to 72.2%, still leaving him 16 votes short of election.

Since many voters continue to view Bonds and Clemens as consensus HOFers, even before there was any suspicion of steroids, let’s assume that some writers avoided voting for them on their first appearance on the ballot in 2013, as a form of punishment. I’ll give them 6 additional percentage points of votes for their second year on the ballot, raising their totals to 42.2% (+39 votes vs. 2013) and 43.6% (+41 votes), respectively.

Tim Raines has most deservedly increased his vote total each of the last five years (22.6%, 30.4%, 37.5%, 48.7%, 52.2%). Let’s assume that trend continues with a modest 3 percentage point increase to 55.2%.

Jack Morris, in his last year of eligibility, is likely to see a push, which will fall short, but raise his vote total to 70% of the ballots from last year’s 67.7%.

Finally, voters will be in search of ballot space to vote for the most deserving candidates, so the support is likely to wane for Don Mattingly and Rafael Palmeiro, both of whom will sink to 25 votes (from 75 and 50 last year) and fall off the ballot, as they will dip below 5%.

The net result: Based on these assumptions, we are left with 1338 votes to allocate to ballot newcomers for 2014. This includes the 106 votes that went to Dale Murphy in 2013, his last year on the ballot. (Remember, 1338 remaining votes is based on an unprecedented one-year jump in votes per ballot from 6.6 to 8.6). I’ll assume that universally beloved Greg Maddux gets a vote on all but 30 ballots, which translates into 551 votes. This leaves only 787 votes for Glavine, Thomas, Mussina, Jeff Kent or for that matter all the remaining candidates on the ballot.
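The arithmetic behind this scenario can be checked with a short sketch. It uses only figures stated in the text (581 ballots, the 75% and 5% thresholds, the 1338-vote newcomer pool, and Maddux missing on 30 ballots); the variable names are illustrative, and the assumption that a player needs at least the stated share of ballots is mine.

```python
import math

BALLOTS = 581          # assumed turnout, from the text
ELECT_SHARE = 0.75     # share of ballots needed for election
KEEP_SHARE = 0.05      # share needed to stay on the ballot (assumed "at least 5%")

votes_to_elect = math.ceil(ELECT_SHARE * BALLOTS)   # smallest vote count reaching 75%
votes_to_stay = math.ceil(KEEP_SHARE * BALLOTS)     # smallest vote count reaching 5%

newcomer_pool = 1338                  # votes left for newcomers, from the text
maddux_votes = BALLOTS - 30           # named on all but 30 ballots
left_for_others = newcomer_pool - maddux_votes

# How many more players could reach the threshold if the leftover votes
# were split as efficiently as possible among them?
max_additional_electees = left_for_others // votes_to_elect

print(votes_to_elect, votes_to_stay)             # 436 30
print(maddux_votes, left_for_others)             # 551 787
print(max_additional_electees)                   # 1
```

Even if every leftover vote went to just two newcomers, 787 votes cannot cover two thresholds of 436, so at most one player besides Maddux could be elected from that pool. The same check shows why 25 votes falls below the 30 needed to stay on the ballot.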

Under this scenario, I expect Maddux to be the lone inductee from this year’s BBWAA ballot, with Glavine and Thomas dividing the modest number of remaining votes to fall short of securing their rightful place in the Hall of Fame. (Former Hall of Fame senior research associate and longtime SABR member Bill Deane came to the same conclusion. You can read about his thought process in this blog post). I have serious concerns about Mike Mussina’s ability to remain on the ballot, given the anticipated voting dynamics, despite his impeccable credentials which place him slightly ahead of Glavine on my Hall of Fame candidate slate. Pitching his entire career in the tough AL East division, Mussina put up extraordinary numbers while facing opposing lineups that touted a .760 OPS. By comparison, Maddux and Glavine faced lineups that averaged .730 and .738 OPS, respectively.

The scenario I’ve laid out represents my best case scenario. If voters use only 8 votes per ballot versus the 8.6 I have assumed, we may see some big names (McGwire, Sosa, Mattingly, Palmeiro, Mussina, McGriff) fall off the ballot by not achieving the modest 5% minimum. The story does not end here. This is just the beginning. The ballot limit of 10 votes and/or the voters’ stinginess with their votes will compound in subsequent years as the backlog of deserving candidates grows quickly. In any of the plausible scenarios I see the problem getting worse rather than better as a result of the 2014 election. What would it take to begin to correct the problem? A “victory” for those who want to eliminate the glut of deserving candidates who are languishing on the ballot would require 6 or 7 players being voted into the HOF this year, a virtual impossibility given the arithmetic. Treading water (not solving the problem, but not allowing it to worsen) would require four players being voted in, such as Biggio, Maddux, Glavine, plus a 4th from the list of deserving candidates. This “tread water” scenario would take 5 players off the ballot and not worsen (nor help) the backlog. (Regardless of his fate, Jack Morris will come off the ballot after this vote.)

For 2015, Randy Johnson, Pedro Martinez, John Smoltz, and Gary Sheffield are among the high quality candidates joining the ballot. From 2016 to 2018 we will add Ken Griffey, Jr., Pudge Rodriguez, Manny Ramirez, Jorge Posada, Vlad Guerrero, Jim Thome and Chipper Jones to an already bloated HOF ballot.

On a recent episode of Clubhouse Confidential on MLB Network, I discussed my hypothetical voting strategy (I am not a member of the BBWAA and do not have a vote) in the upcoming HOF elections. Given my assessment of the situation, I could no longer afford to vote for my ten most deserving candidates, because that approach would likely result in many worthy candidates falling off the ballot with less than 5% of the votes. My voting goals would be two-fold: First, vote for the candidates with the highest probability of being elected, in hopes of electing them and clearing ballot space for next year’s ballot. Second, vote for the deserving candidates on the lower end of the ballot, in order to protect their inclusion on future ballots. Who would I leave off my ballot? Those in the middle category—on my list it would likely be Bagwell and Piazza—deserving candidates who are not in jeopardy of falling off the ballot, but are not going to get elected this year. We are now in a situation that requires “gaming” the ballot, in order to gerrymander the broken process and preserve its original objectives—electing deserving candidates to the Hall of Fame.

In Washington this year we watched as lawmakers dealt with the “fiscal cliff.” In 2014 we will watch closely as the BBWAA and the Hall of Fame deal with the “ballot cliff” and explore ways to reform the voting process.

]]>http://vincegennaro.mlblogs.com/2013/12/30/the-broken-hall-of-fame-voting-process/feed/7bball21Quantifying the Quality of Opponentshttp://vincegennaro.mlblogs.com/2013/10/08/quantifying-the-quality-of-opponents/
http://vincegennaro.mlblogs.com/2013/10/08/quantifying-the-quality-of-opponents/#commentsTue, 08 Oct 2013 14:23:39 +0000http://vincegennaro.mlblogs.com/?p=354Many baseball analysts strive to create “context neutral” stats, so they can compare players’ stats while minimizing biases. A player’s traditional stat line–batting average, home runs, OBP, slugging percentage, OPS, etc.–is a product of more than the player’s talent level and luck. For example, it is also affected by the handedness of the pitchers he faces and the ballpark in which he plays. These factors can be accounted for by evaluating platoon splits or park-adjusted stats. Perhaps because it is more difficult to measure, advanced stats are seldom adjusted for the quality of opponents. Adjusting for the quality of opponents is particularly important when interpreting starting pitcher stats, since we deal with a small sample size of 25 to 35 starts per year, with a start every 5th day, regardless of who is on the schedule. Since talent tends to be clustered in certain divisions, the unbalanced schedule produces a skewed distribution of opponents. A pitcher in the AL East is likely to be playing a different game than one in the NL East.

I use a simple measure to capture the quality of opponents a starting pitcher has encountered over the course of a season–the OPS of the offenses he has faced. More specifically, the OPS of the teams he has faced, against same-handed pitchers. In other words, for David Price, I take the opponents for each of his 27 starts for 2013 and use their OPS against LHP as my measure. If a RHP faced the same teams, his opposition would be rated on their OPS against RHP. For example, Texas mashes LHPs to the tune of a .798 OPS, while they maintain a modest .705 OPS against RHP. On the other hand, the Cardinals hit RHP at a .753 clip, while managing only .675 against lefties. So, it’s not enough to say a pitcher faced the Cardinals or Rangers. It’s also important to distinguish his handedness.
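As a rough illustration of the measure described above, the sketch below averages each opponent's OPS against the pitcher's throwing hand across his starts. The four OPS values are the ones quoted in the text; the three-start schedule and the names `TEAM_OPS` and `opponent_quality` are hypothetical, and a simple unweighted average per start is an assumption about how the starts are combined.

```python
from statistics import mean

# Team OPS split by the hand of the opposing pitcher (values quoted in the text).
TEAM_OPS = {
    "TEX": {"L": 0.798, "R": 0.705},
    "STL": {"L": 0.675, "R": 0.753},
}

def opponent_quality(pitcher_hand, opponents):
    """Average OPS of the teams faced, measured against the pitcher's own hand.

    `opponents` holds one team code per start, so a team faced twice counts twice.
    """
    return mean(TEAM_OPS[team][pitcher_hand] for team in opponents)

schedule = ["TEX", "TEX", "STL"]   # hypothetical three-start schedule

lefty = opponent_quality("L", schedule)
righty = opponent_quality("R", schedule)
print(f"LHP faced {lefty:.3f} OPS, RHP faced {righty:.3f} OPS")
```

With the same three opponents, the left-hander's schedule comes out harder than the right-hander's, which is the point of splitting by handedness.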

I use a quality of opponents factor in ranking starting pitchers each year. Several years ago when I developed my SPR, I wanted to reduce the context bias that is in our everyday pitching stats, like ERA or K/9, etc. By adjusting for both the ballpark a pitcher pitched in and the opponents he faced, we get closer to “context neutral” in evaluating how well a pitcher performed. Let’s take a look at the landscape for 2013. When we look at pitchers who faced the toughest (and weakest) competition, we find patterns. We tend to find pitchers clustered in the same division and often even on the same teams. Esmil Rogers of the Toronto Blue Jays holds the distinction of pitching against the toughest opponents in 2013. The Yankees and Astros dominate the top 20 pitchers who faced the toughest opponents, for two reasons. First, they play in offensive divisions. (It is rare to see a NL pitcher near the top of the rankings, due to the lack of DH in the NL.) Second, both the Yankees and Astros were weak hitting teams in divisions with decent offensive prowess, and their pitchers never face their own lineup. (When was the last time we could say the Yankees were a weak hitting team? In 2013 they posted an OPS of .676 against LHP and .686 against RHP, nearly 30 points below the league average.) Five Yankees populate the top 20: Ivan Nova at #2, Hiroki Kuroda at #4, CC Sabathia at #11, Andy Pettitte at #12, and Phil Hughes at #19. Houston also had 5 of the top 20–Dallas Keuchel at #3, Erik Bedard at #6, Bud Norris at #9 (including his time with the Orioles), Lucas Harrell at #10 and Jordan Lyles at #18.

The other end of the list–pitchers who faced the easiest competition in 2013–is dominated by NL East hurlers. Of the bottom 15, twelve are from the NL East. Mike Minor, Julio Teheran, Kris Medlen, and Tim Hudson from the Braves, along with Jordan Zimmermann, Dan Haren, and Stephen Strasburg of the Nats are all bottom 10, along with the Mets’ Matt Harvey. The Mets also have Hefner, Niese and Wheeler in the bottom 20.

So, how wide is the range in the quality of opponents? The top pitchers face offenses with OPS about 4% greater than the league average, while the bottom pitchers tend to face offenses that are 3-4% weaker. This amounts to about 25 to 30 OPS points difference. Below is a list of the top 20 (faced toughest opponents) and bottom 20 (faced weakest opponents).

]]>http://vincegennaro.mlblogs.com/2013/10/08/quantifying-the-quality-of-opponents/feed/4bball21The Yankees’ Magic Number–$189 millionhttp://vincegennaro.mlblogs.com/2013/10/02/the-yankees-magic-number-189-million/
http://vincegennaro.mlblogs.com/2013/10/02/the-yankees-magic-number-189-million/#commentsWed, 02 Oct 2013 14:18:39 +0000http://vincegennaro.mlblogs.com/?p=351The Yankees spent much of September saying goodbye to an old friend–Mariano Rivera. Perhaps they will spend November and December saying goodbye to the notion of having a payroll below the $189 million luxury tax threshold for 2014. I was among the first to infer their intentions, as I digested the implications of the trade that brought them Michael Pineda from the Seattle Mariners in January 2012. Several days later, on Clubhouse Confidential on MLB Network, I opined that the Pineda acquisition, coupled with the development track of some of their star young prospects (e.g., Manny Banuelos, Dellin Betances) could allow the Yankees to do the unthinkable–have a major league payroll of less than $189 million, while maintaining a competitive, contending team. This is the baseball equivalent of re-fueling the airplane, while in flight, and doing so with discount fuel–a pretty nifty magic trick if one can achieve it. A week or so after my comments on-air, Yankees managing partner Hal Steinbrenner stated publicly that the Yankees had ambitions of tucking under the $189 million luxury tax threshold for their payroll.

Make no mistake, if the Yankees can reduce their payroll below $189 million for even one year, they stand to gain significant dollars well beyond the direct payroll saved. They would reset their luxury tax rate from its current 50% level to a step-ladder set of future rates that begin at 17.5%. This means reducing, say, a $210 million payroll to $188 million would save $22 million in payroll dollars and another $7 million in luxury tax. Even if the Yankees’ payroll escalated in future years, the value of resetting the luxury tax carries forward as the tax escalates over the balance of the current Collective Bargaining Agreement. However, if this quest for efficiency comes at the expense of making the playoffs and challenging for championships, it’s a bad financial decision. The potential savings pale by comparison to the revenue opportunities from being a reliable participant in October baseball.

Without access to Yankee financial information, it is difficult to project the financial implications of being non-competitive vs. competitive. I have always maintained that the Yankees have more to lose than any other team in baseball by failing to be a perennial playoff team. Their entire business model, including their pricing structure, is built around being among the best teams in baseball and having more than a fair share of the game’s biggest stars on their roster. Along with their storied legacy, being the best team in baseball (or at least in the discussion) is their identity. Given my research and analysis devoted to understanding the relationship between on-field performance and revenues, as well as my experience assessing the motivations and perspectives of fans, I would estimate that a two or three-year run of winning a respectable 85 games per year could cost the Yankees between $50 and $100 million in revenue per season. Add in the impact of the decline in market value of their assets–the franchise, their stake in the YES Network, etc.–and the financial penalty for failing to maintain excellence gets real big, quickly.

There is a lagged effect to winning (or losing). Fans don’t respond immediately. The first signs of fans withdrawing their financial support for a team come in the form of declining TV ratings and the no show rate at home games. In today’s New York Times, Richard Sandomir reported that viewership of Yankee games was down more than 30% this year. No show rates from fans who had purchased tickets, including season ticket holders, appeared to be on the rise at the Stadium this season. I have no doubts that the Yankees leadership will quickly abandon ambitions of going to battle in 2014 with a payroll of under $189 million, if the strategy threatens fielding a dominant team. If Cano is re-signed to a contract with an average annual value of $25 to $30 million, even with A-Rod’s status in limbo, it will be very difficult for the Yankees to acquire the necessary talent to contend for a championship, while staying under the luxury tax threshold. Get ready for the “Under $189 million Farewell Tour”.

]]>http://vincegennaro.mlblogs.com/2013/10/02/the-yankees-magic-number-189-million/feed/1bball21Which Starting Pitcher Owns Baseball’s Best Change Up?http://vincegennaro.mlblogs.com/2013/07/08/which-starting-pitcher-owns-baseballs-best-change-up/
http://vincegennaro.mlblogs.com/2013/07/08/which-starting-pitcher-owns-baseballs-best-change-up/#commentsMon, 08 Jul 2013 13:03:13 +0000http://vincegennaro.mlblogs.com/?p=340I’ve always been fascinated by the change up. Slower than a fastball, with less movement than a typical slider or curveball, it thrives on deception–appearing to be something other than what it is. According to the pitch f/x data, 92 starting pitchers have thrown change ups this year. On average, change ups account for about 14% of their pitches, are 7.6 miles per hour slower than their fastball, and move a total of 11.8 inches. The vertical movement of the change up accounts for 4.4 inches per pitch, or about 37% of the total movement of the pitch. Some pitchers throw a change up one-third of the time (Justin Verlander), while others throw it as little as 1% of the time (Edwin Jackson).

Which starting pitcher has the best change up in baseball for the first half of the 2013 season? I’m not in favor of looking at the batted ball results of change ups put in play, as it can be a misleading measure. No one pitch-type can be judged by how batters perform against it, in isolation. Pitches live and die by the sequence that precedes them. A change up that follows a fastball is very different than a change that follows two previous change ups. (In a separate analysis, I’m in the process of evaluating pitch sequences and developing a system to “value” sequences of pitches, rather than stand alone pitches.) For the purpose of this piece I rated pitchers’ change ups based on four factors. I looked at the velocity differential versus the pitcher’s fastball and the total movement on the pitch. I also factored in the percent of a pitcher’s mix of pitches, giving a pitcher more “credit” for using the pitch if it were say, 20% of his pitch mix versus 5% of his mix. Finally, I gave additional points to a pitcher for his vertical movement. Although it is already included in total movement, I placed a premium value on the drop of a pitcher’s change up, effectively double-weighting it. The net result is a points system that rates the change ups thrown this season.
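The post does not give the actual point values, so the following is only a sketch of how a four-factor rating with double-weighted drop could be put together. It assumes each factor is scored as a ratio to the league averages quoted earlier (7.6 mph, 11.8 inches, 4.4 inches, 14%); the `ChangeUp` class, the `rating` function and the sample pitcher are hypothetical.

```python
from dataclasses import dataclass

# League averages for starters' change ups, as quoted in the post.
LEAGUE_AVG = {
    "velo_gap": 7.6,     # mph slower than the fastball
    "total_move": 11.8,  # inches of total movement
    "vert_move": 4.4,    # inches of vertical movement (also part of total_move)
    "usage": 14.0,       # percent of pitches thrown
}

@dataclass
class ChangeUp:
    velo_gap: float
    total_move: float
    vert_move: float
    usage: float

def rating(change_up):
    """Sum of each factor relative to league average (an assumed scoring rule).

    Vertical movement is counted inside total_move and again on its own,
    which gives drop the double weight described in the text.
    """
    return sum(getattr(change_up, name) / avg for name, avg in LEAGUE_AVG.items())

league_average = ChangeUp(7.6, 11.8, 4.4, 14.0)
extra_drop = ChangeUp(7.6, 13.8, 6.4, 14.0)   # hypothetical: 2 more inches of drop

print(round(rating(league_average), 2))   # 4.0
print(rating(extra_drop) > rating(league_average))   # True
```

Under this rule a league-average change up scores 4.0, and two extra inches of drop raise both the total-movement and vertical-movement terms. Any real implementation would need the author's actual weights.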

At the top of the list is Justin Verlander. His velocity differential is around the league average of 7.2 mph, but his total movement, vertical drop and percentage thrown are all well in excess of the MLB-wide average. Closely behind Verlander are Jason Vargas, Hyun-Jin Ryu and Jeremy Hellickson in second, third and fourth place. Ryu and Hellickson (along with Clay Buchholz) have the highest velocity differential versus their fastball at over 11 mph. Vargas (followed by Derek Holland, Mike Minor, Cliff Lee and Wade Miley) has the greatest total movement on the pitch, with all five exceeding 16 inches. The number 5 rated change up belongs to Cole Hamels, who ranked first when the same formula was applied to 2012 data. Tommy Milone, Jarrod Parker, Eric Stults, Joe Saunders and Matt Moore round out the top 10. The Rays and Oakland A’s each have two starters in the top 10.

The biggest vertical drop belongs to Clayton Kershaw, who throws the pitch only 3% of the time, followed by the Orioles’ Chris Tillman, and the Rangers’ Derek Holland. The top 20 change ups for the first half of 2013 are listed below:

]]>http://vincegennaro.mlblogs.com/2013/07/08/which-starting-pitcher-owns-baseballs-best-change-up/feed/1bball21Ch Up 7-7-13Clustering Pitchers By Similarity: Part 2http://vincegennaro.mlblogs.com/2013/06/03/clustering-pitchers-by-similarity-part-2/
http://vincegennaro.mlblogs.com/2013/06/03/clustering-pitchers-by-similarity-part-2/#commentsMon, 03 Jun 2013 13:12:36 +0000http://vincegennaro.mlblogs.com/?p=329In my last post, I discussed one of my latest research projects, clustering pitchers by their similarities. The problem I’m trying to address with the analysis is to come up with an alternative to what is possibly the overall worst use of quantitative analysis in baseball–evaluating batter-pitcher match ups, based on career historical performance data between one batter and one pitcher. Instead, I’m trying to identify groups of pitchers that are likely to induce similar offensive performance by a single batter. If we can find a cluster of pitchers who present a similar challenge to a hitter, then we can enlarge the sample size of batter-pitcher “results” and at the same time shorten the timeframe over which we are measuring performance. For example, against right-handed hitters, my analysis suggests that lefty pitchers Barry Zito, Mark Buehrle, Paul Maholm, Zach Duke, Chris Narveson, Eric Stults, Joe Saunders and Jason Vargas (among others) are “similar”. This similarity is based on the profiling factors listed in the previous post, including the pitch repertoire, release points, most common 2-pitch sequences, the portion of the strike zone the pitcher favors, etc.

Below is a visual mapping of pitcher clusters. Each node represents a pitcher and each line between pitchers represents a “connection” or a similarity, based on a defined minimum threshold level. This graph includes only LHPs and it clusters them against only right-handed hitters.
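One simple way to read a graph like this is to treat each connected group of nodes as a cluster. The sketch below builds edges from pairwise similarity scores above a threshold and collects the connected components. The pitcher labels, scores and threshold are all hypothetical, and connected components are an assumption here, not necessarily how the clusters in the graph were derived.

```python
def clusters(similarity, threshold):
    """Group pitchers into connected components of the similarity graph.

    `similarity` maps a pair of pitcher names to a score; an edge is drawn
    only when the score meets the threshold.
    """
    graph = {}
    for (a, b), score in similarity.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)

    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:                      # depth-first walk from this pitcher
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(component)
    return result

# Hypothetical similarity scores between four left-handers (vs. RHB only).
scores = {
    ("LHP_A", "LHP_B"): 0.90,
    ("LHP_B", "LHP_C"): 0.85,
    ("LHP_A", "LHP_D"): 0.20,
}
found = clusters(scores, threshold=0.80)
print(sorted(sorted(group) for group in found))
# [['LHP_A', 'LHP_B', 'LHP_C'], ['LHP_D']]
```

Note that LHP_A and LHP_C end up in the same cluster without a direct edge between them, because both connect to LHP_B. Raising the threshold breaks large clusters into smaller, tighter ones.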

Take note of the large cluster in red, at the top of the graph. Below is a zoomed version with labels identifying the pitchers. This is the cluster I reference above, which includes Zito, Buehrle, et al.

Let’s take a deeper look at an example of Matt Holliday against this particular cluster of LHP. Over his career, Matt Holliday is 2 for 14 (in 17 plate appearances) against Joe Saunders. However, my analysis shows that Holliday crushes this cluster of LHPs, with an OPS in the 85th percentile. So which is it–does the Holliday-Saunders match up favor Saunders, as the one-on-one career data suggests, or does it favor Holliday, as my analysis suggests? I don’t have a definitive answer (although I do have a test in mind, which I may conduct and write about at a later time), but I can make the case. Of the 17 PAs Holliday has had against Saunders, nine of them occurred four years ago in 2009, with just 8 PAs occurring in the last two seasons. By contrast, Holliday had 82 PAs against Saunders’ cluster of “like” pitchers over the same two-year period–2011 and 2012. I like the recent experience of two years vs. a career and I like the sample size of 82 vs. 17. I hope to have further comments on the value and predictive power of the pitcher cluster analysis approach in the coming weeks.
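To put a rough number on the sample-size argument, the sketch below compares the binomial standard error of an on-base rate measured over 17 and over 82 plate appearances. The .330 rate is an assumed placeholder, not Holliday's actual figure, and treating plate appearances as independent coin flips is a simplification.

```python
import math

def standard_error(rate, plate_appearances):
    """Binomial standard error of a rate observed over a number of PAs."""
    return math.sqrt(rate * (1 - rate) / plate_appearances)

ASSUMED_OBP = 0.330   # placeholder on-base rate, for illustration only

se_17 = standard_error(ASSUMED_OBP, 17)   # one-on-one history
se_82 = standard_error(ASSUMED_OBP, 82)   # history against the whole cluster

print(f"17 PA: +/- {se_17:.3f}   82 PA: +/- {se_82:.3f}")
print(f"noise shrinks by a factor of {se_17 / se_82:.2f}")
```

Whatever rate is assumed, the ratio depends only on the sample sizes: moving from 17 to 82 plate appearances cuts the noise by a little more than half.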

]]>http://vincegennaro.mlblogs.com/2013/06/03/clustering-pitchers-by-similarity-part-2/feed/2bball21LvRnewspreadoutLvR-0 w-LABELS-2Clustering Pitchers by Similarity: Part 1http://vincegennaro.mlblogs.com/2013/04/22/clustering-pitchers-by-similarity-part-1/
http://vincegennaro.mlblogs.com/2013/04/22/clustering-pitchers-by-similarity-part-1/#commentsMon, 22 Apr 2013 13:57:17 +0000http://vincegennaro.mlblogs.com/?p=317About six weeks ago I presented some of my latest research at the SABR Analytics Conference in Phoenix. The analysis focused on identifying pitchers who are similar to one another, grouping them into clusters, and determining how hitters have performed against various clusters. I worked closely with George Ng, a data scientist at YarcData, and made use of their sophisticated Urika hardware appliance, which specializes in graph analytics. The intent of the project is to develop an alternative to the relatively uninformative one-on-one batter-pitcher match up data that teams tend to use to inform their lineup, pinch-hitting and bullpen match up decisions. There are numerous problems with relying on the one-on-one batter-pitcher history, including small sample sizes and data that is old and stale. Is it relevant that Derek Jeter’s career stats vs. Roy Halladay include a 4 for 10 in 1999?

The process to create pitcher clusters begins with determining the attributes that will define “similarity” between pitchers. I chose to tackle this issue from the batter’s perspective. In other words, what criteria would hitters use to “type” a pitcher? I matched the criteria–in the form of questions, with Pitch f/x attributes. The framework, which includes about 12 different attributes, is detailed in the chart below. Keeping with the approach of judging similarity from the perspective of the hitter, I segmented the data for each pitcher, based on left-handed vs. right-handed hitters. In other words, Jered Weaver wasn’t profiled once on these attributes. Instead, he was profiled twice–vs. LHB and vs. RHB, separately. Some pitchers–Jered Weaver, Hiroki Kuroda and Lance Lynn are particularly good examples–approach lefty and righty hitters completely differently. For example, at a very basic level, Weaver’s top 2 pitches against RHB are a 4-seam fastball and slider, while his top two pitches against lefties are a sinker and change-up. Some pitchers not only alter their pitch selection, but also change their release point (alter their starting point on the pitching rubber), or their movement (add a little more cut to their fastball or tilt to their slider), as well as many of the other attributes I include in the analysis. These nuances make it important to differentiate pitchers by their lefty-righty batter splits. Furthermore, I cluster a pitcher by his handedness, which leads to four separate categories of pitcher clusters–RHP vs. RHB, RHP vs. LHB, LHP vs. RHB, and LHP vs. LHB.
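A stripped-down sketch of the split profiling idea follows. It keys each profile by pitcher and batter hand and compares profiles with cosine similarity. Only pitch usage is modeled here, whereas the framework described above uses about 12 attributes; the pitcher labels and usage shares are hypothetical, and cosine similarity is my choice of measure, not one named in the post.

```python
import math

PITCHES = ["four_seam", "sinker", "slider", "change_up"]

# Hypothetical usage shares, one profile per (pitcher, batter hand) split.
profiles = {
    ("Pitcher X", "RHB"): [0.45, 0.10, 0.35, 0.10],
    ("Pitcher X", "LHB"): [0.15, 0.40, 0.10, 0.35],
    ("Pitcher Y", "RHB"): [0.50, 0.05, 0.35, 0.10],
}

def cosine(u, v):
    """Cosine similarity between two equal-length attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Two different pitchers facing right-handed batters.
between = cosine(profiles[("Pitcher X", "RHB")], profiles[("Pitcher Y", "RHB")])
# The same pitcher against righties versus lefties.
within = cosine(profiles[("Pitcher X", "RHB")], profiles[("Pitcher X", "LHB")])

print(f"X vs Y (both vs RHB): {between:.2f}")
print(f"X vs RHB against X vs LHB: {within:.2f}")
```

In this made-up example a pitcher looks more like a different pitcher facing the same batter hand than like himself against the opposite hand, which is why each split gets its own profile.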

The results of the similarity analysis show that some pitcher pairs are similar against batters of one hand, but very different when judged against batters of the other. The Red Sox’ Felix Doubront and the Rangers’ Matt Harrison are similar when facing LHB, but less so when facing RHB. Other highly similar pairs of pitchers include Bruce Chen and Randy Wolf (vs. LHB), Jonathan Niese and Wandy Rodriguez (vs. RHB) and David Price and Felix Doubront (vs. RHB). Pitchers who are least similar, or most opposite to one another, include Brandon Morrow and Kyle Lohse (vs. LHB) and Nathan Eovaldi and Shaun Marcum (vs. RHB).

We can also see which pitchers are most similar to themselves, when facing righty and lefty hitters. It’s not surprising to see RA Dickey as the pitcher who differentiates the least, between RHB and LHB. Many closers dominate this list, as they tend to have a limited pitch repertoire and use it in the same fashion regardless of who they face. But other starters who rank high are AJ Burnett, Wade Miley and Manny Parra. Those who are most opposite to themselves when pitching to LHB and RHB include Lance Lynn, Matt Cain and Wade Davis.

In future posts I’ll describe the process and share the results of pitcher clusters, as well as patterns of hitter performance against clusters.

It’s Time for the Yankees to Make the Big Move
http://vincegennaro.mlblogs.com/2013/04/19/its-time-for-the-yankees-to-make-the-big-move/
Fri, 19 Apr 2013 12:37:48 +0000

With the news that Derek Jeter’s return is delayed until at least late July, guaranteeing he’ll miss 100 or more games this year, it may be time to go to Plan B. The perfect move for the Yankees may be to trade for the Texas Rangers’ Jurickson Profar, a shortstop and the top-rated prospect in all of baseball. When Jeter plays his next game as a Yankee, he will be 39 years old. Considering that many have questioned his ability to play a credible shortstop for several years, a 39-year-old version, coming off serious ankle surgery, does not seem to be a great fit for a championship-caliber team. On the other side of this potential trade we have a team that has two outstanding shortstops. Elvis Andrus, the incumbent Rangers shortstop, is a 24-year-old who has already made two All-Star teams and played in two World Series. Profar made his major league debut last September, as a 19-year-old, and promptly homered in his first MLB plate appearance. He is Baseball America’s #1 ranked prospect in all of baseball. He projects to be a legitimate major league shortstop with above-average power and a significantly above-average bat, a rare trifecta of skills.

I can’t think of a better time to gracefully slide Jeter to another role in the Yankee lineup. With his extended absence, uncertain return and even more uncertain physical capacity once he does return, it’s hard to argue with a move to acquire the top shortstop prospect since Troy Tulowitzki. At age 20, Profar would be under Yankee control at least through his age-26 season. His quick bat will likely amplify his left-handed power at Yankee Stadium, making him an even greater run producer than expected. The hope is that within a year or two, by age 22, Profar is a .280 hitter with 15 home runs and an above-average major league shortstop. His ultimate upside could be the second coming of Robinson Cano.

One question is what the Yankees can give up to induce the Rangers to trade baseball’s top prospect. The Yankees would need to assemble an impressive package of players to acquire Profar. The Yankees’ farm system is not depleted, but many of its top prospects are at lower levels. A package that includes 21-year-old outfielder Mason Williams and another highly rated prospect, like Tyler Austin, along with Brett Gardner, may at least get the Rangers’ attention. If the Yankees need to add Joba Chamberlain to the package, it’s worth considering. I realize that Brett Gardner is an integral part of the Yankee offense today, but with Granderson coming back soon, it might make sense to deal from a position of relative strength in order to solve the long-term problem of Jeter’s successor. I just don’t believe Eduardo Nunez has the defensive chops to be an everyday big league shortstop on a contending team. There may not be a cheaper option anytime soon, or one that has the chance to be an enduring, long-term solution like Profar.

The toughest question may be where Jeter will play when he returns. Making him the primary DH may be the best option, while easing him into 3B, a position that requires much less lateral range. When the Yankees acknowledge that Jeter cannot play shortstop at a high level, a logjam is inevitable at either DH or the position Jeter moves to. When (if?) A-Rod comes back, it gets even more complicated. A-Rod may be best suited for DH. Hafner can only be a DH. Youkilis is limited to 1B, 3B or DH. However, these problems are only marginally more complicated with Profar replacing Jeter at shortstop. How to allocate playing time among players who have evolved into immobile, primarily offensive contributors is an issue that is not going away for the Yankees over the next several years. Now may be the time to confront it head-on.

Stats vs. No Stats—a Controlled Experiment?
http://vincegennaro.mlblogs.com/2013/04/01/stats-vs-no-stats-a-controlled-experiment/
Mon, 01 Apr 2013 14:04:53 +0000

Over the last week, two articles appeared discussing two teams’ contrasting approaches to making baseball decisions. The Washington Nationals were called a “scouting first” organization that integrates statistical analyses into team decisions. By contrast, the Philadelphia Phillies seem proudly defiant of the trend to incorporate advanced metrics into their decision criteria. While a large number of MLB teams put significant energy and dollars into objective analysis of data, the other end of the spectrum is often a mystery. Who are those clubs, and how do they process information? In recent years teams like the Orioles, Dodgers and Giants have been accused of shunning stats in favor of intuition or the perspective and wisdom of career baseball people. However, when pressed, these teams typically deny an aversion to the numbers side of the game and in fact tout their otherwise low-profile prowess in this area. It now seems that the Phillies are willing to be the proud flag-bearers for a shrinking group of ballclubs who believe that “new stats” fail to add value to decisions. We may finally have a controlled experiment of the stats team vs. the no-stats team. If two clubs that fit those descriptions were to maintain their loyalty to their respective internal decision processes, it would be interesting to see how they perform over the next 4 or 5 years.

So who is our poster child for the stats gurus? In the opposite corner, representing the stat heads, we have the Houston Astros. Truth be told, the opposite corner is actually quite crowded with teams that strive to make stat analysis a competitive advantage, with the Tampa Bay Rays at the top of the list, but we’ll choose the Astros as the subject for our controlled experiment. Under the leadership of former Cardinals executive Jeff Luhnow, the Astros have assembled a team that more closely resembles a NASA lab crew than a baseball front office. From former NASA engineer Sig Mejdal, the team’s Director of Decision Sciences, to Assistant GM David Stearns and Pitch f/x guru Mike Fast, Luhnow has attracted a top-notch staff. Team CEO George Postolos seems fully bought in to Luhnow’s approach and the baseball world is watching to see how the Astros fare over the next five years.

I like matching the Astros against the Phillies because this matchup also has a bit of handicapping embedded in it. The Phillies have been a competitive club that some believe can still contend for the NL East, while the Astros are thought to be the worst team in baseball, by a lot. Given the predictions of how each team is expected to perform in 2013, we’re probably giving the Phillies a 20-win head start for the coming season. We can see how long the Astros take to close the gap and try to assess whether the two teams’ approaches to decisions were responsible for the outcome.

My view is that well-thought-out problem solving, both quantitative and qualitative, can add enormous value to decision processes. Over my career, I’ve seen analytics supplement intuitive judgment, experience and observation on hundreds of occasions, almost always leading to higher quality decisions. I’ve seen baseball teams integrate analytics with scouting information and the wisdom of veteran baseball people to improve the confidence in their decisions.

The baseball data world is changing rapidly. Just six years ago baseball was producing about 900,000 data points to capture the outcomes of each pitch thrown and ultimately of each plate appearance in a major league season. With the introduction of Pitch f/x and related datasets, beginning on a full-scale basis in 2008, we now have over 15 million annual data points that chronicle the baseball season, ranging from the angle of break on Derek Holland’s slider to the most popular two-pitch sequence by Jered Weaver. There are literally thousands of questions that we could only speculate on six years ago that we can answer objectively today. Even if you believe that statistical analysis may not have been a difference maker in 2006, the more than 15x increase in data we have today changes the game. It can help reduce the risk on $100 million contract decisions to a manageable level. I’m not arguing against the scouting perspective. The scouting perspective is critical and often the lead horse in a decision process. But that’s different from excluding statistical analysis from the ultimate decision.
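For readers who want to check the size of that jump, here is the arithmetic as a tiny sketch. It uses only the two rounded figures quoted above; since the newer figure is given as "over 15 million," the result is best read as a lower bound.

```python
# Approximate annual data points, as quoted in the paragraph above.
before_pitchfx = 900_000
with_pitchfx = 15_000_000  # the text says "over 15 million", so this is a floor

def growth_multiple(old, new):
    """How many times larger the new figure is than the old one."""
    if old <= 0:
        raise ValueError("old must be positive")
    return new / old

multiple = growth_multiple(before_pitchfx, with_pitchfx)
print(f"{multiple:.1f}x")  # 16.7x
```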

My bet on how the controlled experiment turns out: I expect the experiment to be aborted before we reach our five-year timeframe, as the Phillies will eventually modify their decision processes to integrate more quantitative information. If that change occurs, it may be interpreted as an answer to the controlled experiment.

Pitching—in All the Wrong Places
http://vincegennaro.mlblogs.com/2013/01/08/pitching-in-all-the-wrong-places/
Tue, 08 Jan 2013 13:30:25 +0000

In the era of multi-purpose stadiums in the 1970s and 1980s, it seems that there were more similarities across the spectrum of ballparks than there are today. In the post-new Comiskey era, which began with Camden Yards, we’ve brought quirkiness back to the ballpark. We may not have returned all the way back to Ebbets Field, the Polo Grounds or the Baker Bowl, but today’s ballparks certainly don’t look alike. There are enough extreme characteristics in some of today’s parks to have a profound impact on players’ stats and careers.

The impact of parks on pitchers shows up in several ways, but the most vivid is in the HRs a pitcher yields. Let’s look at two pitchers who have changed ballparks over their careers, moves that were beneficial to one and detrimental to the other. Aaron Harang began his big league career with Oakland, but then moved to Cincinnati, before he moved back to the west coast with San Diego and now the Dodgers. For right-handed (RHH) and left-handed hitters (LHH), the HR park factor for Cincinnati is 143 and 121, respectively, where 100 is league average (from the Bill James Handbook, the average of the most recent 3 years). The index for Dodger Stadium is slightly above 100, while the other two ballparks Harang called home are well below 100, indicating they are pitcher-friendly, run (and HR) suppressing ballparks. Harang’s HR rate as a Cincinnati Reds pitcher is 11.1% per flyball. His rate with the other 3 teams, whose parks range from neutral to pitcher-friendly, is 7.5%. He clearly benefited from the move to San Diego and then LA. On the flip side we have Mat Latos, who has pitched for San Diego and Cincinnati. In San Diego, Latos notched a 7.9% HR/FB rate, while it soared to 11.8% in his first year as a Red. He mitigated the problem somewhat by being slightly less of a flyball pitcher in Cincy, but the leap in HRs is still a drag on his effectiveness.
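To make the two measures in this comparison concrete, here is a minimal sketch of how an HR/FB rate and an HR park index can be read. The counts are invented for illustration, and treating the index as a straight multiplier on a neutral-park HR total is a simplifying assumption; published park factors are built from more careful home/road comparisons.

```python
def hr_per_fb(home_runs, fly_balls):
    """Share of fly balls that leave the park."""
    if fly_balls <= 0:
        raise ValueError("fly_balls must be positive")
    return home_runs / fly_balls

def park_adjusted_hr(neutral_hr, hr_index):
    """Scale a neutral-park HR total by a park index (100 = league average)."""
    return neutral_hr * hr_index / 100.0

# Cincinnati HR indexes quoted in the text above.
cincinnati = {"RHH": 143, "LHH": 121}

# Made-up counts: 20 HR allowed on 180 fly balls.
sample_rate = hr_per_fb(20, 180)
print(f"HR/FB: {sample_rate:.1%}")

# What 10 neutral-park HRs to each side would become under this simplification.
adjusted = {side: park_adjusted_hr(10, idx) for side, idx in cincinnati.items()}
print(adjusted)
```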

There are four pitchers who stand out to me as being mismatched with their home ballpark: Phil Hughes (NYY), Colby Lewis (TEX), Brian Matusz (BAL), and Rick Porcello (DET). Porcello has the reverse problem, and in a sense, it’s a smaller issue. He is an extreme groundball pitcher (approximately 90th percentile for 2012), but he pitches in a massive pitcher’s park, where flyballs will do far less damage than in a hitter’s park. So, what’s the problem, since the Tigers still benefit from his high groundball rate? First of all, not with that defense they don’t, but that’s another issue entirely. My point is that Porcello should have greater value pitching elsewhere, for a team whose ballpark penalizes flyballs, rather than one with a forgiving ballpark like Comerica. Flyball pitchers like Colby Lewis and Brian Matusz would be far more effective in Oakland, Seattle, or any of the west coast parks, which tend to be more cavernous and/or where the ball will not carry as far.

Phil Hughes is a fascinating case study. I’ve always believed that Yankee Stadium was one of the worst venues for him to pitch in. He is a right-handed flyball pitcher, pitching in a park that has a LHH HR index of 153, second only to Coors Field. The reason I list the LHH HR factor is that he will face more than 50% LHH. (Incidentally, Yankee Stadium has a RHH HR index of 102.) If you take a close look at peripheral stats such as K-rate, BB-rate, etc., you will see that Phil Hughes and Jered Weaver are very similar. There are two huge differences between the two. The first is that Weaver has perfected a change-up, which he uses extensively against LHH, keeping the ball away from them. The second difference is the ballpark. Weaver pitches perfectly to his ballpark, yielding flyball after flyball, many of which would be HRs in Yankee Stadium but turn into outs in Anaheim. If Jered Weaver were to pitch regularly in Yankee Stadium, he would either need to alter his game plan or be relegated to a middle/back-of-the-rotation starter. If Phil Hughes were to pitch in San Diego, Seattle, or another of the west coast pitcher-friendly parks, he would likely be a bona fide number two starter and frequent All-Star. Yes, the ballpark can make a big difference.
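The point about listing the LHH factor can be illustrated with a weighted average of the two indexes quoted above. The 55% share of left-handed hitters is an assumption for the sake of the example (the text says only "more than 50%"), and a simple linear blend is a simplification rather than an established park-factor method.

```python
def blended_hr_index(lhh_index, rhh_index, lhh_share):
    """Weight a park's LHH and RHH HR indexes by the share of LHH faced."""
    if not 0.0 <= lhh_share <= 1.0:
        raise ValueError("lhh_share must be between 0 and 1")
    return lhh_share * lhh_index + (1.0 - lhh_share) * rhh_index

# Yankee Stadium indexes from the text: 153 for LHH, 102 for RHH.
# The 0.55 share of LHH faced is a hypothetical value.
yankee_stadium = blended_hr_index(153, 102, 0.55)
print(round(yankee_stadium, 2))
```

The more lefties a right-hander faces, the closer his effective index moves toward the 153 figure, which is why the LHH number matters most in this case.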