19 May 2015

Reviewing Big Data Baseball

Travis Sawchik's Big Data Baseball: Math, Miracles, and the End of a 20-year Losing Streak is the next logical step in the baseball sabermetric non-fiction collection, preceded by Michael Lewis' Moneyball and Jonah Keri's The Extra 2%. Those books took meandering walks through concepts devised by the front office and how they were applied in the clubhouse and on the field, while noting that it took the perfect collection of personnel at the right time to make it all work. With Big Data Baseball, the premise is largely that there are a great many data points out there now, and the key is knowing not only what to do with them but also how to communicate what you learn to the people who actually step out onto the field. It is a story of how you turn 10 million dollars into 10 additional wins...and of having the right people at just the right time.

Perhaps the heart of this book is Clint Hurdle. He is the Billy Beane of this story: a man broken by the game and on his way out who challenges his own convictions about it. Hurdle overcame his failure as a player (if you can call making the Major Leagues and burning out a failure) and his difficulties winning as a manager to fully embrace a deeper application of what the Pirates' data analysis department was coming up with. This included not only analytical scouting reports and frequent team meetings, but also bringing the data science team into the clubhouse to interact with the players. This is presented as quite revolutionary.

For the non-narrative readers, the pull is by and large the focus on defensive shifts, as well as on developing, acquiring, and deploying players who fit the style of shifting the club was incorporating. After experimenting with its minor league clubs, the organization decided to adopt defensive shifting more fully. The Pirates are certainly not on an island, as other clubs like the Orioles have dedicated themselves to the shift as well. However, it is certainly true that the Pirates are one of the few teams at the tip of the shifting spear. Through the use of shifting, they essentially gained the plurality of those added wins.

The rest of those added wins were made up by Russell Martin, who was a Yankees castoff. Martin's ability to frame pitches saved the team many runs simply by converting a few called balls into called strikes. This helped keep starters in the game longer by keeping hitters in pitchers' counts. It is a concept that has been written about extensively here at the Depot and elsewhere. As with shifting, it is also a concept that has become largely mainstream within the game. The part on Martin does get a little loose, as the book tries to describe his pitch-calling technique as being like jazz even though Martin hates jazz. It describes his approach to calling pitches as having the pitcher throw what the batter is not expecting, which is actually quite ordered.

The other aspect of the book that I found a little off was that this is a book about data science, but it does not seem to have been edited by anyone well versed in data science. For instance, a point is made of Pirates pitcher Gerrit Cole and how Cole's father was well versed in baseball analysis. The major point driven home in this aside is how his dad instituted an application of the Verducci rule, which largely centers on a gradual buildup of total innings year by year in order to prevent arm injuries. What is interesting is that the Verducci effect was initially poorly studied, with an apparent confirmation bias. In the years since, it has been resoundingly discredited as being of any use in application.

Perhaps vetting the Verducci effect was not the place of this book, but I think it highlights something missing from the book as well as from those written by Lewis and Keri. That would be that Science Fails. It fails a lot. It is certainly better than going blind into something, but the marvelous thing about scientific endeavors is that we refine reality as we know it as time moves on. While the book highlights how we have entered a new era of millions upon billions of data points to digest, it fails to note that having a lot of data points can also be problematic and can produce a great many false positives. It is that false positive story that is needed here. Verducci's was a proto-big data false positive.
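To make the false positive point concrete, here is a minimal sketch in Python. It is not drawn from the book or from any team's actual analysis. It assumes a hypothetical setup in which 50 players each have one outcome number and a large pile of candidate metrics, all of which are pure random noise. The function names, the sample size, and the hard-coded 5% cutoff are my own assumptions for illustration.

```python
import math
import random

N_PLAYERS = 50      # hypothetical sample size per metric
T_CRITICAL = 2.011  # approximate two-sided 5% cutoff for 48 degrees of freedom


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_false_positives(n_metrics, seed=0):
    """Count noise metrics that look 'significant' against a noise outcome."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(N_PLAYERS)]
    hits = 0
    for _ in range(n_metrics):
        # Each metric is random noise, so any "finding" is a false positive.
        metric = [rng.gauss(0, 1) for _ in range(N_PLAYERS)]
        r = pearson_r(metric, outcome)
        t_stat = r * math.sqrt((N_PLAYERS - 2) / (1 - r * r))
        if abs(t_stat) > T_CRITICAL:
            hits += 1
    return hits


def expected_false_positives(n_metrics, alpha=0.05):
    """Number of spurious hits expected by chance alone."""
    return n_metrics * alpha


if __name__ == "__main__":
    for n in (20, 1000, 10000):
        print(n, count_false_positives(n), expected_false_positives(n))
```

Nothing in this simulation is related to anything else, yet roughly one metric in twenty should clear the usual significance bar. Test twenty ideas and you expect about one spurious discovery; test ten thousand and you expect about five hundred. More data points mean more chances to be fooled unless someone is correcting for how many things were tried.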

In the end, what this book does well is deliver a solid narrative with several interesting characters while also introducing many readers to more current thinking on data analysis and market inefficiency opportunism. Travis Sawchik is able to take some relatively complicated concepts and provide a soft, inviting touch for less data-obsessed readers. We are also quite pleased that former Camden Depot writer Stuart Wallace is name-dropped in the book as a significant hire as the club moves forward.
