A Guide to Sabermetric Research: The Basics

Sabermetrics was first introduced to a wide public in 1982, with the first mass-market publication of the Bill James Baseball Abstract. And for my generation of sabermetricians of a certain age, this was the very first sentence about sabermetrics that we ever read:

“If you sometimes get the feeling between here and the back cover that you are coming in on the middle of a discussion, it is because you are.”

That is: Bill James and a handful of colleagues, mostly SABR members, had been working on a body of knowledge for a few years. There was an established, although private, literature of sabermetrics, and part of James’ task was to explain what had already been discovered, and how.

That was a number of people “who could congregate peacefully in the restrooms in the left field bleachers of Yankee Stadium,” working for a few years, without computers or formal publication. Still, those few researchers had built a considerable base of knowledge that we had to be caught up on.

Imagine, then, the situation today. Sabermetrics has been in full force since the mid-1970s. “By The Numbers,” the SABR Statistical Analysis Committee newsletter, has been publishing since the late 1980s. Before that, there was Baseball Analyst, Bill James’ own sabermetrics journal in the 1980s. With the advent of Rotisserie/Fantasy Baseball, an industry of professional sabermetrics research sprang up. Publications like Baseball Prospectus and Baseball Forecaster do their own proprietary research and publish some of it in their annual books.

And, most importantly, in the past few years, “amateur” sabermetrics has found its stride and, in my opinion, taken the lead. In the past half-decade, a vast number of researchers have published to websites and blogs, giving us serious, state-of-the-art results that are instantly seen by thousands in the community, who often build on the findings and take them further.

Five years ago, I would have argued that the main outlets for sabermetric research were print publications, and that a few books and websites could bring you reasonably well up to date on what sabermetricians had learned over the years. But, now, things have moved so fast that it’s hard to keep up, especially with articles and papers and studies spread all over the web.

It’s a little like the software industry. In the 1990s, almost all software came shrink-wrapped from retail stores, and most of it was by big industry players, such as Microsoft and IBM. Today that still exists, but with open-source, shareware, file sharing and hundreds of third-party iPhone apps created every year … well, now it takes some effort to keep track.

Still, the basics haven’t changed that much. As with any science, the earliest discovered principles tend to be the most fundamental, and, over time, an unwritten consensus emerges about which findings are most important. So I’m going to do my best here to give you a short reading list of “classical” sabermetrics, a way to get a good feel for what sabermetrics has been up to over the past few decades.

The 1982 Bill James Baseball Abstract is three decades old and counting, and it’s getting harder to find. Still, it remains the best place to learn what sabermetrics is, how it works, and how sabermetricians think.

That’s all attributable to Bill James himself. Not only did he make most of the discoveries in the book (there were other sabermetricians active at the time, but James was well over 90% of the field), but his writing style makes the explanations effortless. Anything by Bill James is a joy to read.

If you can’t find the 1982 edition, try whatever other years you can find. Generally, the earlier the year, the more space is devoted to the basics.

Wayne Winston, a professor and consultant to the NBA’s Dallas Mavericks, wrote Mathletics, a 2009 summary of sabermetric findings in various sports. The baseball section comprises seventeen basic explanations of various sabermetric principles, such as runs created, streakiness and momentum, pitcher evaluation, and situational strategy.

There’s no original baseball research in Mathletics, but if you want a quick and concise introduction to some of the basic findings in the field, this is the book to get.

The Hidden Game of Baseball, by sabermetrician Pete Palmer and baseball historian John Thorn, is considered by many to be the “bible” of sabermetrics. I’d consider it a complement to the Bill James books.

While James developed some methods and formulas by trial and error, Palmer mines historical data and shows the theoretical underpinnings of the methods he uses. If you like a more mathematical approach to sabermetrics, this is the work that lays the foundation.

Thorn and Palmer’s book will tell you, for instance, that a leadoff double helps the team by an average of .614 runs. How do they know that? Well, they looked at many years of play-by-play data and found that .454 runs are scored in the average inning. But with a runner on second and nobody out, an average of 1.068 runs are scored. And so the double is worth the difference between the two situations: 1.068 minus .454, or .614 runs.
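
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the "difference between two situations" idea, not Thorn and Palmer's actual method or code: the table and function names (RUN_EXPECTANCY, run_value) are made up for this example, and the table holds only the two figures quoted in the text.

```python
# Run expectancy: average runs scored from a given situation to the end of the inning.
# Hypothetical table for illustration; it contains only the two values quoted in the text.
RUN_EXPECTANCY = {
    ("bases empty", 0): 0.454,      # start of an average inning
    ("runner on 2nd", 0): 1.068,    # runner on second, nobody out
}


def run_value(before, after):
    """Value of an event as the change in run expectancy it produces.

    Assumes no runs score on the play itself, which holds for a leadoff double.
    """
    return RUN_EXPECTANCY[after] - RUN_EXPECTANCY[before]


leadoff_double = run_value(("bases empty", 0), ("runner on 2nd", 0))
print(f"A leadoff double is worth about {leadoff_double:.3f} runs")
```

A full version of such a table would have one entry for every combination of baserunners and outs, and an event that scores runs would need those runs added to the difference. Neither extension is shown here, since the text gives only these two values.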

While the Bill James Baseball Abstract is the philosopher and theoretician of sabermetric thought, The Hidden Game of Baseball is its engineering department.

A collaboration by three exceptional sabermetricians, The Book studies over 100 different questions on baseball strategy. While it does cover topics that have previously been studied by others, it usually does so with much greater rigor. For instance, when looking at player performance in various situations, The Book will often correct for park, home/road, the identity of the opposing pitcher, and the ball/strike count. As a result, its conclusions are very detailed and very well-considered.

The Book is intended more for fans with a hardcore interest in sabermetrics and strategy issues. It’s included here because it has been so influential among current researchers, and you will see its ways of thinking, especially as described in Chapter 1, repeatedly surface in emerging research.

If the three books above make up the reading list for Sabermetrics 101, then The Book could be the text for Sabermetrics 301 or 401.