Smart Baseball: The Story Behind the Old Stats That Are Ruining the Game, the New Ones That Are Running It, and the Right Way to Think About Baseball

Overview

Predictably Irrational meets Moneyball in ESPN veteran writer and statistical analyst Keith Law’s iconoclastic look at the numbers game of baseball, proving why some of the most trusted stats are surprisingly wrong, explaining what numbers actually work, and exploring what the rise of Big Data means for the future of the sport.

For decades, statistics such as batting average, saves recorded, and pitching won-lost records have been used to measure individual players’ and teams’ potential and success. But in the past fifteen years, a revolutionary new standard of measurement—sabermetrics—has been embraced by front offices in Major League Baseball and among fantasy baseball enthusiasts. Yet while sabermetrics is recognized as being smarter and more accurate, traditionalists, including journalists, fans, and managers, stubbornly believe that the "old" way—a combination of outdated numbers and "gut" instinct—is still the best way. Baseball, they argue, should be run by people, not by numbers.

In this informative and provocative book, the renowned ESPN analyst and senior baseball writer demolishes a century’s worth of accepted wisdom, making the definitive case against the long-established view. Armed with concrete examples from different eras of baseball history, logic, a little math, and lively commentary, he shows how the allegiance to these numbers—dating back to the beginning of the professional game—is firmly rooted not in accuracy or success, but in baseball’s irrational adherence to tradition.

While Law gores sacred cows, from clutch performers to RBIs to the infamous save rule, he also demystifies sabermetrics, explaining what these "new" numbers really are and why they’re vital. He also considers the game’s future, examining how teams are using data (from PhDs to sophisticated statistical databases) to build future rosters, changes that will transform baseball and all of professional sports.

Book Preview

Smart Baseball - Keith Law


Introduction

Like many of you, I imagine, I grew up in a Pleasantville-esque world of baseball statistics, where everything you might want to know about a baseball player was displayed in tabular form on the back of his baseball card (until you destroyed it by flipping it against a brick wall or sticking it in the spokes of a bicycle tire). A hitter’s home runs, average, and RBI were right there, along with the obscure and intimidating OBP and SLG. A pitcher’s won-lost record, saves, and ERA were shown, along with strikeouts, innings, and the undefined GS, which for a few of my elementary school years I could only assume meant grand slams, which never made mathematical sense to me. (It means games started.) I was born in 1973 and the eighties were my formative years as a fan. For most of that time, it didn’t even occur to me that there might be more information out there to learn about players’ performances, or that the stats were birthed from the sanctum sanctorum of baseball accounting. This was what there was, and if it was good enough for Topps and Newsday and WPIX, it was good enough for me.

Of course, there came a point where I realized that these stats weren’t doing a particularly good job of telling me what was happening on the field or helping me predict what players might do in the future. I played fantasy baseball for thirteen years, from my senior year of high school (1990) until my first year as a front office employee of the Toronto Blue Jays, and in the first few years of playing, I was awful at it. I founded the league and finished dead last. I thought being good at math gave me some kind of advantage at the game, but it turned out it gave me a lot of false confidence and nothing else.

Eventually, the desire to be better at a frivolous endeavor—we never played for money—drove me to seek out some new perspectives on baseball, which led me to the small but active online sabermetrics community of the time, and eventually to books like Baseball Prospectus and works by Bill James and Eddie Epstein. None of these were specifically guides to statistics, but they all looked at the game a different way, often incorporating new stats—James was a sort of Thomas Edison of the field, generating new stats as easily as most of us breathe—to tell the reader something new about a player. The more I read, the more I wanted to read. Baseball had always been my favorite sport, likely because it was my parents’ favorite and my grandmother’s as well, but now I could watch and follow the sport with a totally new set of eyes.

In the twenty years since I wrote my first public piece on baseball, in 1996, the field of baseball analysis has undergone a quantum-state change, going from one or two consultants providing statistical insight to a handful of interested teams to all thirty clubs employing departments of full-time quants. Where media coverage of baseball in the 1990s was homogeneous in people and in content, today it is exploding with diversity of faces, voices, and opinions. This revolution has had, at its heart, the rising adoption of statistical analysis within and around the game. If you said OBP was better than batting average in 1996, you’d be looked at as if you were a little strange. If you say it now, you’ll be asked why you’re not looking at wOBA or wRC+ instead.

Why has baseball as an industry, including the media covering it and the fans who follow it, stuck with outdated statistics for so long? The answer is largely a giant appeal to tradition, a common type of fallacious argument that says we should keep doing it this way because we’ve always done it this way. Baseball has always suffered from a sort of inertia. Whether it’s about the rules of the game, replay, or the unwritten code of player behavior, old ideas are hard to unseat. For too long people have put faith in old numbers and stats precisely because they’re old; these are the numbers that the baseball gods graced us with all those years ago, so we must follow them—even if there are numbers out there that actually work better. A game with a century and a half’s worth of history has a hard time escaping the gravitational pull of that past.

The fact that baseball’s irrational reliance on tradition, gut instinct, and flawed stats continued even as better stats became widely available to everyone isn’t just an academic concern. Because allegiance to these old stats is not rooted in accuracy or success, people who’ve repeatedly failed at their jobs are often given new opportunities to fail some more. Using the wrong measurements has resulted in bad decisions on contracts, playing time, trades, and draft picks. It’s led the voters in the Baseball Writers’ Association of America (BBWAA) to pick the wrong players for MVP, Cy Young, and Rookie of the Year awards, and often to screw up even obvious stuff like which players to put in the Hall of Fame. It drives conversations around teams and players—often driving the conversations right off a cliff. Even now, in 2017, you will still hear broadcasters refer to and rely on outdated or flat-out useless statistics to try to analyze what’s happening on the field, to advocate bad strategies, or to praise a player for doing something that actually wasn’t very good. This isn’t just an issue for Major League Baseball (MLB)—it’s a problem at all levels of the game. Just go to a college or high school baseball game and watch the bench empty as players rush to congratulate a hitter who just advanced a runner via a bunt or an out. Yay! We’re in worse shape than we were a few pitches ago!

But even as commentators, managers, writers, and talking heads have resisted the statistical sea change of the last decade, most front offices around the league have long recognized that newer, better numbers lie at the heart of the game precisely because they work better. They describe in-game events with greater accuracy and they predict what players will do in the future with greater accuracy. Baseball might be a sport fueled by nostalgia and adherence to the past, but no team wants to go back to a time when they used to lose more often.

As such, teams evaluate players substantially differently today than they did in 2000, and it’s time that we as journalists, bloggers, and fans adapt. For that to happen, the conversation has to go beyond merely pointing out that batting average and the pitcher win are bad, into a discussion of what stats are better, allowing us to reframe how we discuss player performance. Communicating that is my main goal in writing this book. (Also money. But mostly communicating that.)

The world of baseball is changing, and has been for some time now, but the mainstream discussion and coverage of the sport has lagged behind the changes within major-league team operations. You’ll still read elegies to the pitcher win in your local paper, arguments that poor defenders are actually great because they don’t make errors, and claims that managers are brilliant for employing small ball tactics that lead to fewer runs. There’s no reason on earth for any baseball fan to cling to old, anachronistic, or disproven notions like these. I coined the Twitter hashtag #smrtbaseball a few years ago, an homage to a Simpsons joke, to refer to managerial moves and executive comments that were, in fact, the opposite of smart. I’ve restored the "a" to "smart" here because the point of this book is to try to educate the reader on the way front offices look at player statistics and valuation today, and where their thinking is likely to head in the future.

As 2016 drew to a close, Major League Baseball was coming off one of its most successful postseasons ever, one full of drama, narratives, and rising young stars, where the Chicago Cubs managed to end the longest championship drought in US professional sports, and did so in no small part because they went from also-rans in the stats department to industry leaders. You couldn’t watch or follow the 2016 playoffs without noticing, reading, or hearing about the statistical revolution—players’ Wins Above Replacement values, defensive positioning, advanced fielding metrics like Ultimate Zone Rating, and the use of leverage to determine when to use your best reliever. This was unthinkable when I first started dabbling in baseball analysis in my early twenties. It’s now standard, with every MLB owner who wasn’t already on board looking at the most successful teams the last few seasons and realizing that if they didn’t add this capability in-house they’d only fall further behind their direct competitors.

You don’t have to understand FIP or dRS or exit velocity to enjoy a baseball game or follow a team. Granted, there are folks out there who’ll make you feel like you have to—I’m sure I’ve been guilty of that a few times—but the truth is you don’t need to know all this. It will make you a more educated fan, and to me, becoming educated makes me enjoy the game even more. It will help you when you hear your team made a trade or a signing and you don’t immediately get why they did it. It will help you understand a pitching change or a decision to bunt or bring the infield in—or maybe help you question it. And with coverage of every aspect of the sport, from games to transactions to postseason awards to the Hall of Fame, now suffused with the vernacular of sabermetrics, it’ll help you keep up with all of the great content being written and spoken about our national pastime.

Smart Baseball is, more than anything else, a book for the reader. If we were sitting at a game together—something I’ve done with a handful of fans over the years—and you asked me why the save statistic is a travesty on the order of the Alien & Sedition Acts, or what I’m looking for when I scout a player in person, this book gives you the monologue version of the conversation we’d have.

I try to build up from zero here, assuming you come into this book without knowledge of advanced statistics, or that you come into it knowing some stats are bad and some are good but would like a rational explanation of why. In Part One, I cover most of the traditional statistics that just don’t tell us what they purport to tell us. RBI, batting average, wins, saves—they’re a bunch of filthy liars, really, and they’ve been lying to us for decades now. In Part Two, I work my way up through some better traditional statistics, like on-base percentage (OBP), on my way to discussing entirely new stats that show how teams and analysts try to value a player’s production. If you want to pay for a player, first you have to know what he’s worth, and to do that, you have to know how much baseball value he produced. In Part Three, I apply these concepts to Hall of Fame debates, explain how traditional scouting works and is changing in light of new data, and discuss the MLB Statcast product, an entirely new stream of data that dwarfs anything teams have worked with previously. The future of baseball analysis revolves around Statcast, which has the potential to change the way teams look at everything from contracts to scouting to player development to keeping players (especially pitchers) healthy.

Sabermetrics is baseball math, but I’ve tried to keep the math in this book to a bare minimum. This isn’t a manual to build your own, better sabermetric mousetrap, although I won’t discourage you from trying; this is about a new way of thinking about the game, a general philosophy of player valuation and evaluation that over the last fifteen years has gone from the lunatic fringe to the predominant way of thinking. Every MLB team has made or is making statistical analysis a core part of its baseball decision-making process, and the effects of this revolution were all over the 2016 postseason, from Cleveland’s unconventional use of closer Andrew Miller to the World Champion Chicago Cubs exploiting new data to become defensive wizards. Even if you just want to follow the conversation around the game, it will help to know where we came from and where the world of baseball statistics is going. This book will take you there.

PART ONE

Smrt Baseball

1

Below Average:

The Fundamental Flaws of Batting Average

The language of baseball is built around some of its most basic statistics. Batting average, the simple division of a hitter’s hits recorded by the number of at bats he had, is the foundation of baseball’s batting title. The player in each league with the highest batting average is named the batting champion. When hitters retire, we count their batting titles and compare them to other batting champions’ totals. We revere the lifetime .300 hitter as if he ascended to a higher plane of existence than the mere .299 hitter. But the batting title and the stat behind it are both guilty of telling us half-truths, giving us a less-than-complete story of the hitter’s performance.

Consider the descriptions found on the plaques for these Hall of Famers:

An artisan with a bat whose daily pursuit of excellence produced a .338 lifetime batting average, 3141 hits, and a National League record–tying eight batting titles . . .

—Hall of Fame plaque for Tony Gwynn

A five-time batting champion who also led the league in on-base percentage and intentional walks six times each . . .

—Hall of Fame plaque for Wade Boggs

Led American League in batting twelve times . . .

—Hall of Fame plaque for Ty Cobb

Accomplished as these players were, their lionization in Cooperstown ignores a crucial question: If you’re only leading the league in batting average, one flawed and incomplete stat, should we really say you led the league in batting? Are you the batting champion if other players hit better than you did?

Batting average has been at the top of the heap of hitter stats for as long as hitters have been putting bat to ball. The English-American statistician Henry Chadwick is credited with creating batting average (among many other common baseball stats) in the late 1800s, designing it along the lines of cricket’s version of batting average, which is runs divided by outs. Baseball in the nineteenth century resembled today’s game, but had several significant differences, such as times when batters could tell the pitcher where they wanted the ball thrown, or periods where the number of balls required for a walk or strikes required for a strikeout varied from today’s 4 and 3. Hitting the ball over the fence for a home run was rare—in 1895, the National League leader in home runs had 18—as most hitters were just trying to put the ball in play. So, at the time, Chadwick’s idea had merit: when batters rarely walk and are focused on making contact, hits divided by at bats probably is a good measure of their performance.

Batting average today still has some value, albeit a limited one; batting average’s primary problem is one of marketing. If batting average were content with second-tier statistical duty, to impart some small amount of information, without claiming to be the be-all and end-all of hitting statistics, then it would probably fly under the radar without attracting much notice from traditionalists or statheads.

Ah, but when you claim to be the King of All Stats and fail to deliver, then you have earned my ire—and that of analysts and executives around the sport, who now recognize that you can get all the information batting average is supposed to give you in other, more complete, less flawed statistics. So while we still celebrate the player who won the batting title or led the league in hitting for having the highest batting average in the league, the stat itself has been falling out of favor for twenty years already—and its decline is only accelerating.

All this history may be impressive, but it obscures what batting average actually tells you. Batting average is a simple calculation any third grader could do—take a player’s hits, divide them by that player’s at bats, and round to three decimal places. That’s batting average, and while in tiny samples it can range from .000 to 1.000, in the modern era batting averages have typically fallen in the .200 to .400 range. In the five seasons from 2011 to 2015, no player qualified for the batting title with an average above .350, and only two players even cracked .340 (Jose Altuve once, Miguel Cabrera twice).
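As a minimal Python sketch of that arithmetic (the function names and the 180-for-540 season are hypothetical, chosen only for illustration), batting average is one division plus one rounding step; the formatting helper drops the leading zero the way a baseball card does.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits divided by at bats, rounded to three decimal places."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return round(hits / at_bats, 3)


def card_format(avg: float) -> str:
    """Display an average baseball-style: .333 instead of 0.333 (1.000 stays as is)."""
    text = f"{avg:.3f}"
    return text[1:] if text.startswith("0") else text


# Hypothetical season: 180 hits in 540 at bats.
avg = batting_average(180, 540)
print(card_format(avg))  # .333
```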

Did you notice that odd phrase in there—qualified for the batting title? Because batting average is a rate stat, a statistic that measures something per something else—in this case, hits per at bat—MLB sets a minimum threshold to appear on its leaderboards, in this case a reasonable 3.1 plate appearances per team game played. Since a full season for most teams is 162 games played, that means a player must have 502 plate appearances (3.1 × 162 = 502.2) on the year to qualify for the league’s batting title or appear anywhere on the leaderboard.
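The threshold is simple multiplication. Here is a small Python sketch; the constant and function names are hypothetical, and rounding to the nearest whole plate appearance is an assumption that reproduces the familiar 502 figure for a 162-game schedule.

```python
QUALIFYING_RATE = 3.1  # plate appearances required per team game


def qualifying_pa(team_games: int = 162, rate: float = QUALIFYING_RATE) -> int:
    """Plate appearances needed to qualify for the batting title.

    Assumption: the product is rounded to the nearest whole plate appearance,
    so 3.1 * 162 = 502.2 becomes 502.
    """
    return round(team_games * rate)


def qualifies(plate_appearances: int, team_games: int = 162) -> bool:
    return plate_appearances >= qualifying_pa(team_games)


print(qualifying_pa())                 # 502
print(qualifies(501), qualifies(502))  # False True
```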

Yes, but weren’t we talking about at bats a moment ago? What’s this with plate appearances? Indeed, that bait-and-switch exposes batting average’s first major flaw. Batting average doesn’t tell you how often a player gets a hit, but how often he gets a hit ignoring times he draws a walk, gets hit by a pitch, hits a sacrifice fly, makes a successful sacrifice bunt, or reaches via catcher’s interference. Those scenarios don’t count as at bats, but do count as plate appearances. (The first three count for the purposes of on-base percentage, a stat so valuable it will get its own chapter later in the book.)

So why does batting average ignore all of these other events, which in some extreme cases can account for more than a third of a player’s trips to the plate? (Barry Bonds did this twice, in 2002 and 2004, the only MLB player in history whose plate appearances were more than 50 percent higher than his at bats.) Because . . . well, there’s no really good explanation for this. I mentioned above the most likely theory, that when Chadwick created the stat, those other events were rare or just weren’t considered the result of a hitter’s skill or effort, so he chose to omit them. This alone should tell you why using batting average by itself, or even just as your primary metric, to evaluate a hitter leaves out far too much crucial information. Leaving walks, an important skill for a hitter, out of both the numerator (just hits) and the denominator (at bats) gives you only a portion of the hitter’s season.
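To make the gap between at bats and plate appearances concrete, here is a Python sketch; the class, its field names, and the season totals are hypothetical. It shows that when a hitter’s plate appearances run 50 percent above his at bats, a full third of his trips to the plate never enter batting average at all.

```python
from dataclasses import dataclass


@dataclass
class TripsToPlate:
    """Hypothetical container for one hitter's season totals."""
    at_bats: int
    walks: int = 0
    hit_by_pitch: int = 0
    sac_flies: int = 0
    sac_bunts: int = 0
    catcher_interference: int = 0

    @property
    def plate_appearances(self) -> int:
        # At bats plus every event that batting average leaves out.
        return (self.at_bats + self.walks + self.hit_by_pitch
                + self.sac_flies + self.sac_bunts + self.catcher_interference)

    @property
    def share_missing_from_avg(self) -> float:
        # Fraction of trips to the plate that never enter batting average.
        return (self.plate_appearances - self.at_bats) / self.plate_appearances


# Hypothetical walk-heavy season: plate appearances are exactly 1.5x at bats.
season = TripsToPlate(at_bats=400, walks=190, hit_by_pitch=8, sac_flies=2)
print(season.plate_appearances)                 # 600
print(round(season.share_missing_from_avg, 3))  # 0.333
```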

The sins of batting average, though, are not just of omission. The numerator is even more flawed than you’d think, because it treats all hits as equal—a single and a home run both carry the same weight in batting average, even though we know they carry substantially different weights in the game.

So what does batting average really tell us about what a hitter did over some period of time? It tells us how often he got a hit in trips to the plate where he didn’t walk or get hit by a pitch or hit a sac fly or bunt or have some other very rare thing that isn’t actually an at bat happen, and it only tells us that he got a hit but not what kind of hit. (Hence the old baseball axiom, often heard after a weakly hit infield single: "It’ll look like a line drive in the box score.") It’s a bad tradition, but it has stuck with us for well over a century and still carries undue importance in discussions and evaluations of hitters, especially those who lead the league in batting average because we say they won something. It’s often confusing because hitters who hit for high batting averages are generally good hitters, period; we’re not getting totally false information from the stat, but we’re misled by its false precision, acting as if going to the third decimal place is a summary judgment on the player. To see the full extent of the flaws in batting average, it helps to compare it to stats that are better equipped.

One basic statistical tool I’ll use often in this book is correlation analysis, where I compare two columns of data to each other and get a number, the coefficient of correlation, that tells us how strongly correlated the two are, that is, how much the two columns move together. The coefficient runs from -1 to 1: 0 means no correlation at all, 1 means perfect positive correlation, and -1 means perfect negative correlation, where one stat rises exactly as the other falls. The stats compared in this chapter all correlate positively with run-scoring, so the higher the number, the more closely stat B tends to move when stat A moves. This does not mean A causes B or B causes A; you’ve probably heard someone say at some point that correlation does not prove causation, because all a correlation analysis can tell us is whether two statistics appear to be related. It could be a direct cause and effect, and it could be coincidence, but this tool only tells us to what extent the two numbers move together. In this book, I will often refer to a correlation between two statistics by saying that one predicts the other.
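The book doesn’t say which correlation measure it uses; Pearson’s r is the standard one and is assumed in this Python sketch. The function name and sample columns are hypothetical. Python 3.10 and later also ship `statistics.correlation`, which computes the same coefficient.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns (range -1 to 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length with at least two rows")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalized because the n's cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a column with no variation has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (move together perfectly)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (move in opposite directions)
```

Feeding it a column of team batting averages and a column of runs per game, for example, would produce the kind of team-level figure discussed below.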

In the table below, I used MLB team stats from the five seasons from 2011 through 2015 to show the correlations between four commonly used hitter-rate stats at the team level and those teams’ runs-scored-per-game figures:

Batting average correlates pretty well to runs scored, with a coefficient of about 0.75. While this doesn’t show causation, it stands to reason that if a team as a whole is getting more hits during its (arbitrarily, narrowly defined) at bats, it will score more runs. But batting average fares worse compared to the two other common rate stats used for evaluating hitters: on-base percentage and slugging percentage. On-base percentage, or OBP, does just what it claims to do, taking the times a hitter reached base safely via hits, walks, and hit-by-pitches, dividing that by all plate appearances other than sacrifice bunts and times reached on interference, and giving the frequency with which the hitter gets on base. A hitter with a .400 OBP, which would put him among the league leaders, reached base in 40 percent of those plate appearances, meaning he made an out of some sort in the other 60 percent. Of all basic batting stats, those you might find on the back of a baseball card or in a game program, OBP is probably the most important for telling you about a hitter’s ability to produce.
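Here is the standard OBP formula as a Python sketch; the function name and the season totals are hypothetical. The hitter below bats .300, but because walks and hit-by-pitches count, he reaches base nearly 40 percent of the time.

```python
def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sac_flies: int) -> float:
    """Times on base divided by plate appearances, leaving out sacrifice bunts
    and times reached on interference (neither appears in either term)."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sac_flies
    return times_on_base / opportunities


# Hypothetical season: 150 hits in 500 at bats (a .300 average) plus 80 walks,
# 5 hit-by-pitches, and 5 sacrifice flies.
obp = on_base_percentage(hits=150, walks=80, hit_by_pitch=5, at_bats=500, sac_flies=5)
print(f"{obp:.3f}")  # 0.398
```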

Slugging percentage is calculated like batting average but no longer treats all hits as equal. The denominator (the bottom of the fraction) remains at bats, but the numerator changes from hits to total bases. A single counts for one total base, a double two, a triple three, and a home run four. This isn’t an accurate reflection of their relative values; a home run isn’t worth four times as much to an offense as a single, but something like twice as much. It does create some needed separation between hit types, however, and you can see that it correlates extremely well to runs scored at the team level. If you hit for more power, you’re going to score more runs. (In fact, home runs per plate appearance all by itself has a coefficient of correlation of 0.623 with runs scored over this same sample—ignoring absolutely everything else a team does, home runs still drive a substantial fraction of run-scoring.)
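A short Python sketch of slugging percentage (function names and players are hypothetical) shows the separation batting average can’t provide: two hitters with identical .300 averages, one hitting nothing but singles and one with 50 extra-base hits, end up far apart.

```python
def total_bases(singles: int, doubles: int, triples: int, home_runs: int) -> int:
    # One base per single, two per double, three per triple, four per homer.
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def slugging(singles: int, doubles: int, triples: int, home_runs: int,
             at_bats: int) -> float:
    return total_bases(singles, doubles, triples, home_runs) / at_bats


# Two hypothetical hitters, each with 150 hits in 500 at bats (.300 averages).
slap_hitter = slugging(150, 0, 0, 0, at_bats=500)
power_hitter = slugging(100, 25, 5, 20, at_bats=500)
print(f"{slap_hitter:.3f} {power_hitter:.3f}")  # 0.300 0.490
```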

OPS, which stands for On-base Plus Slugging, is a kludge stat, a brute-force addition of OBP and slugging percentage that is deeply flawed at a basic math level, yet it has gained momentum in popular discussions of the sport, including media coverage, because it kind of works: you can see it correlates better with run-scoring than either OBP or slugging does individually. OPS is popular and problematic enough to merit its own section later in the book, but for now, its purpose here is to show how much information is missing from batting average. If these other rate stats correlate better to team run-scoring, and they’re all easily available at the individual player level, what, exactly, is batting average actually good for?
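One commonly cited reason OPS is mathematically crude is that it adds two fractions built on different denominators. This Python sketch makes that explicit; the function name and the stat line are hypothetical.

```python
def ops(hits: int, walks: int, hit_by_pitch: int, sac_flies: int,
        at_bats: int, total_bases: int) -> float:
    """On-base percentage plus slugging percentage, from counting stats."""
    obp = (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)
    slg = total_bases / at_bats
    # The two fractions have different denominators (OBP's includes walks,
    # hit-by-pitches, and sac flies; SLG's is at bats alone), so their sum
    # is not a rate of anything in particular. It just tends to track run-scoring.
    return obp + slg


# Hypothetical hitter: 150 hits, 80 walks, 5 HBP, 5 sac flies,
# 500 at bats, 245 total bases (a .398 OBP and a .490 SLG).
print(f"{ops(150, 80, 5, 5, 500, 245):.3f}")  # 0.888
```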

Despite the deficiencies of batting average, the batting champion tag still matters quite a bit within baseball circles, especially where fans and the media are involved. The title features prominently on several Hall of Fame plaques, including the three players cited at the top of the chapter, and becomes a talking point in Hall of Fame elections, but perhaps most important, it’s a primary focus for postseason award balloting and often used as a justification for voting for players who were not in fact the best hitters in the league.

In 2007, Detroit Tigers outfielder Magglio Ordoñez led the American League in batting average at .363, but he wasn’t the best offensive player in the league because he didn’t do enough besides hitting for average. The best offensive player in the league was Alex Rodriguez of the New York Yankees, who led the AL with 54 homers and a .645 slugging percentage, so while he only hit .314, he produced more total value with the bat. He had 26 more home runs than Ordoñez and drew 19 more walks, so the total value of all of his contributions—considering the values of all those hits, walks, and extra bases, compared to the number of outs he produced—exceeded that of Ordoñez, even before we consider things like defense. Rodriguez did win the AL MVP award that year, although two Detroit-based writers, Tom Gage and Jim Hawkins, made the absolutely-not-biased-at-all decision to list Ordoñez, the local player, first on their ballots. Gage specifically cited Ordoñez’s batting average in defending his vote, dismissing home runs as a glamour stat.

Similarly, the Miami Marlins’ Dee Gordon led the National League in batting average in 2015 at .333, but the statistics site Baseball-Reference.com doesn’t list Gordon among the top ten in the National League in Adjusted Batting Runs (ABR), an advanced metric that does just what I described above: assigns weights to different offensive events and adds ’em up. Bryce Harper led the NL in just about everything else, winning the NL MVP award unanimously. (Gage and Hawkins are no longer active award voters, and wouldn’t have voted on a National League award as members of the Detroit chapter.) You can see below just how large the gap between Gordon and Harper was, even though Gordon led Harper in batting average:

Harper got on base more, hit for far more power, and made 75 fewer outs. Gordon’s .003 advantage in batting average turns out to be not just meaningless but outright misleading: these two players were nowhere close to each other in offensive production, so exactly what good is batting average doing for us?
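The Ordoñez-Rodriguez and Gordon-Harper comparisons rest on the idea behind Adjusted Batting Runs: weight each offensive event by its run value, charge the hitter for his outs, and add it all up. Here is a minimal Python sketch of that idea. The event weights are rough illustrative values, not Baseball-Reference’s; the two players are hypothetical; and the sketch skips any park or league adjustment a production metric would apply.

```python
from typing import Mapping

# Illustrative run values per event. These are rough, textbook-style numbers
# chosen for this sketch; they are NOT the weights Baseball-Reference uses.
ILLUSTRATIVE_WEIGHTS: Mapping[str, float] = {
    "walk": 0.30,
    "single": 0.45,
    "double": 0.75,
    "triple": 1.05,
    "home_run": 1.40,
    "out": -0.27,  # each out costs the offense, so making more outs lowers the total
}


def batting_runs(events: Mapping[str, int],
                 weights: Mapping[str, float] = ILLUSTRATIVE_WEIGHTS) -> float:
    """Weight each offensive event and add them up (no park or league adjustment)."""
    return sum(weights[event] * count for event, count in events.items())


# Two hypothetical 600-plate-appearance seasons.
contact = {"single": 170, "double": 20, "triple": 5, "home_run": 5, "walk": 25, "out": 375}
power = {"single": 90, "double": 30, "triple": 2, "home_run": 40, "walk": 100, "out": 338}

for name, line in (("contact", contact), ("power", power)):
    hits = line["single"] + line["double"] + line["triple"] + line["home_run"]
    at_bats = hits + line["out"]  # simplification: every out here is an at bat
    print(f"{name}: AVG {hits / at_bats:.3f}, runs {batting_runs(line):.1f}")
# contact: AVG 0.348, runs 10.0
# power: AVG 0.324, runs 59.8
```

The contact hitter wins the batting-average race, but the power hitter’s walks and extra-base hits, combined with fewer outs, make him worth far more runs.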

In 1991, Barry Bonds was the best player in the National League by a wide margin, and should have walked away with his second straight NL MVP award. He led the NL in on-base percentage that season, ranked fourth in slugging percentage, and even finished second in runs batted in, a statistic that at the time was a major criterion for MVP voters. But Bonds lost the award to Atlanta’s Terry Pendleton, whose primary achievement that year was leading the league in batting average at .319. Bonds was by far the more valuable hitter; he reached base 29 more times than Pendleton did, despite having 10 fewer plate appearances. They hit for almost identical slugging percentages. Bonds had 3 more homers and stole 33 more bases. Both were

Reviews

I remember as a kid opening up a pack of baseball cards and spending hours poring over the statistics on the backs of them, then arguing with my friends over which players were better. In this book Keith Law first shows us how the statistics we grew up with do not do as good a job as we thought of showing how the players actually performed on the field. Then he introduces us to some new statistics that go beyond the traditional ones and to where sabermetrics might be going next. So grab yourself some peanuts and Cracker Jack and learn some new ways to debate with your friends over who are the better players and teams!

I am curious about the perceived audience for this book. Law is so merciless in his dead-horse beating of batting average and fielding percentage that ordinary fans will be put off, and the converted can only wince. He hits his stride in the last chapters, describing the current state of the game and what might be coming our way in the near future. Worth the read.

This is an excellent book for those who either haven't followed the last twenty years of baseball analysis or are still holding on to RBI as their favorite stat. It really does a nice job explaining advances in player evaluation and projection without making you slog through a lot of spreadsheets and tables. Although there are some tables. Nothing new in here for me personally, but I like Keith so I wanted to read it!

'Smart Baseball', by ESPN's Keith Law, is off the charts great if you're a baseball fan and interested in the 'new' stats that are changing, and helping us understand, the game. Law is an ex-scout/front-office guy who is articulate, knowledgeable, and opinionated, and you don't need much more than that. I consider myself highly educated in all things baseball, but I learned a ton from this book. Highly, highly recommended.