What are little runs made of?

Sports Illustrated’s newest hire seems to be holding auditions for America’s Next Top Offensive Metric. So far his candidates are Hitting Average and Production, both based on Tom Tango’s Linear Weights Ratio. (Hitting Average, in fact, is just LWR with a new name slapped on it, based on the premise that the casual fan won’t like calling something Linear Weights Ratio.)

What about runs batted in?

Every so often you will encounter someone who looks at this whole business of estimating runs and asks, “Why not just use runs and RBIs? Those are real runs, not all this nonsense!”

And at the team level, this is essentially true; outside of a few oddities (like a runner scoring on a double play ball or an error) a team’s RBIs and runs scored will be equal. So yes, at the team level everything works out. But the process by which runs batted in are assigned to individual players is not necessarily any more “real” than other methods we devise. And that process is based upon a critical misunderstanding of the nature of baseball.

The problem is that by awarding a run scored and a run batted in for every team run scored, the system treats every run (except when the batter scores on his own home run) as though it were contributed by only two players, each of whom contributed equally to that run. Both of those assumptions are pretty obviously untrue, at least most of the time.

For instance, let’s say that the first batter of an inning reaches base and the second batter drives him in. The first batter gets a run scored, the second gets an RBI (for that run). Now this is true if:

The first batter gets hit by a pitch and the second batter homers.

The first batter gets a single and steals second and the second batter gets a single.

The first batter hits a triple and the second batter hits a sacrifice fly.

Does anyone really think that those two batters are equally responsible for that one run, in all of those cases?

And that considers only two batters. Of course, oftentimes the player who drives in a run does not bat immediately after the batter who scores it. And none of the batters in between are given any credit whatsoever for that run scoring. Using RBI logic, a player who strikes out is just as valuable as one who sac bunts a guy over to second or one who hits a single and advances the runner to third. Again, exactly how does that make sense?

There is also another way that a player can indirectly participate in a run being scored, and that is by avoiding outs. A team has a limited number of outs but, so long as it does not expend all of them, it has a theoretically inexhaustible supply of plate appearances. A player who goes 4-for-4 in a game can make a positive contribution to his team’s total runs scored even if he never advances a runner and never comes around to score, simply by securing more plate appearances (and thus more scoring chances) for his teammates.

What I want to emphasize is that the process of awarding runs scored and runs batted in is simply a model for crediting the team run scoring process to individual players, and it’s a flawed model at that. It is not “reality” in the way that some anti-sabermetricians would have you believe.

So let’s construct another model, shall we?

Bases and outs

The two fundamental units of run scoring are bases and outs. Essentially, you try to advance as many bases as you can before you use up all of your outs. Now of course we’re used to the idea of bases – a single is one base, a double is two bases, a triple is three bases, a home run is four bases. And it takes four bases to score one run.

The problem with this logic, as it’s expressed in calculations like slugging average, is that this only considers the bases advanced by the batter-runner. When we are trying to break down how runs score, we need to look at the advancement of all runners.

First, let’s look at how often an event occurs with runners on base, for the years 1993-2008:

Event   RUN1   RUN2   RUN3
Out     32%    20%    10%
K       29%    21%    11%
NIBB    28%    24%    12%
IBB      3%    83%    52%
HBP     33%    24%    15%
ROE     35%    23%    12%
FC      47%    52%    51%
1B      33%    19%    11%
2B      31%    20%    11%
3B      30%    22%    12%
HR      32%    19%    10%

Some of those are pretty obvious: someone has to be on base for there to be a fielder’s choice, for instance. And an intentional walk almost never occurs with a runner on first base. But for the most part, those numbers are pretty consistent from event to event. From here on out, when I present values they will be under the assumption that a runner is on that base (unless noted otherwise).

Now, here’s a chart on the average number of bases a runner advances based upon the different events:

EVENT   BATTER   RUN1   RUN2   RUN3
Out     0.05     0.13   0.26   0.28
K       0.00     0.03   0.01   0.01
NIBB    1.00     1.00   0.40   0.19
IBB     1.00     1.00   0.01   0.00
HBP     1.00     1.00   0.47   0.22
ROE     1.17     1.37   1.29   0.95
FC      1.10     0.82   0.46   0.26
1B      1.02     1.29   1.56   0.99
2B      2.00     2.37   1.99   1.00
3B      2.99     3.00   2.00   1.00
HR      4.00     3.00   2.00   1.00

What I really want to draw your attention to is the times when a batter gets a hit. The values for the batter are what we expect – a home run is four times as valuable as a single. But for runners on third base, there’s practically no difference between a home run and a single. A home run is only about 1.25 times as valuable as a single for a runner on second, not four times as valuable.

[And if you were wondering – the fractional values for the batter reflect things like a batter getting a single and advancing to second on an error.]
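To make that comparison concrete, here is a minimal sketch that divides the home run row by the single row for each column of the table. The dictionary and function names are my own, purely for illustration; the numbers are copied straight from the table.

```python
# Average bases advanced on a single and on a home run,
# copied from the table above (assumes the base in question is occupied).
ADVANCE = {
    "1B": {"BATTER": 1.02, "RUN1": 1.29, "RUN2": 1.56, "RUN3": 0.99},
    "HR": {"BATTER": 4.00, "RUN1": 3.00, "RUN2": 2.00, "RUN3": 1.00},
}


def hr_to_single_ratio(who):
    """Bases a home run moves this player, relative to a single."""
    return ADVANCE["HR"][who] / ADVANCE["1B"][who]


for who in ("BATTER", "RUN1", "RUN2", "RUN3"):
    print(f"{who}: {hr_to_single_ratio(who):.2f}")
```

The ratio falls from nearly four for the batter to almost exactly one for the runner on third, which is the whole point: the farther along a runner already is, the less the extra bases on the hit matter to him.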

Now if we take the total bases advanced and divide by four, we’ll get a figure in excess of total team runs scored. We need to account for runners who are out on base and runners stranded at the end of an inning. Let’s start by looking at the outs per event, broken down by base:

EVENT   BATTER   RUN1   RUN2   RUN3
Out     0.95     0.28   0.03   0.02
K       1.00     0.02   0.00   0.00
NIBB    0.00     0.00   0.00   0.00
IBB     0.00     0.00   0.00   0.00
HBP     0.00     0.00   0.00   0.00
XI      0.00     0.00   0.00   0.00
ROE     0.01     0.01   0.01   0.00
FC      0.01     0.30   0.46   0.67
1B      0.01     0.01   0.03   0.00
2B      0.01     0.03   0.00   0.00
3B      0.00     0.00   0.00   0.00
HR      0.00     0.00   0.00   0.00

Pretty much what you’d expect, isn’t it? Sometimes a batter is thrown out on a single while trying to stretch it into a double, but that sort of thing is rare enough that you could typically ignore it. Now, if a runner on third is out on base, that’s three bases lost. You have to subtract those values from your bases advanced.

We also need to account for runners left on base to end an inning. What I did was count a runner left on third as three, a runner left on second as two, and a runner left on first as one. For the time period in question, there were an average of 1.48 “bases” stranded at the end of an inning. Given three outs per inning, that works out to a half base per out.
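The stranded-runner bookkeeping is simple enough to sketch in a few lines. The names below are hypothetical helpers of mine, not anything from Retrosheet; the 1.48 figure is the average quoted above.

```python
# "Bases" charged for a runner left on at the end of an inning:
# first = 1, second = 2, third = 3, as described above.
BASE_VALUE = {"first": 1, "second": 2, "third": 3}


def stranded_bases(runners_left_on):
    """Total bases stranded, given the bases occupied when the inning ends."""
    return sum(BASE_VALUE[base] for base in runners_left_on)


AVG_STRANDED_PER_INNING = 1.48  # 1993-2008 average quoted in the text
OUTS_PER_INNING = 3

stranded_per_out = AVG_STRANDED_PER_INNING / OUTS_PER_INNING

print(stranded_bases(["first", "third"]))  # runners left on the corners
print(f"{stranded_per_out:.2f} bases stranded per out")
```

So an inning that ends with runners on the corners strands four bases, and spreading the league average over three outs gives just under half a base per out.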

Now, let’s put it all together:

EVENT   BASES   OOB     OUTS    NET_BASE   LWTS
Out     0.17    -0.07   -0.50   -0.39      -0.10
K       0.02     0.00   -0.49   -0.48      -0.12
NIBB    1.40     0.00    0.00    1.40       0.35
IBB     1.03     0.00    0.00    1.03       0.26
HBP     1.48     0.00    0.00    1.48       0.37
ROE     2.07    -0.04   -0.02    2.01       0.50
FC      1.86    -1.20   -0.32    0.34       0.08
1B      1.85    -0.12   -0.04    1.69       0.42
2B      3.24    -0.07   -0.04    3.13       0.78
3B      4.45     0.00    0.00    4.45       1.11
HR      5.41     0.00    0.00    5.41       1.35

The first column represents the average number of bases gained by the batter and baserunners (this time, accounting for how often a batter is on base). The second column represents bases lost to an out on base. The third represents the “inning ending” bases lost. Then the fourth column is the sum of those three columns—in other words, net bases.
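If you want to check the BASES column yourself, here is a rough sketch. It assumes the column is the batter’s own advancement plus each runner’s advancement weighted by how often that base is occupied, which is my reading of the description above. The variable and function names are made up for the example, and only a handful of events are included.

```python
# How often each base is occupied when the event occurs (first table).
ON_BASE_RATE = {
    "NIBB": {"RUN1": 0.28, "RUN2": 0.24, "RUN3": 0.12},
    "1B": {"RUN1": 0.33, "RUN2": 0.19, "RUN3": 0.11},
    "2B": {"RUN1": 0.31, "RUN2": 0.20, "RUN3": 0.11},
    "3B": {"RUN1": 0.30, "RUN2": 0.22, "RUN3": 0.12},
    "HR": {"RUN1": 0.32, "RUN2": 0.19, "RUN3": 0.10},
}

# Average bases advanced, given that the base is occupied (second table).
ADVANCE = {
    "NIBB": {"BATTER": 1.00, "RUN1": 1.00, "RUN2": 0.40, "RUN3": 0.19},
    "1B": {"BATTER": 1.02, "RUN1": 1.29, "RUN2": 1.56, "RUN3": 0.99},
    "2B": {"BATTER": 2.00, "RUN1": 2.37, "RUN2": 1.99, "RUN3": 1.00},
    "3B": {"BATTER": 2.99, "RUN1": 3.00, "RUN2": 2.00, "RUN3": 1.00},
    "HR": {"BATTER": 4.00, "RUN1": 3.00, "RUN2": 2.00, "RUN3": 1.00},
}


def expected_bases(event):
    """Batter's bases plus runners' bases weighted by occupancy (assumed formula)."""
    total = ADVANCE[event]["BATTER"]
    for base, rate in ON_BASE_RATE[event].items():
        total += rate * ADVANCE[event][base]
    return total


for event in ("NIBB", "1B", "2B", "3B", "HR"):
    print(f"{event}: {expected_bases(event):.2f}")
```

Because the inputs are themselves rounded to two decimals, the results only match the BASES column to within a few hundredths; the home run is the loosest fit of the five.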

Then a funny thing happens: you divide that by four and, ta-da, it’s linear weights! (To be precise, linear weights in absolute runs.) In other words, we can use this logic to predict (pretty accurately) team runs scored, and then apply that model to determine an individual’s contribution to his team’s runs scored.
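As a sketch of how that works in practice, the snippet below divides the NET_BASE column by four and applies the result to a batting line. The stat line is invented purely for illustration, and the function name is my own; the net base values come from the table.

```python
# NET_BASE values from the table above.
NET_BASE = {
    "Out": -0.39, "K": -0.48, "NIBB": 1.40, "IBB": 1.03, "HBP": 1.48,
    "ROE": 2.01, "FC": 0.34, "1B": 1.69, "2B": 3.13, "3B": 4.45, "HR": 5.41,
}

BASES_PER_RUN = 4  # it takes four bases to score one run


def estimated_runs(stat_line):
    """Sum net bases over a batting line, then convert bases to runs."""
    net_bases = sum(NET_BASE[event] * count for event, count in stat_line.items())
    return net_bases / BASES_PER_RUN


# A made-up season line, not any real player's.
hypothetical_line = {
    "1B": 100, "2B": 30, "3B": 3, "HR": 20, "NIBB": 60, "K": 100, "Out": 300,
}

print(f"{estimated_runs(hypothetical_line):.1f} estimated runs")
```

The same function works on a team’s totals or on one player’s, which is what lets us carry the team-level model down to the individual.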

The missing element? Playing time. If we take two guys with 200 net bases (or 50 runs produced) and compare them, the guy who made the fewest outs in the process is more valuable. (There are some linear weights implementations, like wRC on FanGraphs, that account for this; in those cases, the correct comparison is the number of PAs.)
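A quick illustration of that comparison, with out totals that I have invented for the two hypothetical hitters:

```python
BASES_PER_RUN = 4


def runs_per_out(net_bases, outs_made):
    """Runs produced per out consumed; higher means the same runs cost fewer outs."""
    runs = net_bases / BASES_PER_RUN
    return runs / outs_made


# Both hitters have 200 net bases (50 runs); the out totals are hypothetical.
player_a = runs_per_out(200, 300)
player_b = runs_per_out(200, 400)

print(f"Player A: {player_a:.3f} runs per out")
print(f"Player B: {player_b:.3f} runs per out")
```

Same 50 runs, but Player A left a hundred more outs for his teammates to work with.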

Odds and ends

Of course, this is not a typical way of figuring out linear weights. There are a few simplifying assumptions made, which is exactly why I thought it would be useful as a way to present the idea to people who are uncomfortable with exactly where the figures are coming from.

And now people are going to say, “But we still don’t have real runs!” And you would be correct; what we have is run values that presume a hitter batted essentially the same with runners on and with the bags empty. But it is certainly possible to apply the logic of linear weights in a way that preserves how a batter actually did in various situations; FanGraphs presents those values (relative to a league average player) as RE24 on its player pages. If you want to know how many runs a hitter is really responsible for, past tense, look there.

A personal note

This seemed a fitting topic for this week, as the idea of linear weights and run estimation has dominated my sabermetric work (it’s what got me into last year’s Annual and thus an invitation to become a regular writer here). Of course that makes it a fitting topic for any week, but this week is notable in that it’s my last regular article for THT for the foreseeable future. Certainly there will be irregular articles going forward, but I’m going back to college and I can’t commit to THT as much as I might like to.

This isn’t goodbye, of course, but just the same I’d like to thank some of the people who helped me get to this point: definitely Pizza Cutter for inviting me to join StatSpeak, Dave Studenmund for giving me the opportunity to come here to THT and my editor Bryan for putting up with my systemic inability to meet deadlines. And of course I’d like to thank all of my devoted readers for following me this long, even after it became clear I was doing nothing to help you win your fantasy league.

And again, I’ll still be around, just less regularly. Until then, take care of yourselves, and estimate your runs responsibly.

References & Resources

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.

Comments

Thanks for explaining Linear Weights, I’ve always wondered how that was done.

I consider myself to be relatively well versed and familiar with statistics, but I really don’t understand the first table. When I see percentages of times an event happens, I look to see where things add up to 100%. I don’t see how I would do that. I understand that some of them overlap, but at some level there should be something adding up to 100% if we are looking at how often an event happens. What am I missing in understanding what this table is showing?

OGC: I had the same problem at first. I believe the columns tell you how often a runner is on that base when the event occurs. So, when a HR is hit, 32% of the time a runner is on 1B, 19% of the time there’s a runner on 2B, 10% of the time there’s a runner on 3B. These categories overlap (multiple baserunners), thus can add across to more than 100%.

Rather than saying “let’s look at how often an event occurs with runners on base,” it would be clearer to say “how often runners are on base when an event occurs.”

Congratulations on college! I’ll miss your articles; hopefully you’ll still have time to get one in now and again.

Is it possible to give a brief explanation of the advantages of runs/PA vs. runs/outs? I remember reading that with the latter, you run into distortions with very high OBP players like Bonds. Is that pretty much the reason?

I went from James and runs created and rc/27 to this era, and missed the reason for the change in denominator.

Ah, my longtime partner on THT Thursday nights…TGIT won’t be the same without you. Your bleeding-edge analysis allowed me to carve a niche in safe harbor hours, but now they’re moving me to late night, and you know how rough the crowd can be.

Thanks for giving meat to the readers when I was dithering about in Historyville (which is every time I write, really). One day, we’ll look on this as the Golden Age of Thursdays…

Puck – I don’t know that there really are any practical advantages to R/PA versus R/O. You just can’t use the same linear weights values for the one as the other. But neither one is “right,” you just have to make sure you’re consistent in your approach.

The main reason that R/PA has become more popular is probably because it seems more familiar – we’re used to talking about how many times a hitter comes up to bat. It’s harder for us to conceptualize playing time based on the number of outs made. But that doesn’t mean it’s more or less correct.

As far as Bonds, I don’t know if I’d call those “distortions.” He really did create so many more plate appearances for his teammates, and that really was incredibly valuable.

I think a per-out scale is something of a distortion. If a player gets on base every time he bats in a game he doesn’t get to bat forever because his teammates still make outs. He earns himself about a half of an extra PA compared to someone that has an 0-fer. Similarly, Bonds getting on base half the time for a season doesn’t get 6 PAs per game. A per-out scale gives Bonds credit for getting on base in a world where all his teammates were him as well.