Monday, January 05, 2015

Filling The Gap Between RBI And Runs - 2014

Angels second baseman Howie Kendrick led the AL in Runs Assisted in 2014 (Photo Credit: James Squire/Getty Images)

Imagine the following scenario. Tigers slow-footed designated hitter Victor Martinez leads off an inning with a single and is removed for speedy pinch runner Rajai Davis. Third baseman Nick Castellanos doubles Davis to third and Davis eventually scores on a weak grounder by Alex Avila. This sequence goes into the books as a run scored for Davis and an RBI for Avila, but Martinez and Castellanos get no credit for the team scoring a run despite contributing important hits.

To the best of my knowledge, the kind of run participation by Martinez and Castellanos described above is not publicly tracked the way runs scored and RBI are. My goal is to track this run involvement for all players with the help of play-by-play data at Retrosheet.org. I want to account for every instance of a player helping to create a run, whether it be a run scored, a run batted in or an indirect contribution, for all games where play-by-play data are available.

Limitations of Runs Scored and RBI

The above example illustrates that the runs scored and RBI statistics do not always give players the credit they deserve for participation in run scoring, but that is not their only limitation. Many analysts eschew these metrics because they measure things that are, to some extent, out of control of the individual batter. Unless a batter hits a home run or steals home, he needs teammates to help him score runs. Even a relatively poor base runner will score a lot of runs if he gets on base frequently and has good hitters behind him. Who bats behind him in the line-up is as important as base running skill in determining how many runs a player will score.

The RBI statistic has similar limitations to runs scored. Unless he smacks a home run, a player needs teammates on base in order to drive in runs. If a player has hitters batting in front of him who frequently get on base, then he is more likely to drive in runs than if he has weaker hitters setting him up. Thus, a player on a good hitting team has more chances to drive in runs than a player on a poor hitting team.

A batter’s position in his line-up also influences his runs scored and RBI totals. For example, a lead-off hitter usually has fewer opportunities to drive home runs than a clean-up hitter, since the generally weaker 7-8-9 hitters bat in front of him. The RBI leaders at the end of a season are as likely to be the players with the most opportunities as the players most proficient at hitting with men on base.

Many mathematically-minded fans would like to see RBI and Runs become extinct in favor of statistics such as on-base percentage, Weighted On-base Average (wOBA) and Batting Runs, which isolate a player's contribution from those of his teammates. Despite the shortcomings of runs scored and RBI, however, most traditional fans still like their concreteness. Players like them too, which is understandable. A batter does not want to reach base to improve his on-base percentage, but rather to put himself in position to score a run. Moreover, a batter up with a runner in scoring position is not focused on his slugging average; he is thinking about driving in the run.

The Origins of Runs and RBI

The runs scored and RBI statistics both have long histories. Shortly after Alexander Cartwright and the New York Knickerbockers established the first set of modern baseball rules, the first box score appeared in the New York Morning News on October 25, 1845. The only statistics included in this box score were hands out (today simply called “outs”) and runs for batters. Some of the early baseball writers had ties to cricket, a relative of baseball, and early box scores reflected that association. Hits that did not result in runs were not included because, in cricket, one either scores a point by reaching the opposite wicket or is out.

The runs batted in statistic was recorded in newspapers in 1879 and 1880 and was an official statistic in the National League in 1891. However, fans complained that the measure was unfair to leadoff batters and too dependent on opportunity, and it was quickly dropped. Ernie Lanigan, an important baseball statistician in the early 20th century, personally tracked runs batted in and included the statistic in New York Press box scores starting in 1907. It became an official statistic again in 1920 under the name “Runs Responsible For”. The RBI statistic gradually gained acceptance and eventually became even more popular than the runs scored metric.

Runs Assisted

Because of their extensive history and their popularity with fans, media and players, the runs scored and RBI metrics are not going to disappear as some in the sabermetric world would like. I would argue that they really shouldn't be eliminated altogether even from the sabermetric community. While they should not be used as overarching player evaluation measures, it is good to know how actual runs were scored along with how they theoretically should have been scored.

If one is going to use actual runs scored in any analysis of players though, it is a good idea to consider the entire run as opposed to the popular practice of just looking at RBI. To that end, the Runs Assisted (or RAS to distinguish it from the pitching metric "Run Average") statistic gives players credit for contributing to runs without a run scored or RBI. Here are the ways a batter can get a Run Assisted:

A batter advances a runner to either second or third with a hit, base on balls, hit batsman, error, sacrifice bunt, or another kind of out. If that runner then scores either during the same at bat or an ensuing at bat, the batter who advanced him is given a Run Assisted.

A batter reaches base and is removed for a pinch runner or is replaced by another runner on a force out. If the new runner then scores, the batter who originally reached base is given a Run Assisted.
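
The two rules above can be sketched in code. What follows is a minimal Python sketch, not the procedure used to build the tables in this post: the `Runner` class, the `advance`, `replace_runner` and `score` functions, and the tally layout are all hypothetical names, and the sketch assumes that batters who advanced a runner keep their claim if that runner is later replaced. It replays the opening example with Martinez, Davis, Castellanos and Avila.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Runner:
    """A player on base plus the batters who could earn an assist if he scores."""
    name: str
    helpers: set = field(default_factory=set)


def advance(runner, batter, to_base):
    """Rule 1: the batter moved this runner up to second or third base."""
    if to_base in (2, 3):
        runner.helpers.add(batter)


def replace_runner(old, new_name):
    """Rule 2: a pinch runner or force out puts a new runner on base.

    Assumption: earlier helpers carry over to the new runner.
    """
    return Runner(new_name, helpers=old.helpers | {old.name})


def score(runner, rbi_batter, tallies):
    """Credit the run, the RBI (if any), and a Run Assisted to everyone else."""
    tallies["R"][runner.name] += 1
    if rbi_batter is not None:
        tallies["RBI"][rbi_batter] += 1
    for helper in runner.helpers:
        # No assist for a player already credited with the run or the RBI.
        if helper not in (runner.name, rbi_batter):
            tallies["RAS"][helper] += 1


tallies = {"R": Counter(), "RBI": Counter(), "RAS": Counter()}

martinez = Runner("Martinez")                      # leads off with a single
davis = replace_runner(martinez, "Davis")          # pinch runner takes over
advance(davis, "Castellanos", to_base=3)           # double moves Davis to third
score(davis, rbi_batter="Avila", tallies=tallies)  # grounder scores Davis

for stat, counts in tallies.items():
    print(stat, dict(counts))
```

In this replay Davis gets the run, Avila gets the RBI, and Martinez and Castellanos each get one Run Assisted. Real play-by-play data would need far more bookkeeping (multiple runners, errors, stolen bases) than this sketch attempts.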

The 2014 American League Runs Assisted Leaders are listed in Table 1 below. Angels second baseman Howie Kendrick led the league with 68 Runs Assisted. Kendrick assisted runs on the following events:

34 hits (H)

11 walks (BB)

1 hit batsman (HBP)

3 times reached on errors (ROE)

1 sacrifice bunt (SH)

13 outs (OUT)

3 times removed from the bases by a force out or pinch runner, after which the new runner scored (RR)

The information used here was obtained free of charge from and is copyrighted by Retrosheet.

Runs Participated In

The addition of Runs Assisted allows us to expand the Runs Participated In (RPI) measure. The current RPI definition is the number of runs to which a player made a direct contribution. It is calculated by adding runs scored and RBI and then subtracting home runs:

RPI = RS + RBI - HR

RPI was first introduced as runs produced in the 1950s by Sports Illustrated writer Bob Creamer but was more recently renamed RPI by Tom Tango. If Kendrick doubles and then scores on a single by Erick Aybar, neither player actually produces the run by himself. Both participate in creating the run but neither is 100% responsible for producing the run. Thus, the name “runs participated in” is more appropriate than "runs produced". Home runs are subtracted in the RPI formula, so that a player does not get credit for two runs (an RBI and a run scored) when he only participated in one team run.

Adding Runs Assisted to the RPI formula yields:

RPI = RS + RBI + RAS - HR
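
As a quick check of the arithmetic, here is a small Python sketch of both versions of the formula. The function name `rpi` and its parameter names are my own for illustration; the player totals are copied from Table 2.

```python
def rpi(runs, rbi, home_runs, runs_assisted=0):
    """Runs Participated In: each run a player touched, counted once."""
    # A home run gives the batter both a run scored and an RBI on the same
    # team run, so subtract it once to avoid double counting.
    return runs + rbi + runs_assisted - home_runs


# (R, RBI, RAS, HR) for two players, copied from Table 2.
players = {
    "Mike Trout": (115, 111, 44, 36),
    "Howie Kendrick": (85, 75, 68, 7),
}

for name, (r, rbi_count, ras, hr) in players.items():
    # Print the original RPI, then the expanded RPI with Runs Assisted.
    print(name, rpi(r, rbi_count, hr), rpi(r, rbi_count, hr, ras))
# Mike Trout 190 234
# Howie Kendrick 153 221
```

Leaving `runs_assisted` at its default of 0 reproduces the original RS + RBI - HR definition, so the same function covers both versions.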

One might question whether a Run Assisted should count as much as a run scored or an RBI, since it is more likely to also produce an out. I would guess that a player getting an assist typically contributes less to the run than a player with a run scored or RBI (although the opening example shows that is not always the case). More complicated statistics involving linear weights are better for answering that question. By definition, runs scored, RBI and Runs Assisted count the same in the Runs Participated In measure.

Also, remember that RPI does not address the biases of runs scored and RBI (and RAS for that matter). It is still the case that some players have more opportunities to contribute to runs based on their teammates and batting order position. RPI is not a replacement for something like Batting Runs, but rather a simple alternative for those that prefer to look at actual runs scored.

Keeping the above caveats in mind, the American League RPI Leaders are listed in Table 2 below. AL MVP winner Mike Trout led the league with 234 RPI, followed by Tigers slugger Miguel Cabrera (230) and Blue Jays outfielder Jose Bautista (223). Other Tigers among the leaders were Ian Kinsler (222) and Victor Martinez (213).

Table 2: AL Runs Participated In Leaders, 2014

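| Player          | Team | PA  | R   | RBI | RAS | HR | RPI |
|-----------------|------|-----|-----|-----|-----|----|-----|
| Mike Trout      | ANA  | 705 | 115 | 111 | 44  | 36 | 234 |
| Miguel Cabrera  | DET  | 685 | 101 | 109 | 45  | 25 | 230 |
| Jose Bautista   | TOR  | 673 | 101 | 103 | 54  | 35 | 223 |
| Ian Kinsler     | DET  | 726 | 100 | 92  | 47  | 17 | 222 |
| Howard Kendrick | ANA  | 674 | 85  | 75  | 68  | 7  | 221 |
| Josh Donaldson  | OAK  | 695 | 93  | 98  | 56  | 29 | 218 |
| Victor Martinez | DET  | 641 | 87  | 103 | 55  | 32 | 213 |
| Brian Dozier    | MIN  | 707 | 112 | 71  | 53  | 23 | 213 |
| Michael Brantley| CLE  | 676 | 94  | 97  | 42  | 20 | 213 |
| Albert Pujols   | ANA  | 695 | 89  | 105 | 42  | 28 | 208 |
| Erick Aybar     | ANA  | 642 | 77  | 68  | 68  | 7  | 206 |
| Evan Longoria   | TBA  | 700 | 83  | 91  | 50  | 22 | 202 |
| Adam Jones      | BAL  | 682 | 88  | 96  | 44  | 29 | 199 |
| Alexei Ramirez  | CHA  | 657 | 82  | 74  | 50  | 15 | 191 |
| Melky Cabrera   | TOR  | 621 | 81  | 73  | 52  | 16 | 190 |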


8 comments:

A nice stat and a fascinating chart, Lee. It occurs to me that it has a lot of the same issues as the old RBI and Runs stat though; you actually allude to this. If you play on a team with lots of good hitters in front of you and behind you, you are going to be involved in a lot of runs scored. Kinsler, it seems to me, did not have a particularly good offensive season, in particular a disappointing second half. He is among the leaders in Runs Assisted in large part because he had two likely future Hall-of-Famers batting behind him.

Love the post! Kinsler seems even more valuable to me now. How did you get the info from Retrosheet to the player? If you do it by hand, it'd take a long time. Do you use a formula or coding of some sort?

SAS is very expensive. I get it for free because we use it at work. R is free. There is a book, Analyzing Baseball Data with R, by Max Marchi and Jim Albert. If you don't have much experience programming though, starting with Excel is a good idea.