Wednesday, April 15, 2015

It is in my nature to snark about bad baseball analysis. Maybe more of it is nurture, as much of my early sabermetric reading was the younger Bill James, with later exposure to early BP and other r.s.bb derivatives, where snark was an integral part of the culture.

That is not really intended to be an excuse, although it may well read that way. As I have grown older I believe that I have generally become more aware of how little I actually know and, more consequentially for the snark, less interested in engaging. I have lost almost any desire I ever had to evangelize about sabermetrics to the “unwashed masses” (now there’s a snarky, loaded term). Instead I am content to write for my very small audience (and even then almost entirely about what I want to write rather than what I think anyone might want to read) and to take passive-aggressive potshots on Twitter. This probably still tilts me more towards the jackass side of the scale than the average sabermetrician, but so be it.

Every once in a while, though, I run across something that irks me so much that I have to respond to it in full. Against my better judgment, I feel compelled to draft a polemic in response, even though I know there’s nothing good that can possibly come of it. That is the case with an article that appeared in the Fall 2014 issue of SABR’s The Baseball Research Journal entitled “A New Formula to Predict a Team’s Winning Percentage” and written by Stanley Rothman, Ph.D.

Historically, the quality of sabermetric articles in the BRJ has been a mixed bag. Early BRJ editions included seminal research by pioneers of the field like Pete Palmer and Dick Cramer. Eventually the quality of such articles significantly dropped off, and BRJ was a leading purveyor of the rehashing of bases/X metrics that I rail against, and other equally banal statistical pieces, with notable but rare exceptions. (That is particularly amusing since in the heyday of BRJ as a place where sabermetric research was published, Barry Codell introduced Base-Out Percentage, one of only a few times that metric could legitimately have been said to have been “invented”).

In recent years, the quality of the statistical pieces in BRJ has been significantly improved, so I hope that my mockery of this particular piece is not taken as an indictment of the entirety of the body of work the editors (now Cecilia Tan) have been doing on this front. In fact, the Fall 2014 issue features a couple of sabermetric pieces I enjoyed greatly, both based on Log5 and other predictors of head-to-head matchups (John A. Richards’ piece “Probabilities of Victory in Head-to-Head Matchups” covered the theoretical basis for Log5 and a comparison of Log5 estimates to empirical results, and Matt Haechrel did likewise for individual batter-pitcher matchups in “Matchup Probabilities in Major League Baseball”).

Dr. Rothman’s piece is an unfortunate exception. And since I consider myself (perhaps incorrectly so) to be something of a subject-matter expert in winning percentage estimators, I feel compelled to point out areas in which Rothman’s findings bury obvious, well-established principles in a barrage of linear regressions.

Rothman opens his paper by discussing Bill James’ ubiquitous and groundbreaking Pythagorean method, and then asks “Why not just use the quantity (RS-RA) to calculate EXP(W%)?” Why not indeed? This question is never satisfactorily answered in the paper. Nor is it even addressed thereafter.

Rothman proceeds to set up a W% estimator that he christens the Linear Formula as:

EXP(W%) = m*(RS-RA) + b

Note that Rothman’s terms RS and RA are just that--runs scored and runs allowed by a team. Not per game, per inning, or on any other sensible rate basis--raw, unadulterated seasonal totals.

Next, he provides the standard equations for m and b, and makes some simplifying assumptions. His regressions are run separately for each MLB season, so each team’s number of games is 162 (obviously there are some limited and non-material exceptions) and there are 30 observations in each regression (Rothman uses 1998-2012 data in his analysis). After these substitutions, the intercept b is equal to .5 and the slope m is:

m = SUM[(RS - RA)*W%]/SUM[(RS - RA)^2]
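
To see why the slope collapses to that ratio, here is a minimal Python sketch. The four teams are made up for illustration (they are not from Rothman's data); all that matters is that their run differentials sum to zero and their W% values average .500, as they do across a complete league-season. Under those conditions the textbook least-squares slope and intercept reduce to the shortcut above.

```python
# Made-up four-team league (NOT real data): run differentials sum to zero
# and W% averages .500, mirroring a complete league-season.
run_diff = [120, 30, -50, -100]
win_pct = [0.580, 0.525, 0.465, 0.430]

n = len(run_diff)
mean_rd = sum(run_diff) / n
mean_wp = sum(win_pct) / n

# Textbook least-squares slope and intercept.
cov = sum((d - mean_rd) * (w - mean_wp) for d, w in zip(run_diff, win_pct))
var = sum((d - mean_rd) ** 2 for d in run_diff)
ols_slope = cov / var
ols_intercept = mean_wp - ols_slope * mean_rd

# Shortcut from the text: m = SUM[(RS - RA)*W%] / SUM[(RS - RA)^2]
shortcut_slope = (
    sum(d * w for d, w in zip(run_diff, win_pct))
    / sum(d * d for d in run_diff)
)

print(ols_slope, shortcut_slope, ols_intercept)
```

The two slopes agree and the intercept comes out at .5, which is all the "simplifying assumptions" amount to.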

Rothman notes that for major league seasons viewed in aggregate, there is a strong correlation between SUM[(RS - RA)*W%] and SUM[(RS - RA)^2], and so he develops a formula to predict the latter from the former:

EXP{SUM[(RS - RA)^2]} = 1464.4*SUM[(RS - RA)*W%] + 32710

This is substituted into the regression formula for expected W%, with the intercept (32710) dropped since it has little impact, to get the following equation:

EXP(W%) = SUM[(RS - RA)*W%]/{1464.4*SUM[(RS - RA)*W%]}*(RS - RA) + .5

= .000683*(RS - RA) + .5

This is the final formula that Rothman refers to as the Linear Formula. At this point, I will offer a few of my own comments:

1) There is nothing novel about presenting a W% estimator based on some relationship between run differential and W%. The rule of thumb that ten runs equals one win is just that. One of the earliest published W% estimators, from Arnold Soolman, was based on a regression that used RS/G and RA/G as separate variables but could have just as easily used the difference (and the insignificant difference in regression coefficients for the terms back that up).

2) The author’s choice to express this equation on a team-seasonal basis is, frankly, bizarre. It results in the formula being much less easy to apply to anything other than team seasonal totals, and it obscures the nature of the relationship between runs and wins, hiding the fact that this is little different than assuming ten runs per win. If you divide 1464.4 by 162 games/season, you find that the formula implies 9.04 runs per win and would be more conveniently expressed as .1106*(RS - RA)/G + .5.

3) I don’t understand the rationale for using a separate equation for each league-season, then developing a single slope by running another regression of various league quantities. It would be much more straightforward to combine all teams from the data set together and run a regression. Such an approach would also result in a higher R^2 for the team W% estimates. I don’t think that maximizing R^2 should be paramount in constructing a W% estimator, but in this case I fail to see the advantage of aggregating season-level regressions across multiple seasons rather than studying the relationship between runs and wins directly at the team level.
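
The arithmetic behind the second comment can be checked with a short Python sketch. It uses only the 1464.4 coefficient and the 162-game season from the text; the half-season team at the end is invented to show what goes wrong when the seasonal form is applied to anything other than a full season.

```python
GAMES = 162
SEASON_SLOPE = 1 / 1464.4  # W% per run of seasonal run differential

runs_per_win = 1464.4 / GAMES          # implied runs per win
per_game_slope = SEASON_SLOPE * GAMES  # W% per run of differential per game

print(round(runs_per_win, 2))    # 9.04
print(round(per_game_slope, 4))  # 0.1106

def linear_seasonal(run_diff):
    """Seasonal form: only meaningful for full 162-game totals."""
    return SEASON_SLOPE * run_diff + 0.5

def linear_per_game(run_diff, games):
    """Per-game form: the same line, usable for any number of games."""
    return per_game_slope * run_diff / games + 0.5

# Invented example: a team that is +50 through 81 games.
print(round(linear_per_game(50, 81), 3))  # treats +50 as a half-season pace
print(round(linear_seasonal(50), 3))      # wrongly treats +50 as a full season
```

The two forms agree on full-season totals; they diverge as soon as the number of games is anything other than 162.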

Returning to the article, Rothman uses a Chi-Square test on 2013 data to compare the Linear Formula to Pythagorean. Setting aside the silliness of using thirty data points for an accuracy test when hundreds are available, I must give Rothman credit for not using the Linear Formula’s better test statistic to trumpet its superiority--instead he writes that “there is no reason to believe that both of these formulas cannot be used.”

The article then includes a digression on applying this approach to the NBA and NFL. The conclusion and “additional points” sections of the article provide a handful of interesting contentions:

* Rothman suggests that one of the chief advantages of the Linear Formula is that it is “easier for a general manager to understand and use”. The premise is that GMs can use the Linear Formula to calculate the marginal wins from player transactions.

While there is certainly nothing wrong with this type of back-of-the-envelope estimate, this comment would have been less bizarre twenty years ago. Now it seems incredibly naïve to suggest that the majority of major league front offices could improve their planning by using a dumbed-down win estimator. It’s hard to determine which is sillier--the notion that front offices that would entertain such analysis would not be using more advanced models (the outcome suggested by which would depend much more on the projection of player performance than how that performance is translated into wins), or the notion that front offices that were so inclined and needed to do back-of-the-envelope calculations would not be able to grasp Pythagorean.

* Apparently referring to the approximation used to derive the multi-year version of the formula above, Rothman asks “Why is there a strong positive correlation between SUM[(RS - RA)^2] and SUM[W%*(RS - RA)] in MLB?”

I might be accused of under-thinking this, but my response is “Why wouldn’t there be?” The key quantity in each sum is run differential. We know that run differential is positively correlated with W% (if it were not, this article would never have been written), so it should follow that the square of run differential (or the square root, the cube, the logarithm, any defined function) should have some relationship to the winning percentage times the run differential. And since the quantities Rothman is comparing are sums on the league level, both should increase as the differences between teams increase (i.e. if all teams were .500 and had zero run differentials, both quantities would be zero. As teams move away from the mean, both quantities increase).
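
A toy illustration of that last point, in Python. The run differentials are invented, and W% is generated from the ten-runs-per-win rule of thumb purely as an assumption. The only point is that stretching the spread between teams pushes both league sums up together, starting from zero.

```python
GAMES = 162
RUNS_PER_WIN = 10.0  # rule-of-thumb assumption, used only to generate W%

# Invented run differentials for a four-team league (they sum to zero).
base_run_diff = [120, 30, -50, -100]

def league_sums(spread):
    """Return (SUM[(RS - RA)^2], SUM[W%*(RS - RA)]) for a league whose
    run differentials are the base values scaled by `spread`."""
    run_diff = [spread * d for d in base_run_diff]
    win_pct = [0.5 + d / GAMES / RUNS_PER_WIN for d in run_diff]
    sum_sq = sum(d * d for d in run_diff)
    sum_cross = sum(w * d for w, d in zip(win_pct, run_diff))
    return sum_sq, sum_cross

# spread = 0 is the all-.500 league; larger values pull teams apart.
results = [league_sums(s) for s in (0.0, 0.5, 1.0, 2.0)]
for sum_sq, sum_cross in results:
    print(round(sum_sq), round(sum_cross, 2))
```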

* Rothman notes that if a team’s run differential is greater than 732, then the linear formula will produce an estimated W% in excess of 1.00. “However, this is not a problem because for the years 1998-2012 the maximum value for (RS - RA) is 300.”

Note that Rothman does not discuss the opposite problem, which is that a run differential of -732 will produce an equally implausible negative W%. But the hand-waving away of this as a potential issue coupled with the posed but unaddressed question “Why not just use the quantity (RS-RA) to calculate EXP(W%)?” is why this article got under my skin.
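
For the curious, here is a small Python sketch of where the Linear Formula leaves the [0, 1] range in both directions. The two extreme teams are invented, and the Pythagorean exponent of 2 is an assumption chosen only to provide a bounded comparison.

```python
SLOPE = 0.000683  # Linear Formula slope, as quoted above

def linear_formula(run_diff):
    """Expected W% from a seasonal run differential."""
    return SLOPE * run_diff + 0.5

def pythagorean(rs, ra, exponent=2.0):
    """Pythagorean W%; the exponent of 2 is an assumption for illustration."""
    return rs**exponent / (rs**exponent + ra**exponent)

# Run differential at which the linear estimate reaches 1.000
# (or, with the sign flipped, .000).
break_even = 0.5 / SLOPE
print(round(break_even))

# Invented extreme teams with +800 and -800 run differentials.
great = (1000, 200)
awful = (200, 1000)
for rs, ra in (great, awful):
    print(rs - ra, round(linear_formula(rs - ra), 3), round(pythagorean(rs, ra), 3))
```

No real team gets anywhere near these totals, which is Rothman's defense, but the ratio-based estimate stays between zero and one without needing one.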

If Dr. Rothman has taken five seconds to consider the advantages and disadvantages of how to construct a W% estimator, scant evidence of it has manifested itself in his paper (and given as this is a commentary on the paper and not Dr. Rothman himself or whatever unpublished consideration he gave to these matters, that is all I have to go on). There is certainly nothing wrong with experimenting with different estimators, but these experiments should not rise to the level of publication in a printed research journal unless they yield new insight in some way. Nothing in Rothman’s piece did--in fact, given the bizarre manner in which he chose to express the equation, I would suggest that if anything the piece regresses the field’s knowledge on W% estimators.

So allow me the liberty of answering Rothman’s question and the hand-waved problem for him.

Q: Why not use run differential to estimate W%?

A: Because doing so, at least through the simple linear regression approach, does not bound W% between zero and one, does not recognize that the marginal value of runs is variable, and does not recognize that the value of a run is dependent on the scoring environment.

Other than that, it’s great!

“Why not?” is a great reason to experiment, but it’s not a great reason to formally propose a new method (well, really, recycle existing methods, but I’m piling on as it is). There is also nothing wrong with using a model with certain deficiencies that other models avoid, whether due to computation restrictions, ease of use, a lack of deleterious effect for the task at hand, etc. But it should be incumbent on the analyst and the publisher to acknowledge them.

Finally, anyone publishing sabermetric research in this day and age should recognize that whatever new approach you believe you have developed for a common problem (like win estimation, or measuring offensive performance), it’s probably not new at all. This is certainly the case here given the work of Soolman, the rule of thumb that ten runs equals one win, the dynamic runs per win formula used in The Hidden Game of Baseball and Total Baseball by Pete Palmer, and other related approaches. All of these are based on the basic construct W% = m*run differential + b.

Personal anecdote: I don’t remember when this was exactly, maybe when I was in the eighth grade, but in our math class we were learning about linear equations of the form y = mx + b and there was an example in the textbook that showed how one could eyeball a line through a scatterplot and develop the equation for that line. In other words, a manual, poor man’s linear regression.

So I did just that with a few years of team data, plotting run differential per game against W% (I want to say I used 1972-74 data), and came up with W% = .1067*RD + .5. Foolishly, I actually used this for W% estimates for a period of time. Thankfully, I was cognizant that it was not a new approach but rather just a specific implementation of one developed by others, and I did not attempt to/no one permitted me to publish it as if it was. Years later, W% = .1106*RD + .5 appeared in the pages of the Baseball Research Journal.

So that this post might have some smidgeon of lasting value, I will close by reiterating the three conditions of an ideal win estimator that such linear constructs fail to satisfy. I have written plenty about win estimators in the past (and will doubtlessly rehash much of it again in the future), but I don’t believe I’ve explicitly singled out those properties. An ideal W% estimator would satisfy all three, which is not to say there is no use for an estimator that satisfies only two or even zero. The Linear Formula satisfies none. I will discuss how three of the common approaches perform: Pythagorean (with fixed exponent), Pythagenpat, and Palmer (RPW = 10*sqrt(runs per inning by both teams)). Palmer can serve as a stand-in for any method that allows RPW to vary as the scoring level varies, and of course there are other constructs that I am not discussing.

1. The estimate should fall in the range [0,1]

The reason for this is self-explanatory. Pythagorean and Pythagenpat pass, while Palmer does not. Obviously this is not really an issue when you apply the method to normal major league teams. It can become an issue when extrapolating to individual/extreme performances, though.

2. The formula should recognize that the marginal value of runs is variable.

This is somewhat related to #1--the construct of Pythagorean results in it passing both tests. However, there are other constructs that are bounded but fail here. Palmer fails here, which is inevitable for a linear formula. The gist here is that each additional run scored is less valuable in terms of buying wins and each additional run prevented is more valuable. This is also the hardest to articulate and the hardest to prove if one has not bought into a Pythagorean-based approach (or examined other W% models such as those based on run distributions).

3. The formula should recognize that as more runs are scored, the number of marginal runs needed to earn a win increases.

This could be confused with #2, but #2 is true regardless of the scoring level in question--it's true in 1930 and in 1968. In this case, the relationship between runs and wins changes as the run environment changes. This is where a fixed exponent Pythagorean approach falls short, while both Pythagenpat and Palmer take this into account.
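
The three conditions can be poked at with a rough Python sketch. Several things here are assumptions rather than anything specified in this post: the fixed Pythagorean exponent of 2, the Pythagenpat exponent form (runs per game by both teams)^0.287, nine innings per game for converting to runs per inning, and every team total, all of which are invented. The Linear Formula is included for contrast on #2 and #3.

```python
import math

GAMES = 162

def linear_formula(rs, ra):
    """Rothman's Linear Formula on seasonal totals, as quoted in this post."""
    return 0.000683 * (rs - ra) + 0.5

def pythagorean(rs, ra, exponent=2.0):
    """Fixed-exponent Pythagorean; the exponent of 2 is an assumption."""
    return rs**exponent / (rs**exponent + ra**exponent)

def pythagenpat_exponent(rs, ra, games=GAMES, z=0.287):
    """Exponent = (runs per game by both teams)^z; z is an assumed value."""
    return ((rs + ra) / games) ** z

def pythagenpat(rs, ra, games=GAMES):
    x = pythagenpat_exponent(rs, ra, games)
    return rs**x / (rs**x + ra**x)

def palmer(rs, ra, games=GAMES, innings_per_game=9.0):
    """W% = .5 + (run diff per game)/RPW, RPW = 10*sqrt(runs per inning, both teams)."""
    rpw = 10 * math.sqrt((rs + ra) / (games * innings_per_game))
    return 0.5 + (rs - ra) / games / rpw

# 1. Bounds: an absurd invented team with 1500 RS and 100 RA.
extreme = {f.__name__: f(1500, 100) for f in (pythagorean, pythagenpat, palmer)}
print(extreme)

# 2. Marginal value: W% gained from ten extra runs scored, at two starting points.
def gain(f, rs, ra=700, extra=10):
    return f(rs + extra, ra) - f(rs, ra)

pyth_gains = (gain(pythagorean, 700), gain(pythagorean, 900))
linear_gains = (gain(linear_formula, 700), gain(linear_formula, 900))
print(pyth_gains, linear_gains)

# 3. Run environment: the same +100 differential in low- and high-scoring settings.
low, high = (600, 500), (900, 800)
by_environment = {
    f.__name__: (f(*low), f(*high))
    for f in (linear_formula, pythagenpat, palmer)
}
exponents = (pythagenpat_exponent(*low), pythagenpat_exponent(*high))
print(by_environment, exponents)
```

On the first check Palmer escapes the range while the two Pythagorean variants do not. On the second, the Pythagorean gain shrinks as runs scored rise while the Linear Formula's gain never changes. On the third, the Linear Formula returns the same W% in both settings, whereas Pythagenpat and Palmer give less credit for the same differential when more runs are being scored, and the Pythagenpat exponent moves with the run environment where a fixed exponent, by construction, cannot.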