Sabermetric Research

Phil Birnbaum

Monday, August 30, 2010

Why are the Yankees willing to give up so much profit?

Last week, someone leaked the financial statements of several Major League Baseball teams. It turned out that two of the worst teams in baseball, the Pirates and Marlins, regularly turned a profit. They did that, in part, by pocketing MLB's revenue sharing payments, and simultaneously keeping their payroll very low.

Actually, the leaked statements didn't tell us a whole lot that we didn't already know. Every year, Forbes magazine comes out with their estimates of baseball teams' financials, and every year there is some criticism that the Forbes estimates are inaccurate. But if you compare the Forbes revenue figures to the teams' numbers on the leaked statements, you'll find that Forbes is pretty close -- in a couple of cases, they're right on. Forbes' *profit* numbers, as opposed to revenue numbers, aren't quite as accurate, but that's to be expected: a 5% discrepancy in revenues can easily lead to a 100% difference in profits.

Previously, I argued that the small-market teams will never be able to compete with teams like the Yankees and Red Sox, simply because their revenue base is too small. A win on the free agent market costs somewhere around $5 million (Tango uses $4.4 million, which may be more accurate), but brings in a lot less than $5 million in additional revenues for the Pirates or Marlins. And so, they maximize their profit by keeping their payroll down. Long term, teams like those aren't able to compete.

One obvious solution to this problem of long-term competitive imbalance would be to share all revenues equally, and force each team to spend roughly the same amount on payroll. However, that solution has its own problem -- the Yankees have double the revenues of most of the other teams in both leagues. Why should they suddenly be willing to share?

That is: suppose you bought the Yankees for $1.5 billion, thinking you'll pull in $400 million in revenues and make $50 million in profit (numbers made up). If MLB suddenly decides all revenue has to be shared, then, suddenly, you're pulling in only $200 million in revenues. You have to stop signing free agents, and, after everything shakes out, you now have only $25 million in profit.

Since businesses are valued on profits, and your profits are permanently cut in half, your $1.5 billion investment is now worth only $750 million. No matter how rich you are, you won't want to take a bath of $750 million even if it does make baseball better. Seven hundred and fifty million dollars is just too much money.

Or is it?

The thing is, revenue sharing has been around in MLB for almost a decade now. It's not full sharing, which is what I described in the above example, where every team winds up the same. It's just partial sharing. Every team contributes 31% of its revenues to a common pool, which then gets split among all 30 teams. Effectively, if you're above average, you lose 30% of the amount by which you're above average (you pay 31%, but get 1% back as your 1/30 share). If you're below average, you gain 30% of the amount by which you're below average.

That's still a lot of money, then, that the Yankees are losing. In 2009, according to Forbes, the Yankees had revenues of $441 million, as compared to the league average of (I'm estimating just by eyeing the chart) about $200 million. However, I think that $441 million is *after* revenue sharing payments. (Why do I think that? Because in the cases of the actual leaked statements, the Forbes estimates are significantly closer to the revenue statements *after* adjusting for revenue sharing.)

So, the Yankees probably had about $549 million in gross revenues; they then paid $170 million into the pool, and received $62 million back, leaving $441 million.

That is: revenue sharing cost the Yankees $108 million in cash last year.
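The arithmetic above is easy to check. Here's a minimal sketch of the pool mechanics, using the estimated figures from this post (the 31% rate, the $549 million gross, and the roughly $200 million league average); the function name is mine:

```python
# MLB partial revenue sharing, per the estimates above: each team pays
# 31% of gross revenue into a pool, which is split equally 30 ways.
# All figures in $ millions.

RATE = 0.31
N_TEAMS = 30

def net_sharing_cost(gross, league_avg, rate=RATE):
    """Cash a team loses (positive) or gains (negative) from sharing."""
    paid = rate * gross           # contribution to the common pool
    received = rate * league_avg  # equal 1/30 share of the pool
    return paid - received

# Yankees: ~$549M gross vs. a ~$200M league average
print(round(net_sharing_cost(549, 200)))  # 108 -- the $108M cash cost
```

Note that the marginal rate on a team's own revenue dollar is 31% minus the 1/30 of it the team gets back, or just under 30% -- which is where the "effectively 30%" figure comes from.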

That's still very large. Forbes values the typical large-market team at 2.5 to 3 times revenues (the Yankees are above 3, almost 4). Assume the Yankees' market price really is 3x their annual revenues. (I'm not convinced -- I prefer a valuation based on profit -- but never mind). That means that, if they continue to consent to the MLB revenue sharing plan, and if they continue to spend the way they do, the Yankees have effectively agreed to hand $324 million to the other 29 clubs.

Looked at another way: according to Forbes, the Yankees have lost money every year from 2002 to 2008.

If you added back in the Yankees' revenue sharing payments for those years, that would probably turn every loss back into a profit. And, in 2009 alone, without revenue sharing, the Yankees would have made *five times* as much money -- $133 million instead of just $25 million.

So what's going on? Some possibilities:

1. George Steinbrenner is so rich that he doesn't mind subsidizing the other teams. He was old enough when revenue sharing came into being that he knew he'd never spend his wealth before he died. And so, he figured, whatever is best for baseball was fine with him, regardless of cost, so long as he could keep winning.

2. The Yankees believed that, without revenue sharing, competitive balance would be so bad, and the Yankees so much better than the other teams, that the fans would stay away and the Yankees would wind up being worse off. There may be something to that: according to Forbes, the value of the Yankees doubled since 2001, while the other teams increased in value by only maybe 50% (eyeballing again). So maybe revenue sharing is bad for the Yankees' bottom line in the short term, but better in the long term.

That is: maybe by creating a league where teams like Tampa Bay might be able to compete once or twice every 20 years, fan interest rises to the point where the Yankees' investment in revenue sharing pays for itself, by increasing interest not just in the Yankees, but in baseball in general -- resulting in more revenue from the website, a bigger TV contract, and so on.

3. Maybe the Yankees (and Forbes) know that the operating losses are temporary, caused by Steinbrenner's desire to win at any cost. Maybe they figure, correctly, that they can cut down their free agent spending whenever they want, and start making significant profits.

That's in keeping with standard models of business valuation: you assess the value of a business on its *future* earnings potential, not its past.

4. Maybe revenue sharing drops the price of free agents enough that, in combination with some of the other factors, it makes revenue sharing profitable.

Suppose a free agent will increase the Yankees' revenue by $10 million. Then, if the guy costs less than $10 million, the Yankees will sign him. But, with revenue sharing, the Yankees only get to keep $7 million of the $10 million. And so, they're only willing to sign the player if he costs $7 million or less.
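As a toy version of that logic (the 70% keep-rate approximates the ~30% marginal tax, and the $10 million marginal-revenue figure is the hypothetical above; the function name is mine):

```python
# A profit-maximizing team bids at most what the player returns *after*
# revenue sharing. With a ~30% marginal tax, it keeps 70 cents on the
# marginal dollar. Figures in $ millions.

KEEP_RATE = 0.70  # share of marginal revenue kept after the 31% tax

def max_bid(marginal_revenue, keep_rate=KEEP_RATE):
    return marginal_revenue * keep_rate

print(round(max_bid(10.0), 2))  # 7.0: willing to pay $7M, not $10M
```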

With the 31% revenue "tax", every team is in the same position. That depresses demand for free agents, which keeps prices down. And since the Yankees are the largest consumers of free agents, they get the largest benefit from the price decrease. The question: is the effect enough to pay for itself? I bet the answer is no -- it helps, but not enough to make up for the tax. But that's just a gut feeling. Any economists out there able to estimate the size of the effect?

-----

My guess is that it's partly a desire to win at any cost, and partly rational economic calculation. That is, part of it is Steinbrenner's willingness to spend part of his fortune on fame. The other part is numbers 2, 3, and 4: the Yankees are doing what they have to do to make MLB attractive to fans in general, and are willing to lose money temporarily to be winners. I'm not so willing to believe #1, that George Steinbrenner is willing to give away half the value of his team just like that.

If that's correct, my prediction is that, eventually, when the new owners of the Yankees decide they're not as willing to spend their entire profit, and more, to make the playoffs every year, they'll cut down on their player spending. Instead of winding up in a class by themselves, with a payroll 50% higher than the average of the next seven highest-spending teams, they'll join those other teams, and wind up looking more like the Red Sox and Cubs. They'll still be highly successful, but not so much so that they make a mockery of the rest of the league.

Or not. There could be other, better explanations for what's going on. Any other ideas?

" ... players are rated on all kinds of advanced stats. They have exotic names such as Qualcomp, Qualteam, WOWY, ZoneShift, ZoneStart, Fenwick and Corsi plus/minus, NHLe (equivalencies) ... all of them trying to improve on the current basic scoring stats, called "boxcar" numbers, which are rightly dismissed byt he stats guys as being imprecise, misleading, and lacking context."

What I found interesting was Don Cherry's reaction to the "Corsi plus/minus" stat -- which the article focuses on, but doesn't describe at all despite the fact that it's so easy to explain (just like regular plus/minus, but based on shots instead of goals):

On a broadcast last March ... host Ron MacLean explained to Cherry how the stat worked, then told Cherry that, by this measure, Chicago's Marian Hossa was the NHL's best player and Vancouver's Ryan Johnson the worst.

"This shows how stupid the Corsi thing is," Cherry fired back. "This is how stupid guys come up with, trying to earn a living, they come up with dumb (stats) ... I'll tell you one thing, Vancouver loves this guy (Johnson), he is unbelievable, and that dumb-dumb system you're talking about, I would love to have Ryan Johnson on my team and every coach would have him on, too."

I'm a bit surprised here, because, while I don't believe Don Cherry is a deep analytical thinker, I've always thought that he knows hockey as well as anyone, and that his talent evaluations are as good as anyone's.

Since, contrary to what MacLean may have said to Cherry, the player with the lowest Corsi stat is not necessarily the "worst" player in the league, perhaps Cherry was reacting to that. My guess is that, while Cherry is wrong about Corsi being a "dumb-dumb system," he might be right that Ryan Johnson is a better player than Corsi alone would suggest.

Of course, I could be wrong, and my faith in Cherry's scouting abilities misplaced.

Wednesday, August 18, 2010

More on oenometrics

Ms. Kay and her husband had been invited to dinner, where the host decided to test attendees' wine-identifying skills. Ten wines were served, and guests had to identify the country from which each wine came.

The top score -- 10 out of 10 -- was submitted by Kay's husband, who "hadn't tasted a drop of wine but just for kicks filled out the ballot on the basis of mathematical probabilities."

Well, you can't guess wines with 100% accuracy just on an understanding of probability theory. So, *how*, exactly, did he do it? The odds of guessing 10 out of 10 different countries (assuming the list of ten countries was provided) are 1 in 3,628,800. Even if there were two wines for each of five countries, that would still be a 1 in 113,400 chance.
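The counting behind both figures is straightforward; here's a quick check (assuming, as above, that the guest knows which ten countries are on the list):

```python
from math import factorial

# Ten distinct countries, one wine each: any of 10! orderings is possible,
# so a blind guess hits all ten with probability 1 in 10!.
distinct = factorial(10)
print(distinct)  # 3628800

# Two wines from each of five countries: the two wines within a pair are
# interchangeable, so divide by 2! for each of the five pairs.
paired = factorial(10) // factorial(2) ** 5
print(paired)    # 113400
```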

So how did Ms. Kay's husband do it? That was the most interesting part of the entire article, and we get no explanation at all.

---

UPDATE: I e-mailed Ms. Kay, and she was kind enough to respond. It turns out that there were several small groups rather than one group of ten, as I had incorrectly assumed. If there were 2 groups of two and 2 groups of three, that would make the odds 1 in 144.
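The 1-in-144 figure checks out the same way: within each small group, any permutation of that group's answers is possible.

```python
from math import factorial

# 2 groups of two and 2 groups of three: multiply the orderings per group.
ways = factorial(2) * factorial(2) * factorial(3) * factorial(3)
print(ways)  # 144 -- a 1-in-144 chance of a perfect card by luck
```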

Her husband just tried to guess how the host would have permuted the answers -- for instance, he guessed that the first group of two was backwards, so that the lines on the scoring sheet would cross instead of being straight horizontal lines.

The moral of the story, I guess, is that if you're the host, choose the order of the answers by randomizing!

Monday, August 09, 2010

Do pitchers perform worse after a high-pitch start?

Last week, J.C. Bradbury and Sean Forman released a study to check whether throwing a lot of pitches affects a pitcher's next start. The paper, along with a series of PowerPoint slides, can be found at JC's blog, here.

There were several things that the study checked, but I'm going to concentrate on one part of it, which is fairly representative of the whole.

The authors tried to predict a starting pitcher's ERA in the following game, based on how many pitches he threw this game, and a bunch of other variables. Specifically:

-- number of pitches
-- number of days rest
-- the pitcher's ERA in this same season
-- the pitcher's age

It turned out that, controlling for the other three factors, every additional pitch thrown this game led to a .007 increase in ERA the next game.

Except that, I think there's a problem.

The authors included season ERA in their list of controls. That's because they needed a way to control for the quality of the pitcher. Otherwise, they'd probably find that throwing a lot of pitches today means you'll pitch well next time -- since the pitcher who throws 130 pitches today is more likely to be Nolan Ryan than Josh Towers.

So, effectively, they're comparing every pitcher to himself that season.

But if you compare a pitcher to himself that season, then it's guaranteed that an above-average game (for that pitcher) will be more likely to be followed by a below-average game (for that pitcher). After all, the entire set of games has to add up to "average" for that pitcher.

This is easiest to see if you consider the case where the pitcher only starts two games. If the first game is below his average, the second game absolutely must be above his average. And if the first game is above his average, the second game must be below.

The same thing holds for pitchers with more than two starts. Suppose a starter throws 150 innings, and gives up 75 runs, for an ERA of 4.50. And suppose that, today, he throws a 125-pitch complete game shutout.

For all games other than this one, his record will be 141 innings and 75 earned runs, for a 4.79 ERA. So, in his next start, you'd expect him, in retrospect, to be significantly worse than his season average of 4.50. That difference isn't caused by the 125 pitches. It's just the logical consequence that if this game was above the season average, the other games combined must be below the season average.
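The worked example above is easy to verify. A minimal check (150 IP and 75 runs for the season, minus the 9-inning, 0-run start; "runs" here means runs allowed, per the RA-vs-ERA note):

```python
# "Rest of season" RA after removing one start from a pitcher's line.

def ra(runs, innings):
    return 9 * runs / innings

season = ra(75, 150)                   # 4.50 over the full season
without_shutout = ra(75 - 0, 150 - 9)  # drop the 9 IP, 0 R start
print(round(season, 2), round(without_shutout, 2))  # 4.5 4.79
```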

Now, high pitch counts are associated with above-average games, and low pitch counts are associated with bad starts. So, since a player should be below average after a good start, and a high pitch start was probably a very good start, then it follows that a player should be below his average after a high pitch start. Similarly, he should be above his average after a low-pitch start. That's just an artifact of the way the study was designed, and has nothing to do with the player's arm being tired or not.

How big is the effect over the entire study? I checked. For every starter from 2000 to 2009 starting on less than 15 days rest, I computed how much his ERA would have been higher or lower had that start been eliminated completely. Then I grouped the starts by number of pitches. The results:

(Note: even though I'm talking about ERA, I included unearned runs too. I really should say "RA", but I'll occasionally keep on saying "ERA" anyway just to keep the discussion easier to follow. Just remember: JC/Sean's data is really ERA, and mine is really RA.)

To read one line off the chart: if you randomly found a game in which a starter threw only 50 pitches, and eliminated that game from his record, his season ERA would drop by half a run, 0.50. That's because a 50-pitch start is probably a bad outing, so eliminating it is a big improvement.

That's pretty big. A pitcher with an ERA of 4.00 *including* that bad outing might be 3.50 in all other games. And so, if he actually pitches to an ERA of around 3.50 in his next start, that would be just as expected by the logic of the calculations.

It's also interesting to note that the effect is very steep up to about 90 pitches, and then it levels off. That's probably because, after 90, any subsequent pitches are more a consequence of the pitcher's perceived ability to handle the workload, and less the number of runs he's giving up on this particular day.

Finally, if you take the "if this game were omitted" ERA difference in every game, and regress it against the number of pitches, what do you get? You'll get that every extra pitch causes a .006 increase in ERA next game -- very close to the .007 that JC and Sean found in their study.

-----

So, that's an argument that suggests the result might be just due to the methodology, and not to arm fatigue at all. To be more certain, I decided to try to reproduce the result. I ran a regression to predict next game's ERA from this game's pitches, and the pitcher's season ERA (the same variables JC and Sean used, but without age and year, which weren't found to be significant). I used roughly the same database they did -- 1988 to 2009.

My result: every extra pitch was worth .005 of ERA next game. That's a bit smaller than the .007 the authors found (more so when you consider that theirs really is ERA, and mine includes unearned runs), but still consistent. (I should mention that the original study didn't do a straight-line linear regression like I did -- the authors investigated transformations that might have wound up with a curved line as best fit. However, their graph shows a line that's almost straight -- I had to hold a ruler to it to notice a slight curve -- so it seems to me that the results are indeed similar.)

Then, I ran the same regression, but, this time, to remove the flaw, I used the pitcher's ERA for that season but adjusted *to not include that particular game*. So, for instance, in the 50-pitch example above, I used 3.50 instead of 4.00.

Now, the results went the other way! In this regression, every additional pitch this game led to a .003 *decrease* in runs allowed next game. Moreover, the result was only barely statistically significant (p=.07).

So, there appears to be a much weaker relationship between pitch count and future performance when you choose a better version of ERA, one that's independent of the other variables in the regression.

However, there's still some bias there, and there's one more correction we can make. Let me explain.

-----

In 2002, Mike Mussina allowed 103 runs in 215.2 innings of work, for an RA of 4.30.

Suppose you took one of Mussina's 2002 starts, at random. On average, what should his RA that game be?

The answer is NOT 4.30. It's much higher. It's 4.89. That is, if you take Mussina's RA for every one of his 33 starts, and you average all those numbers out, you get 4.89.

Why? Because the ERA calculation, the 4.30, weights all Mussina's innings equally. But, when we wonder about his average ERA in a game, we want to treat all *games* equally, not innings. The July 31 game, where he pitched only 3 innings and had an RA of 21.00, gets the same weight in the per-game average as his 9-inning shutout of August 28, with an RA of 0.00.

In ERA, the 0.00 gets three times the weight of the 21.00, because it covered three times as many innings. But when we ask about ERA in a given game, we're ignoring innings, and just looking at games. So the 0.00 gets only equal weight to the 21.00, not three times.

Since pitchers tend to pitch more innings in games where they pitch better, ERA gives a greater weight to those games. And that's why overall ERA is lower than averaging individual games' ERAs.
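A two-game toy example makes the weighting difference explicit. The first line (3 IP, 7 R) is implied by the 21.00 RA quoted above; the second is a 9-inning shutout:

```python
# Innings-weighted RA (the usual season figure) vs. the average of
# per-game RAs (what you'd expect from a randomly chosen start).

games = [(7, 3), (0, 9)]  # (runs allowed, innings pitched)

runs = sum(r for r, ip in games)
innings = sum(ip for r, ip in games)
season_ra = 9 * runs / innings               # weights innings equally
per_game = [9 * r / ip for r, ip in games]   # 21.00 and 0.00
avg_game_ra = sum(per_game) / len(per_game)  # weights games equally
print(round(season_ra, 2), avg_game_ra)      # 5.25 10.5
```

Because the good start gets three times the innings weight, the season RA (5.25) comes out far below the average of the two game RAs (10.5) -- the same direction as Mussina's 4.30 vs. 4.89.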

The point is: The study is trying to predict ERA for the next game. The best estimate for ERA next game is *not* the ERA for the season. That's because, as we just saw, the overall season ERA is too low to be a proper estimate of a single game's ERA. Rather, the best estimate of a game's ERA is the overall average of the individual game ERAs.

So, in the regression, instead of using plain ERA as one of the predictor variables, why not use the player's average game ERA that season? That would be more consistent with what we're trying to predict. In our Mussina example, instead of using 4.30, we'll use 4.89.

With the exception, of course, that we'll subtract out the current game from the average game ERA. So, if we're working on predicting the game after Mussina's shutout, we'll use the average game ERA from Mussina's other 32 starts, not including the shutout. Instead of 4.89, that works out to 5.04.

That is, I again ran a regression, trying to predict the next game's RA based on:

-- pitches thrown this game
-- pitcher's average game ERA this season for all games excluding this one

When I did that, what happened?

The effect of pitches thrown disappeared, almost entirely. It went down to -.0004 in ERA, and wasn't even close to significant (p=.79). Basically, the number of pitches thrown had no effect at all on the next start.

-----

So I think what JC and Sean found is not at all related to arm fatigue. It's just a consequence of the fact that their model retroactively required all the starts to add up to zero, relative to that pitcher's season average. And so, when one start is positive, the other starts simply have to work out to be negative, to cancel out. That makes it look like a good start causes a bad start, which makes it look like a high-pitch start causes a bad start.

But that's not true. And, as it turns out, when we correct for the zero-sum situation, the entire effect disappears. And so it doesn't look to me like pitches thrown has any connection to subsequent performance.

UPDATE: I took JC/Sean's regression and added one additional predictor variable -- ERA in the first game, the game corresponding to the number of pitches.

Once you control for ERA that game, the number of pitches became completely non-significant (p=.94), and its effect on ERA was pretty much zero (-0.00014).

That is: if you give up the same number of runs in two complete games, but one game takes you 90 pitches, and the other takes you 130 pitches ... well, there's effectively no difference in how well you'll pitch the following game.

That is strongly supportive of the theory that number of pitches is significant in the study's regression only because it acts as a proxy for runs allowed.

Monday, August 02, 2010

A "Canseco Effect" study

Did Jose Canseco influence his teammates to use steroids? Canseco loudly claimed that he did, and the claim seems plausible. But do the numbers back him up?

A couple of years ago, in an academic paper, two Israeli economists said they do. Eric D. Gould and Todd R. Kaplan ran regressions on individual player-seasons, and claimed that after playing with Canseco, players in general hit for more power. Furthermore, the result didn't hold for other players. For instance, after playing with Ken Griffey Jr., Junior's teammates' home runs actually fell, instead of rising.

The paper was recently reviewed -- uncritically -- in Slate. In response, J.C. Bradbury reposted his original critique of the study. JC's response is right on the money -- he shows that the results aren't very robust, and wonders why the authors used raw home run totals instead of rates (which is especially egregious considering that their minimum was only 50 AB). I suggest reading JC's post before continuing here.

The study's most important finding is that, after Canseco left a team, his average ex-teammate improved by almost three home runs a year. That's a large improvement (and statistically significant at 4.5 SDs).

But, for the most part, the reason they hit more home runs is just that they got more playing time! In Table 7, the study shows that those players who hit 3 more HR also had 53 more AB. You might have expected those guys to hit, what, maybe 2 HR in the extra 53 AB, instead of 3? So the difference Canseco makes is ... one home run. Two HR due to playing time, but only one that could possibly be attributed to a change in performance level.
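To put a rough number on that: at an ordinary power rate, 53 extra AB buys about 2 HR. (The 1-HR-per-27-AB rate here is my assumption, roughly a 20-homer pace over a full season, not a figure from the study.)

```python
# Expected home runs from extra playing time alone, at an assumed
# league-typical power rate of 1 HR per 27 AB.

HR_PER_AB = 1 / 27
extra_ab = 53
expected_extra_hr = extra_ab * HR_PER_AB
print(round(expected_extra_hr, 1))  # 2.0 -- leaving only ~1 HR unexplained
```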

What caused that extra home run? It's a long stretch to insist it's due to steroid use, and it's almost as long a stretch to even say that it's something unique about Canseco. Because, look at the numbers for some of the other players in the same chart.

Ken Griffey looks like the anti-Canseco. After playing with Griffey, his ex-teammates wound up with 46 fewer AB than before, and their homers dropped by 3.4. But with 46 fewer AB, their expected drop should only have been about 2 HR, not 3.4. So it looks like Griffey reduced his former teammates' output by about 1.4 home runs.

If you want to believe that the extra 1 HR is due to Canseco pushing his teammates to try steroids, does that mean that Griffey's negative 1.4 HR must have been because he pushed his teammates to *quit* steroids, and he did it 40% better than Canseco did?

Or, look at Ryne Sandberg's teammates. Those guys also lost playing time after playing with Sandberg -- 24 AB. But even with fewer at-bats, they *gained* 1.6 HR! So Sandberg's effect on his teammates was more than two-and-a-half times as large as Canseco's. So, did Sandberg push PEDs 250% as hard as Canseco did?

So far, I think it's clear that these numbers don't necessarily have anything to do with steroids. Especially because there's a better explanation, at least for the changes in AB.

Ryne Sandberg played almost his entire career for the Cubs. If you want to be an ex-teammate of Sandberg's, and wind up in this steroid study, you have to wait. Either you have to be traded, or you have to wait until Sandberg retires. If you wait until Sandberg retires, you're going to be an OLD ex-teammate. And old players tend to get fewer at-bats than they did when they were younger.

On the other hand, Jose Canseco played for eight teams in his career. For the most part, if you wanted to be an ex-teammate of Canseco's, you just had to wait for him to move to another team. In the 1990s, that meant a maximum wait of two years, and an average wait of probably one year. That means it's much easier to wind up a young ex-teammate of Canseco's than a young ex-teammate of Sandberg's. And young players tend to get more at-bats as they get older and better.

Canseco's ex-teammates are younger than Sandberg's ex-teammates. That would explain why they get more playing time, and that would also explain why they get better in subsequent seasons -- because they're still young enough to be improving, rather than declining. I can't prove it, but I bet that almost all the effect would disappear if the study properly controlled for age.

What this study actually demonstrates, I think, is that Jose Canseco switched teams a lot.

UPDATE: The authors have released a new version of the study, dated August 4. I haven't looked at it in depth, but one difference is that it now includes years up to 2009, instead of stopping at 2003.

One thing you'll notice in Table 7 is that 19 out of the 77 entries are significant at the 5% level. You'd have expected 4, not 19. Of course, they're not all independent, but still.

Also, Rafael Palmeiro and Ken Griffey Jr. show as more extreme than Canseco, but in the other direction. That, I think, casts doubt on the hypothesis that there's something special about Canseco's numbers in this regard.