Archive for March, 2004

All of the steroids talk has stimulated a discussion on baseball’s anti-trust exemption over at Baseball Musings. A reader comments,

Look, the MLB exists only because Congress is willing to grant it an exception from anti-trust laws.

David responds,

If anything, Major League Baseball would be stronger without the antitrust exemption. It would have to compete against independent minor leagues.

While this discussion focuses on the effect of MLB’s monopoly power on the steroid issue, I want to disagree with both of the above statements on the importance of the anti-trust exemption. I believe the anti-trust exemption has virtually no effect on the structure of MLB as we see it today. If anything, the anti-trust exemption may make baseball better than the other major sports leagues, because less effort is wasted on frivolous anti-trust lawsuits that rarely solve anything. Now, I don’t wish to defend my second point at this time, so feel free to disregard it. I just wanted to add my opinion on this. Sorry to drop this bomb and run, but this is a huge argument for another day. I think the important issue is that the anti-trust exemption does nothing to give MLB any protection from outside competition. Here is why:

1) The anti-trust exemption does not prevent other leagues from rising up to compete with MLB. In fact, any league that attempted to compete with MLB would receive the same anti-trust exemption. I think there is a public misperception that the exemption is a barrier to entry by rival leagues. For example, the Continental League was prepared to open for play in the early 1960s before it reached agreement with the MLB to expand. Other barriers to entry may exist, such as a natural monopoly cost structure (which I don’t buy either), but the exemption is not such a barrier.

2) When compared to the other major sports leagues in the US, MLB acts no more like a monopolist than the other leagues. A simple monopolist will restrict output, thereby raising the price. (I am ignoring the possibility of a price-discriminating monopolist here, but a price-discriminating monopolist does not produce nearly the harm of a single-price monopolist.) If the anti-trust exemption gives more market power to baseball, its output should be lower and its price higher. The following table lists the number of teams, regular season games, the average ticket price per game, and the average ticket price adjusted for the percent of the regular season viewed per game.

(*Including my hometown Charlotte Bobcats. All other data from the 2002-2003 season available from Rodney Fort.)

If anything, MLB seems to be producing more output at a lower price than the other leagues, all of which lack the anti-trust exemption.

A few things to wrap this up. This does not mean that MLB does not act like a monopolist. In fact, all of the sports leagues may be acting like monopolists. However, I don’t think there is any evidence that the anti-trust exemption affects MLB’s monopoly behavior. I also realize that some sports economists disagree with this (Zimbalist), but others feel that the exemption is not important (Scully and Shughart). Also, I acknowledge that there are some problems comparing leagues with differing cost and demand structures, but I don’t think the differences are enough to simply throw out these numbers.

Last Friday I commented that despite the recent fall in ERAs in MLB since 2000, the variance relative to the mean ERA has risen. Using an assumption from Stephen Jay Gould I assumed that this increased variance in pitching performance indicated an influx of some very bad pitching into the league. However, Skip was a little more skeptical.

Time for a skewness check! If Kurkjian & Bagwell are right, the skewness should show up in the right tail of the distribution. If right skewness has increased in recent years, coincident with your increase in std deviation, then they are right. In which case we won’t be seeing those 50 home run years of the recent past. I like Gould, but think Kurkjian is on to something and give him the edge here.

Great idea! While I am no skewness expert, I did run a few tests. The histograms seem to be quite informative. Here are frequency distributions for ERAs in 2000 and 2003.

Skip is right. The skewness indicates that pitchers seem to be improving. While some very bad pitchers still play in the big leagues, overall most pitchers seem to be getting better.
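For anyone who wants to run the same kind of check, here is a minimal sketch of a moment-based skewness calculation. The ERA values are made-up placeholders, not the 2000 or 2003 data, and the function name is my own. A positive value means a longer right tail, which for ERA means a few very high (bad) numbers.

```python
from statistics import mean

def sample_skewness(values):
    """Moment-based skewness: third central moment divided by the SD cubed."""
    n = len(values)
    m = mean(values)
    second_moment = sum((x - m) ** 2 for x in values) / n
    third_moment = sum((x - m) ** 3 for x in values) / n
    return third_moment / second_moment ** 1.5

# Hypothetical ERAs (assumption, for illustration only): most pitchers
# cluster near 4.00 and a few very bad ones trail off to the right.
hypothetical_eras = [3.1, 3.5, 3.8, 4.0, 4.1, 4.3, 4.6, 5.2, 6.9, 9.4]

# A perfectly symmetric set of values for comparison.
symmetric_eras = [3.0, 3.5, 4.0, 4.5, 5.0]

print(round(sample_skewness(hypothetical_eras), 2))
print(round(sample_skewness(symmetric_eras), 2))
```

Comparing this number across seasons is one rough way to see whether the right tail is growing or shrinking, though a histogram is still the easier thing to read.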

Skip links to an interesting article by Tim Kurkjian at ESPN on the improvement of pitchers in MLB. What caught my eye first was the quote by Jeff Bagwell.

“Pitching is so much better today,” says Astros first baseman Jeff Bagwell. “When I came up (1991), 91 (mph) is about as hard as anyone threw except for like (Rob) Dibble who threw 94. Now, almost everyone throws 94, and most of them are starting pitchers. Look at Juan Cruz (of the Cubs). He throws 97 (mph) and he can’t make their rotation.”

This puzzled me. Was he serious? If you look at the stats there is no doubt that hitters are doing much better than pitchers today when compared to 1991 with these NL stats.

But as Skip points out, Tim K. is not out to defend Bagwell’s grumpy old man syndrome. There is no doubt that pitchers now have less of an advantage over hitters than they did 15 years ago, but the recent offensive surge may be waning. I’m not so sure, and this table explains the reason for my skepticism.

This table lists the average ERA and the Coefficient of Variation of ERA for MLB pitchers facing more than 50 batters in a season. The C.V. ERA is simply the SD/Mean, which is superior to the simple SD because the SD is biased by the Mean. The C.V. is important because it tells the dispersion of quality among all pitchers in the league. Because athletic performance hits a wall on the right side of the talent distribution, increased dispersion normally means an increase in lower-quality players. More lower-quality pitchers means more opportunities for very good players to take advantage of the bad players. This leads to more records being broken. (If this idea interests you, read Stephen Jay Gould’s “Death of the .400 Hitter” in Full House.) With dispersion on the rise, I would not be surprised if some very good batters have some career years this year.
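Here is a minimal sketch of why dividing by the mean matters. The two ERA lists are hypothetical, chosen so that the second is just the first doubled; the function name is my own.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (unitless dispersion)."""
    return pstdev(values) / mean(values)

# Hypothetical ERAs (assumption): a low-scoring league and a high-scoring
# league in which every pitcher's ERA is exactly twice as high.
low_scoring_eras = [2.0, 3.0, 4.0]
high_scoring_eras = [4.0, 6.0, 8.0]

# The raw SD doubles along with the mean...
print(pstdev(low_scoring_eras), pstdev(high_scoring_eras))

# ...but the C.V. is the same, because the relative spread has not changed.
print(coefficient_of_variation(low_scoring_eras))
print(coefficient_of_variation(high_scoring_eras))
```

So when league-wide ERA moves up or down, the C.V. lets you compare the spread of pitcher quality across seasons without the level of scoring getting in the way.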

As promised in the update from yesterday’s post, here are the results from the test of hitting power on walks for 2001. The format is close to the same as before, with only a few modifications. First, the coefficients I report result from regressions without the control variables. I did this because I want to show you the strongest support I can find for power preceding walks (recall that I have been biased in the opposite direction). I included the controls last time, because they led to the strongest results. It is important to note that the controls change very little about the result, so I think they are likely unimportant. The second change is that I was able to pull intentional walks (IBBs) out of the analysis with this data. I could not do that for the 2002 analysis. As you will see, the results do change some when looking only at non-intentional walks.

Here are the coefficient estimates when IBBs are included in the walk-rate.

SLG is not quite statistically significant at the 5% level. Iso-Power is significant at only the 10% level. The marginal impact of power on walks is much larger in 2001 than in 2002 for SLG and Iso-Power. 2001 is the year that Bonds really, I mean REALLY, stepped it up. I remember watching that Braves series in May when he hit 6 HRs in 3 games. I am sure many pitchers took note of such power jumps (although HRs alone don’t seem to explain pitcher behavior). By 2002, Bonds’s reputation was at a whole new level, which is why the effect may not be so pronounced. In 2002, pitchers perceived Bonds to be dangerous at all times.

But how much did pitchers respond by simply giving Bonds a free pass, and how much of it was pitching around Bonds? That is, when pitchers pitched to Bonds, did they nibble or go after him based on his recent history of hitting power? In this table I look at the walk-rate excluding all intentional walks.

All of the coefficients shrink some, and the results are no longer statistically significant. Thus, it seems that while there may have been some pitching around, pitchers were responding to Bonds’s power surges with IBBs. Here is a link to the 2001 game-log data. Thanks again to Doug and Alec for pointing me to it.

In a previous post I proposed that Bonds’s increased walks must be independent of power. Since that time I have come to believe that power might affect walks by causing pitchers to avoid pitching to Bonds. So, I decided to test it.

Thanks to a thoughtful reader I was able to acquire game-by-game stats for the 2002 season on ESPN.com. Unfortunately, I cannot find the game logs for 2001 anywhere online. I have tried hard, too.

Although all of the estimates are positive, none are statistically significant. I corrected for heteroskedasticity and tested for autocorrelation (none found and correcting does not affect the results). If you don’t know how to interpret the elasticity estimates, a 1% change in SLG is associated with a 0.28% change in the walk-rate, at the average. As best I can tell, there is no evidence that pitchers were shying away from Bonds when he was hot; however, Bonds was pretty hot over the entire season. I would prefer to see the estimates for 2001.
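To make the elasticity interpretation concrete, here is a minimal sketch of a one-variable lagged regression and the elasticity evaluated at the means. Several things here are assumptions for illustration: the numbers are invented rather than taken from the game logs, the lag is a single game, and there are no controls or heteroskedasticity corrections. The function names are my own.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope from a one-variable ordinary least squares fit of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def elasticity_at_mean(x, y):
    """Percent change in y associated with a 1% change in x, at the means."""
    return ols_slope(x, y) * mean(x) / mean(y)

# Hypothetical game-by-game values (assumption), NOT the actual 2002 logs.
slg_by_game = [0.50, 1.00, 0.25, 0.75, 1.25, 0.60]
walk_rate_by_game = [0.20, 0.25, 0.40, 0.20, 0.25, 0.50]

# "Power leads walks": pair each game's walk rate with the prior game's SLG.
lagged_slg = slg_by_game[:-1]
walk_rate_next_game = walk_rate_by_game[1:]

print(elasticity_at_mean(lagged_slg, walk_rate_next_game))
```

An elasticity of 0.28 from a calculation like this would read as: a 1% rise in lagged SLG goes with a 0.28% rise in the walk-rate, measured at the average values of both.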

Other notes: The R-squared is about .04 for all estimates. The sample included 149 games. I tried several possible controls; these were the best, and the power estimates were never statistically significant. I also tried some non-OLS estimates such as multinomial logit and Poisson, but found the results were similar to OLS. I am not as familiar with these techniques, so it is possible I fouled them up.

UPDATE: I just got the 2001 numbers. I have run them, and there is a difference from 2002. SLG does seem to lead the walk-rate by a larger amount, and it is statistically significant. Iso-Power is also larger and borderline significant. HRs do not seem to matter. Unfortunately, I will not be able to post until tomorrow. Thanks to Doug and Alec for the data.

Here is a lesson for newbie bloggers. If you have a brain fart over the weekend, don’t post it the day before a story breaks contradicting your hypothesis.

Thanks to all of you who have sent me comments. It is clear to me now that you cannot show from Bonds’s walk rate alone that his hitting improvement is a steroid-free result of his hitting discipline. However, proving he is on the juice from the stats is equally daunting. I would like to propose a test of whether Bonds’s walk explosion preceded his hitting explosion during the 2001 season. The only problem is that I do not have the data. I need game-by-game data in a simple spreadsheet format (not a box score). I just need to know Bonds’s hits, walks, power, and the score for every Giants game that year. Then I can run a simple regression to see if walks lead power or if power leads walks. If you have this data or know of where to get this data, please e-mail me. Or if you know of someone who has run a similar test, please let me know (I know of MNP’s analysis on Baseball Primer, but even he has backed off his claim).

Update: My thoughts have changed somewhat since I first posted this. See here, here, and here; and direct your comments to these threads.

I am sick and tired of reading about how Barry Bonds’s records, among others, are tainted due to steroids. Let’s look at Bonds’s success in his recent offensive boom. I’ll exclude his first year in the league to limit some bias to these estimates. From 1987-1999 Bonds hit .292 with a SLG of .572. From 2000-2003 these numbers rose to .336 and .774. This change has two potential causes: 1) Bonds simply became better or 2) he used steroids to become better. “The media” is very fond of the latter explanation.

There is a simple way to determine which was the cause. Steroids ought to increase a hitter’s ability to hit the ball harder. This will result in every ball he hits generating more power (i.e. outs become singles, singles become doubles, flyouts become HRs, etc.). This means both his batting average and SLG should go up with steroid use; although, I suspect the effect on batting average would be much less. But, one thing steroids should not change is Bonds’s hitting discipline, as measured by his walk rate.

In the table below I list some of Bonds’s statistics for three periods: career, 1987-1999, and 2003. The variables I use are On-Base-Percentage (OBP), OBP calculated with intentional walks excluded, non-intentional Walks per plate appearance, and Isolated Power (SLG-BA). The first three variables are good proxies for Bonds’s plate discipline. The less willing he is to swing at pitches outside the strike zone, the more likely it is he will see pitches he can hit from pitchers. The fourth variable captures his hitting power only. Steroids can influence his power, but they should have no effect on his plate discipline. Hitting discipline, on the other hand, can increase his power.

Clearly, Bonds is not just hitting the ball harder; he has improved his ability to wait for the right pitch. And interestingly enough, Bonds’s improved power is consistent with his growth in OBP. Using an OLS regression, I estimated the effect of OBP on Iso-Power for all teams from 2000-2003. I found that every OBP point is worth about .86 Iso-Power points; hence, .86*(.517) = .445. That is pretty darn close to his actual Iso-Power average.
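The arithmetic is easy to check. This little sketch just multiplies the two numbers quoted above; treating the relationship as a straight multiple with no intercept is an assumption that follows the calculation as written, and the variable names are my own.

```python
# Figures quoted in the post.
iso_power_per_obp_point = 0.86  # estimated OLS slope, all teams 2000-2003
bonds_obp = 0.517               # Bonds's OBP used in the calculation

# Predicted Isolated Power from OBP alone (assumes no intercept term).
predicted_iso_power = iso_power_per_obp_point * bonds_obp

print(round(predicted_iso_power, 3))
```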

This leads me to conclude one of the following must be true.
1) Steroids increase hitting discipline as well as muscle mass.
2) Bonds’s success is likely a product of becoming a better hitter, not using steroids.

The Sports Economist links to an article in the Financial Times about University of Chicago economist Steven Levitt. Levitt, the most recent winner of the prestigious John Bates Clark Medal, is a very interesting fellow. If you had me for senior seminar, you know how much I admire Dr. Levitt’s work. But this quote at the end of the article surprised me.

There has been much hype recently about baseball clubs finding statistics to identify good players. Levitt read Michael Lewis’s book Moneyball about the supposed innovators, the Oakland As, and is unimpressed. “If you look at all the stats they say are so important, the As are totally average! There’s very little evidence Billy Beane [the club’s general manager] is doing something right.”

Really? So I looked at the numbers, and he is right (at least on offense). The table below (oh you know I’m going to overuse this little tool) lists the Mean OBP, SLG, and OPS for the Oakland A’s and the American League (AL) from 2000-2003. I also include the SD of the variables for the AL.

Note to self, never doubt Steven Levitt. Wow! So what does this mean? Maybe Billy is not so innovative after all. I think I’ll submit the article over as a Clutch Hit to Baseball Primer to see what the Primates have to say about it.

Update: I sent David Pinto of Baseball Musings an e-mail about this. He kindly responds to me that while the A’s offense may not be all that impressive, the pitching probably is. You have to look at both sides. I don’t have the pitching statistics handy, but I would suspect he is right. Also, David points out that Beane is getting his average talent for less than the average price. Thanks David!

Another Update: Baseball Primer has started a discussion thread on this article. There’s nothing like getting that Primer pointer glory.

Even More: I could not resist. I had to check out a sabermetric-approved pitching statistic to examine the claim that the A’s might be winning with excellent pitching and average hitting. Luckily, Futility Infielder calculates the DIPS-corrected ERA (dERA) for all pitchers in 2003. The site has all of the relevant information about the DIPS correction. The average dERA for the AL in 2003 was 4.97 with a standard deviation of 1.95. The average dERA for the 2003 A’s was 4.43. It is lower, but not by much. I think Levitt is onto something. Even if the A’s are paying fewer dollars per OBP-point (for example), that should only make the A’s an average team with a below-average payroll. This means that the Moneyball-approved statistics may not fully explain what has caused the A’s to succeed in the past few years. I am beginning to see why Beane may have been so loose-lipped in the company of Michael Lewis.
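To put "lower, but not by much" in standard-deviation terms, here is a quick sketch using only the three numbers quoted above. One caveat to keep in mind: the standard deviation is presumably taken across individual pitchers, while the A’s figure is a staff average, so this is a rough comparison rather than a formal test. The variable names are my own.

```python
# Figures quoted in the post (2003 season).
league_mean_dera = 4.97
league_sd_dera = 1.95
athletics_mean_dera = 4.43

# Distance of the A's average from the league average, in league SDs.
# Negative means better than average, since lower dERA is better.
z_score = (athletics_mean_dera - league_mean_dera) / league_sd_dera

print(round(z_score, 2))
```

By this measure the A’s staff sits a bit more than a quarter of a standard deviation better than the league average.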

And Finally (I think): I want to say that I disagree with Levitt’s statement that there is no evidence that the A’s are doing anything right. Clearly, Beane & Co. have done a very good job of winning over several seasons. I am concentrating on Levitt’s statement that “If you look at all the stats they say are so important, the As are totally average!” I take this to mean that the stats hyped in Moneyball are not explaining much of Oakland’s recent success. A more sabermetrically-inclined friend tells me that this is old news. I think much of what Beane didn’t say (valuing defense and developing new tools for measuring process instead of outcomes) has helped the A’s win. Levitt is pointing out that the main metrics do not tell us much about the success of the A’s, which is something I did not realize. Sabermetrics is as much about a philosophical approach to analyzing the game as it is about using data analysis. There is no doubt the A’s are winning and performing well in some less sophisticated measures of performance (e.g., the A’s pitching led the league in regular old ERA), and I think the Beane model is contributing to that success. I am not sure if Levitt would agree with me on this, because he does clearly state that “There’s very little evidence Billy Beane [the club’s general manager] is doing something right.”

It now seems that the only criterion needed to prove a player is on steroids is for a player to have one season out of the ordinary. The latest victim of this new metric is Brady Anderson, who hit 50 home runs in 1996, about double the number he hit in any other season. This week, Jim “Head and Shoulders” Palmer used this standard to accuse Anderson of using steroids during the 1996 season. It is a stupid standard and Palmer ought to be ashamed of himself for saying it, but I thought I would run the “Caminiti test” on Anderson’s 1996. I want to compare Anderson’s improvement to Ken Caminiti’s improvement; coincidentally, both happened in 1996. Caminiti admitted that his career year in 1996 was fueled by steroids, so I think it would be interesting if the improvement was similar.

Both players improved, but in different ways. The improvements in isolated-power are big jumps for both, but Brady’s improvement was much larger. In terms of on-base-percentage, both improved, but Caminiti’s improvement was nearly twice Anderson’s.

I want to remind my readers that this is really a crappy test that I am doing mostly for fun, and there is not much information here. BUT, if you are one of those people who thinks the Palmer standard is a good one, then you would have to say that if Anderson had some help from steroids, he also had to improve quite a bit on his own. His hitting power grew a lot more than Caminiti’s, and we know Caminiti was on the juice.

Addendum: Further investigation has revealed that data from the strike season included in the following study is responsible for the results. When corrected, batting average is no longer a better predictor of offense than RC. Thanks to Phil Birnbaum.
———

Economists are baffled at Billy Beane’s discovery of Bill James. James published his first book in 1977, but few in MLB took him seriously until the mid-to-late 1990s in Oakland. As I discussed in a previous post, maybe James’s ideas got pushed aside due to some bad luck. Specifically, maybe some of his good ideas looked bad due to anomalies in baseball that occurred at the time he came on the scene. So I decided to look at one of James’s first discoveries, Runs Created (RC), which was published in 1979.

RC = [(H + BB)*TB]/(AB + BB)
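As a quick illustration, the formula above translates directly into a small function. The team totals in the example are hypothetical round numbers, not any real team’s season, and the function and argument names are my own.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created: RC = [(H + BB) * TB] / (AB + BB)."""
    return (hits + walks) * total_bases / (at_bats + walks)

# Hypothetical team season totals (assumption, for illustration only).
example_rc = runs_created(hits=1500, walks=500, total_bases=2400, at_bats=5500)

print(example_rc)  # (1500 + 500) * 2400 / (5500 + 500) = 800.0
```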

In the 3 seasons that followed RC’s introduction (1980-1982) the R-squared of the linear regression estimate of RC on Runs per game (RG) was .44. For the effect of BA on RG the R-squared is .64. If I’m a GM, who do I listen to? The guy from the pork and beans factory or my baseball scouts? In hindsight, it is easy to see what a truly great man James is. But such an observation may not have been so easy 20 years ago.
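For readers who want to see what that comparison involves, here is a minimal sketch of the R-squared from a one-variable linear regression, which equals the squared correlation between the predictor and the outcome. The team values are made up and the names are my own; nothing here reproduces the 1980-1982 numbers.

```python
from statistics import mean

def r_squared(predictor, outcome):
    """R-squared of a one-variable OLS fit (the squared correlation)."""
    x_bar, y_bar = mean(predictor), mean(outcome)
    sxx = sum((x - x_bar) ** 2 for x in predictor)
    syy = sum((y - y_bar) ** 2 for y in outcome)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(predictor, outcome))
    return sxy ** 2 / (sxx * syy)

# Hypothetical team-level values (assumption): a predictor such as RC per
# game or batting average, and actual runs per game for the same teams.
hypothetical_predictor = [4.0, 4.5, 5.0, 5.5]
hypothetical_runs_per_game = [4.1, 4.4, 5.2, 5.3]

print(round(r_squared(hypothetical_predictor, hypothetical_runs_per_game), 2))
```

Running this once with RC as the predictor and once with BA, on the same teams and seasons, gives the two R-squared values being compared.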

P.S.: James tested the RC model on the 1975 season (according to Moneyball), for which the model predicts with an R-squared of .85. Today RC has been modified to the point where it is an excellent predictor of offense.