Sabermetric Research

Phil Birnbaum

Saturday, September 29, 2007

Why are the Diamondbacks outperforming Pythagoras?

As of yesterday, the Arizona Diamondbacks were 89-70. However, they have scored 11 fewer runs than they've allowed. According to the Pythagorean Projection, they should be 78-81. That's a difference of 11 games.
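As a rough sketch of the arithmetic behind that projection, here is the classic Pythagorean formula in Python. The post gives only the 11-run deficit, not the run totals, so the totals below (700 scored, 711 allowed) are placeholders assumed purely for illustration. The exponent of 2 is also an assumption, since other exponents are in common use.

```python
def pythagorean_wins(runs_scored, runs_allowed, games, exponent=2.0):
    """Expected wins from runs scored and allowed (classic Pythagorean formula)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# Hypothetical run totals: only the 11-run gap and the 159 games come from the post.
games = 89 + 70
expected = pythagorean_wins(runs_scored=700, runs_allowed=711, games=games)
print(f"Expected wins: {expected:.1f} of {games}")
print(f"Actual minus expected: {89 - expected:.1f} games")
```

With any plausible run totals that differ by 11, the projection lands near 78 wins, which is where the 11-game gap comes from.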

Normally, deviations from Pythagoras are just luck. The standard logic says that a team that "should" go 78-81, but goes 89-70 anyway, is probably really a 78-81 team that got lucky. You shouldn't expect that luck to continue.

" ... the Diamondbacks are clearly not as good as their record, but they’re not as bad as Pythagoras would have you think."

Rosenheck describes two reasons for the D-Backs' outperformance. First, their clutch hitting has been excellent. Here, courtesy of Baseball Reference, is their OPS when the score is tied, followed by their OPS when the margin is one run, then 2, 3, 4, and more than 4 runs:

.748 / .759 / .745 / .742 / .736 / .729

They hit much better when the game is close than when it's a blowout. The MLB averages show no such pattern:

.762 / .760 / .761 / .759 / .760 / .754

The Diamondbacks have hit well in the clutch. And (as Rosenheck acknowledges), since clutch performance is almost certainly not an innate ability, the "clutch" part of Arizona's discrepancy is probably random chance.

However, the flip side of clutch hitting is clutch pitching. On this, Rosenheck argues, manager Bob Melvin has expertly figured out how to reserve his best pitchers when the game is on the line, saving his worst pitchers for blowouts when the runs they give up don't matter much. "Of course," he says, "all teams pursue this strategy, but Melvin has done so more effectively."

This I'm not sure about. Here is Arizona's "clutch" line for pitching:

.726 / .735 / .738 / .743 / .743 / .807

Again, they're clutch, clutchier than average -- their pitchers are much better when the game is close, which again contributes to making them more successful than their Pythagorean projection. But should this really be attributed to Bob Melvin? If he were doing something different from other managers – say, using a mediocre pitcher in a 3-run save situation, but maximizing Jose Valverde's leverage by using him in an eighth inning tie game – I might buy it. But the game log (for Valverde, at least) doesn't show anything unusual.

One thing that does stick out is the .807 at the end of the pitching line. Indeed, Arizona's pitchers are particularly mediocre once the game is out of hand. That might be a real effect, rather than luck: Melvin might be using really crappy pitchers, or just telling them to go easy on their arms in blowout games.

But even so, that category is only 836 plate appearances. If I've done the math right, the difference between .743 and .807, in that number of PA, is only in the range of 10 runs. Let's say we double that, assuming that Melvin's strategy has an equal influence in 3- or 4-run games. That's still only 20 runs, or two games.

My best guess is that it's all luck, except for a couple of games. I'd bet that the D-Backs are, in talent, around a .500 team.

Tuesday, September 25, 2007

The updated "Wages of Wins"

My copy of the updated paperback edition of "The Wages of Wins" arrived today. A list of differences between the old edition and the new can be found at the authors' website here; generally, some of the tables are updated for the 2006 seasons, but the content is almost all the same.

The most significant difference is that the authors changed the formula for QB Score. An interception used to be worth –50 yards; now it's only worth –30. Their explanation is that when they reran their regression with better data – specifically, starting field position of drives – the coefficient came out different. Actually, it came out to –35 yards, but the authors use –30 because it's easier to remember.

Another small change I happened to notice is in the discussion of NBA players' playoff performances. The first edition of the book noted that "Win Scores" drop in the playoffs because team defense improves in the post-season. I argued that the difference was probably due to the fact that playoff teams face only other playoff teams; the absence of games against cellar-dwellers would reduce offensive stats. The authors apparently agree – they have now added a line to say that "teams in the playoffs tend to be better, with better defenses." However, a few sentences later, they let stand an assertion that teams play better defense in the post-season. That implies that the defense of an individual team improves. I don't know if they still think this is true, or if they've changed their minds about it.

While I'm here, one other thing I noticed (in both editions) that I hadn't mentioned before.

At one point, the book looks at Michael Jordan's year-by-year record in the regular season, and compares it to his playoff performance. It turns out that in 11 of the 13 seasons, his performance declined. The authors note, of course, that performance does decline in the playoffs, by an average .03 units of "Win Score per Minute." But if you adjust by the .03, Jordan still only goes from 2-11 to 5-8.

However, the decline of .03 is an overall average for all players. But Jordan's overall score is more than twice the average. If that's the case, shouldn't his decline also be twice the average -- .06 instead of .03? That is, the average player drops by 23% (.03 divided by the average .128) because of the better competition. Shouldn't Jordan drop by the same percentage, rather than by the same raw number?

If you adjust his numbers by .06, instead of .03, Jordan now goes 10-3, which now coincides with his reputation as a clutch playoff performer.
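The two adjustments can be compared with a little arithmetic. This is only a sketch: the .128 league average and the .03 decline are the figures quoted above, while the player score used here is a hypothetical stand-in for "twice the average," not Jordan's actual Win Score per Minute.

```python
LEAGUE_AVG = 0.128      # average Win Score per Minute, as quoted above
AVG_DECLINE = 0.03      # average playoff decline, as quoted above


def flat_decline(score):
    """Every player loses the same raw amount, whatever his score."""
    return AVG_DECLINE


def proportional_decline(score):
    """Every player loses the same percentage of his regular-season score."""
    return score * AVG_DECLINE / LEAGUE_AVG


hypothetical_star = 2 * LEAGUE_AVG   # assumed: exactly twice the league average
print(f"Percentage decline for the average player: {AVG_DECLINE / LEAGUE_AVG:.1%}")
print(f"Flat adjustment:         {flat_decline(hypothetical_star):.3f}")
print(f"Proportional adjustment: {proportional_decline(hypothetical_star):.3f}")
```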

Now, I don’t know for sure if you should indeed adjust Jordan by .06, or .03, or some other number entirely. You'd probably want to examine players with high Win Scores to see what the typical decline is. In any case, without knowing the right number, it's hard to analyze anyone's clutch performance this way.

An oversimplified competitive balance study

The authors, Eli Ben-Naim et al., start with a simplified model. Given any two teams, the worse team has a probability "q" of beating the better team. The variable q is obviously less than 1/2, but doesn't vary based on the skills of the teams. So a .200 team has the same probability of defeating a .770 team as a .480 team has of beating a .520 team.
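To see what the constant-q assumption implies, here is a small sketch of expected winning percentages in a balanced round-robin. It assumes the teams have a fixed strength ranking and that every team plays every other team equally often; both are simplifications made for this illustration rather than details taken from the paper, and q = 0.44 with 30 teams are arbitrary example values.

```python
def expected_win_pcts(n_teams, q):
    """Expected winning percentage by rank (0 = best) when the worse team
    always beats the better team with the same probability q."""
    pcts = []
    for rank in range(n_teams):
        better = rank                   # opponents ranked above this team
        worse = n_teams - 1 - rank      # opponents ranked below this team
        wins = worse * (1 - q) + better * q
        pcts.append(wins / (n_teams - 1))
    return pcts


# q = 0.44 and 30 teams are arbitrary example values.
pcts = expected_win_pcts(n_teams=30, q=0.44)
print(f"Best team:  {pcts[0]:.3f}")
print(f"Worst team: {pcts[-1]:.3f}")
```

Under these assumptions the expected records spread evenly between q and 1 - q, so a smaller q means a wider spread of records.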

Given that model, the authors then find the value of q that makes the model come closest to the historical distribution of winning percentages in each sport. The results are:

Their conclusion: soccer is the most competitive sport, while NFL football is the least competitive.

To check these predictions against the empirical results, the study looks at actual historical game results to see if the underdog winning percentages match the ones above. But which team is the underdog? Instead of choosing the underdog based on the full season's eventual record, the authors choose it based on the season's record to date. That means that on the third day of the season, if the Yankees are 1-1 but the Devil Rays are 2-0, the Rays are still the favorite. Using that process, the authors find that the model underestimates the chance of the underdog winning. For MLB, to take one example, the actual is .443, significantly higher than the .413 predicted.

The authors note this systematic bias, but they don't explain it. They do mention, in the same paragraph, that there have been changes in "q" over the eras, but it's unclear whether they think that explains the bias or not.

My feeling is that the bias is caused by the oversimplified model. First, q is not the same for all games. Second, the study's method of choosing the underdog in early-season games adds randomness and causes the "actual" number to be too high. Finally, home field advantage has a large effect on which team is actually the underdog, and that, too, would make the "actual" numbers turn out too high.

The authors briefly address all three issues. They say that if they look at only games in the second half of the season, "remarkably," this changes the upset frequency by less than .007. (I agree that this is remarkable; I would have expected a lot bigger a difference.) Also, when they ignore all games where the teams' winning percentages are less than .050 apart, the upset frequency changes by "less than .005." Again, I would have expected a larger difference. Finally, they mention home field advantage but make no predictions about its effects.

In a subsequent section of the paper, the authors run the same calculations on teams' all-time records, calculating q so that the theoretical all-time winning percentages match those actually observed. I'm not sure this is a good idea at all – team X might have a higher 50-year winning percentage than team Y, but that doesn't mean that it's possible to know which was the underdog when X and Y played each other in May of 1953. But the authors nonetheless conclude that

"The fact that similar trends for the upset frequency emerge from game records as do from all-time team records indicate that the relative strengths of clubs have not changed considerably over the past century."

That doesn't make much sense to me.

In any case, there are better methods for figuring out the relative single-game competitiveness levels of different sports. One much easier way, that I suggested in my review of "The Wages of Wins," is to simply look at home field advantage. Start with the assumption that HFA is, in some physical sense, the same for every sport (as Tango once suggested in a comment somewhere that I can't find, maybe every athlete performs 1% better at home, and the rules of the sport turn that 1% into an increased chance of victory). If you look at a very large number of home games, the skill would even out in all but home field advantage. In that case, the more likely the home team wins, the more likely the *better* team wins ("better" by an average of the HFA). This method avoids all the problems of this study – failing to account for HFA, failing to account for different team skills, and failing to accurately note which team is actually better.

Indeed, the home field advantage in the NBA is much higher than in MLB. I'd bet that if you ranked all the sports, the results would look roughly the same as the chart above.

Another method is to figure out the numbers directly. Start with Tango's method to figure out the league's talent distribution. For instance, in MLB, the SD of actual team wins is about 11.6. The theoretical SD due to luck is 6.3 games. Therefore, the SD of team talent is 9.7 (the square root of 11.6 squared minus 6.3 squared).

That means that the talent difference between two randomly selected teams is about 14 games (9.7 times the square root of 2). That's about .080. So the underdog team should win about 42% of games. (You can use log5 to be a bit more accurate if you like, but I think it's still about 42%.) Again, if you repeat this for all five sports, I'd be willing to bet you'd get results similar to the chart found in this JQAS study: similar, but without the flaws resulting from the oversimplified method.
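The chain of calculations above can be sketched in a few lines of Python. The 11.6 figure is the one quoted in the post; the luck SD is computed as the binomial SD of a .500 team over 162 games, and placing the two teams symmetrically around .500 is an assumption made here so that log5 has two winning percentages to work with.

```python
import math


def log5(a, b):
    """Chance that a team with winning percentage a beats a team with percentage b."""
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))


GAMES = 162
sd_observed = 11.6                              # SD of actual team wins (from the post)
sd_luck = math.sqrt(GAMES * 0.5 * 0.5)          # binomial luck for a .500 team, about 6.3
sd_talent = math.sqrt(sd_observed**2 - sd_luck**2)

gap_games = sd_talent * math.sqrt(2)            # gap between two random teams, as in the post
gap_pct = gap_games / GAMES

# Assumed for illustration: the two teams sit symmetrically around .500.
favorite = 0.5 + gap_pct / 2
underdog = 0.5 - gap_pct / 2
print(f"SD of talent: {sd_talent:.1f} games")
print(f"Gap: {gap_games:.1f} games ({gap_pct:.3f})")
print(f"Underdog win probability (log5): {log5(underdog, favorite):.3f}")
```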

Friday, September 21, 2007

Can steroids increase HRs by 50%? (Update)

This is a follow-up to my previous post on the Tobin steroids paper. Thanks to Joe P. and John Matthew for letting me know that R. G. Tobin's steroids paper is now available online, at Alan Nathan's site, here.

Tobin starts by quoting a study that found that weightlifters given steroids showed a 9.3% increase in muscle mass (compared to 2.7% for non-users following the same training regimen). He therefore assumes a steroid-induced 10% increase in muscle mass. This corresponds to a 10% increase in *cross-sectional* muscle mass (I presume this is because muscles don't grow in length, just width). It is "well established" that a 10% increase in cross-sectional mass leads to a 10% increase in the force the muscles can exert. A 10% increase in force means a 10% increase in energy. And a 10% increase in energy leads to a 5% increase in bat speed (Tobin doesn't say, but I assume this is because energy is proportional to the square of velocity).
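The last step of that chain is easy to check. The sketch below simply follows the proportionalities as summarized above; the square-root step reflects the guess that energy is proportional to the square of velocity, which is an inference rather than something the paper is quoted as saying.

```python
import math

muscle_gain = 0.10                 # assumed steroid-induced gain in muscle mass
force_gain = muscle_gain           # force scales with cross-sectional mass
energy_gain = force_gain           # energy delivered scales with force

# Kinetic energy is proportional to speed squared, so speed scales with the
# square root of energy (the inference described in the paragraph above).
speed_gain = math.sqrt(1 + energy_gain) - 1
print(f"Bat speed gain: {speed_gain:.1%}")
```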

Then, after making some assumptions about the physics of the collision and the ball's travel, Tobin calculates that after a 5% increase in bat speed, the percentage of home runs per ball in play would increase from 10% to 16.6%, a 66% increase.

All this seems perfectly plausible to me, except that I'm not sure it matches the empirical home run data. Tobin shows a historical chart of home runs (as a percentage of balls in play) by top sluggers over the years. There is a significant increase starting in 1995. But Tobin argues that there is a significant *decrease* starting in 2003, the year MLB steroid testing was introduced. And, yes, there is a drop between 2002 and 2003, but an increase in the following years, so that 2005 is the fifth highest ever (and one of the top four is 1961!). So it would seem there's something happening other than steroids – and if steroids have indeed increased users' HR rates by 50%, we should have expected to see a much larger drop.

My feeling is that there must be other reasons than steroids -- or at least *additional* reasons -- for the recent power increase. Tobin argues against that:

"Such dramatic changes in performance over a short period of time are rare in well-established sports."

I'm not so sure that's true. NHL offense was at its highest level ever in the early 1980s, but close to its lowest ever only 15 years later. Fifty-goal scorers were rare until about the mid-1960s, when suddenly they became commonplace. I don't know enough about other sports, but I'd bet that there were similar changes in football and basketball, too.

Tobin notes, correctly, that a small change in the distance a ball travels can lead to a large change in the number of home runs hit. But even if the change resulted from something other than steroids, players would notice the change, and hit more fly balls in order to take advantage. (Power-hitting players would also become more valuable, and therefore there would be more of them signed to contracts – but since Tobin concentrates mostly on the leagues' top sluggers, this doesn't affect his conclusions.) So if physics suggested a 50% increase in home runs, you'd expect empirical results to be even higher: maybe, say, 75% higher, which is the 50% from physics multiplied by another 17% increase from players trying to hit more fly balls than usual (1.50 times 1.17 is about 1.75).

I guess my bottom line is that I'm willing to accept that 10% more muscle mass means 50% more home runs. But I'm skeptical that we're actually seeing the effects of a 50% increase. If we're not, that means that players on steroids are gaining less than 10% muscle mass.

An alternative is that *some* players are gaining 50% more home runs, but not *all* of the top sluggers. But, in that case, where are those other top sluggers getting their power? It must be from something other than steroids.

Finally, Tobin gives a little bit of attention to pitching. Just as a 10% increase in muscle leads to 5% more bat speed, it would also lead to 5% more velocity on the pitch. That correlates to an ERA improvement of 0.5 runs per game. As Tobin points out, that's not much compared to the effect on home runs. But it's still huge from a baseball standpoint. You'd expect extra velocity to lead to an increase in strikeouts (given that the range of ability in terms of DIPS is small). And that's what we've seen in recent years. But, again, couldn't the change have also been caused by other factors?

So this study points out that steroids can increase performance substantially. And, recently, we have indeed seen substantial performance increases. But does that mean that steroids caused them? The empirical record fails to convince me.

Can steroids increase HRs by 50%?

This press release from Tufts University promotes a forthcoming paper from a physicist that claims that steroids can increase the frequency of home runs by 50 percent.

"A change of only a few percent in the average speed of the batted ball, which can reasonably be expected from steroid use, is enough to increase home run production by at least 50 percent," [Roger Tobin] says.

My impression is that there are two parts to the paper: first, figuring out how much bat speed steroids can add; and, second, figuring out how those extra miles per hour can increase home run levels.

But 50% sounds like an overestimate. If the number were that high, wouldn't we have seen a very substantial drop in home run rates (if only among certain players) once MLB started steroid tests? Or maybe we have, and I didn't notice.

And maybe there are more qualifications in the actual paper than in the press release. We'll have to wait for the paper, I guess.

JQAS study on the effects of MLB expansion and relocation

Quinn and Bursik start by running straightforward graphs of MLB trends from 1950 to 2004. These are as expected. Attendance rises, run scoring drops then rises again, fielding percentage improves, and so on. (The graph of park factor (bpf) is fairly flat, as you would expect; the authors seem unaware that bpf is relative to other teams in their era, and should stay roughly around 100 for all timeframes.) Competitive balance is the most interesting; it's highest in the early 1950s and lowest in the late 1950s. It then goes higher in the early 1960s, and declines irregularly until 2000, rising slightly after that.

The authors then run regressions to predict attendance changes based on these factors. They use time series analysis, which (I think) includes prior year attendance trends in the regression. This automatically corrects for long-term trends, if I remember the Time Series course I took so long ago.

Again most of the results are what you would have guessed: the higher the change in population in MLB cities between one year and the next, the bigger the corresponding jump in attendance. In the year following a strike, attendance falls (relative to the trend). And so on.

There are two results that surprised me. In expansion years, attendance drops among existing teams to a statistically significant degree – about 1000 per game, when adding two teams. (Looking at their graph (Figure 5), there was a big drop in 1962, but none of the other expansion years show large effects.) Also, one team relocating cities leads to a drop of about 600.

The authors hypothesize that fans are not as attracted to games featuring expansion teams, which is why attendance drops; or that the temporary changes in competitive balance reduce fan interest. Also, they suggest that perhaps relocations reduce natural rivalries (Dodgers and Giants, say), and attendance drops for that reason.

Another regression predicts competitive balance (within a given season) based on some of these variables. Not surprisingly, expansion reduces balance. But new stadiums decrease balance to a statistically significant extent, and I don't know why that would be.

They also try to predict runs per game, and there is no surprise there: the only significant variables are DH and an indicator variable for 1969. Fielding percentage is reduced by expansion, but also (at the 10% level) by bpf; the bpf finding is probably random noise. Also, if there is a jump in the number of teams that use the DH, more errors are committed. This is probably related to expansion in some way.

My overall feeling is that this study breaks no new ground, but does find some unexpected effects. I don't understand time series enough, or AR(1) or MA(1) models, to know if these effects are artifacts of the method, and I wish the authors had looked into some of them a bit deeper, or used different statistical techniques.

Just as for the NHL, the value of football teams is highly inflated given the amount of earnings. The mean team operating income in the NFL is $17.8 million, which is only a 1.85% return on the $950 million average market value. Put another way, the “enterprise value ratio” of the average team is over 50 (950 divided by 17.8). Typically, publicly traded businesses are around 10.

I argue that teams are inflated because they are toys for the rich. If that’s the case, we can figure out how much those toys are worth. If the team owner invested his $950 million in an investment earning, say, 8%, he would have made about $76 million. Instead, he made only $17.8 million. So rich old football fans are willing to spend about $58 million a year to own a team.

In hockey and baseball, though, the figures are much lower. NHL teams earn 2.3%, and so the cost of ownership is 5.7% of their $180MM value. That’s only $10 million. In baseball, the average team (I eyeballed the chart) seems like it’s worth about $400 million, and earns $16.5 million. That means the cost of ownership is $15.5 million a year.
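The "cost of ownership" comparison is just the alternative return minus operating income. Here is a minimal sketch of that calculation, using the team values and earnings quoted above and the post's 8% alternative return. The NHL income is derived here from the quoted 2.3% return rather than taken from a published figure.

```python
def ownership_cost(team_value, operating_income, alt_return=0.08):
    """Annual income given up by holding the team instead of an alternative investment."""
    return team_value * alt_return - operating_income


# Values in millions of dollars, as quoted in the post; 8% is the post's assumed return.
leagues = {
    "NFL": (950.0, 17.8),
    "NHL": (180.0, 180.0 * 0.023),   # NHL income derived from the quoted 2.3% return
    "MLB": (400.0, 16.5),
}
costs = {name: ownership_cost(value, income) for name, (value, income) in leagues.items()}
for name, cost in costs.items():
    print(f"{name}: about ${cost:.1f} million a year")
```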

My theory, that sports teams are like Picassos – owned for the pleasure of ownership – suggests that the cost of NFL and MLB teams should be closer than they are. You can own an MLB or NHL team for less than a third the annual cost of owning an NFL team. Why should that be so? Are NFL teams so much more fun to own? Is there really so much more demand that prices should be three times as high?

Another theory, from David Gassko (see comments), is that owners know they will eventually be able to sell teams at a hefty profit, and so don’t care so much about operating earnings. But that still doesn’t explain why NFL teams should be worth so much more, does it?

Monday, September 10, 2007

Consumer Reports is starting to bug me

Consumer Reports magazine has been frustrating me recently. They seem to know what they’re doing when it comes to testing products, but in anything involving numbers or analysis – consumer sabermetrics, say – they come up short.

Here are three examples from the October, 2007 issue, which arrived in my mailbox last week. These aren’t the only things in the issue that bother me, but they’re the three biggest.

----

Jack is a hypothetical retiree in a sidebar in CR's "Your Money" column. The question: should Jack start collecting Social Security at age 62, or should he wait until age 70?

If he starts at 62, he'll receive $19,320 a year (adjusted annually for inflation). Or, if he waits until age 70, he'll start receiving $35,240 per year. Which is better, assuming Jack will live to age 90?

According to the "CR Money Lab," the first option gives Jack total payments of $542,570 (in 2007 dollars). That's 28 years times $19,320. The second option gives him 20 years times $35,240, which works out to $709,745. (These numbers aren't exactly correct; I assume there are rounding errors somewhere.)

28 years times $19,320 = $542,570
20 years times $35,240 = $709,745

Therefore, CR says, the second option is better.

Has CR never heard of the time value of money? A dollar today is worth more than a dollar eight years in the future, because of interest. CR’s calculation treats them as equal, in violation of basic principles of Finance 101. As Steven E. Landsburg writes (perhaps a little too harshly): "when college sophomores treat a dollar paid 20 years from now as the equivalent of a dollar paid today, we usually advise them that they have no talent for economics."

Assuming a real discount rate of 3%, the first option gives Jack $381,842 (in 2007 dollars, adjusted for inflation and time). The second option is worth $314,033. It’s actually the first option that’s better!

The higher the discount rate, the better the first option looks. If you use 4%, it wins by $341K to $237K. To get the two options equal, you have to lower the discount rate to about 1.8%.

Which option to choose depends on your personal discount rate, which is another way of saying it's your personal situation and preference. As CR does admit, your economic circumstances are a factor in which option is better for you.

Here’s one last way of looking at it. Suppose you don’t need the money until age 70, but decide on the first option anyway. From age 62 to 69, you invest all your benefits at a real rate of 4%. At age 70, those eight years of benefits will have grown to $185,139.

Now, even after 70, you’ll still be receiving only $19,320 a year, as opposed to the $35,240 you would have got had you waited. Your shortfall is $15,920 per year. But the lump sum of $185,139 should cover the shortfall for quite a few years, especially if invested well. You could probably even buy an annuity with that $185K that would top up your annual benefit, bringing it close to what you would have got by waiting.

Indeed, this kind of actuarial logic is probably how the government sets the benefits in the first place: so no matter which of the two options you take, the expected total payout, adjusted for time value, is roughly the same. CR's oversimplified calculation is simply not correct.
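The lump-sum figure can be checked with a short sketch. It assumes one benefit payment at the start of each year from age 62 through 69, each invested at a 4% real return. That timing convention is an assumption, chosen because it lands within a dollar of the figure quoted above.

```python
def future_value_of_benefits(annual_benefit, years, real_rate):
    """Value at the end of the period of benefits invested as they arrive.

    Assumes one payment at the start of each year (an annuity due).
    """
    total = 0.0
    for _ in range(years):
        total = (total + annual_benefit) * (1 + real_rate)
    return total


lump_sum = future_value_of_benefits(annual_benefit=19_320, years=8, real_rate=0.04)
shortfall = 35_240 - 19_320
print(f"Value at age 70: ${lump_sum:,.0f}")
print(f"Years of shortfall covered with no further growth: {lump_sum / shortfall:.1f}")
```

The second printed line is deliberately conservative: it ignores any return earned on the lump sum after age 70.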

----

In the October issue’s letters section, a doctor writes in to explain that when drug companies send him free samples, he uses them "to let patients try the medication for free to see if it is effective and tolerable."

Sounds like a win/win, right? Not to CR. An unnamed editor replies,

"Free samples are not free. They are part of the drug company’s advertising budget and contribute to the high cost of drugs. The free sample is a tool to tune the patient in to brand-name recognition, so that when it runs out they will stick with the same brand, despite the expense. There might be less-expensive drugs that are just as effective."

This is wrong.

First, the free samples might actually lower the cost of the drug. Most of the drug companies’ costs are in R&D; it can take many millions of dollars in research to come up with one marketable drug. But once the drug is approved, the cost of actually making the pills is minimal. So the more patients taking the drug, the lower the drug company can price it and still recoup their costs. If the free samples prove to patients that the drug will help them, the drug sells more, and prices can come down.

Suppose that without free samples, the demand curve is such that 1,000 people will use the drug at $100, but 1,500 will use the drug at $60. The company would make more money pricing the drug at $100. Suppose that after giving out samples, 2,000 will use the drug at $100, but 5,000 will use the drug at $60. Now, the company will drop the price to $60, so it can earn $300,000 instead of $200,000.
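The example works out as follows in a small sketch. The demand figures are the hypothetical ones from the paragraph above, and production cost is ignored, in line with the premise that making the pills costs almost nothing.

```python
def best_price(demand):
    """Return the (price, revenue) pair with the highest revenue.

    demand maps each candidate price to the number of patients buying at it.
    Production cost is ignored, matching the post's premise that it is minimal.
    """
    price = max(demand, key=lambda p: p * demand[p])
    return price, price * demand[price]


without_samples = {100: 1_000, 60: 1_500}
with_samples = {100: 2_000, 60: 5_000}

print(best_price(without_samples))
print(best_price(with_samples))
```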

CR should look at it this way: if I were the only reader of their magazine, could they afford to sell it for $3 a copy?

Second, we patients aren’t as dumb as CR thinks we are. If we like our free sample of drug X, we might stick to it even though we know that there might be cheaper drugs that work too. But, in most cases, it’s just not worth the risk. If drug X is $100 a month, and drug Y is $80 a month, it’s perfectly reasonable to pay the extra $20 for a drug you know will work, instead of taking a chance on an untried alternative. (This arguably doesn't apply to generics, which are actually the same compound as the brand name. But CR isn’t talking about generics.)

Medicine is not a game, where the objective is to experiment with treatments to make the drug companies’ profits as low as possible. Switching to a new drug, if the old one works just fine, can be rather unpleasant. If a $1 pill keeps my diarrhea under control, I ain’t switching to a different compound just because it only costs 90 cents. At least not before a big date, or a job interview.

And keep in mind that these are prescription drugs we’re talking about, drugs the patient can’t get without a doctor’s approval. If there are cheaper alternatives, I expect my doctor to recommend them. If my insistence on the same drug doesn’t make medical sense, I expect my doctor to try to talk me out of it. But if I’m happy to get a free sample, and the doctor is happy to offer it to me, and my doctor agrees with my decision to stay on it after the sample runs out - well, who is Consumer Reports to overrule my relationship with my doctor? It’s my money and my suffering, and not CR’s.

Indeed, can you imagine what CR would say if HMOs started forcing patients to give up their preferred medication for a different, but cheaper, one?

Strangely enough, CR’s argument doesn’t seem to apply to their own products. In this very same issue, on page 38, they offer me a "risk-free sample issue" of "CR on Health." If I like it, I have to pay for more.

What makes CR’s offer so much more reasonable than a drug company’s? Isn't it true that free issues are not free? Once CR stops with the expense of giving away free issues, won't the price of a subscription drop? And how come they're not worried that "when my free sample issue runs out I will stick with the same magazine?" After all, there are many less-expensive alternatives than $39 a year.

----

In an article entitled “Make your car last 200,000 miles” (subscription required), CR compares the costs of keeping a new car 15 years, as opposed to replacing your new car every five years. Here are their numbers. The first column is one car over 15 years; the second column is three cars over 15 years.

What CR is saying is that, by keeping your Civic for 15 years instead of trading in every five years, you can save $20,500. Plus, if you invest that $20,500, you can earn an extra $10,300 from the interest (at 5% per annum). So the total savings are $30,800.

But there are a couple of things here that don’t make sense. First of all, CR claims that if you pay $19,000 for a Civic, and keep it for 15 years, it will only depreciate $14,900 – which means it’s still worth $4,100. Could that be right? A lot of Civics will rust out over 15 years, and be worthless. Even if similar cars for sale right now are worth $4,100 (or about $2,600 before 15 years’ inflation), there’s a selection bias – most ‘92 Civics are long dead, especially around here with our salty winter roads. (I can tell you for sure that my 8-year-old Sunfire is going to be worth a lot less than 21% of its original price by the time it hits 15.)

Second, where do the investment returns come from? CR is calculating $10,300 in interest on $20,500 principal, which is a 50% return. At 5%, the average dollar of the $20,500 savings would have to compound for 8 years.

But that’s not the case. Over the first five years, both options are absolutely identical, so there’s no savings then. All the savings accrue over the next ten years. If you assume the money is saved equally over each month of those ten years, the average compounding time is five years. For repairs, the savings might slightly skew to the middle five years, since breakdowns are probably more frequent in years 11-15. On the other hand, insurance is lower in years 11-15. Also, the absolute numbers are higher in years 11-15 because of inflation. All these seem to roughly balance out, so let’s assume that our original 5-year number is correct.

Five years compounding brings the interest only to about $5,700, far less than the $10,300 claimed.
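Both numbers can be reproduced with a quick sketch. Like the estimate above, it treats the whole $20,500 as a single lump sum compounding annually at 5% for the average holding time, which is an approximation rather than a month-by-month calculation.

```python
import math


def compound_interest(principal, rate, years):
    """Interest earned on a lump sum compounded annually."""
    return principal * ((1 + rate) ** years - 1)


def years_needed(principal, rate, interest):
    """Compounding time required for a lump sum to earn the given interest."""
    return math.log(1 + interest / principal) / math.log(1 + rate)


SAVINGS, RATE = 20_500, 0.05
print(f"Five years of compounding: ${compound_interest(SAVINGS, RATE, 5):,.0f}")
print(f"Years implied by $10,300:  {years_needed(SAVINGS, RATE, 10_300):.1f}")
```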

(Update: this method is an approximation: a better method gives $5,921 instead of $5,700. See comment 5 to this post.)

I could be wrong about all this, of course, but if I’m right, CR is making overstated financial projections that no used-car dealer would be allowed to get away with.

-----

So what’s the deal with CR? They seem to have experts who understand how to deal with testing and evaluating real products, but when it comes to finance and economics, where you have to use logic instead of instruments ... well, if there were a Consumer Reports for Consumer Reports, I think they’d rate a “Not Acceptable.”
-----

UPDATE, 9/12

Here's one more.

In their “Claim Check” feature this month, CR investigates a claim by Tide that you can save “up to $65 a year” in energy costs by using its Tide Coldwater detergent. The savings accrue because the product lets you wash in cold water instead of warm.

“But hold on. With the qualifier 'up to [$65],' the claim is valid even if you save a penny a year.”

That’s literally true, but any reasonable consumer knows what “up to” means. It means "as much as, under the right conditions, but yours will probably be less." Here, the conditions are very reasonable: indeed, if you live in a city with higher-than-average electric rates, you’ll save *more* than $65.

Besides, if the claim was way off, and P&G defended it on the "penny a year" grounds, CR would be all over them, accusing them of misleading the public.

So I would have thought this to be a perfectly legitimate and ethical use of “up to.” Sniping at Tide here is completely inappropriate.

Speaking of energy savings: three pages later in the magazine is an ad for a new book, “Consumer Reports Complete Guide to Reducing Energy Costs.” Why should you buy this book? Because CR says you can

"SAVE UP TO $1000 A YEAR."

Seriously. Three pages after gratuitously bitching about "up to," they use it in an ad for their own product!

Sunday, September 09, 2007

An NFL over/under system that beat the Vegas line

Here’s the system. If a team is predicted to have 9.5 wins or more (that is, go 10-6 or better), take the under. If they’re predicted to go 6-10 or lower, take the over.

If you did that over the last two seasons, you would have gone 24-10. That is, needless to say, very impressive.

Now, it’s possible, and even easy, to mine the data and come up with formulas that pick winners in retrospect. So it could be coincidence that this method worked so well in 2005-06. Indeed, there's some selection bias, because if it hadn’t worked well, Brian wouldn’t have posted it.
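To put the 24-10 record in context, here is a small sketch of the rule, plus the chance of doing at least that well if every bet were a coin flip. Two things are my assumptions: the low cutoff of 6.5 wins (my reading of "6-10 or lower," by symmetry with the 9.5 cutoff), and treating the 34 bets as independent 50/50 propositions. The calculation also does nothing to correct for the data mining and selection bias just mentioned.

```python
from math import comb


def pick(predicted_wins, high=9.5, low=6.5):
    """Return 'under', 'over', or None for a team's preseason win total.

    `high` comes from the text; `low` is an assumed reading of
    "6-10 or lower".
    """
    if predicted_wins >= high:
        return "under"
    if predicted_wins <= low:
        return "over"
    return None  # no bet on teams predicted to be near average


def prob_at_least(wins, bets, p=0.5):
    """Binomial probability of winning at least `wins` of `bets` bets."""
    return sum(
        comb(bets, k) * p**k * (1 - p) ** (bets - k)
        for k in range(wins, bets + 1)
    )


print(pick(10.5), pick(8.0), pick(5.5))
print(f"chance of 24+ wins in 34 coin flips: {prob_at_least(24, 34):.4f}")
```

A small probability here only says the record is unlikely for one pre-specified system; it says nothing about how many other systems were tried and discarded.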

But, I think this one might actually have some merit, because it takes advantage of two biases that bettors have. The two principles that fans have trouble with, according to Brian:

1. The NFL is impossible to predict before the season starts. And,
2. Regression to the mean rules the day.

Basically, a lot of what happens over a short 16-game season is luck. And so, consider a team that went 10-6 last year. It could be that this team is an average 8-8 team that got lucky, or a really great 12-4 team that got unlucky. There are so many more average teams than great teams, though, that the odds greatly favor the “lucky team” hypothesis. But fans have trouble grasping that; they see the actual 10-6 record, and think the team is likely to repeat that performance.
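The "lucky team" argument is just Bayes' rule, and a toy version makes it concrete. This sketch pretends there are only two kinds of team, true 8-8 and true 12-4, and that each game is an independent coin flip with the team's true winning percentage. The 80/20 split between average and great teams is a made-up prior for illustration, not a figure from Brian's post or from any data.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k wins in n independent games at win rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior_average(wins=10, games=16, p_avg=0.5, p_great=0.75, prior_avg=0.8):
    """P(team is truly average | observed record), two-hypothesis toy model.

    `prior_avg` is a hypothetical share of average teams in the league.
    """
    like_avg = binom_pmf(wins, games, p_avg)        # 8-8 team going 10-6
    like_great = binom_pmf(wins, games, p_great)    # 12-4 team going 10-6
    numerator = like_avg * prior_avg
    return numerator / (numerator + like_great * (1 - prior_avg))


print(f"P(10-6 | true 8-8 team):  {binom_pmf(10, 16, 0.5):.3f}")
print(f"P(10-6 | true 12-4 team): {binom_pmf(10, 16, 0.75):.3f}")
print(f"P(truly average | went 10-6): {posterior_average():.3f}")
```

In this toy setup the two likelihoods come out close to each other, so the answer is driven mostly by how many more average teams there are than great ones, which is the point of the paragraph above.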

The same is true, in reverse, for the teams with bad records.

So that's a decent explanation for why Brian's system works (if it does): bettors underestimate the amount of luck, and therefore think teams are less average than they actually are.

Still, it’s dangerous to suggest that a strong and liquid betting market is getting these things wrong. All I’m saying is that, compared to other betting systems based on historical results, at least this one has a very plausible explanation for why it might work.

And, intuitively, it just seems right. As of right now, TradeSports is saying that the New Orleans Saints (who are now 0-1) have a 42% chance of going 10-5 over the rest of the season. Given that they were only 10-6 last year, and 3-13 the year before, doesn’t that sound a little optimistic? There are probably personnel changes that are driving some of this enthusiasm (I don’t follow the NFL much, so Drew Brees is the only one I know), but I’d still be taking the under here.

As I said, I hesitate to suggest that sabermetricians are smart enough to beat the Vegas line. But still, if you pointed a gun to my head and forced me to choose one system to play, this would be it.

Tuesday, September 04, 2007

Track and field: wind-adjusted 100-metre records

According to the article, the IAAF won’t recognize a world record in this event if it was run with a tailwind of more than 2 meters per second (which is a gentle wind indeed, at less than 4.5 mph). Is that an overreaction? Just how much does the wind assist a runner in this event?

Duffy refers us to a couple of studies by J. R. Mureika (unfortunately, the links to those studies are no longer valid). He then uses Mureika’s method to correct every annual top-20 performance from 1998 to 2002.

From the charts provided, it seems like every 1 m/s of wind speed is worth about 0.05 seconds. Maurice Greene’s 1999 world record, 9.79 seconds, was run with the assistance of a 0.1 m/s wind. That means his “adjusted” time is about 9.80 seconds (probably slightly less, but apparently high enough to round to 9.80). That’s still good enough for the best adjusted time ever.
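As a rough illustration of that rule of thumb, here is a linear adjustment of 0.05 seconds per m/s of tailwind. This is only my reading of the charts, not Mureika's actual method, which I would not expect a straight line to reproduce exactly. The function and constant names are made up for the sketch.

```python
# Rule of thumb read off the charts: about 0.05 s per 1 m/s of wind.
# This linear form is an assumption, not Mureika's published model.
SECONDS_PER_MPS = 0.05


def wind_adjusted(raw_time, tailwind_mps):
    """Estimate the still-air time for a 100 m run.

    A positive `tailwind_mps` helped the runner, so the adjusted time is
    slower; a negative value (headwind) makes it faster.
    """
    return raw_time + SECONDS_PER_MPS * tailwind_mps


# Greene's 1999 record: 9.79 s with a 0.1 m/s tailwind.
print(f"{wind_adjusted(9.79, 0.1):.3f}")

# A 9.77 run with a 1.0 m/s tailwind.
print(f"{wind_adjusted(9.77, 1.0):.3f}")
```

The linear version should get close to the adjusted times in Duffy's tables for light winds, but it can differ from them by a hundredth or so at stronger winds.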

Going by the data provided, it seems the method does indeed have something going for it. Duffy lists every 100-metre race that clocked under 10.00 seconds, and does all the adjustments. And it turns out that the best times did in fact occur in favorable wind conditions.

If wind had nothing to do with speed, you’d expect the best “adjusted” times to be better than the unadjusted times. That’s because if the adjustment is random, then, just by luck, you’d expect at least some of the best times to have adjustments that make them even better. But adjusting all the times actually makes the best performances look worse – and, in fact, it reduces the number of “extremely good” times.

With the adjustments, there were only 19 times at 9.89 or better. However, if you look only at the raw, official times, there were 32 such races. This difference would be quite unlikely if the wind adjustment wasn’t measuring something real. (Caveat: since Duffy lists only the top-20 times for each year, it’s possible that there’s some selection bias affecting the “adjusted” count – a mediocre time run into a strong headwind might not have been good enough on a “raw” basis to make the list. But that’s fairly unlikely, since the list goes pretty far down, and contains no winds strong enough to cause that large an effect.)

Still on that list of the best 100 races, Duffy reports that 75% of them were run with a tailwind. Again, that strongly suggests the wind is a major factor.

Up until 2002, there were three instances of races won in less than 9.80 seconds: Ben Johnson (9.79 in 1988), Maurice Greene (9.79 in 1999), and Tim Montgomery (9.78 in 2002). All those adjust to worse figures (9.85, 9.80, and 9.89 respectively) when you consider wind. Since 2002, there have been four times of 9.77. All of those were run with tailwinds of at least 1.0 m/s, which would result in an adjustment to at least 9.82 seconds.

So it looks like the 9.80 mark is yet to be broken, on a wind-adjusted basis.

New edition of "The Wages of Wins" out soon

"The Wages of Wins" co-author David Berri (along with Martin Schmidt and Stacey Brook) reports that an updated paperback edition of the book will be out in a couple of weeks. Berri links to his list of updates here. Book available at amazon.com here.