Sabermetric Research

Phil Birnbaum

Tuesday, September 27, 2011

How good are sports pundits' predictions?

According to this Freakonomics post, the experts who make NFL predictions aren't very good at it. Freakonomist Hayes Davenport checked the past three years' worth of predictions from USA Today, Sports Illustrated, and ESPN. He found that the prognosticators correctly picked only 36% of the NFL division winners.

That doesn't seem that great. Picking randomly would get you 25%. And, as the post points out,

"if the pickers were allowed to rule out one team from every division and then choose at random, they’d pick winners 33% of the time. So if you consider that most NFL divisions include at least one team with no hope of finishing first (this year’s Bengals, Chiefs, Dolphins, Panthers, Broncos, Vikings, and Manning-less Colts, for example), the pickers only need a minimum of NFL knowledge before essentially guessing in the dark."

Well, it sounds right, but you have to look deeper.

Winning depends on two things: talent, and luck. Since luck is, by definition, random, when you predict a winner, the only thing you can do is pick the team with the most talent. And, despite the 36% figure, there's no evidence that the pundits misjudged the talent. Because, just by luck, sometimes the best team won't win, and that's unpredictable.

Two days ago, the Buffalo Bills upset the New England Patriots, despite being 7:2 underdogs (I'm estimating 7:2 based on the 9-point spread). What percentage of experts would you expect to have got that right? Your answer should be zero percent. Nobody with any knowledge of football should have thought the Bills had a better than 50 percent chance of winning. On the off-chance that you DO find someone who picked the Bills to win, he's probably a crappy predictor -- maybe he just flips a coin all the time.
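For anyone who wants to check the conversion from odds to probability, here is a minimal sketch in Python. The function name is mine, and the 7:2 figure is only the rough estimate from the point spread mentioned above, not an official line.

```python
from fractions import Fraction

def underdog_win_probability(against: int, in_favor: int) -> Fraction:
    """Convert odds of `against`:`in_favor` into the underdog's chance of winning."""
    return Fraction(in_favor, against + in_favor)

# Assumed odds: 7:2 against the Bills (an estimate from the 9-point spread).
bills = underdog_win_probability(7, 2)
patriots = 1 - bills

print(f"Bills: {float(bills):.1%}, Patriots: {float(patriots):.1%}")
```

At 7:2, the Bills come out at 2 chances in 9, a little over 22 percent, which is nowhere near the 50 percent you'd need before picking them made sense.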

In the case where the underdogs wind up winning a game, or finishing first in their division, the truth is the opposite of what Freakonomics implies. In that case, the higher the percentage of correct predictions, the WORSE the pundits.

So what does that 36% figure actually tell you? By itself, absolutely nothing. You have no idea, looking at that bare number, how good the pundits are. It depends. If it was *all* bad teams that won, a number as high as 36% means the experts are wrong a lot, but 36% of their bad picks happened to turn out OK. If it was all good teams that won, a number as low as 36% means the experts are wrong a lot -- they must have picked 64% bad teams. And, if it was exactly 36% of the best teams that won, but those aren't the cases where the experts were right, then, again, the experts are wrong a lot.

But, if it was exactly 36% of the best teams that won, and those are exactly the cases where the experts were right ... then the experts are perfect predictors.

So you can't just look at a number. 36% may be bad, but it might be awesome. It depends what actually happened.

-------

However: while this logic applies to picking outright winners of games or divisions, it doesn't apply to picking against the spread. Why not? Because, against the spread, the presumption is that the odds are close to 50/50.

In the Patriots/Bills game, the odds were roughly 78/22. Some experts might have pegged the Bills as having a 25% chance of winning, while some may have estimated only 20%. Still, both pundits would obviously have bet on the Patriots. The fact that New England ended up losing isn't really relevant.

But against the +9 spread, you might have a reasonable difference of opinion. One expert might figure the true spread should be +8.5, and another might figure +9.5. So the first guy takes the Bills, and the second takes the Patriots.

On bets that are approximately 50/50, reasonable experts can disagree. On bets that are 78/22, they cannot. So, when it's 50/50, a higher percentage could, in fact, mean better predictions.

Still, there's lots of luck there too. If a pundit predicts all 256 games in a season, the (binomial) standard deviation of his success rate will be a little over 3 percentage points. So one predictor in 20 will be over 56%, or under 44%, just by luck.
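Here is a rough sketch of that calculation in Python. It assumes every pick against the spread is an independent 50/50 proposition, and it uses two standard deviations as the "one in 20" cutoff (a normal approximation); the function name is just for illustration.

```python
import math

def success_rate_sd(n_games: int, p: float = 0.5) -> float:
    """Binomial standard deviation of the proportion of correct picks."""
    return math.sqrt(p * (1 - p) / n_games)

# Assumption: 256 independent picks, each a true coin flip against the spread.
sd = success_rate_sd(256)

# Roughly 95% of pure guessers land within two SDs of 50%.
low = 0.5 - 2 * sd
high = 0.5 + 2 * sd

print(f"SD of success rate: {sd:.3%}")
print(f"Two-SD band for a pure guesser: {low:.2%} to {high:.2%}")
```

With 256 games the SD works out to 3.125 percentage points, so the two-SD band runs from 43.75% to 56.25%.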

That means it's still hard to figure out who's a "better" expert and who's not.

--------

The post goes on to criticize the pickers for risk aversion. Why? Because, it seems, the experts tended to pick the same teams that won last year.

Um ... why is that risk aversion? It stands to reason that the teams that won before are more likely to still be pretty good, so they're probably reasonable picks. But, the author says,

"Over the last fifteen seasons, the NFL has averaged exactly six new teams in the playoffs every year, meaning that half of the playoff picture is completely different from the year before. ... Given that information, a savvy picker relying on statistical precedent would choose six new teams when predicting the playoffs."

That doesn't follow at all! Just because I know six favorites will lose doesn't mean I should pick six underdogs! That would be very, very silly.

It's like predicting whether John Doe will win the lottery this week. The odds say no, and that's the way I should bet. And it's the same for Jane Smith, or Bob Jones. If there are a million people in the lottery, I should pick them all to lose. I'll be right 999,999 times, and wrong once.

But, according to Freakonomics, I should arbitrarily pick one person to win! But that's silly ... if I do that, I'll almost certainly be right only 999,998 times!
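To put numbers on the two strategies, here is a small sketch in Python. It assumes exactly one winner among a million equally likely ticket holders; the function names are made up for the example.

```python
N = 1_000_000  # people in the lottery, exactly one of whom wins (assumption)

def correct_if_all_picked_to_lose(n: int) -> int:
    """Pick everyone to lose: wrong only about the one actual winner."""
    return n - 1

def expected_correct_if_one_picked_to_win(n: int) -> float:
    """Pick one person at random to win and everyone else to lose."""
    p_hit = 1 / n                 # chance the chosen person really wins
    all_right = n                 # if so, every prediction is correct
    two_wrong = n - 2             # otherwise wrong on the pick AND on the real winner
    return p_hit * all_right + (1 - p_hit) * two_wrong

print(correct_if_all_picked_to_lose(N))
print(expected_correct_if_one_picked_to_win(N))
```

The "pick a winner" strategy averages only a hair above 999,998 correct predictions, about one fewer than simply picking everyone to lose.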

It's not exactly the same, but this logic reminds me of the Jeff Bagwell prediction controversy. In the fall of 1990, Bill James produced forecasts of players' batting lines for 1991, and it turned out that Bagwell wound up with the highest prediction for batting average. It was said that Bill James predicted Jeff Bagwell to win the batting title.

But, obviously, he did not. At best, and with certain assumptions, you might be able to say that James had Bagwell with the *best chance* of winning the batting title. But that's different from predicting outright that he'd win it.

Back to the lottery example ... if I notice that John Doe bought two lottery tickets, but the other 999,999 people only bought one, I would be correct in saying that Doe has the best chance to win. That doesn't mean I'm *predicting* him to win. He still only has a 1 in 500,000 chance.

--------

Finally, in their introduction to their post, Levitt and Dubner say,

" ... humans love to predict the future, but are generally terrible at it."

I disagree, especially in sports. Yes, very few people can beat the point spread year after year. But that doesn't show that the experts don't know what they're doing. It shows that they DO! Because, after all, it's humans that set the Vegas line, the one that's so hard to beat!

I'd argue 180 degrees the opposite. In sports, humans, working together, have become SO GOOD at predicting the future, that nobody can add enough ingenuity to regularly beat the consensus prediction!

Thursday, September 22, 2011

The Bayesian Cy Young

At Fangraphs, Dave Cameron and Eric Seidman have a nice discussion (hat tip: Tango) on who's the better Cy Young candidate: Clayton Kershaw, or Roy Halladay?

Part of the discussion hinges on BABIP: batting average on balls in play. As Voros McCracken discovered years ago, pitchers generally don't differ much in what happens when a non-home-run ball is hit off them. Most of the overall BABIP differences between pitchers, then, are due partly to the fielders behind them, but mostly to luck.

So far in 2011, Clayton Kershaw has a BABIP of .272, which Eric describes as "absurdly low." Still, Eric thinks it might actually be skill rather than luck, since .272 isn't that much different from what Kershaw allowed in previous years. Dave argues that Kershaw's three seasons is still a fairly small sample size, and points out that most of his BABIP advantage comes from his record at home (he's about average on the road).

Anyway, my point isn't to weigh in on which one is right -- they do a fine job hashing things out in their discussion. What I want to talk about is something they both seem to agree on: that it's important whether the BABIP is luck or skill. If it's luck, that reduces Kershaw's Cy Young credentials. If it's skill, he's a better candidate.

Seems reasonable, and I don't necessarily disagree. But let's see where that logic leads.

Because, there are other kinds of luck, or factors that pitchers can't control. For instance, there's park (which is usually already adjusted for in WAR, the statistic Eric and Dave cite most in this debate).

There's also quality of opposition batting. It's probably not too hard, if you have good data, to figure out how much either of the pitchers gained by being able to pitch to inferior hitters. You could also check if one of them had the platoon advantage more often. And, if one of them pitched more at home than the other one did.

We'd probably all agree, right, that you'd want to adjust for those kinds of things if we had the information? To be clear, I'm not criticizing Dave or Eric for not spending hours figuring this stuff out. I'm just saying that if you have the data, it's relevant in comparing the pitchers.

There are other things too, that eventually we'll be able to figure out, that we can't right now because (as far as I know) the research hasn't been done. Suppose Kershaw throws a pitch at a certain speed, with a certain break, on a certain count. And, someday, we'll know that kind of pitch is swung on and missed 30% of the time, called a ball 5% of the time, called a strike 10% of the time, fouled off 10% of the time, and hit in play 45% of the time with an OPS of .850. Maybe, overall, that pitch is worth (say) +0.05 runs (in favor of the pitcher).

Once we have that kind of information, we can check for "batter swing luck". If it turns out that batters just randomly happened to go +0.03 on that pitch from Kershaw this season, instead of +0.05, we should credit him the extra 0.02, right? He delivered a certain performance, and the batters just happened to get a bit lucky on it, as if his BABIP was too high. (This measure would probably substitute for BABIP: it includes balls in play, but also home runs, swings-and-misses, and walk potential.)

So we'd adjust Kershaw and Halladay for how lucky the batters were on those swings.

That's not unrealistic, and it'll probably eventually happen, to some degree of accuracy. Here's one that probably won't, at least not for a few decades, but it works as a thought experiment.

Imagine we hook a probe to every batter's brain, so on every pitch we can tell if he's guessing fastball or curve, and if he's guessing inside or outside. After a couple of years of analyzing this data, we figure that when he guesses right, it's worth +0.1 runs (for the batter), when he guesses half-right, it's worth 0, and when he guesses wrong, it's -0.1.

That again, is something out of the control of the pitcher (especially if both batter and pitcher are randomizing using game theory). So you'd want to control for it, right? If Halladay is having a good year just because batters were unlucky enough to guess right only 23% of the time instead of 25%, you have to adjust, just like you'd adjust for a lucky BABIP.

This will change the definition of "batter swing luck," but not replace it. First, the batter may have been lucky enough to guess right, which is worth something. Then, he might have been lucky enough to get better than expected wood on the ball even controlling for the fact that he guessed right.

You'd want to adjust for all of these. Right now, as I understand WAR, we're adjusting for park and BABIP.

What about the others? Well, we can't really adjust for those. We *want* to, but we can't.

So, we make do with just park and BABIP. Still, no matter how many decimal places we go to with the debate on Kershaw/Halladay, we're still only going to have our best guess.

At least we can argue that if all the other things are random, we should still be unbiased. Right?

Well, not really. From a Bayesian standpoint, we have a pretty good idea who had more luck. It's much more likely to be Kershaw.

Why? Because Halladay's performance is much more consistent with his career than Kershaw's. Kershaw's a good pitcher, but wasn't expected to be *that* good. Halladay, on the other hand, is having a typical Halladay season. Well, a bit better than typical, but not much.

I'd be willing to bet a lot of money that if you found 50 pitchers who had a better-than-career season, by at least (say) 1.5 WAR, you would find that those 50 pitchers had above-average BABIP luck. It stands to reason. I won't make a full statistical argument, but here's a quick oversimplification of one.

A pitcher can have his talent go up or down from year to year. He can have his luck go up or down from year to year. That's four combinations. Only three of them are possibly consistent with a big improvement in WAR: talent up/luck up; talent up/luck down; talent down/luck up. Two of those have his luck going up. So, two times out of three, the pitcher was lucky.
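The counting in that oversimplified argument can be written out in a few lines of Python. The sketch treats the four talent/luck combinations as equally likely, which is exactly the simplification being made above and not a claim about real pitchers.

```python
from itertools import product

# Every (talent, luck) combination, assumed equally likely for this toy argument.
combos = list(product(("up", "down"), repeat=2))

# A big jump in WAR rules out the case where both talent and luck went down.
consistent = [c for c in combos if c != ("down", "down")]

# Of the remaining cases, count the ones where luck went up.
lucky = [c for c in consistent if c[1] == "up"]

share_lucky = len(lucky) / len(consistent)
print(f"{len(lucky)} of {len(consistent)} consistent cases involve good luck")
```

A fuller version would weight each combination by how large the talent and luck changes tend to be, but the direction of the conclusion is the same.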

The argument applies to *all* sources of luck. Even after taking BABIP into account, if a pitcher's adjusted performance is still above his career average, he's still more likely to have had good luck than bad, in other ways (batter swings, say).

I don't have an easy way to quantify this, but still I'd give you better-than-even odds that, stripping out all the above, Halladay is performing better than Kershaw -- even after adjusting for park and BABIP.

If you have two players with similar, outstanding performances, the player with the better expectation of talent is probably the one who's actually having the better year. To believe that Kershaw was really likely to have had a better year than Halladay, you really need him to have put up *much* better numbers. Either that, or you need a way to actually work out all the luck, and prove that the residual still favors Kershaw.

I should emphasize that I am NOT talking about talent here. I think most people would agree that Halladay is still more talented than Kershaw, but would nonetheless argue Kershaw might still be having the better season.

But, what I'm saying is, no, I bet Kershaw is NOT having a better season, even if his numbers look better. I'm saying that it's likely that Kershaw *is actually not pitching better*. If we had the data, it's more likely than not that we'd see that batters are just having bad luck -- not only are they (perhaps) hitting the ball directly to fielders, as BABIP suggests, but they're probably swinging and missing at hittable pitches.

---------

Another way to look at it: if two pitchers have mostly the same results, but one has better stuff, what does that mean? It means that the pitcher with the better stuff must have been unluckier than the pitcher with the worse stuff. In other words, the batters facing the better stuff must have been luckier.

We don't know for sure, of course, that Halladay had better stuff than Kershaw. But history suggests that's more likely. And so, the odds are on the side of Kershaw having been luckier than Halladay. How much so?

I don't know. One mitigating factor is that Kershaw is young, so you'd expect more of his improvement to be real. But, still, a small improvement is more likely than a large improvement, so the odds are still on the side of positive luck over negative luck.

---------

Does that take some of the fun out of the Cy Young? I think it certainly does make it a little bit less entertaining, at least until we have better data. That's because, as long as we remain ignorant of a significant amount of luck, it requires a much bigger hurdle to award the honor to anyone other than Halladay.

This is a bit counterintuitive, but it's true. Suppose a good but not great pitcher -- Matt Cain, say -- has almost exactly the same stat line as Roy Halladay, including BABIP, but is actually better in some categories. Perhaps he has a couple of extra strikeouts, and a couple fewer walks.

From the usual arguments, there would be absolutely no debate that Cain's season is better, right? He's better than Halladay in some categories, and the same as Halladay in all the others.

But ... if you're trying to bet on which player actually pitched better after removing all the luck, you'd still have to go with Halladay.

-----

UPDATE: on his blog, Tango writes,

Aside to Phil: Marcel had Kershaw with a 3.07 ERA for 2011, and Halladay at 3.04. So, while you make great points in your article, you didn’t have the right examples! Sabathia and Verlander would have been better examples.

Sunday, September 18, 2011

Stock market integrity and the OSC's bizarre Catch-22

Warning: non-sports, non-numbers post. Has to do with securities regulation and put options and bureaucratic illogic. Still, should be comprehensible to all.

-------

The Ontario Securities Commission (OSC) is a government body that regulates capital markets (i.e., stocks, bonds, options, etc.). It declares, among others, a responsibility to "foster fair and efficient capital markets and confidence in the markets." Recently, it made a decision that seems so obviously unfair and wrong that it has the opposite effect -- I am now materially *less* confident of the integrity of the market than I was before. It's not just the precedent this decision sets, but my fear that the OSC just doesn't get it. What unfair decisions will they make next, and is my retirement portfolio in jeopardy?

Of course, I might be wrong in my logic. Please correct me if I am. If you're more up-to-date than I am in how securities regulations work, let me know and I'll post corrections.

I'm going to start with an analogy that illustrates the issue.

--------

I buy a house and a piece of land, for $400,000. I insure the house. There is a regulation on the books, quite reasonable, that it is not legal to sell land that is known to be contaminated, or to sell a house that is known to be uninhabitable. But the house and land are fine, and the sale goes through.

Later, an arsonist burns down half the house and contaminates the land. The state comes in and begins an investigation.

I contact the insurance company. They agree I'm covered for $400,000. They prepare to cut me a cheque.

But, before anything else can happen, the regulator steps in. "You can't do that," they say. "When you settle an insurance claim, it means the house and the land transfer to the insurer. But the house is uninhabitable, and the land is contaminated. So, you can't transfer the house. Therefore, the settlement is illegal."

In any case, the insurer doesn't have to pay. He walks away happy, even keeping my premium, and I'm stuck with the loss.

It's a kind of Catch-22.

---------

Not fair, right? And *obviously* unfair.

What's happened is that the regulator is blindly sticking to a regulation that's not always right. Sure, it might be a good idea to prohibit the sale of contaminated land *as a general rule*. But there are exceptions. This is an exception. In fact, it's an exception where it's exactly the opposite -- where it's absolutely WRONG to prohibit the sale.

It's like, "don't jump out of the fifth floor window, you'll die." Sure. But if the building is on fire, and the smoke is choking you, and the firemen are holding a net below and yelling at you to jump ... then the rule reverses. "Don't NOT jump out the fifth floor window, you'll die."

---------

So here's the real story, which follows the analogy quite closely. Some background first.

Sino-Forest is a Canadian forestry company that does all its business in China. Its stock went from $1 to $24 over the last ten years or so, as it grew and bought forests in China for harvesting. In June, a small company called "Muddy Waters," run by a man named Carson Block, put out a report alleging, with evidence, that Sino-Forest was a fraud -- it didn't really own the forests it claimed it did. The stock dropped immediately from $24ish, and fluctuated between $5 and $8 for the next two months.

The company claimed innocence and hired independent auditors, but no information was forthcoming and the company cited documentation delays. The stock dropped further. At one point, the OSC said it was investigating.

Finally, in August, the OSC claimed there was evidence of fraud. They did not give details, and speculation is that they got their information from the auditors, and from Muddy Waters. The OSC immediately prohibited further trading in the stock, and ordered the CEO to resign.

A few hours later, the OSC was told it didn't have the right to order any resignations. Belatedly realizing it had overstepped its authority, the OSC retracted that part of its order. Nonetheless, the CEO voluntarily stepped down a few days later.

In addition to shares of stock, there were also "put options" trading on Sino-Forest. A put option is a contract between two parties. One party pays the other some money -- say, $1 per share -- and, in return, receives the right (but not the obligation) to force the other party to buy his shares by a certain date, at a certain price. Say, $20 by August 19.

The idea is that you can use a put option as insurance. If you own 100 shares, with a value of $2,000, and you're scared the price will drop, you can buy 100 put options for $100. Then, even if the stock drops, you know you can still get $2,000 for them on August 19.

It's exactly like insurance on a house. You pay $100 for the insurance, and if anything bad happens to your house between now and August 19, the insurance company will take the house away and give you a $2,000 settlement.
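The insurance arithmetic can be sketched in Python as follows. This is a simplified illustration using the numbers above (100 shares, a $20 strike, a $1-per-share premium); it ignores commissions and the way real option contracts are bundled, and the function name and the $5 crash price are assumptions for the example.

```python
def proceeds_with_put(shares: int, strike: float, market_price: float) -> float:
    """Cash received for the shares at expiry when you also hold the puts.

    You exercise the put only when the strike beats the market price.
    """
    return shares * max(strike, market_price)

shares = 100
strike = 20.0      # price the other party must pay if you exercise
premium = 1.0      # cost of the put, per share
crash_price = 5.0  # hypothetical market price after bad news

insurance_cost = shares * premium
unprotected = shares * crash_price
protected = proceeds_with_put(shares, strike, crash_price)

print(f"Premium paid: ${insurance_cost:,.0f}")
print(f"Sale proceeds without the put: ${unprotected:,.0f}")
print(f"Sale proceeds with the put: ${protected:,.0f}")
```

If the put can't be exercised, the holder is out the $100 premium and is left with shares worth whatever the market says, which is the situation the rest of this post is about.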

So, at this point you can guess what happened next. The OSC prohibited the contracts from being exercised. The OSC said, "I don't care if you bought the insurance. Settlement means that you would have to sell the shares to someone else, and we've prohibited you from doing that."

And, of course, August 19 has come and gone. (There are contracts with other expiry dates too -- over different months -- but August 19 was one of them.) The contracts have expired and are now worthless. The OSC blindly followed the rule "it's bad for markets if shares of a fraudulent company are bought." That's not always true. In this case, it's WORSE for the market if the shares are NOT bought.

Normally, people don't want to buy a company when it might be a fraud. In this special case, people DO want to buy a company ONLY when it's a fraud.

This is so obviously wrong that anyone should understand that it's unfair. But, especially, the OSC, which is the regulator, and supposed to be an expert in markets, and how they work, and investor confidence ... how did they make a mistake like that?

Not only is it unfair, but ... if this precedent holds, the entire market for put options falls apart. How do they not get that?

Unless it's me that doesn't get it. Which is certainly possible. If you're a Bayesian, or even if you have normal common sense, you're probably asking yourself: who's more likely to be grossly wrong: Phil, the amateur investor, or the expert regulators at the OSC? If you're Bayesian, you should probably figure that it must be me who's wrong, especially when I tell you that I was unable to find anyone in the financial press complaining about any of this. To my knowledge, I'm the first and only one.

But ... I just can't see how this could be right.

---------

Well, this past week, they held a hearing to revisit the decision. I thought they'd say, "oops, sorry, we screwed up," and fix it. But confronted with all the arguments (as I presume they were), they STILL didn't get it. They only "fixed" part of it.

What they did was to say, if you already own the shares, then, OK, you can sell them to the other party for the $20 to complete the transaction and collect on your insurance. But if you happen to have the insurance contract, but you don't have the shares, because you were meaning to buy them later, then you're still SOL. You can't go out and buy the shares from someone else, so that you can collect on your investment. Instead, you have to let your options expire worthless.

Their logic appears to go something like this: "The put option is a contract to sell shares. So we'll make an exception and let you sell shares if you have a previous contract. But we won't let you BUY the shares to sell, because you only have a contract to sell, not to buy. Besides, if you bought a contract to sell, but didn't own any to sell, you're just speculating, so we don't feel much sympathy for you."

But, that's ridiculous. It's a common investment strategy to buy put options on stocks you don't own, if you expect them to drop. Sometimes it's straight speculation that there's fraud, but sometimes it's part of a more complex hedging strategy. Maybe you own a business in China, and you want to insure against a bad Chinese economy, and the easiest way is to buy puts on Sino-Forest. If China goes downhill, and Sino-Forest with it, you buy the worthless Sino-Forest shares, and sell them according to your contract, which gives you the insurance money you need.

(In any case, since when is speculation something that anyone should be trying to avoid? Speculation is a good thing, as economists will assure you. And, securities regulators, being experts in how capital markets work, know that. Speculators keep the market liquid and efficient, moving prices closer to their true value. I personally would be hesitant to invest without speculators. Right now, I can be pretty sure that I'm paying a fair price for any stock I buy -- if the price were too high, speculators would have stepped in before and sold short to push the price down. Without speculators, I'd be more likely to be getting ripped off.)

(But I digress. Oh, and while I'm digressing, a disclosure: I own shares of Sino-Forest, but have never had any Sino-Forest option positions.)

What the OSC has done with its fix is actually worse than what it did originally. It said, "we'll let you enforce your contract if we approve of your investment strategy, but we will screw you around if we don't." That's something the OSC has no business doing, favoring some parties but not others based on the capricious illogic of its bureaucrats. It's also the worst thing you can do for market confidence -- signalling to the world that the rules are unpredictable based on how the regulator feels about you.

For my part, I have bought put options before, on companies I thought were grossly overvalued. I'll be damned if I'm going to do that again, at least in Canada.

Friday, September 16, 2011

Bob McCown on "puck luck"

" ... hockey is enveloped by a culture that demands that everything be rationalized or explained ...

... it's hilarious the way fans react when their team loses a close game. You'd swear the players couldn't do anything right. And yet, when the same team wins a game by a one-goal margin, it's showered in platitudes.

So here's an experiment I'd love to perform sometime.

Let's take the tape of a five-year-old NHL game -- any game -- in which the score ended 3-1. Now, let's edit out the goals and leave all the rest, so that about 59 of the 60 minutes are there to watch.

Now show it to an audience of hockey fans and see if they can guess who won.

I bet they couldn't, because aside from the moments in which the goals are scored, an awful lot of hockey games are nothing but back-and-forth flow, the trading of chances and puck luck.

To have some fun, let's try the same experiment with a bunch of reporters. Then, let's show them the stories they wrote about that exact game.

Most nights in hockey, both teams skate hard, check hard, and go to the net ... And one of them has a puck hit the post and bounce into the net. And the other hits a post and watches it bounce wide. On more nights than you'd believe, the difference is as simple as that ...

In fact, I would say that puck luck, as it is often called, decides roughly half of the close games in the National Hockey League."

Absolutely right.

I'm looking forward to the rest of McCown's book. I'll probably find more things to post about later, if the quality of the first chapter is any indication.

-----

Well, one picky point on McCown's essay: I don't know what "decides roughly half of the close games" means. If a team wins 2-1, what does it mean that luck decided it, or not? That's a bit vague. I know what McCown means to say, and I agree with it on a gut level, but ... I'm not comfortable with phrasing it that way, because I like to have a precise definition.

So let's arbitrarily make one up.

Suppose you did something like what McCown suggested -- you edited a tape of the game to remove the results of all the shots and "dangerous" scoring chances (that might or might not have resulted in shots). Then you somehow computed the win probability based only on the situations that appear on tape. Maybe you give a breakaway an expectation of 0.3 goals. And for a point blank slot possession, you assign 0.5 goals. And a slapshot from the point, 0.1 goals. And so on.

You compute an expected score based on that.

Then you look at the real score.

1. If the "wrong" team won, it must have done so by "puck luck".

2. If the "right" team won, but its expectation was to win by less than one goal, then you define that as a win by "puck luck".
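Here's what that definition might look like as a sketch in Python. The goal values are the made-up ones from above, and the chance lists, function names, and scores are all hypothetical; real values would have to come from actual shot data.

```python
# Illustrative goal expectations per chance type (the arbitrary figures above).
CHANCE_VALUES = {"breakaway": 0.3, "slot": 0.5, "point_shot": 0.1}

def expected_goals(chances: list[str]) -> float:
    """Sum the goal expectation over every chance a team generated."""
    return sum(CHANCE_VALUES[c] for c in chances)

def decided_by_puck_luck(home_chances: list[str], away_chances: list[str],
                         home_goals: int, away_goals: int) -> bool:
    """Apply the two rules: the 'wrong' team won, or the 'right' team won
    but was expected to win by less than one goal."""
    if home_goals == away_goals:
        raise ValueError("the definition only covers games with a winner")
    expected_margin = expected_goals(home_chances) - expected_goals(away_chances)
    actual_margin = home_goals - away_goals
    if expected_margin * actual_margin < 0:   # rule 1: underdog on chances won
        return True
    return abs(expected_margin) < 1.0         # rule 2: expected edge under a goal

# Hypothetical game: the home team generates far more than the visitors.
home = ["breakaway"] * 2 + ["slot"] * 3 + ["point_shot"] * 5
away = ["breakaway", "slot"] + ["point_shot"] * 4

print(decided_by_puck_luck(home, away, 3, 1))  # home wins as expected
print(decided_by_puck_luck(home, away, 1, 2))  # visitors steal it
```

In this made-up game the home team is expected to win by about 1.4 goals, so a 3-1 home win isn't puck luck under the definition, while a 2-1 road win is.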

That might actually be possible to partly figure out. The NHL website gives all the shots, by distance and type, and Alan Ryder has done lots of research on how to get scoring probabilities for shots. However, other kinds of scoring chances aren't listed, so you'd have to stick to *shot* "puck luck".

In any case, even if you had scoring chances, there would still be luck unaccounted for, in the development of the play. A breakaway might have itself been caused by a defender missing an easy puck. A good chance might have been caused by three low-odds passes that happened to click. And so on.

So, let's try again. How about, a game is decided by "puck luck" if:

You edit the game per McCown's suggestion, and show it to reporters. You make them bet their own money on who won, against each other at odds that they negotiate. If the overall odds wind up between 60:40 and 50:50, or the overall underdog won the game, then that's a game decided by "puck luck".

I'm not suggesting you actually do this, but that you do a thought experiment and estimate what would happen. There are obviously some games where one team absolutely dominates (and wins). The reporters would obviously get the right answer here ... they'd need 90:10 odds or something to back the underdog. But there are obviously games that would look like toss-ups.

Any other suggestions for how to define that in a way where we could actually talk about how to get an answer?

--------

As an aside, I think this kind of "replay" technique has all kinds of sabermetric applications. To evaluate referee performance, take a tape of the foul, do some digital processing to obscure the players and teams involved, and get referees to judge it. To scout a pitcher, you can avoid being biased by the result of the pitch (a good pitch can still be hit for a home run) by digitally removing the result (and perhaps extrapolating/animating the last few inches, if the pitch was actually contacted). And so on.

I think I proposed this thought experiment once, along the same lines. Suppose you had a time machine. You go 40 years into the future, and you go to MLB.com, and you download video of every inning of every game.

You take 10 players across the spectrum of hitting talent: the equivalent of Albert Pujols, the equivalent of John McDonald, the equivalent of Ichiro Suzuki, and so on. You carefully select 200 AB from each of them, so that those 200 AB show the same batting line for each player, and put them on tape.

Then, you bring those tapes back to the present day, and show them to all the scouts. Would the scouts be able to tell the good players from the bad players?

Tuesday, September 13, 2011

Logical thinking

"During WWII, statistician Abraham Wald was asked to help the British decide where to add armor to their bombers. After analyzing the records, he recommended adding more armor to the places where there was no damage!"

Thursday, September 08, 2011

On inequality of wealth

Note: Non-sports post.

----

In the United States, the top 10% of the population earns 30% of the income. And the top 10% of the population owns 70% of the wealth.

Statistics like these seem to be popping up all over the place lately ... someone I know posted one on Facebook a few days ago, and there was a newspaper article or two in the last month. I'm not sure what happened to bring all this up. (If anyone has links from the last week or two, let me know. I can't find them at the moment.)

A couple of years ago, talking about the Gini Coefficient, I made a bunch of arguments about why the distribution of income doesn't matter much. I think it matters a bit, but not much.

Here, I'm going to concentrate on the distribution of *wealth*. For wealth, I'm going to argue that, given a particular distribution of income, the distribution of wealth is almost completely meaningless as a moral issue, or an issue of people's well being. That is: criticize, if you want, the fact that the top 10% get 30% of the income. But given that income distribution, *it doesn't matter* how much of the wealth the top 10% own: whether it's 10%, 30%, 70%, or 99%.

-----

The difference between income and wealth is that income is a rate, how much you earn in a particular year. Wealth is the total amount that you possess at a specific time.

How does anyone gain wealth? Other than inheritance (which we'll disregard here), you have to save or invest some of your income. You can earn ten million dollars one year, but if you blow it all on cocaine and hookers, your wealth will be zero.

So your wealth is a result of three things: (1) your income, (2) the amount you save, and (3) the rate of return on the amount you save. As I said, if you hold (1) fixed, wealth is affected only by (2) and (3).

Suppose you have two people, John and Mary. They have exactly the same education, and they graduate into exactly the same job, paying $50,000 a year. John spends all his money every year. Mary saves an annual $6,000 in a retirement fund, earning 5%, and spends the rest.

What happens? After 40 years, John has $0 in wealth. Mary has $725,000.
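For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes each $6,000 deposit goes in at the end of the year and that the 5% return is constant; that timing is an assumption, chosen because it reproduces the figure above. The function name is just for illustration.

```python
def future_value_of_annual_savings(deposit, rate, years):
    """Balance after `years` end-of-year deposits, each growing at `rate`."""
    balance = 0.0
    for _ in range(years):
        # Last year's balance earns a year of return, then the new deposit lands.
        balance = balance * (1 + rate) + deposit
    return balance

mary = future_value_of_annual_savings(6_000, 0.05, 40)
john = future_value_of_annual_savings(0, 0.05, 40)

print(f"Mary: ${mary:,.0f}")
print(f"John: ${john:,.0f}")
```

Mary's balance comes out at about $725,000, and John's at zero. If the deposits were made at the start of each year instead, Mary's total would be somewhat higher.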

Is it fair to complain about that? I don't think so. Sure, Mary is now (fairly) wealthy while John has to live on just Social Security. But, in the past, John lived much better than Mary, to the tune of $240,000 -- $6,000 a year for 40 years. Some people's tendency would be to take some of Mary's money and give it to John. But that wouldn't be fair. It would actually be quite an injustice. Mary deliberately lived significantly worse than John for 40 years, just so she could have a better retirement. Giving that money to John would *compound* the inequality, wouldn't it? It would take from the (formerly) poor lifestyle and give to the (formerly) rich lifestyle. It would compensate for the future where Mary spends more than John, but not compensate for the past, when John spent more than Mary.

Really, even though Mary has more money than John, over their lifetimes, they're equal. Thirty-five years ago, John spent $4,000 on a new state-of-the-art TV. He knew, when he bought the TV, that $4,000 then would be the equivalent of $22,000 at retirement. He bought the TV anyway. Nothing wrong with that. He chose, freely, to live $4,000 richer than Mary back then, in exchange for living $22,000 poorer than Mary later. Mary also knew the terms of the trade, and made the other choice.

But, over their lifetime, they are exactly equal. $4,000 can buy a lot of things: a vacation, a TV, a boat, a motorcycle, or a retirement fund of $22,000. If John had bought a TV, and Mary had bought a boat, could anyone argue that Mary is richer than John because she has a boat? Of course not -- because, by the same token, John has a TV of equal value.

The same thing applies here: if John has a TV that costs $4,000, and Mary has a $22,000 retirement fund, which also costs $4,000 ... they must be equally rich, right?
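The $4,000-to-$22,000 conversion is ordinary compounding. A small sketch, assuming the same constant 5% return as Mary's retirement fund and 35 years between the purchase and retirement (the function name is hypothetical):

```python
def compound(principal, rate, years):
    """Value of a single lump sum left to grow at `rate` for `years` years."""
    return principal * (1 + rate) ** years

tv_price = 4_000
forgone_at_retirement = compound(tv_price, 0.05, 35)

print(f"${forgone_at_retirement:,.0f}")
```

The result is a little over $22,000, which is the retirement-fund "price" of the TV.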

-----

When you talk about the distribution of wealth, what you're really talking about, for the most part, is the distribution of a desire to save. And there is no "proper" distribution for that, any more than there's a "proper" distribution of religious beliefs. People are diverse, and they have different tendencies. Some people like to spend, and some people are compulsive savers. Humans choose differently from each other.

Suppose the 1% of the population that owns the most rare baseball cards happens to own 70% of the rare baseball cards. It just means that the other 99% don't care as much about baseball cards. If they have the same income, they just own more other things instead.

Now, you can still argue that the reason it's a problem that the top 10% has 70% of the wealth is that the bottom 90% doesn't earn enough money to be able to save. But that argument is better made by arguing about the *income* distribution. Because, otherwise, you're combining two issues: having money, and choosing to save it. If you were to complain that the top 1% own 70% of the baseball cards *because they have a higher income*, you'd be mostly wrong. Yes, the top 1% of baseball card owners probably DO have a higher income. But that's not the main reason they own 70% of the baseball cards. The *main* reason they own 70% of the baseball cards is because they really, really like baseball cards.

-------

Here's a model, for a numerical example.

Start by assuming a population of 10,000 people. They all have exactly the same education, and they all graduate at age 25 into a job that pays $40,000 a year. They work until they're 65, at which point we measure their wealth.

But they're not all the same, because they have different personalities, and characteristics, and desires. Specifically:

1. They vary in how much money they like to spend. The mean of the population is to spend 90% of their salary and save 10%, but with a standard deviation of 15 points. Nobody saves more than 50% of their salary, or spends more than 115%.

2. They vary in how many children they want, and when. 20% of them want no kids. 20% of them want one kid early in life (age 27), and 20% want one kid later in life (age 35). 20% of them want two kids early, and 20% want two kids late. Kids cost $5,000 a year in expenses to age 18, and then $20,000 annually for the next four years, all of which comes out of saving.

3. They vary in how good they are at investing their money. Some play it safe, and some are more aggressive. Some study investing, and some don't. The average annual return is 4%, with an SD of 1.5 percentage points.

4. They vary in how much effort they put into their job, which affects their annual salary increases. The mean increase is 2% a year, with a standard deviation of 0.5%. Nobody ever gets fired or earns less than $40,000.

5. Nobody goes into debt more than $50,000. Once they reach $50,000, they cut their spending to keep the debt at $50K. All debt is paid off in the year before retirement. Debt accrues interest at 10%.

Under these conditions, I ran a random simulation of the 10,000 people.
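For readers who want to experiment, here is a minimal sketch of how a simulation along these lines could be set up in Python. It is not the code that produced the numbers in this post, and it fills several gaps with assumptions: the savings rate, investment return, and raise rate are each drawn once per person from a normal distribution and then held fixed for life; the second child arrives two years after the first; retirement wealth is floored at zero as a stand-in for paying off debt before retirement; and the taxation described later in the post is left out. All function names are illustrative. Because of those differences, the shares it prints should not be expected to match the reported ones.

```python
import random

START_AGE, RETIRE_AGE = 25, 65
DEBT_LIMIT = -50_000      # nobody goes more than $50,000 into debt
DEBT_RATE = 0.10          # interest charged on negative balances

# (number of kids, parent's age at first birth); each plan is equally likely.
# Assumption: a second kid arrives two years after the first.
KID_PLANS = [(0, None), (1, 27), (1, 35), (2, 27), (2, 35)]

def kid_cost(plan, age):
    """Child expenses in the year the parent is `age`."""
    n_kids, first_birth = plan
    total = 0
    for k in range(n_kids):
        kid_age = age - (first_birth + 2 * k)
        if 0 <= kid_age < 18:
            total += 5_000       # $5,000 a year to age 18
        elif 18 <= kid_age < 22:
            total += 20_000      # $20,000 a year for the next four years
    return total

def simulate_person(rng):
    """Wealth at retirement for one randomly drawn person."""
    # Save 10% on average, SD 15 points, clamped to [-15%, 50%].
    save_rate = min(0.50, max(-0.15, rng.gauss(0.10, 0.15)))
    invest_return = rng.gauss(0.04, 0.015)
    # Raises are floored at zero so salary never drops below $40,000.
    raise_rate = max(0.0, rng.gauss(0.02, 0.005))
    plan = rng.choice(KID_PLANS)

    salary, wealth = 40_000.0, 0.0
    for age in range(START_AGE, RETIRE_AGE):
        rate = invest_return if wealth >= 0 else DEBT_RATE
        wealth *= 1 + rate
        wealth += save_rate * salary - kid_cost(plan, age)
        wealth = max(wealth, DEBT_LIMIT)   # spending is cut at the debt cap
        salary *= 1 + raise_rate
    return max(wealth, 0.0)                # assumption: debt is cleared by 65

def top_share(wealths, fraction):
    """Share of total wealth held by the richest `fraction` of people."""
    ranked = sorted(wealths, reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)

def run(n_people=10_000, seed=0):
    rng = random.Random(seed)
    return [simulate_person(rng) for _ in range(n_people)]

wealths = run()
for fraction in (0.01, 0.10, 0.50):
    print(f"top {fraction:.0%}: {top_share(wealths, fraction):.1%} of wealth")
```

Changing `seed`, or any of the distribution parameters, is an easy way to see how sensitive the top-10% share is to each assumption.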

So, at age 65, what percentage of total wealth will the top 10% own? Take a guess before reading on. I'll write it cryptically so you don't see it by accident when you're thinking.

Ready?

The top 10% of these graduates own (7 * 9 - 22)% of the total wealth.

Got that? It's not as big as the real-life answer of 70%, but it's pretty big nonetheless. And it's *completely* due to the decisions of the individuals themselves. There is no inequality, no racism, no bad schools, no corruption, no government favors, no exploitation by greedy employers. It's just natural variation in how human beings choose to live their lives.

----------

Some of the other results:

The top 1% had 7% of the wealth.

The top 10% had 41% of the wealth.

The top 50% had 99% of the wealth.

As you would expect, the wealthiest people were the ones who saved the most and got the highest rates of return. The wealthiest, person number 7,490, wound up with just over $4,000,000 in wealth. She saved 41% of her salary and earned 8.3% per annum. In case you think 41% is a lot ... it's not, really. There are a lot of misers in the world. At retirement, this person earned $55,551, which means she was living on around $33,000 per year. That's not unreasonable for an outlier, just over 2 SD from the mean.

Overall, it turned out that number and timing of children didn't matter much. Neither did salary (although the salaries were all pretty close). Some of the richest people earned below-average salary increases.

So what mattered is how much they saved, and how well they invested it. Of course, my model is way oversimplified, but that does correspond to my perception of how wealth happens in real life, where my sample of friends earns around the same as I do.

---------

One thing I should note is how the "top 1%" figure of 7% is way, way off the real life figure of 38%. Why is that? Well, the main reason is probably that the model didn't consider the possibility of entrepreneurs who can occasionally create a multi-billion-dollar company out of nothing. If Bill Gates and Warren Buffett were in the model, the figure would jump substantially from 7%.

What's more surprising, I think, is that the top 10% number was so high, at 41%. I expected it to be much lower, considering that there's so much less variation here than in the real world:

1. Here, everyone had roughly the same income, between $40,000 and $60,000. Real life, on the other hand, includes sports stars, CEOs, and other people with high productivity.

2. Here, everyone was 65. Older people are obviously wealthier, since they've had much more time to earn and save. If you take these 10,000 retired people, and combine them with 10,000 babies, then the distribution is much more unequal, since you've added a bunch of zeroes. Then, the top 10% jump from 41% of the wealth to 64%.
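The mechanics of the baby effect are worth spelling out: doubling the population with zero-wealth people means the top 10% of the combined group is exactly the top 20% of the retirees. A small sketch, using made-up, evenly spaced wealth levels rather than the simulation's output (the numbers and names here are purely illustrative):

```python
def top_share(wealths, fraction):
    """Share of total wealth held by the richest `fraction` of people."""
    ranked = sorted(wealths, reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    return sum(ranked[:top_n]) / sum(ranked)

# Hypothetical retirees with wealth $1,000, $2,000, ..., $1,000,000.
retirees = [1_000 * i for i in range(1, 1_001)]
babies = [0] * len(retirees)
combined = retirees + babies

print(f"retirees only, top 10%: {top_share(retirees, 0.10):.1%}")
print(f"with babies,   top 10%: {top_share(combined, 0.10):.1%}")

# Adding the zeroes changes nobody's wealth; it only changes who counts as "top 10%".
assert top_share(combined, 0.10) == top_share(retirees, 0.20)
```

The measured concentration nearly doubles in this toy example even though not a single dollar has moved.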

The age thing is a big issue. Even if everyone were exactly equal in every way, following the same career path and the same wealth accumulation path, the distribution would be unequal if you take a snapshot in time. You'd be combining 65 year olds who are rich because they've been saving, with 25 year olds who WILL be just as rich, but aren't yet. (That, by the way, is why it's best to look at lifetime income, or at least age-adjusted income, instead of snapshot income or snapshot wealth.)
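To put a rough number on the snapshot effect, here is a sketch with one person at every age from 25 to 64, all following the identical path. The $4,000 annual saving and 4% return are assumptions picked only for illustration, as is the function name.

```python
def savings_after(years, deposit=4_000, rate=0.04):
    """Balance after `years` identical end-of-year deposits at a constant return."""
    balance = 0.0
    for _ in range(years):
        balance = balance * (1 + rate) + deposit
    return balance

# One person per age; everyone is on exactly the same lifetime path.
snapshot = [savings_after(age - 25) for age in range(25, 65)]

ranked = sorted(snapshot, reverse=True)
top_n = round(len(ranked) * 0.10)          # the four oldest people
oldest_share = sum(ranked[:top_n]) / sum(ranked)

print(f"top 10% (the oldest) hold {oldest_share:.1%} of the wealth")
```

Under these assumptions the oldest tenth holds roughly a quarter of the wealth, even though every person in the snapshot will end up with exactly the same amount at 65.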

Oh, and by the way ... I built a certain amount of progressive taxation into the model. I assumed all salary above $40K is taxed at 30%. I also assumed that the savings rate is based on after-tax salary. And finally, I assumed that if you save more than 30% of your after-tax salary, any excess is taxed *again* at 30%. (This was easier than trying to compute tax on investment income.) The numbers above are *after* all this progressive taxation.

----------

So, my argument boils down to something like this (directed at a random skeptic):

You say that the top 10% owns 70% of the wealth, and that's too much. Why is that too much? It can't be just inequality, because here I have a model where everyone is equal, and the top 10% still owns 41%. Why do you think 70% is wrong, and what should the number be, and what are your assumptions?

And suppose I cornered you, and asked you to tell me exactly what your policy prescriptions are -- how much to tax the rich, what to do with the money, how to tax investment income, what loopholes to close, how to get the poor to save more, and so on. Then I would ask you, "after all that, how much of the wealth would the top 1% own?"

I'd bet you couldn't answer that. And if you don't know what the distribution of wealth would be in your ideal world, how can you possibly argue that it's the wrong number now?

Saturday, September 03, 2011

Academic editor resigns after publishing flawed study

The editor of an academic journal has resigned after publishing a study that turned out to be flawed.

There's more to it than that, of course ... it's mostly an issue of political correctness, rather than a scholarly one. The study in question was by a politically incorrect author, with a politically incorrect conclusion. The paper, it turns out, was skeptical of climate change, and there are accusations that the peer reviewers who gave it their blessing were also known skeptics.

Still, it's interesting to see how the reaction pretends that's not an issue. The resigning editor, Wolfgang Wagner, wrote:

"[The paper was] fundamentally flawed and therefore wrongly accepted by the journal ... As the case presents itself now, the [peer review] editorial team unintentionally selected three reviewers who probably share some climate sceptic notions of the authors ... the problem I see with the paper by [authors] Spencer and Braswell is not that it declared a minority view (which was later unfortunately much exaggerated by the public media) but that it essentially ignored the scientific arguments of its opponents."

So, let me get the implications straight.

1. It is a very serious matter if a flawed paper is accepted by a journal.

2. If the peer reviewers agree with the author on a related scientific theory prior to the paper being published, you should find other peer reviewers.

3. If a paper ignores the scientific arguments of its opponents, it should not be published.

Can these people possibly be serious?

1. Flawed papers are accepted by journals ALL THE TIME. The sabermetric community has revealed the flaws in many, many academic studies, and no editor has resigned. Indeed, on several occasions, we have revealed problems with studies when they're still in the "working paper" stage, and they get published anyway.

If an editor had to resign every time a flawed paper got published, no editor would last longer than three months in the job.

2. This only seems to become a principle when the scientific theory in question is politically incorrect. If a journal considers a study *confirming* climate change, do they really go out of their way to find climate skeptics to peer review it? If a psychology journal publishes a study documenting the negative effects of racial bias, do they insist that one of the peer reviewers be a KKK member?

3. Actually, I'm OK with this one. But I can't resist snarping a little bit. Has any editor ever been forced to resign because of the publication of an academic paper on sabermetrics that doesn't know who Bill James is, that barely cites any existing sabermetric research, and that could have been refuted by a sabermetrician in fifteen seconds?

Okay, I'm done snarping. Moving on now.

-------

Have you ever noticed what a big deal it is in academia whenever a study is acknowledged to be flawed? The hands wring, the "mea culpa"s flow, and everyone talks about what could have gone wrong with the process that this was allowed to happen.

It looks good at first, that academia is so concerned about getting it right that they take it so seriously when something is wrong. But, really, it's a veneer, isn't it? They're just trying to signal how serious and ethical they are to people who don't know any better.

More often than not, it doesn't work that way in real life. We've all seen and talked about studies that are obviously flawed, and we've seen examples of academics who deflect the arguments to irrelevant side issues, ignore them completely, or attack our credentials instead of the actual criticism at hand.

Sometimes the community will be a bit more subtle than that ... they won't disown the study explicitly; instead, they'll publish a rebuttal letter, or a study opposing the original. They'll try to position it as a healthy scientific debate between scientists.

But, no matter how flawed the paper, they don't normally demand that the editor resign. And there also seems to be an implicit understanding that you don't pillory the original author, even if the original study was obviously meritless. You just make sure you never cite it favorably, and everyone ignores it and gets on with their lives.

But here, they won't do that. The climate change research community seems to hold the offending author in very low regard. Normally, they'd just ignore the offending author, knowing their peers would extend to them the same courtesy. But it seems like they're just fed up with this Spencer guy. And, perhaps that's for good reason. As one professor said,

"Spencer [one of the co-authors] is well known in the scientific community for publishing high-profile papers that initially dispute global warming and only later are found to be faulty."

Still, this is not a case of academia standing up to defend its strict standards of truth. This seems to be a case of academia having decided that this particular academic is persona non grata on this particular subject, and that they're not going to let him, or his editor, get away with things other academics can.

------

And you know, this turn of events might have been totally the right thing to do. I don't know this Spencer guy. For all I know, he might be an awful scientist, blinded by his political beliefs, trying to publish bad papers anywhere he can get away with, to cast doubt on the climate change hypothesis. In that case, we might agree that because of his repeated disingenuousness, and the poor quality of his work in the past, his latest study should have been subject to extra scrutiny -- and his editor's failure to do that represents a resignable offense. (Again, I don't know Spencer's work at all, so this is entirely hypothetical.)

But if that's the case, say so! Don't say "the editor was fired because the paper was flawed." It makes you academics sound ridiculous, like you consider yourselves more infallible than the pope. It sends the idea that everything that makes it into a journal is invariably 100 percent correct -- because, if it weren't, would the editor still be working here?

If you say, "the editor was expected to resign because the paper was flawed," you sound arrogant and silly, not to mention dishonest.

The critics should just tell the truth. They should say, "Look, there's this one guy who's acting like a dork. He's putting together these crappy studies, which have no scientific merit, and he won't do what scientists are supposed to do and look at the data objectively. He's so politically committed to his hypothesis that he doesn't care about making his studies hold together, and he gets everything wrong.

"Now, we're scientists, so we are very open to the idea that our current theories might be wrong, and there are skeptical scientists whose research is valid and whom we respect. But not this guy. His work has been so bad, for so long, that it's incumbent on any editor to double and triple-check his work to make sure he's not doing it again. In that light, when his editor failed to do that, it's akin to negligence. So it's only appropriate that he resign."

That would make sense. But, I guess, to the general public, it doesn't look as good.