Saturday, July 12, 2014

Nate Silver and the 7-1 blowout

Brazil entered last Tuesday's World Cup semifinal match missing two of their best players -- Neymar, who was out with an injury, and Silva, who was sitting out a red-card suspension. Would they still be good enough to beat Germany?

After crunching the numbers, Nate Silver, at FiveThirtyEight, forecast that Brazil still had a 65 percent chance of winning the match -- that the depleted Brazilians were still better than the Germans. In that prediction, he was taking a stand against the betting markets, which actually had Brazil as the underdog -- barely -- at 49 percent. Then, of course, Germany beat the living sh!t out of Brazil, by a score of 7-1.

"Time to eat some crow," Nate wrote after Brazil had been humiliated. "That prediction stunk."

I was surprised; I had expected Nate to defend his forecast. Even in retrospect, you can't say there was necessarily anything wrong with it.

What's the argument that the prediction stunk? Maybe it goes something like this:

-- Defying the oddsmakers, Nate picked Brazil as the favorite.

-- Brazil suffered the worst disaster in World Cup history.

-- Nate's prediction was WAY off.

-- So that has to be a bad prediction, right?

No, it doesn't. It's impossible to know in advance what's going to happen in a soccer game, and, in fact, anything at all could happen. The best anyone can do is try to assign the best possible estimate of the probabilities. Which is what Nate did: he said that there was a 65% chance that Brazil would win, and a 35% chance they would lose.

Nate said Brazil had about twice as much chance of winning as Germany did. He did NOT say that Brazil would play twice as well. He didn't say Brazil would score twice as many goals. He didn't say Brazil would control the ball twice as much of the time. He didn't say the game would be close, or that Brazil wouldn't get blown out.

All he said was, Brazil has twice the probability of winning.

The "65:35" prediction *did* imply that Nate thought Brazil was a better team than Germany. But that's not the same as implying that Brazil would play better this particular game. It happens all the time, in sports, that the better team plays like crap, and loses. That's all built in to the "35 percent".

Here's an analogy.

Suppose FIFA is about to draw a ball from an urn containing a million balls, numbered from $1 to $1,000,000. I say, there's a 65 percent chance that the number drawn will be higher than the value of a three-bedroom bungalow, which is $350,000.

That's absolutely a true statement, right? 650,000 "winning" balls out of a million is 65 percent. I've made a perfect forecast.

After I make my prediction, FIFA reaches into the urn, pulls out one of the million balls, and it's ... number 14.

Was my prediction wrong? No, it wasn't. It was exactly, perfectly correct, even in retrospect.

It might SEEM that my prediction was awful, if you don't understand how probability works, or you didn't realize how the balls were numbered, or you didn't understand the question. In that case, you might gleefully assume I'm an idiot. You might say, "Are you kidding me? Phil predicted you could buy a house for $14! Obviously, there's something wrong with his model!"

But, there isn't. I knew all along that there was a chance of "14" coming up, and that factored into my "35 percent" prediction. "14" is, in fact, a surprisingly low outcome, but one that was fully anticipated by the model.

When Nate said that Brazil had a 35 percent chance of losing, a small portion of that 35 percent was the chance of those rare events, like a 7-1 score -- in the same way my own 35 percent chance included the rare event of a really small number getting drawn.
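A few lines of simulation make the urn arithmetic concrete. This is just my own sketch of the analogy -- the seed and variable names are mine, not anything from Nate's model:

```python
import random

random.seed(2014)

N = 1_000_000          # balls numbered 1..N, each a dollar amount
THRESHOLD = 350_000    # price of the three-bedroom bungalow

# Exact probability of drawing a number above the threshold:
p_exact = (N - THRESHOLD) / N      # 650,000 winning balls -> 0.65

# Simulate many draws: about 65% land above the threshold, and
# "shocking" draws like ball number 14 still show up, at the low
# rate the forecast always anticipated.
draws = [random.randint(1, N) for _ in range(100_000)]
p_sim = sum(d > THRESHOLD for d in draws) / len(draws)
tiny = sum(d <= 100 for d in draws)   # draws as extreme as "14"

print(p_exact)   # 0.65
```

The point is that a draw of 14 never contradicts the 65 percent figure; it's just one of the 350,000 losing balls the forecast already priced in.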

As unintuitive as it sounds, you can't judge Nate's forecast by the score of the game.

-------

Critics might dispute my analogy by arguing something like this:

"The "14" result in Phil's model doesn't show he was wrong, because, obviously, which ball comes out of the urn it just a random outcome. On the other hand, a soccer game has real people and real strategies, and a true expert would have been able to foresee how Germany would come out so dominating against Brazil."

But ... no. An expert probably *couldn't* know that. That's something that was probably unknowable. For one thing, the betting markets didn't know -- they had the two teams about even. I didn't hear any bettor, soccer expert, sportswriter, or sabermetrician say otherwise -- that, say, Germany should be expected to win by multiple goals. That suggests, doesn't it, that it was legitimately impossible to foresee?

I say, yes, it was definitely unknowable. You can't predict the outcome of a single game to that extent -- it's a violation of the "speed of light" limit. I would defy you to find any single instance where anyone, with money at stake, seriously predicted a single game outcome that violates conventional wisdom to anything near this extent.

Try it for any sport. On August 22, 2007, the Rangers were 2:3 underdogs on the road against the Orioles. They won 30-3. Did anyone predict that? Did anyone even say the Rangers should be heavy favorites? Is there something wrong with Vegas, that they so obviously misjudged the prowess of the Texas batters?

Of course not. It was just a fluke occurrence, literally unpredictable by human minds. Like, say, 7-1 Germany.

One reaction I saw put it well: "Huh? [Nate Silver] says his prediction 'stunk,' but it was probabilistic. No way to know if it was even wrong."

Exactly correct.

--------

So I don't think you can fault Nate's prediction, here. Actually, that's too weak a statement. I don't mean you have to forgive him, as in, "yeah, he was wrong, but it was a tough one to predict." I don't mean, "well, nobody's perfect." I mean: you have no basis even for *questioning* Nate's prediction, if your only evidence is the outcome of the game. Not as in, "you shouldn't complain unless you can do better," but, as in, "his prediction may well have been right, despite the 7-1 outcome."

But I did a quick Google search for "Brazil 7-1 Nate Silver," and every article I saw that talked about Nate's prediction treated it as certain that his forecast was wrong.

1. Here's one guy who agrees that it's very difficult to predict game results. From there, he concludes that all predictions must therefore be bullsh!t (his word). "Why did they even bother updating their odds for the last three remaining teams at numbers like 64 percent for Germany, 14 percent for the Netherlands, when we just saw how useless those numbers can be?"

Because, of course, the numbers *aren't* bullsh!t, if you correctly interpret them as probabilities and not certainties. If you truly believe that no estimate of odds is useful unless it can successfully call the winner of every game, then how about you bet me every game, taking the Vegas underdog at even money? Then we'll see who's bullsh!tting.

2. This British columnist gets it right, but kind of hides his defense of Nate in a discussion of how sports pundits are bad at predicting. Except that he means that sabermetricians are bad at correctly guessing outcomes. Well, yes, and we know that. But we *are* fairly decent at predicting probabilities, which is all that Nate was trying to do, because he knows that's all that can realistically be done.

"To be fair to Nate Silver + 538, their model on the whole was excellent. It's how they dealt with Brazil where I (and others) had problems."

What kind of problems? Not picking them to lose 7-1?

In fairness, sure, there's probably some basis for critiquing Nate's model, since he's been giving Brazil significantly higher odds than the bookies. But, in this case, the difference was between 65% and 49%, not between 65% and "OMG, it's a history-making massacre!" So this is not really a convincing argument against Nate's method.

It's kind of like your doctor says, "you should stop smoking, or you're going to die before you're 50!" You refuse, and the day before your fiftieth birthday, a piano falls on your head and kills you. And the doctor says, "See? I was right!"

4. Here's a mathematical one, from a Guardian blogger. He notes that Nate's model assumed goals were independent and Poisson, but, in real life, they're not -- especially when a team collapses and the opponent scores in rapid-fire fashion.

All very true, but that doesn't invalidate Nate's model. Nate didn't try to predict the score -- just the outcome. Whether a team collapses after going down 3-0, or not, doesn't much affect the probability of winning after that, which is why any reasonable model doesn't have to go into that level of detail.

Which is why, actually, a 7-1 loss isn't necessarily inconsistent with being a favorite. Imagine if God had told the world, "if Brazil falls behind 2-0, they'll collapse and lose 7-1." Nate would have figured: "Hmmm, OK, so we have to subtract off the chance of 'Brazil gives up the first two goals, but then dramatically comes back to win the game,' since God says that can't happen."

Nate would have figured that's maybe 1 percent of all games, and said, "OK, I'm reducing my 65% to 64%."

So, that particular imperfection in the model isn't really a serious flaw.

But, now that I think about it ... imagine that when Nate published his 65% estimate, he explicitly mentioned, "hey, there's still a 1-in-3 chance that Brazil could lose ... and that includes a chance that Germany will kick the crap out of them. So don't get too cocky." That would have helped him, wouldn't it? It might even have made him look really good!

I mean, he shouldn't need to say it to statisticians, because it's an obvious logical consequence of his 65% estimate. But maybe it needs to be said to the public.

Then there's a critique from a sociology professor. First, she argues that Nate should have paid attention to sportswriters, who said Brazil would struggle without those missing players. Researchers need to know when to listen to subject-matter experts, who know things Nate's mathematical models don't.

Well, first, she's cherry-picking her sportswriters -- they didn't ALL say Brazil would lose badly, did they? You can always find *someone*, after the fact, who bet the right way. So what?

As for subject-matter experts ... Nate actually *is* a subject-matter expert -- not on soccer strategy, specifically, but on how sports works mathematically.

On the other hand, a sociology professor is probably an expert in neither. And it shows. At one point, she informs Nate that since the Brazilian team has been subjected to the emotional trauma of losing two important players, Nate shouldn't just sub in the skills of the two new players and run with it as if psychology isn't an issue. He should have *known* that kind of thing makes teams, and statistical models, collapse.

Except that ... it's not true, and subject-matter experts like Nate who study these things know that. There are countless cases of teams who are said to "come together" after a setback and win one for the Gipper -- probably about as many as appear to "collapse". There's no evidence of significant differences at all -- and certainly no evidence that's obvious to a sociologist in an armchair.

Injuries, deaths, suspensions ... those happen all the time. Do teams play worse than expected afterwards? I doubt it. I mean, you can study it, there's no shortage of data. After the deaths of Thurman Munson, Lyman Bostock, Ray Chapman, did their teams collapse? I doubt it. What about other teams that lost stars to red cards? Did they all lose their next game 7-1? Or even 6-2, or 5-3?

Anyway, that's only about one-third of the post ... I'm going to stop, here, but you should read the whole thing. I'm probably being too hard on this professor, who didn't realize that Nate is the expert and not her, and wrote like she was giving a stock lecture to a mediocre undergrad student quoting random regressions, instead of to someone who actually wrote a best-selling book on this very topic.

So, moving along.

------

There is one argument that would legitimately provide evidence that Nate was wrong. If any of the critics had chosen to argue convincing evidence for Brazil actually having much less TALENT than Nate and others estimated, evidence that was freely available before the game ... that would certainly be legitimate.

Something like, "Brazil, as a team, is 2.5 goals above replacement with all their players in action, but I can prove that, without Neymar and Silva, they're 1.2 goals *below* replacement!"

That would work.

And, indeed, some of the critiques seem to be actually suggesting that. They imply, *of course* Brazil wouldn't be any good without those players, and how could anyone have expected they would be?

Fine. But, then, why did the bookmakers think they still had a 49% chance? Are you that smart that you saw something? OK, if you have a good argument that shows Brazil should have been 30%, or 20%, then, hey, I'm listening.

If the missing two players dropped Brazil from a 65% talent to a 20% talent, what is each worth individually? Silva is back for today's third-place game against Holland. What's your new estimate for Brazil ... maybe back to 40%?

Well, then, you're bucking the experts again. Brazil is actually the favorite today. The betting markets give them a 62% chance of beating the Netherlands, even though Neymar is still out. Nate has Brazil at 71%. If you think the Brazilians are really that bad, and Nate's model is a failure, I hope you'll be putting a lot of money on the Dutch today.

Because, you can't really argue that Brazil is back to their normal selves today, right? An awful team doesn't improve its talent that much, from 1-7 to favorite, just from the return of a single player, who's not even the team's best. No amount of coaching or psychology can do that.

If you thought Brazil's 7-1 humiliation was because of bad players, you should be interpreting today's odds as a huge mistake by the oddsmakers. I think they're confirmation that Tuesday's outcome was just a fluke.

As I write this, the game has just started. Oh, it's 2-0 Netherlands. Perfect. You're making money, right? Because, if you want to persuade me that you have a good argument that Nate was obviously and egregiously incorrect, now you can prove it: first, show me where you wrote he was still wrong and why; and, second, tell me how much you bet on the underdog Netherlands.

Otherwise, I'm going to assume you're just blowing hot air. Even if Brazil loses again today.

-----Update/clarification: I am not trying to defend Nate's methodology against others, and especially not against the Vegas line (which I trust more than Nate's, until there's evidence I shouldn't). I'm just saying: the 7-1 outcome is NOT, in and of itself, sufficient evidence (or even "good" evidence) that Nate's prediction was wrong.

48 Comments:

Even the oddsmakers failed to predict the 7-1 rout -- but, at least, they had correctly picked Germany as the favorite! That shows there must be something wrong with Nate's model ... whatever he thought he saw, that the oddsmakers didn't, must have been wrong!

That works to perhaps cast some doubt on Nate's model, but it's still quite plausible that Nate was right and the bookies were wrong. Whatever the explanation was behind the collapse, it might not be that much less consistent with a 65% probability than a 50% probability.

If you think it is, you have to prove it. Or at least, argue it.

Suppose the soccer game had been a series of nine coin tosses. Nate gave "heads" a 65% chance of winning, and the bookies said 50%. Then, the first eight tosses go 7-1 for tails. What's the chance that Nate was right?

Well, if Nate gave heads a 65% chance, you can work backwards to show that he thought every toss had a 56.3% chance of landing heads, while the bookies thought 50%.

Either way, a 1-7 coin would be rare: by my simulation, it would happen 1.36 percent of the time for the 56.3% coin, and 3.17 percent of the time for the fair coin. (I was too lazy to figure it out exactly.) So, it's still quite plausible that Nate was correct all along -- the 1-7 blowout would still happen almost 40 percent as often by his estimation as by the bookmakers' estimation.
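For the record, the simulated numbers in the comment above can be computed exactly with the binomial formula. A short sketch (the function names and the bisection are mine):

```python
from math import comb

def series_win_prob(p, n=9, need=5):
    """Chance of winning at least `need` of `n` independent tosses."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(need, n + 1))

# Work backwards: find the per-toss probability that makes a
# best-of-nine series a 65% proposition (bisection on the
# monotone function above).
lo, hi = 0.50, 0.70
for _ in range(50):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if series_win_prob(mid) < 0.65 else (lo, mid)
p_nate = (lo + hi) / 2            # ~0.562-0.563 per toss

def exactly_one_head(p, n=8):
    """Chance of exactly one head in n tosses -- the '7-1' line."""
    return comb(n, 1) * p * (1 - p)**(n - 1)

print(exactly_one_head(p_nate))   # ~0.014, vs. the simulated 1.36%
print(exactly_one_head(0.5))      # 0.03125, vs. the simulated 3.17%
```

The exact figures come out around 1.4 percent and exactly 3.125 percent -- close to the simulated values, and the ratio between them is the roughly-40-percent figure mentioned in the comment.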

You can't do a soccer game quite the same way, and what you get for your final answer depends on your model. But I still suspect that 1-7 isn't that much less likely by a 65% team than a 49% team. And, you have to evaluate all that along with the results of all Nate's other forecasts (since it's the same model).

If you do that, and you reach a conclusion that Nate's model is almost certainly wrong ... I'm listening. But, "he said 65% and they lost 7-1!" is nowhere near enough.

Nice post, and I agree with you. Based on your logic you can obviously argue that any forecast result is correct, which at an instant in time it obviously is.

The key, as you allude to, is how the model performs over time. Again, as you point out, the model predicted 12/12 correct results leading up to the rout.

Is this evidence that Nate's model is 'good'? I guess it is evidence that the model doesn't completely suck. However, all these results went with the betting markets, so in that context it is hard to assert that Nate's model outperformed. (I don't know how it did in the Group stages -- although I did read a post on his site about how he has consistently outperformed the betting markets by making smart bets where there is an odds difference between his model and the markets.)

Again, as you point out, the biggest issue with Nate's model was the Brazil win % through the tournament. This seemed a function of HFA -- partly predicated on the fact that Brazil hadn't lost a competitive home game in close on 40 years. While true, the reality is that Brazil don't play that many competitive home games. They play World Cup qualifiers (and not for the 2014 WC, as they were hosts). Until 2010, LATAM football hadn't had any great sides since Maradona's side in the late 1980s. Therefore strength of schedule has been massively in Brazil's favour. And they have only hosted the Copa America once in that period (which Brazil won).

So my view is that the HFA assumption for Brazil was giving them far too much weight ...

The other factor that I mentioned on Tango's blog (which he didn't agree with because of lack of data) is whether Germany's win % estimate was lower than it should be.

The Germany team is a rare beast in that I think 8/11 starting players play in the Bundesliga and 7 play for Bayern Munich. These players know each other and play week in, week out with each other. They play as a team. That is (I believe) a big advantage, but not necessarily easy to prove with data. The English team would be a counterexample where the opposite is true. Nate's model SHOULD account for this, as it has an individual and a team component. I believe Nate's model overweights the individual component -- which I think is right in general -- but Germany is an outlier, and I think the team component should have had more weight.

There are many good observations in this piece. It seems worth asking: "Out of the last 14 matches, how many did 538 get right?" "How did 538 do compared to the Las Vegas line?" "How well did Baidu do?" In short, it's worth looking at many predictions based on various models (538's, Baidu's, etc.). One surprise (Brazil's loss) does not break the model. But it is reasonable to find that most of the predictions picked eventual winners. Baidu had 14 for 14. Strangely, I don't see 538's record on 538.

Blaine - 538 hasn't tracked itself directly, but in a couple of posts Nate has linked to someone who has tracked the predictions. As Phil mentioned, they were 12-12 until the Brazil-Germany match. Now they're 12-14 with the Netherlands victory.

You are focusing on Nate's prediction that Brazil had a 65% chance of defeating Germany. You are correct that Brazil's loss does not mean that this prediction was wrong, because the prediction included a 35% chance that Brazil would lose.

The other prediction is a reasonable inference based on Nate's 65% prediction, the reasonable inference being that Nate's model predicted Brazil to defeat Germany; that appears to be a reasonable inference because, otherwise, we'd have to say that Nate's model predicted Brazil to have a 65% chance of winning but did not predict them to win. In this case, Nate is not required to formally state "My model predicts that Brazil will defeat Germany"; Nate only needs to provide a statement from which "Brazil will defeat Germany" is a reasonable inference about what his model would predict regarding the win/loss outcome. Therefore, the argument that Nate's 65% prediction was wrong goes like this:

1. Nate's model predicted that Brazil has a 65% chance of defeating Germany.

2. From this, we can infer that Nate's model would predict the outcome as a Brazil win.

3. The prediction based on that inference was wrong.

---

I suppose one could argue that the inference is not valid. But, in that case, it does not appear that Nate's 65% prediction can ever be validated. So instead of discussing whether Nate's 65% prediction is wrong, we should be discussing whether Nate's 65% prediction is not even wrong.

---

I realize that you claimed that there is one argument to show Nate's 65% prediction was wrong:

"There is one argument that would legitimately provide evidence that Nate was wrong. If any of the critics had chosen to argue convincing evidence for Brazil actually having much less TALENT than Nate and others estimated, evidence that was freely available before the game ... that would certainly be legitimate."

This seems to be saying that judging Nate's prediction can only be done by judging one particular variable that Nate plugged into his prediction model, the variable here being talent. But Nate's 65%-chance-for-Brazil prediction could also have been wrong if he correctly measured talent but his model incorrectly weighted talent. Maybe Nate did not correctly weight home field advantage; maybe Nate did not correctly weight the weather; or maybe the model predicted a 56% chance of winning but someone transposed the digits.

---

By the way, it appears that Nate did defend his forecast, at least with regard to the win/loss outcome: he noted that his model gave Germany a 35% chance of winning. Regarding the score, Nate invoked Nassim Taleb's "The Black Swan"; I think that it's fair to say that invoking the Black Swan when the favorite outcome in your prediction does not occur is at some level defending the prediction.

I don't see a necessary contradiction. It would be a contradiction if we interpret "Brazil will defeat Germany" as a guarantee, but there's no reason to interpret it as a guarantee.

The inference is only that Nate's model predicted Brazil to be the most likely winner in Brazil's game against Germany; that prediction is consistent with the estimation of probabilities that Nate provided; moreover, the prediction that "Brazil will defeat Germany" and the estimation that "Brazil has a 60% chance of winning" can both be correct or incorrect, so there is no contradiction.

Let's say that I estimate that the Jacksonville Jaguars have a less than 5% chance of winning the next Super Bowl; then I predict that the Jaguars will not win the next Super Bowl. I don't think that I contradicted myself.

"Let's say that I estimate that the Jacksonville Jaguars have a less than 5% chance of winning the next Super Bowl; then I predict that the Jaguars will not win the next Super Bowl. I don't think that I contradicted myself."

Why not? If the chance isn't zero, how can you say they won't win it? You can GUESS they won't win it. You can BET they won't win it. But once you give a percentage, it's meaningless to say they will or they won't.

The burden of proof is on Nate, or anyone else who constructs a model, to prove how good it is. Anything less than that, and people are just arguing for the sake of arguing. This one game does not prove anything. I am confident that Nate's model is bad, but the burden of proof is not on me. I have a feeling it was nothing more than a fun exercise and we won't see much from it.

I'm both a big fan of Nassim Taleb's, and not a fan of his snarky/extremist Twitter feed. So while I don't really agree with his critique, I think it's worth considering: "@nntaleb · Jul 11: Extreme score GER v BRA: 'empirical' past underestimates future tail events, a property missed by mechanistic statisticians & other idiots."

And the followup tweet: https://twitter.com/nntaleb/status/487640135761395712

I post this mostly because Taleb is smarter than I am, so while this tweet doesn't make much sense to me, I'm curious what other smart people make of it.

Nice post - and I agree with you for this game. The issue to me with Nate's model (and the values published by Goldman Sachs) wasn't the odds for this particular game, but the size of the HFA being given to Brazil throughout the entire tournament. Once Brazil had qualified from their group, Silver and Sachs both had Brazil as 45+% to win the whole tournament. .82^4 = .452, so in at least one game they both had to have Brazil as >82% to win.
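The .82^4 arithmetic in the comment above generalizes: if four per-game win probabilities multiply out to 45 percent, at least one of them must be at least the geometric mean, 0.45^(1/4). A one-line check (my own sketch, not the 538 or Goldman model):

```python
p_tournament = 0.45   # Silver/Goldman: Brazil ~45% to win it all
n_games = 4           # four knockout games after the group stage

# If the product of four per-game probabilities is 0.45, at least
# one of them must be >= the geometric mean of the four:
p_per_game = p_tournament ** (1 / n_games)
print(round(p_per_game, 3))   # 0.819 -- i.e., >82% in at least one game
```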

In the '13-14 Premiership season these were the games where the home team was an 82-84% favourite to win if we remove the draw as a result (the minimum being set in at least one of the games)

This is kind of a crude way of doing it but the home and away teams in that list finished the season with an average league position of 6th and 14th, respectively. So let's say that's the equivalent to a game between Tottenham and Sunderland (the teams who finished the season 6th and 14th, respectively).

In a neutral ground game between Brazil and Tottenham I think Brazil would be favourites, but not overwhelmingly so - Spurs have solid international players throughout their lineup. However in a neutral venue game between one of Germany/Netherlands/USA/Uruguay/other knockout team and Sunderland I think that just by considering the number of Sunderland players who made it to the world cup we should be able to say (assuming that national team coaches are even only mediocre at assessing talent) that Sunderland would be heavy underdogs.

And therein lies the problem for me. To square the circle to the extent that the two of those statements are true and yet Brazil would still be considered 82% to advance against one of the other knockout teams means that the HFA being afforded to Brazil is incredibly (and to my eye unreasonably) large.

FWIW I also wrote this prior to the tournament here http://jameswgrayson.wordpress.com/2014/05/29/a-quick-note-on-the-absurd-goldman-sachs-world-cup-predictions/

I'm not saying that they won't win it; I'm saying that my best guess is that they won't win it.

---

"But once you give a percentage, it's meaningless to say they will or they won't."

Yes, it would be meaningless to give a percentage and then say that they will or they won't. But that's not what happened in the Jaguars example, or at least that's not what I meant: the intention of that example was to give a percentage and then use that percentage to inform a best guess about what will happen.

If I estimate that there's a 90% chance of rain, it's not meaningless or contradictory to grab an umbrella: my choice is umbrella or no umbrella, so I need to translate my percentage into a dichotomous umbrella/no umbrella choice. Taking an umbrella does not mean that I put a 100% probability on rain: it means only that my estimation of the probability of rain was over the threshold necessary for me to take an umbrella.

---

I think that I am operating with a different definition of "predict" than you are. You seem to be interpreting "predict" as "guarantee", in the sense that translating Nate's model's estimation of probabilities into a prediction about who will win the game requires changing Brazil's 65% probability into a 100% probability. I agree that such a change is not correct: no one should perceive that Nate's model predicted a Brazil win with 100% probability.

But I think that it is tolerable to use "predict" so that "Nate's model would predict the outcome as a Brazil win" is understood as Brazil being the model's best guess about which team would win the game; in this sense, "Nate's model would predict the outcome as a Brazil win" is equivalent to "Nate's model has Brazil as the favorite": in both cases, the model's probabilities are translated into a best guess about which team will win. (I'm not recommending this, but I think that it's tolerable.)

Now, it can be argued that "predict" in this sense can lead to confusion; that's reasonable.

It can also be argued that there is no need in this case to translate a 65% probability into a best guess; that's reasonable, too.

---

We do have points of agreement: I'd agree that it's not correct to use the outcome of the Brazil-Germany game to evaluate the model's estimation of Brazil's chances at 65% or to use the outcome in isolation to evaluate the model; if the model's underdogs never win, we should actually be suspicious that the model is underestimating probabilities for the favorites.

"Nate's model would predict the outcome as a Brazil win" is something I would not agree with.

Nate doesn't necessarily want to guess who wins. In fact, he's sure if he does, he will be wrong at least 35% of the time.

I don't agree that the percentage "informs a best guess about what will happen." The percentage IS the best guess about what will happen. Anything beyond the percentage is unpredictable (by definition, or by Nate).

The article is wrong in stating that the market price was 50-50, or even Germany favored 51-49%. True, the market opened 50-50 and ticked a bit towards Germany early, but late money was mostly on Brazil. The closing line at Pinnacle was Brazil +161, Germany +214, Draw +219. Too bad I do not have the “to advance” closing line, but with such numbers it should be about Brazil -120, Germany +110, meaning about 53% for Brazil.

"Anything beyond the percentage is unpredictable (by definition, or by Nate)."

Sorry if I am being dense (it's not intentional), but I do not understand how someone can predict a percentage but cannot predict an outcome; I understand and agree with you if your definition of "predict" is restricted to "know for certain beforehand" or "make an educated guess about something that can be known beforehand." (I think that definition is too restrictive*, but at least I'd understand your line of thought.)

* If someone says "it's going to rain tomorrow," I have no trouble calling that a prediction.

---

"Nate doesn't necessarily want to guess who wins."

That's fair, but I think that Nate would have preferred that his model had Germany as the favorite.

I can understand what I perceive to be your argument from the post, that Brazil losing was consistent with Nate's estimation of probabilities and therefore there is no reason to think that Nate got things wrong ("you have no basis even for *questioning* Nate's prediction, if your only evidence is the outcome of the game").

But, in that case, if for some reason Brazil and Germany played another game tomorrow in the exact same circumstances with the exact same players, my understanding of your argument is that Nate should still predict a Brazil win with 65% probability, because "the 7-1 outcome is NOT, in and of itself, sufficient evidence (or even 'good' evidence) that Nate's prediction was wrong." In other words, when predicting the outcome of the second game, Nate should ignore the fact that Germany beat Brazil 7-1 in the first game.

If someone says, "it's going to rain tomorrow," what they're *really* saying is, "there's a high probability that it's going to rain tomorrow." So, if Nate says, "there's a 65% chance that Brazil is going to win tomorrow," he's already said what you "want" him to say, just in more detail.It's not necessarily true that Nate wishes he had picked Germany as the favorite. If he picks Brazil to win 65% of the time, he's most accurate when Brazil wins ONLY 65% of the time. If he picks Brazil at 65%, and Brazil wins 100%, his 65% predictions are very poor.

As Tango said, "Nate Silver is more right when he's sometimes wrong." http://tangotiger.com/index.php/site/comments/nate-silver-is-more-right-when-hes-sometimes-wrong-rather-than-when-hes-nev

If Brazil had to play Germany again, Nate would update his model, and, no doubt, he'd give Brazil less than a 65% chance. But not much less, I'd think, the same way that after Baltimore lost 30-3, their odds for future games didn't change much.

I understand the point about Nate wanting his probabilities to match reality: if Nate pegs each favorite at 65%, then Nate should prefer that his favorites win 65% of the time, because that provides good evidence that Nate has correctly identified the probabilities.

Maybe I'm thinking about this wrong, but I think that Nate might prefer this scenario: Nate's 65% favorites consistently win 80% of the time; Nate adds 15 percentage points to the favorites' probabilities; Nate's 80% favorites now consistently win 80% of the time, and Nate is happier than he would be in the world in which his 65% favorites win 65% of the time.

That's not the most realistic example, but the basic idea is that Nate would prefer to be wrong now if it meant a higher ceiling on how well he could estimate probabilities for World Cup games.

---

Thanks for engaging my comments: I appreciate having to think deeper about things.

Another 'semi-empirical' counter to Nate's model is this list: http://www.theguardian.com/news/datablog/2013/dec/24/world-best-footballers-top-100-list-2013-lionel-messi

It is a reasonably well-respected list of the top 100 footballers in the world as of the start of the year, i.e., not influenced by the WC. In the top 50 there are 9 Germans and (just) 4 Brazilians -- you need to exclude Costa, as he elected to play for Spain.

That balance suggests that, on neutral ground, Germany should win. Factoring in home-field advantage (HFA), I can believe you get to a 50:50 split, which is where the markets were. I don't think HFA swings you to 65:35.

"To be fair to Nate Silver + 538, their model on the whole was excellent. It's how they dealt with Brazil where I (and others) had problems."

"What kind of problems? Not picking them to lose 7-1?"

What this person is probably referring to is the HFA given to Brazil, and the too-high odds assigned to them at the outset to win the whole World Cup -- odds that differed from the bookmakers' and led to hard-to-believe implied probabilities, like 100% to win the group and 82% to win each knockout-stage game.

I have to disagree with the thrust of your article. I have many observations.

1) The fact that it was 7-1 is significant, because Nate himself admits his model is Poisson-based and treats goals as independent events. He himself said his model had different probabilities for specific scorelines, and this one (in terms of goal difference, not 7-1 specifically) was 4,000:1. Sure, that can happen; after all, it should happen once in 4,000 times. The point is, when a model with very little past track record throws up a 4,000:1 outlier, the more likely inference is that the model was flawed, not that the result was so unusual.

2) The fact that Nate's model had done so well in the knockouts isn't so relevant, because all his model really did was identify the favourite. Over a 15-game sample when favourites did well, of course his model did too. In none of the games was it even close, so ANYONE could have identified the favourites.

3) It seems needlessly confrontational to challenge those who say they predicted the Netherlands and Germany wins by demanding that they state how much they won betting. For the record, I did win those bets.

4) I am disappointed that you chose to address the really weak arguments against Nate's model rather than the real criticisms, which are:

a) It assumes goals are independent events when they are not.

b) It uses transfer value as a proxy for player skill, which is hugely flawed, since transfer value reflects age, nationality, contract status, and the like to a very large degree.

c) It uses data from games so far in the past that none of the current players took part. Brazil was rated so highly because of the efforts of Brazilians who haven't been on a pitch in years.
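To make point 1) concrete: here is how an independent-Poisson model prices a big margin. The scoring rates below are hypothetical, purely for illustration; they are not the SPI's actual parameters:

```python
import math

def pois_pmf(k, lam):
    """Poisson probability of exactly k goals at scoring rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def p_margin_at_least(m, lam_winner, lam_loser, max_goals=25):
    """P(winner's goals - loser's goals >= m), with goals as independent Poissons."""
    return sum(
        pois_pmf(a, lam_winner) * pois_pmf(b, lam_loser)
        for a in range(max_goals + 1)
        for b in range(max_goals + 1)
        if a - b >= m
    )

# Hypothetical rates: underdog scores 1.5 goals/game, favorite scores 1.8.
print(p_margin_at_least(6, 1.5, 1.8))  # roughly 0.001 -- about 1 in 1,000
```

With these toy rates a six-goal underdog margin comes out around 1 in 1,000; the exact figure depends entirely on the rates you feed in, which is why a headline number like 4,000:1 is only as good as the model's inputs.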

1. I'd agree with you that the Poisson model is probably not appropriate for extreme scores. But, as I wrote, that doesn't necessarily mean the probability estimates are wrong. Or, at least, not wrong by much.

2. Agreed.

3. My argument is: if you think it was obvious that Brazil would lose badly to Germany, you should also think it was obvious (although less obvious) that they would lose badly to the Netherlands. I say, show me that you wrote that it was obvious, or that you placed a big bet, instead of telling us in hindsight. That's all.

4. I am not at all trying to address ALL arguments against Nate's model. Just the argument that the 7-1 score proves he's wrong.

For the record, the other arguments do make some sense to me. And, as I wrote, I believe the bookmakers' odds are closer to the true probabilities than Nate's odds. But that's not the topic of this post!

Phil, you make a very important point in your first comment that deserves more prominence in a post this long. From a Bayesian point of view, it's very legitimate to question the SPI (or SPI + Nate's modifications) more after this result than before it.

I think you get why, but let's see if I can explain it simply. Before the BRA-GER game, our prior assumption is a probability distribution of "how good" we think SPI is. To simplify, I'm going to call it a discrete probability that SPI makes a "good prediction" versus a "bad prediction." Because SPI is still very much under development, the probability of a "good prediction" is still far short of one. Let's pretend it's still a pretty high probability that SPI makes a good prediction, say 0.95.

Now, we get the crazy 7-1 result. It would be an improbable result whether SPI made a good prediction or a bad one. Let's take Nate's estimate that a result that dramatic is a 1/20,000 shot if the prediction was good. But let's say it's far less improbable, albeit still improbable, if the prediction was bad -- maybe 1/1000.

Run those numbers through Bayes' rule and the posterior probability of a "good prediction" drops to about 0.49. That's a far cry from our prior probability of 0.95, so we'd probably want to put a lot more effort into verifying the SPI because of this result.

Of course these are just made up numbers, but the point is, yes, you're absolutely correct that the 7-1 doesn't "disprove" the SPI's validity. However, the intuitive conclusion, that something went badly wrong with the SPI prediction and requires attention, is perfectly justifiable.
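Spelling that update out with the made-up numbers from the comment (plain Bayes' rule, nothing more):

```python
# Prior: 0.95 that SPI makes a "good prediction". Likelihood of a 7-1-style
# result: 1/20,000 under a good prediction, 1/1,000 under a bad one.
# (All figures are the made-up ones from the discussion, not real estimates.)

prior_good = 0.95
p_result_given_good = 1 / 20_000
p_result_given_bad = 1 / 1_000

posterior_good = (prior_good * p_result_given_good) / (
    prior_good * p_result_given_good + (1 - prior_good) * p_result_given_bad
)

print(round(posterior_good, 3))  # 0.487 -- nearly a coin flip, down from 0.95
```

So even a modest prior doubt about the model, combined with a result that the model calls far more improbable than its rivals do, shifts a lot of belief toward "the model was flawed."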

And incidentally, your coin-toss model isn't really very applicable, because a 35% underdog winning a soccer game 7-1 is much, much less frequent than the coin coming up seven tails out of eight.

I think the technical reason is that we need a Poisson distribution of something that doesn't happen that often per minute (a soccer goal). One intuitive reason is, soccer games very often end with 3 or fewer goals, but series of eight coin tosses don't end after three or fewer tosses.

They're totally different concepts. Consider using Nate's 1/20000, not an inappropriate binomial model.
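The size of that gap is easy to check. Reading the coin model as "at least seven tails in eight fair tosses" (my reading of the analogy, so an assumption), stdlib Python gives:

```python
from math import comb

# Chance of at least seven tails in eight fair coin tosses:
p_coin = sum(comb(8, k) for k in (7, 8)) / 2 ** 8
print(p_coin)  # 0.03515625 -- about 1 in 28

# Versus Nate's quoted ~1/20,000 for a blowout that extreme:
print(p_coin / (1 / 20_000))  # the coin event is about 700 times more likely
```

Whatever the exact soccer number, the coin event is orders of magnitude more common, which is why the two analogies pull in different directions.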

I've always thought that's an interesting point: a Bayesian inference to estimate *whether the logic is right*. I agree with you, except that I suspect Nate didn't bother tweaking the model for extreme cases because he knew they didn't matter much for the win probability alone. But your point stands.

I would argue that if your best estimate is that team A has a 60% chance of winning, there is NO POINT predicting who's going to win. It would be a random guess. Picking a specific winner for a specific game is NOT what Nate's model is all about.

Nate's model says (to my mind): "The best we can do is say that Brazil has a 65% chance. The rest is unknowable to human minds."

If you want a prediction after that, you're really asking for a RANDOM GUESS.

"Nate's model says (to my mind): 'The best we can do is say that Brazil has a 65% chance. The rest is unknowable to human minds.'"

Agreed. But I would say it like this:

This specific game involving Brazil, to our best estimation, belongs in a class (or category, or set) of games in which Brazil-like teams win 65% of the time. Whether this specific Brazil-like team will win this specific game (and be among the 65% winners), or lose (and be among the 35% losers), no one knows (provided the game is on the up-and-up).

The point is that the % prediction comes from the assignment of the specific game to a class of games, the overall outcome of which is X. Whether the assignment to the particular class is correct or incorrect is a different question.

I would not agree -- but I might be convinced otherwise -- that it's a random guess to pick team A to win if Nate's best estimate is that team A has a 60% chance to win. My understanding is that "random guess" implies an equal chance for team A and team A's opponent. It would be a random guess to pick heads for the flip of a fair coin; but if I am given a choice between [41 or higher] and [40 or lower] for predicting the number of a single draw of a lottery ball from an urn in which 100 balls are numbered from 1 to 100, then that does not seem like a situation in which I'd want to randomly select one of those two options.

I agree that the act of drawing the ball out of the urn would be a random act, but I would not make a random guess about whether the number on the ball is [41 or higher] or [40 or lower]. I'd choose [41 or higher] because that choice has a 60% chance of being correct.

I'm using "random guess" in this sense: if there are N options, then a random guess would be a random selection of option 1 to N.

So I would not agree that, in the ball urn option, random = uncertain. Given the choice between [41 or higher] or [40 or lower], there is uncertainty about which ball will be selected but this uncertainty is not random because the ball selection is biased in favor of [41 or higher]. Random, as I understand it, indicates a lack of certainty and a lack of bias.

"I agree that the act of drawing the ball out of the urn would be a random act, but I would not make a random guess about whether the number on the ball is [41 or higher] or [40 or lower]. I'd choose [41 or higher] because that choice has a 60% chance of being correct."

Then how can you agree the draw is random? Clearly, any ball randomly drawn has a 60% chance of being numbered 41 or higher. By your definition, that would make the draw non-random.

"Random, as I understand it, indicates a lack of certainty and a lack of bias."

By "lack of bias" I take you to mean equal probabilities. Like 50:50 in a coin flip, or 1/6 equally for all outcomes of the toss of one die.

If so, that is where you're making a mistake. Equal probabilities are not required for an outcome to be random. Only uncertainty is (lack of 100% knowledge of what the outcome will be).

Consider the position your concern for unbiased-ness puts you in:

You agree the toss of one die is random, because the possible outcomes have equal probabilities (1/6).

But if we add a second die you must now say the toss is non-random, because the probabilities of the outcomes of tossing 2 dice are not equal: e.g. probability of a 7 showing is 6/36, but the probability of a 2 or 12 showing is, for each, 1/36.

The fact that 7 is favored in the toss of two dice does not detract from the randomness of the outcome. That just affects the odds of 7 showing, not the randomness of 7 showing.

If you want to know what outcomes are non-random, those are the events that no one is offering odds on.

For the ball-from-the-urn example, the draw of the ball is a random event with regard to the number on the ball, but the draw of the ball is not a random event with regard to the characteristic that we focused on [41 or higher].

---

I am using "random" to indicate uncertainty without bias, and you are correct that I am using "lack of bias" to indicate equal probabilities. But I think that you are incorrect that I must therefore say that the second die toss in your example is non-random. Each roll of each die is a random event, but the sum of the dice is not a random number (as you indicate, the sum for two rolls is biased toward 7).

You are using "random" to indicate uncertainty only, but consider two common uses of the word "random": random sample and random number generator.

Suppose that we want to draw a random sample from a group of 500 men and 500 women; my definition of "random" means that a random sample is only the sample in which each person has an equal chance of being selected; but your definition of "random" means that a random sample could include a sample in which we plan to pick 99 men and 1 woman, as long as we are not certain which men and woman will be picked.

Let's say that we are instructed to build a random number generator to select integers from 0 to 50. My random number generator will be built so that each number from 0 to 50 has an equal chance of selection, but I think that you might be fine building a random number generator in which the integers 0 to 49 each have a 1% chance of selection and the integer 50 has a 50% chance of selection.

[I edited the previous deleted comment for grammar and clarity, in places indicated by square brackets.]

Hi Anonymous,

You're more than welcome to use "random" as a synonym for "uncertain", but I don't think that that particular usage is consistent with common uses of the word "random". Here's a definition from random.org: "When discussing single numbers, a random number is one that is drawn from a set of possible values, each of which is equally probable, i.e., a uniform distribution."

Randomness tests have been designed to try to determine whether a sequence of numbers has been produced through a random process or through a non-random process, such as a person calling out numbers off the top of his or her head. For my use of "random", this test makes sense because a random sequence is one in which -- among other things -- one number from a set does not appear more or less than another number from that set, at least not at a non-trivial level; therefore, to assess randomness, it makes sense to assess whether a particular number has appeared more often than other numbers, at least at a statistically significant level.

Correct my logic if I am mistaken, but a randomness test makes no sense with your use of "random": if a person calling out a sufficiently-long sequence calls out twice as many even numbers as odd numbers and ten times as many 6s as 7s, then that is still [] a random sequence [according to your definition], as long as the person calling out numbers did not tell us beforehand which number s/he was going to call. So if "random" is only "uncertain", then what is the purpose of a randomness test?
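A minimal version of the uniformity check those randomness tests perform might look like this (a sketch; 16.92 is the standard 5% chi-square critical value for 9 degrees of freedom):

```python
import random

def chi_square_stat(counts, expected):
    """Chi-square statistic against a flat expected count (higher = less uniform)."""
    return sum((c - expected) ** 2 / expected for c in counts)

random.seed(42)
N = 10000

# Sequence 1: digits 0-9, drawn uniformly.
uniform = [random.randrange(10) for _ in range(N)]

# Sequence 2: still "uncertain", but biased -- 6s drawn far more often than 7s.
biased = random.choices(range(10), weights=[1, 1, 1, 1, 1, 1, 10, 0.5, 1, 1], k=N)

CRITICAL = 16.92  # 5% chi-square critical value, 9 degrees of freedom

for name, seq in (("uniform", uniform), ("biased", biased)):
    counts = [seq.count(d) for d in range(10)]
    stat = chi_square_stat(counts, N / 10)
    print(name, round(stat, 1), "fails" if stat > CRITICAL else "passes")
```

Both sequences are uncertain in advance, but only the biased one gets flagged, which is exactly the distinction being argued about: the test is checking for equal probabilities, not mere unpredictability.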

---

Regarding your example, a craps simulator is not a random number generator because a draw of 2 does not have the same probability as a draw of 12.

I believe you are confusing the odds, the probability, of something happening with the randomness of its happening. They are two different things.

Consider: there have been 10 games played by team X versus teams A, B, C, D, E, F, G, H, I, J. Say X won 5 and lost 5 of the games. Let's try to pick one of the teams they beat.

Whichever team we pick we have a 50% chance of being right. But no matter which team we pick, it's just a random guess, no better or worse than any other guess.

The 50% represents our odds of being right.

The randomness characterizes our ignorance, lack of knowledge, lack of information, that is, our uncertainty, concerning any reason to pick any team instead of any other: D, say, instead of E, or E instead of F, or H instead of A.

Let's say X won 6 of the 10 games. Now the odds give us a 60% chance of being right, but we still have no reason to pick one team and not another. The same odds of being correct pertain to every guess equally. The 60% does not "bias" any choice over any other.

Maybe X won only 1 of 10 games. Our odds of being right have degenerated to 10%, but still there's no more reason to pick A as the team X beat than to pick B, etc.

What if X won 9 of the 10 games? Now our odds greatly improve to 90% chance of being right. But our pick is still random. No matter which team we pick, we'll likely be right (there's only a 10% chance of being wrong), but that in itself is no reason to pick C instead of D, because C has the same 1/10 chance of being wrong that D does.

If there is a choice concerning outcomes (e.g., who did X beat?) and the odds of the choice being right (or wrong) apply to all our possible choices equally (this is where your unbiased-ness comes into play), then it is said that our choice is a random choice, no matter what those odds are (as long as they aren't 0% or 100%).

To make our choices, we could have scribbled the first 10 letters of the alphabet haphazardly on a piece of paper, waited for a fly to land on or near it, and taken the team represented by the letter closest to the fly. Or the letter farthest from the fly. It wouldn't make any difference. And that's true whether only one of the letters leads to a correct choice, or five, or six, or nine of them do.

Concerning random number generators, they can be built for different purposes -- one being as in your description, another being to simulate the fair toss of two fair dice. In that case, you want it to generate six times more 7s than 12s, but to do it randomly. That is, the 6:1 odds of a 7 vs. a 12 showing will be the same on every toss. The bias is simply a matter of fact, part of the description of tossing two dice. As long as we have no reason to believe those odds will be different for this toss than for that toss, we say the tosses are random.

Look at it this way: here are seven letters A, B, C, D, E, F, G. Six of them represent the number 7, one of them represents the number 12, which is which? We are uncertain.

We are certain that six of the letters represent 7 and one represents 12. We are certain that any given letter has a 6/7 bias of being a 7. But since that bias is the same for all we are uncertain of what any given letter actually is. We are not certain that A is a 7 and not the 12. And since that same uncertainty applies to any choice of a letter we make, our choice is random.
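That two-dice generator takes only a few lines of Python, and it shows bias and randomness coexisting (the simulation size is arbitrary):

```python
import random
from collections import Counter

random.seed(1)

# A craps-style generator: each toss is the sum of two fair dice. Every toss
# is random (uncertain in advance), yet the outcome distribution is biased --
# a 7 is six times as likely as a 12.
tosses = [random.randint(1, 6) + random.randint(1, 6) for _ in range(360_000)]
counts = Counter(tosses)

print(counts[7] / counts[12])  # close to 6, reflecting the 6:1 odds of 7 vs. 12
```

No individual toss is predictable, yet the 6:1 ratio emerges reliably -- unequal probabilities, random draws.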

If X won exactly 5 of 10 games, then you are correct that it would be a random guess to pick the winner of any game or of each game: this is a random situation because there is no way that we can purposefully pick the winner of each game with any expectation that we would do better than the random fly that you mentioned. This reflects what I am trying to get at with "random": a lack of bias so that purposeful selection is not expected to be better than fly selection.

But let's use your example where X won 60% of its games vs. [A to J]. There are at least two dimensions of possible randomness here: one with regard to predicting which four teams beat X, and one with regard to predicting the winner for any given game or for each game.

1. You are correct that, if X won 60% of games, there is no way to know which four teams beat X. I would agree that there is randomness in that sense, because we have no information with which to distinguish teams [A to J]. Our purposeful selection would do no better than the fly.

2. But for predicting winners for any game or for each game, we would be expected to do better than the fly if we purposefully picked X to be the winner of any game or each game. In that case, purposeful selection is better than fly selection, so this is not a dimension that I would describe with the term "random" because of the bias toward X.

---

It's similar for your letter example: I'd agree that it would be a random guess to pick which six letters from A to G represent 7; we might as well let the fly pick in that case. But I'd also say that, if you asked me whether D represented 7 or 12, I'd guess 7, and I would have a better chance of being correct than the fly.

---

Imagine an urn that reflects a craps simulator. This urn has one 2 and six 7s, plus the correct number of 3s through 6s and 8s through 12s. If we are drawing balls from the urn, then I would agree that it is a random draw with respect to the individual balls; I'd also agree that it is a random draw with respect to the bias in the numbers that should be returned (7s being six times as likely as 2s, for example). But I would not characterize the draw as random with respect to the numbers on the balls (2 through 12), because the numbers are biased toward 7.

---

I think that we might be stuck on the question of whether "random" must include bias. One way to resolve such a disagreement is through consultation with third parties. I don't want to appeal to authority, so I'll note that the sources below might be incorrect or unrepresentative. But I think that we are discussing what "random" means and not what it should mean, so it might be useful to check our usage against what other people are using the word "random" to indicate.

* Like I mentioned before, random.org uses "equally probable" in its explanation of a random number (http://www.random.org/analysis/).

* Wikipedia's current definition of a simple random sample: "Each individual is chosen randomly and entirely by chance, such that each individual has the same probability of being chosen at any stage during the sampling process, and each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals" (http://en.wikipedia.org/wiki/Simple_random_sample).

* Here's a page on the difference between random error and systematic error, defined as bias (https://onlinecourses.science.psu.edu/stat509/node/26). I don't see why it would be necessary to discuss the difference between random error and bias if randomness could include bias.

* Here's a government source on random assignment in experimental research: "This means that each individual has an equal chance of being assigned to either group" (http://ori.hhs.gov/education/products/sdsu/rand_assign.htm).

"But I'd also say that, if you asked me whether D represented 7 or 12, I'd guess 7, and I would have a better chance of being correct than the fly."

Flies can't speak, but I understand fly language. Whenever one lands near a letter, it's guessing that letter represents 7. If so, how in the world can the fly do any worse (or better) than you?

"But I would not characterize the draw as random with respect to the numbers on the ball (1 through 12) because the numbers are biased toward 7."

I repeat, you confuse the probability of an outcome with the randomness of the outcome. They are two different things.

The only thing that makes the draw random is that we don't know with 100% certainty what the outcome of the draw will be. End of story.

Another way of saying the same thing: A random event is one in which the outcome is a function of probability; i.e., the outcome is not the function of certainty.

A draw could have outcomes, all of which are equally probable, or outcomes that are not equally probable. All it takes to make either draw random is that, when a ball is drawn, we can't predict with 100% certainty what number the ball will be, no matter that number's probability.

99.9% certain is not 100% certain. It would not be inconsistent to say: We have 99.9% certainty of the outcome of a random event.

You cite references in which an equal probability of a random outcome is desired and then say random events are only those in which all outcomes have an equal probability.

This reminds me of someone who says, all fire trucks are red, therefore anything red is a fire truck.

You quote Wikipedia:

"Each individual is chosen randomly and entirely by chance, such that each individual has the same probability of being chosen…"

When it says "Each individual is chosen randomly and entirely by chance…." That's the random part. The selection is a function of probability (chance), not certainty.

But a function of what probability? Well, in this case, "such that each individual has the same probability of being chosen…" That's the probability part that you confuse with the random part.

We can easily think of a different case where we keep the random part and change the probability part.

For example: we run a boys' magazine and want to do a survey. Our readership is 40% boys under ten and 60% boys ten plus. So we say:

"Each boy is chosen randomly and entirely by chance, such that each boy under ten has a 40% probability of being chosen and each boy ten or over has a 60% chance…"

The random part ("Each boy is chosen randomly and entirely by chance") stays the same. The probability part changes, but that doesn't affect the random part.

The selection of boys is random because it's a function of probability, not certainty.