Statistical insignificance

data mining

Do you think that if you flipped a coin in a mint, it would show heads more often than tails? Imagine we set up a small coin-stadium in or adjacent to the mint where the coin was made, where other coins would sit around watching the coin get flipped. Say we first flipped the coin outside of the stadium a bunch of times and showed that it was roughly 50/50 whether it came up heads or tails. Then we went back to this mint-stadium and flipped the coin 3,879 times, and it turned up heads 2,219 times. With a simple statistical test, you can show that the probability of a 50/50 coin giving this result in the stadium is 0.000000000256%.
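For anyone who wants to poke at the coin numbers themselves, here is a minimal sketch of one such test. It assumes a one-sided test using the normal approximation to the binomial, with no continuity correction; the function name `one_sided_p_value` is made up for illustration. The exact figure depends on which test you pick (one- or two-sided, exact or approximate), so this need not reproduce the number quoted above.

```python
import math


def one_sided_p_value(heads, flips, p_fair=0.5):
    """Return (z, p): the z-score and the chance of at least `heads` heads.

    Assumption: normal approximation to the binomial, one-sided,
    no continuity correction.
    """
    mean = flips * p_fair                          # expected number of heads
    sd = math.sqrt(flips * p_fair * (1 - p_fair))  # binomial standard deviation
    z = (heads - mean) / sd                        # how many sds above expected
    # Upper tail of the standard normal: P(Z >= z) = erfc(z / sqrt(2)) / 2
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


# The stadium flips from the paragraph above: 2,219 heads in 3,879 flips.
z, p = one_sided_p_value(2219, 3879)
print(f"z = {z:.2f}, one-sided p = {p:.3g}")
```

The z-score is the "number of sigmas" that physicists quote, which makes it a handy way to compare results from completely different fields.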

Football is not a coin. However, every team, no matter how good or bad, plays 16 games in the regular season: 8 at its own stadium and 8 at an opponent’s stadium, so a good team plays at home as often as a bad team does. Yet when you run through the stats, the ‘home field advantage’, i.e. that the home team is more likely to win than the away team, is more statistically significant () than the detection of the Higgs boson ().

What I’ve got: 14 years of regular season NFL data (2000-2014) – a few thousand games, half a million plays.

What I’m going to do with it: Try and find which bits of a football game are affected by ‘home field advantage’ in a (fairly) rigorous manner.

Doing stats during an election is like talking in a crowded room. Me doing stats during an election is like whispering lines from the phonebook in a crowded room while everyone else talks through megaphones giving away Amazon gift card codes. I may not have banks of telethonners calling and polling choice constituencies and asking questions like “which party leader would you most like to meet in the smoking area of a club?” but one thing I do have is 47 hours of people tweeting about hating Nigel Farage et al.

What I’ve got: 47 hours of tweets mentioning any of the parties or party leaders with the words ‘hate’ or ‘love’.

What I’m going to do with it: Construct a way of quantifying the hate of each party. Find out each party’s hate factor, each leader’s hate factor and the difference between the two. Plot a graph of hate against time over the full 47 hours.

I’ve got no business writing a blog about statistics. This isn’t going to be zeitgeisty and impactful because I’m neither of those things, and it isn’t going to ‘make statistics fun’ because statistics generally isn’t fun. However, now that we’ve accepted that, we can have some fun by asking questions that the smart people don’t have time for. That’s what gutterstats is about: stupid questions with stupid answers. Today’s question is: who’s hungry? We’re gonna look at the kind of person who tweets about their stomach and see how they differ from the ‘average’ tweeter.