Category: Humour

This is a guest post by Markus Haverland, Professor at Erasmus University Rotterdam and author of a recent book on research methods.
***

Causal knowledge about the world proceeds by testing hypotheses. The context of discovery precedes the context of justification. We all know that journalists and pundits often do it the other way around: providing an explanation after the fact.

A particularly hilarious example can be found in today’s issue of “Spits”, a Dutch daily newspaper. Anticipating that the result of the US presidential election would arrive after the newspapers went to press, the newspaper prepared for both outcomes. It has turned the back page into a second front page. Depending on the result, the reader is advised to read either the front page or the back page. On both pages the well-known Dutch journalist and former Washington correspondent Charles Groenhuijsen analyses the results. On the “Obama wins” page he explains that it was evident that Obama would win, because he is a better campaigner and Romney’s economic program is inconsistent. On the “Romney wins” page he explains this outcome by stating that, ultimately, the US is a conservative country, that voters were afraid of a turn to the left, of laws against gun possession, and of tolerance towards gay marriage, and that voters thought Obama was not effectively dealing with the economic crisis.

I have been busy over the last few days correcting proofs for two forthcoming articles. One of the journals accepts neither footnotes nor endnotes so I had to find a place in the text for the >20 footnotes I had. As usual, most of these footnotes result directly from the review process so getting rid of them is not an option even if many are of marginal significance. The second journal accepts only footnotes – no in-text referencing at all – so I had to rework all the referencing into footnotes. Both journals demanded that I provide missing places of publication for books and missing page numbers for articles. Ah, the joys of academic work!

But seriously… How is it possible that a researcher working in the XXI century still has to spend his/her time changing commas into semicolons and abbreviating author names to conform to the style of a particular journal? I just don’t get it. I am all for referencing and beautifully-formatted bibliographies but can’t we all agree on one single style? Does it really matter if the years of a publication are put in brackets or not? Who cares if the first name of the author follows the family name or the other way round? Do we really need to know the place of publication of a book? Where do you actually look for this information? Is it Thousand Oaks, London, or New Delhi? All three appear on the back of a random SAGE book I picked from the shelf… Who would ever need to know whether it was Thousand Oaks or London in the first place? Maybe libraries, but they certainly don’t get their data from my references. Obviously, the current referencing system is a relic from very different and distant times when knowing the publishing place was necessary to get access to the book. Now, collecting and providing this information is a waste of time and space.

And yes, I have heard of Endnote and BibTeX, and I do use reference management software. But most journals still don’t have their required styles available for import into these programs. So the publisher doesn’t find it necessary to hire somebody for a few hours to prepare an official Endnote style sheet for the journal, but it demands that all authors spend days reworking their references to conform to its rules?!

And why are there different referencing styles anyways? Can you imagine the discussions that journal editors and publishers have before they settle for a particular referencing style?

– Herr Professor, I must insist that we require journal names to be in italics!
– That’s the most ridiculous thing I have ever heard – everybody knows that journal names are supposed to be in bold, not in italics!
– But gentlemen, research by our esteemed colleagues in psychology has shown that journal names put in a regular font and encircled by commas are perceived as 3% more reliable than others.
– Nonsense! I demand that journal names are underlined and every second one in the list should be abbreviated as well.

And so on and so forth… To remedy the situation I boldly propose a World Congress on Referencing Styles. All the academic disciplines and publishers will send delegates to resolve this perennial problem once and for all. There will be panels like Page Numbers: Preceded by a Comma, a Colon, or a Dash, and seminars on topics like Recent Trends in Abbreviating Author Names. No doubt several months of deliberation will be needed, but eventually the two main ‘Chicago’ and ‘Harvard’ parties will reach a compromise which will be endorsed by the United Nations amid the ovations of the world leaders. The academic universe will never be the same again!

Following my recent post on the project which tries to explain why some video clips go viral, here is a report on Google’s efforts to find the funniest videos:

You’d think the reasons for something being funny were beyond the reach of science – but Google’s brain-box researchers have managed to come up with a formula for working out which YouTube video clips are the funniest.

The Google researcher behind the project is quoted saying:

‘If a user uses an “loooooool” vs an “loool”, does it mean they were more amused? We designed features to quantify the degree of emphasis on words associated with amusement in viewer comments.’

Other factors taken into account are tags, descriptions, and ‘whether audible laughter can be heard in the background’. Ultimately, the algorithm gives a ranking of the funniest videos (with No No No No Cat on top, since you asked).
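
To make the quoted idea concrete, here is a minimal sketch of what an emphasis feature of this kind might look like. It is only an illustration: the function name `lol_emphasis`, the regular expression, and the choice to score a comment by its longest run of o's are all assumptions, not Google's actual features.

```python
import re

# Hypothetical pattern: a standalone "lol" with one or more o's, any case.
LOL_PATTERN = re.compile(r"\bl(o+)l\b", re.IGNORECASE)


def lol_emphasis(comment: str) -> int:
    """Return the longest run of o's in any "lol"-like word, or 0 if none.

    Assumption for illustration: more o's means more emphasis.
    """
    runs = [len(match.group(1)) for match in LOL_PATTERN.finditer(comment)]
    return max(runs, default=0)
```

Under these assumptions a comment containing “loool” scores 3, a longer “lol” scores higher, and a comment with no “lol” at all scores 0. A real system would presumably combine many such features rather than rely on one.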

Now I usually have high respect for all things Google, but this ‘research’ at first appeared to be a total piece of junk. Of course, it turned out that it is just the way it is reported by the Daily Mail (cited above), New Scientist and countless other more or less reputable outlets.

Google’s new algorithm does not provide a normative ranking of the funniest videos ever based on some objective criteria; it is a predictive score about the video’s comedic potential. Google trained the algorithm on a bunch of videos (it’s unclear from the original source what the external ‘fun’ measure used for the training part was) in order to inductively extract features associated with the video being funny. Based on these features, the program can then score any possible video. But these scores are not normative measures, they are predictions. So No No No No Cat is not the funniest video ever [well, it might be, it’s pretty hilarious actually], it is Google’s safest bet that the video would be considered funny.

The story is worth mentioning not only because it exposes yet another case of gross misinterpretation of a scientific project in the news, but because it nicely illustrates the differences between measurement, prediction, and explanation. The newspapers have taken Google’s project to be an exercise in measurement. As explained above, the goal is actually predictive in nature. But even if the algorithm has a 100% success rate in identifying potentially funny videos, that would still not count as an explanation of what makes a video funny. Just think about it – would a boring video become funny if we just put funny tags, background laughter, and plenty of loools in the comments? Not really. In that respect Brent Coker’s approach, which I mentioned in a previous post, has real explanatory potential (although I doubt whether it has any explanatory power).

So, no need to panic, the formula for something being funny is as distant as ever.

P.S. In an ironic turn of events, now that No No No No Cat has gone viral, Google will never know whether the algorithm was very good, or whether everyone just wanted to see the video Google declared the funniest ever. Ah, the joys of social science research!

Over the last year two major Hollywood movies that touch upon the use of big data and sophisticated data analysis hit the big screen. Which, of course, is two more than the mean (or was that the median?). Moneyball shows how crunching numbers helps win baseball games and Margin Call shows how crunching numbers helps ruin financial firms. It’s kind of fun to see Brad Pitt and Kevin Spacey stare at spreadsheets and nod approvingly while having some statistical subtleties explained to them. But watching someone stare at somebody else’s spreadsheets quickly becomes tiresome … which probably explains why Regressing with the Stars, Dotchart Master, and America’s Next Multilevel Model haven’t yet taken over reality TV.

So I was really disappointed to see that a third 2011 movie – The Ides of March – misses a golden opportunity to show the use of big data and sophisticated analysis for winning elections. The movie revolves around the presidential primary campaign of George Clooney (pardon, Governor Mike Morris) and the dirty politics behind the scenes. But for Hollywood in 2011, electioneering is still a game of horse-trading, media spinning and good-ol’ stabs in the back. All these things about election campaigns are probably true, but I was disappointed that there were no fancy graphs plotting approval ratings and prediction market quotes, no real-time election forecasts (or nowcasts) for George Clooney to stare at and nod approvingly, no GIS-supported campaign targeting, not even focus groups, tweets, Facebook pages, not to speak of Google circles. Now, I have never been involved in an election campaign but I would have guessed that some of what political scientists are doing to analyze election outcomes and the effects of various elements of election campaigns has filtered through to campaign managers. But according to The Ides of March, electioneering is still stuck in the 1990s. Someone get Hollywood a subscription to Political Analysis.