This is the blog of Jamie Thomson, a data mangler in London working for Dunnhumby

Want to prove something with data? Aggregate it!

Recently Microsoft announced that they’re releasing a new Xbox, which was apparently big news and was reported on at length around the globe. One article in particular on the BBC News website caught my attention because it contained what I thought to be some really bogus reporting:

“from first appearances tech writers were more impressed by what they saw than the public posting on social media.

Twitter:

Every time you say Xbox One, you feel like to have to add the word "new" in front of it to avoid confusion. @MurraySwe

The one thing I came away with was that Microsoft has trouble with numbering. @FinleyNathan

Am I the only one who thinks the new Xbox One looks like a VCR? @dandahlberg”

Now perhaps it’s true that reaction on social media was negative, but the BBC’s attempt to justify that claim by cherry-picking some random tweets from a sea of millions is, in my opinion, misleading, lazy and pointless. And I said so:

So here’s an idea. Instead of picking random tweets, what if the BBC attempted to gauge opinion by analysing the overall sentiment of those tweets? It’s not even that difficult; there are free services such as What does the internet think that do this for you. Here’s a screenshot I took from that service at the same time as I was reading the BBC report:

What does the internet think is doing something smart but ostensibly very simple: it is aggregating tweets in order to give a better reflection of the sentiment toward “xbox”. Now imagine if the BBC reporter who wrote the above article had chosen to measure the sentiment before and after the announcement. Would that not have given a better reflection of the reaction on social media than a few random tweets? It’s certainly more interesting and newsworthy to me. I couldn’t care less about the opinions of individuals that I have never met, but the overall sentiment, well, that means something. Moreover, I’d like to know whether the announcement has affected that sentiment positively or negatively. As any data analyst (or BI guy) will tell you, it’s not the numbers themselves that are important; whether they are trending up or down matters more.

Now what I would love is a service that showed the trend of sentiment on social media over a given period of time so that I could look back historically. Actually, collecting that data seems like a great use case for a cloud-based ETL tool. I wonder if I could build it with Datasift?
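To make the before-and-after idea concrete, here is a minimal sketch in Python. It assumes each tweet has already been given a sentiment label by some classifier; the sample data, the dates and the function names are all made up for illustration and have nothing to do with how What does the internet think actually works.

```python
from collections import Counter, defaultdict
from datetime import date

LABELS = ("positive", "negative", "indifferent")


def sentiment_share(labels):
    """Return the fraction of each sentiment label in an iterable of labels."""
    counts = Counter(labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        # No tweets at all: report zero for every label rather than dividing by zero.
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


def net_sentiment_by_day(tweets):
    """Group (day, label) pairs by day and return positive share minus negative share."""
    by_day = defaultdict(list)
    for day, label in tweets:
        by_day[day].append(label)
    trend = {}
    for day in sorted(by_day):
        share = sentiment_share(by_day[day])
        trend[day] = share["positive"] - share["negative"]
    return trend


# Hypothetical, hand-labelled sample: one day before and one day after an announcement.
SAMPLE_TWEETS = [
    (date(2013, 5, 20), "positive"),
    (date(2013, 5, 20), "negative"),
    (date(2013, 5, 20), "negative"),
    (date(2013, 5, 20), "indifferent"),
    (date(2013, 5, 22), "positive"),
    (date(2013, 5, 22), "positive"),
    (date(2013, 5, 22), "positive"),
    (date(2013, 5, 22), "negative"),
]

if __name__ == "__main__":
    for day, net in net_sentiment_by_day(SAMPLE_TWEETS).items():
        print(day.isoformat(), f"{net:+.2f}")
```

With this invented sample the net sentiment moves from -0.25 to +0.50. The direction of that change is the interesting part, not either number on its own. The hard work in a real version would be the labelling and the data collection, which this sketch skips entirely.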

At the time of the reveal I … said Microsoft had a store of public trust that could help develop the Xbox One market, and should be nurtured across its other brands. Microsoft has managed the Xbox reveal very well, despite criticism the following day and more recently. Optimism and the reputation for innovation have been enhanced

How did Mr Shaughnessy arrive at this conclusion? He did exactly what I suggest above. He aggregated and analysed sentiment from social media:

I arrived at that conclusion by examining big data around Microsoft sentiment. The data is drawn from tens of thousands of news and social media sources and is filtered by “emotion”.

He also used a graph to show the trend over time. As they say, a picture tells a thousand words:

The chart depicts an uptick of optimism toward Microsoft after the Xbox One announcement.

Take a read of the article for more insights; it really is very interesting. I’m very impressed with what Mr Shaughnessy and Forbes have done here. They sought out evidence based on sentiment, then analysed it to draw a conclusion rather than deferring to perceived popular opinion. The credibility of Forbes has risen in my consciousness (and perhaps with anyone reading this blog post too) and I’ll probably seek out Forbes articles in the future. Certainly in preference to the BBC anyway.

Comments

I logged on to What Does The Internet Think and searched for “xbox” as you did, and received quite different results:

Negative: 39.5% (93k+ hits)

Positive: 54.1% (128k+ hits)

Indifferent: 6.3% (15k+ hits)

If you ran this query the same day you posted that tweet (22nd May), it means the number of tweets has increased more than fivefold in 26 days.

I think this wide variation adds to your point that you can’t just cherry-pick tweets: with opinion swinging this much, you could probably find a tweet that supports any argument. Also, when results are aggregated over large volumes of data, some drill-down capability would give the reader more confidence in them.

Ah, that's interesting. I believe the screenshot above was taken the day after the Xbox Reveal event, which would make it 22nd May, as you say. Now I also visited the same URL (http://www.whatdoestheinternetthink.net/xbox) less than a week ago (today is 6th June for anyone reading this in the future) and the numbers were exactly the same as they were on 21st May. In other words, judging by your observation after visiting the site today, there has been a massive change over the past week or so.

This suggests that http://www.whatdoestheinternetthink.net/ does not update its data on a regular cadence, so perhaps it's not the best source of data here. I guess that's what you get if you accept something for free. Pay peanuts, get monkeys...or something.

So true about noting it's the trend in the data, not just the numbers themselves. I've found quite a few Forbes articles to be worth reading although I only read them when they're linked to by something else (Ars Technica links to them occasionally).

We do hospital financial analysis as a mostly off-the-shelf product. Often it's hard to get things making sense in the client's mind until we've built up a few months' worth of data so we can trend it and help them relate spikes (good or bad) to events they recall in recent memory. The numbers themselves, in the absence of anything to compare to, are rather difficult to interpret.

Similarly, we've been asked to do benchmarking, and problems exist there too but in a different dimension (across businesses rather than time). Knowing how each business actually calculates the number is part of determining if we're really doing an apples-to-apples comparison. Then the numbers can be meaningful, and the hospital in isolation, which may make money in spite of itself, suddenly doesn't look so crash hot when compared with other hospitals that also make money (just a lot more) :)
