Curse of the Keywords

Almost a year ago now, the people of Scotland were deciding whether to become independent from the UK. At the same time, the Bank of England (the UK’s central bank) were trying to determine whether a “yes” vote would trigger a run on Scottish banks. They decided to try to do this by analyzing social data from DataSift. You can read all about their methods, findings and recommendations on using social data for predicting real-world events on their blog.

In the end, Scotland decided against independence and there was no run on the banks (sorry for the spoilers), but the team did have one jumpy moment when they noticed a spike in the volume of posts matching their keyword search. They investigated and realized that the spike had been driven by the term “run” being used in conjunction with “RBS”. Well, nearly. In fact it wasn’t RBS, the acronym for the Royal Bank of Scotland, that was being referenced, but RBs, which is short for “running backs” and was being mentioned during a Minnesota Vikings game. Once they discovered this, the team were able to make their search terms case-sensitive and eliminate this false positive, no harm done.
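To make the case-sensitivity fix concrete, here is a minimal sketch in Python. It is not the Bank of England's actual query, and the sample posts and the `matches` helper are invented purely for illustration. It only shows how the same two keywords behave with and without case sensitivity on the acronym.

```python
import re

# Hypothetical sample posts, invented for illustration only.
posts = [
    "Worried about a run on RBS if Scotland votes yes",
    "Vikings RBs can't run against this defense",
    "rbs customers queue as rumours of a bank run spread",
]


def matches(post: str, case_sensitive: bool) -> bool:
    """Return True if the post mentions both the acronym and the word 'run'."""
    # Only the acronym is optionally case-sensitive; "run" is always
    # matched regardless of case.
    acronym_flags = 0 if case_sensitive else re.IGNORECASE
    has_acronym = re.search(r"\bRBS\b", post, acronym_flags) is not None
    has_run = re.search(r"\brun\b", post, re.IGNORECASE) is not None
    return has_acronym and has_run


insensitive_hits = [p for p in posts if matches(p, case_sensitive=False)]
sensitive_hits = [p for p in posts if matches(p, case_sensitive=True)]

print(len(insensitive_hits), "posts match when case is ignored")
print(len(sensitive_hits), "posts match when the acronym is case-sensitive")
```

In this toy data, ignoring case pulls in the running backs post, while the case-sensitive version drops it. It also drops the post that writes the bank's name in lower case, which is the trade-off to keep in mind: a stricter keyword removes noise but can discard some signal too.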

This has caused some wry amusement in some quarters of the press (detracting, I think, from some interesting research), but you can’t really blame the guys at the Bank of England for not being experts in what they would call American football. We are all experts in our own worlds, and all of these worlds are represented on social media – how are we to know whether the keywords we are interested in are also a town in Michigan, an Australian soft drink or an Indian cricketer before we start? And even if we do know, how do we go about writing a search string that eliminates the noise and keeps the signal?

Over the course of the last year we have made some steps forward in this area. Most recently we have developed Keyword Relationship Models, which allow you to see how language is really used on social media. Designed to help you find the similar terms you should be using to get a complete picture of the subject you are interested in, they also highlight the Minnesota Vikings-shaped bear traps you didn’t know were there. You can see what I mean if you enter the term “RBS” into our free explorer tool. You immediately see that most of the words that are similar to RBS are related to football; the bank’s hashtag just makes the cut as the tenth closest term.

We also recently announced VEDO Intent, which allows domain experts to build machine learning classifiers without having to involve a data scientist. If they were repeating the exercise now, the bank’s team could train VEDO Intent to classify social media posts based on their own human understanding of the text. This would mean that only relevant items would be tagged as being related to a run on RBS, because the humans know the difference and the machine learns from them. This would rely, though, on the team having the time to train the classifier and having enough relevant content to work with in advance.

The other step forward we have made in the last year to make understanding large audiences easier is the introduction of Facebook topic data. Not only does it have the advantage of being a much larger and more representative data set than public social networks, but it also includes information from the Facebook Graph. The Facebook Graph means that you can filter your results by the topics being discussed in them. So, if the bank had been able to access Facebook topic data back then, they would have been able to add the topic categories of “Bank/Financial Services” and “Bank/Financial Institution” to ensure that all of the results returned were applicable to banking.
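The idea of filtering by topic category can be sketched in a few lines of Python. This is an illustration only and not DataSift's filtering language or the Facebook topic data format: the record layout, the field names `text` and `categories`, the sample posts and the “Sports Team” category are all assumptions made for the example. Only the two banking category names come from the paragraph above.

```python
# The two banking categories named in the post.
BANKING_CATEGORIES = {"Bank/Financial Services", "Bank/Financial Institution"}

# Hypothetical records: the field names and the non-banking category
# are invented for illustration, not taken from any real data format.
posts = [
    {"text": "Is there going to be a run on RBS?",
     "categories": ["Bank/Financial Institution"]},
    {"text": "Our RBs need to run the ball more",
     "categories": ["Sports Team"]},
    {"text": "RBS run of bad headlines continues",
     "categories": ["Bank/Financial Services", "News"]},
]


def is_banking(post: dict) -> bool:
    """Keep a post only if at least one of its categories is a banking one."""
    return not BANKING_CATEGORIES.isdisjoint(post["categories"])


banking_posts = [p for p in posts if is_banking(p)]

for post in banking_posts:
    print(post["text"])
```

Here the keyword match no longer has to carry the whole burden. The football post contains both “RBs” and “run”, but it is excluded because none of its categories is a banking one.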

On this occasion there was no need for the Bank of England to take any action based on their research, but it is great to see organizations as august as the Bank of England (established 1694) embracing and exploring the possibilities that social data present. What I take from this, as someone from a marketing background, is not that the Bank of England need to know more about US sports, but that the applications for social data analytics are only limited by your ambition and imagination.