Wednesday, 25 January 2012

Observe, don't just see, Dr Watson

The other day I was reading stories of Sherlock Holmes on the train. The following quite famous quote in which Holmes explains to Dr Watson his extraordinary inference and deduction skills caught my eyes:

Watson: And yet I believe that my eyes are as good as yours.

Holmes: Quite so. You see, but you do not observe...

In my interpretation this means that everybody is capable of seeing the same evidence, but what makes a good detective is the ability to focus on relevant aspects and putting these pieces together to reconstruct stories underlying their observations.

I started thinking about the relevance of this philosophy to data science and data-driven enterprises. Many companies, governments, etc now realise the value of data: the more data they have about their users/citizens, the better position they are in when making decisions - in theory. Take a look at for example the bazillion of startups based on the business model: "People like to engage in activity X. We will make their experience better by leveraging data from social networks". But can they really realise the potential in data? My impression is that many of them can't. These companies see; but they do not observe: they may be sitting on terabytes of data, but they're incapable of using it to tell the right story about their users.

For that, they need real data scientists, who are a scarce resource. Not every company can afford the luxury to have an experienced data science team playing with their data. Data scientists are private detectives of the modern world: they are presented with evidence (the data) and they are asked to uncover the hidden story that explains it all. In most cases, Sherlock Holmes had to find out who the murderer was and what their motivations were, data scientists (or rather, the algorithms they build) try to figure out what made you share a link on Facebook; whether your recent credit card transaction was made by you or it was fraud; or figure out from thousands of USB stick descriptions that "four gigabytes" and "4GB" mean the same thing.

For the average reader, the above examples would be somewhat less exciting than any of the murder cases Sir Arthur Conan Doyle's famous fictional consulting detective has ever worked on. But hey, maybe that's only because no-one with sufficient writing talent has picked this up yet: "Twelve Adventures of Hercule Bayes, the Consultant Data Scientist". I'd buy it.