Why small steps in big data count

Computer Weekly

19 June 2014

The hype around big data is based on the premise that there are interesting, useful and valuable correlations yet to be discovered in the large pools of data that slop around most organisations – and that are increasingly available in the public domain.

From this data it may be possible to better predict attendance at concerts, say, from the amount of related Twitter activity two weeks beforehand, or to identify those customers more likely to switch providers at the end of their contracts by taking into account the search terms they used before they signed up in the first place.

High price tag

Unfortunately, all too often the ‘big’ in big data comes with a big price tag because it needs big servers, big teams, big software licences and a big timeframe. And if the value of the useful data correlations is thought to be low or impossible to estimate, then it becomes difficult to justify the cost of embarking on a big project.

Faced with uncertain benefits and high costs, most organisations do one of two things. Either they reach deep into their wallets and hope the hype is true, or they shrug and move on, convinced it’s not for them – and so miss out on the possibility of discovering useful correlations.

A third way

However, there is a third way. Modern cloud tools that offer effectively unlimited storage and processing power can be brought to bear on a problem in a matter of minutes. Take one or two datasets, pour them into a data storage bucket in the cloud, plug in a talented data analyst with a good visualisation tool, and you can start generating useful answers in just days.

Such approaches can sidestep the long lead times and high cost of standing up a large-scale database and focus on quickly proving (or disproving) hunches and hypotheses relating to the data.

Google BigQuery

A proof of concept I ran recently saw us take the entire (anonymised) data warehouse of a national retail chain, put it into Google BigQuery and plug one of our data analysts into it. He was producing useful output within three days and by the end of three weeks was demonstrating interesting, actionable insights related to the changing behaviour of loyalty scheme members.

More excitingly, we were able to combine social media data with revenue figures to demonstrate a valuable correlation between Twitter activity and spend over the following fortnight.
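
For readers who want to try this kind of check on their own data, here is a minimal sketch of a lagged correlation in Python. It is not the analysis from the proof of concept: it assumes activity and spend have already been rolled up into weekly totals, it uses a plain Pearson correlation, the function names are hypothetical, and the figures are invented purely to show the mechanics.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(activity, spend, lag):
    """Correlate activity in week t with spend in week t + lag."""
    if len(activity) != len(spend):
        raise ValueError("series must cover the same weeks")
    if lag < 0 or lag > len(activity) - 2:
        raise ValueError("lag leaves too few overlapping weeks")
    # Drop the last `lag` weeks of activity and the first `lag` weeks of
    # spend so that each activity figure lines up with the later spend.
    return pearson(activity[: len(activity) - lag], spend[lag:])


# Invented weekly figures, for illustration only. Spend from week 3 onwards
# is built as 1000 + 5 * (tweets two weeks earlier), so the two-week lag
# should show a far stronger relationship than the same-week comparison.
weekly_tweets = [120, 80, 200, 150, 90, 300, 110, 170, 60, 240]
weekly_spend = [1500, 1500, 1600, 1400, 2000, 1750, 1450, 2500, 1550, 1850]

same_week = lagged_correlation(weekly_tweets, weekly_spend, 0)
two_weeks_on = lagged_correlation(weekly_tweets, weekly_spend, 2)
print(f"same week: {same_week:.2f}, two weeks later: {two_weeks_on:.2f}")
```

Comparing several lags side by side is a quick way to test a hunch like this before investing in anything larger. A strong figure at one lag is a prompt for further investigation, not proof that one thing drives the other.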

This speed changed attitudes. Instead of debating the merits of spending months and hundreds of thousands of pounds putting in place the tools and infrastructure, we were able to cut straight to the chase and draw out valuable answers in just days.

So don’t get bogged down and fearful that big data means big steps – just get started and learn as you go.