Six challenges for big data

Big data is about more than Hadoop and a bunch of fancy technology: there are some very real organisational barriers too.

Big Data is a bit of a mirage. As soon as you get your head around it, it ceases to exist.

How so? The accepted definition for Big Data talks about exploiting “data sets whose size is beyond the ability of commonly used tools to process it within tolerable time”. By that definition, as soon as you’re comfortably handling the data, it ceases to be big.

Nonetheless, Big Data is clearly trending amongst the tech analysts, and it’s doing so for good reasons. The volume of data we’re handling is growing dramatically: social media, the internet of things, the mass of data produced by smart electric grids, intelligent traffic systems, etc.

90% of the data ever created has been created in the last two years...

And yes, it’s not just about size. Gartner’s “3Vs” (Volume, Velocity, Variety) are all growing. We’re being asked to process data ever more quickly so we can respond to events as they happen, and that data is coming from an ever wider array of channels, sensors and formats.

Our data is fast and complex as well as big.

So let’s all go out and buy Hadoop, and our problems will be solved. Hurrah!

Not so fast. I can see at least six things that are going to get in the way of Big Data in the typical organisation:

1. Infrastructure

Big Data consumes a lot of technical infrastructure: storage, bandwidth, CPU, etc. And it generates highly variable workloads as it does so.

You need lots of infrastructure at some times, very little at others. Fortunately, the Cloud is made for this. The challenge isn’t technical so much as it’s one of finding a reliable cloud vendor, and of getting the economic model right.

Just don’t underestimate how challenging that can be in the current, rather opaque market for cloud services.
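To make the "economic model" point concrete, here's a minimal sketch comparing the cost of provisioning for peak load against paying only for what you use. Every figure in it is invented for illustration (the demand profile, both rates and the function names are assumptions, not real vendor pricing). Rates are in cents to keep the arithmetic exact.

```python
# Hypothetical demand over one day, in servers needed per hour:
# 20 quiet hours followed by a 4-hour burst. Purely illustrative.
hourly_demand = [2] * 20 + [40] * 4

# Assumed prices in cents per server-hour (not real vendor rates).
FIXED_RATE_CENTS = 10      # capacity you own and run all day
ON_DEMAND_RATE_CENTS = 25  # capacity rented by the hour


def fixed_cost(demand, rate):
    """Provision for the peak hour and pay for that capacity every hour."""
    return max(demand) * len(demand) * rate


def elastic_cost(demand, rate):
    """Pay only for the server-hours actually consumed."""
    return sum(demand) * rate


def utilisation(demand):
    """Fraction of peak-provisioned capacity that is actually used."""
    return sum(demand) / (max(demand) * len(demand))


fixed = fixed_cost(hourly_demand, FIXED_RATE_CENTS)
elastic = elastic_cost(hourly_demand, ON_DEMAND_RATE_CENTS)

# Under this simple model, renting wins whenever utilisation is below
# the ratio of the two rates.
break_even = FIXED_RATE_CENTS / ON_DEMAND_RATE_CENTS

print(f"fixed: {fixed} cents, elastic: {elastic} cents")
print(f"utilisation: {utilisation(hourly_demand):.0%}, break-even: {break_even:.0%}")
```

With these made-up numbers the bursty workload is cheaper to rent, even at a much higher hourly rate, because the peak-sized capacity would sit idle most of the day. A steadier workload would tip the balance the other way, which is why the pricing model matters as much as the technology.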

2. Applications

The application stack behind Big Data is complex. Some of it is immature. The Cloudera Hadoop distribution, for example, contains a dozen applications, and some of these are still pretty new.

This creates several challenges: you need to get up several learning curves at once, integrate many tools with your existing application stack, and build a stable operating environment out of these disparate pieces.

3. Skills

You need a deep stack of skills to do Big Data. As well as business specialists (to ask the right questions) and technologists (to tame the infrastructure and applications), you need “data scientists”.

These are the people who understand the statistical algorithms, can drive the visualisation tools, etc. They’re not easy to find. And once you’ve found them, you need to integrate them with the rest of your team, build appropriate reward and reporting structures, and so on.

4. Attitude

Big Data projects operate on a different cycle to traditional ones. It’s not so much “plan then do” as “experiment, learn and evolve”. It requires a mindset that’s attuned to research as much as delivery, yet which is able to temper research with business objectives.

Good Big Data teams will be very tolerant of “failure”. (If 50% of your experiments don’t fail, then you’re probably not testing the boundaries.) And they’ll allocate plenty of capacity to exploring the horizon and trying new stuff.

5. Fragmentation

Most organisational data is highly fragmented. The web team has a bunch of logs. Sales owns some of the customer data. Operations owns some more.

This creates challenges at several levels: syntactic (defining common formats), semantic (agreeing definitions) and political (negotiating ownership and responsibilities).

It also creates data quality problems as no-one’s responsible for the complete picture, so no-one ensures that data is correct, consistent and up to date.

Big Data needs to face all these challenges head on. (As data warehousing did before it. But Big Data has the added complications of semi-structured data and rapidly changing data definitions.)

6. Valuation

You can only do Big Data effectively if you can ascribe clear value to the outcomes. Otherwise you have no way to prioritise activity across your portfolio of experiments and investments.

Yet few organisations are able to put clear valuations on their current data, let alone on the fuzzy web that Big Data exposes.

Of those six challenges, the first two, infrastructure and applications, are fairly straightforward. The tools we need are (largely) there. We just need to learn how to use them and to fine-tune their economics.

It’s in the next two that the challenge lies: building multi-skilled teams with the right attitude. Right now, many Big Data projects are merely playing with the data, exploring the tools and shifting data around within its silos.

If we could build some stable, cross-functional teams and focus them on business-led experimentation, then we’d probably begin to find real value in the data we have stashed away. And along the way, we’d start to break down some of the silos that have grown around our data.

As ever, the real challenge isn’t the technology. It’s shifting our organisations to address the opportunities that Big Data creates.

Comments (2)

DataH

Graham, great insight for helping companies simplify their big data challenges. It is worth mentioning the open source offering from HPCC Systems to help companies with all 6 challenges you outline. As a superior alternative to Hadoop, HPCC Systems' built-in analytics libraries for Machine Learning and integration with other open source tools like Pentaho provide an end to end solution for ETL, Data Mining and Reporting. For more info visit: hpccsystems.com

When thinking about the challenges of "big data", the only thing that I think makes these types of problems "special" is the velocity part, data in and data out real-time.

So much of what's hyped around "big data", especially the volume and variety pieces, has been dealt with for 30 years on mainframes and databases. Yes, there's way more data, but ultimately the same database ETL processes can get it into usable format. Using Map Reduce to process the data faster is a newer technique, but the math and general concept of parallel processing have also been around for some time.

Ultimately, I agree with the final point that the author makes, that the problem is not a technical one, it's a personnel issue. Getting the right people working on the right problem to create value for the business, not just playing with massive hardware for the sake of being "cutting edge".
