The truth about big data: It's more than technology

Big data is a maddeningly broad concept, in part because it represents a new, iterative approach to solving problems

InfoWorld | Jul 24, 2014

Hey, it must be hard to be the only person on the planet who doesn't understand big data.

Actually, that's far from true: You're in good company. While Gartner finds that 64 percent of enterprises are investing in big data, a similar chunk (60 percent) don't have a clue as to what to do with their data.

The real problem isn't one of technology, but of process. The key to succeeding with big data, as in all serious IT investments, is iteration. It's not about Hadoop, NoSQL, Splunk, or any particular vendor or technology. It's about iteration.

Big data, big confusion

Though the number of companies embracing big data projects has grown since 2012 -- from 58 percent of enterprises surveyed to 64 percent -- the level of understanding of exactly what to do with that data hasn't kept pace, as the Gartner data suggests.

This isn't all that surprising, given how hard it is to pull money from data. It's easy to say "actionable insights," but far harder to glean them. That's why data scientists currently outearn workers in most other professions, with an average salary of $123,000 that continues to go up.

Those who do data science well blend statistical, mathematical, and programming skills with domain knowledge, a tough combination to find in any single person. Of these, I'd argue that domain knowledge matters most, because it drives the process of getting value from data, as Gartner analyst Svetlana Sicular hints:

Organizations already have people who know their own data better than mystical data scientists …. Learning Hadoop is easier than learning the company's business. What is left? To form a strong team of technology and business experts and supportive management who create a safe environment for innovation.

That "safe environment for innovation" is one that affords data practitioners room to iterate.

Innovation is iteration

There are at least two major problems with big data projects. The first is that big data isn't a one-off project at all: It's a culture of collecting, analyzing, and using data. Yet many companies consider it, well, a project. As Phil Simon, author of "Too Big to Ignore: The Business Case for Big Data," told me: "Do you think that Amazon, Apple, Facebook, Google, Netflix, and Twitter do? Nope. It's part of their DNA."

How big data becomes part of a company's DNA, however, is the second problem that trips up companies getting into it: They think it's a technology issue. While most great big data technology is open source, building out a big data application isn't as simple as downloading Hadoop or the NoSQL database of your choice. As IDC analyst Carl Olofson highlights:

Organizations should not jump too quickly into committing to any big data technology, whether Hadoop or otherwise, as their solution to a given problem, but should consider all the alternatives carefully and develop a strategy for big data technology deployment.

Such careful consideration happens by iterating. Rather than paying a mega-vendor a mega-check to get started (do this, and you are absolutely doing big data wrong), the right approach is to start small. The trick is to fail fast or, as Thomas Edison is said to have put it, "I have not failed. I've just found 10,000 ways that won't work."

Big data is all about asking the right questions, hence the importance of domain knowledge. But in reality, you'll probably fail to collect the right data and to ask pertinent questions -- over and over again. The key, then, is to use flexible, open data infrastructure that allows you to continually tweak your approach until it bears real fruit.

It's not only about big data

As mentioned above, this iterative approach isn't solely for big data. Ideally, most of IT should follow this approach. As one executive at a Fortune 50 bank told me, "Product stability comes from releasing code more frequently, not less. You want each release to be a non-event, not a major launch." This, of course, is the main idea behind agile development.

Agile development is aided by the influx of data technologies that easily embrace dynamic schemas, such as those supported by Hadoop, as my colleague Dwight Merriman, cofounder of DoubleClick and MongoDB, suggests:

[Modern development is] agile development. We are talking about lots of iterations, lots of really small releases. We have a release each day; then, we change it. The product manager says, "No, that is not exactly what I wanted," and we change it yet again. This notion of iteration has interesting implications for the database and data layer. If you had a new schema migration every day, that would be painful. But if we have something fluid in terms of what is being stored, that fits really well with this notion of iteration.
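To make that point concrete, here is a minimal Python sketch. It is not drawn from MongoDB, Hadoop, or anything Merriman described: the `events` list and the three functions are hypothetical names invented for illustration, and a plain in-memory list stands in for a document store. It assumes records are stored as free-form dictionaries, so a later release can add a field without migrating what was written earlier.

```python
# Hypothetical in-memory "collection" standing in for a document store.
events = []


def record_v1(user, page):
    """First release: store only the user and the page viewed."""
    events.append({"user": user, "page": page})


def record_v2(user, page, referrer):
    """Later release: add a field; records already stored are left untouched."""
    events.append({"user": user, "page": page, "referrer": referrer})


def referrer_counts(records):
    """Count referrers, treating records written before the field existed as 'unknown'."""
    counts = {}
    for record in records:
        key = record.get("referrer", "unknown")
        counts[key] = counts.get(key, 0) + 1
    return counts


record_v1("ann", "/home")
record_v2("bob", "/pricing", "search")
record_v2("cat", "/home", "search")
print(referrer_counts(events))  # {'unknown': 1, 'search': 2}
```

The cost moves from the write path to the read path: nothing is migrated when the product manager changes course, but every reader has to tolerate older records that lack the newer field.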

Agile iteration, in other words, is the heart of innovation today. While technology facilitates this shift, it's more a cultural shift than a technology shift. To innovate, you and your company need to start thinking of data as an essential ingredient of your day-to-day business, not a point project you code and then move on from.

So long as you recognize that this culture will take time to build and must accommodate plenty of failure along the way, you, too, can make big data into big business, as Facebook and Google do.