8 Reasons Big Data Projects Fail

Most companies remain on the big data sidelines too long, then fail. An iterative, start-small approach can help you avoid common pitfalls.

Big data is all the rage, and many organizations are hell-bent on putting their data to use. Despite the big data hype, however, 92% of organizations are still stuck in neutral, either planning to get started "some day" or avoiding big data projects altogether. For those that do kick off big data projects, most fail, and frequently for the same reasons.

The key to big data success is to take an iterative approach that relies on existing employees to start small and learn by failing early and often.

Herd mentality

Big data is a big deal. According to Gartner, 64% of organizations surveyed in 2013 had already purchased or were planning to invest in big data systems, compared with 58% of those surveyed in 2012. More and more companies are diving into their data, trying to put it to use to minimize customer churn, analyze financial risk, and improve the customer experience.

Of that 64%, 30% have already invested in big data technology, 19% plan to invest within the next year, and another 15% plan to invest within two years. Less than 8% of Gartner's 720 respondents, however, have actually deployed big data technology.

That's bad, but the reason for the failure to launch is worse: Most companies simply don't know what they're doing when it comes to big data.

8 ways to fail

Because so many organizations are flying blind with their data, they stumble in predictable ways (including thinking that a data scientist will magically solve all their problems, but more on that below). Gartner's Svetlana Sicular has catalogued eight common causes of big data project failures:

Management resistance. Despite what data might tell us, Fortune Knowledge Group found that 62% of business leaders said they tend to trust their gut, and 61% said real-world insight tops hard analytics when making decisions.

Selecting the wrong uses. Companies either start with an overly ambitious project that they're not yet ready to tackle, or they attempt to solve big data problems using traditional data technologies. In either case, failure is the usual result.

Asking the wrong questions. Data science is a complex blend of domain knowledge (the deep understanding of banking, retail, or another industry); math and statistics expertise; and programming skills. Too many organizations hire data scientists who might be math and programming geniuses but who lack the most important component: domain knowledge. Sicular is right when she advises that it's best to look for data scientists from within, as "learning Hadoop is easier than learning the business."

Lacking the right skills. This one is closely related to "asking the wrong questions." Too many big data projects stall or fail due to the insufficient skills of those involved. Usually the people involved come from IT -- and those are not the people most qualified to ask the right questions of the data.

Unanticipated problems beyond big data technology. Analyzing data is just one component of a big data project. Being able to access and process the data is critical, but that can be thwarted by such things as network congestion, training of personnel, and more.

Disagreement on enterprise strategy. Big data projects succeed when they're not really isolated "projects" at all but rather core to how a company uses its data. The problem is exacerbated if different groups value cloud or other strategic priorities more highly than big data.

Big data silos. Big data vendors are fond of talking about "data lakes" and "data hubs," but the reality is that many businesses attempt to build the equivalent of data puddles, with sharp boundaries between the marketing data puddle, the manufacturing data puddle, and so on. Big data is more valuable to an organization if the walls between groups come down and their data flows together. Politics or policies often stymie this promise.

Problem avoidance. Sometimes we know or suspect the data will require us to take action we'd rather avoid -- as when the pharmaceutical industry declines to run sentiment analysis because it wants to avoid the subsequent legal obligation to report adverse side effects to the U.S. Food and Drug Administration.
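To make that sentiment-analysis scenario concrete, here is a minimal lexicon-based sketch. The word lists, sample comments, and flagging rule are all hypothetical illustrations, not an actual pharmaceutical workflow; real systems use far richer models, but the point is the same: once you run the analysis, the negative results exist.

```python
# Minimal lexicon-based sentiment sketch (illustrative only).
# The word lists, comments, and flagging rule are hypothetical.

NEGATIVE = {"nausea", "dizzy", "headache", "worse", "pain"}
POSITIVE = {"better", "improved", "relief", "great", "fine"}

def sentiment_score(text: str) -> int:
    """Net sentiment: count of positive words minus count of negative words."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def flag_adverse(comments):
    """Return comments with net-negative sentiment -- the kind of
    result that could trigger a reporting obligation."""
    return [c for c in comments if sentiment_score(c) < 0]

comments = [
    "Feeling much better since starting the medication",
    "Constant nausea and headache, worse than before",
]
print(flag_adverse(comments))
# → ['Constant nausea and headache, worse than before']
```

Even a toy classifier like this surfaces actionable signals, which is exactly why an organization wary of the consequences might choose not to look.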

Throughout this list, one common theme emerges: As much as we might want to focus on data, people keep getting in the way. As much as we might want to be ruled by data, people ultimately rule the big data process, including making the initial decisions as to which data to collect and keep, and which questions to ask of it.

Innovate by iterating

Because so many organizations seem hamstrung in their attempts to start a big data project, coupled with the likelihood that most big data projects will fail, it's imperative to take an iterative approach to big data. Rather than starting with a hefty payment to a consultant or vendor, organizations should look for ways to set their own employees free to experiment with data.

A "start small, fail fast" approach is made possible, in part, by the fact that nearly all significant big data technology is open source. What's more, many platforms are immediately and affordably accessible as cloud services, further lowering the bar to trial-and-error.

Big data is all about asking the right questions, which is why it's so important to rely on existing employees. But even with superior domain knowledge, organizations will still fail to collect the right data and to ask pertinent questions at the start. Such failures should be expected and accepted.

The key is to use flexible, open-data infrastructure that allows an organization's employees to continually tweak their approach until their efforts bear real fruit. In this way, organizations can eliminate the fear and iterate toward effective use of big data.


Matt Asay is Vice President of Community at MongoDB. He was previously VP of Business Development at Nodeable. You can reach him at mjasay@mac.com and follow him on Twitter @mjasay.

Great overview. According to a recent IDG SAS survey, knowing which questions to ask and interpreting meaningful insights are two tasks very few organizations feel capable of accomplishing. There needs to be more education around making use of data. At the same time, organizations need to learn to start small, celebrate successes and build on.

Nothing beats the basics, such as an intimate understanding of the data. Other basics exist too. Gartner's research is more than a year old -- ancient, in other words. Gartner's news release did not clarify whether it had 720 respondents or sampled 720, and there was no discussion of response rate. It says "survey of 720 Gartner Research Circle members," which doesn't actually sound like a sample at all. Given the lack of basic information, it is hard to tell whether the small year-to-year differences are meaningful. What Gartner's study says to me is that despite the hype, nothing much changed year to year.

I agree with the test-and-learn approach. Analyzing big data is a crucial element in understanding the big data process and creating strategies. With so much data available to the enterprise, the right test-and-learn approach, applied with appropriate metrics, is key.

This is 100 percent spot on. The term "big data" has already become a buzzword along with "context," and yet only a few people really understand either. The herd mentality point certainly hits home for me as well, because that's what I see most of these companies do: just because everybody is doing it, they think they have to do it too. I am not saying only a few should do big data, but again, it's about understanding why it should be done in the first place. From understanding the vision and/or the goal comes the proper planning that helps identify the right process, people, and technology for big data.

I've enjoyed Matt Asay's writing on big data for some time. I'm really pleased that he offered this contribution to InformationWeek. Practical, real-world advice for an enterprise audience is our stock in trade, so this column is a perfect fit. I particularly like (and echo) the advice to rely on your existing staff to experiment with big data approaches to known problems.

"Sicular is right when she advises that it's best to look for data scientists from within, as 'learning Hadoop is easier than learning the business.'" Interesting to hear an expert like Matt state this so succinctly. We hear the same from some forward-looking CIOs. Are you listening, hiring managers?

Most IT teams have their conventional databases covered in terms of security and business continuity. But as we enter the era of big data, Hadoop, and NoSQL, protection schemes need to evolve. In fact, big data could drive the next big security strategy shift.

Why should big data be more difficult to secure? In a word, variety. But the business won’t wait to use it to predict customer behavior, find correlations across disparate data sources, predict fraud or financial risk, and more.