If I asked you for the defining characteristic of a big data customer, you'd probably say they're sitting on large amounts of data. If I asked for the defining characteristic of a NoSQL customer, you might answer they require high levels of concurrency.

Well, if that's the total market for NoSQL and big data, then MongoDB, Inc., and the various companies supporting Hadoop should probably shut their doors and call it a day.

In truth, opting for Hadoop is in many ways an economic decision. If a company has deep pockets and daunting amounts of data, then it can throw money at a high-end MPP solution from IBM, SAP, or Teradata -- in fact, most large companies have already made that sort of investment. But not all of us hang out with the 1 percent and light our cigars with $100 bills. Even those that do have to decide "up front" whether a given set of data justifies the exorbitant cost of storing it, rather than keeping everything and deciding what to do with it later.

For the rest of us, Hadoop provides analytics capabilities we couldn't access before. Even the cost of commercially supported "enterprise" distributions of Hadoop amounts to nickels on the dollar compared to, say, IBM Netezza.

NoSQL technologies like MongoDB or Neo4j are also, in effect, economic decisions. If you buy a fat enough server and pay for enough developer time, you can indeed run nearly any document or graph database job in your favorite RDBMS. But developer time is not cheap and server licenses get expensive -- plus, the infrastructure to scale up an RDBMS so that it supports high availability and disaster recovery costs a bundle. No wonder the brighter operations folks like the sound of the NoSQL alternatives: Save money by using commodity hardware, and snap on more servers as needed.

In all but the tiniest companies, it's a myth that your data is "small" and your concurrency requirements are light. If Hadoop and MongoDB were aimed only at the companies that are already capturing massive amounts of data and have millions of users, the market would be much smaller than MongoDB's valuation alone implies.

The dirty secret is that big data and NoSQL vendors aren't just targeting gigantic, consumer-facing companies like Facebook or Google. The technology applies much more broadly, and as the supply of high-concurrency, low-cost, flexible data storage increases, so will demand. If you can hoard all that data cheaply, why not mine it cheaply as well and compete with the big names?

Don't get me wrong -- massively scaled companies with massive amounts of data can and do deploy NoSQL databases and big data tools. But they aren't the only ones who stand to benefit: your IT department has trained itself to throw away data it doesn't think is relevant today. These tools are useful to all of us and change the way we think about data: capture first, analyze later. It's a much bigger market than you think.

This story, "The dirty truth about big data and NoSQL," was originally published by InfoWorld.