In battle for Hadoop, MapR raises $30M

There’s a lot of positioning within the Hadoop community over who has the most contributors to Apache Hadoop and whose distribution is the most open source. Depending on the source, MapR might be singled out as the antithesis of what Hadoop should be. But MapR doesn’t mind the digs: The company is racking up customers and just closed a $30 million venture-capital investment that brings its total funding to $59 million since launching in 2011.

Because Hadoop’s roots are as an open-source project, some members of the Hadoop community are rightfully concerned about keeping it as open as possible. This gives customers more flexibility in moving from product to product, they argue, and could help prevent a technological splintering like the one that hit Unix in the 1980s and significantly slowed that popular operating system’s uptake and rise to ubiquity.

MapR’s feature list

MapR catches some flak because it has made its name pushing a pair of Hadoop distributions (one free and one not) built on the company’s proprietary file system, which it claims is significantly faster than the standard Hadoop Distributed File System (HDFS) that many of its competitors use. Last year, it announced a commercial version of the usually HDFS-based HBase database, currently in beta, that also includes many of MapR’s homegrown improvements around performance and reliability.

According to MapR VP of Marketing Jack Norris, though, the criticisms of its semi-proprietary approach aren’t entirely fair. He told me during a recent call that there are more than a dozen open-source packages within the company’s Hadoop distribution, and noted that allowing data access via NFS is hardly a tool of vendor lock-in.

But at the end of the day, MapR is a business and it’s doing what it can to make money in the new world of big data. If customers want features they can’t get from open-source versions of Hadoop, MapR will gladly supply them. In fact, Norris said, open source is “really not a core issue that comes up during the sales cycle.” (Norris took a more-defensive tone in a discussion about this topic last year: “No one can name the top 5 or 10 engineers on Oracle’s database,” he told me, “and no one really cares.”)

Norris points to a recent blog post from Gartner analyst Merv Adrian in defending his company’s position. Addressing the concern over open source and Hadoop — particularly as it relates to MapR and former OEM partner EMC — Adrian wrote: “Having some components of your solution stack provided by the open source community is a fact of life and a benefit for all. So are roads, but nobody accuses Fedex or your pizza delivery guy of being evil for using them without contributing some asphalt.”

But MapR could just as easily point to its customer list and partnerships as evidence that its approach is working. Norris said its customers in fields such as advertising and retail analyze data on more than 90 percent of the internet population monthly and more than a trillion dollars in transactions every year. (It’s pretty mum on naming customers, although Norris did cite ComScore and Ancestry.com as users.) Both Amazon Web Services and Google have partnered with MapR to boost Hadoop performance on their cloud platforms.

Still, Hadoop is relatively young as a commercial technology, and it’s very early on for Hadoop as an IT market all its own. What customers like now might not be what they like forever, and there’s plenty of competition for those workloads and dollars. When you look at its bigger, better-funded and better-known competitors such as Cloudera, Hortonworks, EMC Greenplum and now Intel, it’s easy to see just how tough a fight MapR has in front of it.

Norris isn’t sweating it, though. “The big major weakness that needs to be addressed [with Hadoop] is the dynamic read/write capability of HDFS,” he told me. As long as the other players keep relying on HDFS at the storage layer, MapR will at least have a strong point of differentiation.