EMC joins forces with Hadoop distributor MapR Technologies

EMC today formally announced a reseller partnership with MapR Technologies, a start-up that plans to sell a proprietary MapReduce product based on Apache Hadoop.

MapR's product offers multiple channels to data via the Network File System (NFS) protocol, which is widely used in network-attached storage today. The company also rearchitected the distributed NameNode, the centerpiece of an HDFS file system. The NameNode is a hierarchical naming system on a distributed database in the same vein as a single domain name space. The rearchitected NameNode offers higher availability, Schroeder said.

MapR said it also eliminated all single points of failure in the Hadoop stack and created an automated failover feature called Job Tracker, which shares application jobs among multiple nodes so that if a primary node fails, the task is automatically picked up by the next available node.
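The failover behavior described above can be illustrated with a minimal Python sketch. It is not MapR's implementation; the function `run_with_failover`, the node names and the `is_alive` health check are all hypothetical, and the sketch assumes only that a task should run on the first node in an ordered list that is still available.

```python
def run_with_failover(task, nodes, is_alive):
    """Run `task` on the first available node in `nodes`.

    `task` is a callable taking a node name; `is_alive` is a callable
    that reports whether a node can accept work. Both are hypothetical
    stand-ins for a real scheduler and health check.
    """
    for node in nodes:
        if is_alive(node):
            # First live node in the list picks up the task.
            return node, task(node)
    raise RuntimeError("no available node to run the task")


# Illustrative use: the primary is down, so the next node takes over.
live_nodes = {"standby-1", "standby-2"}
chosen, result = run_with_failover(
    task=lambda node: f"ran on {node}",
    nodes=["primary", "standby-1", "standby-2"],
    is_alive=lambda node: node in live_nodes,
)
```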

MapR also added data mirroring for business continuity, wide area replication support and data snapshot capability to its software for greater resiliency.

"The only data protection within Hadoop is replication," Schroeder said. "Typically people make three copies of data. That doesn't help you if you have a user or application error."

The snapshot capability allows administrators to roll an application back to a time prior to an error. For example, if an application or user error occurred at 9 a.m., the administrator can roll the application image back to 8:59 a.m.

"It's the same thing you have in any serious storage platform from companies like EMC, HP or NetApp," he said.
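The rollback scenario can be sketched in a few lines of Python: given a sorted list of snapshot timestamps, pick the most recent one taken strictly before the error. This is only an illustration of the idea, not MapR's snapshot mechanism; the function name `latest_snapshot_before`, the snapshot schedule and the date are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime


def latest_snapshot_before(snapshot_times, error_time):
    """Return the newest snapshot taken strictly before `error_time`.

    `snapshot_times` must be sorted in ascending order. Returns None
    if no snapshot predates the error.
    """
    # bisect_left finds the first snapshot at or after the error time,
    # so the entry just before it is the last clean snapshot.
    idx = bisect_left(snapshot_times, error_time)
    if idx == 0:
        return None
    return snapshot_times[idx - 1]


# Hypothetical schedule: the error occurs at 9:00 a.m.
snapshots = [
    datetime(2011, 5, 25, 8, 0),
    datetime(2011, 5, 25, 8, 59),
    datetime(2011, 5, 25, 9, 30),
]
error_at = datetime(2011, 5, 25, 9, 0)
restore_point = latest_snapshot_before(snapshots, error_at)
```

With this schedule the restore point is the 8:59 a.m. snapshot, matching the example in the text.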

Because MapR's file system is more efficient than HDFS, users will achieve two to five times the performance of standard Hadoop nodes in a cluster, according to Schroeder. That translates into being able to use about half the number of nodes typically required in a cluster, he said.

"Hadoop nodes cost about $4,000 per node depending on configuration. If you add in power costs, HVAC, switching, and rack space, you'll probably double that," Schroeder said. "Our product can immediately save you $4,000 and over 8 years it'll save you $8,000 per node."
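Schroeder's arithmetic can be worked through in a short Python sketch. It uses only the figures quoted above ($4,000 per node in hardware, roughly doubled by power, HVAC, switching and rack space) and treats the claimed performance gain as a simple divisor on node count, which is an assumption made here for illustration. The function name `node_savings` and the 100-node cluster are hypothetical.

```python
import math

HARDWARE_COST_PER_NODE = 4_000  # quoted hardware cost per node, in dollars
OVERHEAD_MULTIPLIER = 2         # quoted rough doubling for power, HVAC, etc.


def node_savings(baseline_nodes, speedup=2.0):
    """Estimate savings if a `speedup` lets a cluster shrink proportionally.

    Returns (nodes_removed, hardware_saved, total_saved) in dollars.
    Assumes, for illustration only, that node count scales inversely
    with per-node performance.
    """
    needed = math.ceil(baseline_nodes / speedup)
    removed = baseline_nodes - needed
    hardware_saved = removed * HARDWARE_COST_PER_NODE
    total_saved = hardware_saved * OVERHEAD_MULTIPLIER
    return removed, hardware_saved, total_saved


# A hypothetical 100-node cluster at the low end of the claimed range (2x).
removed, hardware_saved, total_saved = node_savings(100, speedup=2.0)
```

Under these assumptions, halving a 100-node cluster removes 50 nodes, for $200,000 in hardware and $400,000 once operating overhead is included, which works out to the $4,000 and $8,000 per eliminated node cited in the quote.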