Platform Computing Extends HPC Reach Into MapReduce

High-performance computing leader Platform Computing hopes to capitalize on the big data movement by spreading its wings beyond its flagship business of managing clusters and grids and into managing MapReduce environments, as well. As was the case when Platform made its foray into cloud computing in 2009, the news is significant, because Platform has a solid foundation among leading businesses, especially in the financial services industry. If large financial organizations were leery about taking their analytics efforts to the next level, Platform might help spur them along, and it might expand customer choice by drawing other HPC vendors into the MapReduce and Hadoop space.

Technologically, Platform’s forthcoming product, called MapReduce Workload Manager and slated for formal announcement this summer, extends the management capabilities of the company’s flagship LSF and Symphony products to MapReduce systems. Platform’s approach is unique because it doesn’t tie users to any specific MapReduce implementation or file system, but rather lets them mix and match their processing and storage software. It works with Apache’s Hadoop MapReduce and the Hadoop Distributed File System (HDFS), as well as with Apache Hive and Apache Pig (all of which use Java APIs), and also with IBM InfoSphere BigInsights and with Python and C++ MapReduce APIs.
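For readers less familiar with the programming model these workload managers schedule, here is a minimal, framework-free sketch of the MapReduce pattern itself: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. This is illustrative only and not tied to Platform's product or any particular Hadoop implementation.

```python
# Minimal word-count sketch of the MapReduce pattern.
# In a real cluster, each phase would run in parallel across nodes.
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data moves fast", "big clusters process big data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # e.g. counts["big"] == 3, counts["data"] == 2
```

The appeal of a vendor-neutral workload manager is that the same logical job can be expressed against Java, Python or C++ MapReduce APIs; only the scheduling and data-placement layers change underneath.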

On the storage end, MapReduce Workload Manager will also support Appistry CloudIQ Storage Hadoop Edition and IBM General Parallel File System (GPFS), and Platform plans to expand coverage even further in the future. The product makes Platform somewhat competitive with Cloudera, both because it can offer advanced systems management for Apache Hadoop and because it opens the door for users, however unlikely, to avoid choosing Cloudera, or even Apache Hadoop, for any layer of their MapReduce-centric big data stacks.

Any time Platform moves into a new space involving distributed systems, it’s a big deal because of its strong and trusted presence within mission-critical HPC data centers. As Platform VP of Product Management Ken Hertzler explained, the company counts 10 of the top 20 Fortune 500 companies as customers, as well as many other large financial institutions. Hertzler said its financial customers have been demanding this type of product after experiencing deluges of data and analytics tools ill-equipped to deal with all the new information. Most have likely been experimenting with Hadoop to process that data, and now they have a trusted vendor available to make those systems easier to manage, better-performing and more reliable. Platform claims MapReduce Workload Manager can scale to 10,000 nodes per application, handle 17,000 tasks per second and operate at sub-millisecond latency.

I think Platform’s presence in the space also might spur even greater big data participation by other former HPC-only vendors like Univa and Adaptive Computing, which also have expanded their scopes to encompass cloud computing in the past couple of years. The trend appears to be following massively distributed systems wherever they go, and now they’re heading toward big data. Univa actually has a secondary big data play thanks to the Grid Engine software (created by Sun and largely abandoned by Oracle) that it now develops and supports, because Grid Engine can integrate with Apache or Cloudera Hadoop clusters. But Univa doesn’t yet sell a Hadoop- or MapReduce-focused product similar to what Platform plans to offer.

Whatever happens, Platform’s entrance into this space is further validation of Hadoop — which has been driving the big data hype wagon — and further evidence that major banks are very interested in moving their big data efforts into production. Many might do so with Platform, but everyone selling big data tools should look at this as an opportunity to strike while the iron’s hot in an industry known for spending on IT.