Hadoop in the Cloud – Infochimps and VMware

Infochimps is proud to be a part of a new effort launched today by VMware to make big data applications running on Hadoop easier to deploy on virtual and cloud-based IT environments. The Serengeti project, released under the Apache 2.0 license, is built upon a number of open source technologies, including our own Ironfan tool, and supports all major Hadoop distributions, including Cloudera, Greenplum, Hortonworks, and MapR.

Ironfan is the foundation of the Infochimps Platform and the basis of our customers’ Big Data deployments. It makes provisioning and configuring Big Data infrastructure simple – you can easily spin up clusters when you need them and kill them when you don’t, so our customers can spend their time, money, and engineering focus on finding insights, not configuring and deploying machines. Ironfan is quickly becoming the number one deployment tool for Hadoop platforms in the cloud, and this endorsement by VMware and inclusion in Serengeti is further evidence of the popularity of the tool.

What does Serengeti mean for Infochimps users?
From the beginning, the Infochimps Platform has been built on a foundation of open source data management tools that simplify the experience of working with complex technologies such as Hadoop. Within the Infochimps Platform, Ironfan and other tools such as Wukong and Swineherd are major open source components of the stack. And with our enterprise tools, including Data Delivery Service and Dashpot, customers can deploy complete Big Data environments and be assured of highly reliable delivery of data to their Hadoop environments.

The Serengeti project continues our open source tradition with its strong open source foundation and its support for all of the major Hadoop distributions. Within the Serengeti project, Ironfan enables users to configure and deploy Hadoop clusters on top of VMware vSphere® in minutes with a single command. Now, users running VMware’s virtual and cloud infrastructure can more easily take advantage of the power of Hadoop, as well as other Big Data technologies such as the Infochimps Data Delivery Service and Dashpot, along with Infochimps’ big data expertise, to manage, process, and analyze massive amounts of unstructured, semi-structured, or structured data at scale and in the cloud.

We’re excited to be included in Serengeti and look forward to working with VMware customers and partners as they further their use of Big Data technologies.

Interested in learning more about Infochimps, VMware, and Serengeti? Contact us today for more information!