Machine-Learning Platform Certified For Cloudera

George Leopold

(Lassedesignen/Shutterstock)

In the run-up to next week’s Hadoop confab in Silicon Valley, vendors are releasing a flock of automation and other tools aimed at beefing up the mainstream data processing framework. Among them is an attempt to integrate data science with a leading Hadoop distribution via a machine-learning approach.

Boston-based data science automation specialist DataRobot said this week its machine-learning platform designed to fill the data science skills gap has been certified on Cloudera Enterprise 5. The Cloudera platform is used for data management and analytics. DataRobot said the certification means Cloudera users could incorporate machine learning into their analytics operations without additional interfaces or protocols. The approach also eliminates the need for extensive data management skills, the company added.

The machine-learning automation software platform is designed to help deploy new and updated releases of Cloudera Manager using Parcels, the mechanism used by Cloudera to distribute software to a managed cluster.

The automated approach also integrates CSDs, or custom service descriptors, that allow Cloudera Manager to monitor resources used by DataRobot. Cloudera Manager 5 was unveiled last year. It added CSDs as a way for customers to add their own managed services.

DataRobot added that its automation tool also incorporates the Kerberos network authentication protocol for supporting Active Directory and LDAP (Lightweight Directory Access Protocol). In addition, the automation platform is enabled for YARN to provide resource management in multi-tenant environments as well as Apache Spark for in-memory data processing, the company added.

DataRobot claimed it is so far the only independent software vendor certified by Cloudera on Spark, YARN, CSDs and Parcels.

The inclusion of Spark support is timely given Cloudera’s “One Platform Initiative” launched last fall. The initiative is designed to close existing gaps between Spark and Hadoop while giving Spark the enterprise chops needed to be the default engine for workloads in Hadoop. That could help it take the mantle from MapReduce, which for years had been the go-to technology for a range of Hadoop frameworks.

A Cloudera customer noted in a testimonial that the integration of DataRobot’s machine-learning platform with Hadoop would help boost the analytics haul on its data lake running on the Cloudera platform while improving the performance of its data science team.

During Strata + Hadoop World next week in San Jose, DataRobot said it would be demonstrating how its platform automates the use of open-source machine-learning algorithms to build predictive models. The company claims its approach eliminates the need for additional coding, among other manual math tasks.