MLflow Opens Up to R

Alex Woodie

Data scientists who work within the R environment can now use MLflow, the open source project that Databricks released earlier this year to help manage workflows associated with machine learning development and production lifecycles.

In June, Databricks co-founder and CTO Matei Zaharia unveiled MLflow as a way to automate much of the work that data scientists do when building, testing, and deploying machine learning models. The open source software was designed to fill the gaps between the various tools, frameworks, and processes used to build machine learning systems, including tracking code, packaging models, and deploying them into production.

According to Databricks, MLflow allows users to package their code as reproducible runs and to execute and compare hundreds of parallel experiments on any hardware or software platform, including on-premises and cloud-based environments. It also assists with hyperparameter tuning.

“It basically helps you move between data prep, building models, and deploying models,” is how Databricks co-founder and CEO Ali Ghodsi described MLflow in June. “It makes reproduce-ability really easy for all these frameworks that are out there.”

MLflow already supported the development languages most commonly used with Apache Spark, including Python, Scala, and Java. Thanks to work done between Databricks and RStudio Inc. — the Boston, Massachusetts-based company behind the open source RStudio package — there is now an R API that hooks into MLflow version 0.7, which was also just launched today at the Spark + AI Summit Europe taking place in London.

Machine learning developers who work in R are better off for the integration, according to JJ Allaire, CEO of RStudio.

“In many organizations, machine learning workflows are far too ad-hoc, with no systematic tracking of experiments, inadequate protocols around reproducibility, and no consistent way to package and deploy models,” Allaire says in a press release. “Integration of R with MLflow will significantly broaden the reach of the project by allowing a broader community to use and contribute to MLflow.”

Zaharia says he’s seen a “flurry of interest and contributions” around MLflow, which he says “validates the need for an open source framework to streamline the machine learning lifecycle.”

MLflow supports the scikit-learn, TensorFlow, Keras, PyTorch, H2O, and Apache Spark MLlib packages. It also supports hosted machine learning environments, including the one from Databricks, in addition to Microsoft’s Azure ML and Amazon’s SageMaker.