Apache Mahout, a machine learning library for Hadoop since 2009, is joining the exodus away from MapReduce. The project’s community has decided to rework Mahout to support the increasingly popular Apache Spark in-memory data-processing framework, as well as the H2O engine for running machine learning and mathematical workloads at scale.

While data processing in Hadoop has traditionally been done with MapReduce, the batch-oriented framework has fallen out of vogue as users demand lower-latency processing for certain kinds of workloads, such as machine learning. Few want to abandon Hadoop entirely, though: it remains a solid store for large volumes of data, and many organizations still run most of their workloads on MapReduce. Spark, which was developed at the University of California, Berkeley, has stepped in to fill that void in a growing number of cases where speed and ease of programming really matter.

Support for multiple data frameworks is yet another reason to learn Mahout.

This entry was posted on Thursday, March 27th, 2014 at 3:24 pm and is filed under H2O, Machine Learning, Mahout, MapReduce, Spark.