Smart systems, makers and technology

Recommendation Engine with MongoDB and Mahout

The goal of this article is to show how to set up a very simple recommendation engine on top of MongoDB in combination with Mahout, Apache’s open source machine learning library. The recommender engine / collaborative filtering code inside Mahout was formerly a separate project called Taste and has continued development inside Mahout alongside other Hadoop-based code. Our simple recommendation engine will use a matrix factorization method to estimate the ratings that are missing from a sparse matrix of known user ratings.
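As a rough sketch of what matrix factorization means here (the symbols are notation introduced for this illustration, not names from Mahout): the sparse rating matrix $R$ with $n$ users and $m$ items is approximated by the product of two much smaller matrices, a user-feature matrix $U$ and an item-feature matrix $M$, each with $k$ features per row.

$$
R \approx U M^{\top}, \qquad U \in \mathbb{R}^{n \times k},\; M \in \mathbb{R}^{m \times k},\; k \ll \min(n, m)
$$

$$
\hat{r}_{ui} = \mathbf{u}_u^{\top} \mathbf{m}_i = \sum_{f=1}^{k} u_{uf}\, m_{if}
$$

The factors are fitted only on the ratings that are actually known. Once they are fitted, $\hat{r}_{ui}$ can be computed for any user $u$ and item $i$, including the pairs whose rating is missing in $R$.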

All implementations of the interface org.apache.mahout.cf.taste.recommender.Recommender are intended either to give the user a number of recommended items or to return the estimated rating for a single item. A Recommender is built on an implementation of the DataModel interface, which represents the data repository that holds and persists the user ratings. Mahout ships with a set of DataModel implementations, including file-based repositories as well as Apache Hadoop or MySQL repositories. Since version 0.6, Mahout also includes a MongoDB DataModel implementation (org.apache.mahout.cf.taste.impl.model.mongodb.MongoDBDataModel). The class MongoDBDataModel lives in a second Mahout package called integration; it is not included in the core Mahout package.

MongoDBDataModel represents a DataModel backed by a MongoDB database. This class expects a collection in the database whose documents contain a user ID (long or ObjectId), an item ID (long or ObjectId), an optional preference value and timestamps (“created_at”, “deleted_at”). The first step is therefore to store such rating documents in a collection of your MongoDB.
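A rating document of this kind could look like the following sketch. It uses only the Java standard library and builds the document as a plain map instead of writing to MongoDB. The field names user_id, item_id and preference are assumptions (as far as I know, MongoDBDataModel lets you configure the field names), and the class and method names are made up for illustration.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

public class RatingDocumentExample {

    // Builds one rating document as a plain map.
    // "user_id", "item_id" and "preference" are assumed field names;
    // "created_at" is one of the timestamps the data model works with.
    // "deleted_at" is left out on the assumption that the rating is still active.
    static Map<String, Object> ratingDocument(long userId, long itemId, float preference) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("user_id", userId);
        doc.put("item_id", itemId);
        doc.put("preference", preference);
        doc.put("created_at", new Date());
        return doc;
    }

    public static void main(String[] args) {
        // Hypothetical example: user 1 rates item 101 with 4.5.
        System.out.println(ratingDocument(1L, 101L, 4.5f));
    }
}
```

One document of this shape per rating goes into the collection that the data model reads from.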

After you have created the MongoDB content described above, you can create a new instance of the class MongoDBDataModel. This DataModel instance is the connection to your MongoDB database. It gives the recommendation engine access to all the information it needs about items, users and their ratings.

After successfully creating an instance of the data model, which refers to the ratings collection, you can start to build a simple recommender. In order to calculate the missing ratings we will use the matrix factorization approach, which is implemented in the Mahout SVDRecommender class (org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender). The SVDRecommender requires a specific factorizer implementation. We choose the “Alternating-Least-Squares with Weighted-λ-Regularization” factorization, which is implemented in the class ALSWRFactorizer.
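To make the idea behind this factorization more concrete, here is a minimal, self-contained sketch of alternating least squares with a weighted λ term in plain Java. It is not Mahout's ALSWRFactorizer and does not use the Mahout API. The class name AlsWrSketch, the toy rating matrix (0 marks a missing rating), the seeded random initialization and the parameter values (2 features, λ = 0.05, 20 iterations) are all assumptions made for illustration. Each half-step keeps one side fixed and solves a small regularized least-squares problem per user or per item, with the regularization scaled by the number of known ratings.

```java
import java.util.Random;

public class AlsWrSketch {

    // Solves the small linear system A x = b by Gaussian elimination with partial pivoting.
    static double[] solve(double[][] a, double[] b) {
        int k = b.length;
        for (int col = 0; col < k; col++) {
            int pivot = col;
            for (int r = col + 1; r < k; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] rowTmp = a[col]; a[col] = a[pivot]; a[pivot] = rowTmp;
            double valTmp = b[col]; b[col] = b[pivot]; b[pivot] = valTmp;
            for (int r = col + 1; r < k; r++) {
                double f = a[r][col] / a[col][col];
                b[r] -= f * b[col];
                for (int c = col; c < k; c++) a[r][c] -= f * a[col][c];
            }
        }
        double[] x = new double[k];
        for (int r = k - 1; r >= 0; r--) {
            double s = b[r];
            for (int c = r + 1; c < k; c++) s -= a[r][c] * x[c];
            x[r] = s / a[r][r];
        }
        return x;
    }

    // Recomputes every row of `target` while `fixed` stays constant.
    // ratings[r][c] > 0 means row r rated column c; 0 marks a missing rating.
    static void updateSide(double[][] ratings, double[][] target, double[][] fixed, double lambda) {
        int k = fixed[0].length;
        for (int r = 0; r < ratings.length; r++) {
            double[][] a = new double[k][k];
            double[] b = new double[k];
            int count = 0;
            for (int c = 0; c < ratings[r].length; c++) {
                if (ratings[r][c] <= 0) continue;
                count++;
                for (int p = 0; p < k; p++) {
                    b[p] += ratings[r][c] * fixed[c][p];
                    for (int q = 0; q < k; q++) a[p][q] += fixed[c][p] * fixed[c][q];
                }
            }
            if (count == 0) continue; // no known ratings, keep the old vector
            for (int p = 0; p < k; p++) a[p][p] += lambda * count; // weighted-lambda term
            target[r] = solve(a, b);
        }
    }

    static double[][] transpose(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[0].length; j++) t[j][i] = m[i][j];
        }
        return t;
    }

    // Returns {userFeatures, itemFeatures} after alternating between both sides.
    static double[][][] factorize(double[][] ratings, int k, double lambda, int iterations) {
        Random rnd = new Random(42); // fixed seed so that runs are repeatable
        double[][] users = new double[ratings.length][k];
        double[][] items = new double[ratings[0].length][k];
        for (double[] row : items) {
            for (int f = 0; f < k; f++) row[f] = 0.1 + rnd.nextDouble();
        }
        double[][] byItem = transpose(ratings);
        for (int it = 0; it < iterations; it++) {
            updateSide(ratings, users, items, lambda); // fix items, solve for users
            updateSide(byItem, items, users, lambda);  // fix users, solve for items
        }
        return new double[][][] {users, items};
    }

    // Estimated rating: dot product of a user vector and an item vector.
    static double predict(double[][] users, double[][] items, int u, int i) {
        double s = 0;
        for (int f = 0; f < users[u].length; f++) s += users[u][f] * items[i][f];
        return s;
    }

    public static void main(String[] args) {
        double[][] ratings = {{5, 4, 1, 0}, {4, 5, 0, 1}, {1, 0, 5, 4}, {0, 1, 4, 5}};
        double[][][] model = factorize(ratings, 2, 0.05, 20);
        System.out.println("estimate for user 0, item 3: " + predict(model[0], model[1], 0, 3));
    }
}
```

Running main prints an estimate for a rating that is missing in the toy matrix. In the real setup this work is done by the SVDRecommender and its factorizer on top of the DataModel.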