2.
Overview
■ Apache Mahout: Apache-licensed library that aims to provide highly scalable data mining and machine learning
■ its collaborative filtering module is based on Sean Owen's Taste framework
■ mostly aimed at production scenarios, with a focus on
  □ processing efficiency
  □ integrability with different datastores, web applications, and Amazon EC2
  □ scalability: recommendations, item similarities, and matrix decompositions can be computed via MapReduce on Apache Hadoop
■ not used much in recommender challenges
  □ not enough different algorithms implemented?
  □ not enough tooling for evaluation?
  → it's open source, so it's up to you to change that!
13.09.2012 DIMA – TU Berlin 2

10.
Project: novel item similarity measure
■ in the Million Song Dataset Challenge, the winning solution used a novel item similarity measure
■ it would be great to see this measure featured in Mahout as well
■ Task
  □ implement the novel item similarity measure as a class implementing Mahout's ItemSimilarity interface
■ Future Work
  □ this novel similarity measure is asymmetric (in general sim(a, b) ≠ sim(b, a)), so ensure that it is applied in the correct direction in all scenarios
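A minimal sketch, assuming the measure meant here is an asymmetric cosine on binary listening data, s(i, j) = |U(i) ∩ U(j)| / (|U(i)|^α · |U(j)|^(1−α)), where U(i) is the set of users who interacted with item i and α is a tuning parameter (the slide does not name the measure, so this is our assumption). The class and method names are hypothetical and standalone; in Mahout the same computation would sit behind ItemSimilarity's itemSimilarity(itemID1, itemID2) method.

```java
import java.util.Set;

/**
 * Minimal sketch of an asymmetric cosine item similarity on binary (implicit) feedback.
 * Assumption: this is the kind of measure the slide refers to. Hypothetical standalone
 * class; it does not implement Mahout's ItemSimilarity interface.
 */
public class AsymmetricCosine {

    /** Counts the users who interacted with both items, iterating over the smaller set. */
    public static int overlap(Set<Long> usersOfI, Set<Long> usersOfJ) {
        Set<Long> smaller = usersOfI.size() <= usersOfJ.size() ? usersOfI : usersOfJ;
        Set<Long> larger = (smaller == usersOfI) ? usersOfJ : usersOfI;
        int count = 0;
        for (Long user : smaller) {
            if (larger.contains(user)) {
                count++;
            }
        }
        return count;
    }

    /**
     * s(i, j) = |U(i) intersect U(j)| / (|U(i)|^alpha * |U(j)|^(1 - alpha)).
     * alpha = 0.5 yields the ordinary cosine on binary data, which is symmetric;
     * other values of alpha generally give s(i, j) != s(j, i).
     */
    public static double similarity(Set<Long> usersOfI, Set<Long> usersOfJ, double alpha) {
        if (usersOfI.isEmpty() || usersOfJ.isEmpty()) {
            return 0.0; // no co-occurrence information available
        }
        double denominator = Math.pow(usersOfI.size(), alpha)
                * Math.pow(usersOfJ.size(), 1.0 - alpha);
        return overlap(usersOfI, usersOfJ) / denominator;
    }

    public static void main(String[] args) {
        Set<Long> usersOfI = Set.of(1L, 2L, 3L, 4L);
        Set<Long> usersOfJ = Set.of(3L, 4L);
        System.out.println(similarity(usersOfI, usersOfJ, 1.0)); // 0.5
        System.out.println(similarity(usersOfJ, usersOfI, 1.0)); // 1.0
    }
}
```

With α = 1.0 the same pair scores 0.5 in one direction and 1.0 in the other, which is exactly the asymmetry the Future Work item warns about: a caller that swaps the item IDs gets a different answer.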

11.
Project: temporal split evaluator
■ currently, Mahout's standard RecommenderEvaluator randomly splits the data into a training set and a test set
■ for datasets with timestamps, it would be much more interesting to use this temporal information to split the data into training and test set
■ Task
  □ create a TemporalSplitRecommenderEvaluator similar to the existing AbstractDifferenceRecommenderEvaluator
■ Future Work
  □ factor out the logic for splitting datasets into training and test set
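The core of such an evaluator is the split itself. A minimal sketch, assuming a single global time cutoff: all preferences are ordered by timestamp, the oldest fraction goes into training and the most recent ones into test, whereas a random split assigns each preference independently of when it was made. The Pref and Split types and the split method are hypothetical stand-ins, not Mahout classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Minimal sketch of a temporal training/test split (hypothetical class, not Mahout API). */
public class TemporalSplit {

    /** Hypothetical preference record: user, item, rating value and the time it was given. */
    public static final class Pref {
        public final long userID;
        public final long itemID;
        public final float value;
        public final long timestamp;

        public Pref(long userID, long itemID, float value, long timestamp) {
            this.userID = userID;
            this.itemID = itemID;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Holds the two parts of the split. */
    public static final class Split {
        public final List<Pref> training;
        public final List<Pref> test;

        Split(List<Pref> training, List<Pref> test) {
            this.training = training;
            this.test = test;
        }
    }

    /** Puts the oldest trainingFraction of all preferences into training, the rest into test. */
    public static Split split(List<Pref> prefs, double trainingFraction) {
        if (trainingFraction < 0.0 || trainingFraction > 1.0) {
            throw new IllegalArgumentException("trainingFraction must be in [0, 1]");
        }
        List<Pref> sorted = new ArrayList<>(prefs);
        // List.sort is stable, so preferences with equal timestamps keep their input order
        sorted.sort(Comparator.comparingLong((Pref p) -> p.timestamp));
        int cutoff = (int) Math.round(sorted.size() * trainingFraction);
        return new Split(new ArrayList<>(sorted.subList(0, cutoff)),
                new ArrayList<>(sorted.subList(cutoff, sorted.size())));
    }

    public static void main(String[] args) {
        List<Pref> prefs = List.of(
                new Pref(1, 10, 4.0f, 40L),
                new Pref(1, 11, 3.0f, 10L),
                new Pref(2, 10, 5.0f, 30L),
                new Pref(2, 12, 2.0f, 20L));
        Split s = split(prefs, 0.75);
        System.out.println(s.training.size() + " training, " + s.test.size() + " test"); // 3 training, 1 test
    }
}
```

A global cutoff guarantees that no test preference is older than any training preference. Applying the cutoff per user (each user's most recent preferences go to test) would be an alternative design; keeping split as a separate method is in the spirit of the Future Work item of factoring the splitting logic out of the evaluator.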

12.
Project: baseline method for rating prediction
■ port MyMediaLite's UserItemBaseline to Mahout (a preliminary port is already available)
■ user-item baseline estimation is a simple approach that estimates the systematic tendency of each user and each item to deviate from the global average rating (described in Y. Koren: Factor in the Neighbors: Scalable and Accurate Collaborative Filtering, TKDD 2009)
■ Task
  □ polish the code
  □ make it work with Mahout's DataModel
■ Future Work
  □ create an ItemBasedRecommender that makes use of the estimated biases
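A minimal sketch of the baseline idea, following the decoupled estimation from Koren's paper as we read it: predict b_ui = μ + b_u + b_i, where μ is the global mean, item biases are estimated first as b_i = Σ_{u∈R(i)} (r_ui − μ) / (λ_item + |R(i)|), and user biases then as b_u = Σ_{i∈R(u)} (r_ui − μ − b_i) / (λ_user + |R(u)|). The regularization constants shrink biases of rarely rated items and users toward zero. The class and its names are hypothetical; this is neither MyMediaLite's UserItemBaseline nor the preliminary Mahout port, which may estimate the biases differently (e.g. iteratively).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal sketch of user-item baseline estimation (hypothetical class, not MyMediaLite/Mahout code). */
public class UserItemBaselineSketch {

    /** Hypothetical rating triple. */
    public static final class Rating {
        public final long userID;
        public final long itemID;
        public final double value;

        public Rating(long userID, long itemID, double value) {
            this.userID = userID;
            this.itemID = itemID;
            this.value = value;
        }
    }

    private final double globalMean;
    private final Map<Long, Double> itemBiases = new HashMap<>();
    private final Map<Long, Double> userBiases = new HashMap<>();

    /** Estimates the global mean, then item biases, then user biases on the remaining residuals. */
    public UserItemBaselineSketch(List<Rating> ratings, double regItem, double regUser) {
        double sum = 0.0;
        for (Rating r : ratings) {
            sum += r.value;
        }
        globalMean = ratings.isEmpty() ? 0.0 : sum / ratings.size();

        // accumulators per item: [sum of (r_ui - mu), number of ratings]
        Map<Long, double[]> itemAcc = new HashMap<>();
        for (Rating r : ratings) {
            double[] acc = itemAcc.computeIfAbsent(r.itemID, id -> new double[2]);
            acc[0] += r.value - globalMean;
            acc[1] += 1;
        }
        for (Map.Entry<Long, double[]> e : itemAcc.entrySet()) {
            itemBiases.put(e.getKey(), e.getValue()[0] / (regItem + e.getValue()[1]));
        }

        // accumulators per user: [sum of (r_ui - mu - b_i), number of ratings]
        Map<Long, double[]> userAcc = new HashMap<>();
        for (Rating r : ratings) {
            double[] acc = userAcc.computeIfAbsent(r.userID, id -> new double[2]);
            acc[0] += r.value - globalMean - itemBiases.get(r.itemID);
            acc[1] += 1;
        }
        for (Map.Entry<Long, double[]> e : userAcc.entrySet()) {
            userBiases.put(e.getKey(), e.getValue()[0] / (regUser + e.getValue()[1]));
        }
    }

    /** Baseline prediction; unknown users or items fall back to a bias of 0. */
    public double predict(long userID, long itemID) {
        return globalMean + userBiases.getOrDefault(userID, 0.0) + itemBiases.getOrDefault(itemID, 0.0);
    }

    public static void main(String[] args) {
        List<Rating> ratings = List.of(
                new Rating(1, 1, 5.0), new Rating(1, 2, 3.0), new Rating(2, 1, 4.0));
        UserItemBaselineSketch baseline = new UserItemBaselineSketch(ratings, 0.0, 0.0);
        System.out.println(baseline.predict(1, 1)); // 4.75
    }
}
```

With both regularization constants at 0 the example gives μ = 4, b_1 = 0.5 for item 1 and b_1 = 0.25 for user 1, hence 4.75. Estimated biases like these are what an ItemBasedRecommender could subtract before computing neighborhood scores, as the Future Work item suggests.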