Deconstructing Recommendations on Spark

This talk focuses on the practical details of building a recommendation engine on top of Spark MLlib's ALS collaborative-filtering algorithm, one that can reliably generate predictions for 25 million users over a space of 5 million products. The work is distinctive in two ways. First, we are able to generate scores for every combination of user and product (125 trillion possible values) on a small 6-node cluster. Second, careful optimization yields an improvement of several orders of magnitude over MLlib's prediction step, with performance scaling linearly as more cores are added to the system. The primary goal is to present the optimizations and parameter tuning necessary to achieve these gains, together with a discussion of the Spark internals that come into play. The talk is aimed at the intermediate Spark developer who wishes to understand the trickier aspects of Spark and how they affect both stability and performance.
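
To give a sense of what scoring every user and product combination involves, here is a minimal sketch in plain Python. It assumes the usual matrix-factorization setup, in which a trained ALS model yields one latent-factor vector per user and per product and a score is the dot product of the two. The factor values, the IDs, and the helper names `dot` and `top_k_for_user` are all hypothetical and chosen for illustration; this is not the optimized prediction step the talk describes, only the brute-force baseline it has to improve on.

```python
import heapq

def dot(u, v):
    """Dot product of two equal-length factor vectors."""
    return sum(a * b for a, b in zip(u, v))

def top_k_for_user(user_vec, product_factors, k):
    """Score every product for one user and keep the k best (product_id, score) pairs."""
    scored = ((pid, dot(user_vec, pvec)) for pid, pvec in product_factors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Hypothetical rank-2 factors standing in for a trained ALS model.
user_factors = {"u1": [1.0, 0.0], "u2": [0.5, 0.5]}
product_factors = {"p1": [2.0, 0.0], "p2": [0.0, 3.0], "p3": [1.0, 0.5]}

# Brute force: every user is scored against every product.
recommendations = {
    uid: top_k_for_user(uvec, product_factors, k=2)
    for uid, uvec in user_factors.items()
}

# Size of the full score space at the scale quoted in the abstract.
total_pairs = 25_000_000 * 5_000_000

print(recommendations)
print(total_pairs)
```

The work grows as the number of users times the number of products, which is why a naive all-pairs pass becomes the bottleneck at 25 million users and 5 million products.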