Abstract:

Collaborative filtering systems, also called recommender systems, use a database of
user preferences to predict additional topics or products a new user might like. In
this paper we describe several algorithms designed for this task, including
techniques based on correlation coefficients, vector-based similarity
calculations, and statistical Bayesian methods. We compare the predictive
accuracy of the various methods in a set of representative problem domains. We
use two basic classes of evaluation metrics. The first characterizes accuracy
over a set of individual predictions in terms of average absolute deviation. The
second estimates the utility of a ranked list of suggested items. This metric
uses an estimate of the probability that a user will see a recommendation in an
ordered list. We ran experiments on datasets from three application areas, under
four experimental protocols, scoring each algorithm with both evaluation
metrics. Results indicate that for a wide range of conditions, Bayesian
networks with decision trees at each node and correlation methods outperform
Bayesian-clustering and vector-similarity methods. Between correlation and
Bayesian networks, the preferred method depends on the nature of the dataset,
the nature of the application (ranked versus one-by-one presentation), and the
number of votes available for making predictions. Other considerations
include the size of the database, the speed of predictions, and learning time.
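
As a rough illustration of the two classes of evaluation metrics named in the abstract, the following sketch computes an average absolute deviation over individual predictions and a utility score for a ranked list. The function names, the `neutral_vote` and `half_life` parameters, and in particular the exponential form of the viewing probability are assumptions made for this example; the abstract states only that the second metric uses an estimate of the probability that a user will see a recommendation in an ordered list.

```python
def mean_absolute_deviation(predicted, actual):
    """Average absolute difference between predicted and observed votes."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def ranked_list_utility(votes_in_rank_order, neutral_vote=0.0, half_life=5):
    """Utility of a ranked list, weighted by the chance each item is seen.

    Assumption (not stated in the abstract): the probability that the user
    views the item at a given rank decays exponentially, halving every
    (half_life - 1) positions, so the first item is always seen.
    """
    if half_life <= 1:
        raise ValueError("half_life must be greater than 1")
    total = 0.0
    for rank, vote in enumerate(votes_in_rank_order, start=1):
        view_probability = 0.5 ** ((rank - 1) / (half_life - 1))
        # Only votes above the neutral level contribute to utility.
        total += max(vote - neutral_vote, 0.0) * view_probability
    return total


if __name__ == "__main__":
    # Hypothetical votes on a small scale, for illustration only.
    print(mean_absolute_deviation([4, 2, 5], [5, 2, 3]))
    print(ranked_list_utility([1, 0, 1], half_life=2))
```

The first function suits one-by-one presentation, where every prediction counts equally; the second suits ranked presentation, where items further down the list matter less because the user is less likely to reach them.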