Improvements to Collaborative Filtering Algorithms

Anuja Gokhale

M.S. Thesis
Computer Science Department, WPI
May 1999

Abstract

The explosive growth of mailing lists, Web sites and Usenet news has
caused information overload. It is no longer feasible to search
through all the sources of information available in order to find
those that are of interest to an individual user.

Collaborative filtering systems recommend items based upon opinions of
people with similar tastes. Collaborative filtering overcomes some
difficulties faced by traditional information filtering by eliminating
the need for computers to understand the content of the
items. Collaborative filtering can also recommend articles
that are not similar in content to items rated in the past, as long as
like-minded users have rated them. Unfortunately, collaborative
filtering is not effective when too few users have rated an item, or
for users who lack a substantial rating history or a strong
correlation with other users.
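
As an illustration of the idea above, the following is a minimal sketch
of neighbor-based collaborative filtering. It assumes Pearson
correlation as the measure of "similar tastes" and a mean-offset
weighted average as the prediction rule; the abstract names neither, so
both are assumptions, as are the function names and the rating scale.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over items both users rated (0.0 if undefined)."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0  # too little overlap to correlate
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings, user, item):
    """Predict user's rating of item from correlated users' ratings.

    ratings maps user -> {item: rating}. Returns None when no correlated
    user has rated the item (the sparse-data case described above).
    """
    mine = ratings[user]
    my_mean = sum(mine.values()) / len(mine)
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        w = pearson(mine, theirs)
        if w == 0.0:
            continue
        their_mean = sum(theirs.values()) / len(theirs)
        num += w * (theirs[item] - their_mean)
        den += abs(w)
    if den == 0.0:
        return None
    return my_mean + num / den
```

Note that the prediction never inspects what an item contains, and that
a user with no overlapping ratings gets no prediction at all.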

Content-based systems use content to filter or recommend items. These
perform well when users know and specify topics in which they are
interested. Recommendations for a user are based solely on a profile
built by analyzing the content of the items which that user has rated
in the past. Content-based filters suffer from
over-specialization: when the system can only recommend items scoring
highly against a user's profile, the user is restricted to seeing
items similar to those she has already seen. Also, it is often
difficult for content-based filters to understand the meaning of text
or even the actual content of complex items.
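
A content-based filter of the kind described above can be sketched as
follows. The sketch assumes a bag-of-words profile built from items the
user rated highly and cosine similarity as the scoring function; the
abstract does not specify either, and the names build_profile, score
and like_threshold are hypothetical.

```python
from collections import Counter
from math import sqrt


def build_profile(rated_items, like_threshold=4):
    """Sum word counts of items rated at or above like_threshold.

    rated_items is a list of (text, rating) pairs.
    """
    profile = Counter()
    for text, rating in rated_items:
        if rating >= like_threshold:
            profile.update(text.lower().split())
    return profile


def cosine(p, q):
    """Cosine similarity between two word-count vectors."""
    dot = sum(p[t] * q[t] for t in p if t in q)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0


def score(profile, text):
    """Score a new item against the user's profile."""
    return cosine(profile, Counter(text.lower().split()))
```

An item that shares no words with the profile scores zero, however
interesting it might be to the user, which is the over-specialization
problem in miniature.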

We combine the strengths of content-based filtering techniques with
collaborative filtering to provide more accurate recommendations. We
use thresholds to improve the accuracy of traditional filtering
algorithms, and design and implement a way to apply content-based
filtering to an online newspaper. We compare our improved algorithms
to current algorithms in both offline and online experiments and
show that the improvements yield more effective filters that can help
manage the massive amount of information confronting us today.
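
The abstract does not say how the thresholds are applied or how the
content-based and collaborative predictions are combined. One plausible
reading, offered purely as an assumption, is a minimum-correlation
threshold on the users whose opinions are used, plus a weighted blend
that falls back to whichever prediction is available. All names and
default values below are hypothetical.

```python
def select_neighbors(correlations, min_correlation=0.5):
    """Keep only users whose correlation meets the threshold.

    correlations maps user -> correlation with the target user.
    The 0.5 default is an arbitrary illustrative value.
    """
    return {u: w for u, w in correlations.items() if w >= min_correlation}


def combine(collab_prediction, content_prediction, collab_weight=0.5):
    """Blend two predictions; use the other one if either is missing (None)."""
    if collab_prediction is None:
        return content_prediction
    if content_prediction is None:
        return collab_prediction
    return (
        collab_weight * collab_prediction
        + (1 - collab_weight) * content_prediction
    )
```

Under these assumptions, an item that no correlated user has rated can
still be recommended on its content, and an item whose content is hard
to analyze can still be recommended on other users' opinions.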