Tuesday, 8 January 2013

How online recommendations work

I recently wrote a short article for PCQuest on collaborative filtering. It has since been published both in print (Jan 2013) and online. In case you're not able to access the online version, I've created a backup here, or you can read my article below:

It's late at night and you're bored. The television is devoid of entertainment, which is fairly typical. You're in the mood for a movie anyway. The latest release has great reviews, but you're still not sure it lives up to your high standards, so you call a friend who watched it recently. Once it passes the litmus test, you head online and purchase the movie. The movie is engaging and you have a wonderful time.

How is this relevant to your online experience? Online services like Amazon and Netflix make a living acting as your friends, ostensibly helping you out by recommending things to purchase along the way. When you purchase the movie, your information is stored and processed so that it can feed recommendations to you and to others. The better the recommendations, the more likely you are to follow them and purchase the product (at least in theory). In any case, your overall online experience is enhanced and you're pleased with their astute inferences.

This innocuous recommendation feature is in reality powered by sophisticated algorithms and data-crunching machines which reside in Amazon's data centers. Companies spend a great deal of time constantly refining these algorithms.

There are various ways one might implement such a feature. Companies might examine users who are similar to you and use their purchases to serve you recommendations. Alternatively, they might identify items that are similar or correlated.
One popular algorithm to match similar items (very basic and naive) is outlined below:

for each item I1
    for each customer C who bought I1
        for each item I2 bought by customer C
            record purchase C{I1, I2}
    for each item I2
        calculate similarity(I1, I2)
return table
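
The loop above can be sketched in Python. This is only a toy illustration, not Amazon's actual implementation: the `purchases` dictionary, the customer and item names, and the function `build_similarity_table` are all made up for the example, and it assumes each purchase is recorded as a simple yes/no, so the cosine similarity reduces to the number of shared buyers divided by the square root of the product of each item's buyer count.

```python
from collections import defaultdict
from math import sqrt


def build_similarity_table(purchases):
    """Build an item-to-item similarity table.

    purchases: dict mapping a customer name to the set of items they bought
    (a hypothetical input format chosen for this sketch).
    """
    # Invert the data: item -> set of customers who bought it.
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    table = {}
    for i1, customers in buyers.items():
        # For each customer who bought i1, record every other item they bought.
        co_counts = defaultdict(int)
        for customer in customers:
            for i2 in purchases[customer]:
                if i2 != i1:
                    co_counts[i2] += 1
        # Cosine similarity of two yes/no purchase vectors:
        # shared buyers / sqrt(buyers of i1 * buyers of i2).
        table[i1] = {
            i2: shared / sqrt(len(customers) * len(buyers[i2]))
            for i2, shared in co_counts.items()
        }
    return table


# Made-up sample data.
purchases = {
    "alice": {"book", "lamp"},
    "bob": {"book", "lamp", "pen"},
    "carol": {"book", "pen"},
    "dave": {"lamp"},
}
table = build_similarity_table(purchases)
for item, scores in sorted(table.items()):
    print(item, {other: round(s, 3) for other, s in sorted(scores.items())})
```

Only items that share at least one buyer end up with an entry, which keeps the table far smaller than a full item-by-item grid.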

Basically, for each item, the other items bought by the same customers are recorded in a table. This is done for all items, and the information is used to calculate a similarity rating for each pair of items. Each item is represented as a vector of the customers who bought it (I1 and I2, for example), and algorithms like cosine similarity take two such vectors as inputs and produce a similarity rating. Billions of records are processed this way. All the complicated and heavy processing is done in data centers. When you click on an item, Amazon refers to these tables to determine which items to recommend to you. This lookup is relatively fast; building the tables is the slow part.
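
To make the item vectors and the fast lookup concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the three items, the four-customer vectors (1 means that customer bought the item), and the helper names `cosine_similarity` and `recommend`. A real system would work with vastly larger and sparser data.

```python
from math import sqrt


def cosine_similarity(v1, v2):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = sqrt(sum(a * a for a in v1))
    norm2 = sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


# Hypothetical item vectors: one slot per customer, 1 if that customer bought the item.
item_vectors = {
    "book": [1, 1, 1, 0],
    "lamp": [1, 1, 0, 1],
    "pen": [0, 1, 1, 0],
}

# The slow part, done ahead of time: compare every item with every other item.
similarity_table = {
    i1: {
        i2: cosine_similarity(v1, v2)
        for i2, v2 in item_vectors.items()
        if i2 != i1
    }
    for i1, v1 in item_vectors.items()
}


def recommend(item, table, top_n=2):
    """The fast part: look up the precomputed row and return the best matches."""
    scores = table.get(item, {})
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("book", similarity_table))
```

With this sample data, "pen" ranks ahead of "lamp" for someone looking at "book", because both customers who bought the pen also bought the book.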

It's interesting how such a seemingly simple "customers who bought this also bought that" feature is backed by so much research and complexity. In a world where customer attention is king, every competitive advantage counts.

So the next time you get a recommendation online, think of the lengths that such companies go to in order to get you this information. Don't feel guilty though: this service is not free, because you give them your information to work with too.