Amazon: Customers Who Bought Related Items Also Bought

Perhaps Amazon has had this feature for a while, but today, for the first time, I noticed a section labeled “Customers Who Bought Related Items Also Bought” as seen in the screenshot above. I was looking at an unreleased book, which might explain why they couldn’t show me information based on customers who actually bought the item.

Has anyone else noticed this? Am I just late to the party? I tried to find more information online, but nothing showed up. I assume they are using some item similarity measure to assemble a set of related items, and then are basing collaborative filtering on the purchase history associated with that set.

I’m very curious to hear more from anyone who is familiar with this functionality.

14 responses so far ↓

From what I’ve read of this, it’s using item-item collaborative filtering. Most collaborative filtering has a matrix of items-users, trying to figure out what items a particular user would want. This one is using items as both the rows and columns of the matrix, trying to figure out what other items are popular given an interest in a starting item.
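To make the item-item idea concrete, here is a minimal sketch in Python. It assumes purchase data arrives as a mapping from user to the set of items bought, and it scores item pairs by cosine similarity over co-purchase counts. The function names and the toy baskets are hypothetical, and nothing here is claimed to match what Amazon actually runs.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def item_item_similarity(baskets):
    """Cosine similarity between items, computed from co-purchase counts.

    baskets: dict mapping a user id to the set of items that user bought.
    Returns a dict: item -> {other_item: similarity}.
    """
    buyers = defaultdict(int)    # item -> number of users who bought it
    together = defaultdict(int)  # (item_a, item_b) -> users who bought both
    for items in baskets.values():
        for item in items:
            buyers[item] += 1
        for a, b in combinations(sorted(items), 2):
            together[(a, b)] += 1

    sim = defaultdict(dict)
    for (a, b), n in together.items():
        score = n / sqrt(buyers[a] * buyers[b])
        sim[a][b] = score
        sim[b][a] = score
    return sim

def also_bought(sim, item, k=3):
    """Top-k items most similar to `item`; ties are broken by item name."""
    ranked = sorted(sim.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]

# Toy purchase history (made up for illustration).
baskets = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "B"},
    "u3": {"B", "C"},
    "u4": {"A", "D"},
}
sim = item_item_similarity(baskets)
print(also_bought(sim, "A"))  # prints ['B', 'D', 'C']
```

Note that an item nobody has bought never appears in the similarity table at all, so `also_bought` returns an empty list for it. That is exactly the gap the "related items" variant has to fill.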

But in this case, the item it’s using has no purchase history. So they must be taking a semi-supervised approach, using some rule-based or statistical similarity measure to identify related items that do have a purchase history, and then combining the collaborative filtering results obtained from those items. I’ve seen this idea described in research papers on data mining, but this is the first time I’ve seen it implemented.
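A rough sketch of that two-step guess, in Python: find content-based neighbours of the new item, then merge the collaborative filtering results of those neighbours. Everything in it is an assumption made for illustration: Jaccard overlap on attribute tags as the similarity measure, linear weighting by that similarity, a fixed number of neighbours, and all of the item names and scores.

```python
def jaccard(a, b):
    """Overlap between two sets of nominal attributes (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def related_items_also_bought(new_attrs, catalog, cf_scores, n_neighbors=2, k=3):
    """Recommend for an item that has no purchase history of its own.

    new_attrs: set of attribute tags for the new item
    catalog:   dict item -> set of attribute tags (items with history)
    cf_scores: dict item -> {other_item: collaborative filtering score}
    """
    # Step 1: pick the items whose attributes are most similar to the new item.
    scored = [(jaccard(new_attrs, attrs), item) for item, attrs in catalog.items()]
    scored = [pair for pair in scored if pair[0] > 0]
    neighbors = sorted(scored, key=lambda p: (-p[0], p[1]))[:n_neighbors]

    # Step 2: add up each neighbour's CF scores, weighted linearly by its
    # content similarity to the new item.
    totals = {}
    for weight, neighbor in neighbors:
        for other, score in cf_scores.get(neighbor, {}).items():
            totals[other] = totals.get(other, 0.0) + weight * score

    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]

# Hypothetical catalog and CF output for items that do have purchase history.
catalog = {
    "search_book": {"books", "search"},
    "ml_book": {"books", "machine-learning", "search"},
    "cookbook": {"books", "cooking"},
}
cf_scores = {
    "search_book": {"ir_textbook": 0.8, "stats_book": 0.2},
    "ml_book": {"stats_book": 0.6, "ir_textbook": 0.1},
    "cookbook": {"apron": 0.9},
}
unreleased = {"books", "search", "user-interfaces"}
result = related_items_also_bought(unreleased, catalog, cf_scores)
print(result)  # prints ['ir_textbook', 'stats_book']
```

The sketch does nothing about diversity among the neighbours, and the choice of linear weights is arbitrary. Swapping in a different weighting function or neighbour count changes the ranking, which is why the details matter.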

Couldn’t they also be looking at a correlation of pre-orders or wishlists for the product to seed the correlation? For products that are not yet released, it seems like an actionable substitute to drive results.

Bryan, that’s possible, but in this case I doubt it. I happen to like the book I used as an example (I reviewed it for the publisher), but I doubt enough people know about it to have pre-ordered it. Maybe more people know now, since this post made it onto Techmeme. 🙂

Chris, I did see that page when I tried to research the feature. But it left me with two questions:

1) What RelationshipType values or combination thereof do they use? This is content-driven similarity, not the collaborative filtering for which Amazon is famous. That’s not to say they can’t do both–they obviously do. But I’m curious if they’ve published anything about it.

2) How do they then use the set of content-driven related items as inputs to their collaborative filtering engine? Do they assign weights based on some measure of similarity? How do they account for the diversity of results, which might confound a vector-based approach?

OK, that’s more than two questions. But it gives you an idea of why I find this so interesting. And why I’m surprised not to find anything about it on the web. For all I know, this feature has been available for a while, but no one else seems to have taken the time to notice it.

I am not sure how they might be pulling off the “related items” concept, but I’d be interested in any insights you gain on it, Daniel (so hopefully, you’ll share what you find in a future post). I’ve tried to work out a way to do this with information about employees in an enterprise – I have built a simple solution but not one I’m happy with. I’ve described the work on my blog but haven’t made any progress since that write-up.

Lee, I’ll share what I learn. I think the interesting question is what approach they take to content-based similarity, especially given that their products have nominal rather than numerical attributes. I looked at this problem several years ago; you can find my SIAM Data Mining 2002 paper here:

“Surely this system is just picking the most similar books and giving you their standard collaborative filtering results?”

I don’t doubt it, but that’s a bit underspecified. How many “most similar” books? Do they contribute equal weights, or are they weighted based on the degree of similarity? For that matter, is the weighting linear? Do they do anything to address diversity within the set, i.e., books that are similar to the unpublished item but very different from one another? And how “standard” is their collaborative filtering in the first place?

Regardless, thanks for the pointer to Ruthven’s presentation. Interesting stuff, even if it doesn’t answer the above questions.

And, on an unrelated note, I hope you like the subtitle of that book you were looking at. I’ll vouch for the quality of the contents; my own contributions as a reviewer were cosmetic.