Comments on Geeking with Greg: Early Amazon: Similarities
Greg Linden (http://www.blogger.com/profile/09216403000599463072)
Feed updated 2016-12-09T07:22:39.222-08:00

2006-03-23T15:50:00.000-08:00

nathan, that sounds plausible. I like your idea better than mine, actually. Much simpler. It kinda sounds like an IDF (inverse document frequency) weight. Things with near ubiquity contribute very little to "relevance".

Reminds me of the Greiff 1998 SIGIR paper: it is actually items in the middle of the IDF curve that give you the best performance. Very low IDF (words like "the" and "an") are near useless. Very high IDF (words like "snighzimpup") are also near useless. It is words in the mid-range that are the most valuable.

I can see the same thing for books. If a book is purchased near ubiquitously (Harry Potter), it is not very valuable. If a book is purchased only once, and you happen to also buy something that one other person also bought, that can be equally useless, because you've got high potential variance with only a single datapoint.

But for that stuff in the middle, with a medium amount of purchases and a medium amount of "people who also bought" links, I can see why that'd be golden.

Posted by jeremy

2006-03-23T11:19:00.000-08:00

My guess is that one criterion being considered is that the higher the sales rank of a book, the less it counts in the recommendation engine.
If the purchase rate=100%, the recommendation weight=0, and conversely if the purchase rate=.000000001%, then recommendation weight=9.9 (on 10 pt scale, of course).

So, if you bought Harry Potter (imaginary weight=0.1) and One-Legged Poets of the Seventh Century BC (9.8), your recommendations would be far more heavily weighted towards weird poet books. Of course, the disparity would have to be smaller than that, to ensure that if you bought ten wizard books and one poet book, you got more wizard recommendations than poets. So I guess you'd run with a smaller range, say a default of 6 and a max of 10, so a Harry Potter book would be 6.1 and the less popular book would be 9.8, with extra books in a series counting a lot less than a full book. So, if you bought two equally popular wizard books from different series, they'd count 12.2 towards wizard recommendations, while two Harry Potter books would count considerably less, perhaps 8.1 (one full book plus one-third of a book), and one poet book would count 9.8. Weighing all the various numbers against other users' histories would give results you could use, as opposed to a simple similarity system.

Just rambling some ideas...

Posted by Nathan Weinberg (http://www.blogger.com/profile/08092706828301421060)

2006-03-23T08:00:00.000-08:00

I, too, am interested in hearing about the solution, although I realize Greg has absolutely no incentive to share it (and probably some disincentives, in the form of NDAs?).

Posted by Adrian Holovaty (http://www.holovaty.com/)

2006-03-22T18:06:00.000-08:00

So.. any hints on the solution? It seems the first thing I'd try would be to weight my co-occurrences by content. I.e.
rather than doing just a simple purchase "link" analysis, I would weight the links higher or lower, based on the textual or content (objective) similarity of the two media objects in question. Perhaps I'd also throw in metadata, such as category (home:garden, electronics, etc.) where available and appropriate.

Is this kinda along the lines of what you did?

(BTW, I will eventually go non-anonymous, as you asked, once I finally set up a blog. I'm too lazy at the moment. Workin' on it..)

Posted by jeremy

2006-03-22T16:59:00.000-08:00

You know, I've noticed that non-technical people don't understand that sometimes what appears to be simple is actually very difficult, and what appears to be a BIG change is actually easy.

Congrats on getting recognition from the top, that's rare (for most of us in the corporate world).

Posted by Arnie (http://www.blogger.com/profile/01669076739841687326)

2006-03-22T14:50:00.000-08:00

Wow. That would be the highlight of my entire professional career, a billionaire doing that. Wow.

This is exactly why I love the Amazon series of posts.

Posted by Nathan Weinberg (http://www.blogger.com/profile/08092706828301421060)
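
The popularity weighting that nathan and jeremy kick around in the comments above can be sketched roughly as follows. This is a minimal Python illustration of an IDF-style weight applied to purchase histories, not Amazon's actual method, which the thread never reveals. The function names (`idf_weights`, `recommend`), the toy purchase data, and the choice of `log(N / buyers)` as the weight are all assumptions made for the example; nathan's 6-to-10 scale and his series discount are not modeled.

```python
import math
from collections import defaultdict

def idf_weights(histories):
    """Map each item to log(N / n_buyers); items nearly everyone bought get ~0."""
    n_users = len(histories)
    buyers = defaultdict(int)
    for items in histories.values():
        for item in set(items):
            buyers[item] += 1
    return {item: math.log(n_users / count) for item, count in buyers.items()}

def recommend(user, histories, weights, top_n=3):
    """Score unseen items by the summed weight of purchases shared with each other user."""
    mine = set(histories[user])
    scores = defaultdict(float)
    for other, items in histories.items():
        if other == user:
            continue
        theirs = set(items)
        # Shared ubiquitous items add almost nothing to the overlap.
        overlap = sum(weights[i] for i in mine & theirs)
        if overlap == 0:
            continue
        for item in theirs - mine:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

# Hypothetical toy data: everyone owns the bestseller, few own the poetry book.
histories = {
    "ann": ["harry_potter", "poets"],
    "bob": ["harry_potter", "wizard_guide"],
    "cat": ["harry_potter", "poets", "verse_anthology"],
    "dan": ["harry_potter", "cookbook"],
}
weights = idf_weights(histories)
print(weights["harry_potter"])               # 0.0
print(recommend("ann", histories, weights))  # ['verse_anthology']
```

Because every user in the toy data bought `harry_potter`, its weight is zero, so sharing it with bob or dan tells us nothing about ann; only the poetry overlap with cat produces a recommendation. The sketch does nothing about jeremy's other concern, that items with a single buyer get the highest weight on the least evidence; a minimum-buyer cutoff would be one possible way to handle that.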