DeviantArt, with its huge number of artworks and a large userbase, is just the kind of site that could use a good recommendation engine. A recommendation engine is basically a program that analyzes your tastes and recommends some images/products/whatever that you might like.

Unfortunately, there don’t seem to be any official plans to create a recommendation system. So, being the naive creature that I am, I went ahead and started building my own recommendation engine for DA. Maybe I’m in over my head.

Seeing is believing

Here are some screenshots of recommendations that my current system generated. They’re all of the “people that liked this also liked that” type – deviation-based. The script can also make user-based recommendations – “based on your past favorites, you might like this” – but I won’t post those screenshots here, because suggestions the script made for me wouldn’t make a lot of sense to you 😛

Anyway, here we go. The “source” deviation has a red border, and the pictures are the top five generated recommendations. If you think I chose the best examples you are, of course, completely correct 😉

The algorithm might be improved by taking into account what categories each deviation belongs to, so that the suggestions are similar to the initial image.
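To illustrate what I mean, here is a rough Python sketch of a category boost. It is not something the current script does. The function name, the `(deviation, score, category)` tuple layout, the sample data and the boost factor of 1.5 are all made up for the example.

```python
def rerank_by_category(source_category, candidates, boost=1.5):
    """Re-rank candidate suggestions, favouring the source deviation's category.

    candidates is a list of (deviation_id, score, category) tuples, where
    score is whatever the recommender produced (higher means better).
    The boost factor is an arbitrary illustrative value, not a tuned one.
    """
    rescored = [
        (deviation, score * boost if category == source_category else score)
        for deviation, score, category in candidates
    ]
    # Highest adjusted score first
    return sorted(rescored, key=lambda pair: -pair[1])


# Hypothetical candidates for a source image in the "digital art" category
candidates = [
    ("a", 0.8, "photography"),
    ("b", 0.6, "digital art"),
    ("c", 0.7, "digital art"),
]
reranked = rerank_by_category("digital art", candidates)
print([deviation for deviation, _ in reranked])  # ['c', 'b', 'a']
```

A multiplicative boost keeps out-of-category suggestions in the running instead of filtering them out completely, which seems safer while the data set is still small.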

Getting Technical

I used the free version of Vogoo PHP Lib as the basis of the recommendation algorithm. Vogoo implements several collaborative filtering algorithms, including both item-based and user-based models. I have modified it to improve performance, because some of the original scripts do get sluggish when you have tens of thousands of rows in the DB. Sooner or later I’ll also start tweaking the suggestion algorithms – lots of room for experimenting there.
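For the curious, this is roughly what an item-based “people that liked this also liked that” calculation looks like. It is a minimal Python sketch of the general technique, not Vogoo’s code and not my modified version of it. It assumes favorites are stored as a mapping from username to a set of deviation IDs, and it uses cosine similarity on co-occurrence counts, which is only one of several common choices. All names and sample data are invented.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_item_similarity(favorites):
    """favorites: dict mapping user -> set of deviation ids.

    Returns dict item -> {other_item: cosine similarity}, where the
    similarity is based on how many users favorited both items.
    """
    fav_counts = defaultdict(int)                      # users per deviation
    co_counts = defaultdict(lambda: defaultdict(int))  # users per pair
    for items in favorites.values():
        for item in items:
            fav_counts[item] += 1
        for a, b in combinations(sorted(items), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return {
        a: {b: n / sqrt(fav_counts[a] * fav_counts[b]) for b, n in others.items()}
        for a, others in co_counts.items()
    }


def also_liked(similarity, item, top_n=5):
    """Return up to top_n deviations most similar to item."""
    scores = similarity.get(item, {})
    # Sort by descending similarity, breaking ties by id for stable output
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


# Made-up sample data
favorites = {
    "alice": {"dragon", "castle", "forest"},
    "bob": {"dragon", "castle"},
    "carol": {"dragon", "robot"},
}
similarity = build_item_similarity(favorites)
print(also_liked(similarity, "dragon"))  # ['castle', 'forest', 'robot']
```

Note that the pair loop is quadratic in the number of favorites per user, which is one reason this kind of script gets sluggish as the database grows.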

The rest of the setup is PHP + MySQL + Apache, all running on my PC (for now).

I’d love to put the system online and let other people check it out (once I manage to add at least a rudimentary user interface to it), but the harsh truth is that none of my shared hosting servers could possibly handle it. The script needs a lot of bandwidth and CPU power to effectively support more than a couple of users. And even if I had that, I’m not sure I wouldn’t run into trouble with DA for downloading thousands of RSS feeds non-stop.

I could get a VPS… which would cost ten times more than my current hosting. Hmm.

More data!

There are a few things that need to be considered even before you can start daydreaming about how to generate the actual suggestions. One of the tasks is choosing what to use as the source data, and how to obtain it. The first part is easy – your past favorites are a natural source of information about what kind of deviations you like. Getting that information is more complex. If a recommendation engine were developed by DA programmers this wouldn’t be a problem at all – they could query the DeviantArt database(s) straight away. However, a random hobbyist (me) obviously can’t do the same, and DeviantArt doesn’t have an API. I resorted to using the RSS feeds of user favorites, and checking the “Who favorite’d this?” lists on individual deviations.

So I’ve got a way to access the favorites… and I’ve got a resource problem. There are millions of users and probably billions of recorded favorites on DA. It would take a few years to download all that through RSS feeds (if you don’t want to inadvertently DDoS DeviantArt) and a decent server farm to analyze it. I decided to be selective and only download the info that is reasonably relevant to the people who use the suggestion engine (me and a few randomly chosen usernames). It goes like this:

For every “active” user I look at which deviations (s)he recently +fav’ed.

For every one of those deviations, I check what other users also favorited them.

For every one of those users, I also find what their latest favorites are.

Visually the algorithm could be imagined as a pyramid or an upside-down tree.
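The three steps above can be sketched in Python like this. It is an illustration, not my actual script. The two callables `recent_favs_of` and `fans_of` are hypothetical stand-ins for “parse this user’s favorites RSS feed” and “read the list of users who favorited this deviation”, and the sample data is invented so the example runs without touching the network.

```python
from collections import defaultdict


def collect_favorites(active_users, recent_favs_of, fans_of):
    """Expand outwards from the active users in three levels.

    recent_favs_of(user)  -> iterable of deviation ids (hypothetical helper)
    fans_of(deviation)    -> iterable of usernames     (hypothetical helper)
    Returns dict user -> set of deviation ids they are known to have favorited.
    """
    favorites = defaultdict(set)
    fetched = set()  # users whose feed has already been downloaded

    def load(user):
        if user not in fetched:  # never download the same feed twice
            fetched.add(user)
            favorites[user].update(recent_favs_of(user))
        return favorites[user]

    # Level 1: recent favorites of every active user
    seed_deviations = set()
    for user in active_users:
        seed_deviations |= load(user)

    # Level 2: everyone else who favorited those deviations
    neighbours = set()
    for deviation in seed_deviations:
        for fan in fans_of(deviation):
            favorites[fan].add(deviation)
            neighbours.add(fan)

    # Level 3: the latest favorites of those users
    for user in neighbours:
        load(user)

    return dict(favorites)


# Invented sample data standing in for the real feeds
feeds = {"me": ["d1", "d2"], "ann": ["d1", "d3"], "ben": ["d2", "d4"]}
fans = {"d1": ["me", "ann"], "d2": ["me", "ben"]}
result = collect_favorites(
    ["me"],
    lambda user: feeds.get(user, []),
    lambda deviation: fans.get(deviation, []),
)
print(sorted(result))  # ['ann', 'ben', 'me']
```

A real crawler would also need a delay between requests and some limit on how many fans it follows per deviation, otherwise a single popular image could pull in thousands of feeds.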

How much is enough?

As far as I know, recommender algorithms work better with more data. On the other hand, there are technical limitations to how much information you can store and process. So how much information do you need to generate decent recommendations? Here’s my experience:

2,500 favs processed – So-so. Three or four out of 40 images were pretty good.

43,500 favs processed – Finally getting somewhere! About 30% of the suggestions were worthy of a +fav.

By the way, it took more than 24 hours to gather the 43 thousand favorites. That’s partly because my connection is slow.
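To put that rate in perspective, here is a quick back-of-the-envelope calculation in Python. The 43,500 and 24-hour figures come from above. Since it actually took more than 24 hours, the real rate is somewhat lower. The one-million target is just a round number picked for illustration.

```python
FAVS_COLLECTED = 43_500
HOURS_TAKEN = 24  # it took "more than" this, so the rate below is optimistic

favs_per_hour = FAVS_COLLECTED / HOURS_TAKEN


def days_to_collect(total_favs, rate_per_hour=favs_per_hour):
    """Days needed to gather total_favs at a constant hourly rate."""
    return total_favs / rate_per_hour / 24


print(favs_per_hour)                         # 1812.5
print(round(days_to_collect(1_000_000), 1))  # 23.0
```

So even a single million favorites would keep my connection busy for about three weeks at this pace.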

In Conclusion

I wrote this post mainly because I wanted to see what reactions and comments (if any) I’d get. If there’s enough interest I might try and figure out how to get the script up and running on a public site somewhere. If nobody cares, well, at least I have another programmer’s toy to amuse myself with 🙂

I could go on improving and tweaking the algorithm, but I just don’t have the resources to put something like this online – a very powerful server would be needed, and I have no idea if the script would scale. So the idea is back-burnered indefinitely; still interesting, but impractical for now.

