The Moz Blog

Reddit, Stumbleupon, Del.icio.us and Hacker News Algorithms Exposed!

The author's posts are entirely his or her own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.

It is greatly ironic that algorithms, the quintessential example of all that is not human, would be so fundamental to social media. Last week I wrote a post about how Google gathers user data. This week I continue by exposing how popular social media websites use algorithms to utilize user data.

Although humans power social media, it is algorithms that provide the frameworks that make user input useful. As the countless social sites online show, finding the correct mix of participation and rules can be extremely difficult. Below are some of the algorithms that, when combined with the right people, have proven successful.

Popular Social Media Algorithms

Y Combinator's Hacker News:

Formula:

(p - 1) / (t + 2)^1.5

Description:

Votes divided by age factor

p = votes (points) from users.
t = time since submission in hours.

1 is subtracted from p to negate the submitter's vote.
The age factor is (time since submission in hours plus two) to the power of 1.5.
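The formula above can be written as a short Python sketch. The function and parameter names are my own, chosen for illustration; only the arithmetic comes from the formula as stated.

```python
def hacker_news_score(points: int, age_hours: float, gravity: float = 1.5) -> float:
    """Score a story as (p - 1) / (t + 2)^1.5.

    points    -- p, votes from users (includes the submitter's own vote)
    age_hours -- t, time since submission in hours
    gravity   -- the exponent of the age factor (1.5 in the formula above)
    """
    votes = points - 1                      # negate the submitter's vote
    age_factor = (age_hours + 2) ** gravity
    return votes / age_factor


if __name__ == "__main__":
    # Compare a young story with few votes against an older one with more.
    print(hacker_news_score(points=9, age_hours=2))
    print(hacker_news_score(points=28, age_hours=7))
```

Both example stories score the same: 8 / 4^1.5 and 27 / 9^1.5 each come to 1. In other words, a story that is five hours older needs roughly three times the votes to hold the same position.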

Reddit:

Description:

First of all, the time 7:46:43 am on December 8th 2005 is a constant used to determine the relative age of a submission. (It is likely the time the site launched, but I have not been able to confirm this.) The time the story was submitted minus the constant date is ts. ts works as the force that pulls the stories down the front page. y represents the relationship of up votes to down votes.

45000 is the number of seconds in 12.5 hours. This constant is used in combination with y × ts to "water down" votes as they are made farther and farther from the time the article was submitted.

log10 is also used to make early votes carry more weight than late votes. In this case, the first 10 votes have exactly as much weight as votes 11 through 100.

StumbleUpon:

Description:

The initial stumbler's "power" (the audience of the initial stumbler divided by the number of times that stumbler has stumbled the given domain) is added to the sum of all the subsequent stumblers' powers.

Subsequent stumbler power is ((the percentage of the audience the stumbler makes up divided by the number of times the given stumbler has stumbled the domain) + a predetermined power boost for using the toolbar - a predetermined power drain if stumblers are connected) + (% of the stumbler audience + a predetermined boost for using the toolbar).

N is a "safety variable" so that the assumed algorithm is flexible. It represents a random number.

Del.icio.us:

Formula:

Points = (Number of times story has been bookmarked in the last 3600 seconds)

Description:

Rank on Del.icio.us Popular is determined by comparing points. Points represent the number of times a story has been bookmarked in the last hour. The higher the rate, the higher the points. Every bookmark counts as one point.
3600 is the number of seconds in one hour.
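As a minimal sketch, the points calculation described above amounts to counting bookmarks inside a sliding one-hour window. The function name and the use of Unix timestamps (seconds) are assumptions made for illustration.

```python
from typing import Iterable

WINDOW_SECONDS = 3600  # the number of seconds in one hour


def delicious_points(bookmark_times: Iterable[float], now: float) -> int:
    """Count the bookmarks made in the last 3600 seconds.

    bookmark_times -- timestamps (in seconds) at which the story was bookmarked
    now            -- the current timestamp, in the same units
    """
    # Every bookmark inside the window counts as one point.
    return sum(1 for t in bookmark_times if 0 <= now - t <= WINDOW_SECONDS)


if __name__ == "__main__":
    times = [0, 100, 3500, 4000]
    print(delicious_points(times, now=4000))
```

With the sample data, only the bookmarks at 3500 and 4000 fall inside the last hour, so the story has 2 points. Older bookmarks stop counting as soon as they leave the window.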

Digg is different. The company is a lot less transparent than the above-mentioned companies. It is fearful of being gamed and in response has created a secretive algorithm that appears to be far more complex than its competition's.

At a minimum I expect that Digg's algorithm takes into account the following factors:

Submission Time

Submission Category

Submitter's Digg authority

Submitter's website wide activity

Submitter's friends and fans

Subsequent digger's authority

Subsequent digger's friends and fans

Subsequent digger's geo location

Subsequent digger's HTTP referer

If you have any other advice or thoughts that you think are worth sharing, feel free to post them in the comments. This post is very much a work in progress. As always, feel free to e-mail me or send me a private message if you have any suggestions on how I can make my posts more useful. All of my contact information is available on my profile: Danny. Thanks!

About DannyDover —
Danny Dover is a passionate online marketer, influential writer and obsessed bucket list completer. He is the author of the best selling book Search Engine Optimization Secrets and the founder of Intriguing Ideas LLC. Before starting his own company, Danny was the Senior SEO Manager at AT&T and the Lead SEO at SEOmoz.org.

With StumbleUpon - three very important factors are categories, tags and reviews. Curious about what tags or categories to focus on? Check out the tag cloud here.

With Digg - one of the most important factors is the domain's authority. Authority in terms of being an established and recognized source for the Digg community (CNN, Ars, Engadget, BBC, etc).

I've 'seen' profiles just under 4 months old climb to the top ranks of Digg by simply being truly active in the community (and having really great content).

It's not about gaming the system as much as it is about just being active in the community. Adding value to the community and truly engaging with them seems to work the best. I don't feel it's about the math or the factors. I feel it is about choosing to spend some resources on building a true social media profile that people want to associate with . . . similar statements could be made about SEOmoz. Take Sean for example . . . in well under a year, he's become the most popular person on SEOmoz (non-employee). How did he do it? By spending a lot of time and energy to PARTICIPATE.

Danny is another great example . . . the time and energy he puts forth in creating these 'advanced' posts has made him quite the up-and-comer on the SEOmoz staff within this community.

I could not agree more, Mr. Payne. I would love to have 3-4 hours per day to dedicate towards commenting on SEOmoz and being active in a few social networks. I am not sure how all of these people do it. I have to work on our clients' and our own company's projects 10 hours a day and have a family. Whew!

It's amazing how much all the social mediaites and mozzers put out. I asked Rand at a conference one time, how do you find the time to put out such well thought out posts and run a company and have a relationship? You just have to be a really really good writer. And probably work a lot and be really effective when you do work!

For me . . . it's what I do 'for fun'. I honestly would prefer to hop on SEOmoz and either make a complete fool of myself or pump out some great content than say . . . watch a baseball game, play xbox, etc. It truly is an enjoyment for me.

As for Rand, I believe he is just wicked smart and it doesn't take him long to post a great article. He has a lot of support from his fiancée Geraldine and, like most successful people, he surrounds himself with people that encourage and support him. Now, keep in mind, Rand is constantly 'working' but I would hope it doesn't feel like 'work' to him. I know much of what I do doesn't feel like work to me.

Wow, that was a lot to absorb at 7:30am :) One thing that's been striking me lately about social media sites, and that all of these formulas back up, is the newness factor. Sites seem to reward pages that are "hot" and get a lot of attention quickly, but what about those pages that have real staying power and people come back to time and time again? It seems to me that the latter are some of the best resources, and social media does a very bad job of recognizing those resources.

Of course, you could argue that those resources are the ones that people build links to over time and have the most power in the SERPs, which is probably true. I'm just not sure if the "what have you done for me lately?" approach of social media is really helping us find the best content. As a culture in general, we really overvalue what happened in the past 24 hours.

Most social media users are plugged in every day and are pretty current on what's interesting to them. They can go to their profile and easily find something that they read or commented on if that article is no longer where it was the last time they accessed it. And you're right, there is a high value for recent news, but I think that's kind of the point. Bring people back and make them stay connected in order for social media advertisers to get more exposure.

That said, I've been disconnected by travel and inaccessibility twice recently, only to find out about my killer Nalgene bottles and Tim Russert's passing through non-social media avenues. SM is great for what it is, but I don't think the intention was to create an archival resource.

Sounds like you may have a good idea for your own version of an SM site though Dr.

You're absolutely right, Tim; the whole point of most social media sites is to distribute news and generally new and interesting material, and there's nothing wrong with that. I think my reaction may be a broader one and is a bit related to the election coverage this year. It seems like the media in general is so desperate to find news (especially now that we have 24-hour news channels and blogs) that we treat what happened today as the only thing that's worthy of attention. That trend worries me a little.

I agree to a certain extent. Again, it all comes down to money if you want to get really broad. Nothing is ever going to replace that internal filter that humans have, and we're certainly going to need to use it more and more as this now-or-never trend continues in the world of making money off of news and current events.

I probably give the average information consumer too much credit in this area (even though I generally give the average person very little credit), but it seems to me that savvy with regard to how, when and where someone chooses to consume information correlates well with the ability to filter/censor and take that information for what it's worth.

To me, this ability also indicates some sort of predisposition to fact checking (things that don't seem right), and using more archival sources for finding older material. To your point, even Google is trending towards valuing new information more highly, but at least they settle out their SERPs in favor of more established and Google-worthy listings over time.

Great Discussion! Will be interesting to witness this in person over time.

My thoughts exactly. And comments definitely play a big role in the success of a social media campaign on any of these sites, specifically with Digg in my experience. Creating quality content and putting earnest effort into promotions is still the best way to succeed.

There are some systems worth spending time learning how to game (Google) and others that don't really require it (Social Media).

When someone submits a story to one of these sites, their decision to do so is completely their own, but from that point on, it's the combination of human behavior and the algorithms that drives the story's progression. Almost like a ping-pong ball going back and forth - votes, comments, ranking changes, more votes (or not), etc.

At this point, these social media sites are still dependent on human beings to take the first step and submit content. The day that Digg figures out how to find articles that are interesting all on its own...that's going to be really something.

And in a way, that's what Google does already when it serves up the SERPs, which makes the connection between search engines and social media sites clearer to me than it's ever been before.

The only teeny-tiny suggestion I can make to you is to add a "so what?" concluding paragraph to your posts. You've obviously spent an enormous amount of time researching these posts, and I would highly value your opinion on what it all means (especially in the realm of SEO). Sure, you might just be speculating in an educated manner, but that's all any of us are doing most of the time anyway.

Next week I'll be expecting the complete algo from goog. And then a supplementary post every time it changes.

Thanks Danny!

Edit: Here's a legitimate project suggestion, or maybe you could just point me in the right direction. I'm looking for a good breakdown of the differences between the old and new .js googalytics scripts. Any thoughts?

Great post Danny! It's kind of like social sites a la MTV unplugged flavor.

My question is related to del.icio.us: what are your recommendations for getting submissions saved in the critical first hour? Every time I try and submit content there, it doesn't get picked up, or it gets a measly three or four saves. Do you submit and then ping a gazillion people through twitter or facebook or similar to tell them to go favorite it? Or, are there gorilla profiles on delicious which people follow like the Dead and save their posts because they like the profile's tastes?

This formula implies that as more time passes on a post it becomes more valuable. Assuming we have the same up and down votes for a story, so that log10(z) and y are both constant, the more ts we have, the higher the y × ts / 45000 value, which implies a higher ranking.

Since the SEO people have been more obsessed lately with getting *something* on the front page, and less concerned with what it actually is, perhaps now would be the time to discuss whether you're more likely to have something be rated highly by working to make interesting and creative content or by trying to game the system, because, let's face it, that's the whole point of the post, right?

I'm just saying it now so I can point to this when it eventually happens: gaming the system destroys its usefulness. Eventually reddit and stumbleupon are going to be just as bad as digg, full of stupid, self-promotional crap that no one except other spammers want to read.

The real people will be long since gone to another site, until y'all find us there and start spamming again. This doesn't have to happen. You could all get together and decide to blacklist/bury the people who exhibit the worst spammy behavior, so that actual users will still want to use the sites, instead of always trying to find somewhere the spammers haven't reached yet.

But I know this won't happen. No one has ever been able to get people to simply agree to stop polluting and wasting any resource, be it water or airwaves. The only solution that has ever kinda worked is regulation, so get ready for it, suckas, 'cause it's coming and you're the ones that brought it down upon your own heads.

I wasn't going to comment since the article is so out of date, but just to clarify the organic bonus was related to each person, not a general bonus and so does not cancel out ;) if you read the full article it would have made more sense. That and some LaTeX as you can clearly see it was simplified for the blog to the point of being unusable as anything but a guide.

We may well release the complete math at some point now that our accuracy has dropped below the 60% mark.