Of course many people have been raising concerns about the manipulation and irrationality of Digg front page items (for example here, here, here, here, here, and here).

Recently the problem of "cabals" of Digg story promoters has been getting more and more attention. To their credit, the Digg administrators have made it possible to track who is submitting and promoting which stories, and the results are dramatic. A tiny portion of Digg members submit stories, and tiny networks of friends promote each other's stories, with the result that a very small elite group of people determines an overwhelming share of the content that gets attention on the Digg front pages.

Kevin Rose, one of the Digg founders, has recently announced new efforts to outsmart these organized groups of co-promoters, to "catch" them and downgrade their influence on voting. The idea is to identify non-diverse voting patterns and flag those as less important. The effort is misdirected.
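To make the flagged-pattern idea concrete, here is a minimal sketch of the kind of co-voting overlap check such an effort implies. This is not Digg's actual algorithm, which has never been published; the names and the threshold are invented for illustration:

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two voters' story sets: |A & B| / |A | B|."""
    return len(a & b) / len(a | b)

def flag_cliques(votes, threshold=0.8):
    """votes: dict mapping voter -> set of story ids they dugg.
    Returns pairs of voters whose voting histories overlap suspiciously."""
    flagged = []
    for u, v in combinations(votes, 2):
        if jaccard(votes[u], votes[v]) >= threshold:
            flagged.append((u, v))
    return flagged

votes = {
    "alice": {1, 2, 3, 4},
    "bob":   {1, 2, 3, 4},   # votes identically to alice -> flagged
    "carol": {2, 7, 9},
}
print(flag_cliques(votes))   # [('alice', 'bob')]
```

The trouble, as argued below, is that friends who genuinely share tastes produce exactly the same overlap signature as a cabal.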

Digg suffers from a fundamental flaw in design. In fact its entire approach to leveraging the crowd's wisdom is completely backward.

2. First Things First - What Constitutes a "Good" Story?

Before we can talk about fixing the Digg-style model, we need to have some agreement about what constitutes a "good" story. I define a "good" story as one which is considered good by those who actually take the time to read the stories and have some interest in the subject area, as opposed to stories which simply sound appealing based on their title. The objective of this discussion is to identify why the Digg model is bad at finding such stories, and propose a model which would do a better job.

3. Crowds Don't Do A Good Job of Voting on Stories

The idea of leveraging the "wisdom of crowds" is an enticing one. In his famous book of the same title, James Surowiecki suggests that large populations of (average intelligence) people can often perform better than a presumed-superior elite. The basic idea is that instead of employing experts to make decisions, we can leverage the power of large groups of people voting at once, and average their votes.
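Surowiecki's averaging claim is easy to illustrate with a toy estimation task (invented numbers, purely illustrative): a thousand noisy but unbiased guessers, averaged together, typically beat almost every individual guesser.

```python
import random

random.seed(42)
true_value = 100.0

# 1,000 average guessers, each noisy but unbiased
guesses = [true_value + random.gauss(0, 20) for _ in range(1000)]

crowd_estimate = sum(guesses) / len(guesses)
crowd_error = abs(crowd_estimate - true_value)

# Count how many individuals beat the crowd average
better = sum(1 for g in guesses if abs(g - true_value) < crowd_error)
print(f"crowd error: {crowd_error:.2f}")
print(f"individuals beating the crowd: {better} of 1000")
```

The catch, developed in the next paragraph, is that this only works when the individual errors are independent and unbiased, which is precisely what cascades and cabals destroy.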

The problem is that crowds aren't equally good at making all decisions. Common-sense questions, and questions where the population has a reasonable chance of possessing the background knowledge needed to tackle the problem, are well solved by a "crowd vote." But some questions, like estimating the predictive power of astrology or estimating the gravity of Pluto, are not handled well by popular crowd vote, either because the crowd members don't possess the background domain knowledge necessary to form an informed opinion, or because they are highly biased for some reason.

We have plenty of anecdotal evidence already that a large portion of stories that make their way to the front page of Digg are:

Either driven there by small groups collaborating in order to artificially inflate a story for personal gain (financial or otherwise)

Or elevated to front page (prominent) status because of a cascade of mass crowd action, based not on the actual "value" of the content of the story to these people, but on the "catchiness" and sensationalism of the title of the story.

Furthermore, sites like Digg are highly susceptible to irrational trends and epidemics of attraction to keywords and slogans. If an important event happens on a busy news day when Britney Spears gets married, it can be lost forever. Because votes accumulating rapidly in tight temporal proximity count for so much in the ratings on sites like Digg, there is an undue emphasis on stories whose titles appeal immediately to mass audiences. Digg turns out to be very efficient at identifying catchy headlines, and very good at weeding out all stories that don't have catchy headlines.
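The temporal-proximity effect can be made concrete with a toy "hotness" function (hypothetical; Digg has never published its real formula) in which each vote is discounted by its age, so a burst of recent votes outranks the same number of votes spread over days:

```python
def hotness(vote_times, now, half_life=6.0):
    """Score a story by its votes, discounting each vote by its age in hours.
    A vote loses half its weight every `half_life` hours."""
    return sum(0.5 ** ((now - t) / half_life) for t in vote_times)

now = 48.0  # hours since some epoch
burst  = [46, 46.5, 47, 47.5, 48]   # 5 votes in the last two hours
steady = [0, 12, 24, 36, 48]        # 5 votes spread over two days
print(f"burst:  {hotness(burst, now):.2f}")
print(f"steady: {hotness(steady, now):.2f}")
```

Under any scoring of this shape, a slow-burning story of real value simply never accumulates enough simultaneous weight to surface, while a catchy headline that triggers a two-hour stampede does.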

4. Too Big an Incentive to Game The System

Part of the problem with services like Digg and Google is that in such a big marketplace, where attention is so financially valuable, the monetary benefits of prominent placement are so huge that they serve as an irresistible incentive to figure out ways to game the system. An arms race is under way between the groups trying to exploit the ratings algorithms and the services, which are only mildly interested in curbing the behavior, and usually only when their own financial interests are at stake.

The current approach by Digg, trying to outwit these manipulators and reduce their corruptive influence, is pure folly. First, because it's not practical to beat such manipulation: in the end such behavior is impossible to distinguish from genuine voting. And second, because it doesn't address the other core problem: crowds are not good at identifying good stories.

5. Crowds are not good at identifying good stories - they are only good at identifying sensational and catchy headlines.

Whether it's because the title has a dominant biasing effect or because voters don't actually read the page content before they vote, the result is the same: a clear pattern of predictably shallow, duplicative content pages being promoted which have little value to readers. And the ease of capturing voters' attention makes it all the more trivial for the small groups of manipulators to structure story titles to secure crowd votes.

6. Digg.com Gets Everything Backwards.

Digg is using crowds in the wrong way, for the wrong role. They have a very small group of elite people submitting stories, a shadowy network of collaborators who work together to artificially promote stories of their own choosing, and a crowd that is led around by the nose like sheep.

If we ask instead what is the crowd good at, and when do we need domain experts, we end up with a completely different model.

7. A New Model: Crowd Suggestions and Public Expert Filtering

There is too much information on the internet for a small group of experts to find all of the interesting stories each day. For finding potentially good stories, we need to leverage the power of a large group of distributed people. We want to make it as easy as possible for people to submit new potential stories, putting as few obstacles in their way as possible. For a Digg-like site this would mean removing the need to describe the story, title it, register, etc. It would also mean welcoming people to submit their own sites and authored articles, rather than treating such submissions as spam. After all, the objective here is like the objective of brainstorming: we want to welcome a wide variety of suggestions from any source.

Where Digg ends up with a small elite group that submits stories and a larger population of crowd voters, we instead want to shift the emphasis to large numbers of crowd submitters, by making it as easy as possible to submit, and perhaps limiting the number of story submissions per day (which would be complete anathema to the current Digg model, where elite submitters do most of the story submissions).

8. What About Crowd Voting? Eliminate it Completely.

That's right, you heard me: eliminate the ability of normal users to vote on stories. They may enjoy it, and they may end up with a certified 100% user-generated-content "web 2.0" site, but the bottom line is that the content sucks. If the crowd is not good at identifying good stories, then it should be removed from the loop.

9. What's the Alternative to Crowd Voting?

The alternative to having the masses vote based on the headlines of stories they don't read should be obvious: let voting and filtering be done by domain experts with some background and context for evaluating the value and interest of stories.

Just as you don't ask a crowd to perform dental surgery, you shouldn't be asking a crowd to evaluate the worth of a story on quantum physics.

Let's compare the Digg model with the proposed model along two dimensions:

                       Discoverers/Submitters              Editors/Selectors
Old Media              elite domain experts                elite domain(?) experts
Digg Model             small group of hyperactive elite    crowd + underground manipulation groups
Recommended Model      crowd                               elite domain experts

10. How Do We Choose Experts?

An inevitable question that arises with this model is how to choose experts. The answer of course is that you choose experts for a content site the same way that you choose experts for any task: in a wide variety of ways designed to ensure diversity, quality, judgment, and integrity.

For example, you might have a central body that interviews qualified candidates in different fields and assigns them to specific domains of submitted stories. Experts who vote on and filter stories should be publicly identified, so that watchdog organizations and the public can investigate the possibility of bias or corruption.

11. A Representative Elected Body of Expert Voters

One particularly interesting possibility is the idea of publicly electing representatives who would run for office as domain-specific experts. Here normal people would vote for candidates in specific fields of expertise, based on their past performance (which would be a matter of public record), and their background experience. This would be a true representative system, where users are selecting domain-knowledgeable proxies for their votes, entrusting these representatives to make informed decisions about the veracity, value, and novelty of new stories.

This small group of experts would be much more easily monitored by users in terms of tracking their recommendations about stories, which should be public. Readers would be able to see exactly which editors selected or rejected which stories.

Various hierarchies of experts are possible. At one extreme you could employ a single domain expert in each domain area, making the single decision each day about the ranking of stories, with total transparency to readers. At the other extreme one might create a hierarchy of voting experts, with votes weighted by domain knowledge and experience. Open elections could be used to let readers identify and weight different experts based on past performance. Regardless of the arrangement, decisions by experts should be transparent and available to anyone.
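As one illustration of the weighted-hierarchy variant (all weights, expert names, and story names invented for the sketch), each expert's up/down vote could simply be scaled by a public weight, with the per-expert ballot kept visible so readers can audit every decision:

```python
def rank_stories(stories, expert_weights, votes):
    """votes: dict of (expert, story) -> +1/-1; weights scale each expert's say.
    Returns stories sorted by weighted score. Keeping `votes` public is the
    transparency requirement discussed above."""
    scores = {s: 0.0 for s in stories}
    for (expert, story), v in votes.items():
        scores[story] += expert_weights[expert] * v
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

weights = {"physicist": 3.0, "journalist": 1.5, "newcomer": 1.0}
votes = {
    ("physicist", "quantum-story"): +1,
    ("journalist", "quantum-story"): -1,
    ("newcomer", "celebrity-story"): +1,
}
print(rank_stories(["quantum-story", "celebrity-story"], weights, votes))
# [('quantum-story', 1.5), ('celebrity-story', 1.0)]
```

The single-expert extreme is just this with one entry in the weight table; the elected-hierarchy extreme would update the weights from reader elections.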

There is one real practical impediment to using experts to do the final level of filtering and selecting: the cost in time and money involved in supporting this small group. Given the money generated through ad revenue by sites like Digg, this really shouldn't be a serious problem, and sites like Netscape have begun moving in this direction. Netscape may or may not suck, but the idea of using expert editors to help filter stories and add background context is an improvement on the Digg model. Alternatively, one might imagine that volunteers would be willing to fill these jobs and would appreciate the public recognition that would be due to the small number of domain experts in the new model.

12. Summary

We are currently in a period where "user-generated" content is king. In the rush to produce sites built from user-generated content, we've seen a mass removal of the role of domain experts and proxy representatives. Whether it's Digg or Wikipedia, there has been a move to treat everyone as if they had exactly the same background level of expertise on every subject. This is surely a temporary aberration. Not everyone is qualified to take part in every decision. Eventually we are going to have to return to a more balanced solution where users influence content in a way that makes sense according to their interests, background knowledge, available time, and abilities, and where domain experts provide a necessary element of context, continuity, consistency, and informed judgement.

Since I mentioned Netscape, I thought maybe I should comment a little on the things going on at Netscape and how they fit in.

One of the principal ideas here at donationcoder.com is trying to find a fair way to reward those who create content on the site.

The Netscape idea of paying their top users makes some sense to me, in that it is in keeping with the idea of returning some of the money being made on the site from advertisements back to the users responsible for the content which is generating this revenue.

One of the difficulties that sites like Netscape have is getting the "incentive" issue right. If top users are determined and paid based on the traffic they generate, then the *incentive* for them to create traffic, whether by hype or manipulation, is substantial. This is part of the danger of paying people based on some metric: there is a strong incentive to play to that metric.

(As an aside, this is one of the reasons we have tried to avoid any system on this site that rewards people based on traffic, downloads, etc., and instead try to facilitate and encourage DIRECT user-to-user donations. The hope is that the incentive will therefore be on creating content that users actually benefit from and appreciate. This approach is, like the others, not without its own dangers, such as the possibility that content creators will be motivated to focus only on content which generates donations.)

With the recent announcement by Digg that they are going to try to tweak the algorithm to catch and circumvent the actions of manipulative Digg groups, this article has some quotes which point out the folly of that approach: http://wired.com/new...0.html?tw=wn_index_2

"If 30 or 20 or 50 or 90 people want to digg each other's stories, let them," Chrisek, a Digg user in the site's top 60, said via e-mail. "I digg my friend's stories. I also report my friends' stories as inaccurate or as spam when need be. Am I more prone to digg stories from my friends? Of course."

The problem is that this unwanted behavior, a group of people conspiring to inflate a story onto the front page, is completely indistinguishable from friends and strangers digging stories they like.

And let's share some laughs at Netscape:

Jason Calacanis, head of a rival social news service at Netscape.com, scolded Digg for punishing its core contributors. "Frankly I think Digg is tripping over itself here," he wrote on his blog. "The top users earned their spot and they should be rewarded for their contributions -- not penalized. One person, one vote -- that's the rule. You can't change that or you change the fundamental premise of democracy."

Forgive my cynicism, but I doubt Jason has the slightest interest in the fundamental premise of democracy at Netscape or on any other website; his concern is the business model and scratching away whatever market share he can from Digg. So spare me.

Someone has brought up Slashdot and asked how my suggestions relate to it.

First of all, let me say I'm not a fan of Slashdot. I think it has become irrelevant and slow, and that it generally displays bad judgement about which stories are good vs. bad.

Which is a bit distressing, since Slashdot uses a model quite close to what I'm proposing, i.e. a crowd-suggest, expert-filter model.

A key difference, though, is that Slashdot has very little transparency. It's not clear who is selecting which stories and why. The biases and conflicts of interest of editors are hidden.

I suggest that having transparent accountability of the domain expert editors is very important. Users should be able to see exactly who made what decisions about what stories.

The idea of a representative voting system, in which users can vote on (or rate) experts in much the same way they currently rate stories, has a number of benefits. Voting on experts based on their long-term editing choices seems more rational and more likely to lead to considered decisions than instantaneous mass voting based on a glimpse at headline titles.

IMHO, this is not Digg's wonderful idea. It is an idea of collective decision making (including collaborative filtering). It originated long before Digg, and long before the Internet, and allows different implementations. Digg's implementation is not the first, and it is also quite poor: too vulnerable to abuse and information cascades. http://www.shmula.com/197/digg-as-a-game has a good analysis and two good suggestions for Digg. On the other hand, the suggestions are pretty much straightforward and were implemented at my site right from the beginning, long before Digg registered its domain name. But maybe the popularity of Digg is, at least partly, related to its present design being so prone to abuse and decision errors. It seems that the goals of attractiveness and quality might be in conflict in this case. I wonder: if Digg had been better implemented and protected from abuse from the beginning, would it be so popular?

I wonder: if Digg had been better implemented and protected from abuse from the beginning, would it be so popular?

That's a fascinating question, and one that could be asked about other products that get popular. I wonder if there is a name for this in marketing circles: a product which attains popularity because it is hackable.

I think a key difference between Digg and StumbleUpon is that with StumbleUpon, there is no front page to try to get your links onto. There is no point at which a select, limited amount of space exists to ascend to. While there is a ranking system based on your "popularity," there is no quota to fill; one need only be an active member with a great many links floating around to reach full status.

Links you add and approve/disapprove impact you and others with tastes similar to yours, and nothing else. StumbleUpon is a different service altogether. While Digg is about taking a consensus and presenting what everyone must/will like based on that consensus, StumbleUpon takes the whims of its users and attempts to serve, based on individual preferences, what it feels each person would like. One is a chef who believes no one will dislike his cooking, and you will eat what is put before you. The other is a personal chef who, after a while, anticipates what you'll want next.

While Digg is about taking a consensus and presenting what everyone must/will like based on that consensus, StumbleUpon takes the whims of its users and attempts to serve, based on individual preferences

I agree with you. In fact, SU has the equivalent of Digg's front page (check http://buzz.stumbleupon.com/), averaging the preferences of all users, but unlike on Digg it is not the main point of the site. The SU buzz page is also better implemented than Digg's buzz/front page. For example, you can't vote right from the SU buzz page; you have to go and see the website first. Digg makes it too easy to vote without even reading the story. The Digg interface is thus conducive to information cascades, while SU's design will break at least some of them. As a result SU aggregates a less dependent sample of user preferences than the one Digg collects, and hence can produce a better aggregated estimate even with a smaller sample size. Of course, as you mentioned, the main strong point of SU is that it recognizes different clusters of users with shared preferences, instead of pulling all users into a single cluster as Digg does.
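The cascade dynamic described here can be sketched with a toy simulation (entirely hypothetical; the `herding` parameter stands in for how strongly the interface pushes the running tally at voters before they see the story):

```python
import random

def simulate(n_users, quality, herding, seed=1):
    """Each user genuinely likes the story with probability `quality`, but
    with probability `herding` simply copies the majority-so-far
    (an information cascade). Returns the final fraction of up-votes."""
    random.seed(seed)
    ups = downs = 0
    for _ in range(n_users):
        if random.random() < herding and ups + downs > 0:
            vote = 1 if ups >= downs else -1   # follow the visible tally
        else:
            vote = 1 if random.random() < quality else -1
        ups += vote == 1
        downs += vote == -1
    return ups / n_users

# A mediocre story (40% genuine approval) under two interface designs
print("independent votes (SU-like):", simulate(1000, 0.4, herding=0.0))
print("visible tallies (Digg-like):", simulate(1000, 0.4, herding=0.7))
```

With no herding the tally tracks the story's real approval; with heavy herding the outcome is dominated by whichever way the first few votes happened to fall, which is the dependence-of-samples point made above.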

Since we are talking about Digg's idea, I would like to post a link to its less known precursor: http://intermix.org . Intermix stands for INTERnet Metropolitan Information eXchange. It is free software authored by Roger Eaton around 1999 or even earlier. It implements collaborative evaluation and ranking of news, the same idea that was later picked up and explored by Digg. Intermix has fewer users, but it has some advantages over Digg. I like that Intermix allows stories to be ranked by two criteria: interest and approval. To my knowledge none of the recent collaborative filtering systems has this. Sometimes I discover a website that is definitely interesting, but I don't agree with its POV. Since most sites interpret a bookmark as agreement (Digg) or don't differentiate between the two (SU), I normally will not publicly bookmark such a site. Intermix is flexible enough to accommodate such cases.

Interesting post up from Federated Media, which I guess is paid to promote Digg.com? Or am I getting that wrong; maybe they are just a conglomerate of sites that sells advertising on digg.com pages.

One of the maxims of running a business is as it starts to get traction, folks will notice, and when they notice, they will start to question it. That's happening more and more here at FM...Had the poster done his homework, he'd realize FM has a large (12 and growing) direct salesforce, which is very focused on selling Digg, as well as all of our other sites...

Here is a case study of Flickr that is somewhat related to this thread. It cites catering to power users as a major success factor for Flickr. It is quite likely that the same factor (or maybe the same strategy) played a major role in Digg's popularity.

Once power users realize their power to influence the project, they often become its evangelists and actively promote the project by recruiting new members. Promoting such a project increases their influence, which is a strong motivating factor. If the system is truly democratic and hard to abuse, promoting the project only decreases the influence of each individual member. This might explain why truly democratic projects remain relatively small, while projects that appeal to democracy but allow abuse grow very rapidly. However, this kind of rapid growth doesn't necessarily lead to diversity of ideas and the "wisdom of crowds" phenomenon, as the project is not utilizing the abilities of most of its members constructively.

Once power users realize their power to influence the project, they often become its evangelists and actively promote the project by recruiting new members. Promoting such a project increases their influence, which is a strong motivating factor. If the system is truly democratic and hard to abuse, promoting the project only decreases the influence of each individual member. This might explain why truly democratic projects remain relatively small, while projects that appeal to democracy but allow abuse grow very rapidly.

That is a fascinating theory; I'm very curious to know whether it really holds up.

So this is why dc.com is still not as popular as Digg: you just can't abuse dc or its members. mouser is much too nice and honest to let this happen. And decisions are made in the open too; everything is too transparent, too accountable, too democratic!

Mass opinion is biased by media, hype, and FUD (and people tend to follow the stream/flow/whatever).

Example:

* A says 'Such and so sucks.'
* B has no idea what A is talking about.
* But when B talks to C about that very same 'such and so', B is very likely to say it sucks and pretend to be all educated about the subject.
* Then the cycle repeats with C -> D..Z.

Then on the other hand, the 'expert editors' solution also faces several problems:

* Expert opinions will always be their own opinions, no matter how hard they try to remain unbiased and objective.
* Even experts suffer from the same phenomenon as above, but on a different level/scale, and in a different way.
* Power corrupts.
* There are more opinions, views, and fields of interest than there are people (and experts) on the planet.
* People get lazy after repetitive work. After reviewing, say, 1000 articles, the tendency is to judge articles/posts by quick browsing/reading rather than full evaluation, so they don't get the thought they deserve before being rated. Even when articles ARE fully read and evaluated, the danger exists that the editor's opinion gets molded by certain keywords that happen to fall into the editor's interests/likes/etc.

Probably not the best points one could come up with... and there are probably more. I'm not good at providing proof of concept, or even writing for that matter. I'm not even going to pretend I know wtf I'm talking about, because I don't; ever since blogs, forums, and message boards came along, suddenly everyone has become a columnist/writer/spelling expert (I fail at all 3). [IRC tends to feel more natural]

But I do have a strong gut feeling that systems like Digg, Slashdot, etc. will always be and will remain broken as long as there are humans involved.

Just for those who haven't seen my comments on the Digg page related to this article/post, I'll quote myself:

Digg is not, has never been, and most likely never will be a good source of links to the best the web has to offer. It is nothing more than a mirror of the ideas & concepts that are hot in current pop culture. And that is what it is intended to be. It's all about what is popular, without any regard to quality. And many people will vote for stuff based on what they think other people will think of them if they don't, because they think their own views are not popular.

If it were possible to buy 1 share of stock in every company whose goods & services are featured in a link that appears on the front page, each time they make it to the front page, you'd probably become richer than Bill Gates, once you eliminate the links to anything related to Digg, Google, or Linux.

The best stuff never makes it to the front page because it's not hip & trendy. The likelihood of the front page featuring an article about cancer being cured by a small group of unknown scientists living in a small 'boring' country is much lower than that of one whose headline reads something like "Digg member cures Britney Spears of warts with the aid of Google and Linux".

Would anybody like to test my stock market theory? I mean a serious simulation: code something that will do the job and publicly display the results/progress.

I am curious what the results would be. (Am I alone in my curiosity?)
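For what it's worth, a back-test along those lines could be sketched like this (entirely toy data; a real test would need historical Digg front-page logs and actual price feeds, neither of which is assumed here):

```python
def backtest(front_page_log, prices):
    """front_page_log: list of (day_index, ticker) for each company featured.
    prices: dict ticker -> list of daily closing prices.
    Buys one share at the featured day's close and values it at the last
    close, per the 'one share per front-page appearance' theory above."""
    cost = gain = 0.0
    for day, ticker in front_page_log:
        buy = prices[ticker][day]
        cost += buy
        gain += prices[ticker][-1] - buy
    return cost, gain

# Toy data: two featured companies over five trading days
prices = {"ACME": [10, 11, 12, 14, 15], "BORE": [20, 19, 18, 18, 17]}
log = [(1, "ACME"), (2, "BORE")]
cost, gain = backtest(log, prices)
print(f"spent {cost}, net gain {gain}")   # spent 29.0, net gain 3.0
```

Publicly displaying the running `cost`/`gain` totals, as the commenter suggests, would then just be a matter of logging each trade.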

Ok, dc is many things (a resource for finding software, a philosophy, even a business, if a small one), but... aren't we somehow implementing what mouser describes in his article? We (the dc crowds) post things that we consider interesting, and a few hyperactive users (mouser, people who post a lot) bump the threads that are more juicy (acting as the proposed experts).

I'm sure this is not as straightforward as I have described it, but still.

The interesting difference (one that could be added to the table) is that Digg, traditional editing, etc. rely on blogs as a CMS, whereas dc relies on forum posts (i.e. all users have enough privs to post news), thus implementing the idea of crowd filtering. That is something that Slashdot doesn't really implement that well.