Random thoughts of a computer scientist who is working behind the enemy lines; and lately turned into a double agent.

Monday, November 29, 2010

Wisdom of the Crowds: When do we need Independence?

I have been thinking lately about the conditions and assumptions needed for the wisdom of crowds to work. Surowiecki, in his popular book, gave the following four conditions for the crowd to arrive at the correct decision.

Diversity of opinion: Each person should have private information even if it's just an eccentric interpretation of the known facts.

Independence: People's opinions aren't determined by the opinions of those around them.

Decentralization: People are able to specialize and draw on local knowledge.

Aggregation: Some mechanism exists for turning private judgments into a collective decision.

The part that puzzled me the most is the independence assumption. Actually, I can support pretty much any thesis: I can argue that independence is necessary; I can argue that we do not really need independence that much; and I can argue that independence is evil. And I will do all these things below.

Independence is necessary

The standard argument for independence is statistical: when estimation errors are uncorrelated, they cancel out as we aggregate more opinions; when the errors are correlated, the shared component never averages away, no matter how large the crowd. In other words, it is better to have a couple of independent opinions than thousands of correlated voices.
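To make this concrete, here is a toy Monte Carlo sketch. The shared-noise model, the `avg_error` helper, and the numbers are my own illustrative assumptions, not taken from any of the papers mentioned: each opinion mixes a private error with an error shared across the crowd, and the shared part never averages out.

```python
import random

def avg_error(n, rho, trials=20000):
    """Monte Carlo estimate of the RMSE of the mean of n opinions.

    Each opinion = truth + shared_noise + private_noise, where rho is
    the fraction of the noise variance that is shared (correlated)
    across all n people. The truth is taken to be 0.
    """
    total = 0.0
    for _ in range(trials):
        shared = random.gauss(0, rho ** 0.5)   # common error, never averages out
        opinions = [shared + random.gauss(0, (1 - rho) ** 0.5)
                    for _ in range(n)]
        estimate = sum(opinions) / n
        total += estimate ** 2
    return (total / trials) ** 0.5

# A handful of independent voices vs. a huge correlated crowd:
print(avg_error(n=5, rho=0.0))     # error keeps shrinking as n grows
print(avg_error(n=1000, rho=0.5))  # error floor ~ sqrt(rho), regardless of n
```

With these numbers, five independent opinions (RMSE about 0.45) beat a thousand half-correlated ones (RMSE stuck near 0.71), which is the whole point of the independence requirement.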

Lack of independence: Perhaps not so bad

Still, we have examples showing that lack of independence is not always bad.

For example, according to the paper "Measuring the Crowd Within" by Vul and Pashler, even asking the same person a second time and averaging the two answers can lead to improved outcomes.

Or take the other poster-child application of wisdom of crowds: prediction markets (or markets, in general). In these markets, people trade based on their personal information. However, they can always see (and be influenced by?) the aggregated opinion of the crowd, as reflected in the market prices. And empirical evidence shows that (prediction) markets work surprisingly well, despite (or because of?) the lack of independence. Prior work has even demonstrated that non-public information spreads quickly through the market (and the SEC checks for insider trading when it detects unusual activity before the public release of sensitive information).

Wikipedia is another example: people see what everyone else has done so far, before adding their own extra information.

One paper that I found to be of interest is "Naïve Learning in Social Networks and the Wisdom of Crowds" by Golub and Jackson. The authors address the following question: "for which social network structures will a society of agents who communicate and update naïvely come to aggregate decentralized information completely and correctly?" The results are based on ideas from the convergence theory of Markov chains. One of the basic results says that the PageRank-style score of a node in the network determines the weight of the node's influence on the final outcome.
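A minimal sketch of this style of naïve (DeGroot-style) updating, with a hypothetical three-agent trust matrix of my own choosing; the exact weights are illustrative, not from the paper:

```python
# Naive (DeGroot) learning: everyone repeatedly replaces their belief
# with a weighted average of the beliefs of the people they listen to.
# T[i][j] = weight agent i puts on agent j (rows sum to 1).
T = [[0.6, 0.2, 0.2],
     [0.5, 0.3, 0.2],
     [0.5, 0.2, 0.3]]

beliefs = [1.0, 0.0, 0.0]   # initial private signals

for _ in range(200):
    beliefs = [sum(T[i][j] * beliefs[j] for j in range(3))
               for i in range(3)]

# Everyone converges to the same number: a weighted average of the
# initial signals, with weights given by the stationary distribution
# of T -- each node's influence score. For this matrix the stationary
# distribution works out to (5/9, 2/9, 2/9), so agent 0 (the one
# everyone listens to most) contributes 5/9 of the consensus.
print(beliefs)   # ~[0.5556, 0.5556, 0.5556]
```

This is exactly the sense in which a node's centrality in the listening network fixes its weight in the final aggregate: agent 0 holds only one of the three private signals, but ends up with more than half of the influence.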

In all these cases, the participants get information from the crowd; they do not just follow it blindly. So, there is some benefit in interacting.

Independence is bad

Going even further, we have cases where complete independence of participants is bad!

This typically happens when each participant knows only part of the overall information. Through communication, it is possible to assemble the complete picture, but lack of communication leads to suboptimal outcomes. Consider the example in Proposition 2 from the paper "We Can't Disagree Forever" by Geanakoplos and Polemarchakis:

We have a four-sided die, with mutually exclusive outcomes A, B, C, and D, each occurring with probability 0.25.

In reality, the die roll came up A. But nobody knows that. Instead, the knowledge of the players is:

Player 1 knows that the event "A or B" happened

Player 2 knows that the event "A or C" happened

Both players can bet on whether "A or D" happened.

So, look at what happens:

No independence: If player 1 can communicate directly with player 2, they can figure out that event A happened, and they are certain that "A or D" occurred with probability 1.0

Independence: If player 1 cannot communicate, then both players assign a probability of 0.5 to the event "A or D". This is despite the fact that they collectively hold enough information to figure out that A happened, and despite the existence of a market to trade on the event. In other words, the market fails to aggregate the available information.
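The arithmetic of the example can be checked by direct enumeration (a small sketch; the `prob_AorD` helper is just an illustrative name):

```python
from fractions import Fraction

# Four equally likely outcomes: A, B, C, D. A player's knowledge is the
# set of outcomes they still consider possible; conditioning on that set
# just means counting favorable outcomes within it.

def prob_AorD(information):
    """P("A or D" | the given set of still-possible outcomes)."""
    hits = [o for o in information if o in ("A", "D")]
    return Fraction(len(hits), len(information))

# Independence: each player conditions only on private information.
print(prob_AorD(["A", "B"]))   # Player 1: 1/2
print(prob_AorD(["A", "C"]))   # Player 2: 1/2

# Communication: intersecting the two information sets pins down A.
pooled = [o for o in ["A", "B"] if o in ["A", "C"]]
print(pooled)                  # ['A']
print(prob_AorD(pooled))       # 1
```

The intersection step is exactly what the market price alone fails to perform: both players quote 0.5, the quotes agree, and nothing forces the pooling of their information sets.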

So, we have a scenario where the inability to spread information actually results in a bad outcome. However, if we allowed the participants to be non-independent, we could have an improved outcome.

Influence vs Information Spread

So, we can see actual examples where the spread of information (and hence, lack of independence) can be both good and bad. Lack of independence can lead to groupthink, with individual voices drowned in a sea of correlated opinions. At the other extreme, lack of communication leads to suboptimal outcomes.

The paper by Ostrovsky, "Information Aggregation in Dynamic Markets with Strategic Traders" (in EC'09; I think it is also forthcoming in Econometrica), provides a rigorous theoretical framework for the conditions under which information gets aggregated in a market: essentially, there are "separable" securities for which all the available information can be aggregated, and non-separable ones that do not have this property. However, I do not have the necessary background to fully understand and present the ideas in the paper. And I cannot see how to connect this with the literature on information spreading in social networks.

In a more intuitive sense, it seems that we need information to spread and not just influence.

Unfortunately, I cannot grasp the full picture, despite the fact that I tried to look at the problem from different angles (ironic, eh?).

I still do not fully understand the implications of the above for the design of processes that involve human input. Does it make sense to show people what others have contributed so far? Will we see anchoring effects? Or will we see the establishment of a common ground, getting people to coordinate better and understand each other's input?