Sunday, February 24, 2008

Back from CI Foo

Tim O'Reilly intentionally tends to leave his terms loosely defined -- as he did for the term "Web 2.0" -- so there were several attempts to come up with a compact understanding of what is and is not "collective intelligence".

The simplest definition came from Kim Rachmeler at the end of the second day:

The network knows what the nodes don't.

This nicely captures the idea that the sum has to be more than the parts. There is an emergent property to collective intelligence where problems can be solved that might be difficult to solve any other way.

More detailed definitions came from some of the other sessions. One interesting question was whether massively parallelizing a task across many humans -- something that was referred to as Man-Reduce as opposed to Map-Reduce -- is collective intelligence. In the extreme case, each of the tasks could be independent and the system could merely be collating the results. Is that collective intelligence?

This led down a path where some asserted that there was nothing in collective intelligence that could not be accomplished, given enough time, with one person. However, a compelling counter-example came up in the form of recommendations (e.g. "Customers who bought X also bought..."). To reproduce that system, a single person would have to be able to somehow become many different people with different tastes and preferences. That is hard to imagine.
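To make the recommendations counter-example concrete, here is a minimal sketch of a "Customers who bought X also bought..." feature built by counting co-purchases. It is only an illustration, not how any real store computes recommendations; the function names and the sample baskets are made up for this example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def also_bought(baskets):
    """For each item, count how often every other item shares a basket with it."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        # Every ordered pair of distinct items in one customer's basket.
        for item, other in permutations(set(basket), 2):
            co_counts[item][other] += 1
    return co_counts


def recommend(co_counts, item, k=3):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in co_counts.get(item, Counter()).most_common(k)]


# Hypothetical purchase histories, one basket per customer.
baskets = [
    {"tent", "stove", "lantern"},
    {"tent", "stove"},
    {"tent", "novel"},
    {"novel", "bookmark"},
]

co_counts = also_bought(baskets)
top_for_tent = recommend(co_counts, "tent", k=1)
```

No single basket contains the answer. The ranking only appears once many customers' differing purchases are combined, which is the point of the counter-example.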

There was also an attempt to apply organizational theory to collective intelligence, resulting in a detailed taxonomy that listed several methods of aggregating the individuals, such as voting, averaging, weighted voting by reputation, mimicry (also referred to as organizational memory), and clustering/hierarchies. This led to a conclusion that there may be nothing new in collective intelligence other than the fact that communication between individuals is now internet-enabled, allowing it to occur at a scale and speed impossible before. Yet that scale and speed do create something new and important.
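Three of the aggregation methods in that taxonomy (voting, averaging, and weighted voting by reputation) are simple enough to sketch in a few lines. This is just my illustration of the ideas; the participant names, votes, and reputation scores are invented, and giving unknown voters a default weight of 1.0 is an assumption.

```python
from collections import Counter
from statistics import mean


def majority_vote(choices):
    """One participant, one vote: return the most common choice."""
    return Counter(choices).most_common(1)[0][0]


def average(estimates):
    """Averaging: combine numeric estimates into a single value."""
    return mean(estimates)


def weighted_vote(votes, reputation, default_weight=1.0):
    """Weighted voting: each vote counts in proportion to the voter's reputation."""
    totals = Counter()
    for voter, choice in votes.items():
        totals[choice] += reputation.get(voter, default_weight)
    return totals.most_common(1)[0][0]


# Hypothetical participants, votes, and reputation scores.
votes = {"ann": "A", "bob": "B", "cho": "B"}
reputation = {"ann": 5.0, "bob": 1.0, "cho": 1.0}

plain_winner = majority_vote(votes.values())         # "B" wins two votes to one
weighted_winner = weighted_vote(votes, reputation)   # "A" wins 5.0 to 2.0
crowd_estimate = average([10, 12, 14])
```

The same three votes produce different winners depending on the aggregation rule, so the choice of rule is itself a design decision about whose input matters.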

Finally, there was a curious idea of attempting to develop a programming language for human nodes in a network, something that might provide a more rigorous theoretical framework to analyze what is possible and not possible in these systems. One thought here was to start by basing that programming language on what is possible using MTurk (Amazon Mechanical Turk) rather than the full range of possible human actions, something that might have immediate practical output as well.

Looking back at the discussion, I like the definition "The network knows what the nodes don't." However, if you accept this definition, it seems you have to accept that the system as a whole can do things that individuals alone cannot. I think that has to lead to the conclusion that systems that are merely massively parallelizing a task among humans (i.e. "Man-Reduce") do not represent collective intelligence.

One interesting theme that came up in many sessions was a concern about manipulation of systems trying to use collective intelligence. We observed that small systems typically were not subject to attack because there was little value in doing so, but that some big systems (such as Amazon reviews) appeared resistant to attacks due to the amount of effort required to overwhelm the large number of legitimate participants. However, in large systems with a "winner-takes-all" dynamic (e.g. Digg or Google, where gaming the system to get top rank will result in a massive amount of lucrative traffic), the benefits of manipulation can justify even quite costly efforts at spamming.

The discussions on manipulation often led into discussions of reputation. Should one participant in these systems get one vote? Or should trustworthy people or experts be given higher weight? And how do we find those trustworthy people or experts? If we nominate experts using equal votes, have we solved the problem or merely transferred it to a meta level? Some suggested a TrustRank-like method of transferring reputation in a network of participants. Others noted that, since community sites tend to start with a loyal, dedicated audience but get diluted over time, a simple seniority-based system of trust and reputation might be able to preserve that core as the community grows.
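As a rough illustration of the "TrustRank-like" suggestion, here is a simplified sketch in which trust starts at a few hand-picked seed participants and flows along endorsement links, with a damping factor that keeps pulling trust back toward the seeds. It is written in the spirit of that idea rather than as the published algorithm; the endorsement graph, the seed set, and the parameter values are all assumptions.

```python
def propagate_trust(endorsements, seeds, damping=0.85, iterations=50):
    """Spread trust from seed participants along endorsement links.

    endorsements maps each participant to the list of participants they endorse.
    Participants who endorse no one simply pass nothing on in this sketch.
    """
    nodes = set(endorsements) | set(seeds)
    for targets in endorsements.values():
        nodes.update(targets)

    # Only the seeds receive a share of the baseline trust.
    seed_share = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    trust = dict(seed_share)

    for _ in range(iterations):
        updated = {n: (1.0 - damping) * seed_share[n] for n in nodes}
        for source, targets in endorsements.items():
            if not targets:
                continue
            # Each endorser splits its damped trust evenly among its endorsements.
            share = damping * trust[source] / len(targets)
            for target in targets:
                updated[target] += share
        trust = updated
    return trust


# Hypothetical community: a trusted founder, two regulars, and a spam ring.
endorsements = {
    "founder": ["ann", "bob"],
    "ann": ["bob"],
    "spammer1": ["spammer2"],
    "spammer2": ["spammer1"],
}
trust = propagate_trust(endorsements, seeds={"founder"})
```

In this toy graph the two spammers endorse only each other, so no trust from the seed ever reaches them, while the regulars inherit trust from the founder. It also shows where the meta-level problem reappears: someone still has to pick the seeds.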

There were many other discussions in and out of sessions, but these were the major themes and questions that stick in my mind. I much enjoyed the many other conversations I had, including the opportunity to talk at length with Hal Varian and Rodney Brooks, as well as the chance to see old friends and meet new ones. A great experience overall.

Could I ask you the small favor of filling in some gaps of my knowledge? Or at least gaps in my knowledge of terminology? What was meant when people talked about "active versus passive" collective intelligence? And what was meant by the implicit vs. explicit (top down vs. bottom up?) ends of collective intelligence?

Hi, Jeremy. Sorry, I'm not sure. I don't think I mentioned those terms.

Let me try to guess though if that might be useful. I think that both "active" and "explicit" refer to systems where people explicitly take some action (like editing Wikipedia or sharing with someone else in the network). I think that both "implicit" and "passive" would refer to the system sharing between people without requiring any action (like anonymously sharing what people have clicked on, bought, or tagged).

What do you think? I don't quite have the context here, so I am guessing. Do you think those definitions fit?

Hmm. There are forms of gaming the system other than injecting bad (or insincere) data. Example: The Church of Scientology somehow manages to get reviews critical of their books removed from Amazon. It is not known how they do this, but it is obvious that it is happening.