Wikipedia

(via Slashdot)
According to an article in APC, of the more than 2.8 million lines of code contributed to the Linux kernel over the last year or so, 75% were written by paid developers. Considering the business ecosystem that's grown up around Linux over the last 10 years, this should come as no surprise. But still, it's an interesting counterpoint to the notion that Linux is written by a community of dedicated volunteers. I think that characterization is probably still largely correct: volunteers write Linux. The kernel is a particular beast with a particular social system. What happens at the core of Linux matters so much to the IBMs of the world that it stands to reason they would get particularly involved there.

But I also think this is an interesting window into what happens to open-source systems as they grow, evolve, and become essential to the computing world. What percentage of Wikipedia is written by paid representatives? Nobody knows. Aside from some notable exceptions in which journalists, politicians, or Scientologists were caught with their hands in the cookie jar, we don't know where a lot of Wikipedia's content comes from. I think it's a fair assumption that some large percentage of it comes from paid representatives. It's probably not as high as 75%, though.

The Wall Street Journal recently made news by publishing some results by Spanish researcher Felipe Ortega. Ortega crunches some numbers and finds a decline in the number of Wikipedia editors. The folks at Wikimedia decided to hit back (subtly) a few days later, basically arguing that Ortega is counting the wrong thing.

One of the issues the WSJ piece brings up is the question of whether the decline is the result of a lack of new material. In other words, some people argue that, with over 3 million articles in the English Wikipedia alone, it's hard to find new stuff to write about. Is this the case?

Guardian columnist Mark Graham thinks not. In a recent column, Wikipedia's Known Unknowns, he takes a look at the geographical distribution of Wikipedia articles that have geotags:

Graham sees this as evidence that there's plenty left to write about. But I think he's missed the point. It's true, Wikipedia has not yet covered the entire domain of human knowledge. There are many places on the globe that aren't well documented yet. But that's exactly the point. The people who live in those places aren't well represented on Wikipedia (yet). And the people in heavy Wikipedia-using countries don't often go to those places.

Graham's map essentially shows that this is, in fact, a big challenge for Wikipedia. With 3 million articles, Wikipedia has largely covered the easy stuff. General knowledge and popular culture are comparatively well represented, and so is geographical knowledge in the parts of the world where Wikipedia is very popular. So the barrier is now much higher for someone who comes to Wikipedia looking for something to write about. Increasingly, that person needs to have some kind of relatively specialized knowledge, to have been somewhere relatively unique, and then has to feel able and willing to share that knowledge. Well, that's a high barrier to entry for a lot of casual users, and I think it's at least a part of the reason why Wikipedia's editor numbers have plateaued.

So, actually, the question isn't whether Wikipedia is running out of new material. It's not. The question is: who knows (and will write about) the material that isn't on Wikipedia yet?

From my POV, there are two interesting dynamics going on here. First is the question of whether Wikipedia should be a news source. Unequivocally, it is a news source. But I think many in Wikipedia's heavy editor community act on an ideology that classifies Wikipedia as an encyclopedia, not a news source. This is myopic at best, delusional at worst. It also provides a nice illustration of why ascribing attitudes to the "Wikipedia community" as a whole is misleading. The most vocal Wikipedians, the heavy editors, often hold tight to dogmas that aren't representative of others' attitudes.

The second, related issue is what Wikipedians call 'Recentism'. If you go to Joe Wilson's page right now, you'll see a funny little notice at the top that says "This article or section may be slanted towards recent events. Please try to keep recent events in historical perspective." Here's the thing: I think this is a sound sentiment, and good advice. But again it reflects that clear dogma about what Wikipedia is and what it should be. How can Wikipedia really be "neutral" when it has deep-seated dogmas and policies that restrict and direct not only its content, but how people should use it?

Drawing on a huge amount of prior research, the paper develops an interesting model of the progression of participation in online collective action (although they don't call it that). Actually, I would say the references in this paper are almost entirely must-reads for anyone interested in online participation, and the manner in which Preece and Shneiderman go through them is almost like the syllabus for a good course on understanding online participation.


The figure above highlights the paper's main model. I like that the authors include all those arrows to indicate that it's not a step-wise progression from one stage to the next. I think this is a key point. Preece and Shneiderman talk a bit about Lave and Wenger's notion of Legitimate Peripheral Participation. One of the key misconceptions of that work is that it suggests a linear path from periphery to center. But Lave and Wenger go out of their way to argue that although there are some activities that are peripheral (yet legitimate), there are many paths from them towards other types of participation. They also argue that 'central' is not the right idea since communities are constantly in flux, and suggest 'full participation' is a better term. I think Preece and Shneiderman are on board with all of this.

I am thinking about another way to conceive of this model which highlights another key point: progressing in participation usually means supplementing participation with new knowledge, activities, and social interactions, but not supplanting the previous forms. A 'leader' on Wikipedia is certainly still a 'reader,' and though she may spend less time fixing typographical errors (as a 'contributor' might) and more time arbitrating disputes, the progression is often about growth rather than a substitution. An alternative way of visualizing this progression is below.


This is not a perfect way either, more of a straw man. Gain some things, lose some things. One thing lost in this new visualization is the progression of thick green arrows that indicate the path that Preece and Shneiderman argue many users follow.

I don't think this alternative way of looking at the progression of participation fundamentally alters Shneiderman and Preece's argument. From one point of view, this is just a quibble about visualization. But actually I think the Venn-style diagram highlights that reading is a starting point, and the progression from there goes in many directions. At the same time, the deeper forms of participation share much with one another, while each also involves some new activities. For me, even though the more linear style is common for visualizations of conceptual models, it's important that the model not imply separations that might not exist, and that it emphasize that increasing participation is often a process of learning and growth which allows deeply embedded participants to experience more and share more with a diverse array of others.

The NY Times is reporting that the English language Wikipedia will soon move to a "flagged revisions" system by which edits to articles about living people will have to be approved by a more experienced editor before they appear on the live site. This system has been tested for about a year on the German language Wikipedia. On that site, an "experienced editor" is someone who's crossed a threshold of number of successful edits. There were about 7,500 of them in the German case, and there are likely to be an order of magnitude more in the English Wikipedia.

The NY Times article notes that:

Although Wikipedia has prevented anonymous users from creating new articles for several years now, the new flagging system crosses a psychological Rubicon. It will divide Wikipedia’s contributors into two classes — experienced, trusted editors, and everyone else — altering Wikipedia’s implicit notion that everyone has an equal right to edit entries.

In reality, those classes have been present for some time now. As part of my dissertation research I've been interviewing less experienced Wikipedians about their perceptions of the site. One constant theme has been the perception of a class system in Wikipedia. Casual editors worry that their edits aren't good enough, and that they'll be rebuked by Wikipedia's upper classes. They perceive a mystical group of higher-order contributors who make Wikipedia work. They believe that the barrier to entry is high and that they don't know enough about how the system works even to make small edits. Partly I think this is a function of the increasing complexity of the Wikipedia system. Partly it's because of Wikipedia's increasing stature – less experienced users feel the consequences of their actions more keenly when so many millions read the site each day.

I also think classism is something that Wikipedia's heavy-editor community actively cultivates. The NY Times notes the work of Ed Chi at PARC. Ed and his colleagues have done some really interesting work. Among other things, they've noticed a trend towards resistance to new content. In a recent paper presented at GROUP, Tony Lam and his colleagues found that the rate of article deletions is growing, and that most articles are deleted shortly after they are created. Wikipedia has a core of frequent editors who zealously guard their territory, sometimes actively discouraging newcomers, and enforcing complicated and arcane policies in ways that can reduce new participation. The ideology of Wikipedia is a level playing field in which everyone has a voice, but the practice of it is often far from that ideal.

This latest move is troubling in that it seems to represent a lack of faith in crowdsourcing and the wisdom of crowds, in the model that made Wikipedia what it is today. This change will also remove another of the important social-psychological incentives that draw new people into the Wikipedia fold: the instant gratification that comes from seeing your work reflected on a Wikipedia page. There will certainly be many papers written on the before-after comparison, and I suspect we'll see significant changes in the dynamics of the site, at least for the pages that will see this change.

Now, I'm no fan of Scientology, though I admit I think the whole thing is more laughable than anything else. But for Wikipedia this is a bad decision that leads down a bad road. There are two big issues here.

First, if Wikipedia starts to ban whole organizations rather than policing malicious individuals (whom Register writer Cade Metz calls "Wikifiddlers" – love it!), how does it draw a reasonable line between protecting Wikipedia and social engineering? Wikipedia is already a horribly slanted body of knowledge, mostly as a function of the types of knowledge that its user communities value highly – natural sciences, computer science, engineering, popular culture. Picking and choosing organizations to ban will make this bias worse. Does Wikipedia only ban organizations that are easy to hate – Scientologists, neo-Nazis, etc.? If this is about the policy, and an attempt to thwart coordinated propaganda, then shouldn't we also be banning IP ranges for, say, baseball teams, celebrities, and Congressmen, all of whom engage in organized propaganda attacks to gussy up their Wikipedia pages?

There's also a more fundamental problem with this – it breaks the model of "Wisdom of the Crowds." The whole point of WotC, in the Smith / Surowiecki sense, is that a person is dumb but people are smart. When people are diverse, their biases cancel each other out. Guessing the number of jelly beans in a jar isn't that different from making Wikipedia. We need all manner of biases. We need people to be wrong in all ways, and to coordinate propaganda in all ways. That doesn't mean we should allow all kinds of malicious activity – going after individual Wikifiddlers makes sense to me. But banning whole groups is a slippery slope that could hurt Wikipedia's reputation and quality in the long term.
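
To make the jelly-bean intuition a bit more concrete, here is a toy simulation in Python. It's only a sketch under assumptions I'm making up for illustration: every guesser has a personal bias drawn from a normal distribution centered on zero, plus some random noise, and "banning a group" is modeled crudely as dropping everyone whose bias leans in one direction. The function names and all the numbers (1,000 beans, 500 guessers, the spreads) are hypothetical, not taken from Surowiecki or from any Wikipedia data.

```python
import random
import statistics


def simulate_crowd(true_count, n_guessers, bias_spread, noise, seed=0):
    """Return a list of (bias, guess) pairs for a hypothetical crowd.

    Assumption: personal biases are normally distributed around zero,
    i.e. the crowd as a whole is diverse and leans in no single direction.
    """
    rng = random.Random(seed)
    crowd = []
    for _ in range(n_guessers):
        bias = rng.gauss(0, bias_spread)           # this person's systematic lean
        guess = true_count + bias + rng.gauss(0, noise)  # lean plus random error
        crowd.append((bias, guess))
    return crowd


def mean_error(crowd, true_count):
    """Absolute error of the crowd's average guess."""
    guesses = [guess for _, guess in crowd]
    return abs(statistics.mean(guesses) - true_count)


TRUE_COUNT = 1000  # hypothetical number of jelly beans in the jar

full_crowd = simulate_crowd(TRUE_COUNT, n_guessers=500, bias_spread=300, noise=100)

# Typical error of a single guesser (median absolute error).
individual_error = statistics.median(
    [abs(guess - TRUE_COUNT) for _, guess in full_crowd]
)

# Error of the whole, diverse crowd's average.
full_crowd_error = mean_error(full_crowd, TRUE_COUNT)

# Crude stand-in for banning a whole group: drop everyone who leans high.
filtered_crowd = [(bias, guess) for bias, guess in full_crowd if bias <= 0]
filtered_crowd_error = mean_error(filtered_crowd, TRUE_COUNT)

print(f"typical individual error: {individual_error:.1f}")
print(f"full crowd error:         {full_crowd_error:.1f}")
print(f"filtered crowd error:     {filtered_crowd_error:.1f}")
```

In this toy setup the full crowd's average lands much closer to the truth than a typical individual does, and removing one side of the biases pulls the average away again. Real editing communities are obviously messier than a jar of jelly beans, so treat this as an illustration of the argument rather than evidence for it.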