Nicholas Carr's blog

Veropedia and the Wikipedia mine

A year ago, I wondered “why no devious entrepreneurs have made a concerted effort to take Wikipedia’s content, which is of course free to be reused in any way, and reformat and rebrand it as an attractive commercial site.” Why not mine Wikipedia’s unexploited commercial value?

Companies like Metaweb, through its Freebase service, and Radar Networks, through its Twine service, have begun mining Wikipedia, at least indirectly. They use the site’s content to help populate the Semantic Web databases they’re building. Embedded in Wikipedia’s structure and links is a lot of information about the relationships between things, and this information may have considerable commercial value in building a more intelligent web.

Now, we have a startup – Veropedia – that hopes to more directly exploit Wikipedia by presenting, in effect, a premium edition of the free encyclopedia. It’s a cream-skimming – or cream-scraping – operation that aims to make money through advertising. As Slashdot explains, Veropedia is an effort “to collect the best of Wikipedia’s content, clean it up, vet it, and save it in a quality stable version that cannot be edited. To qualify for inclusion in Veropedia, a Wikipedia article must contain no cleanup tags, no ‘citation needed’ tags, no disambiguation links, no dead external links, and no fair use images after which candidates for inclusion are reviewed by recognized academics and experts.”

Veropedia claims that its relationship with Wikipedia is symbiotic – that even as it sucks up the Wik’s economic value it will help improve the quality of its host. As one of Veropedia’s developers explains, “Veropedia has a very comprehensive article checker that points out just about every flaw with an article that a computer program can find. But articles aren’t edited on Veropedia. Veropedia contributors must go and edit the article on Wikipedia, fixing up all the flaws, until a quality version is ready for importation to Veropedia. So everyone wins: both Wikipedia and Veropedia get improved articles.”

This is a slick variation on the sharecropping business model. Veropedia doesn’t even have to invest in the upkeep of the plantation; it just swoops in, grabs the most valuable crops, and sells them down the street at its own farm stand. To switch back to the mining metapher, the Veropedians don’t have to get their hands dirty – they leave the shovels and pickaxes to the Wikipedians. Here’s how the self-described “group of Wikipedians” who started Veropedia put it: “if you think of Wikipedia as a diamond mine, we think of ourselves as jewelers who provide a finished product to the public.”

Will it work? Probably not. A quick look at the Veropedia site doesn’t reveal much added value. It looks like a shoestring operation that’s pretty much doing straight screen-scrapes at this point. In following links, you end up bouncing back to Wikipedia in a confusing way. And I don’t see any evidence of these “recognized academics and experts” who are supposed to be reviewing and “verofying” everything. If you’re going to monetize Wikipedia’s content through an alternative encyclopedia site, you first have to overcome Wikipedia’s momentum – and that means offering users clear and compelling benefits. Veropedia doesn’t.

There may be diamonds in them thar Wikipedian hills – or at least some sparkling chunks of cubic zirconia – but Veropedia is still miles from pay dirt.

Okay, so some other site comes up with an algorithm that exposes what’s goofy on Wikipedia. Then Wikipedia comes up with one too and the other guys are unnecessary. That seems a losing game.

The “algorithm” that seems to be in place, I think, is that people are learning how to use Wikipedia. If it’s a verifiable fact, like a date something happened, it’s almost certainly right. If it’s a matter of opinion, it’s probably biased left, but you can read the article to see if the other side got in there, or if it has a strong whiff of “needs cleanup.” And if something ever got mentioned in The Simpsons or inspired an indie band song, it’s absolutely certain to be in there.

Beyond basic fact-checking or instant briefing on something (which one was the Thirty Years War again?), though, you should know that you need to go somewhere else. (I realize lots of high schoolers don’t see it that way, but add that to the things you LEARN in high school.) I used to keep the mini Columbia Encyclopedia by my desk in case I suddenly needed to know when Bertrand Russell died. Wikipedia serves the same function – and no more.

My 7th-grade daughter asks about the US Constitution for her history class. Where do I send her? Wikipedia. I know it will be accurate enough; she needs to know which amendment covers cruel and unusual punishment, and the subtleties of the Supremes’ changing stance on the death penalty are irrelevant.

I saw a picture of Robert Plant and wondered how old he was. Wikipedia.

Who was it who grabbed the high ground at Gettysburg? I know Wikipedia will remind me it was Buford and I don’t need to dig out my copy of The Killer Angels.

Who makes disk partitioning software? It’s in Wikipedia.

I know that any common-knowledge item will almost certainly be in there, and the info will be accurate enough. One-stop shopping, Internet style. It’s faster and easier than Google or Live Search.

If I really want to understand Robert Plant’s singing, I’ll listen to how he handles phrases on Song to the Siren compared to the Tim Buckley original. When my daughter hits high school and really wants to dig into this country’s thinking on the death penalty, I’ll suggest to her the actual court decisions. And it wasn’t until I stood on the broad field called Oak Ridge and looked back at Cemetery Hill that I began to understand just what Buford’s insight meant.

But below that level — for most of us, most of the time — Wikipedia suffices. It’s good enough.

Attacking it for accuracy is like complaining that the Red Sox defense isn’t as good as that of the Rockies. It’s true, but it’s not dispositive in terms of the end result.

> Wiki is a very useful site however it does have a very distinct left leaning slant.

Maybe so. Wikipedians “do nuance”. If that makes them elitist and intellectual and people perceive them as left-leaning, so be it. In a time when the right has allowed itself to be represented by the likes of Fox News, I can’t say I sympathize.

I’m quite sure properly attributed, factual information from the right is welcome in the Wikipedia.

Checked Veropedia’s Henry James article, which I hauled through the FA gauntlet on Wikipedia. Vero’s version was a word-for-word scrape from Wikipedia. If this is all the site offers, why bother? At least answers.com combined the article with content about James from several other sources.

Well, you speak of Veropedia like it is trying to be a for-profit commercial venture. This is speculation. You also say that “to switch back to the mining metapher (sic), the Veropedians don’t have to get their hands dirty – they leave the shovels and pickaxes to the Wikipedians.”

The truth is, however, that all of the contributors to Veropedia were selected by Danny Wool. Each one of them is a heavy contributor to Wikipedia. Many of the uploaded articles were actually created or substantially worked upon by the Veropedia members who uploaded them.

Danny has plans to speak to academics and experts in the field to get their opinion of the articles that have been uploaded.

There are a number of issues with the project: the parser needs work on its interface, at the moment it is really just a scraper of a particular revision and so has bugs, and there are many more great articles to upload. But… it’s a new project.

Anyway, at the very least, it’s not doing anyone any harm. I don’t really see where the problem is, but then again I’m one of those who upload via the site.

Given that you don’t really like Web 2.0, and you dislike Wikipedia even more, your comments are really just pure speculation. You don’t really have any more clue than the next man about Veropedia!