I'd never expected to find Jimmy Wales' name in the author list of a scientific paper, much less a list that includes one of the deans of fly genetics, Michael Ashburner. But the two are among the two dozen authors of a paper that will be released this evening by Genome Biology. The group is writing to let the biology community know that the first WikiProfessional project, WikiProteins, is ready for beta testing.

A little over a year ago, we covered the announcement of this project, and the basic concept remains the same: wikis need contributors, but it's hard to attract contributors without any existing content to draw them in. WikiProteins solved this by importing a huge amount of data from existing databases, such as PubMed, Swiss-Prot, and the Gene Ontology database.

The new paper describes a major advantage to this approach. Traditionally, biological information has been divided between two approaches: data mining, which involves parsing existing information to identify semantic content and connections within it, and curating, which involves expert, manual analysis of data. By importing information from both types of sources, WikiProteins should theoretically contain the best properties of both types of data: reliable information supplied by experts and potential connections among data that haven't previously been explored.

The paper provides a number of measures of the success of this approach. For one, the import process has identified over a million individual authors, and a similar number of concepts that connect them and the other items stored in the database. The different data sources also seem to have paid off, as the authors determined that well over half of the protein-protein interactions brought in from curated databases could not have been identified by data-mining PubMed abstracts.

In calling for biologists to get involved in the beta process, the people who generated WikiProteins have a number of roles in mind. For starters, they expect that the data mining process has generated a significant number of spurious connections, and hope that the community will help in pruning those. For example, they noted that the gene abbreviation "CLB2" mapped to at least five different genes (depending on the organism), as well as a material used in dentistry, Clearfil Liner Bond 2; manual intervention may be needed to sort these out. They're also hoping that contributors will simply dump sentences from the literature into WikiProteins in order for them to be indexed and further connections mined.
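To make the "CLB2" problem concrete, here is a minimal sketch of how a mined index might flag terms that need manual review. It is an illustration only, not the method used by WikiProteins: the record layout, the `ambiguous_terms` helper, the organism labels, and the "XYZ1" symbol are all hypothetical placeholders, and only the "CLB2" and Clearfil Liner Bond 2 entries come from the example above.

```python
from collections import defaultdict

# Hypothetical mined records: (term, kind, context).
# "organism A"/"organism B" and "XYZ1" are placeholders, not data from the paper.
MINED = [
    ("CLB2", "gene", "organism A"),
    ("CLB2", "gene", "organism B"),
    ("CLB2", "material", "Clearfil Liner Bond 2"),
    ("XYZ1", "gene", "organism A"),
]


def ambiguous_terms(records):
    """Return every term that maps to more than one distinct concept."""
    concepts = defaultdict(set)
    for term, kind, context in records:
        concepts[term].add((kind, context))
    # Keep only the terms a human curator would need to sort out.
    return {term: sorted(found) for term, found in concepts.items() if len(found) > 1}


flagged = ambiguous_terms(MINED)
for term, meanings in flagged.items():
    print(term, "needs review:", meanings)
```

A term with a single meaning passes through untouched; anything with several candidate meanings lands in a review queue, which is roughly the kind of pruning the authors hope the community will take on.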

As the name implies, though, the WikiProfessional approach relies on experts taking part in the evaluation of the data. On a general level, the WikiProteins data will be made available so that other experts can reindex its contents using other analysis tools, possibly generating more connections to contribute back.

But the success of the project will largely depend on its ability to attract biologists who feel a sense of ownership regarding a topic and will make sure any changes to it are accurate. To that end, and to allow other users to evaluate the credibility of contributions, contributors are expected to register under their professional names, preferably identifying themselves as one of the authors who have been indexed.

Anonymity is often viewed as promoting the open exchange of opinions; it forms the basis of the scientific peer review process for precisely that reason. If WikiProfessional can turn its absence into a positive feature, that may be an essential component in translating what appears to be a great concept into a useful one.