Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188

Monday, February 02, 2009

Wiki modelling - Part 1

Time to make some notes. I've been playing with using Semantic MediaWiki to create a database of taxonomic names, literature, specimens, sequences, and phylogenies. One challenge is to come up with simple ways to model these entities, so that both data entry and querying are as simple as possible. Some things are straightforward. For example, a publication can be modelled like this:

OK, I've ignored the attributes. The diagram simply shows the use of MediaWiki REDIRECT pages to enable the use of standard publication GUIDs as wiki page names (see earlier posts for more details, and a hack to deal with problem characters in DOIs). One benefit of GUID REDIRECTs is that I can refer to publications using GUIDs, and the wiki user will be taken to the article page without any fuss.
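To make the REDIRECT idea concrete, here is a minimal Python sketch that builds the wikitext for a GUID page pointing at an article page. The `#REDIRECT [[...]]` line is standard MediaWiki syntax; the function name `redirect_page` and the target page title are made up for illustration, and the hack for problem characters in DOIs mentioned above is not reproduced (the sketch simply refuses GUIDs containing characters MediaWiki does not allow in titles).

```python
# Characters MediaWiki does not allow in page titles.
FORBIDDEN_TITLE_CHARS = set("#<>[]|{}")


def redirect_page(guid, target_title):
    """Return (page title, wikitext) for a page that redirects a GUID to an article page.

    Hypothetical helper: the GUID is used as the page title as-is.
    """
    bad = FORBIDDEN_TITLE_CHARS.intersection(guid)
    if bad:
        # No escaping is attempted here; this sketch just rejects such GUIDs.
        raise ValueError(f"GUID contains characters not allowed in a title: {sorted(bad)}")
    return guid, f"#REDIRECT [[{target_title}]]"


# The target title is a placeholder, not a real page on the wiki.
title, text = redirect_page("doi:10.1371/journal.pcbi.1000247", "Hypothetical article page title")
print(title)
print(text)
```

Creating the page itself (by hand or via a bot) is a separate step; the sketch only shows what the page would contain.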

Likewise, we can model a journal like this:

Again, GUIDs are REDIRECT pages. This means an article page can have the ISSN of the journal it appears in as one of its attributes, and we can then use ISSNs in our queries.
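As a rough illustration of using an ISSN as an attribute and then querying on it, the Python sketch below generates a Semantic MediaWiki annotation and an inline `#ask` query. The `[[Property::value]]` and `{{#ask: ...}}` forms are standard Semantic MediaWiki syntax, but the property names (`ISSN`, `Title`, `Year`), the helper names, and the ISSN value are placeholders, not the ones used on my wiki.

```python
def annotate(prop, value):
    """Wikitext that sets a Semantic MediaWiki property on the current page."""
    return f"[[{prop}::{value}]]"


def ask_query(prop, value, printouts=()):
    """Wikitext for an inline query listing pages where `prop` equals `value`."""
    parts = [f"[[{prop}::{value}]]"] + [f"?{p}" for p in printouts]
    return "{{#ask: " + " | ".join(parts) + " }}"


# "0000-0000" is a placeholder ISSN; the property names are hypothetical.
article_annotation = annotate("ISSN", "0000-0000")
journal_query = ask_query("ISSN", "0000-0000", printouts=("Title", "Year"))
print(article_annotation)
print(journal_query)
```

The annotation would go on each article page, and the query could sit on the journal page to list the articles that carry its ISSN.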

People are a bit trickier, given the absence of GUIDs, or the desire to keep obvious ones, such as email addresses, private (see doi:10.1371/journal.pcbi.1000247 for some background). I plan to have a single page for each author, and have alternative spellings link to that page:

This is one motivation for my work on equivalent author names. By finding clusters of equivalent names it would be possible to pre-populate the wiki with author names from bibliographic databases, whilst minimising the number of duplicate pages for the same author.
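To sketch how clusters of names could turn into wiki pages, the Python below groups name strings with a deliberately crude key (surname plus first initial, accents stripped) and emits one page per cluster with REDIRECTs from the other spellings. This is not the equivalent-names method itself, just a stand-in heuristic: it will happily merge different people who share a surname and initial. The names, the function names, and the choice of the longest spelling as the page title are all assumptions made for the example.

```python
import unicodedata
from collections import defaultdict


def name_key(name):
    """Crude clustering key: (lower-cased surname, first initial), accents stripped.

    Assumes a non-empty name in either "Surname, Given" or "Given Surname" form.
    """
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    if "," in ascii_name:
        surname, given = [part.strip() for part in ascii_name.split(",", 1)]
    else:
        parts = ascii_name.replace(".", " ").split()
        surname, given = parts[-1], " ".join(parts[:-1])
    return surname.lower(), given[:1].lower()


def cluster_names(names):
    """Group names sharing a key, keeping first-seen order."""
    clusters = defaultdict(list)
    for name in names:
        clusters[name_key(name)].append(name)
    return list(clusters.values())


def author_pages(cluster):
    """Pick the longest spelling as the author page; redirect the rest to it."""
    canonical = max(cluster, key=len)
    redirects = {v: f"#REDIRECT [[{canonical}]]" for v in cluster if v != canonical}
    return canonical, redirects


# Made-up names for illustration only.
names = ["Smith, J. A.", "Jane A. Smith", "J. Smith", "Müller, K.", "K. Muller"]
clusters = cluster_names(names)
for cluster in clusters:
    print(author_pages(cluster))
```

A real run would want a much more careful notion of equivalence before creating pages, since a wrong merge is harder to undo than a duplicate.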