Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188

Monday, January 25, 2016

A decade ago (OMG, that can't be right, an actual decade ago) I created "iSpecies", a simple little tool to mash up a variety of data from GBIF, NCBI, Yahoo, Wikipedia, and Google Scholar to create a search engine for species. It was written in PHP, relied on some degree of *cough* web scraping to get its data, and was a bit of a toy (although that didn't stop me complaining that it could do more than EOL at the time). Eventually I got sick of dealing with Google Scholar constantly changing its HTML and blocking IP addresses to stop people harvesting data (I once managed to get my entire campus blocked), and with services such as Yahoo's image search disappearing, so I pulled the plug on it.

The new iSpecies is nothing fancy: it just takes a species name, searches GBIF, EOL, CrossRef, and Open Tree of Life, grabs some data, and puts it together on a web page. There are lots of limitations (e.g., it only fetches the first 300 localities in GBIF, it requires scientific names, and the tree viewer is pretty awful) but it was pretty simple to put together. It's entirely client-side: all the code is in the HTML file plus a few JavaScript libraries, and it is on GitHub at https://github.com/rdmpage/ispecies.
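
To make the lookup step concrete, here is a minimal sketch of how a species name could be turned into requests against two of the sources. It is written in Python for brevity rather than the client-side JavaScript the site itself uses, and it is not taken from the iSpecies code. The endpoint URLs and parameter names are my assumptions based on the public GBIF and CrossRef APIs, the `rows` value of 20 is arbitrary, and `build_queries` and `extract_points` are hypothetical helper names. EOL and Open Tree of Life are left out to keep it short.

```python
from urllib.parse import urlencode

# Assumed endpoints (public GBIF and CrossRef APIs), not copied from iSpecies.
GBIF_OCCURRENCES = "https://api.gbif.org/v1/occurrence/search"
CROSSREF_WORKS = "https://api.crossref.org/works"


def build_queries(name, max_localities=300):
    """Return one request URL per source for a scientific name."""
    gbif_params = {
        "scientificName": name,
        "hasCoordinate": "true",   # only records that can go on a map
        "limit": max_localities,   # mirrors the 300-locality cap in the post
    }
    crossref_params = {"query": name, "rows": 20}  # 20 is an arbitrary choice
    return {
        "gbif": GBIF_OCCURRENCES + "?" + urlencode(gbif_params),
        "crossref": CROSSREF_WORKS + "?" + urlencode(crossref_params),
    }


def extract_points(gbif_response):
    """Pull (latitude, longitude) pairs from a GBIF-style search response."""
    return [
        (rec["decimalLatitude"], rec["decimalLongitude"])
        for rec in gbif_response.get("results", [])
        if "decimalLatitude" in rec and "decimalLongitude" in rec
    ]
```

Each URL can then be fetched with any HTTP client and the JSON handed to something like `extract_points` before plotting. Keeping URL construction separate from fetching means the same logic could live entirely in the browser, which is the point of the client-side design.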

Fun as this was, there's a bigger problem with iSpecies, and that's that it is a "web mashup". I'm simply grabbing data from different sources and redisplaying it. What I really want is what has been described as a "data mashup" (awful term, don't use it), that is, I want to combine the data so that it is more than the sum of its parts. For example, some of the data could be cross-linked (especially if we add a few more sources and drill down a bit). Some of the papers discovered by CrossRef may include original descriptions, or may be the source of some of the points plotted on the GBIF map. Some may include the phylogenies used to build the Open Tree of Life tree. In order to build a data mashup instead of a web mashup we need to operate at the level of data rather than just human-readable web pages. That is the next thing I'd like to work on, and in many ways it shouldn't be a big leap. The new iSpecies was fairly easy to create because we now have a bunch of web services that all speak JSON. It's a small step from JSON to JSON-LD (especially if the JSON-LD is constructed with reuse in mind). So while it's nice to see iSpecies back, there's a much more interesting next step to think about.
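
As a rough illustration of how small that step from JSON to JSON-LD can be, the Python sketch below wraps a plain JSON record in a context and gives it a global identifier, then merges records from different sources that share that identifier. Everything here is assumed for the example: the record, its made-up DOI, the choice of schema.org terms, and the helper names `to_jsonld` and `merge_by_id`. It is one possible mapping, not how any of these services actually publish their data.

```python
# Hypothetical context: maps plain JSON keys to schema.org terms.
CONTEXT = {
    "title": "http://schema.org/name",
    "DOI": "http://schema.org/identifier",
}


def to_jsonld(record):
    """Wrap a plain JSON record (a dict with a 'DOI' key) as JSON-LD."""
    doc = {
        "@context": CONTEXT,
        "@id": "https://doi.org/" + record["DOI"],  # shared global identifier
        "@type": "http://schema.org/ScholarlyArticle",
    }
    doc.update(record)  # the original keys are kept unchanged
    return doc


def merge_by_id(docs):
    """Combine documents from different sources that share the same @id."""
    merged = {}
    for doc in docs:
        merged.setdefault(doc["@id"], {}).update(doc)
    return merged


# Made-up example record; the DOI is a placeholder, not a real paper.
paper = to_jsonld({"DOI": "10.1234/example", "title": "A hypothetical paper"})
```

Once records carry the same `@id`, a paper found via CrossRef and the same paper cited as the source of a GBIF point or an Open Tree of Life phylogeny become one node rather than two unrelated strings, which is the cross-linking the paragraph above is after.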

This is an appealing vision, because it seems unlikely that having multiple, small communities clustered around taxa will ever have the impact that taxonomists might like to have. Perhaps if we switch to focussing on objects (sequences, specimens, papers), notions of identity (e.g., DOIs, ORCID), and alternative measures of impact, we can increase the visibility and perceived importance of the field. In this context, the recent paper "Wikiometrics: A Wikipedia Based Ranking System" (http://arxiv.org/abs/1601.01058) looks interesting. A big consideration will be how well connected the network linking taxonomists, papers, sequences, specimens, and names turns out to be. If it's anything like the network of readers in Mendeley then we may face some challenges in community building around such a network.