All about the good, the bad and the ugly things in life but mainly stuff about evolution, diversity of life, life forms and morphogenesis, phylogenies, trees and insects. Lots of biological news and comments. Cool discoveries of new species etc...

Friday, May 25, 2012

Four years to digitise the phylogenies published so far!

Phylogenetic knowledge is being squandered at a rate of approximately 20,000 phylogenies a year (assuming that all papers with phylogen* in the title or abstract have illustrations of phylogenies). Fortunately, this loss of knowledge (and wasted money) is being tackled from multiple angles. On the one hand, there is the open access movement, which is striving to make publicly funded science freely available and accessible to everyone. For phylogenies, this will hopefully go hand in hand with greater submission of phylogenies to databases like TreeBASE. On the other, there are efforts to digitise past phylogenies: TreeRipper, TreeSnatcher and now TreeSnatcher Plus.

TreeSnatcher Plus makes a number of improvements on its predecessor, and it was great to see that the authors benchmarked it against the same dataset as TreeRipper, which contained phylogenies from the open access journal BMC Evolutionary Biology. They state that the average processing time was 160 s per phylogeny. I was interested to see how long it would take to digitise all the phylogenies produced to date. Assuming that every paper with phylogen* in the title has an image of a phylogeny, there are 734,585 published phylogenies according to ISI Web of Knowledge, which at 160 s each would require nearly 4 years of non-stop processing to digitise. The result might not be so bleak if PubMed gives a more accurate count: 131,659 articles, which would require about 8 months to digitise semi-automatically.
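For anyone who wants to check the back-of-the-envelope sums, here is a minimal Python sketch. It assumes one phylogeny per paper, a single machine processing around the clock at the reported 160 s average, and a 365.25-day year; the function name years_to_digitise is just something made up for this illustration.

```python
# Rough estimate of how long it would take to digitise published phylogenies.
# Assumptions (not from the TreeSnatcher Plus paper): one phylogeny per paper,
# continuous processing on a single machine, and a 365.25-day year.

SECONDS_PER_PHYLOGENY = 160                 # average processing time reported
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60    # 31,557,600 s


def years_to_digitise(n_phylogenies, seconds_each=SECONDS_PER_PHYLOGENY):
    """Return the total processing time in years for n_phylogenies."""
    return n_phylogenies * seconds_each / SECONDS_PER_YEAR


# Paper counts quoted in the post for each database.
counts = {
    "ISI Web of Knowledge": 734_585,
    "PubMed": 131_659,
}

for source, count in counts.items():
    print(f"{source}: {years_to_digitise(count):.2f} years")
```

Under those assumptions the ISI figure comes out at roughly 3.7 years and the PubMed figure at roughly 0.67 years, i.e. about 8 months.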

O.K., these numbers are pie in the sky, but can we afford this wasted time?