Hi list,
I'm using a script very similar to bp_taxonomy2tree.pl distributed with
BioPerl (with the only difference that I'm using taxids instead of taxon
names). Basically, the script generates a taxonomic tree given a list of
taxids using the NCBI taxonomy db. For each taxon, it generates a taxon
object and then merges this object into a tree object that keeps growing. It
runs very well with a small number of taxa, but with many taxa (>1000) it is
extremely slow (about a week for 3000 taxa).
The slowness is due to the function merge_lineage (line 65), which merges the
existing tree object with a new taxon object. I guess that the difficulty
with a big tree (i.e. more than 1000 leaves) is finding the nodes in common
between the tree and the new taxon object...
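To make the idea concrete, here is a minimal sketch of the kind of merge step I would expect to stay fast on large trees. It is written in Python rather than BioPerl, and it is not how merge_lineage is actually implemented (I have not checked that). The toy PARENT map, the taxids in it, and the function names are all made up for illustration. The point is only that if the tree keeps an index keyed by taxid, finding the common node is a lookup while walking up the new lineage, not a search through the whole tree.

```python
# Toy parent map standing in for the NCBI taxonomy (taxid -> parent taxid).
# These ids are invented for illustration; the root (1) is its own parent.
PARENT = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}


def lineage(taxid):
    """Return the path from taxid up to the root, leaf first."""
    path = [taxid]
    while PARENT[taxid] != taxid:
        taxid = PARENT[taxid]
        path.append(taxid)
    return path


def merge_lineage(children, taxid):
    """Add taxid and any missing ancestors to the tree.

    `children` maps every taxid already in the tree to the set of its
    child taxids, so "is this node already in the tree?" is a dict
    lookup. Returns the number of nodes that were newly added.
    """
    added = 0
    child = None
    for node in lineage(taxid):
        known = node in children
        if not known:
            children[node] = set()
            added += 1
        if child is not None:
            children[node].add(child)
        if known:
            # First shared node found: everything above it is already linked.
            break
        child = node
    return added


def build_tree(taxids):
    """Build the child index for the tree spanning the given taxids."""
    children = {}
    for taxid in taxids:
        merge_lineage(children, taxid)
    return children
```

With this layout each merge costs at most the depth of the new lineage, independent of how many leaves the tree already holds. Whether something similar can be done with the Bio::Tree objects is exactly what I am unsure about.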
Do you have any idea how to get around the problem? Should I look under
the hood of merge_lineage to try to improve it for large trees?
Thanks!
Version: bioperl-1.5.2_102
OS: GNU/Linux
-Tristan