Here is the whole group in the lab of Computational conSequences during the Spring/Summer of 2015. I’d say that this is the best group ever.

Gustavo leaves today, going back to Michoacán, Mexico after spending his sabbatical here. Julie left a few weeks ago, also back to Michoacán. She might come back for the Spring/Summer 2016.

The only locals are Brigitte, Kissa, Thomas, César and me. Brigitte and Kissa are honorary members: they have been in the lab for collaborative reasons, but work towards their M.Sc. degrees with other faculty members at Laurier (Michael Suits and Geoff Horsman, respectively).

N. Ward, G. Moreno-Hagelsieb, Quickly Finding Orthologs as Reciprocal Best Hits with BLAT, LAST, and UBLAST: How Much Do We Miss? PLoS ONE 9, e101850 (2014).

The story goes as follows. At a talk by some group I heard that they were using UBLAST to quickly find members of some protein families rather than use a Hidden Markov Model approach. They said it was much faster, so I became curious. I downloaded USEARCH 5 back then to try it on the things I commonly do with NCBI’s BLAST. I was surprised at how fast this program ran. In any event, I thought that testing this program for some task would be a good project for an undergrad student. That was Natalie’s undergrad thesis. Back then it was about using different options under USEARCH to try and get as much coverage with UBLAST as with NCBI’s BLAST (UBLAST was not an option in USEARCH 5; rather, a local alignment search had to be done). We became more ambitious, and decided to test a few more programs. BLAT was something I was already playing with, while an article by Jonathan Eisen (Darling et al., PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2, e243. 2014) pointed me in LAST’s direction (besides reviewers asking for more programs to be tested).

Later on, at some other talk, I think by Robert Beiko, the speaker mentioned something about BLAST being too slow for some task, and I asked him why not try UBLAST. He said something to the effect of not knowing how much they might miss.

The articles we published cover one task each. One is the task of finding orthologs as reciprocal best hits. Pretty straightforward: how many orthologs does each program find when compared to BLAST? Essentially, finding orthologs as reciprocal best hits does not require finding every possible match. Top matches would be enough. So, if UBLAST, for example, found just a few top matches (under version 5, we could control the number of matches found before the program stops looking), that would be enough to determine the best, and thus figure out reciprocal best hits. We thought we might miss many matches, but still find most of the reciprocal best hits, and that’s what we found to be the case except between evolutionarily distant genomes (see second reference above).
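To make the reasoning above concrete, here is a minimal sketch in Python of picking reciprocal best hits. It is not the code used in the articles. It assumes each hit is a (query, subject, bit score) tuple and that the best hit is simply the one with the highest score; the function names and the toy identifiers (a1, b1, and so on) are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed hit layout for this sketch: (query id, subject id, bit score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first one wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return sorted pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy data: every match an exhaustive search might report.
all_a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0),
              ("a2", "b2", 90.0), ("a3", "b1", 50.0)]
all_b_vs_a = [("b1", "a1", 198.0), ("b2", "a2", 160.0), ("b2", "a1", 140.0)]

# The same searches if the program stopped after the top match per query.
top_a_vs_b = [("a1", "b1", 200.0), ("a2", "b2", 90.0), ("a3", "b1", 50.0)]
top_b_vs_a = [("b1", "a1", 198.0), ("b2", "a2", 160.0)]

pairs_all = reciprocal_best_hits(all_a_vs_b, all_b_vs_a)
pairs_top = reciprocal_best_hits(top_a_vs_b, top_b_vs_a)
print(pairs_all == pairs_top)
```

In this toy case the lower-scoring matches never enter the result, so dropping them changes nothing. A real program could of course miss the true top match itself, which is the part that needed testing.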

For the test on overannotation, the main idea was that for that task we compare proportions, not total number of matches. Thus, if UBLAST, LAST, and BLAT missed potential homologs, but still found equivalent proportions to those found by NCBI’s BLAST, then the programs would work fine for estimating overannotation. Well, that’s what we found.

Finally, why democratic genomics? Well, with tools that can run sequence comparisons in a fraction of the time that BLAST takes, and on a desktop computer at that, comparative genomics at a much larger scale becomes available to most, if not all, bioinformaticians. Why would I care? Well, because the more people can participate, the higher the number of ideas that can make it into the field. Not everybody has access to computer clusters. There are other avenues towards this democracy, like the availability of some precomputed homologies and orthologies. Yet, people will want to do their own tests for many reasons, from doubting the quality of existing data, to testing genomes and protein sequences not already available in databases. Maybe there’s also a good chance that genome and protein comparisons will be done via cloud computing, and be quite accessible to mere mortals. Maybe web-based tools like RAST and MG-RAST are good enough for these tasks instead of having our own thing. I don’t know. For now I think that the more options the better. These two articles are not enough. Strategies should also be developed to avoid wasting time and effort comparing sequences. As we develop our ideas and test programs, we will publish our results either in articles, or, if not enough for a publication proper, in blog entries.

Summary: let’s make manuscripts for review reviewer-friendly instead of atavist-editor-friendly.

There are many things we carry on because of … let us call it “tradition” to avoid calling it by its proper name: “atavism.”

Today I am finishing reviewing a manuscript, and I feel irritated again that the article has the figures last, by themselves, and that I have to jump back and forth from the one page with all the figure legends, trying to match them to figures that the journal’s software had the good idea to tag with numbers, but still, no legends. Shit. I publish articles myself, and I have decided to put the legends at the bottom of the figures because my experience as a reviewer has told me how much easier it would be if this was the norm. A few journals, when you upload figures, have a field for the legend, but few authors seem to notice. What about you, mes amis et amies, have you made this very clear to authors? Why do we carry on with this atavism from much older times, when figures were sent by snail mail, for lack of anything better, and pages had to be put physically together, and a whole process of postproduction (I don’t know why the speller is suggesting “prostitution” instead of this word) carried on? Who knows why the figures had to come separated from the legends, but whatever, it was so. Today, we electronically send the figures and manuscript first for review, and we are asked later to send “production” figures anyway. So why not save our peer reviewers some pain and give them something easier to examine? Shit, do it even if the journal does not ask you to. They will ask for “proper” figures later anyway (if and when your article gets accepted, that is). So double and triple please, put those legends with the figures. Let us stop this atavist custom and be merrier.