April 1 to April 11

It started out as a quiet beginning to the bioperl list summaries. I actually wanted to use the image of tumbleweeds rolling across the desert plains in Texas, my home state, and no, before you ask, I’m not a fan of our current president. But my hopes were dashed; it ended with a flurry of activity. In case people ask, ’YT’ is "yours truly"; I feel a bit odd inserting my name in these summaries, even occasionally. Low self-esteem? I don’t know; I’m a biologist, not a shrink. I’ll let you be the judge. Anyway, I’ll probably make these little reports biweekly (that’s every other week, as opposed to ’semiweekly,’ but I digress). That’s to avoid upsetting the PI and to allow myself some time to do a few other things, like get out of the lab, talk to the wife, grab a beer, etc. Oh yeah, and write some Perl. If anyone wants to trade off every other week, let me know. First up…

Bioperl-announce

BOSC 2006

Okay, okay. Technically this was relayed on the list in late March, but it IS big enough to be included regardless. Darin London has posted the official announcement for BOSC 2006.

BOSC 2006 will be held by the Open Bioinformatics Foundation on August 4-5 in Fortaleza, Brazil, as a Special Interest Group (SIG) meeting at the 14th International Conference on Intelligent Systems for Molecular Biology. Consult The Official BOSC 2006 Website for more information.

Bioperl-l

Bio::SeqIO::genbank confusions

Scott Markel reported that tag values of zero in Bio::Annotation::Simple objects were not parsed correctly and were written out as an empty string. He then added a test case and script to prove his point. The confusion started when YT decided to try the script on Windows (oops #1), found that it gave a different error (oops #2), and, thinking the two were linked (oops #3), committed Scott’s recommended fix. It turned out that the test case data (a small GenBank file) was off by one space in the feature line, thus causing one error, while Scott’s original error had already been fixed in Bio::Annotation::Simple by Heikki around December 2005 (aha! someone didn’t update from CVS). Everything was right with the world again, except that Scott’s reported (and now redundant) fix was committed to CVS. YT corrected the ’correction’ (thanks Hilmar) and there was much scratching of heads… Confused? Me too.
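For anyone wondering how a zero ends up as an empty string in the first place, here is a minimal sketch of one common way it happens. It is written in Python rather than Perl, and it is NOT the actual Bio::Annotation::Simple code (the summary above doesn’t show that); the function names are hypothetical and the truthiness test is only an assumed cause, shown for illustration.

```python
# Hypothetical illustration only: not the Bio::Annotation::Simple code.
# Assumption: the value is tested for truthiness instead of definedness.

def render_value_buggy(value):
    # Treats every falsy value (including the number 0) as "missing".
    return str(value) if value else ""


def render_value_fixed(value):
    # Only a genuinely missing value (None) becomes the empty string.
    return "" if value is None else str(value)


if __name__ == "__main__":
    for tag_value in (0, 42, None):
        print(repr(tag_value),
              repr(render_value_buggy(tag_value)),
              repr(render_value_fixed(tag_value)))
```

The Perl analogue of the fixed version would be a `defined` check rather than a plain boolean test, but again, that is a guess at the general pattern, not a description of the committed fix.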

Bio::SearchIO::psl issues

Albert Smith reported via Bugzilla and the main list that SearchIO had an issue parsing PSL-formatted files from WebBlat. We (Albert and YT) were finally able to replicate the error, but not without some perseverance. This ended up being a case where inserting the ’-w’ flag helped to spot a not-so-obvious error. Fixes were made and people were joyous (or at least Albert was…)

A report was made that parsing a standalone BLAST report (the file was ’very large’) was causing processor usage to spike to 99.9% before ending in a killed process message. Jason pointed out that it was likely that the file was simply too large. It turns out that ’very large’ is approximately 215 MB. Didn’t know that…

Getting sequences by ID

Yuval asked how to parse a large group of sequences (20,000) and write only a small group of IDs out in FASTA format in a reasonable amount of time. Torsten and Ryan chipped in with their suggestions (flat DB indexing sounded good); Yuval came up with his own. Amir Karger also contributed a Perl one-liner that’s not Bioperl-related (blasphemy!!!)…
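In the same blasphemous spirit, here is a rough sketch of the single-pass approach: read the FASTA file once and keep only the records whose ID is in a lookup set. It is plain Python, not Bioperl, and not any of the solutions actually posted in the thread; the function names and the rule that the ID is the first whitespace-delimited word of the header line are assumptions made for the example.

```python
import io


def record_id(header):
    # Assumption: the ID is the first whitespace-delimited token of the header.
    tokens = header.split(None, 1)
    return tokens[0] if tokens else ""


def filter_fasta(handle, wanted_ids):
    """Yield (header, sequence) for each record whose ID is in wanted_ids."""
    wanted = set(wanted_ids)  # set membership keeps each lookup cheap
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one if it was requested.
            if header is not None and record_id(header) in wanted:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    # Emit the last record in the file, if requested.
    if header is not None and record_id(header) in wanted:
        yield header, "".join(chunks)


if __name__ == "__main__":
    demo = io.StringIO(">seq1 first\nACGT\nAC\n>seq2\nGGGG\n>seq3 third\nTTTT\n")
    for hdr, seq in filter_fasta(demo, ["seq1", "seq3"]):
        print(">" + hdr)
        print(seq)
```

A single pass like this is fine for a one-off extraction; if the same file will be queried again and again, an index (as suggested on the list) is probably the better investment.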

Bio::Tools::RestrictionEnzyme and Bio::DB::Fasta issues

Nick Staffa wanted to know what’s up with the cut_seq method in Bio::Tools::RestrictionEnzyme. Turns out that module is deprecated. YT pointed this out and Brian O. fixed the documentation. Then Nick found that someone forgot to implement the is_circular method, which Brian fixed in Bio::DB::Fasta…

Use SearchIO::blast, not Bio::Tools::BPLite!

Sonmitra Mondal wanted to know why he got a ’bad gateway’ error using his script and what was going on with hsp->sbjctseq. YT pointed out that the ’bad gateway’ error sometimes happens at NCBI during peak hours, but that a bigger problem was his use of the sbjctseq method, which is from the deprecated BPLite module. Brian corrected the current documentation to reflect that…

Coloring with GFF2

Marco then asked another question, this time about how to display binding sites with coloring based on the score, thus further proving he’s the reincarnation of Bob "Happy Little Trees" Ross. Okay, if you’re not American or didn’t watch PBS in the ’80s and ’90s, that probably flew over your head. The saga continues…

Humanely Slicing Alignments

Iain Wallace tries to figure out if there is a way to take a slice out of an alignment when one of the sequences doesn’t have a residue in that position. He keeps getting an error. Brian O. offers an undocumented solution, then documents it…

Primers and Sequences

Kevin Victor wants to know if there is a way to search for a primer sequence pair in a long sequence in batch mode. As Donald Jackson rightly pointed out, there are probably better ways of doing this, namely e-PCR…

Leftover #1: Sunil posted a link to MeltDNA, a new Perl-based tool for predicting DNA duplex hybridization and thermodynamics. Someone wondered aloud where a publication describing it might be found, and Torsten had the answer…
http://bioperl.org/pipermail/bioperl-l/2006-April/021253.html

BioSQL

Tumbleweeds flowing across the plains (I had to use this somewhere)… no posts.

Bioperl-guts (for the die-hards)

Lincoln Stein has posted a slew of new and updated modules in relation to GFF3 databases, along with updated tests and a script.