Below we summarize the accomplishments from the two days. We welcome feedback on the topics covered and hope that by sharing our work we can encourage more programmers to become part of the open science bioinformatics community. Actively working to build well-tested, community-developed, interoperable tools is how we solve increasingly difficult research questions ranging from human health to plant breeding to microbial community function. The progress made in two days illuminates the effectiveness of open collaborative science.

Barrnap – Bacterial ribosomal RNA predictor

Torsten Seemann, Tim Booth

For the last 8 years RNAmmer has been the standard tool for predicting ribosomal RNA features in genomes, because it is reasonably fast, accurate, and works on bacteria and eukaryotes. Its drawbacks are that it relies on small, older databases; requires an older conflicting version of HMMER; and has restrictive licence terms. To resolve these issues we have implemented a new rRNA predictor which uses the new “nmmer” tool from HMMER 3.1 for searching DNA profiles against DNA sequence. We used the Silva and GreenGenes seed alignments for the 5S, 23S and 16S genes to build the profile models from. Barrnap is a small Perl script which takes FASTA as input, and outputs the rRNA feature predictions in GFF3 format. It will be packaged in Bio-Linux and replace RNAmmer in the Prokka bacterial annotation system.

BioJVM – Coordinating and integrating BioJava and ScaBio

Spencer Bliven, Andreas Prlic, Markus Gumbel

Both Java and Scala run on the Java Virtual Machine. As such, it makes sense to coordinate and document the various Bio* projects which run on the JVM and therefor can interoperate to some degree. We are able to successfully reference BioJava functions from Scala code and ScaBio functions from Java code. The ease of this process means that users can easily use both libraries from whichever language is more suited for their biological problem.

Biopython

Peter Cock, Konstantin Tretyakov, Bin Zhang

The Biopython team worked on training new users at Codefest and exploring integration of Biopython with other Python molecular visualization toolkits like PyMol. Infrastructure development involved testing and debugging on multiple systems, including identifying and fixing Windows and PyPy problems. We also identified areas where we can make it easier to contribute to Biopython: specifically easing the process to report and fix bugs by moving to integrated GitHub issue tracking and working to support Biopython-associated projects with easy installation tools.

Galaxy Debianization

Tim Booth

I spent several hours revisiting previous work on the Galaxy package for Bio-Linux and made significant progress towards it being something that can go into Debian-proper. Results will be committed to Deb-Med public SVN and patches will be forwarded to the Galaxy dev mailing list.

Standards and Visualization

Ontology and provenance representation

The goal of this group was to investigate and implement solutions to use ontologies to help people find and use the programs and data they need for their work, and to help automate the integration of tools or data resources into workflows or workbenches. We also wanted to identify useful provenance metadata, to store in a rigorous way the conditions and configuration of analysis steps run by users. This improves transparency, reproducibility, and reliability of the scientific results.

We worked toward inclusion of the EDAM onotology as part of the Mobyle system’s built-in type and classification mechanisms. We created a user case by identify workflows in Mobyle and mapped the descriptions unto EDAM classification to allow mapping between the types. We also investigated the possibilities opened by projects such as PROV to standardize the provenance information stored by systems such as Mobyle. We added a prototype functionality to the development version of Mobyle that dynamically generates this provenance information in a JSON-based format.

Integrate DGE-Vis & Dalliance, JS animation scheduler

David Powell, Thomas Down, Skyler Brungardt, Alex Kalderimis

We worked on integrating two visualization tools: the Dalliance genome browser and the DGE-Vis RNA-seq explorer. We now have a proof-of-concept tool that makes it possible to visualise RNA-seq analysis while browsing the genome. This inspired a JavaScript scheduler that is able to schedule slow animation updates when the JavaScript engine is not busy, allowing smoother animations and more accurate windows. Finally, we added a JBrowse-compatible JSON backend for Dalliance for integration with Intermine.

Infrastructure

Infrastructure management via CloudBioLinux (CBL)

Documentation: Due to the increased interest by individuals to use and contribute to CBL, we invested effort into creating purpose-driven documentation for CBL. This should help people use the endproduct of CBL, customize CBL their needs, as well as learn about the internals of CBL with the aim of contributing. We will finish and make the documentation available on ReadTheDocs over the coming months.

Build frameworks: We developed a simpler automated method to invoke the CBL build framework to help remove complex error prone steps.

Web tooling: In spirit of making CBL more accessible and easier to use, we’ve decided to tackle development of a lightweight webapp that helps with customizing and generating CBL configuration files.

We also worked to build a tool that helps provide run time estimations for bioinformatcs jobs (e.g. “how long should aligning 40 million reads against hg19 with BWA take if I use 8 cores?”). We plan to collaborate on longer term development of this with the Genome Comparison of Analytic Testing team.

GATK-based reusable pipeline based around Rubra/Ruffus

Clare Sloggett, Bernie Pope

We worked on code cleanup, documentation and test data for a reusable pipeline to handle variant calling and annotation, using Rubra built on the Ruffus framework. It handles BWA alignment, GATK alignment cleaning and variant calling and ENSEMBL annotation. To make these pipelines easier to run, we worked on integrating them into the GVF flavor in CloudBioLinux.

[…] you are wondering what happens exactly at a CodeFest, I suggest Brad’s blog post from the BOSC Codefest 2013, or Möller et al (2013). Basically these meeting are a chance for developers of open source […]