Bioinformatics Rotation Students

With gigabase genomic samples, metagenomics plants itself firmly as a Big Data science. High-throughput sequencing using the Roche 454 and Illumina HiSeq technologies has reduced the cost of performing a genomic study, but has raised a whole set of analytical challenges. Rotation students from the CIHR Graduate Program in Bioinformatics assist in the development of novel statistical techniques, computational algorithms, and analytical visualizations, and in their integration and deployment as usable software.

Through life’s complexity pyramid, information is transferred through a hierarchy of genomic, transcriptomic, and proteomic modalities. It seems logical that the analysis of such genomic datasets should also be hierarchically integrated. The Integrated MetaPROteomics Viewer (IMPROV), an analytical software package developed at the Pacific Northwest National Laboratory (PNNL), attempted to solve this problem for hierarchically clustered datasets through its Galaxy tree-map data visualization. However, this initial prototype lacked essential data import and export functionality and had no content support from existing databases, stymieing its uptake and use on available metagenomic datasets.

This rotation project attempted to standardize IMPROV data import and export across many different file types. This led to the prototyping of a .improv data file format that would ease internal use of the IMPROV program. Furthermore, pathway information from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database was integrated into the Galaxy tree-map view, adding a higher level of metabolic pathway analysis to metagenomic datasets.
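To illustrate the idea of layering pathway information over gene-level data, here is a minimal Python sketch of rolling abundances up a pathway hierarchy so that each level could size a tile in a tree-map. It is not IMPROV's implementation and does not reflect the .improv format: the function `build_tree`, the row layout, the gene labels, and the abundance values are all hypothetical, and the pathway labels are used only as example category names.

```python
def build_tree(rows):
    """Nest (path, abundance) rows into a dict tree with summed totals.

    Each row is a hypothetical (path, abundance) pair, where path is a
    tuple of labels running from the broadest category down to a gene.
    """
    root = {"name": "root", "total": 0.0, "children": {}}
    for path, abundance in rows:
        node = root
        node["total"] += abundance
        for label in path:
            # Create the child node on first sight, then descend into it.
            node = node["children"].setdefault(
                label, {"name": label, "total": 0.0, "children": {}}
            )
            # Every ancestor accumulates the abundance of its descendants.
            node["total"] += abundance
    return root


# Illustrative placeholder data, not taken from any real dataset.
rows = [
    (("Metabolism", "Carbohydrate metabolism", "Glycolysis", "gene_a"), 12.0),
    (("Metabolism", "Carbohydrate metabolism", "Glycolysis", "gene_b"), 8.0),
    (("Metabolism", "Energy metabolism", "Methane metabolism", "gene_c"), 5.0),
]
tree = build_tree(rows)
```

Because each node stores the sum of everything beneath it, a tree-map renderer could draw any level of the hierarchy (category, pathway, or gene) using the same structure.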