Selected Orals

Remember that you were notified of the acceptance of your oral presentation by e-mail, at the address you provided when submitting your work.

List of accepted oral presentations

You will have a total of 15 minutes for your presentation. You should therefore plan to speak for at most 12 minutes, leaving 3 minutes for discussion and the change of speakers. Please be aware that session chairs will make a concerted effort to keep sessions on schedule.

Please note that each presentation has been assigned a unique ID of the form "DPX", where D denotes the day (Monday, Tuesday or Wednesday), P the period (Morning or Afternoon), and X a serial number. Note also that presentations in the list are grouped by topic session.
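The ID scheme can be decoded mechanically. The Python sketch below assumes, hypothetically, that the day and the period are written as their initials (M, T or W for the day; M or A for the period); the letters actually used in the program may differ.

```python
import re

# Hypothetical encoding: day initial (M/T/W), period initial (M/A), serial number.
DAYS = {"M": "Monday", "T": "Tuesday", "W": "Wednesday"}
PERIODS = {"M": "Morning", "A": "Afternoon"}
ID_PATTERN = re.compile(r"^([MTW])([MA])(\d+)$")


def parse_presentation_id(pid: str) -> tuple:
    """Split an ID such as 'TA3' into (day, period, serial number)."""
    match = ID_PATTERN.match(pid.strip().upper())
    if match is None:
        raise ValueError(f"not a valid presentation ID: {pid!r}")
    day, period, serial = match.groups()
    return DAYS[day], PERIODS[period], int(serial)


print(parse_presentation_id("TA3"))  # ('Tuesday', 'Afternoon', 3)
```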

Short Abstract: We have performed a phylogenomic analysis of the Schistosoma mansoni proteome, classifying proteins into multi-domain architecture classes and constructing gene trees both for proteins and for the Pfam domains they contain. The results are provided in the PhyloFacts-SchistoDB database. For each S. mansoni gene family, PhyloFacts-SchistoDB contains a gene tree, multiple sequence alignment, predicted orthologs, hidden Markov model, identified Pfam domains and annotation data. We identified a set of 1884 candidate sequence orphans in the S. mansoni genome; matches to PhyloFacts families may inform the function of 259 of these orphans.

Short Abstract: The generation of biodiversity is tied to the evolution and rewiring of gene regulatory networks (GRNs). Key components of these GRNs are transcription factors (TFs) and other transcriptional regulators (TRs). We have devised a pipeline for identifying TFs and TRs that exploits the domain architecture of these proteins. We currently have a set of rules, representing 138 protein families, that we have applied to ~20 plant species and to several species of Stramenopiles, a group that includes important plant pathogens. Results for plant species are available at http://plntfdb.uniandes.edu.co/; we are now developing a new interface for Stramenopiles.

Short Abstract: Few studies have examined available data to test hypotheses about the phylogeographic partitioning of the infectious salmon anemia virus (ISAV) population, its population dynamics, or its evolutionary rates and demographic history. We addressed these questions using modern phylogenetic methods. A recombination breakpoint was consistently detected in the hemagglutinin-esterase gene around the highly polymorphic region (HPR). The evolutionary relationships of ISAV revealed the 2007 Chilean outbreak group to be monophyletic for the fusion gene. The group's time to the most recent common ancestor (tMRCA) is consistent with epidemiological data, and its demographic history showed a profound bottleneck. Selection analyses detected ongoing diversifying selection in both genes, at sites associated with protease processing and with the HPR, respectively.

Short Abstract: To unravel the genetics underlying pathogenicity in Ascomycetes, we reconstructed the evolutionary history of gene families and gene expression behavior in a sample of eight species. Evolutionary events at both the coding-sequence and the regulatory-network levels contributed to the acquisition of pathogenic traits in Ascomycetes. At the coding-sequence level, gene families with functions related to pathogenic host-fungus interactions showed accelerated rates of evolution. In addition, most pathogenicity-related genes tend to be co-expressed with a different set of genes than their orthologs in non-pathogens, indicating considerable rewiring of the regulatory network during the emergence of pathogenicity.

Short Abstract: Recent studies have indicated that fungi have a large number of aspartic proteases (APs) compared with other taxonomic groups. We are interested in the functional diversification and redundancy of fungal APs. We performed a phylogenetic analysis of APs obtained by HMMER profiling of 107 sequenced eukaryotic genomes. Besides the generally accepted A01A and A01B subfamilies, the resulting phylogeny contains six subfamilies that are exclusive to fungi. Moreover, we identified an aspartate (D) and a glutamate (E) residue that are specific to the fungal Yapsin subfamily. These residues are likely involved in monobasic sequence recognition and in the processing of zymogens of various secreted hydrolases.

Short Abstract: The study of evolutionary rates is central to understanding the mechanisms underlying protein molecular evolution. In this work we study how conformational diversity in proteins could influence their rate of evolution. We have found that the evolutionary rate correlates positively with the degree of conformational diversity, measured as the maximum RMSD between conformers. Our results support the idea that proteins with a larger native conformational space could have a higher average number of inter-residue contacts, a measure of protein designability, giving rise to an increased evolutionary rate.

Short Abstract: To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open ‘data commoning’ culture. Here we present the prerequisites for data commoning and describe an established and growing ecosystem of solutions that use shared concepts to support that vision. The ISA commons is an exemplar of such an ecosystem: a set of data curation and sharing solutions built on a common metadata tracking framework, providing tools and resources to create and manage large, heterogeneous data sets in a coherent manner (Sansone*, Rocca-Serra* et al., Nature Genetics, in press).

Short Abstract: Tandem repeats (TRs) are sequences in which the same pattern is repeated consecutively. They have traditionally been used as genomic markers, but recent studies have associated them with important regulatory processes, increasing their relevance. Although several TR discovery algorithms are available, genome-wide TR discovery is time-consuming, and any improvement in computational performance may be significant. We present ConvolutionTR, a new, fast de novo algorithm for genome-wide TR discovery. It processes large genomes on average 30% to 50% faster than other approaches; furthermore, as we will show, ConvolutionTR finds all TRs, whereas the most popular algorithms do not.
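To make the definition of a tandem repeat concrete, the Python sketch below performs a naive scan for exact repeats of short motifs. It is not ConvolutionTR and ignores approximate repeats; `max_period` and `min_copies` are illustrative parameters chosen for this sketch.

```python
def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repetition of a shorter unit ('AT' yes, 'ATAT' no)."""
    return motif not in (motif + motif)[1:-1]


def find_tandem_repeats(seq: str, max_period: int = 6, min_copies: int = 3) -> list:
    """Naive exact tandem-repeat scan.

    Returns (start, period, copies, motif) tuples with 0-based starts.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for period in range(1, max_period + 1):
        i = 0
        while i + period <= n:
            # Extend the run while each base equals the base one period earlier.
            j = i + period
            while j < n and seq[j] == seq[j - period]:
                j += 1
            copies = (j - i) // period
            motif = seq[i:i + period]
            # Non-primitive motifs (e.g. 'ACAC') are reported at their shorter period.
            if copies >= min_copies and is_primitive(motif):
                hits.append((i, period, copies, motif))
            # Any later start inside this run ends at the same mismatch, so skip past it.
            i = j - period + 1
    return sorted(hits)


print(find_tandem_repeats("GGACACACACTT"))  # [(2, 2, 4, 'AC')]
```

The scan is roughly O(n × max_period) per genome, which is why dedicated algorithms are needed for genome-wide, approximate TR discovery.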

Short Abstract: Set enrichment analysis identifies enriched biological categories or terms by evaluating the proportion of differentially expressed genes against a background reference (BR) in high-throughput experiments. However, the results depend on the choice of BR, and there is currently no way to analyze how robust enrichment is to that choice. We generated bootstrapped versions of a given BR to assess the robustness of enriched terms to BR selection. Terms with high stability are thus candidates for further exploration, while spurious enrichments are left out of the analysis. The results showed that stable terms validated the main experimental hypothesis, and that new terms also emerged, providing new biological insight.
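The bootstrap idea can be sketched as follows: resample the BR, rerun a one-sided hypergeometric enrichment test on each resample, and report the fraction of resamples in which a term remains significant. This is a minimal sketch under stated assumptions (resampling with replacement, differentially expressed genes always kept in the resampled BR, a fixed `alpha` without multiple-testing correction), not the authors' implementation.

```python
import math
import random


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for a hypergeometric variable (one-sided over-representation test)."""
    total = math.comb(population, draws)
    upper = min(successes, draws)
    return sum(math.comb(successes, i) * math.comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total


def term_stability(de_genes, background, term_genes, n_boot=200, alpha=0.05, seed=0):
    """Fraction of bootstrapped backgrounds in which the term stays enriched."""
    rng = random.Random(seed)
    de = set(de_genes)
    term = set(term_genes)
    significant = 0
    for _ in range(n_boot):
        # Assumption: resample the BR with replacement, keep unique genes, and
        # always retain the DE genes so they remain a subset of the reference.
        boot = set(rng.choices(background, k=len(background))) | de
        k = len(term & de)            # DE genes annotated with the term
        successes = len(term & boot)  # BR genes annotated with the term
        p = hypergeom_sf(k, len(boot), successes, len(de))
        significant += p < alpha
    return significant / n_boot
```

A stability close to 1 means the term is enriched regardless of how the BR is perturbed; values near 0 flag enrichment that depends on the particular BR.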

Short Abstract: Transposases (Tnps) are enzymes encoded by insertion sequences (ISs). Tnps are among the most common proteins found in nature and play an important role in gene and genome evolution. However, they are difficult to predict bioinformatically, and given the increasing availability of prokaryotic genomes and metagenomes, it is imperative to develop rapid, high-quality automatic annotation of Tnps. We have developed the most extensive and robust database of HMM profiles for recognizing Tnps in prokaryotic genomes. In this work we describe the novel biology revealed by high-dimensional analysis of over 210,000 Tnps.

Short Abstract: Spinal cord injury (SCI) results in motor and sensory loss. Xenopus can regenerate after SCI only before metamorphosis. We hypothesize that SCI induces a regeneration-permissive transcriptome in the X. laevis spinal cord that is absent after metamorphosis, and that the overexpression or repression of key genes determines the regenerative abilities of pre-metamorphic stages. A bioinformatic analysis of our RNA-Seq study of the spinal cord transcriptome after injury in regenerative and non-regenerative Xenopus stages showed differential expression of genes related to neurogenesis, the cell cycle and the extracellular matrix. These differences may account for the regenerative abilities of pre-metamorphic stages and their loss after metamorphosis.

Short Abstract: This work presents a comprehensive genetic analysis of a natural hypersaline microbial community using deep metagenomic sequencing. We examined the metagenomic data with three complementary analysis approaches. First, raw sequencing reads were analyzed for phylogenetic and functional content (environmental gene-survey approach). Second, we reconstructed the genomes of the most abundant members of the community (assembly-driven approach). Finally, we used the assembled genomes to dissect the genetic heterogeneity present in each microbial population (population genetic approach). Combining these approaches provides a more detailed picture of the genetic and functional diversity of microbial communities.

Short Abstract: We present a new method for generating theoretical protein structures in remote-homology modeling cases where the sequence signal is no longer useful. The approach uses NMR chemical shift data and an exhaustive library of protein structure building blocks, Smotifs, and does not require any sequence information for modeling.

Short Abstract: Correlated mutation analysis has mostly been used to detect structural contact pairs in protein families. Building on previous observations that it can also provide functional insights, we present a framework for this purpose based on an amino-acid-specific correlation measure, which is used to build networks summarizing correlation and anti-correlation patterns in a protein family. Decomposing these networks yields subsets that can be further assessed with parameters and procedures proposed for this methodology, with useful applications in protein family analysis. We apply the framework to the Fe/Mn superoxide dismutase family, highlighting its potential for protein characterization and gene annotation.
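As an illustration of an amino-acid-specific correlation network, the sketch below encodes each (column, residue) pair as a binary vector over the aligned sequences and links pairs in different columns whose phi coefficient reaches a threshold, labeling positive values as correlated and negative ones as anti-correlated. The phi coefficient and the threshold are assumptions for illustration; the measure used in the framework may differ, and the network decomposition step is not shown.

```python
import math
from itertools import combinations


def residue_indicators(alignment):
    """Map each (column, residue) in an equal-length alignment to a 0/1 vector over sequences."""
    features = {}
    for col in range(len(alignment[0])):
        for res in sorted({seq[col] for seq in alignment} - {"-"}):  # ignore gaps
            features[(col, res)] = [1 if seq[col] == res else 0 for seq in alignment]
    return features


def phi(x, y):
    """Pearson correlation of two binary vectors (the phi coefficient)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


def correlation_network(alignment, threshold=0.8):
    """Edges (feature1, feature2, kind, r) between residues whose |phi| >= threshold."""
    feats = residue_indicators(alignment)
    edges = []
    for (f1, v1), (f2, v2) in combinations(feats.items(), 2):
        if f1[0] == f2[0]:
            continue  # residues in the same column are mutually exclusive by construction
        r = phi(v1, v2)
        if abs(r) >= threshold:
            kind = "correlated" if r > 0 else "anti-correlated"
            edges.append((f1, f2, kind, round(r, 3)))
    return edges


for edge in correlation_network(["AKL", "AKM", "GRL", "GRM"]):
    print(edge)
```

In this toy alignment, A at column 0 always co-occurs with K at column 1 (a correlated edge) and never with R (an anti-correlated edge), while column 2 varies independently and gets no edges.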

Short Abstract: Linear motifs mediate many protein functions. We studied the conservation and evolution of eight linear motifs within the intrinsically disordered (E7N) and globular (E7C) domains of the papillomavirus E7 protein using 200 sequences. E7N and E7C show similar conservation, which is explained by the globular structure of E7C and by the conserved and coevolving linear motifs in E7N. Several motif pairs show high co-occurrence, suggesting that they form functional and evolutionary units. Multiple independent appearances of several motifs during papillomavirus evolution provide direct evidence for convergent evolution, which may play an adaptive role, as indicated by its correlation with phenotype.

Short Abstract: Transcription factors (TFs) regulate gene expression by binding to cis-regulatory boxes in the promoter regions of DNA. At the same time, DNA mutations away from regulatory boxes may modulate transcriptional efficiency. Using coarse-grained simulations, we studied the dynamics of different promoter regions on the microsecond timescale. Simulations performed in the absence and presence of the TATA-binding protein show that its binding modifies the structure and dynamics of DNA up to ~12 base pairs away from the binding site, suggesting cross talk between different regulatory boxes within the promoter region.

Short Abstract: We have implemented new software, based on dynamic programming, that uses a previously computed optimal superposition of two structures to generate a new optimal alignment and to calculate structural similarity values that can serve as a standard for selecting the best structural alignment from a set of solutions. We show several examples of structural misalignments in the catalytic domains of DNA polymerases identified by our software. Our software allows the identification of biologically relevant alignments among a set calculated with different structural alignment programs.
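The general idea of deriving an alignment from a fixed superposition can be sketched as a global dynamic programming pass over inter-residue distances. The TM-score-like pair score, the distance scale `d0` and the linear gap penalty below are assumptions chosen for illustration, not the scoring used by the software described.

```python
import math


def align_superposed(ca_a, ca_b, d0=3.0, gap=-0.6):
    """Global DP alignment of two already-superposed C-alpha traces.

    Assumed pair score: 1 / (1 + (d / d0)^2); `gap` is a linear gap penalty.
    Returns the aligned index pairs (i, j).
    """
    n, m = len(ca_a), len(ca_b)
    score = [[1.0 / (1.0 + (math.dist(a, b) / d0) ** 2) for b in ca_b] for a in ca_a]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score[i - 1][j - 1],  # align i with j
                           dp[i - 1][j] + gap,                      # gap in structure B
                           dp[i][j - 1] + gap)                      # gap in structure A
    # Trace back from the bottom-right cell, preferring aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]  # residue 1 of `a` missing
print(align_superposed(a, b))  # [(0, 0), (2, 1), (3, 2)]
```

Because the superposition is fixed, the DP only decides which residue pairs to match; the resulting score could then be compared across alignments produced by different programs.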

Short Abstract: The Enzyme Function Initiative aims to develop a large-scale strategy for function prediction. We present the work of its bioinformatics scientific cores (the superfamily/genome and data/dissemination cores) applied to one of five model systems: the isoprenoid synthase superfamily. All members of the superfamily conserve the catalytic machinery needed for carbocation formation, but they incorporate different mechanistic strategies, giving rise to five distinct functional subgroups. We discuss the collection of sequence and associated functional data, the use of sequence similarity networks to visualize entire superfamilies, target selection, data curation, annotation transfer, and data dissemination through our Structure-Function Linkage Database.
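A sequence similarity network can be sketched as a graph whose nodes are sequences and whose edges join pairs with an all-vs-all E-value at or below a cutoff; clusters are then the connected components. The cutoff and the input format below are hypothetical, and this is not the pipeline behind the Structure-Function Linkage Database.

```python
def ssn_clusters(ids, pairwise_evalues, evalue_cutoff=1e-20):
    """Connected components of a thresholded sequence similarity network.

    pairwise_evalues: hypothetical dict mapping (id1, id2) -> E-value from an
    all-vs-all search; every id in the keys must also appear in `ids`.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), evalue in pairwise_evalues.items():
        if evalue <= evalue_cutoff:  # keep only sufficiently similar pairs as edges
            parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    # Largest clusters first, ties broken by member names.
    return sorted(clusters.values(), key=lambda c: (-len(c), c))


evalues = {("a", "b"): 1e-50, ("b", "c"): 1e-30, ("c", "d"): 1e-5, ("d", "e"): 1e-40}
print(ssn_clusters(["a", "b", "c", "d", "e"], evalues))  # [['a', 'b', 'c'], ['d', 'e']]
```

Tightening or relaxing the cutoff splits or merges clusters, which is how such networks are explored visually at different stringencies.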

Short Abstract: We present a method for assisting the virtual screening of drugs during the early stages of drug development, proposed to improve the reliability of QSPR prediction. First, a transformation is sought that maps a high-dimensional space, defined by potentially redundant or irrelevant molecular descriptors, onto a low-dimensional target-related space. Second, we apply an applicability domain model in the low-dimensional space to assess the confidence of compound classification. Using a probabilistic framework, our approach identifies compounds that are poorly represented in the training set and regions of the space where the uncertainty about the predicted class is higher than normal.
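To illustrate what an applicability domain check does, the sketch below swaps in a common distance-based criterion instead of the probabilistic model described in the abstract: a query compound is considered in-domain if its mean distance to its k nearest training compounds does not exceed the training set's mean leave-one-out kNN distance plus z standard deviations. The values of `k` and `z` are illustrative assumptions.

```python
import math
import statistics


def knn_mean_distance(x, points, k):
    """Mean Euclidean distance from x to its k nearest neighbours in points."""
    if len(points) < k:
        raise ValueError("need at least k reference points")
    dists = sorted(math.dist(x, p) for p in points)
    return sum(dists[:k]) / k


def applicability_threshold(train, k=3, z=0.5):
    """Mean + z * stdev of leave-one-out kNN distances within the training set."""
    loo = [knn_mean_distance(t, train[:i] + train[i + 1:], k) for i, t in enumerate(train)]
    return statistics.mean(loo) + z * statistics.stdev(loo)


def in_domain(query, train, k=3, z=0.5):
    """True if the query lies within the (assumed) distance-based applicability domain."""
    return knn_mean_distance(query, train, k) <= applicability_threshold(train, k, z)


# Toy 2-D "low-dimensional space": a 5 x 5 grid of training compounds.
train = [(float(x), float(y)) for x in range(5) for y in range(5)]
print(in_domain((2.0, 2.0), train), in_domain((20.0, 20.0), train))  # True False
```

Predictions for compounds outside the domain would be flagged as low-confidence rather than discarded.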

Short Abstract: Non-steroidal anti-inflammatory drugs (NSAIDs) have been widely used to treat inflammatory processes. Given the potential side effects of NSAIDs and the need for new, safer and more efficient anti-inflammatory drugs, this research aimed to characterize, at the molecular level, a new series of analgesic and anti-inflammatory compounds synthesized in our laboratory. Structural studies of the new series were conducted by molecular docking against COX-1 and COX-2. Our docking studies showed that all compounds interact with both COX isoforms with good scores and appropriate conformations, indicating promising in silico activity.

Short Abstract: Integrins are membrane-spanning heterodimers composed of non-covalently linked α and β subunits. Integrin-binding activity on adhesion proteins can be modulated by short synthetic peptides containing RGD, KTS or ECD motifs. Because integrin-mediated cell attachment influences and regulates migration, growth and apoptosis, small RGD/ECD-containing peptides can be used to probe integrin functions in various biological systems. Combining structural, “in silico”, “in vitro” and “in vivo” techniques, our group developed small peptide-based structures that target α6β1 integrins. Drug design based on these structures may provide new treatment options for conditions such as metastasis and inflammatory pathologies.

Short Abstract: In this work, a combination of classical molecular dynamics and hybrid quantum-classical methodologies was used to elucidate the reaction mechanism of CYP121, a cytochrome P450 from Mycobacterium tuberculosis. This protein is of great interest not only because it is essential for the viability of the bacilli, but also because there is evidence that it catalyzes an unusual reaction. Our results provide a better understanding of this enzyme and of the general reaction mechanism of CYP proteins, which could be of great value for anti-tuberculosis drug design.

Short Abstract: Protein knots are intriguing structural motifs that have challenged both experimental and theoretical understanding. We investigated the thermodynamics and kinetics of folding of the smallest known knotted proteins, VirC2 and MJ0366, from the ribbon-helix-helix (RHH) family, using energy landscape theory and structure-based molecular dynamics with both coarse-grained and all-atom models. We observe a preordered, looped but unknotted intermediate state, and threading the loop by plugging or slipknotting is required to reach the knotted state. Moreover, comparison with Arc repressor, an unknotted dimer with a similar architecture, shows that this topological constraint severely increases the free-energy barrier.

Short Abstract: 11beta-hydroxysteroid dehydrogenase type 1 (11BHSD1) catalyzes the interconversion between inactive cortisone and active cortisol in an NADPH-dependent manner within cells of key metabolic tissues. Excess cortisol elevates blood glucose levels, leading to insulin resistance and metabolic syndrome. Recently solved crystal structures of 11BHSD1 provide structural information that can be used in virtual screening, a technique widely used in drug discovery projects. In this work, we report a combined shape-based database search and fast rigid docking approach to identify candidate human 11BHSD1 inhibitors in the NCI Open library of compounds.

Short Abstract: The MHC genomic region is extremely polymorphic in most species, and the specificity of most MHC molecules remains uncharacterized. Here, we describe a tool to functionally cluster MHC class I molecules (MHCI) based on their predicted binding specificity. The tool provides highly intuitive heat-map and tree-based graphical visualizations of the functional relationships between MHCI variants. Its flexible web interface allows users to include any MHCI of interest in the analysis. When applied to the HLA-A and HLA-B system, the method reproduces the 12 conventional HLA supertypes. MHCcluster is available at www.cbs.dtu.dk/services/MHCcluster.

Short Abstract: It has been demonstrated for MHC class I that a pan-specific predictor can benefit from being trained on cross-loci data; however, the polymorphism of the α and β chains of MHC class II molecules complicates the development of pan-specific methods and has limited their specificity to HLA-DR molecules. In this study, using predictions for HLA-DR, we demonstrate the first steps towards the development of pan-specific methods for HLA-DP and HLA-DQ. We show how the pseudosequence defining the MHC binding environment can be shortened to reduce the input space of prediction methods without losing, or even while increasing, binding prediction accuracy.

Short Abstract: The immunoglobulin superfamily (IgSF) is a large group of cell-surface and soluble proteins that play key roles in cell recognition, signaling and adhesion. We performed an IgSF-wide prediction of new receptor-ligand relationships based on sequence homology to known IgSF receptor-ligand relationships. Our method measures sequence similarity with hidden Markov models (HMMs) and incorporates empirical information into the scoring scheme. We correctly predicted 40 of 53 IgSF pairs that share similar ligands in the STRING database. The method was then applied to 477 human IgSF proteins of interest, 380 of which could be assigned.

Short Abstract: DNA damage and repair play a central role in aging and cancer. It is currently unknown how damage detection proteins (DDPs) recognize the different adducts embedded in the genome and how distinct proteins can detect the same adduct. Through a comparative analysis of the known three-dimensional structures of protein–damaged-DNA complexes and isolated damaged-DNA structures, we show that the minor groove is widened at the lesion site. At this position, DDPs insert a residue into the minor groove of DNA in a stacking geometry. These findings suggest a common minor-groove shape readout mechanism for damage detection.

Short Abstract: The increasing number of sequenced pathogen genomes provides great opportunities for the development of diagnostics, especially for pathogens with complex genomes such as Trypanosoma cruzi. In this work we present a bioinformatic prioritization strategy for selecting peptides to include in peptide microarrays, an excellent platform for large-scale screening of peptidic B-cell epitopes. The strategy integrates many feature predictors and experimental datasets to propose candidate peptides for inclusion in the array. A total of 200 candidate peptides were experimentally assayed against sera from patients, leading to the identification of 37 novel potential diagnostic epitopes.

Short Abstract: With recent advances in structural genomics, there is a growing need for functional annotation of newly solved crystal structures. Here we present a method that combines sequence-based relationships and structural similarity of transcriptional regulators with computational prediction of their cognate DNA-binding sequences. We applied this method to the AraC/XylS family, a large family of transcriptional regulators found in many bacteria that control the expression of genes involved in diverse biological functions. We identified three putative new members with known three-dimensional structures but unknown function and provide a probable functional classification for them.

Short Abstract: The domain architectures and catalytic functions of enzymes in metabolic systems are formulated as a two-layered network consisting of domains, proteins and reactions. We propose an algorithm to reconstruct the evolutionary history of domain-protein-reaction networks across multiple species and to categorize the mechanisms of metabolic system evolution. The reconstructed history reveals distinct patterns of evolutionary mechanisms in prokaryotic and eukaryotic networks. Moreover, different metabolic pathways are enriched in distinct network evolution mechanisms. Finally, we identify and validate two general principles underlying the evolution of domain-protein-reaction networks. These results shed new light on the evolution of metabolic systems.

Short Abstract: We developed CPModule, a novel approach for cis-regulatory module (CRM) detection whose performance is competitive with that of other state-of-the-art tools but which, unlike previous tools, can handle much larger datasets (such as 100 sequences combined with a library of 517 PWMs). The flexible framework underlying CPModule, together with its exhaustive search strategy, allows us to explicitly compare the feasibility of CRM detection in the presence and absence of ChIP-derived information. We show on a real dataset that, without ChIP-based information, CRM detection becomes an almost infeasible task.

Short Abstract: Most genome projects lack the financial support to review annotations. Annotations are usually based solely on sequence similarity to a previously known gene, which was probably annotated in the same way. A large number of predicted genes remain unassigned to any functional category even though the literature contains enough evidence to identify their function. We developed a classifier trained with term-frequency vectors automatically extracted from text corpora of genes representative of the functional categories of the JCVI ontology. The classifier unambiguously (confidence ≥ 0.7) assigned functional categories to 5,235 previously unclassified genes with literature in MEDLINE (out of ~24k).

Short Abstract: In this work we present Discriminative Local Subspaces (DLS), a novel supervised machine learning method designed to analyze gene expression data and predict new candidate genes associated with a biological process of interest. DLS uses the knowledge available in the Gene Ontology (GO) to generate informative training sets that guide the discovery of expression signatures: expression patterns, defined in subsets of experimental conditions, that are distinctive of the process of interest. These signatures provide key information for establishing new functional connections and offer insights that help biologists understand the predictions and guide future experiments.

Short Abstract: Using model-based analysis of gene expression time-course data, we reconstructed the precise timeline of transcription of yeast ribosomal genes, spanning a 20-minute interval. The time of expression is related to position within the yeast ribosome: proteins located deeper inside the ribosome are expressed earlier than those on the outside, which suggests that the timing of expression is optimized to facilitate assembly of the complex. The expression times are correlated with the distance between the Rap1 motif and the coding sequence, which implies that Rap1p is involved in regulating expression timing through a previously unknown mode of regulation.

Short Abstract: Although the identification of protein interactions by high-throughput methods progresses at a fast pace, ‘interactome’ datasets still suffer from high rates of false positives and low coverage. To map the human protein interactome, we describe a new framework that uses experimental evidence on structural complexes, the atomic details of binding interfaces and evolutionary conservation. The structurally inferred interaction network is highly modular and more functionally coherent than experimental interaction networks derived from multiple literature citations. Moreover, the structurally inferred and high-confidence high-throughput networks complement each other well, allowing us to construct a merged network that generates testable hypotheses and provides valuable experimental leads.