Mr.SymBioMath

Term: 2/2013 - 1/2017 (48 months)

Project full title:
High Performance, Cloud and Symbolic Computing in Big-Data problems applied to mathematical modeling of Comparative Genomics

Abstract:
Large-scale genomics projects exploiting leading high-throughput technologies have produced, and continue to produce, massive data sets at exponentially growing rates. So far, only a small part of this data can be abstracted, managed and processed, which gives an incomplete understanding of the biological processes being observed. The lack of processing power is a bottleneck in acquiring results. Comparative genomics is a good example since it includes all the ingredients: huge and ever-growing datasets, complex applications that demand large computational resources, and new mathematical and statistical models for analysing and synthesising genomic information. A promising approach to address such massive data sets is the creation of new computer software that makes effective use of parallel processing. This proposal pursues the linking of different research domains to arrive at a coordinated multi-disciplinary approach to the development of tools targeting Big-Data and computationally intensive scientific applications. Generic solutions for Big-Data storage, management, distribution, processing and final analysis will be developed. These solutions will target a broad range of scientific applications; specifically, as a proof of concept, they will be implemented in the 'Comparative Genomics' field of the bioinformatics and biomedical domains. Target applications include the detection of main evolutionary events; new comparative genomics models that can be evaluated experimentally, for example for inter-species evolutionary distance; the composition of the k-mer dictionaries for each species; and the customisation of symbolic computing methods to determine the consensus tree from a sequence of trees, with application in multiple sequence alignments, phylogenetic studies, clustering algorithms, etc. These tasks are present in diverse fields of bioinformatics, from NGS-DNA assembly to gene expression, and all of them are well suited to HPC-CC approaches and have high and attractive potential for commercialization.
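
As an illustration of the k-mer dictionary idea mentioned in the abstract, the following is a minimal sketch and not the project's implementation. The function names `kmer_dictionary` and `jaccard_distance`, the toy sequences, and the choice of Jaccard distance as a stand-in for an inter-species distance are all assumptions made for this example.

```python
from collections import Counter


def kmer_dictionary(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a sequence (hypothetical helper)."""
    seq = sequence.upper()
    # A sequence shorter than k yields an empty dictionary.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard_distance(a: Counter, b: Counter) -> float:
    """Return 1 - |shared k-mers| / |all k-mers|.

    Assumed stand-in for an inter-species distance; it ignores k-mer counts
    and compares only which k-mers are present.
    """
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    if not union:
        return 0.0
    return 1.0 - len(keys_a & keys_b) / len(union)


# Toy sequences, invented for illustration only.
species_a = "ACGTACGTGA"
species_b = "ACGTTCGTGA"

dict_a = kmer_dictionary(species_a, 3)
dict_b = kmer_dictionary(species_b, 3)
print(jaccard_distance(dict_a, dict_b))
```

Because each sequence (or sequence chunk) can be counted independently and the resulting dictionaries merged afterwards, this kind of computation lends itself to the parallel processing the abstract argues for.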