Bioinformatics and Computational Biology

With thousands of genomes now available thanks to next-generation sequencing technologies, a core challenge for bioinformaticians is how to analyze and compare them at scale. In this context, it is essential to develop efficient algorithms and tools that can handle whole-genome representations, either as long sequences or as huge sets of reads, using appropriate data structures and combinatorial pattern-matching techniques. Current research includes:

Design and development of alignment-free techniques, in particular models that take biological variability into account through approximate components

Design and development of algorithms based on efficient data structures to speed up sequence analysis and to handle larger datasets

Design and development of tools for data analysis with applications to phylogenetics, metagenomics, and motif discovery

Theoretical studies of mathematical models, data structures, and combinatorial properties of strings: this more abstract line of research yields conceptual tools for sequence analysis that may also apply in several other contexts (e.g., text analysis, time-series analysis, social data analysis)
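As a concrete illustration of the alignment-free idea in the first item above, the sketch below compares two sequences through their k-mer count profiles instead of aligning them. It uses the classic D2 statistic (the number of shared k-mer occurrences) and a cosine distance built on it. This is a minimal sketch under stated assumptions: it uses exact k-mers with a hypothetical default k = 3, whereas the approximate-component models mentioned above would also tolerate mismatches. The function names are illustrative only and do not refer to any existing tool.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (substring of length k) in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x: Counter, y: Counter) -> int:
    """D2 statistic: sum over all words w of X_w * Y_w (shared k-mer occurrences)."""
    # Counter returns 0 for words absent from y, so missing k-mers contribute nothing.
    return sum(cnt * y[w] for w, cnt in x.items())


def cosine_distance(a: str, b: str, k: int = 3) -> float:
    """1 - cosine similarity of the k-mer profiles (0 = identical profiles).

    Assumption: exact k-mer matching only; no mismatches are tolerated.
    """
    x, y = kmer_counts(a, k), kmer_counts(b, k)
    norm = sqrt(d2(x, x) * d2(y, y))
    if norm == 0:  # at least one sequence is shorter than k
        return 1.0
    return 1.0 - d2(x, y) / norm


if __name__ == "__main__":
    print(cosine_distance("ACGTACGTAA", "ACGTTCGTAA"))
```

Identical sequences get distance 0 and sequences with no k-mer in common get distance 1. Pairwise distances of this kind can fill a distance matrix for downstream analyses such as phylogenetic reconstruction, with no alignment step.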

Modern sequencing technologies generate data more efficiently, more economically, and at greater depth than was previously possible. This has fostered a number of sequencing-based applications, such as genome re-sequencing, RNA-Seq, and ChIP-Seq. However, the volume of data generated is growing at a pace that now challenges the storage and processing capacities of modern computer systems. In particular, core research activities in the field are:

Next-generation sequencing technologies allow the collection of massive amounts of genomic measurements, including somatic mutations, in large cohorts of cancer patients. Analyzing these data poses many computational challenges and requires efficient and rigorous algorithmic techniques. We design efficient, mathematically well-founded computational and statistical methods for problems that arise in the analysis of large cancer datasets, with a major focus on identifying mutations and genomic features associated with the disease. Specific areas of investigation include:

Finding significantly mutated pathways: efficient methods to find groups of interacting genes that are significantly mutated in cancer, using techniques such as the analysis of large interaction networks and the discovery of combinatorial patterns of mutations (e.g., mutual exclusivity)

Discovery of mutations associated with clinical parameters: rigorous and efficient methods to identify groups of genes whose mutations are associated with clinical parameters such as survival time or drug response

Inference of cancer evolution from sequencing data: combinatorial and statistical methods to reconstruct cancer evolution from cross-sectional datasets or from multiple samples taken from the same patient
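The combinatorial mutation patterns mentioned under "Finding significantly mutated pathways" can be made concrete with one well-known score from the literature: the weight used by the Dendrix algorithm, W(M) = 2|Γ(M)| − Σ_{g∈M} |Γ(g)|. Here Γ(g) is the set of patients with a mutation in gene g, and Γ(M) is the union of these sets over the gene set M. High coverage raises the score, and patients mutated in more than one gene of M (a violation of mutual exclusivity) lower it. The sketch below assumes the mutation data are given as a mapping from gene to mutated patients. The toy genes and patients are hypothetical. It uses exhaustive search, which is feasible only for a handful of genes; Dendrix itself relies on greedy and Markov chain Monte Carlo procedures. The sketch is not presented as the method used in this research line.

```python
from itertools import combinations

# Hypothetical toy data: gene -> set of patients carrying a mutation in it.
TOY = {
    "G1": {"p1", "p2", "p3"},
    "G2": {"p4", "p5"},
    "G3": {"p1", "p2", "p4"},
    "G4": {"p6"},
}


def dendrix_weight(gene_set, mutated):
    """W(M) = 2*|Gamma(M)| - sum over g in M of |Gamma(g)|.

    |Gamma(M)| is the coverage (patients with >= 1 mutation in M); the sum
    penalizes patients mutated in several genes of M (non-exclusivity).
    """
    covered = set().union(*(mutated[g] for g in gene_set))
    return 2 * len(covered) - sum(len(mutated[g]) for g in gene_set)


def best_gene_set(mutated, k):
    """Exhaustive search over all k-gene subsets (small inputs only)."""
    return max(combinations(sorted(mutated), k),
               key=lambda m: dendrix_weight(m, mutated))


if __name__ == "__main__":
    print(best_gene_set(TOY, 2))  # ('G1', 'G2')
```

In the toy data, G1 and G2 are mutated in disjoint patients and together cover five patients, so the pair scores W = 5. G1 and G3 overlap in two patients and score only 2. Assessing whether a high-scoring set is statistically significant requires an additional test that this sketch does not include.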