Bioinformatics

The Bioinformatics Unit develops and establishes computer-aided methods for the identification and verification of new biomarkers for personalized diagnosis and prognosis of diseases as well as for the detection of novel therapeutic targets.

It has been known for only a few years that a large number of RNA molecules are not translated into proteins. Recent scientific findings show that these non-coding RNAs (ncRNAs) fine-tune gene regulation and are therefore suitable as markers of individual disease stages and of disease progression. To detect disease-relevant ncRNAs, the unit develops strategies for the efficient processing and statistical analysis of molecular-biological data obtained from large clinical cohorts by next-generation sequencing, microarrays, and DNA, RNA, and epigenetic analyses. The gene regulatory mechanisms of ncRNAs are modeled using methods from systems biology and computational RNA biology.

Our objective is to assess the potential of these innovative RNA molecules as biomarkers or therapeutic targets and to establish suitable candidates as clinical markers or targets.

RNA biomarker discovery

The Bioinformatics Unit is a member of RIBOLUTION, an integrated platform for the identification and validation of innovative RNA-based biomarkers for personalized medicine and a research association funded by the Fraunhofer-Zukunftsstiftung (Fraunhofer Future Foundation). We identify and establish RNA-based biomarkers that serve as reliable indicators of a disease or its course. Within the association, we are responsible for the storage, computer-aided processing, and statistical analysis of the molecular-biological high-throughput data generated by state-of-the-art measurement methods. The processes we implement cover the entire data life cycle of biomarker discovery, from data generation through primary and secondary analysis to the generation of medical knowledge. All software solutions have been implemented in accordance with quality management standards. Access to a high-performance computing cluster allows the computationally intensive analyses that arise from the volume and variety of the data to be carried out efficiently.

Computational RNA biology

It has been known for a number of years that RNA molecules do not merely relay the hereditary information of DNA for translation into amino acid sequences but also perform extensive regulatory functions themselves. Non-protein-coding RNAs are broadly divided into two groups: short ncRNAs, with a sequence length of less than 200 nucleotides (nt), and the more recently discovered long ncRNAs, with a sequence length of more than 200 nt. The gene regulatory mechanisms of short ncRNAs such as miRNAs and snoRNAs are generally well understood, whereas functions have been described for only a few individual long ncRNAs. Studies of these individual long ncRNAs have shown that they control central cellular processes such as transcription and translation. They are also involved in subcellular localization, in the spatial organization of the cell, and in the control of epigenetic modifications. We and others have shown that long ncRNAs are specifically regulated in various disease-associated tissues and signaling pathways. Novel therapies based on long ncRNAs could therefore act more specifically and cause fewer side effects than traditional approaches. Using methods from computational RNA biology and systems biology, such as the prediction, modeling, and classification of RNA secondary structure motifs, as well as evolutionary and transcription studies, we investigate which gene regulatory mechanisms the long ncRNAs identified as biomarkers use to control cellular processes, and to what extent these ncRNAs are suitable as therapeutic targets.
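RNA secondary structure prediction, mentioned above, can be illustrated with the classic Nussinov base-pair maximization algorithm. This is a simplified textbook stand-in and not the unit's method; practical structure predictors minimize free energy using nearest-neighbor parameters. The sketch assumes an RNA sequence over A, C, G, U, allows Watson-Crick and G-U wobble pairs, and requires a hairpin loop of at least three unpaired nucleotides. The helper `ncrna_class` applies the 200 nt boundary between short and long ncRNAs; treating a sequence of exactly 200 nt as short is our assumption. All function names are hypothetical.

```python
"""Simplified RNA secondary-structure prediction (Nussinov base-pair maximization).

Illustrative sketch only: production tools minimize free energy instead.
"""

SHORT_LONG_BOUNDARY_NT = 200  # boundary between short and long ncRNAs

# Canonical Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def ncrna_class(seq: str) -> str:
    """Return 'long' for ncRNAs longer than 200 nt, else 'short' (exactly 200 nt: assumption)."""
    return "long" if len(seq) > SHORT_LONG_BOUNDARY_NT else "short"


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Return (maximum number of base pairs, one optimal dot-bracket structure)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximum number of pairs in seq[i..j]; spans <= min_loop stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback: recover one optimal structure in dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j or dp[i][j] == 0:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return dp[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
    print(ncrna_class("A" * 250))  # long
```

Maximizing the number of pairs ignores stacking energies and loop penalties, which is why it serves only to show the dynamic-programming structure shared by more realistic predictors.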

Optimization of the processing and analysis of sequencing data for routine clinical applications

Next-generation sequencing technologies produce genome- or transcriptome-wide data within days. These data are usually processed and analyzed by running a series of bioinformatics tools one after another. While improved sequencing methods continue to shorten the time needed for data generation, comparable speed-ups have hardly been achieved for data analysis. This is a drawback for routine clinical use, because it makes the wait for therapy decisions unnecessarily long. Our objective is to optimize the analysis of high-throughput sequencing data so that it can be applied in routine clinical practice. Our in-house analysis pipeline meets the highest quality criteria because it ensures the availability, integrity, confidentiality, and authenticity of the data at all times.
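The sequential invocation of tools and the integrity requirement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the unit's in-house pipeline: each step is modeled as a Python function on bytes (in practice, steps would typically call external tools, for example via `subprocess`), the step names `trim_trailing_n` and `count_reads` are hypothetical placeholders, and integrity is checked by recording a SHA-256 digest of every intermediate result. Because samples are independent, they are processed concurrently, which is one way to shorten turnaround without changing the order of steps within a sample.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A pipeline step: (name, function transforming the previous step's output).
Step = tuple[str, Callable[[bytes], bytes]]
Manifest = list[tuple[str, str]]  # (step name, SHA-256 hex digest of its output)


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def run_pipeline(sample: bytes, steps: list[Step]) -> tuple[bytes, Manifest]:
    """Run the steps strictly in order and record a digest of every intermediate output."""
    manifest: Manifest = [("input", sha256(sample))]
    data = sample
    for name, step in steps:
        data = step(data)
        manifest.append((name, sha256(data)))
    return data, manifest


def verify(data: bytes, expected_digest: str) -> bool:
    """Detect corruption or tampering of a stored intermediate result."""
    return sha256(data) == expected_digest


def run_cohort(
    samples: dict[str, bytes], steps: list[Step], max_workers: int = 4
) -> dict[str, tuple[bytes, Manifest]]:
    """Process independent samples concurrently; each sample still runs its steps in sequence."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {sid: pool.submit(run_pipeline, data, steps) for sid, data in samples.items()}
        return {sid: fut.result() for sid, fut in futures.items()}


# Hypothetical placeholder steps operating on one read sequence per line.
def trim_trailing_n(data: bytes) -> bytes:
    return b"\n".join(line.rstrip(b"N") for line in data.split(b"\n"))


def count_reads(data: bytes) -> bytes:
    return str(sum(1 for line in data.split(b"\n") if line)).encode()


STEPS: list[Step] = [("trim_trailing_n", trim_trailing_n), ("count_reads", count_reads)]

if __name__ == "__main__":
    results = run_cohort({"s1": b"ACGTNN\nGGCCN\n", "s2": b"AAAA"}, STEPS)
    output, manifest = results["s1"]
    print(output)  # b'2'
    print(verify(output, manifest[-1][1]))  # True
```

Threads are sufficient here when the heavy computation runs in external processes; CPU-bound steps written in Python would instead call for `ProcessPoolExecutor`.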

Selected completed projects

Development of custom expression microarrays for the efficient and cost-effective analysis of tumor-associated expression patterns of long non-coding RNAs. Using these custom microarrays, we showed that a large number of long non-coding RNAs are significantly regulated in mammary carcinoma and glioblastoma and are therefore suitable as biomarkers.
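Calling lncRNA probes significantly regulated, as in the microarray study above, typically combines an effect size with a correction for testing many probes at once. A minimal sketch, assuming log2-scale normalized intensities and per-probe p-values already computed by a statistical test of choice: it computes the log2 fold change as a difference of group means and applies the Benjamini-Hochberg false discovery rate adjustment. The thresholds (FDR 0.05, absolute log2 fold change of at least 1) and the probe names are illustrative assumptions, not the criteria used in the project.

```python
from statistics import mean


def log2_fold_change(case: list[float], control: list[float]) -> float:
    """Log2 fold change for log2-scale intensities: difference of group means."""
    return mean(case) - mean(control)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (step-up FDR control), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def regulated_probes(
    stats: dict[str, tuple[float, float]], fdr: float = 0.05, min_abs_lfc: float = 1.0
) -> list[str]:
    """Return probes with adjusted p <= fdr and |log2 fold change| >= min_abs_lfc.

    `stats` maps a (hypothetical) probe name to (log2 fold change, raw p-value).
    """
    names = list(stats)
    adjusted = benjamini_hochberg([stats[n][1] for n in names])
    return [n for n, q in zip(names, adjusted) if q <= fdr and abs(stats[n][0]) >= min_abs_lfc]


if __name__ == "__main__":
    stats = {
        "lnc_A": (2.0, 0.001),
        "lnc_B": (0.2, 0.002),   # significant, but effect too small
        "lnc_C": (-1.5, 0.01),
        "lnc_D": (3.0, 0.5),     # large effect, not significant
    }
    print(regulated_probes(stats))  # ['lnc_A', 'lnc_C']
```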

The analysis of transcriptome-wide expression studies showed that non-coding RNAs are not only specifically expressed but are also, to a greater extent than protein-coding genes, specifically regulated by disease-relevant signaling pathways.

We developed an algorithm (TileShuffle) for the efficient analysis of transcriptome-wide expression data measured with tiling arrays. Using a permutation approach, we were able to estimate background signals more precisely than other methods with regard to probe-specific artefacts. We thus achieve greater sensitivity at the same specificity.
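The general idea of estimating background by permutation can be illustrated with a generic sketch. This is not the published TileShuffle procedure. It assumes probe intensities ordered by genomic position, scores sliding windows of `width` consecutive probes by their mean intensity, and builds a null distribution by shuffling intensities only among probes that share a hypothetical bin label (for example a GC-content class). Shuffling within bins preserves probe-specific effects tied to the bin while destroying the positional signal, and each observed window then receives an empirical p-value. Function names and parameters are hypothetical.

```python
import random
from bisect import bisect_left
from statistics import mean


def window_scores(values: list[float], width: int) -> list[float]:
    """Mean intensity of every window of `width` consecutive probes."""
    return [mean(values[i:i + width]) for i in range(len(values) - width + 1)]


def shuffle_within_bins(values: list[float], bins: list[str], rng: random.Random) -> list[float]:
    """Permute intensities only among probes that share the same bin label."""
    groups: dict[str, list[int]] = {}
    for idx, label in enumerate(bins):
        groups.setdefault(label, []).append(idx)
    shuffled = list(values)
    for idxs in groups.values():
        pool = [values[i] for i in idxs]
        rng.shuffle(pool)
        for i, v in zip(idxs, pool):
            shuffled[i] = v
    return shuffled


def empirical_pvalues(
    values: list[float], bins: list[str], width: int, n_perm: int = 1000, seed: int = 0
) -> list[float]:
    """Empirical p-value per observed window against a pooled permutation null."""
    if len(values) != len(bins):
        raise ValueError("values and bins must have the same length")
    rng = random.Random(seed)
    observed = window_scores(values, width)
    null: list[float] = []
    for _ in range(n_perm):
        null.extend(window_scores(shuffle_within_bins(values, bins, rng), width))
    null.sort()
    total = len(null)
    # Count null scores >= each observed score; the +1 terms keep p strictly positive.
    return [(1 + total - bisect_left(null, obs)) / (1 + total) for obs in observed]


if __name__ == "__main__":
    # Hypothetical tiling data: low background with one transcribed block of five probes.
    values = [0.1] * 20 + [5.0] * 5 + [0.1] * 20
    bins = ["a"] * len(values)
    p = empirical_pvalues(values, bins, width=5, n_perm=200)
    print(min(p), p.index(min(p)))
```

Pooling the scores of all windows from all permutations into a single null distribution is a simplification that keeps the sketch short; a real method would also need to correct for the many windows tested.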