B.I.S. dato

Tuesday, 6 June 2017

This is my tenth year working in the field of computational biology. It is probably the right time to ask myself: where should my research go next?

Looking Back

I started in 2007 by developing software modules and pipelines for the quantitative analysis of biological systems. Together with Stefan Wiemann, my Ph.D. supervisor, I developed the KEGGgraph software to translate biological pathways in KEGG, which had previously been used mostly visually, into graph models that can be analyzed formally. That led to my very first peer-reviewed publication in the field.

Following that, I spent three years trying to understand how microRNAs regulate gene expression in human breast cancer and gastrointestinal tumors. There I had the opportunity to work with outstanding colleagues such as Florian Haller, Stefan Uhlmann, Heiko Mannsperger, Özgür Sahin, Agnes Hovrat, Katherina Zweig, and others to study microRNAs using the latest technologies, such as reverse phase protein arrays and network analysis. That was when I became fascinated by systems biology.

In 2011 I joined Roche. I am fortunate to work with Clemens Broger and Martin Ebeling and to spend my time, besides regular project support activities, on large-scale analysis of gene expression data and on the development of novel platforms to support early drug discovery. In 2014, by mining the TG-GATEs database, we characterized an early induced network of four genes that is predictive of toxicity in vitro and in vivo. Early this year, we published a manuscript describing the BioQC software, which detects tissue heterogeneity in gene expression data using knowledge derived from a compendium of gene expression profiles that we collected. A few weeks ago, together with Faye Drawnel, Martin Ebeling, and Marco Prunotto, we published the proof-of-concept study of molecular phenotyping and its application in early drug discovery. The results suggest that by integrating molecular phenotyping, i.e. digital quantification of pre-selected pathway reporter genes shortly after compound perturbation, we can gain insights into both the pathways that are associated with disease-relevant phenotypes and the compounds that induce desired phenotypic changes.

Looking Forward

What comes next? I only have a few vague ideas and am open to new ones:

How to build software for data integration and interpretation in order to empower both disease understanding and drug discovery? In particular, how can we systematically and formally integrate genomic, transcriptomic, proteomic, and chemoinformatic data to inform the drug discovery process?

How to formally generate and test hypotheses about genetic and pharmacological perturbations in silico?

How to utilize single-cell and single-mutation level information for drug discovery?

I sense a tension between the ever-increasing amount of information available to us and the limited time we have to digest it and to connect the pieces. In addition, project support activities and research into these questions need constant balancing, even though in the ideal case they benefit from rather than conflict with each other. As Yuri Lazebnik put it in his legendary essay "Can a biologist fix a radio?—Or, what I learned while studying apoptosis", it's time to make good tools and to keep your mind clear under adverse circumstances.

Tuesday, 9 May 2017

While analysing an interesting dataset, I discovered the drc package for dose-response analysis in R (Ritz et al., PLOS ONE 2015, https://doi.org/10.1371/journal.pone.0146021).

It comes with a very powerful optimiser for common models such as the logistic function (or Hill function). Compared with the native R approach using the nls function and self-starting models such as SSfpl, the drc package is far more reliable and robust: both initial parameter estimation and optimisation ran without singularity errors, at least in the ~3,000 datasets that I tried. On these, drc did not fail a single time, whereas nls failed in as many as 600 cases even though I set the starting parameters manually with educated guesses.

I still have to understand how the package achieves such good performance. However, I am already very glad that we finally have a robust and reliable optimiser for curve fitting, which is a common task in computational biology and bioinformatics.

Figure: a 4-parameter logistic fit done and plotted with drc.
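For readers unfamiliar with the model, here is a minimal sketch of the four-parameter log-logistic function, written in Python purely for illustration. The parameter names b, c, d, and e follow what I understand to be the convention of the LL.4 model in drc; the function name ll4 and the numeric values are made up for this example and do not come from my dataset. The sketch only evaluates the curve and does not fit anything.

```python
import math


def ll4(dose, b, c, d, e):
    """Four-parameter log-logistic curve (assumed parameterization).

    b: slope, c: lower asymptote, d: upper asymptote,
    e: dose giving the response halfway between c and d (ED50).
    Doses must be strictly positive because of the logarithm.
    """
    return c + (d - c) / (1.0 + math.exp(b * (math.log(dose) - math.log(e))))


# Hypothetical parameter values, chosen only for illustration.
params = dict(b=1.5, c=0.0, d=100.0, e=10.0)

# With b > 0 the response falls from d towards c as the dose increases.
for dose in (0.1, 1.0, 10.0, 100.0, 1000.0):
    print(f"dose={dose:8.1f}  response={ll4(dose, **params):7.2f}")
```

At dose = e the curve sits exactly halfway between the two asymptotes, which is why e is read as the ED50.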

P.S. During the trial-and-error process, I also stumbled upon a website that handles curve fitting quite robustly: https://www.mycurvefit.com. I will not use it, since I need programmatic access to the fitting, but in my opinion its fitting function works well; at least it coped better with the few examples on which nls had failed.

I thought about when to use R, Python, and C/C++ appropriately and most effectively. I think R is very good for prototyping tools that combine statistics and visualization. Python is an excellent general-purpose scripting language with a large code base. C/C++, being quite complex but efficient and powerful, remains my choice when it comes to optimizing performance.

My colleague Nikolaus Berntenis told me about Paintomics, developed by another colleague, Fernando Garcia-Alcade, and his group. The web tool seems to be able to visualize multi-omics datasets using KEGG graphics.

Monday, 7 April 2014

Dear R users, here I report a curious case of a "cyclic namespace dependency" error and its solution, in case you run into the same problem, which confused me a lot.

In my case, I accidentally created an S4 method and a normal function with the same name. The package could be installed. However, when I tried to load the package, it failed with a "cyclic namespace dependency" error message.

For a minimal piece of code see https://gist.github.com/anonymous/10018579. When the code is placed in an R file of an R package and "export(myMethod)" is specified in NAMESPACE, checking or loading the package fails due to the "cyclic namespace dependency".

The error message is unfortunately not particularly helpful. Normally it would point to a problem of reciprocal dependency, but in this case it is caused only by a name that is given to both an S4 method and a normal function.

It is straightforward to solve the issue: check carefully whether you have duplicated function/method names and fix them if there are any.
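If a package has many source files, this check could be automated roughly as follows. This is only a sketch in Python, under the assumption that S4 generics and methods are declared with setGeneric("name", ...) or setMethod("name", ...) and that normal functions are assigned at the start of a line. The function name find_name_clashes and the regular expressions are my own and only catch these simple spellings. The R snippet inside the example is hypothetical and is not the content of the gist.

```python
import re

# Deliberately simple patterns: they only catch the common spellings.
S4_PATTERN = re.compile(r'set(?:Generic|Method)\(\s*["\']([^"\']+)["\']')
FUNC_PATTERN = re.compile(
    r'^\s*([A-Za-z.][A-Za-z0-9._]*)\s*(?:<-|=)\s*function\b',
    re.MULTILINE,
)


def find_name_clashes(r_source):
    """Return names defined both as an S4 generic/method and as a plain function."""
    s4_names = set(S4_PATTERN.findall(r_source))
    plain_names = set(FUNC_PATTERN.findall(r_source))
    return sorted(s4_names & plain_names)


# Hypothetical R code, for illustration only.
example = '''
setGeneric("myMethod", function(object) standardGeneric("myMethod"))
setMethod("myMethod", "numeric", function(object) object + 1)
myMethod <- function(x) x * 2
helper <- function(x) x
'''

print(find_name_clashes(example))  # prints ['myMethod']
```

To scan a real package, the contents of all files in its R directory would be read and passed to the function, either one file at a time or concatenated.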