Language Based Techniques for Systems Biology

1 Computer Science and Engineering, Department of Informatics and Mathematical Modeling, Technical University of Denmark2 Department of Informatics and Mathematical Modeling, Technical University of Denmark3 Language-Based Technology, Department of Informatics and Mathematical Modeling, Technical University of Denmark4 Department of Applied Mathematics and Computer Science, Technical University of Denmark

Abstract:

Process calculus is the common denominator for a class of compact, idealised, domain-specific formalisms normally associated with the study of reactive concurrent systems within Computer Science. With the rise of the interactioncentred science of Systems Biology a number of bio-inspired process calculi have similarly been used for the study of bio-chemical reactive systems. In this dissertation it is argued that techniques rooted in the theory and practice of programming languages, language based techniques if you will, constitute a strong basis for the investigation of models of biological systems as formalised in a process calculus. In particular it is argued that Static Program Analysis provides a useful approach to the study of qualitative properties of such models. In support of this claim a number of static program analyses are developed for Regev’s BioAmbients – a bio-inspired variant of Cardelli’s Ambient Calculus that incorporates all features of Milner’s π-calculus: The property of spatial reachability, which is related to the function of cellular transport mechanisms, is addressed by two traditional Control Flow Analyses (CFAs). The simpler of the two, a mono-variant analysis (0CFA), is context insensitive, while the other, a poly-variant analysis (2CFA), is context-sensitive. These analyses compute safe approximations to the set of spatial configurations that are reachable according to a given model. This is useful in the qualitative study of cellular self-organisation and, e.g., the effects of receptor defects or drug delivery mechanisms. The property of sequential realisability. which is closely related to the function of biochemical pathways, is addressed by a variant of traditional Data Flow Analysis (DFA). This so-called ‘Pathway Analysis’ computes safe approximations to the set of reaction sequences that is realisable according to given model. This is useful in the qualitative study of the metabolic pathways that emerge from a group of connected biochemical agents. Technically, these approaches are complementary, but the analyses all overapproximate the set of run-time enabled reactions. This is used in an iterative narrowing scheme that achieves considerable synergy between CFA and DFA, and dramatically improves the results of both. The specified analyses are proved correct with respect to the semantics of Bio-Ambients, and their strength is illustrated by application to abstract models of biological phenomena: One is a model of the LDL degradation pathway, where it is shown that the analyses are able to pinpoint the effects of certain genetic defects that are known to be associated to cardiovascular disease. The other is a model of genetic transcription that relies only on the π calculus fragment of BioAmbients. In both cases the analyses compute very precise estimates of the temporal structure of the underlying pathways; hence they are applicable across a family of widely used bio-ware languages that descend from Milner’s Calculus of Communicating Systems. The presented set of analyses constitutes a nice toolbox for the analysis of biological models. The individual tools range in complexity from low polynomial to exponential, while the precision scales similarly. Thus, the toolbox may provide useful information at all stages of a models lifetime, including development, where one is interested in frequent quick estimates, verification, and prediction, where one is willing to wait longer for more precise estimates.