Here, we report an update on the ECHA/OECD ontology, which aims to standardize and organize the chemical toxicological databases in the OECD QSAR Toolbox, a free predictive toxicology software package that implements the OECD Harmonised Templates (HTs) as a standard data format. We use the DL species of the Web Ontology Language (OWL DL), as supported by the Protégé OWL editor. The BioPortal import plugin and the Excel import plugin have also been used. At the beginning of the project, several related resources were identified, including existing ontologies freely available in the BioPortal ontology repository (e.g. NCI Thesaurus, Clinical Terms Version 3, Mouse pathology and Mouse adult gross anatomy). The OECD Harmonised Templates, available as XML schemas, have been used as the basis for the development of the ontologies.

The project started in 2012 with ontology development for Carcinogenicity, Repeated Dose Toxicity and Reproductive/Developmental Toxicity. The ontologies for Skin Irritation, Eye Irritation, Skin and Respiratory Sensitisation will be released by the end of 2013. Terms have been collected from the related resources in order to define the free-text fields present in the harmonised templates and to gather as many synonyms as possible. Each entry of the experimental databases included in the Toolbox software has been associated with the ontology using OWL hierarchy relationships and restriction rules. The complete conversion of the OECD HTs into the OWL semantic-based format is important for more effective experimental data mapping. To facilitate the use of OWL-based HTs, we preserved all HT identifiers. Regulatory toxicology terminology does not completely satisfy the naming conventions principle, which is one of the OBO Foundry best practice principles. We would like to discuss this point with ontology community experts in the future. The final goal of the project is to introduce the ontology as the basis of data exchange and harmonization within the OECD QSAR Toolbox, for better integration and standardization of experimental data.

This work is coordinated by ECHA and OECD and is funded by ECHA: “Multiple framework contract with re-opening of competition for the provision of scientific support services – ONTOLOGY” (ECHA/2011/25), Specific Contract 1 (ECHA/2011/125) and Specific Contract 2 (ECHA/2012/261).

According to the OECD definition, a chemical category is a group of chemicals whose physico-chemical and human health and/or ecotoxicological properties and/or environmental fate properties are likely to be similar or follow a regular pattern, usually as a result of structural similarity. In the context of developing a new approach to categorize chemicals with respect to their (sub)chronic toxicity, we established a two-dimensional matrix using a multi-view clustering approach. One dimension captures the physical and chemical properties of the molecules, employing graph-based clustering to group the data. The second dimension uses the toxicological profile: it is organized by organ toxicity and, within each organ, split into subgroups according to similarities at the phenotypic and mechanistic levels. The present approach aims to combine toxicological with structural properties and as such represents progress over the conventional structure-based approach, in which similarities are based on a common functional group, common constituents, an incremental and constant change across the category, or the likelihood of common precursors and/or breakdown products. Two databases on repeated-dose toxicity are used; their chemicals differ in physico-chemical properties, which allows a broader chemical domain to be covered. The data comprise 1383 chemicals and 2548 oral studies in rats. For the analysis, we use the following aspects: organ investigated, not investigated, no findings, findings; potency in terms of the no observed adverse effect level (NOAEL); and organ specificity. Several clustering methods have been tested, e.g. hierarchical clustering and k-means. For the multi-view clustering, a mixture-of-binomials EM and a special k-means were explored. The development of a new approach to detect chemical categories is work in progress, with stepwise improvements in the understanding of the possibilities and pitfalls. The presentation will reflect the current status of our study.

lazar (lazy structure–activity relationships) is a modular framework for predictive toxicology. Similar to the read-across procedure in toxicological risk assessment, lazar creates a local QSAR (quantitative structure–activity relationship) model for each compound to be predicted. Model developers can choose among a large variety of algorithms for descriptor calculation and selection, chemical similarity indices, and model building. We provide a high-level description of the lazar framework and discuss the performance of example classification and regression models.
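
The idea of building a local model per query compound can be illustrated with a minimal sketch. This is not lazar's actual implementation: the Tanimoto index, the similarity threshold, the weighted vote and the feature names are all assumptions chosen for illustration.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Compound = Tuple[FrozenSet[str], str]  # (feature set, activity label)


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def predict_local(
    query: FrozenSet[str], training: List[Compound], min_sim: float = 0.3
) -> Tuple[Optional[str], float]:
    """Similarity-weighted vote among training compounds similar to the query.

    Only neighbours with similarity >= min_sim take part, so the "model"
    is rebuilt for every query. Returns (label, share of the vote), or
    (None, 0.0) when no neighbour is similar enough.
    """
    votes: Dict[str, float] = {}
    for features, label in training:
        sim = tanimoto(query, features)
        if sim >= min_sim:
            votes[label] = votes.get(label, 0.0) + sim
    if not votes:
        return None, 0.0
    best = max(votes, key=lambda lab: votes[lab])
    return best, votes[best] / sum(votes.values())


# Hypothetical training data: substructure-like features and labels.
TRAINING: List[Compound] = [
    (frozenset({"nitro", "aromatic_ring"}), "active"),
    (frozenset({"nitro", "aromatic_ring", "amine"}), "active"),
    (frozenset({"hydroxyl", "alkyl_chain"}), "inactive"),
]

if __name__ == "__main__":
    print(predict_local(frozenset({"nitro", "aromatic_ring", "halogen"}), TRAINING))
```

Returning no prediction when no neighbour passes the threshold mirrors the read-across intuition that a query outside the neighbourhood of known compounds should not be predicted.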

Fractional-order calculus was introduced into pharmacokinetic modelling only very recently, after the prime reasons for the failure of the classical In Vitro-In Vivo Correlation (IVIVC) theory were elucidated. Fractional dynamics can be cast as Physiologically Based Pharmacokinetic (PBPK) models in which the mass balance equations are rewritten using fractional-order derivatives. This offers a mechanistic understanding of the interplay among the main factors of drug distribution and enables us to draw individualised concentration-time profiles and to study drug-drug interactions using the fractional calculus approach. Unlike conventional dynamical systems, fractional differential equations can describe highly complex dynamics and processes with possibly infinite memory. It turns out that the diffusion of drugs into the capillaries' microcirculation is governed by such fractional differential equations. The parameters of fractional equations can either be determined from experimental data, when available, or inferred using QSAR methods. In this study we present a methodology for the design and tuning of a fractional-order PID (fPID) controller for the intravenous administration of Amiodarone, an anti-arrhythmic drug, and we demonstrate the advantages of using a controller of this class. The introduction of fractional control in pharmacokinetics paves the way towards accurate and effective drug administration and raises new challenges for control theory and QSAR modelling.
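
To make the notion of a fractional-order derivative and of an fPID control law concrete, the following sketch uses the Grünwald–Letnikov approximation and the common form u = Kp·e + Ki·D^(-λ)e + Kd·D^(μ)e. Both the discretisation and the controller form are assumptions made for illustration; the abstract does not state which formulation or tuning the authors use, and all gains below are placeholders.

```python
from typing import List, Sequence


def gl_weights(alpha: float, n: int) -> List[float]:
    """Grünwald–Letnikov weights w_k = (-1)^k * binom(alpha, k), k = 0..n."""
    weights = [1.0]
    for k in range(1, n + 1):
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / k))
    return weights


def gl_derivative(samples: Sequence[float], alpha: float, h: float) -> float:
    """Approximate D^alpha of a uniformly sampled signal at its last sample.

    alpha > 0 differentiates, alpha < 0 integrates. Every past sample
    contributes, which is the "memory" of fractional operators.
    """
    n = len(samples) - 1
    w = gl_weights(alpha, n)
    return sum(w[k] * samples[n - k] for k in range(n + 1)) / h ** alpha


def fpid_output(errors: Sequence[float], h: float, kp: float, ki: float,
                kd: float, lam: float, mu: float) -> float:
    """Fractional PID law: kp*e + ki*D^(-lam) e + kd*D^(mu) e."""
    return (kp * errors[-1]
            + ki * gl_derivative(errors, -lam, h)
            + kd * gl_derivative(errors, mu, h))


if __name__ == "__main__":
    error_history = [0.0, 1.0, 2.0, 3.0]  # hypothetical tracking errors
    # With lam = mu = 1 the law reduces to a classical discrete PID.
    print(fpid_output(error_history, h=0.5, kp=1.0, ki=1.0, kd=1.0, lam=1.0, mu=1.0))
    # Non-integer orders give the two extra tuning parameters of an fPID.
    print(fpid_output(error_history, h=0.5, kp=1.0, ki=1.0, kd=1.0, lam=0.7, mu=0.4))
```

For α = 1 the weights collapse to a backward difference and for α = -1 to a rectangle-rule integral, which gives a quick sanity check of the implementation.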

The cytotoxic effect of many drugs, e.g. paracetamol and diclofenac, is mediated by the induction of oxidative stress leading to the accumulation of reactive oxygen species (ROS) in mitochondria. For moderate ROS levels, oxidative stress can be handled by the Nrf2 pathway, which regulates the inducible expression of several cytoprotective genes. When these cytoprotective pathways are overcome by increasing levels of ROS, mitochondrial membrane alterations can occur, leading to the release of cytochrome c and thereby driving apoptosis and necrosis. Here, we present our recent advances in modeling Nrf2 signaling, apoptosis and necrosis by ordinary differential equations. Since many cytokine/ligand-driven pathways, e.g. NFkB, MAPK and PI3K, interfere with the apoptosis/necrosis pathway, we established a coarse-grained mathematical model to include cytokine input as a concomitant factor for cell death. We calibrate the model based on immunoblot and single-cell fluorescence microscopy data from HepG2 cells and primary hepatocytes. The data are complemented by end-point measurements for apoptosis and proliferation. Finally, the calibrated mathematical model is employed to investigate the interplay between cytokine exposure and cytotoxicity. The project is still in an early phase. Data generation, model calibration and model refinement are carried out continuously in collaboration between our experimental and theoretical groups.

Ambit-Tautomer [1] is an open source Java library for automatic generation of all tautomers of a given chemical compound. It is implemented on top of the Chemistry Development Kit (CDK) [2]. The system includes three main algorithms: a pure combinatorial method, an improved combinatorial method and an incremental algorithm. The tautomer generator uses a set of predefined but customizable rules. The rules are defined in Daylight SMILES/SMARTS line notations and support the basic types of tautomerism (1-3, 1-5 and 1-7 proton tautomer shifts). The pure combinatorial method generates all tautomeric forms by considering all possible combinations of the matched rule states. The improved combinatorial method uses sub-combinations based on rule clustering. The incremental algorithm applies depth-first search to handle sophisticated cases of overlapping rules. Additionally, rule pre-filtering and tautomer post-filtering are applied to fine-tune the generation process. The tautomer generator implements tautomer ranking based on empirical rules defined in terms of relative energy differences. The Ambit-Tautomer library is applied to improve the Ambit database storage of chemical structures and, accordingly, to implement search procedures that take tautomerism information into account. The tautomer sets are also used to calculate modified values of the original molecular descriptors in order to improve existing QSAR/QSPR models. The Ambit-Tautomer module is implemented as an open source Java package, as part of the Ambit open source software for chemoinformatics and data management [3,4], and is available as a Java library, a command line application [5] and an OpenTox Algorithm API compatible Web service [6]. The Ambit package is available as online web services and as a downloadable application. A web page providing online tautomer generation by Ambit-Tautomer and several other software packages is available at http://apps.ideaconsult.net:8080/ambit2/depict/tautomer.
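
The pure combinatorial method described above can be pictured with a short sketch. It only enumerates combinations of rule states and performs no chemistry; the data layout and state labels are hypothetical and do not reflect the Ambit-Tautomer API.

```python
from itertools import product
from typing import Dict, List, Tuple


def enumerate_state_combinations(
    rule_instances: Dict[str, List[str]]
) -> List[Tuple[Tuple[str, str], ...]]:
    """List every combination of states over all matched rule instances.

    rule_instances maps a matched rule instance (hypothetical id) to the
    states it can take. Each result assigns one state to every instance,
    so n two-state instances yield 2**n candidate tautomers. Overlapping
    rules are not handled here.
    """
    names = sorted(rule_instances)
    state_lists = [rule_instances[name] for name in names]
    return [tuple(zip(names, combo)) for combo in product(*state_lists)]


# Two hypothetical matches of 1-3 shift rules on one molecule.
MATCHES = {
    "keto_enol@C3": ["keto", "enol"],
    "amine_imine@N7": ["amine", "imine"],
}

if __name__ == "__main__":
    for assignment in enumerate_state_combinations(MATCHES):
        print(assignment)
```

The exponential growth in the number of combinations is what motivates grouping rules into clusters or searching incrementally, as the improved and incremental algorithms do.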

Ambit-SMIRKS is a new extension of the Ambit-SMARTS Java library [1], both part of the Ambit2 [2] project. The modules are implemented on top of the Chemistry Development Kit (CDK) [3]. Ambit-SMIRKS performs two main tasks: (1) parsing SMIRKS linear notations into internal reaction (transformation) representations based on CDK objects and (2) applying the stored reactions to target molecules to transform the target chemical objects. The transformations can be applied at various sites of the target molecule in several modes: single, non-overlapping, non-identical, non-homomorphic, or an externally specified list of sites. Ambit-SMARTS implements the entire SMARTS language specification as defined by Daylight, plus additional syntax extensions that make the software compliant with SMARTS modifications made by third-party software packages such as OpenEye, MOE and OpenBabel. The SMIRKS library utilizes the SMARTS parser and the efficient substructure searching algorithm implemented within the Ambit-SMARTS package [1]. Most SMIRKS implementations typically support SMILES plus simple SMARTS syntax features; the Ambit-SMIRKS module, however, supports the full SMARTS syntax for reaction specification. The SMIRKS module is used to enable metabolite predictions in Toxtree (since version 2.5.0) [4], once the site of metabolism has been predicted by SMARTCyp [5]. Toxtree is a flexible and user-friendly open-source application that predicts various kinds of toxic effects, mostly by applying structural alerts arranged in a decision tree fashion. The SMARTCyp (Cytochrome P450-Mediated Drug Metabolism) model was originally developed by Patrik Rydberg et al. [5] and has been included as a Toxtree module since Toxtree 2.1.0. Ambit-SMIRKS functionality is available as a Java library as well as an OpenTox Algorithm API compatible Web service. The SMIRKS functionality can also be tested at the web page http://apps.ideaconsult.net:8080/ambit2/depict/reaction.

8. Linear support vector machines and heat map molecule coloring – an interpretable approach for the in silico prediction of toxicity and mutagenicity, Lars Rosenbaum and Alexander Dörr (University of Tübingen)

A central task in the field of cheminformatics is the development of machine learning approaches that are able to filter out compounds which will likely fail in the later stages of the drug discovery pipeline because of low affinity or undesired properties. Important properties that might lead to failure in clinical studies are toxicity and mutagenicity. Nonlinear machine learning algorithms, such as support vector machines (SVMs) and Gaussian processes, have proven to be valuable tools for the in silico prediction of the outcome of an Ames mutagenicity or chromosome aberration test. While such algorithms provide high-quality predictions, it is often hard for a user to gain insight into the reasons behind a prediction. However, the reasons for a certain prediction can help a medicinal chemist in the optimization of a lead compound. Thus, besides strong performance, interpretability is a desired property of a machine learning approach. Here, we present an interpretable machine learning approach [1], which is based on a linear classification SVM [2] combined with sparse molecular fingerprint encodings [3]. In contrast to their nonlinear counterparts, linear SVMs are readily interpretable because they do not perform a nonlinear mapping from the input space to a high-dimensional feature space. However, the restriction to the linear case means that only linearly separable problems can be learned. This problem can usually be circumvented with sparse and high-dimensional input features. Sparse chemical fingerprints, like extended-connectivity fingerprints (ECFPs) or MOLPRINT, result in such sparse and high-dimensional encodings. The approach exhibits convincing performance on large data sets from assay outcomes. Linear SVMs learn a linear discriminant function, which assigns a weight to each fingerprint feature of the input space. Although the models are not quantitative, the weight of a fingerprint feature should reflect its contribution to the toxicity or mutagenicity of a molecule. If the molecular encoding fulfills certain requirements, the weight can be mapped back to the atoms or bonds that a fingerprint feature represents. Consequently, each atom or bond obtains a score that represents its contribution to toxicity or mutagenicity. Based on the scores, a color on a heat-map-like color gradient is assigned to each atom or bond. The heat map coloring of a compound enables a chemical expert to identify toxic or mutagenic substructures. Both the linear SVM (www.cogsys.cs.uni-tuebingen.de/software/ChemLL) and the fingerprint encodings (jcompoundmapper.sourceforge.net) are available as open source software. Furthermore, an open source implementation of the visualization approach based on the chemical structure viewer JMol (jmol.sourceforge.net) is available.
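
The mapping from feature weights back to atoms, followed by heat map coloring, can be sketched as below. This is an illustration under assumptions: each feature's weight is split evenly over the atoms it covers, and the blue-white-red gradient is an arbitrary choice; the published approach may distribute weights and choose colors differently. Feature identifiers and weights are hypothetical.

```python
from typing import Dict, List, Tuple


def atom_scores(n_atoms: int, feature_atoms: Dict[str, List[int]],
                weights: Dict[str, float]) -> List[float]:
    """Sum linear-model weights onto the atoms each feature covers.

    feature_atoms maps a fingerprint feature id to the atom indices it
    represents in this molecule. Each feature's weight is divided evenly
    among its atoms (an assumption); unknown features contribute nothing.
    """
    scores = [0.0] * n_atoms
    for feature_id, atoms in feature_atoms.items():
        if not atoms:
            continue
        share = weights.get(feature_id, 0.0) / len(atoms)
        for atom in atoms:
            scores[atom] += share
    return scores


def score_to_rgb(score: float, max_abs: float) -> Tuple[int, int, int]:
    """Map a score to a blue (negative) / white (zero) / red (positive) color."""
    if max_abs <= 0.0:
        return (255, 255, 255)
    t = max(-1.0, min(1.0, score / max_abs))
    if t >= 0.0:
        fade = int(255 * (1.0 - t))
        return (255, fade, fade)
    fade = int(255 * (1.0 + t))
    return (fade, fade, 255)


if __name__ == "__main__":
    feature_atoms = {"f1": [0, 1], "f2": [1, 2]}  # hypothetical features
    weights = {"f1": 1.0, "f2": -0.5}             # hypothetical SVM weights
    scores = atom_scores(3, feature_atoms, weights)
    max_abs = max(abs(s) for s in scores)
    for index, score in enumerate(scores):
        print(index, score, score_to_rgb(score, max_abs))
```

Scaling by the largest absolute score in the molecule keeps the gradient comparable across atoms, at the cost of making colors relative to each molecule rather than absolute.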

Current methods and systems for predicting biodegradation products and pathways suffer from a common problem in bio- and cheminformatics: the tools are designed for a single purpose and rarely interact well with other tools. The output often has to be parsed and converted into another format. In this presentation, we present a new implementation of the University of Minnesota Pathway Prediction System (UM-PPS). The UM-PPS predicts the environmental fate of pollutants by applying transformation rules to structures to generate biodegradation pathways. The new implementation extends the simple approach of predicting a pathway by providing its service as a REST webservice. This opens up a wide range of new potential functions. Generated pathways can be stored and accessed via their URIs and edited using REST calls. Degradation products as well as reactions are accessible via their corresponding URIs. Hence, the degradation products can be sent directly to OpenTox webservices to predict their toxicity, which is an important functionality for environmental scientists, whose task is to identify the hazard that chemicals and their degradation products pose to the environment. Currently, a base version of the UM-PPS is implemented, supporting all functionality of the previous version combined with a REST interface. In the future, we will include direct support for toxicity prediction using the OpenTox framework (currently, toxicity can be predicted through external calls to OpenTox services with the output of UM-PPS as input), as well as combination with other services, such as mass spectrometry databases, to identify known products in the biodegradation pathway. Furthermore, we will include models that predict a probability for each product in order to limit the combinatorial explosion of the pathways caused by the transformation rules (transformation rules tend to be overly general and predict too many transformation products, only a small fraction of which occur in nature). The new architecture of the system provides a vast range of possibilities for new functionality that can be gained by combining external webservices.

Drugs from a range of therapeutic areas have shown adverse cardiotoxic side effects in humans, leading to the costly withdrawal of several drugs from the market. Current in vitro and in vivo models have been found insufficient to predict cardiotoxicity. There is a need to develop more relevant and predictive preclinical in vitro screening models and tools for cardiotoxicity testing. In this study, a pure population (>98%) of human iPS cell-derived cardiomyocytes (hiPS-CM) was repeatedly treated with sublethal doses of doxorubicin and subjected to functional readouts using the xCELLigence impedance system. Experimental cells and media samples were collected at different time points for transcriptomics and metabolomics studies. We observed that repeated sublethal doses of doxorubicin (150 nM) induced arrhythmicity in the beating rate of the cardiomyocytes in the long term. This observation was accompanied by transcriptomics data showing significant deregulation of some genes involved in metabolic processes and cardiac contraction. Metabolomics analysis suggests a significant alteration in mitochondrial metabolism following repeated doxorubicin treatments.

Pharmacokinetics and therapeutic effects of drugs are largely affected by the process of drug metabolism. The most important role in this process is attributed to the Cytochrome P450 superfamily of enzymes. In particular, the most prominent enzymes of this family are CYP3A4, CYP2D6 and CYP2C9, which are involved in nearly 70% of phase I drug metabolism in humans [1], [2]. For these reasons, it is important to distinguish substrates and inhibitors from nonsubstrates and noninhibitors of the CYP3A4, CYP2D6 and CYP2C9 isoenzymes early in the drug discovery process. Multi-target drug design is a new, promising approach to treating diseases. Drugs that target multiple proteins at once have a number of advantages over conventional, highly selective drugs [3]. Multi-target drug design requires multi-target quantitative structure-activity relationship (MT-QSAR) models. MT-QSAR models are able to predict multiple targets simultaneously and are a fast and cost-efficient tool for detecting multi-target drugs in large compound libraries. In this work, we aim to explore the value of multi-target models for predicting substrates and inhibitors of CYP450 enzymes, and compare these models to single-target models built separately for each of the targets.

In this work we collected 700 compounds from the work of Yap and Chen [2], annotated as substrates/nonsubstrates and inhibitors/noninhibitors of CYP3A4, CYP2D6 and CYP2C9. We represented the molecules by 176 2D molecular descriptors calculated with the Chemistry Development Kit (CDK) software. To build predictive models, we used Predictive Clustering Trees (PCTs), which are specialized for predicting structured outputs, e.g. predicting multiple targets simultaneously [4]. We constructed and compared two types of models: single-target classification trees and multi-target classification trees. Our results show that multi-target PCTs achieve predictive accuracy better than or similar to that of single-target trees. At the same time, multi-target trees provide better comprehensibility: the multi-target PCTs have considerably lower complexity than the single-target trees, which makes the models easier to interpret.

The ToxBank database has established several resources for the SEURAT-1 partners in a single location: ‘a dedicated web-based warehouse for toxicity data management and modelling, a "gold standards" compound database and repository of selected test compounds, and a reference resource for cells, cell lines and tissues of relevance to in vitro systemic toxicity research’. This poster focuses on the biomaterial data, which are presented through a wiki system and a catalogue of best practice documents, both linked to the main ToxBank database. The wiki contains detailed descriptions of commercially available reagents and sources of cell lines, with a direct link to the European database of hPSC lines (http://www.hescreg.eu/). The poster also outlines how these data are managed, from the perspective of best practices with respect to information requirements, handling quality, security, and privacy issues.

One persistent challenge is the visualization of and navigation in “chemical space” in a way that gives a quick and broad view of its content. One approach to this problem is to use the concept of multidimensional property spaces, in which the dimensions are assigned to selected numerical descriptors of molecular structure [1]. Principal component analysis (PCA) can then be used to project this multidimensional property space into a lower-dimensional space, typically a 2D or 3D space that can be visualized.

Here we report the development of the MQN-Mapplet, a Java application giving interactive access to the structures of small molecules in large databases via color-coded maps of their chemical space. These maps are projections from a 42-dimensional property space defined by 42 integer-valued descriptors called molecular quantum numbers (MQN) [2]. MQN count different types of atoms, bonds, rings, polar groups and topological features in molecules and, by doing so, categorize the molecules by size, rigidity and polarity rather than by substructure. Despite its simplicity, MQN-space is relevant to biological activities. In contrast to other database browsing sites, one can start the exploration of chemical space with the MQN-Mapplet without using any query molecule. However, an option is provided to locate a molecule of interest on the map, which can then be used as the point of origin for exploring chemical space. While navigating in the chemical space, the user has access to most of the available structural information. Additionally, the MQN-Mapplet allows the identification of analogs as neighbors on the MQN-map or in the original 42-dimensional MQN-space. To our knowledge, this type of interactive exploration tool is unprecedented for very large databases such as PubChem and GDB-13 (almost one billion molecules). The application is freely available for download at www.gdb.unibe.ch.
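
Identifying analogs as neighbors in an integer descriptor space can be sketched as follows. The city-block (Manhattan) distance is an assumption made here because it suits integer count vectors; the abstract does not name the metric. The four-component vectors and molecule names are hypothetical stand-ins for real 42-component MQN vectors.

```python
from typing import Dict, List, Sequence


def city_block(a: Sequence[int], b: Sequence[int]) -> int:
    """City-block distance between two equal-length integer vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbours(query: Sequence[int],
                       library: Dict[str, Sequence[int]],
                       k: int = 2) -> List[str]:
    """Return the names of the k library molecules closest to the query.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(library, key=lambda name: (city_block(query, library[name]), name))
    return ranked[:k]


# Hypothetical count descriptors, e.g. (carbons, rings, donors, acceptors).
LIBRARY = {
    "mol_a": (6, 0, 1, 2),
    "mol_b": (6, 1, 1, 2),
    "mol_c": (12, 3, 0, 5),
}

if __name__ == "__main__":
    print(nearest_neighbours((6, 0, 1, 3), LIBRARY, k=2))
```

A linear scan like this is only practical for small libraries; for databases approaching a billion molecules, some form of indexing or pre-sorting would be needed.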

The natural organic compound Eucalyptol, also known as 1,8-Cineol, is a major constituent of Eucalyptus oil, which is used as a flavor and fragrance in food as well as in medical products such as ointments due to its anti-inflammatory properties. Despite its widespread use, little is known about the genotoxic potential of Eucalyptol. The aim of this study was to investigate the genotoxicity and cytotoxicity of Eucalyptol in HCT-116 epithelial cells. We observed a significant concentration-dependent increase of oxidative DNA damage starting at 5 µM Eucalyptol in HCT-116 colorectal cancer cells, as detected by the Formamidopyrimidine-Glycosylase (FPG)-modified alkaline comet assay. Pre-treatment of cells with the antioxidant N-acetylcysteine for one hour prevented the formation of FPG-sensitive sites after Eucalyptol treatment, supporting the notion that Eucalyptol induces oxidative DNA damage. Further experiments did not show an influence of Eucalyptol on the viability of the colorectal cancer cell lines HCT-116, HT-29 and Caco-2, as attested by the MTS assay. In line with these findings, Eucalyptol did not affect cell cycle distribution in HCT-116 cells up to a concentration of 2 mM, which suggests that cells may cope with Eucalyptol-induced damage by DNA repair. To examine this hypothesis, we performed the viability assay in two DNA repair-deficient hamster cell lines, VC8 and RAD51D1, with defects in BRCA2 and Rad51, two essential players in homologous recombination (HR), and in the respective control cell lines. In contrast to the wild-type cell lines with functional HR, we detected a concentration-dependent decrease of cell viability at a concentration of 1 mM Eucalyptol in the HR-defective VC8 and RAD51D1 cells. Based on these findings we conclude that Eucalyptol induces oxidative DNA damage, which is not associated with subsequent cell death. However, cells deficient in HR displayed an enhanced sensitivity, suggesting that high Eucalyptol doses may generate persistent oxidative DNA damage that could bear mutagenic potential.

The data from the EPA ToxCast program have recently been the subject of active discussion in the literature. On the poster, we will present the results of large-scale computational experiments and discuss the quality and predictability of the data from a statistical point of view. We will show that, using so-called multi-label classification, dependencies among toxic effects in the ToxCast data set can be exploited successfully. A filtering step based on an internal leave-one-out cross-validation removes those endpoints that are predicted worse than by random guessing and, additionally, those that do not benefit from a joint prediction together with other endpoints by multi-label classification. As a result of our experiments, we obtain a list of in vivo endpoints that can be predicted with some confidence and a set of related endpoints that benefit from a concerted prediction.
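
The filtering step can be sketched as a simple selection over per-endpoint cross-validation scores. The score type (here an AUC-like value with 0.5 as the random level), the endpoint names and the numbers are assumptions for illustration only; the abstract does not specify the performance measure used.

```python
from typing import Dict, List


def select_endpoints(single_scores: Dict[str, float],
                     joint_scores: Dict[str, float],
                     random_level: float = 0.5) -> List[str]:
    """Keep endpoints that beat random guessing and gain from joint prediction.

    single_scores: cross-validated score of each endpoint predicted alone.
    joint_scores: score of the same endpoint under multi-label prediction.
    An endpoint is kept only if its joint score exceeds both the random
    level and its single-endpoint score.
    """
    kept = []
    for endpoint in sorted(joint_scores):
        joint = joint_scores[endpoint]
        single = single_scores.get(endpoint, random_level)
        if joint > random_level and joint > single:
            kept.append(endpoint)
    return kept


if __name__ == "__main__":
    # Hypothetical leave-one-out scores for three in vivo endpoints.
    single = {"liver": 0.62, "kidney": 0.55, "spleen": 0.48}
    joint = {"liver": 0.70, "kidney": 0.53, "spleen": 0.45}
    print(select_endpoints(single, joint))
```

In this toy example only the first endpoint survives: the second is better than random but does not gain from the joint model, and the third is worse than random either way.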