
Molecular descriptors and chemometrics: a powerful combined tool for pharmaceutical, toxicological and environmental problems.

Roberto Todeschini
Milano Chemometrics and QSAR Research Group - Dept. of Environmental Sciences, University of Milano-Bicocca, P.za della Scienza Milano (Italy)

The concept of molecular structure is one of the most important concepts in the development of the scientific knowledge of the 20th century. Indeed, reasoning based on molecular structure has been the main engine for the great development of physical chemistry, molecular physics, organic chemistry, quantum chemistry, chemical synthesis, polymer chemistry, medicinal chemistry, etc.

By definition, a system is complex when its behaviour as a whole is not derivable from the properties of its parts: a molecule, together with its embedded concept of molecular structure, exactly fulfils this condition. In fact, the properties of a molecule depend not only on the properties of its component atoms but also on their mutual connections: a molecule is in principle a holistic system, i.e. its emergent properties cannot be derived as the sum of the properties of its parts, but are inherent to the organisation and stability of the whole molecule.

As a consequence of this complexity, molecular structure cannot be represented by a unique formal model; several molecular representations can describe the same molecule, depending on the level of the underlying theoretical approach, and these representations are often not derivable from each other.

Figure 1

Different molecule representations have been proposed, such as the 3-dimensional Euclidean representation, 2-dimensional representations based on graph theory, or vectorial representations (fingerprints) in which the frequencies of several molecular fragments are stored. Each representation constitutes a different conceptual model of the molecule, and each model makes different sources of chemical information available. The molecule, thought of as a real object, implicitly contains all the chemical information, but only a part of this information can be extracted by experimental measurements. Molecular descriptors are numbers able to extract small pieces of chemical information from the different molecule representations (Figure 1).

The role of the molecular descriptors

In the last decades, much scientific research has focused on how to capture and convert, by a theoretical pathway, the information encoded in the molecular structure into one or more numbers, called molecular descriptors, which are used to establish quantitative relationships between structures and properties, biological activities and other experimental properties.

Figure 2

Molecular descriptors therefore now play a key role in scientific research (Figure 2). They are derived from several different theories, such as quantum chemistry, information theory, organic chemistry, graph theory, etc., and are applied in modelling several different properties in fields such as toxicology, analytical chemistry, physical chemistry, medicinal and pharmaceutical chemistry, environmental studies and regulatory tools.

Evidence of the interest of the scientific community in molecular descriptors is provided by the huge number of descriptors proposed to date: more than 2000 descriptors [1] are currently defined and computable with dedicated software tools. Each molecular descriptor takes into account a small part of the whole chemical information contained in the real molecule and, as a consequence, the number of descriptors keeps growing as the complexity of the investigated chemical systems increases.

Molecular descriptors have by now become some of the most important variables used in molecular modelling and, as a consequence, they have a strong relationship with statistics, chemometrics and chemoinformatics, the fields in which methods for data elucidation, data mining and modelling are developed. In particular, over about the last 30 years chemometrics has developed several classification and regression methods able to provide, although not always, reliable models, both in reproducing the known experimental data and in predicting unknown data. In fact, the modelling process usually has not only explanatory purposes but, particularly in recent years, predictive purposes, i.e. it aims at developing models with reliable predictive qualities.

Figure 3

It should be noted that the use of molecular descriptors brought about a big change in the scientific paradigm. While until 30 years ago molecular modelling mainly consisted in searching for mathematical relationships between experimentally measured quantities, it is now mainly performed by modelling a measured property through molecular descriptors able to capture structural chemical information (Figure 3).
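As a small illustration of how a descriptor turns a molecular representation into a number, the sketch below computes a simple topological descriptor from a 2-dimensional graph representation. The choice of the Wiener index (the sum of the shortest-path distances, counted in bonds, between all pairs of non-hydrogen atoms) is an assumption made here for illustration; the text above does not single out any particular descriptor, and the function name and the dictionary encoding of the graphs are hypothetical.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path lengths (in bonds) over all pairs of atoms.

    `adjacency` maps each atom label to the list of atoms bonded to it
    (hydrogen-depleted molecular graph). This encoding is an assumption
    of this sketch, not a format taken from the article.
    """
    total = 0
    for start in adjacency:
        # Breadth-first search gives topological distances from `start`.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbour in adjacency[atom]:
                if neighbour not in distance:
                    distance[neighbour] = distance[atom] + 1
                    queue.append(neighbour)
        total += sum(distance.values())
    return total // 2  # every pair of atoms was counted twice


# Carbon skeletons of two isomers with the same atoms but different connections.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

w_linear = wiener_index(n_butane)
w_branched = wiener_index(isobutane)
```

The two isomers contain the same atoms, yet the descriptor takes different values for them, which reflects the point made earlier that molecular properties depend on the mutual connections of the atoms and not only on the atoms themselves.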

To explain the complex relationships between molecules and observed quantities, two main streams were developed: the first concerns the search for relationships between molecular structures and physico-chemical properties (QSPR, Quantitative Structure-Property Relationships), the second between molecular structures and biological activities (QSAR, Quantitative Structure-Activity Relationships). The successes achieved with these approaches have encouraged the scientific community to apply them in other fields, and relationships between molecular descriptors and environmental, toxicological, and technological properties are now widely investigated (Table 1).

Table 1

Can mathematical models replace experiments?

Several people, and I am among them, foresee that in the future several quantities will be obtained by using predictive mathematical models, avoiding heavy and expensive experimental measurements. This extreme idea is supported by the successes of well-established strategies for building regression and classification models, and by the capability of the developed models to suggest new active molecules (drug design) and new molecules with the required technological properties, to build priority lists of molecules for toxicological risk, to help in understanding the modes of action of molecules, and to evaluate environmental risks.

In more detail, chemometric models are empirical mathematical relationships (functions f) between a dependent experimental property (a measure y or membership of a predefined class c) and some independent variables which are related to and relevant for the studied property:

$y = f(x_1, x_2, \ldots, x_p)$ or $c = f(x_1, x_2, \ldots, x_p)$

In this context, the independent variables x are molecular descriptors, and the models are built to reproduce the known experimental responses to the greatest possible extent by searching for relationships with these theoretical variables (Figure 4, step 1). The validation tools developed in the last 20 years make it possible not only to evaluate the degree of agreement between the calculated and experimental responses, but also to estimate the future capability of the model to predict unknown responses.
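The two-step scheme can be sketched in a few lines, under strong simplifying assumptions: a single descriptor x, a linear function f fitted by ordinary least squares, and leave-one-out cross-validation as the validation tool. None of these choices is prescribed by the text, all function names are hypothetical, and the numbers are invented toy values, not measurements. R2 measures the agreement between calculated and experimental responses; Q2, computed here as 1 - PRESS/TSS (one common definition), estimates the ability to predict responses left out of the fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


def total_sum_of_squares(ys):
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r_squared(xs, ys):
    """Fitting quality: the model is built and evaluated on the same data."""
    model = fit_line(xs, ys)
    rss = sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - rss / total_sum_of_squares(ys)


def q_squared_loo(xs, ys):
    """Leave-one-out estimate of predictive quality (1 - PRESS / TSS)."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without molecule i, then predict the response that was left out.
        model = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - predict(model, xs[i])) ** 2
    return 1.0 - press / total_sum_of_squares(ys)


# Invented toy data (NOT experimental values): one descriptor, one response.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
response = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

model = fit_line(descriptor, response)        # step 1: build the model
r2 = r_squared(descriptor, response)          # agreement with known data
q2 = q_squared_loo(descriptor, response)      # estimated predictive quality
new_prediction = predict(model, 7.0)          # step 2: predict a new molecule
```

For a least-squares model Q2 cannot exceed R2, so a large gap between the two is a warning that the model reproduces the known data better than it is likely to predict new ones.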

If the predictive quality of the model is considered satisfactory, the model can be used for real future predictions of the modelled property (Figure 4, step 2), resulting in a significant reduction in costs and animal testing. In fact, the cost of using the model as a predictive tool (second step) is almost zero (apart from the cost of the operator's time and the electricity for the PC)!

Figure 4

Conclusions

We can say that chemical research based on chemometric modelling and molecular descriptors is by now well established, and further relevant results are expected. These expectations are well founded on two basic principles: 1) biological activities, as well as physico-chemical and chemical properties of organic compounds, are related to the molecular structure, and 2) similar compounds behave in a similar way.

Reality is unique and, certainly, models are an approximation of reality; however, different models represent different perspectives on the same reality, and some of them might be useful. The fact that reality is represented by more than one model is often considered a weakness of the theory, due to our still limited knowledge: the scientific goal should be to reach a unique accepted model. This philosophical position is, in my opinion, mistaken: since a model is only one possible interpretation of reality, the availability of several models simply reflects the complexity of the studied systems, where each model is able to capture a part of the whole information, i.e. it is a point of view. In summary, it is better to look at a beautiful landscape through different windows.

References

1. R. Todeschini and V. Consonni: Handbook of Molecular Descriptors, WILEY-VCH,
