Knowledge Modeling and its Application in Life Sciences: A Tale of two ontologies Bioinformatics for Glycan Expression Integrated Technology Resource for.



6
GlycO ontology
Challenge: model hundreds of thousands of complex carbohydrate entities.
However, the differences between the entities are small (e.g. just one component).
How can all the concepts be modeled while precluding redundancy and ensuring maintainability and scalability?

11
Compatibility with existing biomedical ontologies
Top-level classes are modeled according to the Basic Formal Ontology (BFO) approach.
A taxonomy of relationships and multiple restrictions per class improve accuracy.
Hence, both GlycO and ProPreO are compatible with ontologies that follow the BFO approach.
Alignment with ontologies listed at Open Biomedical Ontologies (OBO) is being explored.

13
GlycO population
Multiple data sources are used in populating the ontology:
o KEGG (Kyoto Encyclopedia of Genes and Genomes)
o SWEETDB
o CARBANK Database
Each data source has a different schema for storing data.
There is significant overlap of instances across the data sources.
Hence, entity disambiguation and a common representational format are needed.
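
The disambiguation step described on this slide can be sketched roughly as follows. This is a minimal illustration only, not the Web process used to populate GlycO: the record layout, the source IDs, the structure strings and the normalization rule in canonical_key are all hypothetical and stand in for whatever common representational format the authors actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlycanRecord:
    source: str      # name of the originating database
    source_id: str   # hypothetical identifier within that database
    structure: str   # hypothetical linear structure string


def canonical_key(structure: str) -> str:
    """Toy normalization: drop whitespace and case so that equivalent
    spellings of the same structure compare equal."""
    return "".join(structure.split()).lower()


def merge_sources(records):
    """Group records from all sources by their canonical structure key."""
    merged = {}
    for rec in records:
        key = canonical_key(rec.structure)
        merged.setdefault(key, []).append((rec.source, rec.source_id))
    return merged


# Invented example records; the IDs and structures are not real database entries.
records = [
    GlycanRecord("KEGG", "k-1", "GlcNAc(b1-4)GlcNAc"),
    GlycanRecord("SWEETDB", "s-7", "glcnac(b1-4) glcnac"),
    GlycanRecord("CARBANK", "c-3", "Man(a1-3)Man"),
]
merged = merge_sources(records)
```

Here the KEGG and SWEETDB records collapse into one entity because their structure strings normalize to the same key, while the CARBANK record stays separate. A real pipeline would need a structure-aware comparison rather than string normalization.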

21
Biological pathways
Pathways do not need to be explicitly defined in GlycO.
The residue, glycan, enzyme and reaction descriptions contain all the knowledge necessary to infer pathways.
Glycan structure and function

22
Zooming in a little…
The N-glycan with KEGG ID 00015 is the substrate of reaction R05987, which is catalyzed by an enzyme of the class EC 2.4.1.145.
The product of this reaction is the glycan with KEGG ID 00020.
Figure labels: Reaction R05987, catalyzed by enzyme 2.4.1.145, adds_glycosyl_residue N-glycan_b-D-GlcpNAc_13
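
One way to see how a pathway can be inferred from reaction descriptions alone is to treat the statements above as subject-predicate-object triples and chain substrates to products. The sketch below is an assumption-laden illustration, not GlycO's reasoning mechanism: the predicate names is_substrate_of, catalyzed_by and has_product are invented for this example, and the second reaction (RX, producing GX) is hypothetical, added only so that there is something to chain.

```python
from collections import defaultdict

# (subject, predicate, object) statements. The first three restate the slide;
# the predicate names are assumptions, not GlycO property names.
triples = [
    ("glycan:00015", "is_substrate_of", "reaction:R05987"),
    ("reaction:R05987", "catalyzed_by", "enzyme:EC2.4.1.145"),
    ("reaction:R05987", "has_product", "glycan:00020"),
    # Hypothetical follow-on reaction, present only to show chaining.
    ("glycan:00020", "is_substrate_of", "reaction:RX"),
    ("reaction:RX", "has_product", "glycan:GX"),
]


def infer_pathway(triples, start):
    """Follow substrate -> reaction -> product links starting at `start`."""
    substrate_of = defaultdict(list)
    product_of = {}
    for subj, pred, obj in triples:
        if pred == "is_substrate_of":
            substrate_of[subj].append(obj)
        elif pred == "has_product":
            product_of[subj] = obj

    path = [start]
    seen = {start}
    current = start
    while substrate_of.get(current):
        reaction = substrate_of[current][0]  # simplification: first reaction only
        product = product_of.get(reaction)
        if product is None or product in seen:
            break  # dead end or cycle
        path += [reaction, product]
        seen.add(product)
        current = product
    return path


pathway = infer_pathway(triples, "glycan:00015")
```

No pathway object is stored anywhere; the sequence of glycans and reactions falls out of the individual reaction descriptions, which is the point the slides make.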

24
Overview - integrated semantic information system
Formalized domain knowledge is in ontologies.
Data is annotated using concepts from the ontologies.
Semantic annotations enable identification and extraction of relevant information.
Relationships allow discovery of knowledge that is implicit in the data.

26
Conclusions
GlycO uses simple canonical entities to build complex structures, and thereby avoids redundancy and ensures maintainability and scalability.
ProPreO is the first comprehensive ontology for data and process provenance in glycoproteomics.
A Web process for entity disambiguation and a common representational format populated the ontology from disparate data sources.
The two ontologies are among the largest populated ontologies in life sciences.
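
The first conclusion, building complex structures from a small set of canonical entities, can be illustrated with a toy data structure. This is only a sketch of the idea, assuming a simple tree model: GlycO itself is an ontology, not Python classes, and the class names, the residue name b-D-Manp and the two example glycans are illustrative choices rather than entries from the ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Residue:
    """A canonical building block, defined once and reused everywhere."""
    name: str


@dataclass(frozen=True)
class Node:
    """A position in a glycan tree that points at a canonical residue."""
    residue: Residue
    children: tuple = ()


# Canonical residues (names are illustrative).
GLCNAC = Residue("b-D-GlcpNAc")
MAN = Residue("b-D-Manp")


def distinct_residues(*glycans):
    """Collect the canonical residues used across any number of glycan trees."""
    found = set()
    stack = list(glycans)
    while stack:
        node = stack.pop()
        found.add(node.residue)
        stack.extend(node.children)
    return found


# Two glycans that differ by a single added residue.
core = Node(GLCNAC, (Node(GLCNAC, (Node(MAN),)),))
extended = Node(GLCNAC, (Node(GLCNAC, (Node(MAN, (Node(GLCNAC),)),)),))
shared = distinct_residues(core, extended)
```

Both glycans draw on the same two residue definitions, so adding a structure that differs by one component adds one node rather than a new set of concepts.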