TY - CONF
T1 - Ontology Driven Semantic Provenance for Heterogeneous Bionomics Experimental Data
Y1 - 2008
A1 - Michael Raymer
A1 - Satya S. Sahoo
A1 - Cory Henson
A1 - William York
A1 - Amit Sheth
AB - Scientific experimental data generated by all the bionomic technologies is characterized by heterogeneity in its representation formats, constituents, and generation processes and, therefore, also in its usage. Using the proteomics domain, we demonstrate the important role of provenance information in managing, interpreting, and analyzing experimental data. We present a novel approach that employs an ontology as a knowledge model to automatically create semantic provenance information for high-throughput mass spectrometry (MS) data in the glycoproteomics domain. The Semantic Provenance Annotation of Data in protEomics (SPADE) implementation is based on the ProPreO ontology, a large process ontology (~500 classes, 40 named relationships with 170 class-level restrictions, and 3.1 million instances) that models the complete experimental protocol for MS-based glycoproteomics data analysis. The semantic provenance information created in SPADE enables biologists to query over it and retrieve exact data using expressive 'train-of-thought' queries in the SPARQL query language. We also discuss our current work in extending the ProPreO ontology to support toxicological metabolomics experimentation using Nuclear Magnetic Resonance (NMR) spectroscopy. Our strategic goal is for pattern recognition and data mining algorithms to use semantic provenance information for comparative or correlation analysis of Liquid Chromatography MS (LCMS) and NMR spectroscopy experimental data as part of toxicological metabolomics studies.
ER -
TY - CHAP
T1 - Semantic Biological Web Services Registry
Y1 - 2007
A1 - Amit Sheth
A1 - Satya S. Sahoo
A1 - B. Hunter
A1 - William York
AB - There are now more than a thousand Web Services [22] offering access to disparate biological resources, namely data and computational tools. It is extremely difficult for biological researchers to search a Web Services (WS) registry for a relevant WS using the standard (primarily computational) descriptions used to describe it. Semantic Biological Web Services Registry (SemBOWSER) is an ontology-based implementation of the UDDI specification, which enables, at present, glycoproteomics researchers to publish, search and discover WS using semantic, service-level, descriptive domain keywords. SemBOWSER classifies a WS along two dimensions: the task it implements and the domain it is associated with. Each published WS is associated with the relevant ProPreO (comprehensive process ontology for glycoproteomics experimental lifecycle) ontology-based keywords (implemented as part of the registry). A researcher, in turn, can search for relevant WS using only the descriptive keywords, part of their everyday working lexicon. This intuitive search is underpinned by the ProPreO ontology, thereby making use of the inherent advantages of a semantic search, as compared to a purely syntactic search, namely disambiguation and use of named relationships between concepts. SemBOWSER is part of the glycoproteomics web portal 'Stargate'.
ER -
TY - CHAP
T1 - SemBOWSER - Semantic Biological Web Services Registry
Y1 - 2007
A1 - William York
A1 - Satya S. Sahoo
A1 - Amit Sheth
A1 - B. Hunter
AB - There are now more than a thousand Web Services [22] offering access to disparate biological resources, namely data and computational tools. It is extremely difficult for biological researchers to search a Web Services (WS) registry for a relevant WS using the standard (primarily computational) descriptions used to describe it. Semantic Biological Web Services Registry (SemBOWSER) is an ontology-based implementation of the UDDI specification, which enables, at present, glycoproteomics researchers to publish, search and discover WS using semantic, service-level, descriptive domain keywords. SemBOWSER classifies a WS along two dimensions: the task it implements and the domain it is associated with. Each published WS is associated with the relevant ProPreO (comprehensive process ontology for glycoproteomics experimental lifecycle) ontology-based keywords (implemented as part of the registry). A researcher, in turn, can search for relevant WS using only the descriptive keywords, part of their everyday working lexicon. This intuitive search is underpinned by the ProPreO ontology, thereby making use of the inherent advantages of a semantic search, as compared to a purely syntactic search, namely disambiguation and use of named relationships between concepts. SemBOWSER is part of the glycoproteomics web portal 'Stargate'.
ER -
TY - CONF
T1 - Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies
T2 - 15th International World Wide Web Conference (WWW2006)
Y1 - 2006
A1 - Christopher Thomas
A1 - Samir Tartir
A1 - Satya S. Sahoo
A1 - Amit Sheth
A1 - William York
AB - High-throughput glycoproteomics, similar to genomics and proteomics, involves extremely large volumes of distributed, heterogeneous data as a basis for identification and quantification of a structurally diverse collection of biomolecules. The ability to share, compare, query for, and, most critically, correlate datasets using the native biological relationships is among the challenges faced by glycobiology researchers. As a solution for these challenges, we are building a semantic structure, using a suite of ontologies, which supports management of data and information at each step of the experimental lifecycle. This framework will enable researchers to leverage the large scale of glycoproteomics data to their benefit. In this paper, we focus on the design of these biological ontology schemas with an emphasis on relationships between biological concepts, on the use of novel approaches to populate these complex ontologies, including integrating extremely large datasets (~500MB) as part of the instance base, and on the evaluation of ontologies using OntoQA [38] metrics. The application of these ontologies in providing informatics solutions for the high-throughput glycoproteomics experimental domain is also discussed. We present our experience of developing two ontologies in one domain as a use case, one of a set of use cases used in the development of an emergent framework for building and deploying biological ontologies.
JA - 15th International World Wide Web Conference (WWW2006)
CY - Edinburgh, Scotland
ER -
TY - CONF
T1 - Modular Ontology Design Using Canonical Building Blocks in the Biochemistry Domain
T2 - Modular Ontology Design Using Canonical Building Blocks in the Biochemistry Domain
Y1 - 2006
A1 - Amit Sheth
A1 - William York
A1 - Christopher Thomas
AB - The field of BioInformatics has become a major venue for the development and application of computational ontologies. Ranging from controlled vocabularies to annotation of experimental data to reasoning tasks, BioOntologies are advancing to form a comprehensive knowledge foundation in this field. With the Glycomics Ontology (GlycO), we are aiming at providing both a sufficiently large knowledge base and a schema that allows classification of and reasoning about the concepts we expect to encounter in the glycoproteomics field. The schema exploits the expressiveness of OWL-DL to place restrictions on relationships, thus making it suitable to be used as a means to classify new instance data. On the instance level, the knowledge is modularized to address granularity issues regularly found in ontology design. Larger structures are semantically composed from smaller canonical building blocks. The information needed to populate the knowledge base is automatically extracted from several partially overlapping sources. In order to avoid multiple entries, transformation and disambiguation techniques are applied. An intelligent search is then used to identify the individual building blocks that model the larger chemical structures. To ensure ontological soundness, GlycO has been annotated with OntoClean properties and evaluated with respect to those. In order to facilitate its use in conjunction with other biomedical Ontologies, GlycO has been checked for NCBO compliance and has been submitted to the OBO website.
JA - Modular Ontology Design Using Canonical Building Blocks in the Biochemistry Domain
CY - Amsterdam
ER -
TY - CHAP
T1 - SemBOWSER - Semantic Biological Web Services Registry
Y1 - 2006
A1 - William York
A1 - Amit Sheth
A1 - B. Hunter
A1 - Satya S. Sahoo
KW - biomedical glycomics
KW - ProPreO ontology
KW - Semantic Web Services
KW - SemBOWSER registry
KW - service-level semantic annotation
KW - Web services registry
KW - WSDL-S
AB - There are now more than a thousand Web Services [22] offering access to disparate biological resources, namely data and computational tools. It is extremely difficult for biological researchers to search a Web Services (WS) registry for a relevant WS using the standard (primarily computational) descriptions used to describe it. Semantic Biological Web Services Registry (SemBOWSER) is an ontology-based implementation of the UDDI specification, which enables, at present, glycoproteomics researchers to publish, search and discover WS using semantic, service-level, descriptive domain keywords. SemBOWSER classifies a WS along two dimensions: the task it implements and the domain it is associated with. Each published WS is associated with the relevant ProPreO (comprehensive process ontology for glycoproteomics experimental lifecycle) ontology-based keywords (implemented as part of the registry). A researcher, in turn, can search for relevant WS using only the descriptive keywords, part of their everyday working lexicon. This intuitive search is underpinned by the ProPreO ontology, thereby making use of the inherent advantages of a semantic search, as compared to a purely syntactic search, namely disambiguation and use of named relationships between concepts. SemBOWSER is part of the glycoproteomics web portal 'Stargate'.
ER -
TY - JOUR
T1 - GLYDE-an expressive XML standard for the representation of glycan structure
JF - Carbohydr Res
Y1 - 2005
A1 - William York
A1 - Amit Sheth
A1 - Cory Henson
A1 - Christopher Thomas
A1 - Satya S. Sahoo
KW - GLYcan Data Exchange (GLYDE)
KW - Glycan data interoperability
KW - Glycoinformatics
KW - XML-based glycan representation
AB - The amount of glycomics data being generated is rapidly increasing as a result of improvements in analytical and computational methods. Correlation and analysis of this large, distributed data set requires an extensible and flexible representational standard that is also 'understood' by a wide range of software applications. An XML-based data representation standard that faithfully captures essential structural details of a glycan moiety, along with additional information (such as data provenance) to aid the interpretation and usage of glycan data, will facilitate the exchange of glycomics data across the scientific community. To meet this need, we introduce the GLYcan Data Exchange (GLYDE) standard as an XML-based representation format to enable interoperability and exchange of glycomics data. An online tool () for the conversion of other representations to GLYDE format has been developed.
ER -
TY - CONF
T1 - Semantic Web Services for N-glycosylation Process
T2 - International Symposium on Web Services for Computational Biology and Bioinformatics
Y1 - 2005
A1 - William York
A1 - Satya S. Sahoo
A1 - Amit Sheth
A1 - John Miller
AB - Glycomics is one of the many research efforts currently underway in the biosciences domain, which is characterized by high-throughput data generated at multiple experimental stages. For example, analysis of N-glycosylation encompasses stages from cell culture to peptide identification and quantification. Research groups across the world use diverse cell cultures, separation and spectroscopic techniques, and data identification, correlation and integration methodologies. Thus, data generated at different phases of the process by multiple groups are both structurally and functionally heterogeneous. Automatic semantic annotation of such experimental data with concepts defined in a domain ontology can provide detailed process information, including the experimental environment that is critical for the comparison and analysis of such data, thus increasing the opportunity for rapid knowledge discovery. Semantic annotation of scientific data not only allows standard interpretation but, by taking advantage of the rich relationships among concepts in the ontology, also makes it possible to derive mappings, inferences and correlations that may be too obscure for analysis and discovery by humans. Web service technology is a natural enabler for synergistic usage of computational tools developed at different labs using heterogeneous data.
JA - International Symposium on Web Services for Computational Biology and Bioinformatics
CY - Blacksburg, VA
ER -
TY - JOUR
T1 - Simple modification of a protein database for mass spectral identification of N-linked glycopeptides
Y1 - 2005
A1 - James Atwood
A1 - Satya Sahoo
A1 - Gerardo Alvarez-Manilla
A1 - D. Brent Weatherly
A1 - Kumar Kolli
A1 - Ron Orlando
A1 - William York
AB - We describe an algorithm which modifies a protein database such that during a database search deamidation is limited to asparagines strictly contained within the N-glycosylation consensus sequence. The modified database was evaluated using a dataset created from the shotgun proteomic analysis of N-linked glycopeptides from human blood serum. We demonstrate that the application of the modified database eliminates incorrect glycopeptide assignments, reduces the peptide false-discovery rate, and eliminates the need for manual validation of glycopeptide identifications.
ER -
TY - CONF
T1 - Semantic Integration of Glycomics Data and Information
T2 - Human Disease Glycomics/Proteome Initiative 1st Workshop 2004: Functional Glycomics in Disease
Y1 - 2004
A1 - Krzysztof Kochut
A1 - Meenakshi Nagarajan
A1 - Karthik Gomadam
A1 - Christopher Thomas
A1 - William York
A1 - X. Yi
A1 - Amit Sheth
A1 - John Miller
JA - Human Disease Glycomics/Proteome Initiative 1st Workshop 2004: Functional Glycomics in Disease
PB - Human Disease Glycomics/Proteome Initiative 1st Workshop 2004: Functional Glycomics in Disease
CY - Osaka, Japan
ER -
TY - CONF
T1 - Semantic Web technology in support of Bioinformatics for Glycan Expression
T2 - W3C Workshop on Semantic Web for Life Sciences
Y1 - 2004
A1 - Meenakshi Nagarajan
A1 - Amit Sheth
A1 - X. Yi
A1 - William York
A1 - Christopher Thomas
A1 - Satya S. Sahoo
A1 - Krzysztof Kochut
A1 - John Miller
JA - W3C Workshop on Semantic Web for Life Sciences
PB - W3C Workshop on Semantic Web for Life Sciences
CY - Cambridge, MA
ER -