Albert Weichselbraun (Vienna University of Economics and Business, Austria), Gerhard Wohlgenannt (Vienna University of Economics and Business, Austria) and Arno Scharl (MODUL University Vienna, Austria)

Abstract

By providing interoperability and shared meaning across actors and domains, lightweight domain ontologies are a cornerstone technology of the Semantic Web. This chapter investigates evidence sources for ontology learning and describes a generic, extensible approach that combines them to extract domain concepts, identify relations between the ontology’s concepts, and automatically detect relation labels. An implementation illustrates the presented ontology learning and relation labeling framework and serves as the basis for discussing possible pitfalls in ontology learning. Three use cases then demonstrate the usefulness of the framework and its application to real-world problems.

1 Introduction

Ontologies, which are commonly defined as explicit specifications of shared conceptualizations (Gruber, 1995), provide reusable domain models that support applications in knowledge engineering, natural language processing, e-commerce, intelligent information integration, bioinformatics, and other areas. Not all ontologies share the same degree of formal explicitness (Corcho, 2006), nor do they include all the components that a formal language can express, such as concept taxonomies and various types of formal axioms. Ontology research therefore distinguishes between lightweight and heavyweight ontologies (Studer et al., 1998). Manually creating such conceptualizations for non-trivial domains is an expensive and cumbersome task that requires highly specialized human effort (Cimiano, 2006). Furthermore, as domains evolve, their ontologies require constant refinement to remain useful.

Automated approaches to learning ontologies from existing data are intended to improve the productivity of ontology engineers. Buitelaar et al. (2005) organize the tasks of ontology learning into a set of layers. Ontology learning from text requires lexical entries that link single words or phrases to concepts. Synonym extraction connects similar terms to the same concept. Taxonomies provide the ontology’s backbone, while non-taxonomic relations supply arbitrary links between concepts. Finally, axioms are defined or acquired to derive additional facts.
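To make these layers more concrete, the following minimal Python sketch stores the output of each layer in one container: a lexicon mapping terms (including synonyms) to concepts, is-a pairs for the taxonomy, labeled triples for non-taxonomic relations, and a single axiom, transitivity of is-a, that derives additional facts. The class `LightweightOntology`, its methods, and the example concepts are hypothetical illustrations, not part of the framework described in this chapter.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LightweightOntology:
    """Hypothetical toy container with one field per ontology learning layer."""
    # Lexical entries and synonyms: each term maps to the concept it denotes.
    lexicon: dict[str, str] = field(default_factory=dict)
    # Taxonomy (backbone): (sub-concept, super-concept) pairs.
    is_a: set[tuple[str, str]] = field(default_factory=set)
    # Non-taxonomic relations: (concept, label, concept) triples.
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def add_term(self, term: str, concept: str) -> None:
        # Several synonymous terms may point to the same concept.
        self.lexicon[term.lower()] = concept

    def infer_is_a(self) -> set[tuple[str, str]]:
        """Apply one axiom (transitivity of is-a) and return the closure.

        The stored taxonomy is not modified; derived pairs are returned.
        """
        closure = set(self.is_a)
        changed = True
        while changed:
            changed = False
            for a, b in list(closure):
                for c, d in list(closure):
                    if b == c and (a, d) not in closure:
                        closure.add((a, d))
                        changed = True
        return closure


# Usage: two synonyms, a two-level taxonomy, and one labeled relation.
onto = LightweightOntology()
onto.add_term("car", "Automobile")
onto.add_term("automobile", "Automobile")
onto.is_a |= {("Automobile", "Vehicle"), ("Vehicle", "Artifact")}
onto.relations.add(("Driver", "drives", "Automobile"))
print(("Automobile", "Artifact") in onto.infer_is_a())  # True
```

The derived pair (Automobile, Artifact) is never stated explicitly; it follows from the transitivity axiom, which is the role the text assigns to the axiom layer.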

Data sources for ontology learning typically include unstructured, semi-structured, and structured data (Cimiano, 2006). Ontology learning from structured data draws on information sources such as database schemas or existing ontologies. This process is also called lifting, as it lifts or maps parts of existing schemas to new logical definitions. Since most available data are unstructured or semi-structured, a major research focus over the last two decades has been the extraction of domain models from natural language text using a variety of methods. Cimiano (2006) presented an extensive overview of methods for ontology learning from unstructured data. Many of these methods rely on corpus statistics, such as association rule mining (Maedche et al., 2002), co-occurrence analysis for term clustering (Wong et al., 2007), latent semantic analysis for detecting synonyms and concepts (Landauer & Dumais, 1997), and kernel methods for classifying semantic relations (Giuliano et al., 2007). Many corpus-based approaches build on Harris’ distributional hypothesis (Harris, 1968), which states that terms or words are similar to the extent that they occur in syntactically similar contexts. Besides corpus statistics, researchers also apply linguistic parsing and linguistic patterns to ontology learning. Building on the seminal work of Hearst (1992), patterns support taxonomy extraction (Liu et al., 2005), the detection of concepts and labeled relations in combination with Web statistics (Sánchez-Alonso & García, 2006), and Web-scale extraction of unnamed relations (Etzioni et al., 2008).
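The distributional hypothesis can be illustrated with a small sketch. The Python code below is a simplification: it assumes that a term's context is the set of words within a fixed window of tokens (surface co-occurrence), rather than the syntactic contexts Harris refers to. The functions `context_vectors` and `cosine` and the toy corpus are hypothetical; terms whose context vectors point in similar directions are treated as similar.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


def context_vectors(sentences: list[list[str]], window: int = 2) -> dict[str, Counter]:
    """For every token, count the words occurring within +/- `window` positions."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[word] for word, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


# Usage: "car" and "truck" occur in the same contexts, "soup" does not.
corpus = [
    "the driver parked the car near the station".split(),
    "the driver parked the truck near the depot".split(),
    "the chef cooked the soup in the kitchen".split(),
]
vecs = context_vectors(corpus)
print(cosine(vecs["car"], vecs["truck"]) > cosine(vecs["car"], vecs["soup"]))  # True
```

In this toy corpus "car" and "truck" have identical context vectors and therefore score higher than "car" and "soup". Practical systems typically weight raw counts (for example with pointwise mutual information) and rely on much larger corpora.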