Abstract

We are witnessing an increasing popularity of the Web of Data, which exposes a large variety of web sources that provide their data using RDF. Ontological models are used as the schema to organize this data. These models are usually shared by several communities and, to devise them, there is usually an agreement amongst those communities. As a result, it is common to have more than one ontological model to understand some RDF data; therefore, there might be a gap between the ontological models and the RDF data, which is not negligible in practice. In this article, the authors present a technique to automatically discover ontological models from raw RDF data. It is based on the intensive usage of a set of SPARQL 1.1 structural queries that are generic and independent from the RDF data. The final result of the authors' technique is an ontological model that is derived from the RDF data, and includes types and properties, subtypes, domains and ranges of properties and subproperties. The authors have conducted experiments with millions of triples that prove that their technique is suitable to deal with Big RDF Data. As far as they know, this is the first technique to discover such ontological models in the context of RDF data and the Web of Data.

Introduction

In 2001, the Semantic Web movement set out to endow the current Web with metadata and thereby evolve it into a Web of Data that is more accessible to computers (Polleres & Huynh, 2009; Shadbolt et al., 2006). Currently, we are witnessing an increasing popularity of the Web of Data, chiefly in the context of Linked Open Data, which is a successful initiative that consists of a number of principles to publish, connect, and query data on the Web (Bizer et al., 2009a). The consequence of this popularity is the existence of a large variety of web sources, which focus on several domains, such as government, life sciences, geography, media, libraries, or scholarly publications (Heath & Bizer, 2011). Furthermore, these sources offer their data using the RDF language, and they can be queried using the SPARQL query language (Antoniou & van Harmelen, 2008).

Scientists are currently working with the Web of Data as a large database to answer structured queries from users (Polleres & Huynh, 2009). As a result, one of the main challenges scientists are facing in this context is coping with scalability, i.e., processing data at Web scale, which is usually referred to as Big Data (Bizer et al., 2011). Another challenge is not only to implement scalable solutions to deal with this amount of data, but also to deal with the steady growth of sources in the context of the Web of Data. For example, in the domain of Linked Open Data, there were roughly 12 such sources in 2007 and, as of the time of writing this article, there exist 226 sources (LOD Cloud, 2012).

Ontological models are used to provide schema semantics to RDF data. These models comprise types, data properties, and object properties, each of which is identified by a URI (Antoniou & van Harmelen, 2008). Ontological models are shared and developed with the consensus of one or more communities (Rivero et al., 2013b), which define a number of inherent constraints over the models, such as subtypes, the domains and/or ranges of a property, or subproperties.
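The components listed above (types, properties, and candidate domains and ranges) can be illustrated with a minimal in-memory sketch. This is not the authors' technique, which relies on SPARQL 1.1 structural queries; it is a simplified analogue over plain Python tuples, and the function name `discover_model`, the prefixed names such as `ex:Person`, and the sample triples are all hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # assumed prefixed form of the rdf:type predicate


def discover_model(triples):
    """Derive types, properties, and candidate domains/ranges from raw triples.

    Illustrative sketch only: each triple is a (subject, predicate, object)
    tuple of strings, and literals are plain strings with no declared type.
    """
    # First pass: record the declared types of every typed resource.
    types_of = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types_of[s].add(o)

    types = set()
    for declared in types_of.values():
        types |= declared

    # Second pass: for every other predicate, collect the types of the
    # subjects (candidate domains) and of the objects (candidate ranges).
    properties = set()
    domains = defaultdict(set)
    ranges = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        properties.add(p)
        domains[p] |= types_of.get(s, set())
        ranges[p] |= types_of.get(o, set())

    return {
        "types": types,
        "properties": properties,
        "domains": dict(domains),
        "ranges": dict(ranges),
    }


# Hypothetical sample data.
triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:paper1", "rdf:type", "ex:Article"),
    ("ex:alice", "ex:wrote", "ex:paper1"),
    ("ex:alice", "ex:name", "Alice"),
]
model = discover_model(triples)
```

In this sketch, a property whose objects have no declared type (here `ex:name`) ends up with an empty candidate range, which hints that it behaves like a data property rather than an object property.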

In traditional information systems that comprise a back-end database, developers first need to create a data model according to the user requirements, which is later populated. By contrast, in the Web of Data, data can exist without an explicit model, since data on the Web typically existed first and models were added later. Moreover, several models may exist for the same set of data. As a result, in the context of the Web of Data, we cannot usually rely on existing ontological models to understand RDF data since there might be a gap between the models and the data, i.e., the data and the model are usually devised in isolation, without taking each other into account (Glimm et al., 2012). Furthermore, RDF data may not satisfy a particular ontological model related to these data, even though such conformance is mandatory to perform a number of tasks, such as data integration (Makris et al., 2012), data exchange (Rivero et al., 2013c), data warehousing (Glorio et al., 2012), or ontology evolution (Flouris et al., 2008). Finally, current information integration techniques can benefit from the discovery of conceptual models (Rivero et al., 2013a).
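One way to picture the gap between a model and the data is to check whether the subjects that use a property carry the type that the model declares as its domain. The following is a minimal sketch under stated assumptions: it treats a declared domain as a constraint to be checked (under RDFS entailment a domain would instead be used to infer the subject's type), and the function name `domain_violations`, the declared model, and the sample triples are hypothetical.

```python
def domain_violations(triples, declared_domains):
    """Return triples whose subject lacks the domain type declared for its property.

    Illustrative sketch only: `declared_domains` maps a property name to a
    single expected type, and triples are (subject, predicate, object) tuples.
    """
    # Collect the declared types of each subject.
    types_of = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types_of.setdefault(s, set()).add(o)

    # Flag every use of a property by a subject that is not of the expected type.
    violations = []
    for s, p, o in triples:
        expected = declared_domains.get(p)
        if expected is not None and expected not in types_of.get(s, set()):
            violations.append((s, p, o))
    return violations


# Hypothetical model and data devised in isolation from each other.
declared = {"ex:wrote": "ex:Author"}
data = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Author"),
    ("ex:alice", "ex:wrote", "ex:paper1"),
    ("ex:bob", "ex:wrote", "ex:paper2"),
]
gaps = domain_violations(data, declared)
```

Here the data types `ex:alice` as `ex:Person` while the model expects `ex:Author`, so her `ex:wrote` triple is reported as not matching the declared domain.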

To illustrate that this gap between ontological models and RDF data is not negligible in practice, we provide two real-world examples based on current models and data (see Arenas et al. (2014) for an in-depth discussion of this topic). The examples are as follows: