Data Validation with OWL Integrity Constraints


Data Validation with OWL Integrity Constraints (Extended Abstract)

Evren Sirin
Clark & Parsia, LLC, Washington, DC, USA

Abstract. Data validation is an important part of data integration and analysis tasks. The consequences of invalid data range from relatively harmless application failures to serious errors in decision making. The Web Ontology Language (OWL) provides an expressive language that facilitates data integration and analysis tasks. However, the Open World Assumption (OWA) adopted by standard OWL semantics, combined with the absence of the Unique Name Assumption (UNA), makes it difficult to use OWL for data validation. What triggers constraint violations in closed world systems leads to new inferences in standard OWL systems. In this paper, we present an Integrity Constraint (IC) semantics for OWL axioms to address this issue. Ontology modelers can choose which axioms will be interpreted with IC semantics and combine open world reasoning with closed world constraint validation in a flexible way. We also show that IC validation can be reduced to query answering under certain conditions.

1 Introduction

Data integration and analysis are important tasks in many domains and applications. As IT systems move towards a more distributed pattern of implementation and deployment, applications need data to be enriched with more semantics. Richer data semantics enables us to build a unified model over distributed data sources and to perform analysis and reasoning tasks over that model. OWL addresses this need by allowing data semantics to be represented in a formal, logic-based language that is amenable to automated reasoning.

The semantics of OWL addresses distributed knowledge representation scenarios where complete knowledge about the domain cannot be assumed. OWL adopts the Open World Assumption (OWA), so a statement cannot be inferred to be false merely because it cannot be proved. Furthermore, OWL does not adopt the Unique Name Assumption (UNA), which means that two resources with different identifiers might be treated as the same object.

This paper is a summary of earlier publications [8,9].
P. Hitzler and T. Lukasiewicz (Eds.): RR 2010, LNCS 6333. © Springer-Verlag Berlin Heidelberg 2010

The above characteristics of OWL make it difficult to use OWL for data validation in applications where complete knowledge can be assumed for some or all parts of the domain. In such data-centric applications, we would like to use OWL as an expressive schema language to specify the constraints that must be satisfied by instance data. In the literature, the OWA has been identified as the biggest single hurdle to understanding OWL [7]. It is a common misconception among newcomers that axioms in OWL are similar to constraints in relational databases. However, the axioms in an ontology are meant to infer new knowledge rather than to trigger an inconsistency.

In many use cases, we need the ability to combine open world reasoning with closed world constraint validation in a flexible way. It should be possible to use the OWA for the parts of the domain where complete knowledge cannot be assumed and the Closed World Assumption (CWA) for the parts of the domain where we have complete knowledge. In our previous work [9], we presented an alternative semantics for OWL axioms to enable closed world data validation. Ontology developers can choose which axioms will be interpreted with regular OWL semantics and which axioms will be interpreted with IC semantics.

In the rest of this paper, we provide a simple example to illustrate the difficulties in using OWL for data validation, briefly describe our IC semantics proposal, and present an IC validation algorithm we developed for this semantics. Most of the technical details are omitted in this paper and can be found in [9].

2 IC Example

Various types of ICs have been identified in the literature, including subsumption constraints, typing constraints, participation constraints and uniqueness constraints. We will use participation constraints as an example to illustrate the difficulties in using OWL for data validation.
A mandatory participation constraint states that instances of the constrained class should participate in a relation. If we would like to express that every Product instance should be related by the madeby property to a Manufacturer, we can write the following OWL axiom:

$$\mathit{Product} \sqsubseteq \exists \mathit{madeby}.\mathit{Manufacturer} \qquad (1)$$

However, this axiom expresses a general truth about the world; it does not constrain what must exist in a specific ontology or knowledge base. For example, suppose we have an ontology in which an instance of the Product class is defined:

$$\mathit{Product}(\mathit{product1}) \qquad (2)$$

This ontology is not inconsistent according to the semantics of OWL: under the OWA, we can conclude that product1 has a manufacturer, but no knowledge about that manufacturer exists in this ontology. For data validation purposes, we might want to detect or even prevent cases where the manufacturer of a product is not known. However, it is not possible to do so with the above axiom.

It is possible to close the ontology by augmenting it with additional assertions stating that all the relevant information is known. In a preprocessing step, we can check the explicit and the implicit property values for each individual and add explicit cardinality restrictions to assert that there are no more property values. For the above example, we can add the following type assertion:

$$(\leq 0\ \mathit{madeby})(\mathit{product1}) \qquad (3)$$

The combination of (1), (2), and (3) results in an inconsistency under regular OWL semantics. However, there are a couple of problems with the preprocessing approach. First, the preprocessing step can be computationally expensive, especially because we need to take the entailments of a KB into consideration. This problem becomes more significant if the data assertions change frequently. Second, even if we ignore efficiency considerations, the preprocessing solution cannot address other kinds of constraints, such as typing constraints (see the examples discussed in [9]).

3 IC Semantics for OWL

It is apparent, even from the simple example presented above, that OWL ontologies cannot be used in a straightforward way for data validation purposes. To overcome this problem, we started investigating possible solutions. Our goal was to enable the use of OWL to express ICs without needing a different representation language. The approach formalized in [4] describes one such solution, in which the axioms in an OWL ontology are partitioned into two sets. The axioms in one set are interpreted with regular OWL semantics to do inference, and the axioms in the second set are interpreted with a closed-world semantics based on minimal Herbrand models to do validation.
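The preprocessing approach from Section 2 can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure from [9]: it looks at explicitly asserted property values only, whereas a real implementation would also have to consider entailed values (which is what makes the step expensive), and the function names `close_property` and `clashes_with_participation` are hypothetical.

```python
from collections import defaultdict

# Explicit assertions only; entailed facts are deliberately ignored in this sketch.
class_assertions = {("product1", "Product")}  # assertion (2)
property_assertions = set()                   # no madeby value is known for product1


def close_property(individuals, prop, assertions):
    """Return {individual: n}, read as the closure assertion (<= n prop)(individual)."""
    values = defaultdict(set)
    for subj, pred, obj in assertions:
        if pred == prop:
            values[subj].add(obj)
    return {ind: len(values[ind]) for ind in individuals}


def clashes_with_participation(cls, closure, class_assertions):
    """Individuals of `cls` closed with (<= 0 prop); these contradict axiom (1)."""
    return sorted(
        ind for ind, c in class_assertions if c == cls and closure.get(ind) == 0
    )


individuals = {ind for ind, _ in class_assertions}
closure = close_property(individuals, "madeby", property_assertions)
violations = clashes_with_participation("Product", closure, class_assertions)
print(closure)     # {'product1': 0}, i.e. assertion (3)
print(violations)  # ['product1']
```

The closure has to be recomputed whenever the data assertions change, which reflects the first drawback noted in Section 2.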
Even though this approach satisfies many of the conditions mentioned so far, we have identified several issues with its semantics that would yield undesirable results in practice [9].

To overcome the problems we have identified in existing solutions, we defined a new IC semantics [9] for interpreting OWL axioms. The IC semantics we define extends the model theory of OWL 2 or, more precisely, the model theory of the Description Logic SROIQ [2], which is the basis of OWL 2 semantics [5]. The semantic extension we propose has many similarities to epistemic DLs such as ALCK [1], but there are also some differences. First, unlike ALCK, we do not require the epistemic operator K to be used explicitly. The semantics for ICs is defined as if the K operator appeared in front of every class and property. Second, the IC semantics we define is applicable to any SROIQ ontology and is not restricted to ALC expressivity. Third, and most importantly, ALCK semantics adopts a strict UNA, which excludes the possibility of stating that two names identify the same individual. Even though we want to prevent ICs from inferring equality between individuals, we still would like to allow OWL ontologies to include explicit equality between individual names to assist data integration scenarios. In our formalization, we adopt a weak form of UNA in which two different individual names are assumed to denote different individuals unless their equality is required to satisfy the axioms in an ontology.

4 Data Validation Algorithm

We show in [9] that, given a regular OWL ontology and an IC expressed as an OWL axiom, checking whether the IC is violated by the ontology can be reduced to conjunctive query answering under certain conditions. These conditions require, intuitively, that either the ontology does not contain disjunctive (in)equality between individuals or the IC does not include cardinality restrictions. In OWL, only nominals or cardinality restrictions can result in disjunctive (in)equality, so if neither exists in an ontology, we can use the query reduction technique explained next.

ICs can be translated to queries using an approach very similar to the well-known Lloyd-Topor transformation [3]. The queries produced by this transformation contain only distinguished variables and may also include the negation-as-failure operator. The translation algorithm transforms an IC into a query such that the IC is violated by an ontology, w.r.t. the proposed IC semantics, if and only if the ontology entails the query. In other words, whenever the answer set of the query is not empty w.r.t. an ontology, we conclude that the IC is violated by that ontology.

The query translation approach has some practical benefits. It is possible to express the generated queries in the SPARQL [6] query language, which is the most commonly used Semantic Web query language.
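For the participation constraint (1), the translation yields a query of roughly this shape, where `not` is negation as failure: q(x) :- Product(x), not (madeby(x, y), Manufacturer(y)). The Python sketch below evaluates such a query over a small set of facts. It is an illustration under simplifying assumptions rather than the algorithm from [9]: it matches explicitly asserted facts only, whereas the actual approach evaluates query atoms against what the ontology entails. The individuals `product2` and `acme` and the function name `participation_violations` are hypothetical.

```python
def participation_violations(types, relations, cls, prop, range_cls):
    """Answers to q(x) :- cls(x), not (prop(x, y), range_cls(y)).

    A non-empty answer list means the constraint is violated.
    """
    answers = []
    for ind, c in sorted(types):
        if c != cls:
            continue
        # Negation as failure: the inner pattern counts as false
        # when no matching facts can be found.
        has_witness = any(
            s == ind and p == prop and (o, range_cls) in types
            for s, p, o in relations
        )
        if not has_witness:
            answers.append(ind)
    return answers


types = {
    ("product1", "Product"),
    ("product2", "Product"),
    ("acme", "Manufacturer"),
}
relations = {("product2", "madeby", "acme")}

violations = participation_violations(
    types, relations, "Product", "madeby", "Manufacturer"
)
print(violations)  # ['product1']
```

Here product2 satisfies the constraint because a Manufacturer is known for it, while product1 is returned as an answer and therefore signals a violation.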
Even though the standard SPARQL semantics is not compatible with OWL semantics, it allows extended entailment regimes to be used, and a precise definition of OWL-compatible SPARQL semantics is being developed by the W3C's SPARQL Working Group as part of SPARQL. Most existing OWL reasoners support answering SPARQL queries, so the SPARQL queries generated by this translation can be evaluated using off-the-shelf OWL reasoners.

5 Conclusions

The IC semantics we propose addresses many IC use cases, such as participation, typing and uniqueness constraints. By adopting a weak form of UNA, the IC semantics allows explicit equality assertions while avoiding undesirable equality inferences due to uniqueness constraints. We have shown in [9] with several examples that our IC semantics proposal provides more intuitive results than other proposals such as [4]. Our approach allows ontology developers to use OWL both to express axioms for inferencing and to express ICs for data validation.
