
ONTOLOGY-BASED SEMANTIC INTEGRATION OF HETEROGENEOUS
INFORMATION SOURCES
by
Sangsoo Sung
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
May 2008
Copyright 2008 Sangsoo Sung

The main goal of this research is to improve interoperability between different information sources. Ontologies, collections of concepts and their interrelationships, have become synonymous with solutions to many problems that result from computers' inability to understand natural language, and they can capture the semantics of diverse representations in heterogeneous information sources. Ontologies can therefore facilitate the identification of semantic matches between different representations. This dissertation studies the role of ontologies in the semantic matching of structured data and a method of building ontologies for that matching.

One of the critical problems in the federation of information sources is that similar domains have been expressed in different manners. To address this problem, this dissertation presents an ontology-based federation of heterogeneous information sources. We define a simple yet powerful representation model for structuring ontologies, which can extract a canonical representation from a broad range of metadata models, including relational databases, XML, RDF, OWL, and DAML+OIL.

The second major problem in the federation is that similar domains have also been expressed in diverse terminologies by domain experts, who typically have their own interpretations of the domain. To tackle this problem, we incorporate ontologies to identify matches among different terminologies. Because ontologies play a key role in knowledge management, they can facilitate the identification of semantic matches between the different representations. The basic idea in computing semantic similarity is that similar concepts share a more specific common parent.

As many thousands of articles are published daily on the Web, neologisms and domain-specific terms appear over time. Thus, the third major problem in the federation is that the use of out-of-date ontologies may decrease the accuracy of our matching framework. In addition, an ontology learning process that relies on traditional clustering algorithms tends to be slow and computationally expensive when the dataset is as large as the Web. It is therefore essential to maintain ontologies so that they reflect up-to-date knowledge. To address this problem, we present an efficient concept clustering technique for ontology learning that reduces the number of required pairwise term similarity computations without a loss of quality.

This study makes three major contributions. The first is a solution architecture that resolves conflicts in the semantics of existing information sources. The second is a method for automatically creating semantic mappings. The third is a solution architecture that provides a well-founded, rapid ontology learning framework, which reduces the use of the expensive similarity measure by pre-clustering a large dataset. Our approach can be coupled with any type of federating, matching, or clustering method and can be used to make algorithms scale to millions of information sources and documents. This dissertation therefore contributes both to understanding the integration problem across diverse information sources and to developing a matching framework that uses ontologies.
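
The abstract's remark that similar concepts share a more specific common parent can be illustrated with a small sketch. The abstract does not give the dissertation's actual similarity measure, so the code below substitutes a Wu-Palmer-style score purely for illustration. The toy taxonomy and the function names (`PARENT`, `path_to_root`, `lowest_common_parent`, `similarity`) are hypothetical and are not taken from the dissertation.

```python
# Hypothetical toy taxonomy (child -> parent); not from the dissertation.
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "animal": "entity",
    "car": "vehicle",
    "truck": "vehicle",
    "dog": "animal",
}


def path_to_root(concept):
    """Return the list [concept, parent, ..., root]."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth in the taxonomy, counting the root as depth 1."""
    return len(path_to_root(concept))


def lowest_common_parent(a, b):
    """Return the most specific concept that is an ancestor of both a and b."""
    ancestors_of_a = set(path_to_root(a))
    # Walk upward from b; the first shared node is the most specific one.
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("concepts do not share a root")


def similarity(a, b):
    """Wu-Palmer-style score (an assumed stand-in): a deeper, more
    specific common parent yields a higher similarity."""
    common = lowest_common_parent(a, b)
    return 2 * depth(common) / (depth(a) + depth(b))


if __name__ == "__main__":
    print(similarity("car", "truck"))  # common parent: "vehicle"
    print(similarity("car", "dog"))    # common parent: "entity"
```

In this toy hierarchy, "car" and "truck" meet at "vehicle", which is more specific than "entity", where "car" and "dog" meet, so the first pair receives the higher score.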
