Internet Working Group YP. Chen
Internet-Draft H. Xia
Intended status: Informational ZM. Wang
Expires: May 29, 2018 P. Yang
CW. Tang
Shaanxi Key Laboratory of Network Data Intelligent Processing
Xi'an University of Posts and Telecommunications
November 25, 2017
INTERNET-DRAFT
A Unified Description Method for Data Service
draft-chen-ds-description-00
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on May 29, 2018.
Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
Chen Expires May 29, 2018 [Page 1]
Internet-Draft Data Service Unified Description November 2017
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Abstract
The rapid development of the Internet has led more and more
enterprises and individuals to encapsulate operations on key data
entities as what we call data services (DS). Because these
enterprises and individuals work in different fields, their
descriptions of data services are semantically heterogeneous. In
this paper, we propose a more principled approach to the problem of
heterogeneous data services on the Web. We start by pre-processing
the data service description documents. We then propose a unified
description language model for data services, the Unified
Description Language for Data Service (UDL4DS).
Table of Contents
1. Introduction
1.1. Background
2. Conventions Used in This Document
3. Data Service Description
3.1. Data Service Overview
3.2. Data Service Preprocessing
3.2.1. Data Service Acquisition
3.2.2. Feature Word Extraction for Data Service
3.3. Data Service Classification
3.4. Data Service Description Language Design
3.4.1. Semantic Annotation of Data Service
3.5. Data Service Description Model
4. Security Considerations
5. IANA Considerations
6. Conclusions
7. References
7.1. Normative References
7.2. Informative References
8. Acknowledgments
Authors' Addresses
1. Introduction
1.1. Background
With the development of the Internet and cloud computing, data in
many different forms has been generated. Because data services on
the Web use different description standards and technologies, there
is no common data model or access method, which makes it difficult
to share information among heterogeneous data sources. To address
this, a large amount of heterogeneous data is published on the
Internet in the form of services, which provide data to service
users.
In essence, a data service uses network service protocols and
standards such as the Hypertext Transfer Protocol (HTTP), Web
Services Description Language (WSDL), Extensible Markup Language
(XML), Simple Object Access Protocol (SOAP), and Universal
Description, Discovery and Integration (UDDI) to encapsulate
heterogeneous data sources on the Internet, exposing an agent or
access interface through which users obtain data. However, as data
in various fields is continuously encapsulated as services, data
services are becoming more and more common, and the requirements
placed on them keep rising. In the process of publishing and
invoking data services, the following critical problems of data
service description arise:
o  The publishers of data services come from different industries
   or fields, so there are no unified data standards and norms, and
   data service descriptions are semantically heterogeneous.
o  As data services develop and the demands of service consumers
   grow more complex, a single service cannot satisfy these demands
   accurately and quickly. How to integrate data services
   effectively to meet the actual demands of consumers has become
   an urgent problem.
o  Existing methods for classifying and semantically annotating
   data services are not good enough.
In this paper, we propose a data service description language model
named UDL4DS, based on XML Schema. It covers the classification of
data services, the construction of a domain ontology, and semantic
annotation, in order to resolve the semantic heterogeneity between
data services in different fields. In addition, we design XML Schema
descriptions of the key elements of the language model, forming a
common specification for the unified description of data services.
2. Conventions Used in This Document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
In this document, these words will appear with that interpretation
only when in ALL CAPS. Lower case uses of these words are not to be
interpreted as carrying significance described in RFC 2119.
3. Data Service Description
At present, data service descriptions are generally based on the
XML specification and describe the access interface and other
information of a data service. As user needs keep changing, the
description of data services is moving from the syntactic level to
the semantic level, which makes descriptions easier for computers to
interpret and helps provide the best data service more quickly and
intelligently. However, data service providers come from different
industries, each with its own standards for the services it
publishes, so their services can neither be shared nor interoperate
with each other.
In this paper, by classifying data services and addressing the
differences between data services published in different fields, we
propose a unified description language for data services (UDL4DS)
based on the XML Schema specification. We achieve a unified
description in two ways. On the one hand, we propose a new semantic
annotation method for data services, based on a domain ontology
library, to resolve the semantic heterogeneity between data
services. On the other hand, we design a unified description
language model and describe data services in the resulting
description language.
3.1. Data Service Overview
The meaning of "data service" differs considerably between fields.
Manu MR and Richard Manning hold that the data service layer applies
a service-oriented architecture (SOA) and plays an important role in
data integration. Carey M J holds that a data service is a software
service that provides a unified data model and various access
operations on data resources. WS. Zhang holds that a data service is
a Web Service that offers an XML access interface to a database and
returns result sets in XML format. Zhang Peng holds that a data
service only encapsulates the data resources of an information
system: before and after invocation, it does not change the state of
the outside world, and it has no business logic of its own. In this
respect, data services follow the principles of Web
architecture [W3C.REC-webarch-20041215].
A data service directly encapsulates the data of the underlying
data source and opens an access interface for data service
requesters to invoke, which reduces the cost of updating and
maintaining the system. It also makes it easy for users to discover
data in the data source and to access it transparently. As a
result, data services are becoming an increasingly popular way to
encapsulate data.
3.2. Data Service Preprocessing
Data services exist on the Web as XML-based description documents.
A service requester accesses a published data service by calling the
open interface of the data service publisher. However, data service
publishers come from different industries or fields, and their
service descriptions are semantically heterogeneous. As a result,
data service requesters cannot accurately and quickly find the data
service that best satisfies their needs.
To make data services easier to discover and invoke, we preprocess
them. By analyzing the basic information in a data service
description document and extracting the attribute values of its key
tags, we obtain a feature word text that represents the data
service. We then classify data services by their feature word
texts, assign each service to the field it belongs to, and provide
keywords that represent the data service. Figure 1 illustrates the
preprocessing of data services.
(Figure 1: The preprocessing of data service. The diagram is only
partially recoverable; the single surviving box is labeled "Web".)
3.2.1. Data Service Acquisition
In this paper, we mainly study data services described in WSDL. On
the Web, WSDL description documents appear in two forms, WSDL and
ASMX. We obtain these data services by writing a crawler. First, we
define matching rules according to our needs. Second, starting from
a given URL, we crawl the Web for documents that match the rules.
Finally, the crawl ends when the number of crawled documents reaches
a set threshold. Figure 2 shows the crawling process.
(Diagram only partially recoverable; the surviving boxes are labeled
"URL", "Regular Expression", and "End".)
Figure 2: The process of Crawling
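
The three crawling steps (rule, crawl from a given URL, stop at a
threshold) can be sketched as follows. This is a minimal
illustration, not the authors' crawler: the names crawl, LINK_RE,
RULE_RE, and PAGES, the matching rule itself, and the example URLs
are all assumptions. Page retrieval is passed in as a function so
that the sketch runs without network access; a real crawler could
supply a function built on urllib.request.urlopen instead.

```python
import re
from collections import deque
from typing import Callable, List

# Assumed link extractor: href values in double quotes.
LINK_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)
# Assumed rule: URLs ending in .wsdl, .asmx, or .asmx?wsdl.
RULE_RE = re.compile(r'(\.wsdl|\.asmx(\?wsdl)?)$', re.IGNORECASE)


def crawl(seed_url: str, fetch: Callable[[str], str], threshold: int,
          rule: re.Pattern = RULE_RE) -> List[str]:
    """Breadth-first crawl from seed_url.

    Stops as soon as `threshold` URLs matching `rule` have been
    collected, or when there is nothing left to visit.
    """
    queue = deque([seed_url])
    seen = {seed_url}
    matched: List[str] = []
    while queue and len(matched) < threshold:
        url = queue.popleft()
        if rule.search(url):
            matched.append(url)
            continue  # treat a description document as a leaf
        for link in LINK_RE.findall(fetch(url)):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return matched


# Hypothetical in-memory "Web" used instead of real HTTP requests.
PAGES = {
    "http://example.test/":
        '<a href="http://example.test/a.wsdl">a</a>'
        '<a href="http://example.test/dir">d</a>',
    "http://example.test/dir":
        '<a href="http://example.test/b.asmx?wsdl">b</a>'
        '<a href="http://example.test/c.wsdl">c</a>',
}
found = crawl("http://example.test/", lambda u: PAGES.get(u, ""), threshold=2)
```

With a threshold of 2, the crawl stops after the second matching
document and never visits the third one.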
3.2.2. Feature Word Extraction for Data Service
Each data service corresponds to a WSDL description document that
describes the basic information of the data service, such as "what
does the data service do", "where is the data service", and "how to
invoke the data service". To represent a data service simply and
accurately, we extract some of the more representative tags in the
description document as attributes of the document. For example,
(WSDL: service) gives the name of the data service, and (WSDL:
operation) describes the functions that the data service provides.
Consider a data service "Weather Service" with the method name "Get
Weather By IP": the names make clear that the service returns the
weather information for the city or region that an IP address
belongs to.
Each element in the WSDL description document has a specific
meaning. To extract the attributes that uniquely represent the data
service, the elements in the document need to be parsed. In this
document, the values of the name attribute of the (WSDL: service)
and (WSDL: operation) tags are extracted as the document's unique
attribute values.
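
The extraction of name attributes described above can be sketched
with Python's standard XML parser. The sketch assumes WSDL 1.1 (the
namespace http://schemas.xmlsoap.org/wsdl/); the function names, the
sample document, and the step that splits an identifier such as
"GetWeatherByIP" into separate words are illustrative assumptions,
not part of the method as specified.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace


def split_identifier(name: str) -> List[str]:
    """Split an identifier such as 'GetWeatherByIP' into lower-case words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def extract_feature_words(wsdl_text: str) -> List[str]:
    """Collect the name attributes of wsdl:service and wsdl:operation."""
    root = ET.fromstring(wsdl_text)
    names = []
    for tag in ("service", "operation"):
        for elem in root.iter(f"{{{WSDL_NS}}}{tag}"):
            name = elem.get("name")
            if name:
                names.append(name)
    words: List[str] = []
    # An operation is usually named in both portType and binding,
    # so drop repeated names while keeping their order.
    for name in dict.fromkeys(names):
        words.extend(split_identifier(name))
    return words


# Hypothetical, heavily abbreviated WSDL document.
SAMPLE = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="Weather">
  <portType name="WeatherPort">
    <operation name="GetWeatherByIP"/>
  </portType>
  <service name="WeatherService"/>
</definitions>"""

feature_words = extract_feature_words(SAMPLE)
```

For the sample document, the feature words are those of the service
name followed by those of the operation name.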
3.3. Data Service Classification
At present, ontology construction generally consists of
requirements analysis, information collection, terminology
recognition, formal coding, and assessment, as shown in Figure 3.4.
Many ontology libraries have been built this way. However, because
fields and projects differ, constructing an ontology base requires
combining this general process with the actual situation.
To construct a domain ontology suitable for this study, we cluster
the feature words of the obtained WSDL description documents and
construct a Vector Space Model (VSM) over all feature words. That
is, the feature words of each WSDL description document form one
column of a word-document matrix D, so that D represents N WSDL
documents. This makes it easy to calculate the weight of each
feature word in any document.
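
A word-document matrix of this kind can be sketched as follows. The
text does not state which weighting formula is used, so the sketch
assumes a plain TF-IDF weight (term frequency times the logarithm of
N divided by the document frequency); the function name build_matrix
and the sample documents are likewise assumptions.

```python
import math
from collections import Counter
from typing import List, Tuple


def build_matrix(docs: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Return (vocabulary, D), where D[i][j] is the weight of
    word i in document j (one column per document)."""
    vocab = sorted({w for doc in docs for w in doc})
    n = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(w for doc in docs for w in set(doc))
    counts = [Counter(doc) for doc in docs]
    matrix = []
    for w in vocab:
        idf = math.log(n / df[w])
        row = [
            (counts[j][w] / len(docs[j]) * idf) if docs[j] else 0.0
            for j in range(n)
        ]
        matrix.append(row)
    return vocab, matrix


# Hypothetical feature word lists for N = 2 WSDL documents.
docs = [
    ["get", "weather", "by", "ip"],
    ["get", "stock", "quote"],
]
vocab, D = build_matrix(docs)
```

Under this weighting, a word that occurs in every document (here
"get") receives weight zero, while a word specific to one document
(here "weather") receives a positive weight in that document only.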
Based on the prototype model of the domain ontology, we model the
ontology in the OWL ontology description language, combining the
result of clustering the feature words of the WSDL documents with
the K-center algorithm, domain information, and the tool developed
by Stanford University.
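
The text does not define the K-center algorithm further. The sketch
below assumes it refers to the greedy farthest-first heuristic for
the k-center problem; k-medoids would be another possible reading.
The points stand in for feature word vectors, and the Euclidean
distance, the function names, and the sample data are assumptions
made for illustration.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_center(points: List[Sequence[float]], k: int) -> List[int]:
    """Greedy farthest-first selection of up to k centre indices."""
    if not points:
        return []
    centers = [0]  # start from the first point (an arbitrary choice)
    # dist[i] is the distance from point i to its nearest centre so far.
    dist = [euclidean(p, points[0]) for p in points]
    while len(centers) < min(k, len(points)):
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        dist = [min(d, euclidean(p, points[nxt]))
                for d, p in zip(dist, points)]
    return centers


def assign(points: List[Sequence[float]], centers: List[int]) -> List[int]:
    """Map every point to the index of its nearest centre."""
    return [min(centers, key=lambda c: euclidean(p, points[c]))
            for p in points]


# Hypothetical two-dimensional feature vectors forming two groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = k_center(points, k=2)
labels = assign(points, centers)
```

Each new centre is the point farthest from all centres chosen so
far, so well-separated groups of feature words end up in different
clusters.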
We classify data services based on the domain ontology in three
steps. First, we parse the obtained WSDL description document of a
data service, extract the feature word document that represents the
basic information of the data service, and construct the feature
word vector according to the vector space model. Second, we use
WordNet to calculate the semantic distance between the feature word
vector and the vector formed by the domain ontology. Finally, we
select an appropriate dividing threshold to assign the document to
its field.
Extracting the feature words of a data service and constructing the
feature word vector space model yields a data service feature
vector (SFV). To make it easier to calculate the similarity between
the feature word vector of a data service and a domain, the domain
ontology can be generalized to a domain vector (DV). We can then
decide which field a data service belongs to according to the
similarity between the two vectors.
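
The comparison of an SFV with domain vectors can be sketched as
below. The sketch uses cosine similarity over sparse word-weight
vectors as a stand-in; it does not reproduce the WordNet-based
semantic distance mentioned above. The function names, the
threshold value of 0.3, and the sample vectors are assumptions.

```python
import math
from typing import Dict, Optional

Vector = Dict[str, float]  # feature word -> weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(sfv: Vector, domains: Dict[str, Vector],
             threshold: float = 0.3) -> Optional[str]:
    """Return the most similar domain, or None if no domain
    reaches the (assumed) similarity threshold."""
    best = max(domains, key=lambda d: cosine(sfv, domains[d]), default=None)
    if best is None or cosine(sfv, domains[best]) < threshold:
        return None
    return best


# Hypothetical domain vectors (DV) and service feature vector (SFV).
domains = {
    "meteorology": {"weather": 1.0, "forecast": 1.0},
    "finance": {"stock": 1.0, "quote": 1.0},
}
sfv = {"weather": 1.0, "ip": 0.5}
field = classify(sfv, domains)
```

A service whose feature words overlap with none of the domain
vectors falls below the threshold and is left unclassified.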
3.4. Data Service Description Language Design
In this section, we first improve the formula for calculating the
similarity of feature words in the WSDL description document. Then,
we present an approach to calculating similarity based on the domain
ontology, which completes the semantic processing of data services.
On the basis of semantic annotation, we propose a unified
description language model for data services and complete the design
of the description language.
3.4.1. Semantic Annotation of Data Service
To describe data services uniformly, it is necessary to resolve the
semantic differences between heterogeneous data services. In this
paper, we propose a new semantic annotation method for data services
that uses the domain ontology library constructed above. Semantic
annotation of data services resolves the semantic differences
between heterogeneous data services.
The method works as follows. First, we extract feature words from
the WSDL description document of a data service to form a feature
word set that represents the description document. Second, we
cluster the feature word set using the K-center algorithm and
construct the domain ontology library by combining the clusters with
domain information. Finally, we calculate the weight of each
feature word with respect to the domain ontology, and store the
feature words and their weights according to the vector space model
(VSM). Each WSDL document that contains these feature words is
associated with the corresponding feature words, which forms the
mapping between the data service description document and the
domain ontology concepts.
Because an ontology describes in detail the concepts of a field,
their attributes, and the constraints among concepts at various
levels of the hierarchy, semantic annotation of data services based
on a domain ontology not only reflects the relationship between
service description documents and the semantic relevance of
categories, but also exposes the implicit semantic information of
data service description documents. In this way, semantic
relationships are established among data service description
documents, which helps to resolve the heterogeneity of data
services, provide more accurate and comprehensive data services,
and lay the foundation for a unified description of data services.
3.5. Data Service Description Model
At present, the data service description methods and standards
published on the Web differ. To enable the sharing of heterogeneous
service resources, it is necessary to resolve the semantic
heterogeneity between data service resources, so that they have a
unified semantic description and so that the service access
mechanism can be determined automatically when a service is
executed.
In this paper, we present a unified data service description
language model (UDL4DS). Figure 3 illustrates the UDL4DS model.
(Diagram only partially recoverable; the surviving element names are
ClassifyMethod, ClassfyTime, and Semantic.)
Figure 3: UDL4DS Language Model
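
To give a rough idea of what a description instance might look like,
the sketch below builds a small XML fragment from the three element
names that appear in Figure 3 (ClassifyMethod, ClassfyTime,
Semantic). Everything else is hypothetical: the root element
DataService, its name attribute, the nesting, and the example values
are assumptions for illustration, not the UDL4DS schema.

```python
import xml.etree.ElementTree as ET


def build_description(name: str, domain: str, method: str, time: str) -> str:
    """Serialize a hypothetical UDL4DS-like description to XML text."""
    # "DataService" and its "name" attribute are assumed, not from the draft.
    root = ET.Element("DataService", {"name": name})
    ET.SubElement(root, "ClassifyMethod").text = method
    # Element name spelled as it appears in Figure 3.
    ET.SubElement(root, "ClassfyTime").text = time
    ET.SubElement(root, "Semantic").text = domain
    return ET.tostring(root, encoding="unicode")


# Hypothetical values for a weather data service.
xml_text = build_description(
    name="WeatherService",
    domain="meteorology",
    method="domain-ontology similarity",
    time="2017-11-25",
)
parsed = ET.fromstring(xml_text)
```

Parsing the generated text back, as in the last line, is a quick
check that the fragment is well-formed XML.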
4. Security Considerations
In this paper, we mainly focus on the unified description of
heterogeneous data services described in WSDL. The study does not
yet cover heterogeneous data sources such as text or web page data,
or other forms of data services.
5. IANA Considerations
There are no IANA considerations related to this document.
6. Conclusions
This document proposes a unified description method for
heterogeneous data services, which allows data services to be shared
to meet the complex needs of users. First, we pre-process the data
service description documents. Second, we propose a unified
description language model for data services, the Unified
Description Language for Data Service (UDL4DS). Finally, we
implement a Web-based description system for data services.
7. References
7.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997.
7.2. Informative References
[W3C.REC-webarch-20041215]
Jacobs, I. and N. Walsh, "Architecture of
the World Wide Web, Volume One", World Wide Web Consortium
Recommendation REC-webarch-20041215, December 2004.
8. Acknowledgments
Thanks for comments and suggestions provided by H. Wang.
This document was prepared using 2-Word-v2.0.template.dot.
Authors' Addresses
YP Chen
Shaanxi Key Laboratory of Network Data Intelligent Processing
Xi'an University of Posts and Telecommunications
China
Email: CHENYP@XUPT.edu.cn
H Xia
Shaanxi Key Laboratory of Network Data Intelligent Processing
Xi'an University of Posts and Telecommunications
China
Email: XIAHONG@XUPT.edu.cn
ZM Wang
Shaanxi Key Laboratory of Network Data Intelligent Processing
Xi'an University of Posts and Telecommunications
China
Email: ZMWANG@XUPT.edu.cn
P Yang
Xi'an University of Posts and Telecommunications
China
Email: YANGPING@163.com
CW Tang
Xi'an University of Posts and Telecommunications
China
Email: 1316904833@qq.com