Abstract

User-generated content can help the growth of linked data. However, three obstacles stand in the way. First, there is a lack of interfaces that enable ordinary people to author linked data. Second, people have multiple perspectives on the same concept and work in different contexts. Third, there are not enough ontologies to model the wide variety of data. Therefore, the authors of this chapter propose an approach that enables people to share various kinds of data through an easy-to-use social platform. Users define their own concepts, and multiple conceptualizations are allowed. These are consolidated using semi-automatic schema alignment techniques supported by the community. Further, concepts are grouped semi-automatically by similarity. As a result of consolidation and grouping, informal lightweight ontologies emerge gradually. The authors have implemented a social software system, called StYLiD, to realize the approach. It can serve as a platform that motivates people to bookmark and share different things. It may also drive vertical portals for specific communities with integrated data from multiple sources. Some experimental observations support the validity of the approach.

Introduction

Linked data is a method of exposing, sharing and connecting data on the Semantic Web. It provides the mechanisms for publishing and interlinking structured data into a Web of Data. This forms a data commons where people and organizations can post and consume data about anything. Due to the network effect, the usefulness of data increases as it is linked with more data. Organizations benefit from being part of this global data network, which is accessible to both people and machines. Linked data can be fully realized with existing technologies, maintaining compatibility with legacy applications while exposing data from them. Thus, linked data is a significant practical movement towards the vision of the Semantic Web (Berners-Lee, 2006; Bizer, Cyganiak, & Heath, 2007).

However, some issues remain that need to be addressed for wider adoption of linked data. Firstly, it is not obvious how ordinary people, without any technical expertise, can publish and share linked data directly. Linked data research can benefit from the combination of Semantic Web and social web techniques, since a lot of data on the web comes from ordinary people. However, there is a lack of human interfaces for publishing linked data explicitly. As a result, people still share unstructured data, and it is hard to derive semantic structure and links from such content.

Secondly, it is often ignored that there may be multiple perspectives on the same concept, with different aspects or contexts to be considered. In the distributed web, different parties may have different schemas or conceptualizations for the same type of data because of different requirements or preferences. So organizations usually need to integrate their data at the schema level. However, data today is mainly linked at the instance level only (Jhingran, 2008), although knowledge of the schema is very important for information exchange and integration between systems and for querying data sources. Therefore, we should also link data at the schema level to explicitly encode the relations among multiple conceptualizations. Currently, it is not obvious how to link or relate such multiple concept schemas in the linked data web.

Lastly, the state of the art lacks structures that can represent and organize the wide range of concepts needed by the community. There are still not enough ontologies or vocabularies for describing linked data about various things (Siorpaes & Hepp, 2007a; Van Damme, Hepp, & Siorpaes, 2007). There is a long tail of information domains for which people have information to share (Huynh, Karger, & Miller, 2007). Developing individual solutions for the long tail is infeasible because data modeling is difficult. Also, it is not always practical for different parties to commit to a single data model or common vocabulary. It may be possible to achieve some level of consensus, but the process of collaborating to reach a common understanding is itself difficult and time-consuming.

Considering the above issues, we make the following contributions.

Social linked data authoring. We attempt to enable ordinary users to publish structured linked data directly through simple authoring interfaces. We have implemented a social software system for authoring linked data and sharing a wide variety of data in the community.

Multiple conceptualizations. Users may freely define their own concept schemas and share different types of structured linked data. We propose allowing different people to have multiple conceptualizations.

Concept consolidation. At the same time, these multiple concept schemas are consolidated by mapping and linking them at the schema level. This is done semi-automatically, supported by the community, using data integration principles with schema alignment techniques. We propose concept consolidation as a new way of building up conceptualizations from the community. This is a loose collaborative approach that requires only minimal shared understanding and allows different parties to maintain their individual requirements.

Emergence of lightweight ontologies. Besides the community-based formation of conceptualizations by consolidation, concepts can evolve and emerge gradually through popularity. Further, similar concept schemas can be grouped and organized semi-automatically. Together, these processes enable the emergence of informal lightweight ontologies.
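
To make the idea of semi-automatic consolidation and grouping more concrete, the following is a minimal illustrative sketch in Python. It is not the alignment method used in StYLiD: it assumes that each concept schema is simply a list of attribute names, and it substitutes plain string similarity from the standard library's difflib module for a real schema alignment technique. The function names, the example schemas and the 0.7 threshold are all hypothetical. The suggested mappings are only candidates, which community members would then confirm or reject.

```python
from difflib import SequenceMatcher

# Hypothetical sketch: a concept schema is modeled as a list of attribute names.
# String similarity stands in for a real schema alignment technique.

def normalize(name):
    """Lowercase an attribute name and treat underscores and hyphens as spaces."""
    return name.lower().replace("_", " ").replace("-", " ").strip()

def similarity(a, b):
    """Return a similarity score in [0, 1] between two attribute names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def suggest_alignments(schema_a, schema_b, threshold=0.7):
    """For each attribute of schema_a, suggest its best match in schema_b.

    Only matches scoring at or above the (assumed) threshold are returned.
    They are suggestions for users to review, not final mappings.
    """
    suggestions = []
    for attr_a in schema_a:
        best = max(schema_b, key=lambda attr_b: similarity(attr_a, attr_b))
        score = similarity(attr_a, best)
        if score >= threshold:
            suggestions.append((attr_a, best, round(score, 2)))
    return suggestions

def schema_similarity(schema_a, schema_b, threshold=0.7):
    """Fraction of attributes that can be aligned; a crude basis for grouping."""
    matched = len(suggest_alignments(schema_a, schema_b, threshold))
    return matched / max(len(schema_a), len(schema_b))

# Two users define their own conceptualizations of "Hotel".
hotel_by_alice = ["name", "address", "price", "telephone"]
hotel_by_bob = ["Name", "Address", "Phone", "rating"]

suggestions = suggest_alignments(hotel_by_alice, hotel_by_bob)
for attr_a, attr_b, score in suggestions:
    print(f"{attr_a} <-> {attr_b} (score {score})")

print("schema similarity:", schema_similarity(hotel_by_alice, hotel_by_bob))
```

In this sketch, attributes without a sufficiently similar counterpart (such as "price" and "rating") are simply left unmapped, so each party keeps its own schema while the overlapping parts are linked. The same pairwise score could be used to decide which concept schemas are similar enough to be grouped together.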