A Domain Categorisation of Vocabularies Based on a Deep Learning

Tracking #: 2073-3286

This paper is currently under review

Authors:

Alberto Nogales

Alvaro García Tejedor

Miguel Angel Sicilia

Responsible editor:

Freddy Lecue

Submission type:

Full Paper

Abstract:

The publication of large amounts of open data has become a major trend. This is a consequence of projects
like the Linked Open Data (LOD) community, which publishes and integrates datasets using Linked Data techniques.
Linked Data publishers should follow a set of principles for dataset design. These principles are set out in a 2011 document
that describes tasks such as considering the reuse of vocabularies. With regard to the latter, another project called Linked Open
Vocabularies (LOV) attempts to compile the vocabularies used in LOD. These vocabularies have been classified by domain
following the subjective criteria of LOV members, which carries the inherent risk of introducing personal biases. In this paper, we
present an automatic classifier of vocabularies based on the main categories of the well-known knowledge source Wikipedia.
For this purpose, word-embedding models were used in combination with Deep Learning techniques. Results show that with a
hybrid model combining a regular Deep Neural Network (DNN), a Recurrent Neural Network (RNN) and a Convolutional Neural Network
(CNN), vocabularies could be classified with an accuracy of 93.57 per cent. In the resulting classification, 36.25 per cent of the vocabularies
belong to the Culture category.