A Systematic Survey of Point Set Distance Measures for Link Discovery

Submitted by Axel-Cyrille Ng... on 02/22/2017 - 08:51

Tracking #: 1574-2786

Authors:

Mohamed Sherif

Axel-Cyrille Ngonga Ngomo

Responsible editor:

Claudia d'Amato

Submission type:

Survey Article

Abstract:

Large amounts of geo-spatial information have been made available with the growth of the Web of Data. While discovering links between resources on the Web of Data has been shown to be a demanding task, discovering links between geo-spatial resources proves to be even more challenging. This is partly due to the resources being described by means of vector geometry. In particular, discrepancies in granularity and measurement error across data sets make it difficult to select appropriate distance measures for geo-spatial resources. In this paper, we survey the existing literature for point-set measures that can be used to measure the similarity of vector geometries. We then present and evaluate the ten measures that we derived from the literature. We evaluate these measures with respect to their time-efficiency and their robustness against discrepancies in measurement and in granularity. To this end, we use samples of real data sets of different granularity as input for our evaluation framework. The results obtained on three different data sets suggest that most distance approaches can be made to scale. Moreover, while some distance measures are significantly slower than others, distance measures based on means, surjections and sums of minimal distances are robust against the different types of discrepancies.
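
To make two of the measure families named in the abstract concrete, the following is a minimal sketch of a mean-based distance and a sum-of-minimums distance between two point sets. It follows one common textbook formulation and is not taken from the article. The function names are hypothetical, and the base distance defaults to planar Euclidean distance (`math.dist`) purely for brevity, whereas the article uses the orthodromic distance.

```python
import math


def mean_distance(a, b, dist=math.dist):
    """Distance between the mean points (centroids) of two point sets."""
    mean_a = tuple(sum(coords) / len(a) for coords in zip(*a))
    mean_b = tuple(sum(coords) / len(b) for coords in zip(*b))
    return dist(mean_a, mean_b)


def sum_of_minimums(a, b, dist=math.dist):
    """Half the sum of each point's distance to its nearest point in the other set."""
    a_to_b = sum(min(dist(p, q) for q in b) for p in a)
    b_to_a = sum(min(dist(q, p) for p in a) for q in b)
    return 0.5 * (a_to_b + b_to_a)


# Illustrative data only: two parallel two-point segments, one unit apart.
segment_a = [(0.0, 0.0), (2.0, 0.0)]
segment_b = [(0.0, 1.0), (2.0, 1.0)]
print(mean_distance(segment_a, segment_b))
print(sum_of_minimums(segment_a, segment_b))
```

Both functions accept any base distance through the `dist` parameter, so a great-circle distance could be substituted without changing the set-level logic.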

This third version of the article “A Systematic Survey of Point Set Distance Measures for Link Discovery” addresses the three major issues that I pointed out in my previous reviews:
- the choice of the orthodromic distance instead of the great elliptic distance or the Euclidean distance as a basic distance measure is discussed,
- the question of the topological consistency of the data generated to create a benchmark dataset is not solved in this article, but at least its impact on the point-set-based linking approach adopted in this article is discussed. This issue will be considered in future work, where the authors plan to set up a generic benchmark dataset that can be used with various matching strategies (e.g., based on topological properties of geographic data),
- the Fréchet distance value given for the example on Malta has been changed and is now consistent with the value I had in my previous reviews. The overall tests have been updated with this new Fréchet distance measure, and the results are analysed and discussed with respect to the new linking results.
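
For readers unfamiliar with the two measures discussed in these points, here is a minimal sketch of an orthodromic (great-circle) base distance combined with the discrete Fréchet distance. It is an illustration under stated assumptions, not the authors' implementation: the haversine formula on a sphere of radius 6371 km stands in for the orthodromic distance, the discrete variant computed by dynamic programming stands in for the Fréchet distance, and all function names are hypothetical. The coordinates in the usage lines are made up and do not reproduce the Malta example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)


def orthodromic_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees (haversine)."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1 to guard against rounding slightly above the valid asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def discrete_frechet(p, q, dist=orthodromic_km):
    """Discrete Fréchet distance between two ordered point sequences."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]  # ca[i][j]: coupling distance of p[:i+1], q[:j+1]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


# Illustrative coordinates only: two short polylines one degree of latitude apart.
line_a = [(0.0, 0.0), (0.0, 1.0)]
line_b = [(1.0, 0.0), (1.0, 1.0)]
print(discrete_frechet(line_a, line_b))
```

Unlike the mean-based or sum-of-minimums measures, the Fréchet distance depends on the order of the points, which is why a single changed value can alter linking results across a whole test series.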