Automatic POI Matching Using an Outlier Detection Based Approach

Authors

Abstract

Points of Interest (POIs) are widely used in many applications, mainly because of the increasing amount of related data available online, notably from volunteered geographic information (VGI) sources. Connecting these data across sources is useful for tasks such as validating, correcting, and removing duplicated records in a database. However, there is no standard way to identify the same POI across different sources, and doing so manually can be very expensive. Automatic POI matching has therefore become an attractive research topic. In this work, we propose a novel data-driven machine learning approach, based on an outlier detection algorithm, to match POIs automatically. Surprisingly, the works presented so far do not use data-driven machine learning approaches. A likely reason is that such approaches need a training dataset, which has to be constructed by manually matching some POIs. To mitigate this, we took advantage of the Crosswalk API, available at the time we started our project, which allowed us to retrieve already matched POI data from different sources in US territory. We trained and tested our model on a dataset containing Factual, Facebook and Foursquare POIs from New York City, and successfully applied it to another dataset of Facebook and Foursquare POIs from Porto, Portugal, finding matches with an accuracy of around 95%. These encouraging results confirm our approach as an effective way to address the problem of automatically matching POIs. They also show that such a model can be trained with data available from multiple sources and applied to datasets from locations other than those used in training. Furthermore, because the approach is data-driven, the model can be continuously improved by adding new validated data to its training dataset.
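
To make the idea of matching by outlier detection concrete, the following is a minimal sketch and not the model described in this work. The abstract does not name the outlier detection algorithm or the features used, so the sketch assumes two illustrative features (name dissimilarity and geographic distance) and substitutes a simple one-sided z-score test for the detector. The model is fitted only on pairs known to match; a candidate pair is accepted as a match if it is not an outlier with respect to them. All function names, the threshold, and the sample POIs are hypothetical and invented for illustration.

```python
import math
from difflib import SequenceMatcher
from statistics import mean, stdev


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 coordinates."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def pair_features(a, b):
    """Assumed features; for both, larger means 'less alike'."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    dist = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
    return (1.0 - name_sim, dist)


class MatchedPairModel:
    """Stand-in outlier detector: per-feature one-sided z-score test."""

    def fit(self, feature_rows):
        cols = list(zip(*feature_rows))
        self.means = [mean(c) for c in cols]
        # Guard against zero spread in a tiny training set.
        self.stds = [max(stdev(c), 1e-6) for c in cols]
        return self

    def is_match(self, features, max_z=3.0):
        # A pair is an outlier (non-match) if any feature is far ABOVE
        # what was observed for known matches.
        return all(
            (x - m) / s <= max_z
            for x, m, s in zip(features, self.means, self.stds)
        )


def poi(name, lat, lon):
    return {"name": name, "lat": lat, "lon": lon}


# Invented example pairs standing in for already matched training data.
TRAINING_PAIRS = [
    (poi("Joe's Pizza", 40.7306, -74.0021), poi("Joes Pizza", 40.7307, -74.0022)),
    (poi("Central Park Zoo", 40.7678, -73.9718), poi("Central Park Zoo", 40.7679, -73.9716)),
    (poi("Blue Note Jazz Club", 40.7309, -74.0007), poi("Blue Note", 40.7310, -74.0006)),
    (poi("Strand Bookstore", 40.7333, -73.9910), poi("Strand Book Store", 40.7336, -73.9908)),
]

model = MatchedPairModel().fit([pair_features(a, b) for a, b in TRAINING_PAIRS])

near = (poi("Katz's Delicatessen", 40.7223, -73.9874),
        poi("Katz Delicatessen", 40.7224, -73.9875))
far = (poi("Katz's Delicatessen", 40.7223, -73.9874),
       poi("Empire State Building", 40.7484, -73.9857))

print(model.is_match(pair_features(*near)))
print(model.is_match(pair_features(*far)))
```

In this sketch, training requires only positive examples (matched pairs), which is the kind of data a service returning already matched POIs would supply. A real system would likely use richer features and a more capable detector than a per-feature threshold.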