How to detect suspicious OpenStreetMap Changesets with incorrect edits?

by Pascal Neis - Published: January 20th, 2016

Since its rise in popularity, the well-known online encyclopedia Wikipedia has been struggling with manipulation and, in the worst case, vandalism. Similarly, the OpenStreetMap (OSM) project has suffered several cases over the past few years in which incorrect map edits were made. These erroneous edits can stem from (new) contributors, or from illegal data imports (or automated edits) that were not discussed in advance with the community or the Data Working Group (DWG) and that corrupted existing project data. The current OSM wiki page gives a great overview of general guidelines and, for example, types of vandalism. Another page in the wiki also mentions a prototype of a rule-based system for the automatic detection of vandalism in OSM, which I developed in 2012. However, the system has never actually been implemented. Today, OSM contributors can use a variety of tools to inspect an area or particular map changes. A few of them are listed below (a complete list can be found here):

Based on the database that I use for several other services, I created an easy-to-use webpage to find suspicious OSM changesets with possibly incorrect map edits. The webpage offers some filter options, such as a country boundary or the type of object change of interest. In contrast to the other aforementioned webpages, you can also filter changesets based on the contributor's active “mapping days”. A “mapping day” is a day on which the contributor created at least one changeset, independent of the registration date. I am also planning to add further user reputation information, such as the editors used or tagging behavior. And of course I am going to add some RSS feeds in the next version. The first version can be found here.
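The “mapping day” definition lends itself to a small illustration. The following is a minimal Python sketch, not the code behind the webpage: it assumes each user's full changeset history is available with timezone-aware timestamps, and the `Changeset` record, its hypothetical `deleted` field, and the thresholds in `flag_changesets` are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone


@dataclass
class Changeset:
    id: int
    user: str
    created_at: datetime  # timezone-aware creation timestamp
    deleted: int          # hypothetical: number of objects deleted in this changeset


def count_mapping_days(history):
    """Count, per user, the distinct UTC days with at least one changeset.

    The registration date plays no role; only days with actual changesets count.
    `history` should cover each user's full changeset history.
    """
    days = defaultdict(set)
    for cs in history:
        days[cs.user].add(cs.created_at.astimezone(timezone.utc).date())
    return {user: len(d) for user, d in days.items()}


def flag_changesets(recent, mapping_days, max_mapping_days=5, min_deleted=100):
    """Return recent changesets by users with few mapping days that delete many objects.

    Both thresholds are illustrative assumptions, not values used by the webpage.
    """
    return [
        cs for cs in recent
        if mapping_days.get(cs.user, 0) <= max_mapping_days
        and cs.deleted >= min_deleted
    ]


# Usage with made-up sample data
utc = timezone.utc
history = [
    Changeset(1, "newbie", datetime(2016, 1, 19, 9, tzinfo=utc), 0),
    Changeset(2, "newbie", datetime(2016, 1, 19, 17, tzinfo=utc), 0),
    Changeset(3, "newbie", datetime(2016, 1, 20, 8, tzinfo=utc), 250),
    Changeset(4, "veteran", datetime(2016, 1, 20, 12, tzinfo=utc), 300),
] + [
    # 30 earlier mapping days for an experienced contributor
    Changeset(100 + i, "veteran", datetime(2015, 1, 1, tzinfo=utc) + timedelta(days=i), 0)
    for i in range(30)
]
recent = [cs for cs in history if cs.created_at.date() == date(2016, 1, 20)]
counts = count_mapping_days(history)
print([cs.id for cs in flag_changesets(recent, counts)])  # [3]
```

Counting mapping days over the full history rather than only the recent window matters here: otherwise a long-standing contributor like "veteran" would look just as new as "newbie".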

What makes all of this different from other tools? Well, I think one of the major advantages is the simplicity of the webpage and the fact that you can filter changesets based on the contributor's activity and/or the changeset edits. In contrast to other tools, you can find changesets not only based on your area of interest, but also based on potential beginner mistakes and, hopefully less often, vandalism attempts or fictional/non-existent map data.

Nice job! This will be helpful for detecting new bots that aren’t playing by the rules. When I select ‘last 24 hours’ or ‘last 48 hours’, my bot certainly shows up! (It’s been discussed on imports and talk-us, and there’s a page for it on the wiki, linked from import/catalogue. I don’t expect your program to be able to tell that, though.)

Nice tool. It would be cool to see the edit frequency of a user across multiple changesets. For example, if someone makes 1000 changesets in a day with 10 nodes each, that would be just as obvious as 10000 nodes in 1 changeset.
Filtering by node/way creation/deletion rate would also be nice. That would make it easy to spot obvious automated mass edits.
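The per-user aggregation suggested in these two comments can be sketched in a few lines. This is a hypothetical Python example, not part of the tool: it assumes per-changeset node creation and deletion counts have already been derived from the changeset contents, and the `ChangesetStats` record and the 5000-node threshold are made up for illustration. Summing per user and UTC day makes 1000 changesets with 10 nodes each stand out just like one changeset with 10000 nodes.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ChangesetStats:
    user: str
    created_at: datetime  # timezone-aware creation timestamp
    nodes_created: int    # hypothetical: derived beforehand from the changeset contents
    nodes_deleted: int    # hypothetical: derived beforehand from the changeset contents


def daily_totals(changesets):
    """Sum node creations and deletions per (user, UTC day) across all changesets."""
    created = Counter()
    deleted = Counter()
    for cs in changesets:
        key = (cs.user, cs.created_at.astimezone(timezone.utc).date())
        created[key] += cs.nodes_created
        deleted[key] += cs.nodes_deleted
    return created, deleted


def mass_edit_days(changesets, threshold=5000):
    """Return sorted (user, day) pairs whose daily creations plus deletions reach the threshold.

    The threshold is an illustrative assumption.
    """
    created, deleted = daily_totals(changesets)
    keys = set(created) | set(deleted)
    return sorted(k for k in keys if created[k] + deleted[k] >= threshold)


# Made-up sample: many small changesets vs. one large changeset on the same day
day = datetime(2016, 1, 20, 12, tzinfo=timezone.utc)
sample = (
    [ChangesetStats("bulkbot", day, 10, 0) for _ in range(1000)]
    + [ChangesetStats("importer", day, 10000, 0)]
    + [ChangesetStats("casual", day, 5, 1) for _ in range(3)]
)
flagged = mass_edit_days(sample)
print([user for user, _ in flagged])  # ['bulkbot', 'importer']
```

Tracking creations and deletions separately, as above, would also allow the deletion-rate filter mentioned in the second comment instead of a single combined total.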