Persistent Place Identifier

This page is a systematic review of one theme: a unique identifier that identifies an OSM feature and never changes (a perma_id, for short).

As a working definition, an OSM feature is "a kind of map feature, a stable thing in some (time-space) scale of reference"...

The theme has its own concepts, problems and solutions to discuss, and this article is used to express and preserve reference models, consensus and working definitions. Some parts of the theme are under "diffuse discussion" with no clear consensus, so this article also reflects the diversity of opinions (expressing, when possible, a neutral point of view) and the lack of some solid definitions.

There are also closed proposals, the Permanent ID and the stable.openstreetmap.org server, with clear objectives and less diffuse discussion.

Working definitions

As a working definition, an OSM feature is a "stable thing in some scale of reference"... In detail, an OSM feature:

is an OSM element: relation, way or node. The element is already the container of the core ID and would also be the container of the perma_id, but the two identifiers differ in several ways: 1. the datatype (the core ID is a serial integer, while the perma_id can be a non-serial or even hierarchical value, like an IP address); 2. the obligation (not every element needs a perma_id); 3. the backup/restore process (the core ID is refreshed with a new value); 4. the perma_id can move from the original element to a new or evolved element, so that it keeps fitting its concept after better editions of the map or changes in reality; 5. the perma_id can be implemented as a tag (or even as a lookup table) instead of a core attribute.

has public utility (a concept): like a point of interest in OSM, it has some associated tags and can be characterized as a map feature. It is possible to check the "importance" (notability or utility-stability) of the feature through some objective criterion or, in the absence of criteria, through voting.

has a time-scale of reference, used to say that it "is stable over time" ("not changed"). The time-scales of mountains are longer than those of museums, which are longer than those of restaurants or pubs.

has a time-class: a practical way to assign a time-scale to an object. The time-class can be inferred from the element's tags and metrics. Note: objects of the "geographical class" (rivers and mountains) and of the "administrative class" (countries and cities) have different global time-scales. Subclasses cover smaller objects: a mountain range has a different time-scale than a small mountain, and a city has a different time-scale than a country.

has creation and extinction criteria, recorded in attributes like "creation year" and "extinction year". Note: when a natural object such as an island becomes extinct, its perma_id persists, and through the perma_id its geometry can be restored from some "official OSM backup".

has an error-position reference, used to say that it "is stable in position" ("has not changed its position"). Each kind of object has an admissible position error: 1 m, 5 m, 1 km, 10 km...

has an error-concept reference, used to say that it "is stable in concept" ("has not changed its public utility"). It is acceptable for a pub to become a restaurant, but not a hospital. It is acceptable for a city to change its name, but not to change from "official city" to "non-official" or to "official district of another city".

So the uniqueness of the perma_id rests on this working definition: there is exactly one OSM element with that identifier.
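The time-class and error-concept properties above can be sketched in code. This is only an illustration under stated assumptions, not a proposed implementation: the tag-to-class mapping, the year figures and the groups of interchangeable concepts are hypothetical placeholders, chosen to mirror the examples in the text (mountains outlast museums, which outlast pubs; a pub may become a restaurant but not a hospital).

```python
# Hypothetical mapping from (key, value) tags to a time-class.
# Class names are illustrative assumptions, not an OSM consensus.
TIME_CLASS_BY_TAG = {
    ("natural", "peak"): "geographical",
    ("waterway", "river"): "geographical",
    ("place", "city"): "administrative",
    ("tourism", "museum"): "institution",
    ("amenity", "restaurant"): "business",
    ("amenity", "pub"): "business",
}

# Assumed order-of-magnitude time-scale, in years, for each class.
TIME_SCALE_YEARS = {
    "geographical": 10_000,
    "administrative": 100,
    "institution": 50,
    "business": 5,
}

# Hypothetical groups of concepts treated as interchangeable.
CONCEPT_GROUPS = [
    {("amenity", "pub"), ("amenity", "restaurant"), ("amenity", "cafe")},
    {("amenity", "hospital"), ("amenity", "clinic")},
]


def time_class(tags):
    """Return the first time-class matched by the element's tags, or None."""
    for key, value in tags.items():
        cls = TIME_CLASS_BY_TAG.get((key, value))
        if cls is not None:
            return cls
    return None


def concept_is_stable(old_tags, new_tags):
    """True if an old tag and a new tag belong to the same concept group."""
    old_items = set(old_tags.items())
    new_items = set(new_tags.items())
    return any(old_items & group and new_items & group for group in CONCEPT_GROUPS)
```

With these placeholder tables, `concept_is_stable({"amenity": "pub"}, {"amenity": "restaurant"})` is true, while changing the same pub to `amenity=hospital` is rejected. A real criterion would need tables agreed by the community.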

Non-persistent IDs

There are good candidates for a "persistent place identifier", but all of them fail on the main property, which is to ensure persistence. Among these non-permanent IDs, the most important example is Nominatim's place_id, which is "independent of geometry".

NOTICE: Nominatim's place_id is only an internal parameter of the engine. You cannot use place_id for anything; it is a technical database key and depends on a single Nominatim instance.

OSM external persistence implementations

These implementations are "non-official": the implemented perma_id is neither a tag nor an XML attribute of dumps or backups. In the case of an API (e.g. an ID resolver), it is "external" in the sense that the URL of its endpoint is not under the openstreetmap.org domain.

Query-to-map

See Query-to-map. It preserves the "permanent name" (name and type) of an OSM feature in the service tools.wmflabs.org/query2map. It uses the name as the main identifier, and the key (and types?) as a "namespace" for the name.

OSMLR

The GitHub project opentraffic/osmlr (see also blog presentation) is a complex "backup and lookup" system that ensures persistence of the ID of "almost any stretch of roadways in OpenStreetMap".

It has good historical data, so we can use it to check our stability hypothesis.

Problems and solutions

For each reasonable problem there is a reasonable solution (to be detailed in a future implementation). So far, within the working definitions given at the beginning of the article, no major problem has been detected that would impair the Persistent Place Identifier.

Defining classes of OSM features

To classify OSM features (those that will be assigned a perma_id) into a coarser set of map features, according to the tags that describe the element, we can imagine some basic groups, labeled by an arbitrary group number:

Each group has a different scale-correlation behaviour, so it is necessary to characterize the group before characterizing the time and spatial scales of the OSM feature.

Defining spatial scale of an OSM feature

Examples of scales in Geography

| Scale | Length | Area |
|---|---|---|
| Local (micro) | 1 m … 1 km | 1 m² … 1 km² |
| Regional (meso) | 1 km … 100 km | 1 km² … 10,000 km² |
| Continental (macro) | 100 km … 10,000 km | 10,000 km² … 100,000,000 km² |
| Global (mega) | > 10,000 km | > 100,000,000 km² |

Geography has some usual definitions of spatial scale, and simple database functions and approximations (ref. PostGIS), such as ST_Length(), ST_Area() or ST_Area(ST_Envelope()), can automatically classify the element (way or relation) that represents a tagged feature.
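As a rough sketch of that automatic classification, the function below maps a length or an area to one of the four scales of the table above. It assumes the values were already computed in metres and square metres (for example by the PostGIS functions just mentioned). The function names and the handling of values that fall exactly on a boundary are assumptions made for this example.

```python
# Upper bounds per scale, taken from the table of scales in Geography.
# Assumption: a value exactly on a boundary is assigned to the smaller scale,
# and anything below 1 m (or 1 m²) is still treated as "local".
LENGTH_LIMITS_M = [
    (1_000, "local"),             # up to 1 km
    (100_000, "regional"),        # up to 100 km
    (10_000_000, "continental"),  # up to 10,000 km
]
AREA_LIMITS_M2 = [
    (1e6, "local"),               # up to 1 km²
    (1e10, "regional"),           # up to 10,000 km²
    (1e14, "continental"),        # up to 100,000,000 km²
]


def _classify(value, limits):
    """Return the first scale whose upper bound contains the value."""
    for upper, name in limits:
        if value <= upper:
            return name
    return "global"


def scale_from_length(length_m):
    """Classify a linear feature by its length in metres."""
    return _classify(length_m, LENGTH_LIMITS_M)


def scale_from_area(area_m2):
    """Classify an areal feature by its area in square metres."""
    return _classify(area_m2, AREA_LIMITS_M2)
```

For instance, a 50 km way is classified as "regional", and so is a 2,500 km² relation.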

The scale, on the other hand, is used to estimate position error, and the "acceptable error" is a subjective criterion. For example, maps of some nations of Africa and South America can accept big changes, while maps of certain European nations may not.

Defining time scale of an OSM feature

Assigning the perma_id

These are the rules for saying "OK, this OSM feature can be assigned a perma_id". Supposing we cannot assign a perma_id to every node of the OSM map, because there is a "preservation cost", we must reduce or avoid excess.

Supposing every element has passed a simple "stability check" and has potential watchers before assignment, there are two main ways to assign:

Human decision: a voting poll in the local community of watchers at the relevant scale (city or country).

Ideal and practical position-reference

The "has error-position reference" property (see the beginning of the page) ensures that an OSM feature has not changed its position, whether through OSM user edits to the map or through some natural evolution of reality.

The ideal is a transformation like TopoJSON, ST_Simplify, etc. For a practical, low-cost implementation, however, the only "last position in the map before changes" that we need to check is the centroid or a similar representative point (e.g. PostGIS's ST_PointOnSurface) or the BBOX, validating changes against some error-position criterion (see "Defining spatial scale" above).
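The low-cost check described above could look like the following sketch. It assumes the representative point (or the BBOX) of the feature before and after a change is already available as latitude/longitude in degrees, and it compares the great-circle distance between the two positions with an admissible error in metres. All function names and the error threshold are hypothetical, and the BBOX centre ignores features that cross the antimeridian.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bbox_center(bbox):
    """Centre of a bbox given as (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return ((min_lat + max_lat) / 2, (min_lon + max_lon) / 2)


def position_is_stable(old_point, new_point, max_error_m):
    """True if the point moved no more than the admissible error."""
    return haversine_m(*old_point, *new_point) <= max_error_m
```

A feature whose representative point moved by 0.001° of latitude (about 111 m) would pass with an admissible error of 200 m and fail with 50 m. Choosing the threshold per kind of object is the subjective part discussed in the spatial-scale section.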

The "change validation" algorithm is not so simple... It can be implemented at one or at several points of the workflow:

In an OSM editor: ideal as a pre-processing step, with some basic validation and a warning to the user...

In the OSM Editing API: the correct place to ensure continuous control and quality.

In a quality-control tool: a "long time" checker (e.g. yearly) and review task. Low system impact and low software cost, but high human cost. Ideal for first experiments with perma_id.