User:Vid the Kid/On ref formats

There has been a fair amount of discussion over the last several months regarding what to do with ref tags on ways, in the presence of route relations. There are three common suggestions: keep refs in precise, "machine-readable" formats; keep refs in natural, concise formats; or drop ref tags from ways altogether. A fourth suggestion is to keep refs in formats that are both "machine-readable" and "human-readable", but that is really a variant of the first suggestion, with the assertion that "machine-readable" formats are or can be simple enough for human consumption. I strongly prefer the second approach myself, and I intend to support that preference here by listing the pros and cons of each approach, from different viewpoints. But first I need to set out some definitions, observations, and general viewpoints.

Definitions

User

A human who consumes OpenStreetMap data or, much more commonly, products such as maps or directions derived from OpenStreetMap data.

User agent

A computer program which processes OpenStreetMap data, possibly in addition to data from other sources, to produce a product designed for human consumption; typically, a map renderer or routing/directions engine.

Mapper

A human who generates OpenStreetMap data.

Local mapper

A human who generates OpenStreetMap data from first-hand knowledge and experience of the area being mapped.

Network

A particular class of route designation. In North America, the network of a route is usually identified on signs solely by the shape and color of the route marker, which contains the route's numeric (or alphanumeric) designation. When writing or typing a route designation, however, people often represent the network by a prefix, such as "I-" or "SH". This prefix is usually an abbreviation of the spoken form of the network. In Europe, it seems that the network of a route is almost universally identified by a single-letter prefix which could be considered a part of the route's numeric or alphanumeric designation. This format is used in speech, in written and typed designations, and on the actual signs themselves.

Machine-readable

A strict syntax of writing one or more route designations in a way's ref value, so that a machine can unambiguously parse the routes identified. The syntax would have a defined delimiter to separate multiple values (probably a semicolon or semicolon-space sequence), and a defined delimiter (such as a space) or lack thereof between a network prefix and a numeric route designation. The network prefix would come from a defined list, and may be defined to be blank for some networks.
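To make the definition concrete, here is a minimal sketch of what parsing such a syntax might look like. It assumes a semicolon delimiter (optionally followed by a space), a single space between prefix and number, and a made-up prefix list; `KNOWN_PREFIXES`, `ROUTE_PATTERN`, and `parse_ref` are hypothetical names for illustration, not part of any agreed convention.

```python
import re

# Hypothetical list of allowed network prefixes; a real convention would define its own.
KNOWN_PREFIXES = {"I", "US", "SR", "CR"}

# One route: optional prefix followed by a single space, then an alphanumeric designation.
ROUTE_PATTERN = re.compile(r"^(?:([A-Z]+) )?([0-9]+[A-Z]?)$")


def parse_ref(ref):
    """Split a ref value on semicolons and parse each part into (prefix, number).

    Raises ValueError for any part that does not follow the assumed syntax.
    """
    routes = []
    for part in ref.split(";"):
        part = part.strip()  # tolerate "; " as well as ";"
        match = ROUTE_PATTERN.match(part)
        if match is None:
            raise ValueError(f"not in the assumed syntax: {part!r}")
        prefix = match.group(1) or ""  # a blank prefix is allowed for some networks
        if prefix and prefix not in KNOWN_PREFIXES:
            raise ValueError(f"unknown network prefix: {prefix!r}")
        routes.append((prefix, match.group(2)))
    return routes
```

Under these assumptions, `parse_ref("I 80;I 90")` yields two separate routes, while casual values such as "US 1-9" or "I-80/90" are rejected with a `ValueError`.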

Human-readable

A style of writing one or more route designations that can be understood by a human. Since even machine-readable ref values will likely be written by humans, they will also technically be human-readable, though often not well-suited for human consumption.

Human-friendly

A style of writing one or more route designations that may have a consistent syntax, but closely resembles how a human might identify the routes.

Locally natural

A casual style of writing one or more route designations, without a fixed syntax. The goal is to identify route designations as a local human would, without regard for machine-readability. Multiple routes of the same network may be collected, for example "US 1-9" or "I-80/90". If a way is part of many routes, some of the less-important routes may be omitted from the ref value in this style, for the sake of clarity or brevity. This style also qualifies as human-readable and human-friendly.

Observations

Expecting User Agents to Do Extra Processing May Be Unrealistic

Beware of setting tagging conventions which suggest or require most user agents to do some processing on tag values. Hypothetically, if the OSM community "decides" to go with machine-readable ref tags on the assumption that most user agents will make them more human-friendly, the desired human-friendly output will depend on each user agent independently developing the necessary processing code. It's much more likely that any given user agent will simply pass the machine-readable ref directly to the user, and justify the lack of processing with the observation that machine-readable ref tags can be deciphered by a human, even if they're not how a human would naturally express the information.
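As a rough illustration of the processing each user agent would have to write for itself, the sketch below turns already-parsed routes into a human-friendly label. The template table, the function names, and the choice to show at most two routes are all assumptions made for this example; it also assumes the routes arrive ordered by importance, which the ref tag itself does not guarantee.

```python
# Hypothetical display templates keyed by network prefix; every network a
# user agent wants to render nicely needs its own entry.
DISPLAY_TEMPLATES = {
    "I": "I-{number}",
    "US": "US {number}",
    "SR": "SR {number}",
}


def friendly_label(prefix, number):
    """Format one (prefix, number) pair for display, falling back to the raw form."""
    template = DISPLAY_TEMPLATES.get(prefix)
    if template is None:
        # Unknown network: pass the machine-readable form through unchanged.
        return f"{prefix} {number}".strip()
    return template.format(number=number)


def friendly_ref(routes, limit=2):
    """Join at most `limit` route labels with slashes to keep the output short."""
    return "/".join(friendly_label(p, n) for p, n in routes[:limit])
```

Even this small sketch needs a per-network table that someone has to maintain, which is the kind of work a user agent can avoid by passing the raw ref straight through.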

There's precedent here. Apparently it was "decided" that street names should never be abbreviated in the OSM data, even if all of the street signs abbreviate the street name in the same way. (Actually, I don't think such a situation was envisioned by those making the decision, though it's pervasive in the US.) Part of the reasoning was that abbreviations should be done by the user agent. I have yet to see a renderer or directions engine that automatically abbreviates OSM street names.

Route Relations are Good Things

Route relations are good. They can be used to describe designated routes in better detail than simply using the ref tags on individual ways. Different routes that share the same way are kept separate in the data, so one doesn't disrupt the other. Directionality can be specified in a straightforward manner (no pun intended). Creating route relations is quick and easy in the Potlatch editor. If you're going to go to the trouble of "fixing" the format of the ref tags all along a route, you might as well create a route relation while you're at it, whichever editor you use. This is why I expect route relations to become ubiquitous in OpenStreetMap data; they already are in some places. Whatever convention is chosen for ref tags on ways, it should make sense alongside complete route relations.

Viewpoints

Mappers

Mappers usually know their home areas well, and want OpenStreetMap to map their home area as accurately as possible. At the same time, mappers tend to want OpenStreetMap to be generally consistent around the planet, or at least across large regions. Occasionally, a local mapper's idea of accuracy might appear to conflict with a non-local mapper's idea of consistent mapping practices. The solution is to communicate well, so that the commonly-accepted practices can be applied to local knowledge of the area. That's what this wiki is for. In the case of 'ref' tags, what's "commonly accepted" is currently the subject of debate.

User Agents and Programmers Thereof

User agents which care about route designations (essentially, map renderers and routing/directions engines) can process and present these designations in a number of ways:

Take the way's ref value and present it directly to the user. Every user agent I've seen, other than CycleMap, does exactly this.

Search the ref value for a multi-value delimiter (semicolon) and split it into multiple values, then:

Present each ref value directly to the user.

Attempt to strip the network prefix from each ref value, and present each numeric designation to the user.

Attempt to parse each ref value into a network and numeric designation, then present each route symbol in a natural way. (For directions, "natural" would likely be the written form of the route designation, using a common prefix which may or may not be the prefix used in the actual ref tag. For a map, "natural" could be that, or it could be the route's numeric designation inside a symbol whose shape and color are determined from the network prefix and which attempts to resemble actual route marker signs.) This is the ideal way to treat machine-readable ref tags on ways, but it requires careful consideration on the part of the programmer to write code that works for all situations and is tolerant of noncompliant syntax.

Assume the ref tag has only one value, attempt to strip any network prefix, and present the numeric designation to the user.

Assume the ref tag has only one value, attempt to parse it into a network and numeric designation, then present the route symbol in a natural way.

Iterate through the route relations of which the way is a member, and for each:

Present only the relation's ref value to the user. CycleMap essentially does this.

Use the relation's network and ref values to present the route designation in a natural way. (For directions, "natural" would likely be the written form of the route designation, using a common prefix which may or may not be the value of the relation's network tag; this could be achieved using a lookup table, or guessed by taking everything after the last colon in the network value, if present. For a map, "natural" could be that, or it could be the relation's ref value inside a symbol whose shape and color are determined from the network value and which attempts to resemble actual route marker signs.) Since the route's network and numeric designations are already separate in the data, and separate from those of other routes on the same way, the user agent does not need to attempt parsing complex and error-prone strings.
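The relation-based approach can be sketched roughly as follows. This is only an illustration: the lookup table entries, the function names, and the representation of each relation as a plain dictionary of tags are assumptions, and a real user agent would obtain relation membership from whatever data model it uses.

```python
# Hypothetical lookup table from network tag values to written prefixes.
NETWORK_PREFIXES = {
    "US:I": "I-",
    "US:US": "US ",
}


def written_designation(network, ref):
    """Build a written route designation from a relation's network and ref tags."""
    prefix = NETWORK_PREFIXES.get(network)
    if prefix is None:
        # Fallback guess: everything after the last colon, followed by a space.
        guess = network.rsplit(":", 1)[-1]
        prefix = f"{guess} " if guess else ""
    return f"{prefix}{ref}"


def designations_for_way(relations):
    """Return written designations for each route relation a way belongs to.

    `relations` is assumed to be a list of tag dictionaries, one per relation.
    """
    return [
        written_designation(tags.get("network", ""), tags["ref"])
        for tags in relations
        if tags.get("type") == "route" and "ref" in tags
    ]
```

Because each relation already carries its own network and ref, nothing here has to split or pattern-match a combined string; the only guesswork is the fallback prefix for networks missing from the table.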

Which approach is used depends on a few factors: the intended use of the user agent's output, how much time a programmer has to develop the user agent, limitations of the user agent's operating environment, and the programmer's perceptions of prevailing tagging practices and the completeness of route relations.

Highway / Transport Enthusiasts

Highway and transport enthusiasts can be users or mappers. Often, they are both. For academic reasons, it is important to these people that routes be represented completely and correctly. Route relations are ideally suited for this purpose, in a data storage context. However, very few current user agents present information from route relations in their output.

Casual Users

The casual OpenStreetMap user consumes only the output of user agents, and not raw OSM data, for some practical purpose. Often, this practical purpose is to navigate from point to point. Casual users do not need (or want, usually) to know everything about a road; only some basic characteristics (such as a rough idea of its importance) and a way to identify it (a name and/or a route designation). If many routes overlap on the same piece of road, a casual user is probably only interested in one or two of the more important ones. A map or driving instruction that attempts to display many route designations for a single piece of road, particularly in a syntax with the requirement to be unambiguous, will seem cluttered and difficult to read for a casual user.

Pros and Cons

All of these pros and cons are to be considered in the context of a situation where route relations have been completed.

For Mappers

Machine-readable ref values

Human-friendly ref values

No ref tags

Pros

One clearly correct syntax for ref values

Opportunity to produce maps that match well with local vernacular

No redundant route tagging

Cons

For ways that are part of many routes, ref values can get long and error-prone

Data is redundant between way refs and route relations

Remote potential for edit wars over ref formatting (though this is more often caused by current debate over strict vs casual refs)

Removal of data may seem "just wrong"

For User Agent Programmers

Machine-readable ref values

Human-friendly ref values

No ref tags

Pros

Strict syntax theoretically makes ref values easier to parse, if user agent opts to do so, without need to handle concept of relations

Refs ready-made for human consumption can be output with no processing at all

Cons

Writing code to accommodate possible nonconforming ref values can be a programming headache, if ref tags are used rather than route relations

User agent will be "expected" to produce human-friendly output (from either relation tagging or the ref tag), possibly requiring many network-specific formats to be coded

If user agent is meant to display field-correct North American-style route markers, concept of relations must be supported

Concept of relations must be supported

For Transport Enthusiasts

Machine-readable ref values

Human-friendly ref values

No ref tags

Pros

Entire route topologies are represented without relying on relations (though relations are still better than way refs for that task)

For Casual Users

Increased likelihood of route designations presented in a way that instantly makes sense to the end-user

Cons

Some user agents (the ones that don't process refs into human-friendly formats) could produce route designations that clutter the map/directions and/or are nontrivial to decipher

Machine-readable syntax for Interstates would likely be "I 2" or "I2"; without a hyphen, the letter I could be mistaken in print for the numeral 1, making the designation easy to misread. (However, if the user agent processes the ref into a human-friendly format, a hyphen would likely be introduced.)

Some user agents could produce output with no route designations at all

Conclusion

If one gives each "pro" a value of +1 and each "con" a value of -1, with equal weighting independent of viewpoint, then "human-friendly refs" wins with a total of 1. "No ref tags" has a total of -1, and "machine-readable refs" has a total of -3. These totals may change in the event new pros or cons are brought to my attention.

Of course, my opinion was strong before I listed and totalled the pros and cons. I believe that human-friendly ref values are the way to go. Specifically, I support "locally natural" refs, in cases where the mapper knows how locals refer to roads. The reasons behind my preference are well-represented in the pros and cons above, though I'm not sure it's fair to give equal weight to all viewpoints. I am an OSM mapper and a highway enthusiast. I am also a programmer, though I have not yet technically contributed any code to an OSM user agent project. Yet I give significantly less weight to those viewpoints than to the viewpoint of the casual user. We want OpenStreetMap to help people, right? We want it to be useful to everyone, not just roadgeeks, GIS folks, and open-source fans. The way we tag has both direct and indirect impacts on the experience of the casual user, and we really should keep that in mind when we decide on tagging guidelines.

Reactions and Additional Input

I welcome input from others on this matter, especially additional pros or cons, or viewpoints I might not have envisioned. But, as this document is in my user space, please don't edit this page directly. Instead use the discussion page (remember to begin each new thread with a heading) and I'll make a good-faith effort to integrate others' ideas into this page.