tyr_asd's diary

Recent diary entries

Want to get an idea about what contributions are happening in OSM in your region? Maybe you're even looking for a way to better review map changes (hint, hint)? A good starting point is probably my latest-edits page tool. It displays all objects that have been modified during the last week alongside their respective changesets.

One major drawback of the tool was that, until recently, deleted map objects (and changesets consisting solely of deletions) were not displayed on the page. Now, deleted objects are displayed as faint "ghostly" outlines on the map. In the same way, you can now also see what modified objects looked like before the respective contributions happened. This can result in a nice-looking "shadow" effect when, for example, buildings have been realigned to match improved aerial imagery.

Also, you can now select between the latest changes from the last day, week, or month, and directly zoom to the location of a particular changeset:

The tool still doesn't show intermediate states of objects that have been modified more than once in the selected time period, nor does it show modifications to relations.

Programmatically, this new version is implemented using an Overpass augmented diff query (switching over from a "regular" OverpassQL query using the newer statement) and some massaging of the returned data to get it properly into shape (it sorts the data into two buckets – one for the state of the data before the respective edits and one for the current state – then uses osmtogeojson to generate GeoJSON which can be put on the map).
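The bucket-sorting step can be sketched like this (a simplified illustration in plain JavaScript; the real tool works on the XML response of the Overpass API and feeds the two buckets through osmtogeojson – the action structure used here is a made-up stand-in):

```javascript
// Sort augmented-diff actions into two buckets: the state of the data
// before the edits and the state afterwards. The {type, old, new}
// action shape is a hypothetical simplification for illustration.
function bucketize(actions) {
  const before = [];
  const after = [];
  for (const action of actions) {
    if (action.type === 'create') {
      after.push(action.new);
    } else if (action.type === 'delete') {
      before.push(action.old);
    } else { // 'modify'
      before.push(action.old);
      after.push(action.new);
    }
  }
  return { before, after };
}
```

Each bucket can then be turned into GeoJSON separately, which is what makes the "ghost" and "shadow" renderings possible.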

Tomorrow is the 10 year anniversary of OSM's API version 0.5. This is the version of the OSM-API that first exposed (among other things) the version number on all OSM objects, making it possible to access the full history of every object modification from this point onward.

This means that very soon, the full history planet file published on planet.osm.org will contain more than 10 years of editing history which can be investigated, evaluated and analyzed (using tools like the OSM history database oshdb that's currently under development at HeiGIT at the University of Heidelberg, which I presented earlier this year at State of the Map).

Of course, OpenStreetMap as a project has existed for a bit longer than that (about 13 years now), and quite a bit of data had already been mapped before the OSM API 0.5 was introduced 10 years ago.

Here's an interesting side note: there already was a history call in OSM's API 0.4 (and apparently even in 0.3, see the comments below), but unfortunately this historic data apparently wasn't preserved in the newer versions of the OSM API, meaning that it is also not available in the (relatively) easy-to-use full-history planet dumps which are available nowadays. As far as I can tell, this "prehistoric" OSM data has basically been lost (though some of it might be reconstructible by analyzing the list of very old planet dumps on planet.osm.org). Does anyone know what exactly happened to that data back then?

The latest installment of my yearly osm node density visualization is now online: https://tyrasd.github.io/osm-node-density shows the freshest data from mid-2017 (results from previous years, starting with 2014, are also available in the site's layer selection menu).

Brand new on overpass-turbo.eu: in addition to storing queries locally in your browser, you can now also save your important queries to your OSM account and access them from anywhere. Here's the new option visible in the save dialog:

OpenStreetMap's standard map layer is used by many people each day. OSM even provides a dump of the access logs, from which one can see which parts of the world are viewed how many times each day. Inspired by Lukas Martinelli's work on Parsing and Visualizing OSM Access Logs, I've worked on a web app to visualize OSM's tile access logs. In contrast to Lukas' approach, I wanted to focus on an accurate representation of the data and to make something that works for the whole (Mercator-projected) world.

I've ended up with a viewer that takes an (uncompressed) log file from planet.osm.org and produces a two-dimensional histogram for each zoom level: for example, at zoom level 6 in the viewer, each pixel represents the number of accesses of the corresponding osm.org tile at zoom level 14. That's 8 zoom levels further in – or, put another way, each 256×256px osm.org tile is represented by a single pixel in the visualization.
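The tile-to-pixel correspondence boils down to a bit of bit-shifting (the function name here is mine, for illustration):

```javascript
// Map a z14 osm.org tile (x, y) to its position in the z6 viewer:
// 8 zoom levels apart means 2^8 = 256 tiles per axis, i.e. exactly
// one pixel per z14 tile inside a 256x256 viewer tile at z6.
function tileToViewerPixel(x, y) {
  return {
    viewerTileX: x >> 8, // which z6 viewer tile the z14 tile falls into
    viewerTileY: y >> 8,
    pixelX: x & 255,     // pixel position inside that 256x256 tile
    pixelY: y & 255
  };
}
```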

The number of accesses of each tile is represented by a logarithmic color palette:
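A logarithmic mapping from view counts to palette indices might look something like this (the palette size and maximum count are illustrative values, not the viewer's actual ones):

```javascript
// Map a tile's view count onto a logarithmic palette: equal ratios of
// view counts map to equal steps in colour, so both quiet and extremely
// popular areas remain distinguishable. maxCount and paletteSize are
// made-up defaults for illustration.
function paletteIndex(count, maxCount = 1e6, paletteSize = 256) {
  if (count <= 0) return 0;
  const t = Math.log(count) / Math.log(maxCount); // 0..1 on a log scale
  return Math.min(paletteSize - 1, Math.floor(t * paletteSize));
}
```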

With this, one can compare map views before and after a specific event, for example the recent earthquake in central Italy:

But one can also see some interesting artifacts in the data, for example the large amount of tile views around null island or those (to me inexplicable) "comet tail" shaped patterns at some Russian cities. Do you have an idea where these artifacts stem from?

Making-of

warning: the rest of this post will be a bit more technical

What bugged me a bit at first was that my initial implementation was quite slow and made the webapp unresponsive. On my machine, the first version took about 40 seconds for the initial processing step (between dropping the log file onto the page and the first displayed results), which is quite a lot! Meanwhile, those calculations were blocking the main UI thread and even causing this nasty browser popup to appear:

So, what can we do about that? As always, optimizing this kind of stuff starts with some profiling and goes through multiple iterations of optimizing and refactoring with more profiling in between. In the end, I managed to cut the time down from 40 seconds to a mere 9 seconds in the current version:

40 s – initial version

24 s – low hanging fruit

15 s – optimized parser (ArrayBuffer)

13 s – default rbush accessors

14 s – web worker to render tiles (1 worker)

14 s – web worker to render tiles (4 workers)

9 s – parsing in own (single) thread (4+1 workers)

Let's go through each of these steps, but let's start with a short overview of the code structure:

Code overview

The code of the visualization isn't very intricate: it basically just parses the tile log files (which are txt files containing pairs of tile coordinates z/x/y and their respective view counts, see below), puts them into a spatial index (I'm using rbush) and finally grabs the data from the index whenever a tile needs to be rendered. (The rendering of the tiles itself is just some pixel-pushing onto a canvas, which is quick and wasn't something I had to look into much here.)

Here's what the access logs look like: (this goes on for ~6,000,000 lines, or about 100MB of data)

13/4316/2511 20
13/4316/2512 18
13/4316/2513 16
13/4316/2514 14

low hanging fruit

That's the flame graph of the first profiling session. There are clearly two distinct processing steps: one relatively flat calculation taking about 25 seconds, and another, more recursive one which took about 10 seconds. The second portion is quickly identified as rbush building up its indexes (which is already pretty much fully optimized, I'd say). But what really struck me was that the other part, corresponding to the following few lines of code, took up much more CPU:

Pretty basic string operations, as it looks at first. Looking at the profiler again reveals that, of course, there's a regular expression (/[ \/]/) in a hot loop, and converting strings to integers using the Number constructor isn't the fastest option either.
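The hot loop in question looked roughly like this (a reconstruction for illustration, not the exact original code):

```javascript
// Roughly what the slow first version did for each of the ~6 million
// log lines: a regular-expression split plus four Number conversions.
// (A reconstruction, not the original implementation.)
function parseLineNaive(line) {
  const parts = line.split(/[ \/]/); // ["z", "x", "y", "count"]
  return {
    z: Number(parts[0]),
    x: Number(parts[1]),
    y: Number(parts[2]),
    count: Number(parts[3])
  };
}
```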

optimized parser (ArrayBuffer)

Now, we're still opening a function scope for each line in the input data and working with relatively costly string operations such as split and the + operator (to convert strings to numbers). Getting rid of that was quite fun and resulted in the biggest performance gain, after which parsing was 90% faster than at the beginning!

What I ended up doing was implementing a custom parser that works on the raw byte data (using ArrayBuffers), presuming that the log files are well structured. At its heart is a for loop that walks over all bytes of the data and manually constructs the records:

One interesting line to note is currentInt = currentInt*10 + (view[i] - 48 /*'0'*/): whenever we don't see a separating character (newline, space or /), we assume that it must be a digit, whose value we can get by subtracting the ASCII code of '0' from it.
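Put together, a simplified version of such a byte-level parser might look like this (assuming well-formed "z/x/y count" lines, just as the original does):

```javascript
// Byte-level parser sketch: walk a Uint8Array of the raw log data,
// accumulate integers digit by digit and flush a [z, x, y, count]
// record on each newline. A simplified reconstruction, not the exact
// code of the tool.
function parseLog(bytes) {
  const records = [];
  let fields = [];
  let currentInt = 0;
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b === 0x2f /*'/'*/ || b === 0x20 /*' '*/) {
      fields.push(currentInt); // field separator: flush current number
      currentInt = 0;
    } else if (b === 0x0a /*'\n'*/) {
      fields.push(currentInt); // end of line: flush the whole record
      records.push(fields);    // [z, x, y, count]
      fields = [];
      currentInt = 0;
    } else {
      // must be a digit: subtract the ASCII code of '0'
      currentInt = currentInt * 10 + (b - 48);
    }
  }
  return records;
}
```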

default rbush accessors

The next optimization is a rather small one, but one I came across after the recent 2.0.0 release of rbush: apparently, it's faster to access named attributes of a JavaScript object than elements of a JavaScript array. Changing the parsed data output to something that can be digested easily by rbush shaved a few more seconds off the preprocessing.
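Concretely, the change amounts to emitting plain objects with rbush's default minX/minY/maxX/maxY properties instead of arrays with custom accessors. Converting a parsed [z, x, y, count] record into an index-ready point item might look like this (the item shape is an assumption for illustration, not the tool's exact code):

```javascript
// rbush 2.0's default accessors expect items with minX, minY, maxX and
// maxY properties; for point data like tile counts, min and max simply
// coincide. The resulting array would be loaded with tree.load(items).
function toIndexItem([z, x, y, count]) {
  return { minX: x, minY: y, maxX: x, maxY: y, z, count };
}
```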

web workers!

Even after all those optimizations, the calculations (even though they are relatively quick by now) are still blocking the UI. That isn't a big deal during the initial processing, but the main-thread implementation means that rendering the histogram tiles also blocks the browser. And even though each render is quick (typically a couple of milliseconds, up to a few tens of milliseconds), these small interruptions can add up significantly, especially when one pans around quickly or zooms in and out. It's a bit hard to see in the gif below, but trust me: it feels quite laggy!

The only solution to this issue is to do the rendering in a separate web worker thread. The implementation is a matter of refactoring the data parsing plus rendering code into a web worker and making sure that the returned data is a transferable buffer object. Using a single web worker, this is a bit slower than the non-threaded version, but not by much.

multi threading, first try

If we're running a web worker anyway, why not run multiple in parallel? That should make rendering of the tiles even faster, right? Well, not with a naive approach: as every worker needs its own spatial index and there's no way to effectively split the input data into distinct chunks that can be rendered independently later on, the total time with 4 workers is basically the same as with a single one (the overhead of duplicating the input data eats up any later gains from building up the indices in parallel).

multi threading, second try

Doing multi-threading properly in this case requires a somewhat larger refactoring, but the effort is worth it in the end: processing is an additional ~30% faster, and map panning and zooming are even smoother.

Here, I've split the data parsing into a separate web worker which runs single-threaded (this could in principle also be parallelized, but it's not worth the effort in my opinion, as this step is already quite quick at 2 seconds – though potentially one could shave off another second or so). The results of this parsing are then divided up into buckets of transferable ArrayBuffers (which are always important when working with web workers) and distributed among the rendering workers.
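The splitting step might be sketched like this (a simplified stand-in, not the tool's actual code: records are distributed round-robin into one flat typed array per rendering worker, whose underlying ArrayBuffer can then be transferred instead of copied):

```javascript
// Distribute parsed [z, x, y, count] records round-robin into one flat
// Float64Array per rendering worker. Each bucket owns its own
// ArrayBuffer, so it can later be handed to a worker via
// postMessage(bucket, [bucket.buffer]) without copying the data.
function splitIntoBuckets(records, numWorkers) {
  const buckets = [];
  for (let w = 0; w < numWorkers; w++) {
    const mine = records.filter((_, i) => i % numWorkers === w);
    const flat = new Float64Array(mine.length * 4);
    mine.forEach((r, i) => flat.set(r, i * 4)); // 4 fields per record
    buckets.push(flat);
  }
  return buckets;
}
```

Transferring (rather than structured-cloning) the buffers is what keeps the hand-off to the workers cheap; after the transfer, the main thread no longer owns the data.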

That's how deep I dared to explore this rabbit hole of code optimization this time. I hope you liked my adventure. ;)

In OpenStreetMap, tags define what an object is. Whether it is a mountain, a river, a house, or a postbox: every map feature has its own tag (or set of tags).

OSM doesn't have a fixed set of object categories. Over time, a more and more faceted and diverse set of features got mapped in OSM, and thus the number of different tags grew. At the same time, the tagging of a specific thing sometimes changes: features that used to be mapped with one tag get newer, better and more refined tags. That's OpenStreetMap evolving.

Of course, OpenStreetMap is also still growing, but not all tags are gaining usage at the same pace: for example, while it's quite possible that most of the world's railway stations are already mapped in OSM, there are still many juicy pastures left to be mapped out there.

While there are superb tools for getting to know the current state of all tags used in OSM (most notably Taginfo, but also the Overpass API to some extent), until now it was quite difficult to get a good picture of the data's evolution. For example, questions like when a specific tag came into use, when an obsolete tag was superseded by a different one, or which tags have gained traction lately are difficult to answer with OSM's current tool set.

For some of these questions, people have programmed their own solutions, each answering their own question, like how many kilometres of Italy's roads were in OSM over time(link), or how many buildings have been mapped in Austria(link). Similarly, the OSM-Analytics platform has recently started to provide such statistics for arbitrary regions for a limited set of map features (currently one can choose between buildings and roads, but there are plans to add more in the near future). What all of those tools have in common is that they can't handle the full variety of tags that's so essential to OSM.

To step into the gap between tools like taginfo (where the full variety of OSM's tags is so beautifully visible – stay tuned for Jochen's talk at SOTM in a couple of weeks!) and the more specialized tools like osm-analytics, I've created taghistory, which lets one get a historical usage graph for each of OSM's tags (with daily granularity) and compare different tags against each other:

The tool is currently in its very early stages; there are many things to do and improvements to be made. It's also important to note that the historical usage of a tag is currently defined only as the respective number (count) of OSM objects! Like the statistics produced by taginfo, this metric is subject to some limitations, most notably the effect that one cannot directly compare tag counts for different linear and polygonal features such as roads, land cover, etc., because such features are typically divided up into many OSM objects of different sizes. For example, an existing road may be divided into two pieces when a new turn restriction is added, with the result that the count of each of the tags used on the road (even obsolete ones) is increased by one in the OSM database. That means that one needs to pay close attention when comparing tags that are typically used on such features, even when comparing subtags that are typically used on the same kind of parent object (e.g. different values of the highway tag).

That being said, have lots of fun digging into the depths of OSM tags' history. Here's the link to the tool again: http://taghistory.raifer.tech/ (and the link to the project's source code repository and issue tracker: https://github.com/tyrasd/taghistory). What's your favourite tag? I find the created_by graph quite interesting:

A significant difference from the 2014 edition is that the density is now calculated as OSM-nodes per m² on the ground (as opposed to nodes per projected pixel in the previous version). If you want to learn more about why this change makes a difference, I'd strongly recommend the following article by Christoph Hormann (imagico).
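In a nutshell: the Mercator projection stretches the map by 1/cos(latitude) in both directions, so a projected pixel at higher latitudes covers much less ground area than one at the equator. A quick back-of-the-envelope helper (the function name is mine):

```javascript
// In the (Web) Mercator projection, scale grows with 1/cos(latitude)
// in both axes, so a map pixel at a given latitude covers only
// cos(latitude)^2 of the ground area an equatorial pixel covers.
// This is why "nodes per projected pixel" overstates density at
// high latitudes compared to "nodes per m^2 on the ground".
function groundAreaFactor(latDegrees) {
  const lat = latDegrees * Math.PI / 180;
  return Math.cos(lat) ** 2;
}
```

At 60° latitude a pixel covers only a quarter of the ground area of an equatorial pixel, so the same on-the-ground node density would look four times higher there in a per-pixel rendering.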

The updated slippy map lets you choose between the layers of different years (always created around the end of June of each year) as well as a display of what changed during each 12-month interval. For example, you can see that in North America there were some imports as well as some major import cleanups going on during the last year:

overpass turbo has been around for a little over two years now. In this time, it arguably changed how developers and mappers interact with OSM data. Let us take this opportunity to look back and take a glimpse at some statistics:

users

The user base has more than quintupled from the initial group of early adopters, as can be seen in the following Piwik graph:

Note that the actual absolute number of visitors is likely significantly higher than what is reported here, because surely many of you have the do-not-track flag activated or are using tracker-blocking software in your browsers. Speaking of which – as of today you can also opt out of any tracking on overpass-turbo.eu by simply switching it off in the settings dialog under the privacy tab.

shared queries

Shortly after its release, overpass turbo got the ability to share queries in the form of short URLs (e.g. http://overpass-turbo.eu/s/4). Here is some insight into what queries people have been sharing since then:

This map shows the locations associated with each shared query:

Of course, central Europe is quite the center of activity, but in general the tool seems to be used all over the planet, which is nice.

The next thing we're looking at is the two query languages. In the beginning, overpass turbo preferred the Overpass XML variant (in code examples and queries generated by the wizard). Later, this default was switched over to the QL query language. This can be seen in the following graph: red is XML, blue stands for QL, and brighter colours stand for queries that are taken or derived from output produced by the query wizard. Each column represents a set of 512 consecutively shared queries. Note that this means the x-axis isn't a linear function of time [timestamps are not stored in the short-URL database, as they aren't needed to provide the service].

One can immediately identify two main events: first, the introduction of the query wizard in December 2013, and second, the above-mentioned switch from XML to QL as the default query language in October 2014.

Another interesting fact: about 10% of all shared queries use some amount of MapCSS styling.

The question now is what will the next few years bring? Let's find out! ;)

geometry options

out center; – this additionally prints the center coordinate for every OSM object

out bb; – this additionally prints the bounding box coordinates for every OSM object

out geom; – this additionally prints the full coordinates of every OSM object

The first two options are particularly useful if one is only interested in the approximate location of some features rather than their exact outline. For example, one finds that POIs are often mapped on building outlines. By requesting only the center coordinates (out center;), one saves transfer bandwidth and gets an overall quicker query.

Here is an example of how this looks (note that out of the 7 displayed POIs, 6 are mapped on ways in OSM): (try it on overpass turbo)

The full geometry option (out geom;) replaces the need to use object recursions to get the geometry of a certain OSM way or relation. This also saves bandwidth and generally comes with faster query execution times. (In this example it saves almost 50% of the data.)

With osmtogeojson version 2.2.0, all this is also available on your command line or as a javascript library to use in your projects!

regular expressions for keys

I know that lots of people have requested this feature, and now you can finally use regular expressions on keys, for example to get all nodes with some kind of name tag:

node[~"(^|_)name($|_|:)"~".*"]({{bbox}}); out;

Here, the additional tilde in front of the key indicates that the regex search should be applied to the key as well.

This is also accessible via overpass turbo's query wizard. Try this for example: building=* and ~"addr:.*"~".*"

Overpass QL is now the default

Queries generated by the wizard or from a template, as well as the query examples, are now printed in Overpass QL. Overpass QL has a more concise syntax, is faster to write, and more and more documentation and help is available for it. (Of course, anyone can still continue to write, use and execute queries in the older XML syntax.)

Each pixel shows the number of nodes in its corresponding area¹. But this year, every point that has data is shown (i.e. there is at least one node at that location – last year, only locations with more than 1000 nodes were included). Also, the slippy map has two more zoom levels, which reveal even more impressive details, like on this crop of the central Alps:

Here is a low-zoom image of the whole planet:

¹ Yes, this is Mercator map tile area, not actual on-the-ground area. Keep this in mind when comparing regions at different latitudes!

PS: The visualizations are based on a planet file I downloaded one or two weeks ago. It was processed using some custom scripts based on node-osmium, the graphics were made with gnuplot (just like last year's) and finally the map tiles for the slippy map were cut using imagemagick. I could probably explain the individual steps in a separate blog post if anyone is interested – let me know!

Text Markers in MapCSS

You can now finally use MapCSS to display text markers on the map. This is very useful for showing the names of displayed features. The following MapCSS statement adds text markers to each node, way or relation on the map and displays whatever is in the name tag:

Export to Level0

Overpass turbo can now export data to the Level0 editor. Level0 is a very lightweight, low-level textual online OSM editor. It can be very handy, for example, when inspecting the tags of a collection of similar objects (such as when checking the names of all restaurants in town).

Additional Stats

When you hover over the data-counter box (which shows you how many elements have been loaded and are displayed), you are presented with some additional statistics. Currently, only the lag of the Overpass API server behind the main OSM database is shown, but there is room for more. Is there something you would like to see there? Let me know!

Wizard as URL-parameter

You can now create links to overpass turbo that use the query wizard to create a query (which is loaded on startup and presented to the user). An example is http://overpass-turbo.eu/?w=pub+in+dublin. Such URLs to overpass turbo are both quite short and at least somewhat human-readable.

overpass turbo's query wizard just got a little bit more useful! It understands not only OSM tags, but even interprets object classes: just use simple terms such as Restaurant, Museum, Hotel, Hospital or any other thing the iD editor already has a preset for.

This even works in your own language, as you can see in the following screenshot (Trinkbrunnen is German for "drinking fountain"):

Btw: if you can read German, you can find some more information about this in my recent guest blog article on blog.openstreetmap.de.

overpass turbo is a helpful tool for hundreds of mappers every week. But it could be an even better tool for even more people if only there were a way to assist with the creation of actual Overpass queries. You know that it's quite tedious to type all those queries that are mostly the same every time. Not to mention that one must remember all those Overpass statements and their parameters.

But here comes a way that makes overpass turbo both easier to use for beginners as well as quicker to use for experienced data-miners:

The Query Wizard

Here is an example: to get restaurants, now all you have to do is fire up the Wizard from the toolbar (or use the keyboard shortcut Ctrl+I) and type in the appropriate tag amenity=restaurant:

The wizard is quite powerful: it understands different kinds of tag searches as well as some metadata filters, which can be joined together with logical operators like and or or.

By default, data is searched in the current map viewport. But the Wizard also recognizes some other location filters. For example, you can simply write tourism=museum in Vienna and the query will work just magically.

New Overpass-Shortcuts

If you tried the museums in Vienna example (tourism=museum in Vienna) from above, you may have noticed the following line in the produced Overpass query:

<id-query {{nominatimArea:Vienna}} into="area"/>

As the Overpass API doesn't really know which Vienna it should search in, we ask Nominatim instead. The {{nominatimArea:Vienna}} part is then replaced with the details of the Nominatim search result (just like {{bbox}} is replaced with the bounding box coordinates).

Apart from the just-mentioned nominatim shortcut (which actually comes in several flavours depending on your use case), there is also a new {{date:*}} shortcut, which allows one to specify relative dates (typical use case: <newer than="{{date:1 week}}"/>). Read more in the docs.

A while ago, the SASA (a local public bus service operator in South Tyrol) gave us the permission to import their bus stops from opensasa.info into OSM.

Here is the result:

Btw: In the meantime SASA is working on getting their real time bus information out to the public. Take a look at their public beta! And of course they are using a beautiful OpenStreetMap background map. :)

It's been a while since I last reported about new features in overpass turbo. Surely, the tool got a noticeable amount of new features in the meantime, so I guess it's time for an update.

export / gist / geojson.io

The export tool got the most visible changes: it now has a cleaner interface, provides downloadable content (via FileSaver.js) and – brand new – the possibility to publish the data directly as a Gist (the pastebin service by Github that loves maps):

Via Github Gists, it's easy to edit an OSM dataset with geojson.io, the online GeoJSON editor. For example, you can go ahead and export your favourite Danish island to create your own geojson.io map based on it:

translations

overpass turbo is now translated through Transifex, an easy to use online translation tool.

For a start, overpass turbo already got a brand new Danish translation by Søren Johannessen (making Danish the third language in which overpass turbo is available, after English and German)!

If you'd like to help translate, simply create an account on Transifex and head over to the project page there.

Edit 2013-09-24: The first batch of translations just went online: Russian, Vietnamese, Croatian, Dutch and French. Huge thanks to all contributors! :)

Edit 2013-10-19: Meanwhile, the next two languages are active: Italian and Japanese. Thanks!

Edit 2013-12-12: Spanish follows.

PS: Just drop me a message if you need me to activate another language that's not already on the list.

Now, you may be looking for a tool to keep track of all the progress being made. Here it is! It lets you compare the new osm-carto tiles with a rendering that is almost the same as the old OSM style.

The reference rendering is provided by the Wikimedia toolserver, so please note that it may not be as up-to-date as osm.org, and there may be glitches that are not related to the stylesheet itself.