joost schouppe's diary

Over the past year, the Belgian community was involved in organizing 10 mapathons. It is an incredibly easy thing to do, once you have the documentation in order. And once you realize you should do as little as possible - just find people who have a location and a recruiting network.

Some time ago, Pascal Neis wrote an article about new mappers recruited through classic channels, Maps.me and humanitarian mapping. I asked and got a changeset dump of all the people who participated in our mapathons.

Here's some stats about that.

Overall, 1925 unique mappers participated in our mapathons, of whom 328 were new mappers.

First, did we manage to turn them into returning mappers?
Well... As could have been predicted by Pascal's depressing numbers: not really. The data used was from December 2016. You can clearly see that the percentage having more than one mapping day drops as we approach December. That simply means you need to wait a bit before you can do a decent analysis.

Say we give people 3 months, then we only look at the edits from September and before. We got 23% of people to map more than once! 10% mapped 3 days or more.
Unfortunately, that's even slightly worse than the international average. Maybe we just worked for a more difficult audience :)
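For anyone who wants to repeat this on their own changeset dump, here is a minimal sketch of the calculation. It assumes the dump has been reduced to (username, date) pairs, which is a hypothetical layout and not necessarily what the real dump looks like. The names and dates are made up. The cutoff keeps only mappers who started early enough to have had time to come back.

```python
from collections import defaultdict
from datetime import date


def retention(changesets, cutoff):
    """Return the share of mappers with >= 2 and >= 3 distinct mapping days.

    changesets: iterable of (username, date) pairs (hypothetical layout).
    cutoff: only mappers whose first mapping day is on or before this date count.
    """
    days = defaultdict(set)
    for user, day in changesets:
        days[user].add(day)  # several changesets on one day count as one mapping day

    cohort = [d for d in days.values() if min(d) <= cutoff]
    if not cohort:
        return 0.0, 0.0
    returning = sum(1 for d in cohort if len(d) >= 2) / len(cohort)
    regular = sum(1 for d in cohort if len(d) >= 3) / len(cohort)
    return returning, regular


# Made-up sample data, only to show the shape of the input.
sample = [
    ("anna", date(2016, 3, 5)), ("anna", date(2016, 3, 5)),
    ("bram", date(2016, 4, 1)), ("bram", date(2016, 6, 9)),
    ("cleo", date(2016, 5, 2)), ("cleo", date(2016, 5, 3)), ("cleo", date(2016, 8, 1)),
    ("dirk", date(2016, 11, 20)),  # started after the cutoff, so left out
]
returning_share, regular_share = retention(sample, date(2016, 9, 30))
print(f"{returning_share:.0%} mapped on more than one day, {regular_share:.0%} on three or more")
```

Counting distinct days rather than changesets matters here: someone who uploads five changesets during a single mapathon is still a one-day mapper.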

We usually tell people to map something in their own neighborhood before starting on the mission. Fewer than 21 of them did so. And in fact, only 4 of the 328 have more than one Belgian mapping day. As a comparison, we had 2059 people mapping for the first time in Belgium in 2016.

Even if that all sounds thoroughly depressing, it should be noted that organizing mapathons still is a great way to build a community, even if it doesn't show in these numbers. The mapathon movement was crucial in turning mappers into organizing volunteers. Especially the two interuniversity mapathons (with 200 participants last year and over 300 this year) are momentum-building moments. For the State of the Map in Brussels, we somehow managed to recruit 20 Belgian mappers to help out. That would have been impossible without the mapathons.

Apart from that, the constant confrontation with people who don't have any idea about OpenStreetMap is a stark reminder that we should all keep up the missionary work.

There are some very clear patterns there. I really like how you can connect the dots for contributors of the "default editor" at the time: first Potlatch, then Potlatch2, then iD. All three of them reached 80% market share at their peak. But iD went down in relative terms because of Maps.me. That could only happen if Maps.me editors don't use iD much. That's a good thing, as it shows they are new mappers. And it's a bad thing, as it shows that we haven't (yet) succeeded in getting them more deeply involved in OSM.

To make some of that more clear, here's three more charts.
Changesets per contributor show that JOSM users are quite productive. There's also a very clear growth path for JOSM users. Merkaartor has a similar pattern. Maps.me hardly shows, with just 4 changesets per contributor.

Some changesets are bigger than others. JOSM changesets are the biggest. Potlatch2 changesets are somewhere in the middle, and iD changesets are quite small. The average Maps.me changeset has only 2 changes.

So what's the overall productivity of contributors? Here JOSM is quite extreme.

Note that this doesn't say anything about quality or amount of work. For example a JOSM changeset editing thousands of objects could have been made in minutes. Someone could have surveyed a day to collect ten POIs and map them with iD.
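The three measures used in these charts are simple ratios. As a rough sketch of how they relate, assume each changeset is available as an (editor, user, number of changes) tuple. That layout and the sample numbers are invented for illustration, not taken from the real data.

```python
from collections import defaultdict


def editor_productivity(changesets):
    """Compute per-editor ratios from (editor, user, num_changes) tuples.

    The tuple layout is an assumption for this sketch.
    """
    users = defaultdict(set)
    n_changesets = defaultdict(int)
    n_changes = defaultdict(int)
    for editor, user, num_changes in changesets:
        users[editor].add(user)
        n_changesets[editor] += 1
        n_changes[editor] += num_changes

    stats = {}
    for editor, contributors in users.items():
        stats[editor] = {
            "changesets_per_contributor": n_changesets[editor] / len(contributors),
            "changes_per_changeset": n_changes[editor] / n_changesets[editor],
            "changes_per_contributor": n_changes[editor] / len(contributors),
        }
    return stats


# Invented numbers, only to show the calculation.
sample = [
    ("JOSM", "a", 500), ("JOSM", "a", 300),
    ("iD", "b", 10), ("iD", "c", 20), ("iD", "c", 30),
]
productivity = editor_productivity(sample)
for editor, values in productivity.items():
    print(editor, values)
```

The third ratio is just the product of the first two, which is why an editor with many large changesets per user ends up looking so extreme.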

As one of the few remaining Potlatch users, I had to make this graph too:

As Potlatch2 lost the status of default editor, the remaining users became ever more productive. That makes sense, because "low engagement" contributors won't find the way to that editor. So the only relevant numbers are those for 2011 and 2012. And compared to that, the low numbers for iD are striking. Low numbers may mean that more people with less motivation can be pushed to make at least one edit, so you can call that a success. This is the argument for calling Maps.me editing a huge success. But it can also mean that the editor isn't as inviting to work on more stuff than just on the thing you wanted to do. Anyway, a much deeper analysis would be necessary to draw any conclusions on that. You'd have to take account of previous mapping experience, later shifts to JOSM, and possible differences between 2011 and 2016 newbies, to name just a few controls.
Also: the numbers are rising every year, even as it remains the editor for new contributors.

And then there's the good old Potlatch 1 of course. There's only one reason to open that ugly duckling: go to a place where you think something was deleted, press U, and you can see and recover it. It is amazing that no other editor has a similar feature that makes this so simple.

When OpenStreetMap started, open geodata was basically unavailable. Some governments were quicker than others to release their data. And so some places had huge imports from the start. Whether that was a good idea or not is slowly becoming irrelevant: the map is too full for big new imports anyway. Imports are ever more exercises in conflation: merging sources and using them to validate and improve existing OSM data. The good news is that it means that often the same tools for the "initial" import can be used for keeping the data up to date. Continuous synchronization between datasets changes the relation between data provider and OSM.

For a government, a complete and reliable OSM becomes a more valid tool for their projects. The synchronization processes we set up can form the basis for an extra quality assurance (QA) channel for governments. It might even convince some agencies that there is little to be won by managing some of their data on their own.

During the course of the research for that page, I met Tomas Straupis. I wanted to share what he told me about what they do exactly with government data, and what their relationship is with the government.

Interview with Tomas Straupis

Here's a general idea what we're doing in Lithuania.

Government has datasets d1, d2... dn. OSM has one big dataset O
which could be split into datasets o1, o2... om. We take datasets dx
and oy which could be mapped (have similar data, like placenames,
roads, lakes, rivers, etc.)

Automated importing to either direction is impossible (or not wanted
by both sides). Government datasets need strict accountability
(sources, documents) and responsibility. OSM has different data and
simply overwriting it with government data would be bad in a lot of
ways.

So the way integration between OSM and government (and actually any
other datasets) is done is by synchronisation - checking for
differences and taking action (mostly manual) on them on both
datasets. By doing a comparison both government and OSM datasets are
improved. The point here is that government datasets usually use
official (document) source to update data. OSM uses local knowledge to
update data. None of these methods are perfect, so
synchronisation/comparison helps to get most/best of both.
(as a separate note: here comes OSM strength that everything is in
one layer - it is much harder to have a road going through a lake or
building or having a street A with address B along it. Government
datasets are usually separate and controlled by different
institutions, so doing such topology checks is much more difficult
there)

For this to work government must open datasets and appoint a working
contact point where information about problems in government dataset
could be sent and where this information is ACTUALLY used and feedback given.

Do you have more info on the projects, and the software/queries you use?

All info is in Lithuanian... Maybe google translate can help with
the links to Lithuanian blog site I will provide below (if not - just
tell me I will write the general idea in English).

All OSM data is imported to postgresql database using osm2pgsql and
that is used for comparison/synchronisation.

We're doing two types of comparison/synchronisation:
1. POI (point data, for some types of polygons centroid could be used)
2. Road (multi-vector data)

For POI synchronisation we have an ugly but functional universal
comparison mechanism. We convert external data to xml file with lat,
lon and some properties (or external source provides us information in
xml for example via web-service). Then we provide mapping of this
external data to OSM data. So having external data, mapping and OSM
data we can create reports of differences.
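To make the idea of "external data, mapping and OSM data" a bit more concrete, here is a small sketch of such a comparison. It is written in Python rather than the PostGIS/PHP setup described in the interview, and it is not Tomas's code. The record layout, the `title` field, the 100 m matching distance and the sample records are all assumptions for illustration. It only checks one direction (external data against OSM); the reverse check would look the same.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def compare_pois(external, osm, tag_mapping, max_dist_m=100.0):
    """Report external POIs with no nearby OSM POI, or with differing mapped tags.

    tag_mapping maps an external field name to an OSM tag key (hypothetical).
    """
    report = []
    for ext in external:
        def dist(o):
            return haversine_m(ext["lat"], ext["lon"], o["lat"], o["lon"])

        nearest = min(osm, key=dist, default=None)
        if nearest is None or dist(nearest) > max_dist_m:
            report.append(("missing_in_osm", ext["id"], None))
            continue
        for ext_key, osm_key in tag_mapping.items():
            if ext.get(ext_key) != nearest["tags"].get(osm_key):
                report.append(("tag_differs", ext["id"], osm_key))
    return report


# Invented sample records; real data would come from the external source and osm2pgsql.
external = [
    {"id": "e1", "lat": 54.6872, "lon": 25.2797, "title": "Vilnius"},
    {"id": "e2", "lat": 54.8985, "lon": 23.9036, "title": "Kaunas"},
    {"id": "e3", "lat": 55.7033, "lon": 21.1443, "title": "Klaipėda"},
]
osm = [
    {"lat": 54.6873, "lon": 25.2798, "tags": {"name": "Vilnius"}},
    {"lat": 54.8986, "lon": 23.9035, "tags": {"name": "Kaunas city"}},
]
report = compare_pois(external, osm, {"title": "name"})
for line in report:
    print(line)
```

The output is a list of differences for a human to look at, not a set of changes to apply, which matches the point made above that automated importing in either direction is not wanted.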

So basically we use postgresql/postgis and php. If you have more
specific questions - I'm ready to answer them or send the code, just
it is a dirty code as I'm a google copy/paste "programmer"... :-)

Does the government use your input, and how? Is there something structural? Or just mailing them and hoping they care?

Lithuania is a small country, everybody knows everybody :) Now we
occasionally drink beer with "government" guys working with gis data.
So we know they do change the data. They also give us feedback which
data sets are "more important" for them, so we can prioritise
comparing those. This way both sides are happy and thankful for help.

Additionally each month we take new/updated government data and do
new comparison, so we can see that data has actually been updated.

From more or less "legal" perspective. This central government
agency for gis data allows submitting error reports online for
registered users (registration is free and open to anybody -
http://www.geoportal.lt - created according to EU directive on spatial
data). And they must check and give feedback in 20 days. We (OSM) are
in somewhat different level - we mail directly to responsible group.
One of the reasons for that is that they physically cannot fix all
errors we report in 20 days, sometimes there are too many of problems,
additionally they know report comes from a "trusted" source.

As per "structure". For point type geometry (for example place
names) we currently create a google doc online, where both sides write
comments and status of errors. When everything is fixed - we take new
updated government data and recreate that google doc.

For roads it is per-case mailing of coordinates and notes... But
there is no reason why that could not be done in more "structural"
way...

Maybe important point here is that OSM data could have some
"bad/incorrect" data entered by mappers with not enough experience.
And we do not want to make government gis people to sort/filter out
such errors. So we go through all errors ourselves and only send
those, which we think are really errors. This is the main reason why
we cannot simply "automatically" run queries and send result to
government people. There are no "technical/IT" problems to send
mismatches automatically.

About amount of work

Initial comparisons of a specific dataset usually produce a large number of differences. Some of those are due to actual differences, some are because of different ways of entering data. So the initial amount of work is usually high: both for updating data as well as fine-tuning comparison rules. After that only a small amount of work is anticipated, because the comparison simply notifies one side about a change in the other side's data.

A note from Andrius Balčiūnas, Head of IT department at GIS-Centras

Georeferenced data is created from orthophotos, but the data changes much more often than the orthophotography is updated (currently every 4 years in Lithuania). The OSM community notices the changes much faster. Therefore collaboration with OSM, and using their data for error checking, allows us to achieve higher data quality and relevancy. As this data is later used in national registries, cadastres and information systems, the OSM community helps to improve not only the specific data set, but the content quality of the whole national spatial data infrastructure.
Important thing to note here is that such a collaboration means that even small road segment or other improvement of OSM data by a community member could later appear in official government data.

A note on the ODbL license, and dealing with it. Government can use our error reports to start their own mapping process, but they can't just copy our features. Do you know what they do at your government services?

Two points here:

Government is not using/copying any features from OSM. They get
reports about problems and this simply attracts their attention on
specific features in their datasets. By using their own sources they
fix the problem. It cannot be done in any other way, because all
changes/all data in official dataset must have an approved/reliable
source. OSM triggers the process, OSM does not give any data.

Any database consists of numerous facts (features/records). Only
the whole database can be protected by law. Single facts cannot be
protected. If any database is publicly accessible, anybody can look at
some facts (place name, street name, hotel name etc.) in that
database. Then those facts become the facts they know/have in their
brain. They can use it to update/insert such data in any other
database irrespective of the permissions of original database. I'm not
a lawyer. This is what I've heard from lawyers here in Lithuania. So
in practice this means I can take this and that from ANY publicly
accessible database (even google), until I do not take "too much" of
the database that it is not just "some facts", but "a considerable
part of the database". The big question here is only what is
"considerable part of the database"...

Several people have written on the subject before: when you look at something like the evolution of road network length in OSM, the shape of the curve can tell you something about how complete the network is (on the condition that there are enough local mappers).

You can clearly see that the larger roads were mapped faster than the smaller roads.
(note: there is a bug in the OSM-history-importer which prevents deleted objects from being removed from a snapshot. This could explain the continued slight growth of main roads. When people improve roads, they will often delete small portions of them.)

Assuming they are all kind of complete now, you can show the evolution of length as a percentage of current length. This shows quite clearly that there are "mapping priorities": the 60% completion mark comes much sooner for motorways than it does for tertiaries.
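The calculation behind such a chart is straightforward. Here is a minimal sketch, assuming you have the total length per road class at the end of each year. The kilometre values below are invented and are not the Flemish numbers.

```python
def completion_curve(length_by_year):
    """Express each year's length as a percentage of the most recent year's length."""
    current = length_by_year[max(length_by_year)]
    return {year: 100.0 * km / current for year, km in sorted(length_by_year.items())}


def first_year_reaching(curve, threshold=60.0):
    """Return the first year in which the curve reaches the threshold, or None."""
    for year, pct in sorted(curve.items()):
        if pct >= threshold:
            return year
    return None


# Invented lengths in km, only to illustrate the shape of the calculation.
motorway = {2007: 300, 2008: 700, 2009: 900, 2016: 1000}
tertiary = {2007: 500, 2008: 2000, 2009: 4000, 2010: 6500, 2016: 10000}

motorway_60 = first_year_reaching(completion_curve(motorway))
tertiary_60 = first_year_reaching(completion_curve(tertiary))
print("motorway reaches 60% in", motorway_60)
print("tertiary reaches 60% in", tertiary_60)
```

Note that the whole approach depends on the assumption stated above: the percentage only means "completeness" if the current length is close to the real length.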

While this all sounds quite obvious, it really isn't if you look at the map of road evolution in Flanders. From the very beginning of mapping, contributors have been interested in small roads as well as main roads.

If we extend our view to a wider range of roads, we can see that the main roads in general got mapped first, but minor roads soon came to dominate over them. Service roads, tracks and paths (footway, path, steps, bridleway, pedestrian) tell their own story.

The graph clearly shows that main roads and minor roads aren't really growing anymore. However, the graphs for service roads, paths and tracks only seem to level off in 2014. In fact, paths and tracks go up again in 2016. In turn, that means there is a lot of mapping left to do. It is surprising to me that this holds for tracks too, as they can be mapped more easily from aerial imagery only. Open data sources of paths and high resolution aerial imagery (both provided by AGIV) could explain the uptick in the mapping of paths and tracks. Other explanations might be successful relations with the GR and Trage Wegen organisations, or increased contribution triggered by data use.

Network growth versus amount of work

One more thing I do want to share now is the amount of work that is being done. While network completeness was achieved quite fast for main roads, that does not mean that people stopped caring after it was finished. In the animated map of primaries, trunks and motorways below, gray means "existing" and black means "been worked on this month".

These edits can be anything, but here are two examples: work on naming roads and on speed limits. From the beginning of the project, most residential roads were mapped with a name. Length of unnamed residentials started decreasing as soon as 2012. It will likely never reach zero, as many small bits and pieces are hard to assign to any one street. Also, there are in fact roads that do not have a name.

For speed limits, the proportion that has a limit is much lower. Total length of untagged roads only started decreasing in 2014. This tagging is probably slower because it isn't as important for routing and is sometimes seen as a consequence of road classification and location.

Measuring edits

These graphs compare the added length for main road types (right) and the number of edits by road type (left). It is quite clear that mapping new roads peaked as early as 2008, but the amount of work done on these roads has in fact only gone up until 2014.

(Note: here, the number of edits is the sum of the number of days a certain way has been edited. The category in which it shows is the last main tag for that day.)
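In code, that definition could look roughly like this. It is a sketch under assumptions: the input is a chronologically ordered list of (way id, day, highway value) tuples, one per way version, which is a hypothetical layout and not the actual output of the history importer.

```python
from collections import Counter


def edit_days_by_class(versions):
    """Count one edit per (way, day), filed under the last highway value seen that day.

    versions: iterable of (way_id, day, highway_value), in chronological order.
    """
    last_tag = {}
    for way_id, day, highway in versions:
        # A later version on the same day overwrites the earlier one.
        last_tag[(way_id, day)] = highway
    return Counter(last_tag.values())


# Invented history: way 1 is reclassified during its first day, then edited again later.
versions = [
    (1, "2014-05-01", "tertiary"),
    (1, "2014-05-01", "secondary"),
    (1, "2014-06-10", "secondary"),
    (2, "2014-05-01", "primary"),
]
edit_counts = edit_days_by_class(versions)
print(edit_counts)
```

Counting days instead of versions keeps one very busy editing session from dominating the numbers.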

These two graphs show the type of changes for primary and tertiary roads. Traditionally, geometry changes are the most important. As time goes by, their importance starts to lower, and editing tags becomes more important.

What's next?

As usual, I'm torn between answering more and more questions with the data, or scaling it up to more areas. Luckily, for your basic statistics needs, more and more options are finally popping up. See the road statistics provided by Mapbox, Steve Coast or the Missing Maps.

In the case of road network completeness, some efforts have been made to compare current OSM length to CIA stats to measure map completeness. This is problematic, because even if governments have decent stats, they are by their own local definition. Hence the comparison might be off. In the case of Flanders, we have a single, very good source for road lengths. One of the things I want to do next is to compare local lengths in OSM and official data. This could show us where OSM is probably not finished yet. But you can also calculate this based on the shape of the curves we've seen before. If both approaches give similar results, that would clearly imply that you do not need external data sources to evaluate OSM data completeness.

Another thing is that we have noticed many new mappers first starting to map local paths. I'd like to see if this is a real evolution.

By focusing on road length, you measure both network completeness and level of detail. But neither very well. From a perspective of network completeness, you would have to discount things like cycleways that are mapped as separate ways, or only count dual carriageways once. An analysis detecting really new geometries would do that. I'm planning to do something like that "soon".
On the other hand, from a perspective of level of detail, road length lacks subtlety. Take the example of cycleway networks. You would have to count all highway=cycleway, but also all the roads that have cycleway tags, as part of the cycle network.

But I told myself not to write articles that are too long to read in one go :)
I might have failed.

Bonus: more animated maps

Because they are fun to make and to watch, here are some more animated maps.

In my quest to understand the growth of OSM, I had a little fun today.

I took the 1/1/2017 full history dump for Brussels and I extracted a shapefile with all the versions of all the highway=* that ever existed.

Then I wanted to visualize it to see if there was a pattern in how the roads get mapped: "first real roads, then paths" or "everything all the time". So I styled the paths clear green, the roads thin black and used a gray background for the current highways. Then I rendered a slide for every month.

It looks really cool, because it doesn't just show the chaos of our growth. As the black roads are drawn slightly transparent and the monthly slide shows every version of the road in that month, "active areas" show up in heavy black. I think it's really pretty.

On the occasion that it was a featured image in the Weekly OSM, I made a new version without a gray background and with a more logical image size.

Community power

While building the program for State of the Map, the program committee had to say no to several people who wanted to talk about their local community – their successes and their challenges. As a kind of compensation, we added a local communities panel (video) and a local chapters congress to the program.

But during the preparation, I also got a lot of feedback from people who couldn’t make it to State of the Map: money, accidents, visa. I got feedback from Brian Pangle (UK), Felix Delattre (Nicaragua), Clifford Snow (US/Seattle), Marco Antonio Frias (Bolivia), Redon Skikuli (Albania), Mohamet Lamine Ndiaye (Senegal), Yantisa Akhadi (Indonesia) and Michal Palenik (Slovakia). Most of them didn’t have a chance to be on the panel, or even make it at all.

Some of their ideas did make it to the Local Chapters Congress, and helped put things in motion. For example, finally we have the option to follow comments on Diary posts! And there’s talk of putting some money into OSM.org website development for things like massive local messaging, which was a recurring theme there. That might involve helping Gravitystorm’s project to simplify the OSM.org codebase, as that would make contributing code that much easier. Also the idea to allow OSMF membership without payment was mentioned, which was an obvious frustration during the Local Chapters Congress.

What is important to me, is that it goes to show that focused community action can shift the focus of our dev team to issues that would otherwise be lower on their priorities list. I hope we can repeat efforts like this at the next SotM, hopefully even stronger.

This post does two things. First, it will give you, the local community builder, a lot of ideas about things you could do to work on a tighter and larger community. Second, it tries to set an agenda. It offers you several ideas which you could adapt, promote or realize.

Content

There are three subjects:

What are our main dilemmas when organizing our communities

What kind of tools do we need to build community

What stuff are we doing now, that actually works

It was entirely built around the answers from the people mentioned above, plus our own experience here in Belgium.

Community builders' dilemmas

Relatively little feedback on this, looks like we’re a confident bunch. But there are some interesting points.

The challenge of mobilizing mappers: too soft vs too hard.
We’re all volunteers, and if you push too hard, you’ll push people away. But if you don’t take action and keep it up, you’ll never get beyond three people at your activities.

Building a local community means making decisions. Is it acceptable to offer financial rewards? Do we focus on finding the "mapping nerds" who create huge amounts of data? Or do we need to adapt to less obvious groups - people who often can’t even read a map, but have excellent local knowledge?

Being local means embracing local culture. But we also want OSM to have a unified voice and a unified data model. And what do we do with well-intentioned outside help, who bring their own funding but also their own ideas and priorities?

Where the global community can help

In the answers, local communication needs were a top priority. The mailing lists, forums and IRC are good for reaching hard core mappers. But the large majority of contributors aren't there. So how do you reach the local mapper who isn’t active anywhere on these channels?

We need an easy way to contact local mappers

When you want to organize a local activity, you need external tools like Pascal’s mappers around me. Or you could query Overpass and make a little list of who has been working on that area. Just collecting the info takes a long time, and then you have to send messages one by one. It is impossible to send a message to all your OSM contacts if you just have their username. Allowing otherwise is obviously not without risk, so some anti-spam measures have to be implemented from the start.

We need to connect the new mappers

It is very labour intensive to connect new mappers to their local communities. Several people running a program to send a message to every new mapper in their region have given up, even as this cool little website makes the work a bit easier.
In Belgium, we use welcome.osm.be . It is a simple user interface which takes the New Mappers feed from Pascal Neis and makes it easy to send people a standard welcome message. One is defined as "Belgian" based on the location of their first changeset, which is good enough as a proxy for home region.

The message itself focuses on our communication channels, apart from giving some basic mapping tips. The advantage of using a tool is that you can share the workload, and can see who has been welcomed already. Of course, looking at changesets and giving some pointers is very useful – but a lot of work.
It also thanks you for your contribution, and gives you someone to contact in case of doubt. It gives a human face to the map. This is something that could be entirely automated within the OSM.org ecosystem – a centralized system with the content provided by the local communities.
This would not be an alternative to the Welcome Message you get on subscription, but a complementary message on first edit. Otherwise, it wouldn't be possible to guess everyone's location.

We need a lively community diary stream

Several of us commented on the impossibility of subscribing to comments on Diary posts, which leads to discussion rapidly dying down. This has now been implemented! Over a year ago, after some rather discouraging help, I opened a ticket on github to request this feature. Markus Heidelberg did make a Chrome/Firefox plugin to fix the same problem. It confused me a bit that someone would make an external tool, rather than fix the problem itself. Markus was kind enough to explain that it’s much simpler to write a separate bit of code than to integrate something into our osm.org website. Another argument for everyone to help modernize that codebase. But that won’t fix everything, because people do speak many different programming languages.

Anyway, the ticket remained open for almost a year, and it was only when the idea got wider support during SotM that we got the attention of our programmers. The pull request shows that even a “simple” feature like this is absolutely not straightforward to integrate. It looks like it took quite a bit of effort from Mikel, Ilya, Andy and Tom to do this. Thank you guys!

Still, we could do more to make communications easier. For example, you still need to be a bit of a nerd to find a way to follow the official blog. A subscribe button, anyone? But even to find this blog is a challenge. I find it strange that there are no direct links from the osm.org landing page to subdomains like help., forum., irc. and blog.osm.org.

We need to help new mappers gain experience

Becoming a mapper is not easy. When you often explain OSM to new mappers, you start to realize how many little things you’ve learned over the years. The more developed the map, the harder it will become. Attention for documentation, and making help easier to find will become ever more important. But a human touch might help too.

Godfather program
A recurrent idea to help new mappers is to start a kind of “godfather” program. It might be as simple as sending a welcome message to new mappers, personalized with some tips about better mapping of what they added. But you could go further, and coach people as they grow. You would need some reward for that, because it would reduce your own mapping time. So imagine a HDYC not of your own mapping, but of the people you helped.

#reviewmychange
OSM is easy for very confident people: you have to believe that little old me is capable of improving this big map made by so many people. At humanitarian mapathons, it is often a relief to people that their work will be reviewed. But why not add a simple feature to the iD editor to mark your own work as “please review”. It could be as simple as adding a hashtag #pleasereview to the changeset comment, and making a little tool that collects and geocodes these changesets into a simple website for follow-up.
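The collecting part of such a tool would not need much. Below is a rough sketch, assuming the changesets are already available as simple records with a comment and a bounding box. That record layout is hypothetical, and so is using the centre of the bounding box as a stand-in for geocoding. Fetching the changesets and building the website are left out.

```python
import re

# Match the hashtag as a whole word, so "#pleasereviewer" does not count.
HASHTAG = re.compile(r"(?<!\w)#pleasereview\b", re.IGNORECASE)


def changesets_to_review(changesets):
    """Pick changesets whose comment carries the hashtag and give each a location.

    changesets: iterable of dicts with 'id', 'comment', 'min_lat', 'max_lat',
    'min_lon' and 'max_lon' keys (hypothetical layout).
    """
    result = []
    for cs in changesets:
        if HASHTAG.search(cs.get("comment") or ""):
            result.append({
                "id": cs["id"],
                "lat": (cs["min_lat"] + cs["max_lat"]) / 2,  # bounding box centre
                "lon": (cs["min_lon"] + cs["max_lon"]) / 2,
            })
    return result


# Invented changesets for illustration.
sample = [
    {"id": 1, "comment": "Added my street #PleaseReview",
     "min_lat": 50.0, "max_lat": 51.0, "min_lon": 4.0, "max_lon": 5.0},
    {"id": 2, "comment": "Fixed a typo",
     "min_lat": 50.0, "max_lat": 50.5, "min_lon": 4.0, "max_lon": 4.5},
    {"id": 3, "comment": None,
     "min_lat": 50.0, "max_lat": 50.5, "min_lon": 4.0, "max_lon": 4.5},
]
to_review = changesets_to_review(sample)
print(to_review)
```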

A toolbox for local communities

This is a broad concept, but here are some examples of what that could mean:

A little money can go a long way. In the US, it can help you set up a local Meetup group. In Africa or Latin America, a microgrant would be enough to pay for internet access, a mapping device and transport costs. If we’re capable of getting free pizza for our mapathons, we should be able to do this too.

A local web presence is something several people commented on as being very useful. Could we have a local community website starter kit, as easy to set up as a Maptime chapter?

Could we build communication and tracking tools (new mappers, QA, stats) built on admin boundaries instead of bounding boxes?

Things that work

A central theme on the answers about things that work, is that none of them are easy. It takes time, it takes effort, and the impact can often be quite disappointing.

Some long-time mappers even believe that we’ve reached our potential: everyone who is interested in OpenStreetMap knows the project by now, so there is little to be won by reaching out. This is typical for a swarm organisation: it’s only those who are at the edges of the swarm that see the growth. It is the networks of the newer people that will help you grow – not your own.

All the more reason to learn about things that have worked for others. This chapter talks about how to grow your community, but also about community consolidation. You might have a lot of people working on the map, but who have never done anything but add info to the map. Minimal community engagement is necessary: how else will they keep their mapping habits in line with the wider community? And of course, they are the first place to look when you want to do stuff to grow your community.

The basics

When it comes to engaging existing mappers, there is no alternative for real life meetings. Even though we’re an online community, it is personal contacts that build ties. And these are the ties you need to turn mappers into organisers.

A good place to start, is by watching changesets and commenting on them. It’s one of the few ways of getting to know the people who add data but aren’t active anywhere else.

Adapting to different communication styles is essential. If you’re only using mailing lists, don’t be surprised that the level of engagement stays flat. Take the Bolivian talk e-mail list that had about two active members for years. Then Bolivia started a Telegram supergroup and suddenly there’s 40 members, of which at least a dozen are quite active. Here in Belgium we adopted Slack during the State of the Map, and it’s still quite active for more informal communication and quick questions.

But of course, having many channels makes things complicated. Especially if what works in one country doesn’t in the next. It will be a lot of work to find the right channel and to get people in the channel that's best for them. An adapted welcome message makes it easier to integrate new mappers.

Where the local map is already relatively complete, there is little enthusiasm for mapping parties. The quaint model of going out collecting data and then mapping over a beer attracts far fewer people than other activities. But in places where the map is still quite basic, it can be very successful in building engagement and getting attention.

Doing exciting stuff, as Felix Delattre puts it, is effective to find new people. By doing something completely new and unheard of, you can create a lot of excitement about OpenStreetMap. In Nicaragua, being the first to create an online and paper map with all the bus routes in the capital can do that for you. The exposure this gives you has an effect beyond the original mapping community that made the project possible in the first place.

Lacking big projects like this, showing real life use cases is an obvious way to connect to your audience once you get their attention. If you know your public, focus on what you know they could use. If you don't, show the diversity of cool stuff you can do with OSM.

You need a way out of your inner circle. Engage outside organisations. You are basically tapping into existing networks, rather than building one from scratch. For example, connect with “data science” people, but also with local government, entrepreneurs and IT people.
Working together with Trage Wegen has introduced many new mappers to OSM over the last two years in Belgium. This is an organisation focused on the threatened little paths and tracks that connect our messy towns and villages to the sparse open space. The people who support them are passionate about this subject, and it’s not that hard to take their passion for “slow roads” and turn it into a mapping passion, since a mapped path is harder to make disappear.

Meetup

Especially in developed countries, Meetup seems to be a useful tool for creating events. Clifford Snow did an entire session on the subject (video). These events can be as small as a bar hangout, but it can also be used for much larger events. It is quite easy to start a group. As an organizer you have an idea how many people to expect, and Meetup does all the hard communication work for you (maintaining contact list, sending out reminders, thanking for showing up).

Meetup is very local: it will suggest groups to hang out with based on both your location and your other Meetup groups. So you will get a lot of subscriptions from people already active on Meetup, but not yet very interested in OSM. And you will almost automatically find meetup groups which have similar interests, where you might go and talk about OSM.

There are some challenges though. Meetup realizes the value of their network, and so you need to pay to be an organization on their website. Prices depend on the country (3 €/month in Belgium, 15 $ in the US). In practice, this is paid by the very motivated organizers themselves. As there is no free alternative, it might be an idea for central OSM organisations to provide this money instead. The impact is clear, and the investment is minimal. I would dare say that without Meetup, there would probably not have been a State of the Map in Belgium this year.

Humanitarian Mapathons

Both Belgium and Seattle talked about using Humanitarian Mapping as a recruitment tool. It helps attract people who would otherwise not be interested in OpenStreetMap, and gives you a chance to introduce them to the wider project too. It’s also a place to turn your hardcore mappers into volunteers. There are well defined tasks to do, like organizing, promoting, giving talks, making documentation, validating data or helping out individual mappers. That makes it easy to become a volunteer. The repetition of events gives them the opportunity to grow into ever more complex tasks.

Imports!

This will sound controversial to a lot of people, but imports can be a recruiting tool too. Clifford and Jeff Meyer talk about how they used an import to grow their community here. Imports aren’t easy, and having an ‘import party’ is usually a bad idea. But good imports are possible, and they provide an opportunity to recruit more technically oriented people who would balk at the idea of tracing thousands of buildings.

So, what else?

What dilemmas do you want to talk about? What do you think about the proposed needed tools? What worked for you or your local community? How can we make the life of new community builders easier?

And most of all, how do we keep the momentum we seemed to have during and after SotM 2016?

As an employee of the city of Antwerp, I was involved in the recent 'validation' of the Road Registry (Wegenregister) for our city. This registry is managed by the central Flemish government, but final responsibility for the content is with the municipality. Validation means the central government gives us a new dump for us to check for errors. This way of working is only a temporary situation: in the future, we will be live editing in the central database itself.

Some background

There's an amazing amount of cleanup left to do, but we decided to focus on the completeness of the main road network. Before, we did this by comparing with our own city registry of roads. But that is not being updated anymore. So for the first time, we used OpenStreetMap for the validation. Using FME, we identified roads which exist in OSM, but not in the Road Registry. We excluded service roads and "slow roads" (paths, tracks, cycleways), as these are less of a priority right now.

Next time, we will also look at roads that are in the Road Registry, but not in OSM. In some cases, the lack of a road in OSM is really an indication of an error in the Registry. For example when a road has been closed, and the government somehow missed that.
This is more work, because the Road Registry contains a lot of little bits of "roads" that are really just driveways. Because nobody cares about them, they aren't in OSM. But they are quite hard to filter out from the Registry data.

The results

The cleaned up dataset of roads that are in OSM and not in the Registry was really quite limited. Only 138 situations needed manual review. Of those cases, 32 were a simple matter of slightly different geometry. For example when OSM mapped the road as a polygon, which we didn't really take into account.
We identified 33 cases where the Road Registry was clearly wrong. Then there were 31 cases that looked like they shouldn't have been in the selection anyway: they are private driveways, parking aisles, tramways. About half of those needed a fix in OSM. But the "tramways" were actually dedicated bus roads on top of tramways.

Most of the "mistakes" detected in OSM were caused by larger geometry issues. Sometimes the centerline of a road is debatable, but in most of these cases OSM could be improved, sometimes vastly. These were most often roads that hadn't been touched in years. Only in a couple of cases was OSM really vastly wrong. This happened when the city reorganized streets, and somehow, nobody noticed. Most striking was the Troonplaats, which is a quite popular square. In several cases, OSM had already been corrected in the month or two between data download and final analysis (though to be honest, some of those were fixes of mine). A few mistakes were caused by errors in or outdated road classification.

There was one striking case (pictured above), where we were convinced OSM was wrong, but we apparently missed a big change in the road geometry. Fortunately there was a [Mapillary sequence], of course one of the 1.1 million pictures uploaded by filipc. Even though the aerial photography in Flanders is excellent and recent, the only place this road shows up is on the OSM map.

Legal stuff (edit)

As Stereo pointed out in the comments, OSM cannot be copied by a non-ODbL source. I always translated the license of OSM as "if you merge your private data with OSM data, you have to open up your data". But that's not correct, it should be: "if you merge your data with OSM data, you have to open up your data AND prohibit anyone from ever making it private again". In this case, the Flemish government allows (and explicitly wants) TomTom and Google to take official data and use it to improve their private data.

The ODbL always made sense to me, and it kind of still does. Say I was to download all of OSM to my own server, and redistribute it under a more open license. Then someone else could just take that data and close it off. But this case does help me understand those who aren't very happy about this license a bit more. In the case of government, it means you can't -really- integrate OSM into your processes. For example, you couldn't take OSM, validate it with your own data and redistribute the result under the license of your choice.

Have a look

You can have a look at the cases here. There's a bit of work left on the cases with a difference in geometry. The easiest way to get the Road Registry into your editor is with this (slightly outdated) WMTS:

TL;DR: Government road data, processed to help you map roads in Flanders, Belgium. All the tiled layers are available for use in your favorite editing software.

About the data

The Flemish government has a large project to measure most stuff you find in the public domain, the GRB (Dutch). The data is measured to incredible accuracy, but the project is not focused on maximum recency. Update frequency is once or twice a year. When it comes to roads, only those that need an official streetname are included.

That's a bit limited for some purposes, so they started the Wegenregister (Registry of Roads). The idea is that all roads are included, also "slow roads" (paths and tracks), private roads and even future roads. They started off with the centerlines of roads from the GRB and enriched it with National Geographic Institute (NGI) data for smaller roads. It isn't quite finished yet: a lot of local governments must still validate the data, and there is no automatic procedure in place to feed new GRB roads to the database. So you can expect some of the "future roads" to exist already. The NGI data is also of varying quality: it is quite complete and has generally good geometry, but it can be quite outdated.

The scope of the Wegenregister is to offer a complete road network, not navigable data. It does not include anything like access restrictions, detailed lane info or max speeds. It does contain road surface information. It is divided into segments, which go from one junction to the next. An existing segment is only split when a new road is added. That means segment IDs are relatively stable. If a segment has a change of attribute somewhere, this is dealt with by dynamic segmentation. Basically, that means you have a table saying stuff like "from meter 0 to 100 asphalt, from meter 100 to end concrete".
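To make the dynamic segmentation idea concrete, here is a minimal Python sketch. The table layout, the segment ID and the function name are made up for illustration and are not the actual Wegenregister schema. Each row says from which meter along the segment a surface starts to apply.

```python
from bisect import bisect_right

# Hypothetical attribute table: per segment, a sorted list of (start_meter, surface).
# Each row applies from its start up to the start of the next row (or the segment end).
SURFACE_EVENTS = {
    "segment-1": [(0.0, "asphalt"), (100.0, "concrete")],
}


def surface_at(segment_id, meter, events=SURFACE_EVENTS):
    """Return the surface found at `meter` metres along the given segment."""
    rows = events[segment_id]
    starts = [start for start, _ in rows]
    # Find the last row whose start is at or before the requested position.
    index = bisect_right(starts, meter) - 1
    if index < 0:
        raise ValueError("meter lies before the start of the segment")
    return rows[index][1]
```

The segment geometry and its ID never change when the surface changes; only the rows in the table do, which is why the IDs can stay stable.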

Finding missing roads

I did some quick visual checks in my own mapping neighbourhood, and I did find a LOT of missing roads. Some forest paths, several small alleys connecting backyards to the street, some graveyard paths, some driveways. I would say 95% of the missing paths/roads still existed, about 75% worth mapping in OSM.

Enough to warrant some closer inspection.

It is open data with an OSM compatible licence, which you can download through a website. First I tried FME, as we have processes in this software at my day job to do similar analysis that I could reuse. Alas, it didn't scale well for larger data. QGIS, after some trial and error, did the job no problem. The main processing operations took about 36 hours on my not-fancy-at-all laptop.

First I took the OSM road data (as a shapefile, from Geofabrik), saved it in our local projection and buffered it by 7 meters. Then I used difference to find the parts of the Wegenregister that were outside of that buffer. Next I threw out segments of under 10 meters (unless they were entirely outside of the buffer). I also calculated the percentage outside of the buffer. The result is A LOT of segments (220.000 out of one million), which are either missing in OSM or have a very different geometry.
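For readers who want to see the logic without opening QGIS, here is a rough pure-Python sketch of the same idea. It does not reproduce the QGIS buffer and difference tools. Instead it samples points along a Wegenregister line and checks how many lie more than 7 meters from any OSM line, which approximates "the percentage outside of the buffer". It assumes coordinates are already in a projected system with meters as units and that there is at least one OSM line. All function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Planar distance from point p to the straight piece a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto a-b and clamp to the ends of the piece.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_lines(p, lines):
    """Distance from p to the nearest of several polylines."""
    return min(
        point_segment_distance(p, a, b)
        for line in lines
        for a, b in zip(line, line[1:])
    )


def sample_points(line, step=1.0):
    """Yield points roughly every `step` meters along a polyline."""
    for a, b in zip(line, line[1:]):
        length = math.hypot(b[0] - a[0], b[1] - a[1])
        n = max(1, int(length // step))
        for i in range(n):
            t = i / n
            yield (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
    yield line[-1]


def share_outside_buffer(registry_line, osm_lines, buffer_m=7.0, step=1.0):
    """Fraction of sampled points farther than buffer_m from every OSM line."""
    points = list(sample_points(registry_line, step))
    outside = sum(1 for p in points if distance_to_lines(p, osm_lines) > buffer_m)
    return outside / len(points)


def keep_piece(outside_length_m, share_outside, min_length_m=10.0):
    """Drop pieces under 10 m unless the segment is entirely outside the buffer."""
    return outside_length_m >= min_length_m or share_outside >= 1.0
```

Comparing every point with every OSM line like this would be far too slow for a million segments. A real run needs a spatial index, which is what the GIS software provides.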

[EDIT: thanks to tyr_asd you can now copy the URL to share your current view :)]

But you don't need to go out surveying for every single change either. In the map I provided, you can combine the view of missing Wegenregister roads with aerial photography, OSM gpx and Strava gpx layers. If they all point in the same direction, you can be quite sure that OSM is wrong and Wegenregister is right.

URLs for mapping

These URLs can be added in JOSM, iD and Osmand.
In iD, click the layers button (right-hand side of the screen), then click on the magnifying glass next to Custom or 'Aangepast' to insert one of the URLs. To use this in Osmand, check my previous diary entry on Strava (only works for layers containing .png). If you use JOSM, you know things like this :)

I can also provide just the bits of Wegenregister that are outside of the buffer, just ask.

Better mapping practices

Now imagine you've checked your whole mapping neighborhood. The map will stay red, at least till the next update of the process. But what about the roads that you surveyed and concluded were invalid Wegenregister roads? They should be removed too. I'm not quite sure how to go about that.

We could tell the government. And they might actually listen, but by the time the road is removed from the dataset, three more mappers might have analysed the same segment.

We could build a list of "untrue" Wegenregister roads and remove these from analysis. There are quite stable unique identifiers available, but it would mean everybody should refer to the same list when marking something in Wegenregister as untrue.

We could map non-existing roads in OSM (ooh, taboo!), analogous to the not:name tag that was used in the UK to mark that the official name for a road was wrong. I was tempted into something similar in this case, where a path is indefinitely closed off, but still quite existent (as seen from the street and aerial photography)

Seizing an opportunity

I know the Belgian heavy mappers like to work on stuff, but I think this might be a nice opportunity for expanding the community a little more. I've noticed how small paths and local trails are really something that can still attract new mappers. The Flemish Trage Wegen organisation is behind that for a large part, and I sense we could work together with them on a project like this. It is also very similar to the local "inventarisations" they do.

It is a very well defined task, it is repeatable, all the tools and pitfalls can be explained quite easily. Moreover, local governments could be contacted with a very clear proposal - to help them solve a problem they would have to solve themselves pretty soon anyway.

I see two main options, which are possibly conflicting.

Option one: a maproulette challenge or Canadian style crowdsourcing tool. It's nice and easy, but it might be a little too simplistic for this task. The Canadian style tool would probably allow us to generate a vast error report for the Flemish government, which is quite cool. Microtasking like this is not compatible with the extensive local surveying which we need when the reality isn't very clear though. But it might make the job a little lighter for those working on Option Two.

Option two: we set up a Belgian tasking manager (as in an instance of tasks.hotosm.org) and divide the job. It allows for very specific instructions, provides the analysed Wegenregister as imagery to people who have never used iD before and makes it really easy to track progress. The time-out for the tile you picked should probably be changed from two hours to a couple of days though :)

One thing I've learned from working on Missing Maps is that you need to use an existing network to recruit new mappers. You need an easy, repeatable task to make the work easier on OSM supporting volunteers. And you have an opportunity to take their passion (in this case "helping poor people") and try to channel it into a passion for OpenStreetMap. Swap MSF for local government, mapping buildings for mapping roads, and a passion for doing good for a passion for local paths, and there you are.

Working on it

To make such a project possible, we should probably set up an online service doing something similar to my analysis. So newly mapped roads in OSM are removed from the "to map" list, as well as invalidated Wegenregister roads.
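The bookkeeping behind such a service could be as simple as the sketch below. It assumes each Wegenregister segment has a stable identifier, and that some other process produces the list of segments now covered by OSM and the shared list of segments mappers marked as invalid. The function and argument names are hypothetical.

```python
def remaining_to_map(candidate_ids, mapped_ids, invalid_ids):
    """Return the candidate segment IDs that still need attention, in order."""
    # A segment is done once it is mapped in OSM or marked as not existing.
    done = set(mapped_ids) | set(invalid_ids)
    return [segment_id for segment_id in candidate_ids if segment_id not in done]
```

For example, `remaining_to_map([1, 2, 3, 4], mapped_ids=[2], invalid_ids=[4])` leaves segments 1 and 3 on the list.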

My analysis is more a proof of concept than anything else. It would be interesting to go further. For example, one could make a map with just roads that have a different name in OSM than the official name. Or just focus on the planned roads. Or suggest surfacing information for inclusion.

It would of course be nice if it were easy to take the Wegenregister geometry and apply it to the OSM data, but that might be a little too much of a challenge right now.

So I've been using the Strava data quite a bit recently. I knew the service from before, but then it was quite empty. The tip came from our übermapilliariator Filip when I was making too many notes mapping a nearby forest.

Strava for forest trails

I have mapped a lot of trails in Flemish forests. We're a densely populated piece of land, with very little forest (in fact, our environment minister literally said that "the purpose of a tree has always been to be cut down"). But even here, I have hardly ever visited a forest where all forest paths were mapped.

It requires local surveying as paths below trees are completely invisible, and we tend to do a better job mapping stuff you can see on sat pics... But even when you do go out to the woods, the resulting GPS tracks can be of bad quality. Strava to the rescue! Several million trips by hiking and biking-nerds are mashed together to give a clear indication of where people run and bike.

The easiest way to use it is with the [Strava iD editor](strava.github.io/iD/), which comes preloaded with the layers you need. I often switch off the satellite imagery to improve visibility of the tracks. This iD version also contains the Slide tool, which lets you adjust geometry to the available tracks. I haven't had very satisfying results with that myself though.
In Belgian forests, you can basically zoom in anywhere and find missing tracks.
(For JOSM instructions, see the wiki)

Strava and surveying

Of course, you still have to combine this with some satpic reading skills, other sources and/or local knowledge. For example, when Strava, Wegenregister and Groteroutepaden GPX all point in the same direction, you can be pretty sure there's a path present.

I did spot some situations where people seem to be running straight through a meadow where no path is visible. And the standard view does not take into account time. Sometimes, clear changes are visible over time, see this experiment. So just looking at the global heatmap might get you mapping former paths.


Strava in Osmand

If you don't have other sources, or just want to go hiking somewhere you suspect mapping is incomplete, you can add this layer to Osmand. It will help you find paths with bad geometry, and help you find unmapped paths. Vague lines on the map, combined with a visible trailhead, can be enough to verify the existence of the path. So you can add many more paths with just one survey.
Note: I hid all polygons and road details on my view, which helps keep the map readable.

In the tradition of the app, the feature is well hidden. First of all, you need to have the "Online maps" plugin enabled. This is just a setting, no downloads required. Standard available layers include "Microsoft Earth" satpics and online OSM maps.

Strava isn't standard. To add it as a layer, you need to open the "Map source" menu, available under map settings. Scroll down till you find "Define/edit". The URL example is with blue lines. You can find more about this URL on the wiki.

Now your standard Osmand map is replaced with some blue lines. Great! Re-open the Map Source to get your "Offline vector maps" back. Now you can add the Strava layer as an Underlay or Overlay map. In the example above, I used it as an underlay with the basemap completely opaque. Forests (and other polygons) were switched off, which does make for increased visibility.

Why do we map? It's a question in every OSM mapper interview, and it's often a bit confronting. We do it because we like it, but why do we like it? And in the case of many of us, why do we spend such an enormous amount of time on it?

After a brief exchange with a self-proclaimed GIS dinosaur, I felt the need to remind myself exactly what it is I like about OpenStreetMap. I noticed that both for her and me mapping really became part of our identity. It was almost like discussing refugees or social fraud.

This article is very personal. If you like the same things as I do, you're bound to like OSM. But you might like OSM for a completely different set of reasons. If you want a much larger frame of thinking, like why the world needs OpenStreetMap, that's explained somewhere else.

It doesn't wait for anyone

it does what you make it do

it doesn't make big plans about what it will do in the future, it simply does what it can now

There is nothing OpenStreetMap does perfectly. However, you can change that at will. Do you want it to have all the hedges in your town? Just look at the data model, adapt and extend it if needed, and add them to the map. There's your perfect map of local hedges. Now show off your work and get other people hedging.

OSM does not make big plans about what it will be doing in the future. Instead, it simply does what it can now. I like this mentality. If enough of us follow, we actually accomplish big things [like having more roads mapped in most countries of the world than the CIA believes there are]. But we do it without having wasted money and time on big studies.

We'll have us studying ourselves or have other people collect the funds to study us, thank you very much.

OSM does not tell you what to do. There are no "priorities", so no one has to set them. There is only leadership by example.

Geographically...

Yes, we now have a reference dataset for roads in Flanders, and it has 40 cm accuracy. But look somewhere else if you need hiking trails, and get another dataset entirely if you need Brussels (which is physically entirely within Flanders). Yes, they will fix that. In the future. I like the present.

For something as "simple" as an address, there's a service being built on top of all open datasets of the world. But it's a patchwork, empty in many many places. Flanders, by the way, is only there because someone from the OSM community added the AGIV CRAB dataset.

While a service like this might just be the future for the use of authoritative data, it still poses some problems. What happens when government funds dry up? Their service dies, or the data quality starts degrading. There will be no OSM community around to take over their jobs - as communities get built around the actual mapping of things.

Maybe by the time OpenAddresses has a reasonable level of completeness, the OSM community will have learned to integrate external data and find a way to update it with both government and crowdsourced inputs. At that moment, government will have to adapt to a reality where they have to look at OSM inputs as much as to their official procedures. Maybe at that point, some politicians will look at their budgets and think: "We have a crowdsourced free dataset, which we use to keep an expensive infrastructure up to date. Couldn't we just use that data and use a fraction of our current resources to help keep that dataset up to date?".

But even if OpenAddresses works for addresses, it would still mean you'd have to find the best project for your usecase, for every usecase you have. I like having just one repository, where people with very diverse needs and interests are forced to interact.

...or contentwise.

Governments make set-in-stone definitions of what will be in the dataset. But that's like a planned economy. It adapts to the needs of the past, not the future. It's perfect at producing bakelite fixed phones, but could never invent the cellphone.
If I want to go to a building inside a large private area, only OSM will get me there. The private roads are not government managed, so not on the map. The buildings might have no separate address, just an unofficial name or reference. Even if they would, they'll probably start mapping them in separate silos, then think about integrating them. But OSM maps what our mappers are interested in, and the data is integrated by default.

If you need data which has a clear definition, you'll probably be best off using government data. If you value flexibility, you'll probably be better off with OSM data.

Example:
- black: government data where trails are out of scope
- red: government data, including paths
- blue: OSM

Note how there is no way to exit the park in the south-west. Oops! You will not see this kind of error in OSM, as this is one of the most important paths if you want to actually use the park. In fact, this trail was added back in 2008. On the other hand, we missed the service road north east of the path. But who is going to miss that?

It's a challenge to our economic model

Did I mention we're low cost? Based on my back-of-the-envelope calculations, the entire OSM map of Belgium would have cost about 3 million euros at local labour costs. There is no overhead whatsoever, as that is funded by the OSM Foundation. Imagine showing the current OSM map of Belgium to a minister in 2007, and saying you will make this with 3 million euros and nine years' time. OK, you might feel obliged to fund a server drive once every few years, maybe donate 20.000 euro? They would probably laugh in your face.

Honestly though, they would also laugh in your face if you explained that agricultural lands would be mapped in -almost- all of Flanders and just some random parts of Wallonia, oh, and, sometimes with a distinction between meadows and crop fields, sometimes just as one category.

But how did we do so much in so little time? Maybe because of our messy data model - where you go in to correct a street name and wind up fixing ten different mistakes. Maybe because we only do the work when we feel like it and we stop when we're tired of it. If you work for someone else, the actual work is often only a fraction of your time, the majority being used for such things as administration, meetings, evaluation and procrastination. This shows a bit of the Utopian vision behind projects like OSM. Imagine a society where people have the time to do what they want to do. How much more time would you spend on useful side projects like OSM if you didn't need to do the hours somewhere? This is an optimistic answer to the fears of the workless society some people see arriving. Add in a basic income and you're all set. It is an idea popular within the Pirate Party movement, which by coincidence is exactly the same kind of organization as OSM: a swarm.

OSM to me is one big experiment - and I love being part of it.

It's empowering (and fun and quick)

I don't just value using good maps. I value using my own maps. My wife and I once volunteered in an area where hiking guides tried to monopolize the region for their own. We happily started creating and using our own maps to empower ourselves and independent tourists.
Creating a map from scratch is a powerful experience. Where the map isn't empty, I like to be able to fix the map myself. I like the feeling of seeing my fix appear on the map - for everyone else to use. I like that I don't have to wait for anyone else to fix it for me. I like how even a densely mapped place becomes partly "mine" by adding that restaurant I went to. It's a primal thing, almost like putting a graffiti in a public bathroom - we're all tempted. Tools like Pascal Neis's Your OSM Heat Map tempt you to stamp your name onto your local area - or to the places you traveled to.

It is empowering when you spot a mistake in OSM and fix it the same day. It is not when you spot the same mistake in government data, have to make an official note, and see it fixed a full three months later.
When using OSM data, you dig just as deep as you want to. Using someone else's data, you are assigned your role and that's where it ends.

As a data user, if you use official sources and something goes wrong, you can sue someone. If you use OSM, you can fix the issue and prevent similar future issues.

It broadens the horizon

Using a GPS unit can make you lazy. It can lessen your map-literacy. But often my wife will look at me - are we going left instead of right because left is unmapped? I don't follow the plan, I go to the places where I'm most likely to find something new. At every crossroads, I will take the road that hasn't been mapped yet. Using an OSM based navigation app, you're not just navigating: you're looking out for improvements the whole time. This is especially true while hiking, where one long walk can result in one large changeset.

It's the community, stupid

Google Maps is a company trying to get you or your data to work for them. Governments reluctantly involve citizens in predefined roles. But OSM is a community of people. Rough edged at times, but incredibly helpful - even if you ask stupid questions.

Though I started mapping alone, it wasn't until I met other mappers at the Meetups in Gent that I really became involved. As OSM is such a chaotic community, there is nothing like talking to people to get a feeling for it.

As you learn, you start to teach - made easy with the beautiful help site. OSM is an ecosystem of people with the most diverse interests, and having diverse people work together is the perfect recipe for creativity and progress.

TLDR: Scroll down for some "pretty" maps showing paved and unpaved roads. In between is a wall of text about how and why I made these.

Waiting for a paved/unpaved road map

So I've been waiting for someone to make a useful map for navigating South America for quite some time. When you want to drive from A to B in South America, there is one essential piece of information you want: is the road paved or unpaved. When you want to travel slow and enjoy using your 4x4, you want the unpaved roads. When you feel sympathy for your kidneys or your car, you tend to stick to bitumen. Either way, you need to know.

Surprisingly, there are hardly any maps available that show this. Paper maps are hopelessly out of date even for basic road network completeness. OSM to the rescue! The road network completeness is pretty impressive considering the relatively small OSM communities there. And even the surface tags are mostly mapped - and I can tell from experience: generally correct.

So use the Humanitarian style? That only shows road surface when you zoom in. You tend to make route planning decisions from far away. I'm in Lima and I want to go to Titicaca over Cuzco, that's zoom level 8. I don't want to zoom in to level 11 to see which roads are paved. Also, default rendering is "paved", so you can't tell the difference between paved and untagged roads. As finding an unpaved road in reality is a nastier surprise than the other way around, it would be better to switch it around.
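The rule I would like a style to follow can be sketched in a few lines of Python. The grouping of surface values into paved and unpaved below is my own rough selection of common tag values, not a rule taken from any existing style. The point is that a road without a surface tag gets its own "unknown" class instead of silently counting as paved.

```python
# Assumed grouping of common surface values; adjust to taste.
PAVED = {"paved", "asphalt", "concrete", "paving_stones", "sett"}
UNPAVED = {"unpaved", "gravel", "fine_gravel", "compacted", "dirt",
           "ground", "earth", "sand", "grass", "mud"}


def classify_surface(tags):
    """Classify a road as 'paved', 'unpaved' or 'unknown' from its OSM tags."""
    surface = tags.get("surface")
    if surface in PAVED:
        return "paved"
    if surface in UNPAVED:
        return "unpaved"
    # Untagged or unrecognised surfaces are never assumed to be paved.
    return "unknown"
```

A renderer could then draw "unknown" in the same cautious way as "unpaved", so the nasty surprise goes away.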

So post an issue to the main style maintenance? Well, someone did that two-and-a-half years ago. And even with the recent road rendering shakeup, nothing changed to address the issue just yet. One of the problems for mere mortals is that you have to develop the solution yourself, then hope the maintainers (and the community) accept it. And for that to happen, your solution should play well with the rest of the style.

Solutions

If there's one thing I've learned from the OpenStreetMap community, it is that if you want something to happen, you should do it.

While on the road myself, I used Osmand as a solution. Osmand has road surface and even smoothness rendering. You can tweak viewing so that it almost works for lower scale viewing. I tried editing the style myself, but I found zero documentation as to how to do it, and my simple tests did not work at all. I'm also not sure the needed data is even in the generalized world basemap which one would have to use.

Getting exactly the OSM data you need is hard, until you discover Overpass Turbo. It really is a tool that makes querying OSM data accessible to the non-programmer. This was the best solution I could find while travelling myself. Using this Overpass-Turbo query I downloaded just the paved roads as a GPX. Then move it to the right Osmand folder, change the standard rendering of GPX files and voila, you have a tool for half a country. Just make sure you don't accidentally use the GPX for routing :)
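To give an idea of what such a query looks like, here is a small Python helper that builds an Overpass QL query for roads with a paved surface in one country. It is an illustration, not the exact query I used. The function name, the list of surface values and the timeout are assumptions. The resulting text can be pasted into Overpass Turbo.

```python
def paved_roads_query(iso_code, surfaces=("paved", "asphalt", "concrete"), timeout=180):
    """Build an Overpass QL query for highways with a paved surface in one country."""
    surface_regex = "^(" + "|".join(surfaces) + ")$"
    return "\n".join([
        f"[out:xml][timeout:{timeout}];",
        # Select the country area by its ISO 3166-1 code.
        f'area["ISO3166-1"="{iso_code}"][admin_level=2]->.country;',
        # Keep only highways whose surface tag matches one of the paved values.
        f'way["highway"]["surface"~"{surface_regex}"](area.country);',
        # "out geom" returns the coordinates together with each way.
        "out geom;",
    ])


print(paved_roads_query("BO"))
```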

While this helped for me, it's hardly a good solution for less nerdy people (yes, I know, compared to other people here I know next to nothing about computering).

So I've been experimenting with different solutions, when whining at issue trackers didn't help. First try was to use the same GPX I used in Osmand as a layer in Umap. For example, when collecting information about paved roads in Bolivia.
In this case, downloading a snapshot of data and uploading it to Umap was a good solution. The idea was showing the amount of roads added with the project, so different versions of the same query are overlaid there. (You can even download the data for just one country with a query like this.)

But I did run into some limitations. When I tried a map of the whole of South America, the amount of data was becoming a problem. First of all, downloading it with Overpass Turbo crashed my browser. The nice people at help.openstreetmap.org were able to offer a solution: though it isn't obvious, you can actually download OSM data with Overpass-Turbo without rendering it in your browser.

Loading this much data in Umap wasn't really an option. The site would tend to crash as you uploaded. And it doesn't really work for a user either, as you have to wait for all the data to download and there seems to be an issue with getting background tiles when using larger datasets. And another issue: if you want to use the map as a tool for mappers too, you need to use live OSM data. Surface tags get added every day, and I'm not one to go update the map often. Luckily, there are some articles on how to use Overpass Turbo directly within Umap. But unfortunately, the needed queries are simply too big to use at the scale I wanted. There is an idea circulating to use an intermediate solution between live and uploaded data, which might actually become reality.

Retreat to QGIS

When the question about travelling on paved roads in South America kept creeping up on some forums I'm active on, I tried again. I thought I'd try and make an example of what I want to do in QGIS. The shapefiles provided by Geofabrik are only available country by country, and they seemed like overkill for my goal. So I revisited the download-without-render and adapted the query to return the highways for the whole of South America. Not to download too much at once, I split between highway types (example for primary roads).

Getting the data read into QGIS is straightforward - once you know how. The one thing that wasn't obvious to me was that the "Export raw data" option in Overpass-Turbo isn't readable by QGIS by default. You have to change the desired data type in the query to XML from the standard JSON. By the way, you can also change it to CSV if you want to do things like get a list of all named roads in a place.
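To make the two points above concrete (one download per highway type, and XML instead of JSON as output), here is a small sketch that generates one Overpass QL query per road class. The bounding box numbers are a rough guess at South America and the function name is made up for this example.

```python
# Rough bounding box for South America as (south, west, north, east).
# Assumption: approximate numbers, for illustration only.
SOUTH_AMERICA = (-56.0, -82.0, 13.0, -34.0)


def highway_query(highway, bbox, out_format="xml"):
    """Overpass QL query for one highway class inside a bounding box."""
    south, west, north, east = bbox
    return (
        # The output format is set in the first statement of the query.
        f"[out:{out_format}][timeout:900];\n"
        f'way["highway"="{highway}"]({south},{west},{north},{east});\n'
        "(._;>;);\n"
        "out body;"
    )


# One query per road class, so no single download gets too large.
queries = {
    highway: highway_query(highway, SOUTH_AMERICA)
    for highway in ("motorway", "trunk", "primary", "secondary")
}
print(queries["primary"])
```

Calling highway_query("primary", SOUTH_AMERICA, out_format="json") gives the JSON variant that QGIS could not read.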

QGIS is an amazing GIS program that easily beats the un-free alternative ArcGIS when it comes to reading different file formats and rendering large datasets.
But you can't just drag and drop OSM files, unfortunately. As I found out using the Learn OSM pages about QGIS, it is not complicated. You don't even need a plugin. Just go to Vector>OpenStreetMap>Topology from XML. This creates a Spatialite database from your OSM file. Then Vector>OpenStreetMap>Topology to Spatialite lets you create a layer with just the tags you want.

This is where the power of QGIS becomes quite apparent. Secondary roads up to motorways for the whole of South America are rendered in a few seconds - and this is 700 megabyte of vector data.
It took me a little while to understand how defining drawing styles works in QGIS.
Surface tagging is complicated, as the distinction paved/unpaved is in the same tag as detailed information about what kind of pavement or lack thereof is used. But it's easy to make a set of rules.

I could have added things like "asphalt;concrete" or "pavimentado" to that style to use as much data as possible. But I don't want to clean data with the visualization - I'll go clean the actual data.
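The same kind of rule set can be written down outside QGIS. The sketch below sorts surface values into three classes, assumed here to be paved, unpaved and unknown. The two value lists are an assumption about which tags belong where, not a copy of the QGIS style, and messy values such as "asphalt;concrete" deliberately end up as unknown.

```python
# Assumption: a plausible but incomplete split of common surface values.
PAVED = {"paved", "asphalt", "concrete", "paving_stones", "sett", "cobblestone"}
UNPAVED = {"unpaved", "gravel", "fine_gravel", "compacted", "dirt",
           "ground", "earth", "sand", "grass", "mud"}


def classify_surface(value):
    """Return 'paved', 'unpaved' or 'unknown' for a raw surface tag value."""
    if value is None:
        return "unknown"  # road has no surface tag at all
    cleaned = value.strip().lower()
    if cleaned in PAVED:
        return "paved"
    if cleaned in UNPAVED:
        return "unpaved"
    # Anything else (typos, combined values, other languages) stays unknown,
    # so the problem is fixed in the data rather than hidden by the style.
    return "unknown"


for raw in ("asphalt", "gravel", None, "asphalt;concrete", "pavimentado"):
    print(raw, "->", classify_surface(raw))
```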

Once you have defined these three types, you can play with rendering quite easily. Adapted to trunk roads, you can save the style as a file and load it to another layer quite easily. Just change the width a bit, and you are starting to build a style.
(A way to simplify this for re-use would be to download all the main-road data in one Overpass query, and add a "highway=* AND ..." rule to the lay-out style, so you can do all the rendering within one QGIS layer. This render rule would then be shareable as just one file.)

Look here, maps!

The maps I was able to produce so far are definitely useful. They helped me map surface tags for several hundred kilometers I had driven but not mapped yet. But once you use the Openlayers plugin to add a background map, it quickly becomes apparent how hard it is to style complicated data. The gray, which is quite intuitive as a color for the unknown, is the same as border colors. Blue is the same as rivers. Red becomes unreadable to 10% of men if on a green background.

A useful map: say you're planning to do a little tour to Argentina, starting and ending in Santiago de Chile. The road to Mendoza looks fine (I added the missing bit by now), but make sure not to take that primary unpaved road for the last part. While driving South, do a little detour to the east. When driving back to Chile, make sure you calculate some extra time as you need to do a small unpaved part. If you want to drive the coast in Chile, take into account that there are some missing links. You're probably better off driving a bit to the north before heading west.

An ugly borderline useful map: say you want to drive from Caracas to Ushuaia. You can't really make out which road to take yet, but it is quite obvious that you do have some options to stick to bitumen if you want to. Biggest problem is Colombia, where very few roads have the surface tag.

Using data is cleaning data

A data issue: the national communities have made some very different decisions about their national road tagging. In Chile, unpaved roads are almost always tertiary at most, even if they are important. Trunk roads are hardly used at all. In Peru, nationally managed roads are trunk, even if you really need a Land Cruiser to make it. In Colombia (and Ecuador to a lesser degree), surface tags seem to be considered unnecessary, as everyone knows all main roads are paved anyway. Ecuador explicitly uses road quality to decide on road classification - surface tags are therefore largely redundant.

This makes styling a lower scale map quite hard. It would be nice if everyone would follow the OSM philosophy that road classification should reflect importance of the road above all else. In Europe, simple rules work, because road quality and importance correlate strongly. But in South America, in some countries it does, and in others it doesn't. Argentina did a great job mapping surface, so it is possible to make a good road map there. But as long as no major map style takes this tag into account at low zoom levels, you still have a large risk of sending people to the unpaved trunk road when there is a paved primary road available for the same trip. Data usability, in my opinion, trumps logical simplicity.

Maps that explicitly use the surface tag are of course the best motivation for mappers to add this info. Hopefully I can get some hints on moving forwards. Otherwise I'm already quite happy showing some of you the quality of the data that's already there - mapped even though there is so little immediate reward.

Getting it online

Back to the main problem: how to share a map like this. While QGIS has a little tool for converting a project to Leaflet, the amount of data involved here excludes that as an option. But even using the built-in Print Composer didn't result in anything presentable. One would have to fine-tune the rendering exactly to the desired scale to make it work. The Openlayers background fails to get rendered properly in the outputs. So far, the best way to make a pretty map out of this has been to just take a screenshot.

The only thing that would probably work is using something like Mapbox. But Mapbox doesn't come with live Overpass connectivity, and the vector data I would like to use is way too big for my free account. I asked Mapbox for suggestions, and was referred to the QA tiles. But I don't think that's a real solution, as you would still have to upload the data and update manually. So the only real solution would be to have Mapbox include the surface tag in their "roads" layer. There I go again, asking other people to solve my problems :)

Give me a shout if you want to try something similar and think I could be of help. Or even better, tell me what I could try next.

8900 people. That's all it took to make one of the best maps available of Belgium. (*1)

I don't believe there's a decent way to count labour hours, but here's a rough number: 61 labour years, assuming 200 days worked a year, 8 hours a day (*2). Considering Belgian labour prices, I'd guess that represents at least 3.000.000 euros.
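The arithmetic behind that rough number, using the session count from footnote 2 and the same made-up assumption of one labour hour per active day:

```python
sessions = 97_270            # total active days of all contributors (footnote 2)
hours_per_session = 1        # the made-up figure from footnote 2
hours_per_year = 200 * 8     # 200 working days of 8 hours each

labour_hours = sessions * hours_per_session
labour_years = labour_hours / hours_per_year
print(round(labour_years, 1))       # 60.8, so roughly 61 labour years

# The 3.000.000 euro guess then implies this cost per labour hour:
implied_hourly_cost = 3_000_000 / labour_hours
print(round(implied_hourly_cost, 2))  # 30.84
```

Changing hours_per_session is the easy way to plug in your own made-up number.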

I started doing these statistics after someone assumed that the southern/Francophone part of Belgium was underrepresented in Belgium. There's nothing as fun as being able to check these things. Some numbers I published before: it looks like the Dutch speaking part is mapped in more detail.

But the best simple proxy of map quality seems to be contributor density. So where are the contributors at?

Well, they're in Flanders.

It would be silly to stop there: there are more people in Flanders. You could divide them by area, but I believe the amount of data needed to map something is more dependent on people than on space. The Sahara is quite large, but you'll never need as much data to map it as you would for little old Belgium. So here's the same graph, in contributors per million inhabitants:

And there you go: the Flemish are the laggards, Brussels and Wallonia lead. This is really counterintuitive. I started out ignoring this, but it kept nagging in the back of my head. Remember how data density is higher in Flanders.

Then I thought about how one of the most productive mappers in the world lives in Flanders. So what would happen if we just exclude this one guy?

Turns out 44% of all nodes in Flanders were mapped by one person. In Brussels too there is one person who added about 30% of all nodes. Wallonia simply doesn't have someone like this, with the top contributor adding "just" 10% of all nodes. So I made the same graph, but without the number one contributor in each region.

Suddenly, we're all the same. Try and make our politicians believe that!

So that goes to show that even in a densely mapped country like Belgium, one person can still make all the difference.

That takes us back to basic community statistics in Belgium.
Here's the number of active contributors per year per region. The bumps in the curve in Brussels are probably because of the small size of the region - just over a million inhabitants.

If we take into account people with at least 5 sessions (active on at least five different days in a year), the numbers drop steeply. Wallonia is clearly number one here, with Brussels and Flanders quite a bit lower.

When it comes to recruiting new mappers, Flanders comes in last.

Do people cross borders? Well yes. To define "home", I first took a subset of people with at least five sessions in Belgium over all years. Then I simply looked at the region they had most sessions in. Of course, you will have some foreign people this way.
It leaves us with 83 Brussels mappers, 995 in Flanders and 675 in Wallonia. Of the Brussels mappers, fully 60% mapped at least 10% of the time across the border. Pretty logical of course, because it's small. Only 18% didn't ever cross over.
In Flanders, the numbers are 28% and 50%. In Wallonia a similar 25% and 56%.
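The "home region" logic described above fits in a few lines. This is only a sketch: the mapper names and session counts below are invented, and the actual numbers in this post came from a database, not from this script.

```python
def home_region(sessions, min_sessions=5):
    """Region with the most sessions, or None if the mapper has too few sessions."""
    total = sum(sessions.values())
    if total == 0 or total < min_sessions:
        return None
    # On a tie, the region listed first wins (an arbitrary choice for this sketch).
    return max(sessions, key=sessions.get)


def away_share(sessions):
    """Fraction of a mapper's sessions that fall outside the home region."""
    total = sum(sessions.values())
    home = max(sessions, key=sessions.get)
    return (total - sessions[home]) / total


# Hypothetical session counts per region for four invented mappers.
mappers = {
    "mapper_a": {"Brussels": 8, "Flanders": 2},
    "mapper_b": {"Flanders": 10},
    "mapper_c": {"Wallonia": 3},
    "mapper_d": {"Flanders": 19, "Wallonia": 1},
}

for name, sessions in mappers.items():
    home = home_region(sessions)
    if home is None:
        print(name, "has too few sessions")
        continue
    share = away_share(sessions)
    crosses_often = share >= 0.10   # mapped at least 10% of the time elsewhere
    never_crosses = share == 0      # never mapped outside the home region
    print(name, home, round(share, 2), crosses_often, never_crosses)
```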

I've been working towards creating these kinds of numbers for all regions in the world and dump them into a statistical platform. It'll be some time till I can realize that...

*1. Well, actually, a bit more by now: I used the history dump of January 2015.

*2. I counted every active day per user as one labour hour. It's just a number I made up. You can make up your own if you want. The number of sessions (total number of active days of all contributors) is 97.270.

Three weeks of @mapillary mapping. Most eventful day: aggressive Porsches overtaking, goats on the road, snow avalanche, overtaking Porsches with an accident
Just back from a three week road trip, mostly in Italy (here's the complete GPS track in a pretty umap, obviously already available for mapping purposes). Just before leaving, I got a mail from Mapillary asking how come I stopped mapping with them. I explained how I use my smartphone for both navigation and Mapillary, but you can't do both at the same time. This is an Android limit: an app is not allowed to take pictures while in the background. There was an idea to get around that by making an Osmand plugin, but there doesn't seem to be progress on that. Anyway, I mentioned I do have a second phone I could use, just no mount. So for the second time, they sent me one of their perfect little smartphone mounts. Of course, now I had a moral obligation to be Mapillary mapping the whole trip.

You need a willing co-driver, or stop from time to time. I did have some app stability issues, you need to check the orientation of the camera from time to time, etc. It was probably device-specific, but it took me a while to get the settings right. No background threading of pictures, no Osmand running in the background. That seemed to do it, even for full size pictures.

You need a good camera. Smartphone cameras tend to vary in quality by quite a large margin. My OnePlusOne did reasonably well, my wife's Samsung S5 was poor indeed.

You need a clean window. This is harder than it sounds. On bright days, you get bugs. On gray days, you have raindrops. Some specks are hardly visible with the naked eye, but act as a kind of lens and make ugly spots. Mostly, it's just irritating reflections that mess up pictures. So I was thinking, maybe one should try to put a polarising filter on the lens?

You need plenty of disk space. Yes, you can take small size pictures, but resolution does have its advantages, especially for road signs. And the Italians have A LOT of those. Not a problem with my OnePlusOne (64 gig memory), but close to ridiculous with the Samsung S5: in theory 12 gig, but in practice you can be happy if you have 2 gig spare space. And on a longer road trip, you are going to need some separate storage anyway. I took 80 gig of pictures in total, so I had to keep moving pictures to my laptop. Which isn't as easy as it sounds, as we didn't have 220 volts that often.
You can just move pictures back and forth between your smartphone and external storage. When you put the pictures back in the proper folder, the app recognizes them. Just don't forget that Mapillary assumes you don't want to keep a copy of the pictures yourself. They are automatically deleted from the device as you upload them.

You need a device dedicated to Mapillary. You can't run it in the background, you have to leave the device in place for as much as possible.

You need good weather. On rainy days and in bad light conditions you get a lot of bad pictures. That proves to be a real dilemma for me. Bad pictures are better than no pictures, right? I don't want to pollute the Mapillary database with ugly pictures, but on the other hand, even on a bad picture you can often make out what the traffic sign says. And there is always some info: number of lanes, guardrails, bus stops. Who knows what info you are deleting that someone might find useful? And who knows when the next photographer will be there?

And you need time: reviewing 60.000 pictures is always going to take a while, no matter how quickly you go through them. Ideal for those half-asleep train rides back and forth to work. So it will take some time before all the pictures are online.

After you come back, you need bandwidth. I have a monthly quota of 100 gig and about 80 gig of pictures to upload. So I'll have to spread them out somewhat. If you have even larger sets, I believe snail mail will be the faster and cheaper option. As everybody knows, no wired connection beats the bandwidth of a pigeon with a flash drive.

OSM quality in Italy: pretty good!

The occasional new roundabout is missing, but quite a lot of POIs are there, most forests are mapped, even most trails seem to be mapped. Of course, there's always something to improve. For example, max speeds are often missing or wrong. A lot of fixing is simple (wrong one ways you noticed, simple mess-ups), but often it isn't. Italy has a huge amount of old towns and villages, and these cannot be mapped properly from aerial pictures. There are just too many little alleys, often underneath houses. Not even GPS will help you there. So you either need to print out maps or use a mobile mapping app and get a local data plan.

Hiking and Mapillary

We did do a lot of little hikes, but I didn't take any pictures on those. That really is a different speciality. You need proper gear, as walking around taking pictures the whole time is not easy nor fun. And it would quickly kill the battery. I asked my wife if she would still travel with me if I would wear something like this. She seemed to be OK with that, surprisingly. So maybe we'll have to look into that. On some of the trails we did, a backpack like that would have been rather impractical though.

Somehow, I was able to not worry about multipolygons until recently. You see, if you want to cut up the planet into little pieces according to administrative borders, you are bound to meet those. One expects a place to have a simple border, forming a long closed line. Reality is more complicated. My home country Belgium is a fine example. Brussels is a simple polygon. But Brussels is also a hole cut into Flanders, the northern region. So Flanders is a multipolygon. You need to know the shape of the larger area, the shape of the smaller area within it, and the fact that you need to exclude this inner area. And then that extra non-connected bit in the east, Voeren. We also have the relatively famous Baarle-Hertog, which has bits of Holland within bits of Belgium within Holland. Nothing a multipolygon can't do on a Wednesday afternoon.
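To show why the hole and the loose bit matter, here is a toy sketch of a "point in multipolygon" test: a point counts as inside if it falls in one of the outer rings and not in any hole of that ring. The square shapes are invented stand-ins for Flanders, Brussels and Voeren, not real borders, and the ray-casting test is a standard technique rather than what any particular OSM tool does.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how often a ray going right from (x, y) crosses the ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_multipolygon(x, y, parts):
    """parts is a list of (outer_ring, list_of_hole_rings) tuples."""
    for outer, holes in parts:
        if point_in_ring(x, y, outer) and not any(
            point_in_ring(x, y, hole) for hole in holes
        ):
            return True
    return False


# Toy shapes (hypothetical coordinates): a big square with a square hole,
# plus a small separate square to the east.
region = [
    ([(0, 0), (10, 0), (10, 10), (0, 10)], [[(4, 4), (6, 4), (6, 6), (4, 6)]]),
    ([(12, 4), (13, 4), (13, 5), (12, 5)], []),
]

print(point_in_multipolygon(1, 1, region))       # in the main part
print(point_in_multipolygon(5, 5, region))       # in the hole
print(point_in_multipolygon(12.5, 4.5, region))  # in the separate bit
```

A tool that only reads the first outer ring would treat the hole as part of the region and lose the separate bit.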

However, a lot of software can't handle multipolygons. One of those is the otherwise amazing osmpoly-export QGIS plugin [UPDATE: since March 2016, it does handle it!]. I used that one to convert my shapefile (OGR) archive to the POLY file format I needed for the History importer. POLY is a standard in the OSM community.
I mostly use programs with a user interface, so the QGIS plugin was my tool of choice to build a dataset of all the regions in the world based on OpenStreetMap (part of my larger project). And my sloppiness means that these pretty statistics for test-case Flanders were based on this not so pretty image:

I only found out because I learned how easy it was to extract shapefiles from the database created by the amazing OSM history importer. And it was only under the stimulation of the similarly amazing Ben Abelshausen, using his virtual machine, that I actually gave it a shot. Creating a shapefile of all the highways valid on January 1st, 2015 is as simple as this:

Of course there is a solution for the multipolygon problem. It just ain't as easy as a QGIS plugin. For me, that is. There are some tools listed at the Polygon Filter File Format wiki page. What we need is the ogr2poly.py script.
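For reference, the POLY format itself is plain text: a name on the first line, then one numbered section per ring with one "lon lat" pair per line, each section closed by END, holes marked with a leading "!", and a final END for the whole file. The sketch below writes such a file from toy rings. It is not the ogr2poly.py script, and the coordinates and the name "flanders_toy" are invented.

```python
def ring_lines(ring):
    """Coordinate lines for one ring, repeating the first point to close it."""
    closed = ring if ring[0] == ring[-1] else ring + [ring[0]]
    return [f"   {lon:.6f}   {lat:.6f}" for lon, lat in closed]


def to_poly(name, parts):
    """parts is a list of (outer_ring, list_of_hole_rings) with (lon, lat) tuples."""
    lines = [name]
    section = 0
    for outer, holes in parts:
        section += 1
        lines.append(str(section))       # an area to include
        lines += ring_lines(outer)
        lines.append("END")
        for hole in holes:
            section += 1
            lines.append(f"!{section}")  # "!" marks an area to subtract
            lines += ring_lines(hole)
            lines.append("END")
    lines.append("END")                  # end of the file
    return "\n".join(lines) + "\n"


# Hypothetical toy shapes: a square with a hole, plus a separate small square.
region = [
    ([(0, 0), (10, 0), (10, 10), (0, 10)], [[(4, 4), (6, 4), (6, 6), (4, 6)]]),
    ([(12, 4), (13, 4), (13, 5), (12, 5)], []),
]

poly_text = to_poly("flanders_toy", region)
print(poly_text)
```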

And that's where the wiki seems to stop. It refers to a subsite where you can download it. Within the .py file, the only thing it says about using it is this: Requires GDAL/OGR compiled with GEOS.

There are some tutorials around, I'll try to write this with the absolute beginner in mind. After reading a bit, I decided to try on my virtual Ubuntu machine. The first steps will probably be similar in Windows, but probably not the solutions.

First, you need to know that .py means that this is a Python script. That means you will need Python installed in order to be able to run things. Simple check: go to the command line and type "python".
If you don't have it yet, you can download Windows installers here. Because it's open source, you can choose between about a 100 different versions. I'd go with the first one. On Linux systems, it seems to be preinstalled most of the time.

Next, install gdal ogr. You can check if you already have it, typing "ogrinfo" in the command line. I didn't, so I installed it with the help of this nice little manual:

Then the .py file also said it needs geos. I checked, typing "geos-config" in the command line. It seemed just fine.
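The three manual checks above (python, ogrinfo, geos-config) can also be done in one go. This is just a convenience sketch using Python's standard library; it only tells you whether a command with that name is on your PATH, not whether GDAL was actually compiled with GEOS.

```python
import shutil


def missing_tools(tools=("python", "ogrinfo", "geos-config")):
    """Return the command names that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All tools found")
```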

So it was time to try the actual script. This guide said something about that, though I didn't really follow it. I just put the .py script into a new folder "OGRtoPOLY" in my home directory.
Note: in the graphical user interface, it looks like OGRtoPOLY is a subfolder of /home. However, the "real" directory would be /home/username/subfolder.
The following command did access the .py file in my case. I put the shapefile and all its collateral files in this same directory.

That ran error-free after I replaced grass70 with just grass.
Python still returned the same error. More googling told me to do this:

$ sudo apt-get install python-gdal
$ sudo apt-get install gdal-bin

And we struck oil.
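A quick way to see whether the Python side of GDAL is in place is to ask Python if it can find the "osgeo" package, which is the module name the GDAL/OGR bindings are normally installed under. That the script needs exactly this package is an assumption here, based on the fact that installing python-gdal fixed it.

```python
import importlib.util


def has_python_gdal():
    """True if the 'osgeo' package (GDAL/OGR Python bindings) can be imported."""
    return importlib.util.find_spec("osgeo") is not None


if has_python_gdal():
    print("GDAL Python bindings found")
else:
    print("GDAL Python bindings missing")
```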

The script allows for clever naming of the output files (one poly file for each feature). It can simplify geometry and create a buffer to make sure all the data you need really is in there. You can find the commands for that if you look within the .py file for "Setup program usage" to get the complementary commands.
For example, this command returned all the poly files I needed with names "europeregions_xxxx.poly", where xxxx is the feature's attribute idNUM. Output files were just dropped in my home folder, I saw no way to change this.

I hope this helps. If you can clarify some of the stranger things I stumbled upon, let me know. If you think this info could be of better use somewhere else, do copy-paste or let me know what to do. If you're trying to do the same and run into trouble - sorry, can't help you! Just kidding, I'll try.

Recently, I came across some villages in Bolivia which have "aldea" for a name. Upon closer inspection, I discovered there were over 600 in the country. The size of the problem is easy to find with Overpass Turbo. Just tell the wizard to search for name="aldea" and it will do everything you want. Thanks to the Argentine twitter feed, I knew that you can search for this in a whole country, as opposed to within a bounding box. Here's what the output looks like. I left some cases as a reference.

Obviously, the name tag is not for the description. These 'aldeas' were already properly classified as hamlet, village, etc, so there was no information in there. These were untouched nodes without a history. After brief consultation with the Bolivian community, I decided to go ahead.

Now, as a Potlatch 2 mapper, I didn't know how to fix this in JOSM. I've opened JOSM maybe five times, and every time I shut the thing down after 15 minutes. I know.

I read some use cases for Level0 before, and this seemed to be one. It was much easier than I thought. After running the query, you can hit the "Export" button and choose Level0.
This opens the Level0 editor. You still have to log in and allow the editor to access your account. Apart from that, I just copied the text to Notepad++ and did a Find and Replace for "name = aldea" to "fixme = needs a name". Hit save, and you just fixed 500 villages (max 500 objects per edit!).
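The Find and Replace step can be scripted as well. The sketch below does the same substitution on Level0-style text and reports how many tags it changed. The sample nodes are invented, and the regular expression only touches lines that read exactly "name = aldea", so a real name such as "Aldea Grande" is left alone.

```python
import re

# Match a whole tag line "name = aldea", keeping its indentation.
ALDEA_LINE = re.compile(r"^([ \t]*)name = aldea[ \t]*$", flags=re.MULTILINE)


def retag_aldea(level0_text):
    """Replace 'name = aldea' tags with a fixme; return (new_text, number_changed)."""
    return ALDEA_LINE.subn(r"\1fixme = needs a name", level0_text)


# Hypothetical sample in the Level0 text layout (ids and coordinates invented).
sample = """node 1001: -16.5, -68.1
  name = aldea
  place = hamlet

node 1002: -17.2, -66.3
  name = aldea
  place = village

node 1003: -18.0, -65.0
  name = Aldea Grande
  place = village
"""

fixed, changed = retag_aldea(sample)
print(changed)
print(fixed)
```

The count is handy for staying under the 500 objects per edit mentioned above.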

EDIT: I did get ahead of myself a bit: you really shouldn't do this before talking to the people who mapped these things. I'll do that now - it is an easy revert if in fact there is some very good reason for abusing the name tag on this scale.

Since the State of the Map in Buenos Aires, I've been able to try out some possible indicators. I tried them out on a dataset for my home region Flanders. Here are some examples of things to measure.

The nodes table contains all POI's defined as nodes, but also all the nodes that make up the lines and closed lines (polygons) of Openstreetmap. We can reasonably assume that almost all untagged nodes will be part of lines or polygons. Some tagged nodes are also part of lines. For example, a miniroundabout, a ford, a barrier, etc, should always be part of a line.

The total number of nodes is made up almost completely of nodes that belong to something else. That's to be expected of course.

Over time the number of tagged nodes increases. But the number of tags on these nodes increases faster. In 2009, there were on average only 1,24 tags on the nodes, now it's over twice as many.

What gets tagged? Here's a quick breakdown in some very wide categories. Road info are all the kind of tagged nodes you'd expect on highways, the kind that adds to better routing and safer driving. POI's are things like banks, schools, fuel stations, etc. These two take top spots, but in 2014 there was a big jump in the first group.

Infrastructure nodes like those belonging to railways and high tension electricity lines are only recently being overtaken by address nodes. The release of open data about addresses in Flanders is probably the cause of the big jump. However, most addresses are tagged on buildings, so they do not show up here.
For POI statistics, it would be best to just take the sum of nodes and points for the same tag combinations. Two problems arise. One is practical: there seems to be something wrong with the way the history importer handles polygons. It might have to do with the lack of support for relations, but I don't know yet. One more thing for the to investigate list. The second problem is that sometimes the same POI has both a polygon and a node tagged with the same information. This is not good practice, but it happens. You could remove nodes that geographically fall within polygons if the tags are the same. But I wouldn't know how to do that in my setup. It would take a lot of processing as well. And my available processing power at the moment is way too small as it is.

On to lines. In most cases, the thing to measure is the length of these. The absolute number of lines is mostly unimportant. A river is a river, whether it consists of 10 or 100 bits and pieces.
A nice example of how crowdsourcing works in practice is the evolution of the waterway network. First we see a quick growth of the river network (length in km). As the growth of the rivers winds down and stops, we see the streams taking off. So the crowd has finished mapping all the rivers, and only when that is finished, the smaller streams get more attention.
Rivers are sometimes mapped as polygons too. Normally the lines are not deleted as this happens, so on network completion this has no impact. Of course the level of detail does increase. A way to measure the detailedness of the river network, could be to count the nodes of all lines and polygons making up this network.
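A small sketch of why length is the better measure: the total length of a network stays the same no matter how many pieces a river is cut into, while the number of lines changes. The haversine formula used here is a common approximation on a spherical Earth and the coordinates are invented. It is not how the lengths in this analysis were actually calculated.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def way_length_km(coords):
    """Length of one line given as a list of (lat, lon) points."""
    return sum(haversine_km(*a, *b) for a, b in zip(coords, coords[1:]))


def network_length_km(ways):
    """Total length of a list of lines."""
    return sum(way_length_km(way) for way in ways)


# Hypothetical river, once as a single line and once cut into two pieces.
river = [(51.00, 4.00), (51.05, 4.10), (51.10, 4.15)]
one_piece = [river]
two_pieces = [river[:2], river[1:]]

print(len(one_piece), round(network_length_km(one_piece), 3))
print(len(two_pieces), round(network_length_km(two_pieces), 3))
```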

A similar picture for roads. Main roads (tertiary to motorway) start off as the largest category. Minor roads (residential, unknown, unclassified) follow but overtake them quickly. Full network completion seems to be achieved by 2013-2014. Other roads (mostly service roads) grow slower, and steady. Just like "slow roads" (mostly footways etc) the steady growth seems to indicate that it is either more or lower priority work to complete this network. So these might keep growing for many years to come.

Network completion isn't everything of course. A lot of extra information is needed to have a good, routable map. This kind of info is often mapped as tagged nodes on the map. The history importer does not load relations unfortunately, so the number of turn restrictions can't be counted with my method. In the graph we compare the growth of road info nodes with the evolution of the road network. Again, first the basics get mapped, only as the first priority nears completion, real progress is made on the extras.

So why do we need global statistics like this? To learn if these are general patterns. To see if imports disrupt these patterns. Or if they only occur when population density and wealth is high enough. To see how complete maps are - just looking at the graphs, you can often see which features are mapped completely and which aspects of the map need more work. Based on the files generated in the process, it's not very hard to classify mappers: are they local, do they have local knowledge or are they probably remote mappers. The distribution of these is good to know, but more than that might give important insights. What happens when remote mappers reach road network completion? Does this increase the chance a good number of local mappers pick up the mapping that needs local knowledge? That might inform if and when remote mapping should be encouraged - or avoided. A lot of these issues give rise to heated arguments. Wouldn't it be nice to have some data to corroborate opinions?

As I said before, there is a lot left to be done. At State of the Map in Buenos Aires I got many tips on how to move ahead. And that has been quite helpful. I could for example never have imagined how incredibly simple it was to add length and area to lines and polygons. As old problems get solved, new ones show up. I just found out that the number of addresses in my polygon analysis is way smaller than other people's results. So there goes another day in finding out what goes wrong.

So even though my set-up is still not really finished for a more complete analysis, it would be nice to have some basic worldwide analysis (see the links at the start of my previous post on the subject) available soon. For those who don't know my little project, the idea is to provide these kinds of statistics in an interactive platform, making them available for every region, every country, every continent and the whole world. There's also a video available (which I daren't watch yet) of me mumbling through the idea at State of the Map.

One little detail: my computer can't really handle the denser regions. Flanders was on the limit of what I can do. And there are much larger areas which are just as dense. So if you can spare a little server, I'd be happy to use it :)