Introduction

OpenCellID is an open source project complementary to OpenStreetMap with the aim of creating a worldwide collection of GSM cell tower locations. It enables location services in situations where GPS is unavailable. Please check the official OpenCellID web site for more details.

Measurements can be collected by a number of different mobile apps, including Keypad Mapper 3 (KPM3 for short). The Berlin-based company ENAiKOON provides the collected data free of charge to the community, either as an (aggregated) collection of cell tower locations or as a (non-aggregated) raw measurement file that includes the actual GPS measurements by anonymous users.

On this page we import the raw measurement data into a PostGIS database and run a few (rather basic) analysis steps on it. This is by no means official documentation. It is rather intended as a possible starting point if you're interested in experimenting with the GPS measurement data. Note that GSM cell coverage is outside the scope of this page. Please feel free to correct any bugs/omissions you find.

One aspect we will cover in more detail is an estimate of the actual KPM3 coverage in today's OSM house numbers. We assume that a house number qualifies only if it has a nearby OpenCellID GPS measurement. Also, measurements should have been taken by pedestrians rather than cars. Finally, the OSM object's last changed timestamp should not be earlier than the OpenCellID measurements.

Background information

This section provides some background information about the data to be imported.

File structure

The official measurement file structure documentation can be found here. However, the actual file contains a number of additional (still unknown) fields, and some fields might be in a different sequence. Please enhance this section if you have more details about the file format.

Additional Information on the format can be found at downloads.opencellid.org/ [1]

Extracting all measurements from year 2013

Measurements are restricted on purpose to the year 2013, as KPM3 was only introduced in early 2013. Earlier measurements are out of scope for this analysis (although you can easily include them by removing/adjusting the grep statement).
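The same year filter can also be sketched in Python instead of grep. This is only an illustration: it assumes a CSV file with a header line and a timestamp column that starts with the year. The column name created and the sample rows are made up, as the real file layout is not fully documented.

```python
import csv
import io

def filter_by_year(lines, year, timestamp_field="created"):
    """Yield CSV rows (as dicts) whose timestamp starts with the given year.

    `timestamp_field` is a hypothetical column name; adjust it to the
    actual header of the measurements file.
    """
    prefix = f"{year}-"
    for row in csv.DictReader(lines):
        if (row.get(timestamp_field) or "").startswith(prefix):
            yield row

# Made-up sample data, only to show the call.
sample = io.StringIO(
    "id,lat,lon,created\n"
    "1,52.52,13.40,2012-12-31 23:59:59\n"
    "2,52.53,13.41,2013-01-01 00:00:01\n"
    "3,52.54,13.42,2013-07-15 18:30:00\n"
)
rows_2013 = list(filter_by_year(sample, 2013))
```

For the real file you would pass an open file object instead of the in-memory sample.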

Removing Duplicate Entries

The measurements file contains about 21 million actual GPS measurements in total (client gpsSuite, no date restriction applied). For some unknown reason, some raw measurements appear up to 8429 times in the database, with all fields identical except for the measurement id and the sent timestamp. This could be the result of a client malfunction that sends the same raw measurements over and over to the OpenCellID server.
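A minimal sketch of this de-duplication rule, assuming each measurement is available as a dict: two rows count as duplicates if all fields except the measurement id and the sent timestamp are identical. The field names id, sent, measured, lat and lon are placeholders, not the real column names.

```python
def drop_duplicates(rows, ignored_fields=("id", "sent")):
    """Keep the first row for each combination of non-ignored field values."""
    seen = set()
    unique = []
    for row in rows:
        # Build a hashable key from every field that is not ignored.
        key = tuple(sorted((k, v) for k, v in row.items() if k not in ignored_fields))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Made-up rows: the first two differ only in id and sent timestamp.
rows = [
    {"id": 1, "sent": "2013-07-01 10:00:00", "measured": "2013-07-01 09:59:50", "lat": 52.52, "lon": 13.40},
    {"id": 2, "sent": "2013-07-01 10:05:00", "measured": "2013-07-01 09:59:50", "lat": 52.52, "lon": 13.40},
    {"id": 3, "sent": "2013-07-01 10:05:00", "measured": "2013-07-01 10:00:00", "lat": 52.53, "lon": 13.41},
]
unique_rows = drop_duplicates(rows)
```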

Note: house.style is just a stripped down version of the default.style, where most (unimportant) tags have been commented out.

Data Analysis

QGIS can be used to further analyze the data. The following screenshot depicts a sample KPM3 mapping session. The color scheme is based on the calendar week of the creation timestamp (red = later in 2013). You can clearly identify a mapper walking along the streets.

Sample Keypad Mapper 3 mapping session in Germany.

In a more sophisticated analysis we could estimate housenumber mapping coverage by KPM3. Our assumption here is that housenumbers created in OSM were mapped after the corresponding OpenCellID measurement occurred. Also all housenumbers should be well within a 30-50m distance to actual OpenCellId measurements in our database table.

Identifying Hotspots

When working with OpenCellID measurements you will notice that in some areas a huge number of measurements were collected over the course of up to several months. The following screenshot highlights a location where 100,000 (!) measurements were taken.

Roughly 100,000 measurements collected over a period of several months

We can find out more about similar locations using the following SQL statement:

We prepare a table of the top 50 locations with the most measurements, based on a 500m grid. This table will later be used to exclude some OpenCellID measurements from more expensive calculations, such as 'number of nearby probes in a given timeframe'.
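The grid idea can be illustrated outside the database as well. The following Python sketch is not the SQL used for this page. It assumes that the points are already available in a metric projection, so that a 500m cell is simply the coordinate divided by 500 and rounded down.

```python
from collections import Counter

def top_grid_cells(points, cell_size=500.0, limit=50):
    """Count points per square grid cell and return the busiest cells.

    `points` are (x, y) pairs in a metric projection (an assumption).
    Each result is ((cell_x, cell_y), count), where the cell index is
    the floor of the coordinate divided by `cell_size`.
    """
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    return counts.most_common(limit)

# Made-up coordinates in meters.
points = [(120.0, 80.0), (130.0, 90.0), (499.0, 10.0), (510.0, 10.0), (1200.0, 700.0)]
hotspots = top_grid_cells(points, cell_size=500.0, limit=2)
```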

Calculating number of nearby measurements in a given timeframe

First we add another column nearcnt to store the number of GPS measurements taken within a 2-minute timeframe and within 100m distance. The purpose of this number is to distinguish cars from pedestrians collecting cell tower information. The interval was chosen to approximate typical walking speed.

alter table opencellids add column nearcnt int;
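To make the meaning of nearcnt more tangible, here is a small pure Python sketch of the counting rule. It is a brute-force illustration, not the query used on this page. It assumes that the 2-minute timeframe means "at most 2 minutes before or after" and that the measurement itself is not counted. Both are assumptions.

```python
import math
from datetime import datetime, timedelta

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    r = 6371000.0  # mean earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def near_counts(measurements, max_dist_m=100.0, window=timedelta(minutes=2)):
    """For each (lat, lon, time) tuple, count the other measurements that are
    within max_dist_m and within +/- window. Quadratic, illustration only."""
    counts = []
    for i, (lat_i, lon_i, t_i) in enumerate(measurements):
        n = 0
        for j, (lat_j, lon_j, t_j) in enumerate(measurements):
            if i == j:
                continue  # assumption: a measurement does not count itself
            if abs(t_j - t_i) <= window and haversine_m(lat_i, lon_i, lat_j, lon_j) <= max_dist_m:
                n += 1
        counts.append(n)
    return counts

# Made-up data: three probes of a walking mapper, one far away, one much later.
t0 = datetime(2013, 7, 29, 18, 0, 0)
measurements = [
    (48.1000, 11.5000, t0),
    (48.1001, 11.5000, t0 + timedelta(seconds=10)),
    (48.1002, 11.5000, t0 + timedelta(seconds=20)),
    (48.1100, 11.5000, t0 + timedelta(seconds=30)),  # roughly 1.1 km north
    (48.1000, 11.5000, t0 + timedelta(minutes=10)),  # same place, outside the window
]
counts = near_counts(measurements)
```

A slowly moving pedestrian produces many probes close together in space and time and therefore gets a high count, while a car leaves the 100m radius quickly.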

As calculating these values is very expensive, we restrict the analysis to a bbox which fully covers Germany. But before starting this step we mark all OpenCellID measurements in our hotspot table with a very large nearcnt value. This way we can exclude these points in the next analysis step as they seriously impact the overall processing time.

The workload is also split into several hundred chunks that are processed in parallel using GNU parallel. We exclude all measurements for which a nearcnt value has already been determined, so the script can be restarted in case things go seriously wrong: if a chunk was already processed in a previous run (i.e. a nearcnt value <> 0 exists), it is simply skipped. Parameter offset controls the size of each tile, inner defines an inner margin in a tile (avoids skipping of tiles because of rounding issues on tile borders), and overlap defines a small overlap with neighbouring tiles (to make sure no values are ignored on tile borders). In addition, nearcnt may be null if the column was just added by an alter table statement.
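The role of the three parameters can be sketched as follows. This is one possible reading of the description above, not the original shell script: each tile has an edge length of offset, the box shrunk by inner is used to check whether unprocessed rows remain, and the box grown by overlap is used for the actual update. The function name and the dictionary keys are made up for this example.

```python
import math

def make_tiles(bbox, offset, inner, overlap):
    """Split bbox = (min_x, min_y, max_x, max_y) into square tiles.

    For each tile two boxes are returned:
    - "check": the tile shrunk by `inner` on every side
    - "work":  the tile grown by `overlap` on every side
    """
    min_x, min_y, max_x, max_y = bbox
    nx = math.ceil((max_x - min_x) / offset)
    ny = math.ceil((max_y - min_y) / offset)
    tiles = []
    for ix in range(nx):
        for iy in range(ny):
            x0 = min_x + ix * offset
            y0 = min_y + iy * offset
            x1, y1 = x0 + offset, y0 + offset
            tiles.append({
                "check": (x0 + inner, y0 + inner, x1 - inner, y1 - inner),
                "work": (x0 - overlap, y0 - overlap, x1 + overlap, y1 + overlap),
            })
    return tiles

# Very coarse example values; the real run used several hundred tiles.
tiles = make_tiles((4.0, 46.0, 16.0, 56.0), offset=4.0, inner=0.5, overlap=0.25)
```

Each entry could then be turned into one job for GNU parallel.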

As the different parallel processing areas overlap slightly and could be updating the same table rows, deadlocks cannot be excluded completely. Postgres will detect this situation automatically and abort the current transaction. If you receive a non-zero value for the nearcnt query, simply re-run the shell script until the row count drops to zero.

Calculating number of OpenCellIDs in close proximity to housenumbers

In the last step we look for OpenCellID measurements within 50m of addr:housenumber polygons and points. GPS probes might originate from cars as well as from pedestrian mappers. As the raw data doesn't include any kind of identifier, we use a heuristic based on the nearcnt column introduced before, i.e. the number of GPS measurements in a 2-minute time interval within 100m distance. As KPM3 takes GPS probes frequently (at least every 10s), this turns out to be a reasonable (albeit not perfect) filter for pedestrian mappers.

As discussed before, we introduce another level of validation and only accept a mapped housenumber if the last changed timestamp in the OSM data is later than the surrounding OpenCellID measurements. In our sample screenshot a number of houses are now colored in white, as the last change occurred in 2011, while the OpenCellID measurements are from July 2013. Clearly, KPM3 cannot be the original source of those housenumbers.
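The two checks described in this section can be summarised in a few lines of Python. This is a simplified sketch, not the query behind the screenshots. The nearcnt threshold of 5 is a made-up value, the distances are assumed to be precomputed, and the rule "at least one qualifying measurement was taken before the last OSM change" is one possible reading of the timestamp condition.

```python
from datetime import datetime

def is_kpm3_candidate(osm_last_changed, nearby, max_dist_m=50.0, min_nearcnt=5):
    """Decide whether a housenumber looks like it was mapped with KPM3.

    `nearby` is an iterable of (distance_m, measured_at, nearcnt) tuples.
    `min_nearcnt` is a hypothetical pedestrian threshold.
    """
    pedestrian_times = [
        measured_at
        for distance_m, measured_at, nearcnt in nearby
        if distance_m <= max_dist_m and nearcnt >= min_nearcnt
    ]
    # Assumption: one earlier pedestrian measurement is enough to qualify.
    return any(measured_at < osm_last_changed for measured_at in pedestrian_times)

# Made-up measurements around one house.
nearby = [
    (20.0, datetime(2013, 7, 29, 18, 0), 8),  # pedestrian-like probe
    (30.0, datetime(2013, 7, 29, 18, 1), 1),  # car-like probe
]
edited_after = is_kpm3_candidate(datetime(2013, 7, 30, 9, 0), nearby)
edited_in_2011 = is_kpm3_candidate(datetime(2011, 5, 1, 12, 0), nearby)
```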

Refined calculation: Check existence of relevant OpenCellIDs in close proximity to housenumbers

After publishing the first version of this analysis some mappers remarked that they were listed in the Top 25 list, although they never used KPM3. After looking into the data a large gap of several months was identified between the OpenCellID measurement and the added housenumber on the OSM object. A more reasonable upper limit between collecting house numbers and adding them to OSM might be 2 weeks - in some cases OSM was updated a couple of hours later already. House numbers added after this cut-off time are likely not related to OpenCellID measurements.

Unfortunately the Germany extract only includes the latest version of each OSM object. Up-to-date OSM Full History extracts don't seem to be available at the moment. Effectively, we can only apply the 14 day restriction to v1 OSM objects. For later versions we don't have the actual date when the house number was first added. Consequently, we cannot exclude false positives for OSM v2 and later versions by this kind of time restriction.
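The refined time restriction could look like this in Python. It is a sketch under the assumptions stated above: the 14 day upper limit is only enforced for v1 objects, and for later versions only the "edit after measurement" rule remains. The function and parameter names are made up.

```python
from datetime import datetime, timedelta

def passes_time_check(measured_at, osm_timestamp, osm_version, max_gap=timedelta(days=14)):
    """Return True if the OSM edit plausibly follows the measurement."""
    gap = osm_timestamp - measured_at
    if gap < timedelta(0):
        return False  # edit happened before the measurement
    if osm_version == 1:
        return gap <= max_gap  # creation date is known, apply the cut-off
    return True  # v2+: date of the original housenumber edit is unknown

measured = datetime(2013, 7, 29, 18, 0)
v1_next_day = passes_time_check(measured, datetime(2013, 7, 30, 9, 0), 1)
v1_months_later = passes_time_check(measured, datetime(2013, 10, 1, 9, 0), 1)
v4_months_later = passes_time_check(measured, datetime(2013, 10, 1, 9, 0), 4)
```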

Also note that column cnt was changed from type int to boolean, i.e. you need to replace cnt > 0 by cnt is true for Top25 table + History chart.

Missing History information revisited

Due to the lack of an up-to-date Full History Dump for Germany we resort to downloading the respective information from the Main API. Note that this procedure is not at all recommended for larger amounts of requests as it may impact other mappers. In any case please refer to the API documentation.

In a previous analysis step we already cut the number of objects down from 3 million to 50,000. First we extract the OSM object ids of those nodes and ways for which history information should be retrieved. This step assumes that column cnt was already populated in the previous processing step.
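For reference, the history of a single object is available from the API under /api/0.6/[node|way]/<id>/history. The following Python helper only builds these URLs and does not send any requests. The function name is made up. If you do download data this way, keep the request rate low, as noted above.

```python
API_BASE = "https://api.openstreetmap.org/api/0.6"

def history_url(osm_type, osm_id):
    """Build the history URL for a node or way (no request is sent)."""
    if osm_type not in ("node", "way"):
        raise ValueError(f"unsupported OSM type: {osm_type}")
    return f"{API_BASE}/{osm_type}/{int(osm_id)}/history"

url = history_url("way", 117086617)
```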

A small XSLT mapping extracts version details about first addr:housenumber appearance, such as OSM Object id, timestamp, version number, user and user id. It doesn't check if the house number matches the latest version. We will use this information to get a more precise picture when a house number was originally added to OSM.
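The same extraction can be sketched in Python instead of XSLT. The example below is an illustration only, not the original mapping. It expects the XML returned by the history call and returns the details of the first version that carries an addr:housenumber tag. As in the XSLT, it does not check whether that number matches the latest version. The sample document and its user names are made up.

```python
import xml.etree.ElementTree as ET

def first_housenumber_version(history_xml):
    """Return details of the first version tagged with addr:housenumber, or None."""
    root = ET.fromstring(history_xml)
    versions = [el for el in root if el.tag in ("node", "way")]
    versions.sort(key=lambda el: int(el.get("version")))
    for el in versions:
        if any(tag.get("k") == "addr:housenumber" for tag in el.findall("tag")):
            return {
                "id": el.get("id"),
                "version": int(el.get("version")),
                "timestamp": el.get("timestamp"),
                "user": el.get("user"),
                "uid": el.get("uid"),
            }
    return None

# Made-up history document in the layout of the OSM API 0.6.
sample = """<osm version="0.6">
  <way id="1" version="1" timestamp="2012-03-01T10:00:00Z" user="mapper_a" uid="11">
    <tag k="building" v="yes"/>
  </way>
  <way id="1" version="2" timestamp="2013-07-30T08:00:00Z" user="mapper_b" uid="22">
    <tag k="building" v="yes"/>
    <tag k="addr:housenumber" v="7"/>
  </way>
  <way id="1" version="3" timestamp="2013-09-29T12:00:00Z" user="mapper_c" uid="33">
    <tag k="building" v="yes"/>
    <tag k="addr:housenumber" v="7"/>
    <tag k="roof:shape" v="gabled"/>
  </way>
</osm>"""
first = first_housenumber_version(sample)
```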

Sample screenshots

Depending on the area, some mappers might have tried Keypad Mapper 3 at home but never actually used it for mapping house numbers (why?). The first screenshot gives such an example, where most housenumbers don't have any nearby OpenCellID measurements. In the second screenshot we can see an area with lots of GPS traces, a potential mapping area. Looking at the timestamps, we find that most buildings were already tagged a few years ago. The last screenshot shows rather widespread use of KPM3 in the southern parts of Munich, where almost whole suburbs were mapped using this tool. We didn't evaluate whether this was the result of a mapping party or of one dedicated house mapper. However, most numbers were created by a single account in 2012 and 2013.

Only few housenumbers with OpenCellId coverage

Potential area with KPM3 usage (in red) + buildings in blue. However, most buildings were already tagged with house numbers in 2010/11.

Estimated Daily Active users and contributed housenumbers per day

The following chart looks at all addr:housenumber objects previously identified as likely mapped by a Keypad Mapper 3 user. Throughout the year 2013 a constant increase in daily active contributing KPM3 mappers is visible (daily values in blue, 14 day moving average values are shown in black). The actual number of relevant OSM addr:housenumber objects last changed on a given day is shown in orange as 14 day moving average value. Interesting to note is that this number peaks at about 400 house numbers per day regardless of the steady increase of active users since May 2013. Daily peak value so far in 2013: 1500 added house numbers on May 12.
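A 14 day moving average as used in the chart can be computed like this. The sketch uses a trailing window (the current day plus the 13 days before it) and averages over fewer days at the start of the series. Whether the chart used a trailing or a centered window is not documented, so treat this as an assumption.

```python
def moving_average(values, window=14):
    """Trailing moving average over `window` entries.

    The first entries are averaged over the days available so far.
    """
    result = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        result.append(running / min(i + 1, window))
    return result

# Made-up daily counts with a short window, to keep the example readable.
smoothed = moving_average([2, 4, 6, 8], window=2)
```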

Friendly reminder: This chart is based on a number of assumptions used to estimate OSM housenumbers from nearby OpenCellID measurements. It may be inaccurate, incomplete or simply plain wrong. If you come across possible issues, please leave a comment on the discussion page. Thanks!

Estimated Top 25 Keypad Mapper 3 Users in Germany

For our Germany extract (approximate bbox: 4, 46, 16, 56) we could also give a very rough estimate on top KPM3 users. We count addr:housenumber nodes and polygons previously identified as KPM3 candidates. Nevertheless these numbers are really impressive for a single mapper contribution!
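The ranking itself is a simple count per user. Here is a minimal Python sketch, assuming a list of (OSM object id, last changed user) pairs for all objects previously flagged as KPM3 candidates. The ids and user names are made up.

```python
from collections import Counter

def top_users(candidates, limit=25):
    """Count KPM3 candidate objects per last-changed user."""
    return Counter(user for _osm_id, user in candidates).most_common(limit)

candidates = [
    (1, "mapper_a"), (2, "mapper_b"), (3, "mapper_a"),
    (4, "mapper_c"), (5, "mapper_a"),
]
ranking = top_users(candidates, limit=2)
```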

Friendly reminder: These figures are very likely not very accurate, use with caution. If you appear in this list but don't use Keypad-Mapper 3 at all, or if you think these figures are too low, please leave a comment on the Discussion page. This will help to validate some of the assumptions on this page. Please also take a look at the FAQ section below for further explanations.

FAQ

Is this an official usage statistic by ENAiKOON?

No, this page is not in any way related to the makers of Keypad-Mapper 3, it is a private fun project only. ENAiKOON provides global download statistics at this time (see talk page) but no usage statistics. In a way this wiki page tries to close that gap.

The number of house numbers appears to be too low, hdyc reports much higher numbers.

To qualify as KPM3 housenumber some nearby OpenCellId measurements need to be available. If you added some housenumbers without walking around with KPM3, chances are that these housenumbers are not counted in this statistic. Also (similar to hdyc) only the last changed user is taken into account. If you added a housenumber and some other mapper added some 3D tags, this housenumber won't show up in your total figure.

I never used KPM3. Why does my username appear in the Top25 KPM3 mapper list?

For OSM v2 and later objects, it is currently not possible to find out when the housenumber was originally added to OSM. Assuming someone walked around in your area with KPM3 and you added some house numbers months later to an OSM v2+ object, the OpenCellID measurements will be linked to your user name. Due to the lack of an OSM full history file, this kind of false positive can only be handled for OSM v1 objects - in this case, there's a cut-off time of 14 days between OpenCellID measurement and OSM edit.

Real life example: way 117086617 was last edited by rolandg on Sep 29 (version 4); the house number was added by ysae on July 30 (v3). Nearby OpenCellID measurements indicate that someone used KPM3 on the evening of July 29. Using a full history file, this way would have been credited to ysae instead of rolandg.

Update Oct 25: Now that the chart/top25 list also takes the full history information into account, the number of false positives should be smaller. However, if you happen to have added a new house number in an area where someone collected some OpenCellID probes in the last 2 weeks before your edit, you might still be considered as a KPM3 housenumber mapper.

Summary

That's the end of our little quick and dirty OpenCellID experiment. We tried to match actual addr:housenumber objects to OpenCellID measurements. Along the way we filtered out measurements taken from cars, which are rather unlikely to have contributed to existing housenumbers (and would create false positives), and we took the timestamps of measurements and OSM edits into account.