This is the same phone metadata that the NSA has been having so much trouble with. However, when identities are properly protected and mobile phone metadata are provided to non-intelligence data analysts, real science can be performed. New urban scientists can now get a peek into activities they could never see before.

The MIT article talks about mapping commuting activity, producing population density maps over time, charting the distribution of wealth in Africa, and other findings about cities that are only revealed through phone metadata.

Urban science started decades ago

When I was in college we were just beginning the quantification of geography leading to a prototype of urban science. We used to use newspaper subscriptions, traffic statistics, travel statistics, etc. to determine which cities were more connected to which other cities. Of course, newspaper subscriptions would no longer suffice in the Internet world we live in today. Nowadays, mobile phone metadata have taken over the role of all this data.

We need more and better data

There are two problems with mobile phone metadata as they are available today.

Mobile metadata aren’t that freely available. Yes, there are a few datasets which are widely available, but these don’t cover even the top 50 cities of the world and span only a relatively short period of time. Mostly these datasets are available on an ad hoc basis at best. What we need is ongoing data for the 50 largest cities worldwide to have a chance at really understanding urban science.

Mobile metadata must be both anonymized AND uniquely identifiable. We don’t want to know who owns the mobile phones, but we do need to know something about them, such as zip/postal code, sex, age and perhaps other information. (Although scientists have already figured out a way to identify a mobile phone owner’s home cell tower, which is probably even more precise than their postal code.)

For the latter problem, it shouldn’t be too hard to come up with a reasonable way to protect phone owner identities but still provide some way of uniquely identifying a phone. I would guess that the current mobile phone datasets used in the studies above have already solved this. However, if they haven’t, I should think a random ID number assigned to uniquely identify each phone in the dataset would suffice, with an associated database table that maps these random IDs to some set of phone owner demographics.
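To make the random ID idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the record fields (phone, tower, timestamp), the demographic fields, the sample values, and the function name pseudonymize are all hypothetical, not taken from any real telecom dataset. It only shows the mechanics of swapping phone numbers for random IDs and keying a demographics table by those IDs. On its own it would probably not be enough to guarantee anonymity.

```python
import secrets


def pseudonymize(call_records, owner_demographics):
    """Swap phone numbers for random IDs; return released records and a demographics table."""
    phone_to_id = {}   # private lookup, kept by the publisher and never released
    used_ids = set()

    def random_id_for(phone):
        # The same phone always gets the same random ID within one dataset.
        if phone not in phone_to_id:
            new_id = secrets.token_hex(8)
            while new_id in used_ids:  # vanishingly unlikely, but keeps IDs unique
                new_id = secrets.token_hex(8)
            used_ids.add(new_id)
            phone_to_id[phone] = new_id
        return phone_to_id[phone]

    # Released activity records carry only the random ID, never the phone number.
    released_records = [
        {
            "phone_id": random_id_for(rec["phone"]),
            "tower": rec["tower"],
            "timestamp": rec["timestamp"],
        }
        for rec in call_records
    ]

    # Demographics table keyed by random ID, for phones that appear in the records.
    demographics_table = {
        phone_id: owner_demographics[phone]
        for phone, phone_id in phone_to_id.items()
        if phone in owner_demographics
    }
    return released_records, demographics_table


# Hypothetical sample data, made up for this sketch.
records = [
    {"phone": "555-0100", "tower": "T1", "timestamp": "2014-01-06T08:15"},
    {"phone": "555-0101", "tower": "T2", "timestamp": "2014-01-06T08:20"},
    {"phone": "555-0100", "tower": "T3", "timestamp": "2014-01-06T17:40"},
]
demographics = {
    "555-0100": {"postal_code": "80301", "age_band": "30-39", "sex": "F"},
    "555-0101": {"postal_code": "80302", "age_band": "20-29", "sex": "M"},
}

released, demo_table = pseudonymize(records, demographics)
```

The private phone-to-ID lookup stays with whoever publishes the data. Analysts would only see the released records and the demographics table, joined on the random ID.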

Supplying more metadata

But for the former problem, the lack of freely available, continuous mobile phone metadata, I would suggest some type of ongoing dataset, periodically published by telecom providers in each city, say once per quarter, covering the prior year (or even two years back) of phone activity. Such metadata would be freely available to anyone who wanted it.

Perhaps in the US this mobile phone metadata collection could be mandated and maintained by the FCC or some other federal entity, or maybe the telecom providers would host the database themselves, by location. It doesn’t much matter where the data is as long as it’s Internet accessible, provides anonymity, and is supplied on an ongoing basis for a representative set of large cities in a country.

Given the troubles the NSA has been having, it has been suggested that some entity outside the government host mobile phone metadata and process “approved” queries by intelligence and other agencies. It seems to me that if such an entity were to exist, it would be the perfect place to provide an anonymized version of this data, on a periodically delayed basis, to the world’s scientists.

Once the data becomes more freely and more periodically available, there’s probably an awful lot that scientists and even commercial endeavors could make of it.

A finer-grained urban physics could emerge from such data if it were more freely available, and commercial interests would likely want the phone metadata as well.