There are more and more location-aware services in the world. Twitter prompts you to “Add your location” to every tweet, Facebook will ask “Where was this photo taken?”, Foursquare will tell you how many other people have checked in to your current location, and there are countless social / meetup / tracking applications eager to broadcast your location to the world.

Now, understandably, some people are rather cautious about letting everyone know where they are at any one time, and if you’re concerned about location privacy you can (generally) opt out of all these systems (or simply not sign up to them in the first place). But they do sometimes provide a useful service… so I’ve been wondering whether it would be possible to design a system that provides the benefit of location-based services while still protecting your location privacy.

For example, rather than having to “check in” to a location to find out if any of your friends are at the same bar as you (which involves, by definition, declaring where you are), suppose that there was some secure cryptographic system that could anonymously compare your location to that of your friends. Each individual’s actual location would remain secret, but it would still be possible to determine whether you were geographically close to your friends, in which case both parties could be informed. If you were not at the same location, nothing would ever be revealed. Note that I’m not just talking about keeping your location secret from other users, but from the service provider as well – nothing that identifies your location is ever transmitted from your device.

In other words, could you design a system that calculates whether two locations are the same as, or near to, each other, without knowing where either of the locations is…?

Encryption/Decryption Algorithms

There is a range of encryption algorithms that you might normally associate with keeping information secret. Sensitive information held in databases might be encrypted using a symmetric key algorithm such as AES, or an asymmetric key algorithm such as RSA, for example. So perhaps the users could exchange their locations, encrypted using one of these algorithms, and perform some computation on the encrypted values to see if they were close by?

Unfortunately, that’s not going to work – encrypted data may be secure, but it’s not directly usable because, in a good cryptographic system, the ciphertext (i.e. the encrypted data) reveals nothing about the plaintext message from which it was created. So, to tell whether two users were at the same place, it would be necessary to decrypt both locations first and, as soon as we’ve done that, the users’ locations have been revealed…

Comparing Hash Values

One way of securely testing whether two values are the same is to compute the hash of each value and test whether the hashes match. Hash functions are deterministic, so if two hashes match you can be (relatively) sure that the two values that generated those two hashes also match. It’s also (relatively) secure, because hashing is a one-way function – there’s no easy way to retrieve the plaintext location from the hash alone.

The problem here is that comparing hash values only allows you to test for exact equality of the two input values. Sometimes this is what you want – such as testing whether the password entered by a user is exactly the same as the password saved in a database, for example. However, what about if the location of two users was very close, but not quite the same? To demonstrate:

The SHA-1 hash of POINT(1.261 52.623) is 4a3b98975e699fbb2734d8401407e1d02d9ed5e3

The SHA-1 hash of POINT(1.262 52.623) is c52e07584cfb83ef546eccb5714fe5de524939fe

Clearly, even though the input location strings were very similar, the output hashes were significantly different. This is normally a desirable property in cryptographic hash functions – any tampering with a secure message will change its hash value. However, it doesn’t help Alice and Bob to find out if they are geographically near each other – unless they happen to have exactly the same coordinate location, they won’t be able to compare hashes to see how close they are.
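To reproduce this kind of comparison yourself, here is a minimal Python sketch using the standard hashlib module. The helper name sha1_hex is my own, and I’m assuming the location is hashed as a UTF-8 encoded WKT string; the exact digests depend on the exact bytes hashed (spacing, encoding), so they may not match the values quoted above.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of a string as 40 hex characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Two locations that differ only in the third decimal place of longitude.
first = sha1_hex("POINT(1.261 52.623)")
second = sha1_hex("POINT(1.262 52.623)")

# Count how many hex digits differ position by position.
differing = sum(1 for x, y in zip(first, second) if x != y)

print(first)
print(second)
print(f"{differing} of {len(first)} hex digits differ")
```

The count of differing digits says nothing about how far apart the two points are, which is exactly the problem.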

Generalising a Proximity Test to an Equality Test

If hash matching is secure but only works to compare whether two values are exactly equal, is there some way of going from a proximity test (i.e. Alice.STDistance(Bob) <= x) to an equality test (i.e. Alice.STProximate(Bob) = True) so that we could use hash comparison?

One obvious solution would be to simply quantise a user’s location by rounding coordinates to a certain number of decimal places. For example, suppose that Alice was at POINT(1.2613242 52.5234234), and Bob was about ten centimetres away, at POINT(1.2613244 52.5234244). Rather than test whether the hashes of the two locations were the same (which, clearly, they are not), we could instead compare the hashes of the locations rounded to some number of decimal places of precision. If both locations were quantised to 5 d.p. of precision, say, as POINT(1.26132 52.52342), the two users could exchange the hashes of their locations, see that they match, and know that they were both in the same location to within the corresponding degree of accuracy (at 5 d.p., roughly a metre). If the two hashes didn’t match, neither user would have learned anything about the other’s location.
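The rounding idea can be sketched in a few lines of Python. The function names quantise and location_hash are hypothetical, and I’ve used SHA-256 from hashlib rather than the SHA-1 of the earlier example; treat this as an illustration of the idea, not a finished design.

```python
import hashlib


def quantise(lon: float, lat: float, decimals: int) -> str:
    """Round a coordinate pair and format it as a WKT-style string."""
    return f"POINT({lon:.{decimals}f} {lat:.{decimals}f})"


def location_hash(lon: float, lat: float, decimals: int = 5) -> str:
    """Hash the quantised location; equal cells give equal hashes."""
    cell = quantise(lon, lat, decimals)
    return hashlib.sha256(cell.encode("utf-8")).hexdigest()


alice = location_hash(1.2613242, 52.5234234)
bob = location_hash(1.2613244, 52.5234244)    # very close to Alice
carol = location_hash(1.2713242, 52.5234234)  # somewhere else entirely

print(quantise(1.2613242, 52.5234234, 5))
print("Alice and Bob match:", alice == bob)
print("Alice and Carol match:", alice == carol)
```

One limitation worth knowing: rounding divides the world into fixed cells, so two people standing very close together but on either side of a cell boundary will still produce different hashes.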

This would work, but it would require the coordinates of the two locations to be quantised to the same degree of precision. In practice, it might be preferable to allow users to decide the granularity with which their location was revealed; perhaps Alice would be happy for Bob to know that she was in the same town as him, but not necessarily that they were in the same bar. Bob, however, might want to find out exactly how close he was to Alice. The degree of precision with which two users’ locations were compared would have to be agreed on a contract basis between every pair of users participating in the system.
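As a rough illustration of such a per-pair contract, one simple rule (my assumption, not something the scheme requires) is for each user to state the finest precision they are willing to reveal, and for the pair to compare at the coarser of the two. The names agreed_decimals and location_hash are hypothetical.

```python
import hashlib


def agreed_decimals(alice_max: int, bob_max: int) -> int:
    """Use the coarser (smaller) of the two users' maximum precisions."""
    return min(alice_max, bob_max)


def location_hash(lon: float, lat: float, decimals: int) -> str:
    """Hash a location rounded to the given number of decimal places."""
    cell = f"POINT({lon:.{decimals}f} {lat:.{decimals}f})"
    return hashlib.sha256(cell.encode("utf-8")).hexdigest()


# Alice only wants town-level matching (1 d.p.); Bob would accept 5 d.p.
decimals = agreed_decimals(alice_max=1, bob_max=5)

alice_position = (1.2613, 52.5234)  # in one bar
bob_position = (1.2950, 52.5410)    # elsewhere in the same area

same_town = location_hash(*alice_position, decimals) == location_hash(*bob_position, decimals)
same_bar = location_hash(*alice_position, 5) == location_hash(*bob_position, 5)

print("Agreed precision:", decimals, "d.p.")
print("Match at agreed precision:", same_town)
print("Match at 5 d.p.:", same_bar)
```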

The next problem is that, in real life, our movements are relatively predictable. Suppose that Bob compares his location with Alice’s in the middle of the night, at a time when he knows Alice will be asleep in her house. He receives the hash of her current location – at home. It won’t match the hash of his own location (unless he happens to be at her house too), nor will it reveal where Alice’s house is, but Bob now knows that, any time he receives that same hash in the future, Alice is at home. By repeating this when Alice is at other known locations (e.g. at work, at her favourite coffee shop etc.) he can build a lookup table that reveals many of her likely whereabouts.

To solve this problem, perhaps a random nonce could be agreed by both parties and prepended to their location prior to hashing. For any given test between two users, the same nonce would be used so that, if their locations matched at that time, the hashes would match. However, the nonce would change for each subsequent query – ensuring that no user could build up a lookup table of other users’ locations.
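Here is a small Python sketch of the nonce idea. How the two parties agree on the nonce is not covered here; I simply generate one per query with the standard secrets module, and the function name nonce_location_hash is hypothetical.

```python
import hashlib
import secrets


def nonce_location_hash(nonce: bytes, lon: float, lat: float, decimals: int = 5) -> str:
    """Hash the per-query nonce followed by the quantised location."""
    cell = f"POINT({lon:.{decimals}f} {lat:.{decimals}f})".encode("utf-8")
    return hashlib.sha256(nonce + cell).hexdigest()


alice_home = (1.2613242, 52.5234234)

# A fresh nonce is agreed for every query (the agreement step is assumed).
nonce_1 = secrets.token_bytes(16)
nonce_2 = secrets.token_bytes(16)

first_query = nonce_location_hash(nonce_1, *alice_home)
second_query = nonce_location_hash(nonce_2, *alice_home)

# Bob, at the same place during the second query, uses the same nonce.
bob_second_query = nonce_location_hash(nonce_2, 1.2613244, 52.5234244)

print("Alice's hash repeats across queries:", first_query == second_query)
print("Alice and Bob match in query 2:", second_query == bob_second_query)
```

Alice’s hash is now different in every query, so an old response can’t be recognised later. Bear in mind, though, that because Bob knows the nonce for the current query, nothing in this sketch stops him hashing a list of candidate locations with that nonce and comparing the results against Alice’s response.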

There are still several issues to address here – the main one being that we’re not really performing any secure computation on the two location inputs – we’re merely testing whether they are the same when rounded to a certain precision and then hashed. So, while this may work in the case of simple proximity tests, suppose that instead we wanted other secure spatial calculations involving private data – perhaps Alice had defined a secure zone (Polygon), and Bob had a location (Point), and both parties wanted to test whether Bob was within Alice’s zone, without either one revealing what their secret information was. For this we need something more complicated than simple equality testing…
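To make the target concrete, this is the ordinary, non-private version of that point-in-polygon test, written as a standard ray-casting check in Python. There is nothing secure about it: it needs both the point and the polygon in the clear, and coordinates are treated as simple planar x/y values. It only shows the calculation that a secure protocol would somehow have to perform without either input being revealed.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count how many polygon edges a ray going right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


alice_zone = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]  # a square zone

print(point_in_polygon((2.0, 2.0), alice_zone))
print(point_in_polygon((5.0, 2.0), alice_zone))
```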

Garbled Circuits

During the (excellent) Udacity CS387 Applied Cryptography course, I was introduced to the concept of “garbled circuits” as a method of secure computation. It’s a bit hard to explain, but the basic premise behind a garbled circuit is to allow two parties, who each have some secret data, to calculate and share the result of a computation involving both sets of data, without ever revealing their secret information to each other. This sounds ideal for this purpose – the only problem is that I’ve yet to find a practical implementation of garbled circuits in anything like a usable form – there’s a lot of theory and academic discussion, but there’s nothing like a simple ready-made library out there (at least, not that I could find).

Wow – great example, thanks Xavier! It’s amazing to think that, although we all believe we make free choices about where we go, we exhibit such predictable behaviour that, with enough data, our location in 24 hours’ time can be predicted to within 20 metres… Really, really interesting stuff. I’m just getting into machine learning myself, so examples like this are always welcome 🙂

Sounds similar to the Millionaires’ Problem, where two millionaires want to find out who has more money without actually revealing how much money they have. I believe Off-the-Record Messaging uses an implementation of the Socialist Millionaires’ Problem (in which two millionaires want to know if their funds are *equal*) to compare secret keys. I don’t know how helpful this is, or how in the world one would translate this to work with spatial data. Just some more ideas to throw in the pot.