I am building a system to collect a unique identifier (MAC addresses from 802.11 probe requests). The system will have several collection points over a large area, submitting to a central database. I want to do traffic analysis without allowing a known MAC address to be mapped to the points where it was seen.

My goals:

Anonymize the MAC addresses the same way for each device, to allow tracking of an anonymous device over time, regardless of which collection point sees the device.

Anonymize in such a way that an attacker with a user's MAC address and a copy of the database can't trace the user's movements.

Ideally, the addresses are secret from shortly after the moment of collection and unknown even to me (I can be considered an attacker, especially by the users being collected).

It's like I'm designing a web service where the username and password are the same. Anywhere I store instructions for generating the hashed MAC address (either a per-address salt or a general hashing instruction) I'm providing an attacker with the means to de-anonymize users.

Could I encrypt the MAC addresses using a public key for which I don't have the private key? That's really just a fancy way of hashing, and considering there are only 334 billion addresses in the registered IEEE namespace (considerably fewer for common manufacturers like Samsung and Apple), it's only partly resistant to attack.
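To make the "only partly resistant" point concrete, here is a minimal Python sketch of the enumeration attack. It assumes a hypothetical scheme in which the stored value is an unsalted SHA-256 of the lowercase, colon-separated MAC; the function names and the example OUI `00:11:22` are made up for illustration. Each 24-bit manufacturer prefix (OUI) leaves only 2^24 = 16,777,216 device suffixes to try.

```python
import hashlib


def pseudonymize(mac: str) -> str:
    """Unsalted SHA-256 of a normalized MAC (hypothetical storage scheme)."""
    return hashlib.sha256(mac.lower().encode("ascii")).hexdigest()


def brute_force_oui(target: str, oui: str, limit: int = 1 << 24):
    """Try device suffixes under one OUI until the digest matches.

    `limit` defaults to the full 24-bit suffix space; pass a smaller
    value to keep a demonstration fast.
    """
    for suffix in range(limit):
        candidate = (
            f"{oui}:{(suffix >> 16) & 0xFF:02x}"
            f":{(suffix >> 8) & 0xFF:02x}"
            f":{suffix & 0xFF:02x}"
        )
        if pseudonymize(candidate) == target:
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical device under a made-up OUI.
    stored = pseudonymize("00:11:22:00:01:02")
    print(brute_force_oui(stored, "00:11:22", limit=1000))
```

The same loop applies to any deterministic transform the attacker can compute, including encryption under a known public key without randomized padding.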

I'm just not sure where to go from here, except to start making security tradeoffs.

1 Answer

Storing a per-address salt (assuming you are using it for encryption) is not the same as storing the mapping alongside the hash. The salt could be public.

You could encrypt anything with a public key for which no private key exists yet; look into identity-based encryption techniques.

A few additional points on anonymization:

Any anonymization technique for the MAC address (encryption, hashing, or tokenization) could be broken through inferential attacks based on the remaining parameters that are not anonymized. Since you intend to do traffic analysis, you would need to anonymize deterministically (retaining the one-to-one mapping through either a hash or deterministic encryption), so any adversary with access to the same anonymized database can identify the same patterns.

Besides inferential attacks on the logs or database, attacks based on implementation flaws, key management, poor entropy, and so on are also used in practice.

Micro-anonymization techniques (field-level anonymization of MAC addresses, IP addresses, and other fields alone) are recommended only when analytics need to be performed on the data; otherwise it is better to encrypt the whole record.

Thanks for this considered response. As Pang et al. put it: "No perfect anonymization scheme exists and therefore [...] anonymization of packet traces is about managing risk." I think it will probably be acceptable if we anonymize with a per-address salt (deterministically, it's true) and then delete the salts when our collection is done (or even every couple of weeks, to provide a rolling anonymity window).
– Tim Bennett, Nov 3 '14 at 22:27
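One way the rolling-window idea in the comment above might look in practice is sketched below in Python. Note the substitution: instead of per-address salts, this uses a single secret HMAC-SHA-256 key per collection window, which is an assumption made for illustration and not the commenter's stated design. The class name `WindowPseudonymizer` and its methods are hypothetical, and the sketch also assumes every collection point can obtain the same window key.

```python
import hashlib
import hmac
import secrets


class WindowPseudonymizer:
    """Deterministic pseudonyms that stay linkable only within one key window."""

    def __init__(self) -> None:
        # Random secret for the current window (hypothetical key handling).
        self._key = secrets.token_bytes(32)

    def pseudonym(self, mac: str) -> str:
        """Return the HMAC-SHA-256 of the normalized MAC under the window key."""
        if self._key is None:
            raise RuntimeError("window key has been destroyed")
        message = mac.lower().encode("ascii")
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def rotate(self) -> None:
        """Start a new window; old pseudonyms no longer link to new ones."""
        self._key = secrets.token_bytes(32)

    def destroy(self) -> None:
        """Drop the key reference when collection ends.

        This does not securely wipe memory; a real deployment would need
        to address key storage and erasure separately.
        """
        self._key = None


if __name__ == "__main__":
    p = WindowPseudonymizer()
    first = p.pseudonym("AA:BB:CC:00:00:01")
    print(first == p.pseudonym("aa:bb:cc:00:00:01"))  # same device, same window
    p.rotate()
    print(first == p.pseudonym("aa:bb:cc:00:00:01"))  # new window, new pseudonym
```

While the key exists, anyone holding it can still run the enumeration attack, so the protection begins only once the key is gone. The inferential attacks described in the answer are unaffected, because records within a window remain linkable.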