With billions of us browsing the web everyday the web has become a major source of revenue for advertising companies. By exploiting characteristics of web browsers, companies can fingerprint users and perform more targeted advertising. Most (tech savy) people are nowadays familiar with the concept of a cookie (and not the type left out for Santa on Christmas Eve). In short, this is a file left behind on your computer when you visit a particular website and is used to recognise you the next time you re-visit the website. Recent legislation has meant that websites must make clear that a cookie will be stored on your machine (see the EU Directive on Privacy and Electronic Communications). A cookie, however, is not the only way to fingerprint a user. Mayer in 2009 [pdf] and Eckersley in 2010 [pdf] showed that it was possible to exploit characteristics such as screen dimension and the list of installed fonts, in order to invisibly fingerprint a user (see below for more details). In total Eckersley was able to uniquely fingerprint 94.2% of half a million users. In last week's study group Luke discussed the paper "FPDetective: dusting the Web for Fingerprints" by Acar et al. [pdf]. This paper presents a framework which can be used to detect when websites are using these (alternative) fingerprinting techniques and presents results which show that such techniques are used in the wild.

How does fingerprinting work?

There are several ways that device fingerprinting could be performed, the most common being based on either JavaScript or Flash. In the case of JavaScript there are two objects which are most commonly accessed to perform fingerprinting:

navigator - This will contain information about the browser vendor and version, plugins supported, MIME types and some information about the OS and architecture of the machine running the browser.

screen - Information about the resolution of the monitor, and number of colours and pixel depth.

By hashing the concatenation of the content of these two objects (together with the list of plugins and MIMEtypes), Mayer was able to uniquely identify 96% of a total of 1328 clients. Eckersley extended this in the Panopticlick project by including further contents, one of which was the list of fonts. Surprisingly the list of fonts plays a big role in the uniqueness of fingerprints. As mentioned earlier Eckersley uniquely fingerprinted 94.2% of half a million users. For readers interested to see if their own browser fingerprint is unique they can visit the Panopticlick website to test it out.

FPDetective

Acar et al. created the FPDetective framework to discover when websites are performing fingerprinting. It consists of a web-crawler using the PhantomJS browser to collect data related to JavaScript fingerprinting and Chromium for data related to flash. Next their parser was used to extract data from the obtained log files. Following this the extracted data was stored and combined on a central server. Source code and further details are available here.

Using FPDetective Acar et al. analysed the top one million website (as listed by Alexa). They found that 404 of these were running JavaScript-based fingerprinting scripts, with a total of 13 unique scripts being used. One of the scripts even went a step further and tried to remove itself after running to hide any traces. Following this Acar et al. ran the experiments to look for flash-based fingerprinting and found that 95 out of a total of 10000 websites were active in doing so. Interestingly one of these was the Turkish Ministry of Education.

Mitigation techniques

The Tor Browser lists cross-origin fingerprinting unlinkability as a security goal, so unsurprisingly it has strong defences against fingerprinting. It does this by trying to make all browser sessions look the same. It also limits font-based fingerprinting by bounding the number of fonts that a page can requests (Acar et al. found a vulnerability against this but it has since been fixed in recent versions of Tor).

Firegloves was a proof of concept Firefox extension created for research purposes which tries to randomise the fingerprint. Acar et al. found that it was quite ineffective against preventing fingerprinting but since it is no longer maintained this could be because of changes to the Firefox extension system. Ironically since Firegloves itself is detectable this could actually become part of the data used to fingerprint a user due to its limited user base (1750 users).

"Do Not Track" (DNT) is an HTTP header field which allows users to request that websites do not perform any tracking and is implemented in many modern web browsers. Acar et al. ran their experiments again, this time with the DNT field set to 1, but found that this yielded exactly the same results. This shows that websites are simply ignoring user's tracking preferences.

Thursday, December 5, 2013

I've been at Asiacrypt 2013 in Bangalore this week, and taking a bit of time out from going to the talks and checking out India I thought I'd write about Nadia Heninger (@nadiaheninger) and Tanja Lange's (@hyperelliptic) presentation on their research into factoring 1024-bit RSA keys. The paper can be found here, and there's a set of slides here. This discovery was presented first at the Crypto 2013 rump session.

The one sentence summary is that they were able to recover at least 184 of the 1024-bit RSA keys generated by a certified smartcard and used in a Taiwanese national database of digital certificates for citizens.

Other that the fact that they were able to this is quite alarming, the most important thing to note here is that they didn't need anything like the computing power of a special purpose cluster or a large botnet, or a highly optimised version of an implementation of the general number field sieve to factor those 1024-bit keys; the trick here is to exploit the randomness used to generate the keys themselves.

What's the problem with randomness?

To explain this it's necessary to go back to some research by Lenstra et al. (Crypto 2012) and Heninger et al. (Usenix 2012). An RSA moduli is a large integer that is the product of two large prime numbers, so if you're given a public RSA moduli and are able to find one of its prime factors, then you can get the corresponding private key. Obviously the size of the RSA modulus is chosen to be large enough such that this should be impossible to do (note: there seems to be a general recommendation at the moment to move away from 1024-bit RSA keys up to 2048-bit keys).

The critical observation made in the referenced research was that it's easy to check whether two RSA moduli share the same prime factor (and to find that factor). So if you have a set of RSA public keys, it turns out that using the greatest common divisor (GCD) algorithm it is relatively cheap to check for the presence of any shared factors and if found, recover the private keys.

On paper this shouldn't ever be a concern, as all our prime factors should be chosen uniformly at random, and so the chance of any two being the same should be vanishingly small. However, this is not always the case---some devices appear to have entropy problems with their random number generation that result in some factors being generated more frequently. Heninger et al. were able to retrieve private RSA keys for 0.5% of TLS hosts and 0.03% of SSH hosts on the Internet, and Lenstra et al. managed to recover 12720 keys from 4.7 million moduli.

The underlying problem appears to stem from entropy shortfalls in the random number generation methods used in devices that are typically either headless or embedded. I'd highly recommend checking out bothpapers for more detail.

Going further than finding shared primes

The research team here initially tried to find shared factors within the smartcard signature dataset, and were able to recover 103 private keys. What differentiates this particular attack is that Bernstein et al. were able to factor a further 81 keys by looking in more detail at the particular properties of the flawed random number generation.

The authors hypothesise that in this instance it appears that the shared factors are likely to arise because of a glitchy hardware random number generator. By inspecting the shared primes, the authors observed that the rng seems to create primes that have a particular structure, for instance containing repeated binary patterns (e.g. "001001001001" or simply "00000000").

The trick used here was to use the observed patterns as a profile for trying to guess additional primes that are not shared with any other moduli but do closely match one of the observed patterns, and then to simply use trial division across the remaining unattacked moduli in the dataset to see if the guessed prime factor actually is a match.
There's not really space to go into detail here, but the authors then go a step further and use Coppersmith's method to find more prime factors of a particular form, getting us up to the final value of 184 recovered private keys.

One interesting aspect to this is that these smartcards passed both FIPS and Common Criteria certification, and so one might wonder why and how this randomness issue managed to sneak by the evaluation processes. According to the authors the Taiwan government department responsible has initiated the task of issuing replacement cards.

Finally, this is another example of how the lack of quality randomness can spectacularly screw otherwise perfectly secure crypto up. Random number generation is a key element of a large number of currently used protocols, and so it remains to be seen how large this particular attack surface actually is (for instance, see Nadia's Asiacrypt rump session talk on ECC), especially given the certification issue here and the ongoing discussion about Dual_EC_DRBG.

Much more detailed info on this particular attack can be found here and for a look at the problem of bad randomness producing weak keys in general, check out factorable.net---both sites are well worth a read.