Posted by timothy on Monday December 15, 2014 @01:22AM from the your-unique-aroma dept.

An anonymous reader writes: How identifiable are you on the web? This updated browser fingerprinting tool implements the current state of the art in browser fingerprinting techniques (including canvas fingerprinting) to show you how unique your browser is on the web.
Good food for thought when three-letter agencies talk about "mere metadata."

I agree that fonts seem to be the worst offender when it comes to browser fingerprinting.
Surely browsers shouldn't need to send lists of installed fonts to web servers; a web page should simply list the desired fonts and the browser should decide, based on that, what font to use.
Is the current behavior part of a standard? Even if it is, I hope browser makers are planning on stopping this leak.

What are you talking about? Browsers don't send a list of installed fonts to anybody!

The detection occurs when in CSS you specify font-family: XYZ. This is going to be displayed in the default font, unless the font XYZ is installed. By analyzing the width of the element you specified the font for (or drawing it into a canvas element) you can distinguish the cases where the font is installed from the case where the default font is used instead.
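A rough sketch of the width-comparison logic described above. Python only stands in for the browser here: the `FONT_WIDTHS` table and `render_width` function are made-up placeholders for what a real page would get by reading the rendered width of an element (or of text drawn into a canvas).

```python
# Toy model of width-based font detection. FONT_WIDTHS and render_width are
# hypothetical stand-ins for measuring a rendered element in a real browser.

# Assumed average glyph width (px per character) for a few fonts.
FONT_WIDTHS = {"monospace": 9.6, "Quake": 13.2, "Helvetica Neue": 8.1}


def render_width(text, font_stack, installed):
    """Width of `text` in the first font of the stack that is installed."""
    for font in font_stack:
        if font in installed:
            return len(text) * FONT_WIDTHS[font]
    raise ValueError("no usable font in stack")


def detect_fonts(candidates, installed, fallback="monospace"):
    """Return the candidates whose presence changes the rendered width."""
    probe = "mmmmmmmmmmlli"
    baseline = render_width(probe, [fallback], installed)
    return [
        font
        for font in candidates
        if render_width(probe, [font, fallback], installed) != baseline
    ]


# A hypothetical visitor with one extra font installed.
visitor_fonts = {"monospace", "Quake"}
found = detect_fonts(["Quake", "Helvetica Neue"], visitor_fonts)
print(found)
```

A font whose metrics happen to match the fallback would slip through this check, which is presumably why a real script would compare against more than one fallback font.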

It's easy to prevent. The browser should only expose a whitelisted set of system fonts to the web, which would then tell you nothing that wasn't in the user agent string to start with. With the widespread support for Web Open Font Format, it's easy for designers to provide additional fonts if they want to use them. I don't want something random on the web to be rendered in, for example, the Quake font, just because I happen to have it installed - it's almost certainly not what the developers intended, an

I, for one, find it nice that I can use all the very nice fonts available on ALL iOS devices without a 100kB payload to my users. Especially for mobile devices, where a 100kB payload can take quite a while, depending on the network conditions.

I know, but you don't need Flash (or Java for that matter) to detect fonts in a browser. And the browser doesn't "send" a list of fonts, you have to have dynamic code to list the fonts on the client side.

But I wonder why my browser needs to provide details about the plugins I have installed to any website I visit. What kind of legitimate use could that have?

Sites recover the plugin list to see if you support whatever content they want to send you. If you don't have a certain plugin the site can fall back to some other way of displaying the information or it can refuse to do anything. For example, trying Flash to display a video, then falling back to HTML5.

Is it useful?
Somewhat, albeit less and less with HTML5. Also, there are many plugins sites don't need to know about, for example a PDF plugin. Some plugins should be totally transparent because they don't interact with the site.

Even without direct enumeration, it's still relatively easy to find. The object / embed tags can be nested for fallback and the resource is only requested if you have that plugin installed. You can provide a load of 1px objects with different nesting and just check in the server which cookies show up in the requests.

Most of the plugins don't interact with the content. There's no reason my browser should announce that I have a VoIP client installed that will allow me to place calls from web sites, where the browser (via plug-in) detects a valid phone number. There are others that are similarly obscure or single-use that make it so that you are unique. More interesting to me was that I installed over half the list, but never allowed or set up browser plugins.

"Your browser fingerprint appears to be unique among the 4,789,097 tested so far.

Currently, we estimate that your browser has a fingerprint that conveys at least 22.19 bits of identifying information."
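The 22.19 figure is consistent with the usual surprisal calculation: a fingerprint shared by 1 browser out of N carries log2(N) bits. A minimal sketch, assuming that is how the site computes it (the function name is made up for illustration):

```python
import math


def surprisal_bits(matching, total):
    """Bits of identifying information for a fingerprint shared by
    `matching` out of `total` observed browsers."""
    return -math.log2(matching / total)


tested = 4_789_097                      # browsers tested, from the quoted result
bits_unique = surprisal_bits(1, tested)
print(f"{bits_unique:.2f} bits")        # prints "22.19 bits"
```

That would also explain the "at least": a fingerprint that is unique in the sample cannot be measured as rarer than 1 in N, however rare it really is.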

Unique amongst the browsers tested. Is there a selection bias going on amongst people who would check to see if their browser is unique? I tried with IE from a generic install of Windows 7 and still get the "you appear to be unique" message.

[...] but would it not be smarter to include a list of things to make your browser less unique?

Yes, recommending what to do to improve anonymity is one of our next possible steps, but to do so you need to have data to know what to recommend, hence the site. We're looking into a recommendation system for future work.

I already ranted about fonts, but amiunique decided that my browser version (the one supported by the IT department at work) and time zone (UTC-8) and language (en-US) were enough to get uniqueness. Apparently everybody on the West Coast is running newer browsers :-)

Google serves my computer ads for men's watches; it serves my wife's computer, on the same NAT (the same PC, same screen resolution), ads for shoes. Both have cookies blocked and Flash is disabled by default. Mine also blocks lots of Google sites, yet I have yet to find a way to block DoubleClick. Our browsers are both set to tell sites "do not track". Neither of us uses Google for search these days, switching to Duck Duck Go.

So the fingerprinting is enough for Google to send us personalized adverts.

Now if someone can tell me the full list of domains I need to block to prevent DoubleClick (also from Google) from serving ads, I'd appreciate it.

Actually, no. Web surfing involves visiting a multitude of sites. Whitelisting would be painstakingly difficult, especially with the wife. Even whitelisting cookies is tedious, but cookies are what you should be whitelisting. After you accept all the cookies you need (bank, Slashdot, etc...) then block the rest. Simply visiting a web site is no reason to accept a cookie. If you can identify any sites to block (DoubleClick) then blacklisting is the way to go. We're not talking about a server here, it is a web browser. Imagine whitelisting 20 sites per hour while shopping for a pair of shoes.
What I do is to identify what sites are serving me ads, surf those sites while capturing packets using your favorite tool (NetworkTrafficView from Nirsoft if using Windows is easy) and block those sites using your firewall (IPs) and/or hosts file (FQDNs). I haven't seen a DoubleClick ad in years. In Windows my hosts file looks like this:
0.0.0.0 ad.doubleclick.net
0.0.0.0 ad.uk.doubleclick.net
0.0.0.0 ad.n2434.doubleclick.net
0.0.0.0 doubleclick.net
0.0.0.0 a.doubleclick.net
The Slashdot filter made me cut quite a bit out, but you get the idea.
This work has already been done and gets updated for you here: http://someonewhocares.org/hos... [someonewhocares.org]
My Windows Firewall is more extensive. I block massive subnets in Russia, Ukraine, and China (ex. LACNIC Latin American and Caribbean 190.0.0.0/8). This is all for a laptop that leaves the house. For an in-home solution you should get a better router and block them at the gateway so your iPad is safe too. pfSense is very flexible, but DD-WRT can do some neat tricks.

Sorry, but blacklisting involves blocking a lot *more* sites, and ongoing maintenance to keep that list updated to account for changes that are out of your control. A whitelist needs initial setup, and only requires changes based on your needs.

My browser is whitelisted. I do what I described.

It is not 'painfully difficult', your wife acceptance factor notwithstanding. I (and I expect most people) visit the same sites day after day. I am

"YOU WOULD HAVE TO WHITELIST TO ACCESS THEM @ ALL (you're talking a practically never ending battle there))."

For the love of Pete, did you not read my original post?

For casual exceptions to the whitelist you use a free filtering proxy. My example was Startpage.com/ix-quick. You don't need to update your whitelist except for sites that you have an ongoing relationship with.

Once again, I *do* this. It's mostly a set-and-forget. And the great thing about it is that if some new tracker/adfarm thing comes al

Not being funny, but that's hardly tracking unless you are actually after a watch or shoes. I imagine a watch / shoes ad is the kind of thing that a company will push to everyone this near to Christmas.

Also, I once got several months of leotard adverts because I happened to click something in our (school) web logs to check it was okay for pupils to see. There's just a correlation on the ad networks between your IP and something you may have clicked / searched / been on. It doesn't mean they are tracking you, per se. They just realise that you are two separate browsers with two separate signatures. Lots of things can do that, even being a single plugin different. Just being logged into a certain account on one site might push certain ads your way.

Load up Ghostery and visit your normal sites. See how many of them are also serving up ads etc. that can form correlations between your browser and a certain product. Cookies blocked everywhere? I don't believe it, you'd never be able to log into anything. Flash disabled? Well, yes, I have that by default but for security not tracking. "Do not track" is an absolute waste of time. And just because duckduckgo doesn't track you, doesn't mean the sites you land on don't.

Take this "for instance" - your wife went on a shoe shop once. You went on a watch shop once. Both the same IP. But one of you was also logged in elsewhere on a single other site. Bam. You get different ads. Just being a 0.1 version out on your browser will distinguish one from the other. Or having slightly different plugins. Or even just having different source port numbers (as NAT'ing will ensure).

Sorry if you don't realise this, but the amount of effort you're putting into making your life hard and hiding, is actually just making you stand out just the same. How many hours have you wasted trying to block this stuff, and still you're identifiable?

Either start fresh every session with a Privoxy proxy and fake user-agent strings, or don't bother. And even that won't hide you. And even then, you'll never know if the watch advert was for something you clicked years ago, or random spam because they know nothing about you and pick a random product. Hell, do you even know if you haven't each separately cached a random advert?

Why do you even.... >>> "Cookies blocked everywhere? I don't believe it, you'd never be able to log into anything." Try wiping your cookies on session or window close. You can accept cookies and not keep them longer than necessary.

>>>"Flash disabled? Well, yes, I have that by default but for security not tracking. "Do not track" is an absolute waste of time. And just because duckduckgo doesn't track you, doesn't mean the sites you land on don't." The sites will track them only if yo

Now if someone can tell me the full list of domains I need to block to prevent DoubleClick (also from Google) from serving ads, I'd appreciate it.

I use a HOSTS file; it serves me well, and it is quite large at this point. I also take the time to read a site's TOS; they will tell you what to block (though one link says they don't mention Flash cookies, or one of the three mentioned).

Read the TOS of ROVIO.COM (Angry Birds): "sent overseas", well, where? When I last read it long ago it gave me a lot of sites to block, the most important being Flurry.com.

Angry Birds (all of ROVIO.COM programs) collects your information then sells it to Flurry.com (It's Google) who in tur

Apparently, Ghostery is pretty effective at blocking DoubleClick. I do not get those personalized advertisements. The ONLY place where "ads" are even somewhat accurately aimed at me is Amazon. If/when I clear cookies and browse without signing in, their limited accuracy disappears.

The problem is that they don't differentiate categories well. Having bought some kind of computer thing means that I might be interested in buying some kind of computer thing again, but having bought one hard drive probably doesn't mean that I want another very similar (but not identical) one soon. In books, it's very different - if I've bought one novel then I probably want to buy another very similar (but not identical) one next time I shop. The same is true for a lot of things on Amazon - DVDs, CDs, a

Below are the results I got. Really? So I'm the only person who speaks English, running Chrome on Windows 7, in the Central time zone? If that's enough to identify me, then I'm feeling pretty exposed.

Google, on the other hand, can probably tell me my life history, with all the data they have on me.

Yes! (You can be tracked!)
34.59 % of observed browsers are Chrome, as yours.
22.54 % of observed browsers are Chrome 39.0, as yours.
58.71 % of observed browsers run Windows, as yours.
40.04 % of observed browsers

Well, they claim 1 in 11000, as opposed to 1 in 20000. I question their math. (And yours). You don't get to multiply the likelihoods of Chrome and Chrome 39 together; they are highly correlated. See also Windows and Windows 7.
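To put numbers on the correlation point, here is a small sketch using the two Chrome percentages quoted from the site. It assumes only that every Chrome 39.0 browser is also counted as a Chrome browser; the variable names are just for illustration.

```python
p_chrome = 0.3459      # share of observed browsers that are Chrome
p_chrome_39 = 0.2254   # share that are Chrome 39.0 (a subset of Chrome)

# Multiplying treats the two attributes as independent, which they are not.
naive = p_chrome * p_chrome_39

# Chrome 39.0 implies Chrome, so the joint share is just the Chrome 39.0 share.
actual = p_chrome_39

print(f"naive product:      {naive:.4f}")
print(f"actual joint share: {actual:.4f}")
print(f"rarity overstated by a factor of {actual / naive:.2f}")
```

The naive product makes the combination look almost three times rarer than it is, and the same thing happens again with Windows and Windows 7.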

Your understanding of their last statement is mistaken. The 1 over 11099 has nothing to do with the above statistics. It only says that of the 11099 browsers tested, there is only 1 with the intersection of the above elements. How big a set is, is irrelevant when considering its intersection with one or multiple other sets.

However, what the statistics do tell you is which of those parameters is more or less common with the ensemble. Eliminating a rarely occurring parameter could move you to a more common set intersection

Your understanding of their last statement is mistaken. The 1 over 11099 has nothing to do with the above statistics. It only says that of the 11099 browsers tested, there is only 1 with the intersection of the above elements.

You're spot on, that's exactly what it says.

How big a set is, is irrelevant when considering its intersection with one or multiple other sets.

However, what the statistics do tell you is which of those parameters is more or less common with the ensemble. Eliminating a rarely occurring parameter could move you to a more common set intersection, making you thus less traceable. But deducing the intersection probability from the set statistics is not trivial, if possible at all without further constraints.

We're looking into putting in a recommendation system to help users improve their anonymity.

But I am wondering if 11099 trials can be considered significant in this case. They are looking at 6 or more parameters which have countless possible values.

It's sufficient for us to do quite a bit of analysis on the data and to possibly implement and provide the recommendation system. The data is however highly skewed towards geeks and towards users in France (a.k.a. French geeks!).

Disclaimer: a couple of colleagues and I created amiunique.org to get some data to understand fingerprinting better. It's a small student project b

But that probably says more about the people who would visit the site than it does about AdBlock users. Especially with the sample size as small as it is. https://panopticlick.eff.org/ [eff.org] has a much, much higher sample base.

Other things that could be checked but which aren't include whether the browser allows SSL2, SSL3, TLS1.0, TLS1.1, and what kind of encryption. Also, the ballpark speed at which it evaluates JavaScript.

Pray tell us how to use hosts files through a proxy server. It's the proxy server that looks up the host names, not your local resolver.

Also, how well does it work with wildcards? There are ad companies that use thousands of random hosts, of the form 47db.adcompany.com, 1a74.adcompany.com, 357f.adcompany.com. With a hosts file, you have to fill out every single possible entry ahead of time, because it doesn't take a wildcard like *.adcompany.com.
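The exact-match limitation can be shown with a few lines of Python. This is only a sketch: `parse_hosts`, `hosts_file_blocks` and `suffix_rule_blocks` are hypothetical helpers, and the sample entries reuse the made-up adcompany.com hostnames from the comment.

```python
def parse_hosts(text):
    """Collect the hostnames that a hosts file maps to 0.0.0.0."""
    blocked = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        if address == "0.0.0.0":
            blocked.update(name.lower() for name in names)
    return blocked


def hosts_file_blocks(host, blocked):
    """Hosts-file semantics: only an exact hostname match counts."""
    return host.lower() in blocked


def suffix_rule_blocks(host, suffixes):
    """What a filtering proxy or DNS-level rule could do: match a whole domain."""
    host = host.lower()
    return any(host == s or host.endswith("." + s) for s in suffixes)


HOSTS = """
0.0.0.0 adcompany.com
0.0.0.0 47db.adcompany.com   # had to be listed ahead of time
"""
blocked = parse_hosts(HOSTS)

print(hosts_file_blocks("47db.adcompany.com", blocked))              # True
print(hosts_file_blocks("357f.adcompany.com", blocked))              # False
print(suffix_rule_blocks("357f.adcompany.com", ["adcompany.com"]))   # True
```

The second lookup fails even though adcompany.com itself is listed, because a hosts entry says nothing about subdomains.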

- Any host on the 123.64.0.0/11 network.
- Any host that ends with .2o7.net, regardless of hostname[*].
- Requests that embed a hostname or IP address in the URL

[*]: You are aware that some trackers use pseudo-random hostnames that are resolved through wildcard DNS entries, right? That way they can track exactly where you came from too, because the hostname will be unique for just you.

All you have to do is give examples that do the above. It's you who claim hos

No, Privoxy won't help if you have to go through an external proxy. You know, one that you don't have control over, but where work can log who visited what pages. Work, like what you don't have because you're a kook and unemployable.

With a remote proxy, no local resolving takes place at all (other than the address of the proxy server). Whatever hosts tables you have set up on your local machine don't matter, because the resolving doesn't happen on your machine at all.

I'm unique as well; however, it gave a list of the items I was unique in. Namely, the only thing that I did not share with the vast majority of others was the exact nature of my plugin list. The exact versions and names of all enabled plugins apparently formed a unique configuration. Personally I don't see a need to broadcast my plugin list; is there any way to prevent it?

... of course they know who you are. You need an IP to send and receive information, just the nature of making a connection leaves a trail all by itself. Next it's not that hard to develop mathematical techniques to analyze text and language in posts since they can analyze that most people have limited memory and interest by nature of them being finite beings and can simply build profiles by simply combining all the little tiny bits of different info into some unique ID if they wanted to.

The nature of our technology has augmented our ability to see and detect so much it's increasingly difficult to hide anymore. I shudder to think how small cameras are becoming and how they will be all pervasive where it matters. We're basically moving into a "tripwire" society where hidden and not-so-hidden automated systems track wherever you go and what you do, and all that data can be stored, analyzed, etc.

Next it's not that hard to develop mathematical techniques to analyze text and language in posts...

Budget projects much? "Doable" and "easy" are not the same words. I'm guessing one person out of a hundred in the general population could take a reasonable stab at developing such an algorithm, and only one person out of a thousand could be considered a natural talent.

The first 20% of the work gets you to sqrt(sqrt(7e9)) as your mean perplexity, which is simultaneously impressive and yet not terribly actio

Standard Mozilla behaviour last time this question came up is to include a list of fonts that your browser can display; I don't know whether other browsers do the same, or if they've changed it, but it's the kind of "feature" that hopelessly breaks your chances of non-uniqueness if you've ever installed fonts.

My work laptop has a font that's the Official Corporate-Branded font for $DAYJOB's corporate logo. Almost every Windows machine at my company has that (at least, every physical machine and the virtual machines running on the hosted virtual desktop cloud; there may be some lab machines that don't, and maybe some contractors, etc.) You might work for a smaller company that does the same. In my case, I've installed all sorts of other random fonts, either to see what they looked like, or simply because back in the 80s of course you wanted Elvish and Dwarvish fonts on your computer, or because I wanted a better monospaced programming font than the default MS one or Courier New.

Lots of other things leak information as well (cookies, etc.), but fonts are a quick and dirty way around identifying people who block those.

It really doesn't matter to anyone except people who block cookies (and that's not you, because you're logged in). Those people are so rare, I don't think anyone's using any alternate method to track people. Cookies work well enough for tracking.

It really doesn't matter to anyone except people who block cookies (and that's not you, because you're logged in). Those people are so rare, I don't think anyone's using any alternate method to track people. Cookies work well enough for tracking.

Actually there are commercial fingerprinting services. The Cookieless Monster [ieee-security.org] does a good job at analyzing them. Many sites like Google, Twitter, Facebook and others mention the collection of "device information" in their privacy policies too.

It seems to me that it would be simpler for Firefox (and other browsers) to just whitelist a default set of fonts and those are the only ones it uses regardless of what might be installed on the system on any site you are trying to limit tracking. (It can allow for web embedded fonts; it just won't load anything but the default set from the system.)

If MS wanted to do it for IE, they'd just have the non-default font set blocked for the "Internet Zone" and allowed for the "Trusted Zone" which should cover most intranet scenarios where they've got custom fonts.

I suppose an "exceptions" list could be managed separately as well if it was really necessary; or it could be tied to the cookie exceptions list -- which would be logical from a "privacy reasoning" perspective... but counter-intuitive from the "why are local fonts not loading for this site just because I blocked cookies" perspective.

In any case the upshot is that any given version of any given browser on any given platform will have the same fonts available as any other instance of that version of that browser on that platform -- then "font profiling" adds nothing to the basic platform information they already had.

According to that site "[I] Can Be Tracked!" because my fingerprint is the same as 11,775 others. That number seems to be generated only by people visiting the site meaning the pool would most likely be larger.

Obviously browser fingerprinting is a real thing, but that site seems to be geared more toward hyperbole than toward actually educating.

Given most of the data is what's reported by a browser, why don't browsers filter the data?

Especially if "Do Not Track" is set to on - why don't they limit the data to send back?

Fonts - Microsoft released 6 fonts for the web over a decade ago - just report those 6 across all platforms and maybe a few standard system ones (you can get this from the User-Agent anyways). Make it a whitelist of fonts.

Sure, some data is gathered through plugins, but I thought many are now click-to-run so you can't get that data unless you specifically run those plugins.

Especially if "Do Not Track" is set to on - why don't they limit the data to send back?

You have misunderstood what "Do Not Track" means.

It turns on a flag always telling remote websites "this user does not want to be tracked". It has nothing to do with telling your browser to change its behavior, it gives remote sites a piece of information about your wishes.

Whoever came up with the idea was a dumb shit, and whoever let it become implemented as a browser option was even dumber - it was blindingly obvious from the start that in real life, it's just sending the remote site one more bit of infor

No, I don't think he did. He was suggesting that browsers truly act on that option selection in a useful way. You misunderstood his post.

The Do Not Track option is defined in the RFC draft [ietf.org] as not doing anything except sending the DNT: 1 header to a remote server. Having it do more goes against the specification. Of course, browsers can implement other functionality to thwart tracking, but not as part of Do Not Track, which has a very specific meaning.
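In other words, the setting changes one request header and nothing else. A minimal sketch of that behaviour; `build_request_headers` and the User-Agent string are invented for illustration, and only the `DNT: 1` header itself comes from the draft mentioned above.

```python
def build_request_headers(do_not_track):
    """Headers a hypothetical browser might send; only DNT depends on the setting."""
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
        "Accept-Language": "en-US,en;q=0.8",
    }
    if do_not_track:
        headers["DNT"] = "1"   # a stated preference; the server is free to ignore it
    return headers


with_dnt = build_request_headers(True)
without_dnt = build_request_headers(False)

# Everything else the browser reveals is unchanged by the setting.
extra = {k: v for k, v in with_dnt.items() if k not in without_dnt}
print(extra)   # {'DNT': '1'}
```

Every other fingerprintable value goes out exactly as before, which is the complaint being made in this thread.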

Actually, Google are decidedly fearful of DNT being on by default, because unlike muggers, they have to obey the law - they can't actually willfully violate expressed user preference without risking a major class action of sorts. That's why they fought so hard to effectively kill any hope of DNT being useful (remember, they were part of the standards committee for standardizing it - the wolf guarding the henhouse).

Most of it is fed to your browser and then your browser regurgitates it as it's expected to.

If I modify a web server to send only you a random numbered URL, and then watch for that random-numbered URL, I've formed a correlation between your IP and your browser session. If I can get that to tie in with other sites, or give me the slightest hint about those, I can correlate the information.
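The random-URL trick needs very little bookkeeping on the server. A toy sketch, with a hypothetical `TokenTracker` class and a documentation-range IP address standing in for a real visitor:

```python
import secrets


class TokenTracker:
    """Toy server-side bookkeeping for per-visitor random URLs (hypothetical)."""

    def __init__(self):
        self._issued = {}   # token -> IP address the token was handed to

    def issue_url(self, client_ip):
        """Hand this visitor a URL that nobody else will ever be given."""
        token = secrets.token_hex(8)
        self._issued[token] = client_ip
        return f"/r/{token}"

    def identify(self, requested_path):
        """Return the IP that was originally given this URL, if any."""
        token = requested_path.rsplit("/", 1)[-1]
        return self._issued.get(token)


tracker = TokenTracker()
url = tracker.issue_url("203.0.113.7")          # documentation-range address
print(tracker.identify(url))                    # 203.0.113.7
print(tracker.identify("/r/not-a-real-token"))  # None
```

Whenever that URL is requested again, from any network, the server can tie the request back to the session it was first issued to.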

If I get your browser to go to a random link, and you have history settings t

Because the info sent is used by some sites to determine how to deliver content to you... and when several websites stop working in the latest browser, the users will be the first to say 'what a crappy browser, I'm going to use a different browser'.

I have the same nick/password on several sites. Including, but not limited to, /., soylentnews, fark, ultimate-guitar, ars-technica, a couple dating sites, a gay dating site, a site dedicated to midget transvestites, and petitions.whitehouse.gov. Feel free to track me.

Dang, I should change the latter. Some of the petitions I sign could be embarrassing.

That said, I assume the original article meant something more subtle. I wouldn't know, the link is dead to me.

A Reprieve team investigating on the ground in Pakistan turned up what it believes to be a confirmed case of mistaken identity. Someone with the same name as a terror suspect on the Obama administration’s “kill list” was killed on the third attempt by US drones.

What this tells me is that what I really should worry about is accidentally having metadata that correlates with someone that the government wants dead.

I just tried https://panopticlick.eff.org on my iPad and Windows PCs. The Windows PC was uniquely identifiable with Firefox or IE but the iPad came out as 1 in 24 million. Looks like there is an advantage to Apple's locked down standardised platform.

Yes, like many, my result was "Unique". I noticed that one item being measured was browser resolution. Since I was running my browser at less than full screen, and the exact window size is a high-entropy (that is, rare and highly identifying) parameter, I decided to try again after maximizing my browser window. As expected, the result was a lower uniqueness score. That led me to wonder if some technique like modifying the exact size of the browser window by a few pixels each time it's refreshed might help somewhat to hide from these evil trackers.

I just spent far too much time playing around with this, on an extended lunch break. I note the following things:

- You had better disable explicit tracking services (e.g., with Ghostery), or it all doesn't matter anyway.

- Fonts are a big factor. Fonts are identified through Flash. There is a configuration file "mms.cfg" that can disable this. The location of this file depends on your operating system and on your browser - it took me a good half-hour to find it for my particular configuration.

- However, even after disabling fonts, and even using a "user-agent switcher" to look like a Windows/Chrome combination (instead of Linux/Chrome), I was still uniquely identifiable. The biggest factors were my language preferences, the list of plugins, and the precise browser version. Refusing to report system fonts was also pretty important :-/

In short, there's not much way around it - if you include other information available, like your IP address, you will be uniquely identifiable, and trackable across websites.

What is missing from this picture: Browsers provide an "incognito" mode. This mode needs to be extended to provide only absolutely essential information to the server. The server needs to know roughly what level of standards support you have (e.g., "Mozilla/5.0"), and what language to send content in (one language, not a list with weights). Everything else could be omitted, and virtually all websites would work perfectly.
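As an illustration of the "one language, not a list with weights" idea, here is a sketch that collapses a weighted Accept-Language value to its single preferred tag. `single_language` is a hypothetical helper, not something any browser is known to do, and the sample header is made up.

```python
def single_language(accept_language):
    """Reduce a weighted Accept-Language value to its single preferred tag."""
    best_tag, best_q = None, -1.0
    for item in accept_language.split(","):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        if not tag:
            continue
        q = 1.0   # default weight when no q parameter is given
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0   # treat a malformed weight as lowest priority
        if q > best_q:
            best_tag, best_q = tag, q
    return best_tag


full = "en-US,en;q=0.8,fr;q=0.5,de;q=0.3"   # a fairly distinctive preference list
print(single_language(full))                # en-US
```

The full list tells a server which languages you read and in what order; the reduced value puts you in the same bucket as everyone else who prefers en-US.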

Go a step farther and disable JavaScript in incognito mode, to prevent explicit sniffing. That will disable more websites, but if those sites start losing traffic, they'll offer versions that don't require JS.

First, the flippant comment: I find it astonishing that in this day and age when apparently they can track everything I do, want, and own online without my permission, my ATM still asks me WHAT LANGUAGE I want to use? Seriously? After I've answered that once, it's done. I'm not changing my native language guys. Offering it subsequently is doing a favor only for the foreign-language dude that steals my card.

Second, the serious one: a) the site itself is fairly vague and misleading: "Yes! (You can be tracked

... that has used their website so far. They've only got 24000ish data points; I can well believe that at this stage, small correlations result in apparently weird results. Give them a few million samples and I bet that those factors won't make you unique anymore.

Of course I am unique from their sample, I used an unreleased test version of a browser - I had to be unique. However, that version of tracking is useless as I have... 7 different versions of browsers on my system, they would not know they were the same person on the same computer. (And I have 3 other computers plus a couple of tablets.)

But most of my "uniqueness" seems to be about the fact that I'm a Mac user, using Safari. They also extracted a lot of fonts. What I wonder is, how useful is this information if I'm blocking ads and trackers, tossing cookies regularly, and using a VPN? To whom would it be useful?