The Real Deal

Persistent cookies are nothing new. Essentially the strategy works like this: store data everywhere you can on the user’s machine, and if it is deleted from a few locations, copy it back from a surviving location the next chance you get. It’s regenerative by design. A popular example is evercookie, which uses:

Standard HTTP Cookies

Local Shared Objects (Flash Cookies)

Storing cookies in RGB values of auto-generated, force-cached PNGs using HTML5 Canvas tag to read pixels (cookies) back out

Storing cookies in and reading out Web History

Storing cookies in HTTP ETags

Internet Explorer userData storage

HTML5 Session Storage

HTML5 Local Storage

HTML5 Global Storage

HTML5 Database Storage via SQLite

Note that several of these aren’t HTML5-specific, and more than one of them isn’t cleared by just “erasing cookies”.
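The regenerative strategy can be sketched in a few lines. This is only a toy simulation, not evercookie’s actual code: the three stores are plain dictionaries standing in for real storage locations, and the names `read_any`, `regenerate`, and the `uid` key are made up for illustration.

```python
from typing import Dict, Optional

# Hypothetical stand-ins for real storage locations (HTTP cookies, Flash LSOs,
# HTML5 Local Storage, ...). Each one is modeled as a plain dict.
Stores = Dict[str, Dict[str, str]]


def read_any(stores: Stores, key: str) -> Optional[str]:
    """Return the first surviving copy of the value, or None if all are gone."""
    for store in stores.values():
        if key in store:
            return store[key]
    return None


def regenerate(stores: Stores, key: str, fresh_value: str) -> str:
    """Recover the ID from any store that still has it, then copy it everywhere."""
    value = read_any(stores, key)
    if value is None:
        value = fresh_value  # nothing survived, so start over with a new ID
    for store in stores.values():
        store[key] = value
    return value


stores: Stores = {"http_cookie": {}, "flash_lso": {}, "local_storage": {}}
regenerate(stores, "uid", "abc123")             # first visit: ID written everywhere
stores["http_cookie"].clear()                   # the user "erases cookies"
restored = regenerate(stores, "uid", "zzz999")  # next visit: old ID comes back
```

After the last call `restored` is the original ID again, because two of the three stores were never cleared. The ID is only lost when every location is wiped at once.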

HTML5 does add a few new possibilities, but they are also by design as easy to control, monitor and restrict as your browser (or third-party add-on) will allow. HTML5 storage mechanisms are bound to the host that created them, making them as easy to search, sift and manage as HTTP cookies. Much worse are some of the more obscure cookie methods (Flash cookies, various history hacks). The HTML5 mechanisms don’t really pose any more of a privacy risk than what the browser has already been offering for the past decade.

To Shut Up The Geolocation Conspiracy Theorists

Before someone even attempts the “Geolocation API lets advertisers know my location” myth, let’s get this out of the way. The specification explicitly states:

User agents must not send location information to Web sites without the express permission of the user. User agents must acquire permission through a user interface, unless they have prearranged trust relationships with users, as described below. The user interface must include the URI of the document origin [DOCUMENTORIGIN]. Those permissions that are acquired through the user interface and that are preserved beyond the current browsing session (i.e. beyond the time when the browsing context [BROWSINGCONTEXT] is navigated to another URL) must be revocable and user agents must respect revoked permissions.

Some user agents will have prearranged trust relationships that do not require such user interfaces. For example, while a Web browser will present a user interface when a Web site performs a geolocation request, a VOIP telephone may not present any user interface when using location information to perform an E911 function.

To my knowledge no user agent implements Geolocation without complying with these specifications. None.
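The permission rule quoted above boils down to a per-origin gate that the user can close again. A minimal sketch of that idea follows, assuming a made-up `LocationGate` class; it models the rule only and is not how any real browser implements it.

```python
from typing import Optional, Set, Tuple

Position = Tuple[float, float]  # (latitude, longitude)


class LocationGate:
    """Toy user agent: location is released only to origins the user approved."""

    def __init__(self, position: Position) -> None:
        self._position = position
        self._granted: Set[str] = set()

    def grant(self, origin: str) -> None:
        """The user clicked 'allow' in the permission prompt for this origin."""
        self._granted.add(origin)

    def revoke(self, origin: str) -> None:
        """Stored permissions must be revocable, per the quoted text."""
        self._granted.discard(origin)

    def get_position(self, origin: str) -> Optional[Position]:
        """Return the position only if this origin currently holds permission."""
        return self._position if origin in self._granted else None
```

An origin that was never granted permission, or whose permission was revoked, gets `None` back rather than a position.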

No HTML5 Needed For Fingerprinting

Even if you do manage to wipe all the above storage locations, you’re still not untraceable. Browser fingerprinting is the idea that your system configuration alone makes you unique enough to be traceable. This includes things like your browser version, platform, Flash version, and various other bits of data that plugins may additionally leak. The EFF recently did a rather impressive study to learn about the accuracy of this technique. Computers with Flash and Java installed sport 18.8 bits of entropy and result in 94.2% of browsers being unique in the EFF study [cite, pdf]. Of course their data was likely skewed towards more experienced web users who are more likely to have an assortment of customizations to their computer (specific plugins, more variety in web browsers, operating systems, fonts) than the average internet user. I’d wager that their data downplays the effectiveness of this technique.
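To give a feel for what “bits of entropy” means here, the sketch below converts how common a trait is into bits, and hashes a set of traits into an identifier without storing anything on the machine. The function names and the trait keys are illustrative assumptions, not the EFF’s code.

```python
import hashlib
import math
from typing import Dict


def surprisal_bits(share: float) -> float:
    """Bits of identifying information in a trait seen in `share` of browsers."""
    return -math.log2(share)


def crowd_size(bits: float) -> float:
    """Number of equally likely configurations that `bits` can tell apart."""
    return 2 ** bits


def fingerprint(traits: Dict[str, str]) -> str:
    """Hash the sorted traits into a stable identifier; nothing is stored locally."""
    canonical = "|".join(f"{key}={value}" for key, value in sorted(traits.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A trait shared by a quarter of all browsers carries 2 bits, and 18.8 bits is enough to single out roughly one configuration in 450,000. Bits from separate traits only add up cleanly if the traits are independent, which is an assumption that real browser data does not fully satisfy.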

The idea that HTML5 is a privacy risk is FUD. It doesn’t provide any worse security than anything else already out there. It’s actually easier to counteract than what’s already being used since it’s handled by the browser.

The Future

I still believe all browsers out there can do a much better job of protecting privacy when it comes to local data storage used for tracking. What I believe needs to happen is that web browsers start moving away from the “cookie manager” interfaces that are now a decade+ old and towards a “my data management” interface that lets users view and delete more than just cookies. It needs to encompass all the storage methods listed above that the browser supports. Hooks should also exist so that plug-ins that have data storage (like Flash) can be dealt with using the same UI.

Additionally it needs to be possible to control retention policies per website. For example I should be able to let Google storage persist indefinitely, Facebook for 2 weeks, and Yahoo for the length of my browser session should I wish.
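A per-website retention policy like that could be as simple as a lookup table. The sketch below is one hypothetical way to model it: the hostnames, the `should_keep` function, and the choice to treat unknown hosts as session-only are all assumptions made for illustration.

```python
from typing import Dict, Optional

DAY = 86_400  # seconds in a day

# None = keep indefinitely, 0 = discard when the browser session ends,
# any other number = maximum age in seconds.
policies: Dict[str, Optional[int]] = {
    "google.com": None,
    "facebook.com": 14 * DAY,
    "yahoo.com": 0,
}


def should_keep(host: str, age_seconds: int, session_ended: bool) -> bool:
    """Decide whether data stored by `host` survives under the user's policy."""
    max_age = policies.get(host, 0)  # assumption: unknown hosts are session-only
    if max_age is None:
        return True                  # indefinite retention
    if max_age == 0:
        return not session_ended     # lives only as long as the session
    return age_seconds <= max_age    # timed retention
```

The browser would run a check like this whenever it cleans up stored data, applying the same rule to every storage method and not just to HTTP cookies.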

My personal preference would be for the browser UI to show the longest storage time for any object on a webpage. Clicking on it would give a breakdown of all hostnames that make up the page and what they are storing, and let the user select their own policy. With 2 clicks I could then control my privacy on a granular level. For example, visiting SafePasswd.com would give me a [6] in the UI. Clicking it would show me a panel like this:

I could then override googleads.g.doubleclick.net to last only for the browser session via the drop-down if that’s what I wanted. I could optionally forbid it from saving anything. I could also click through for more detail or view the data to help me make my decision. Perhaps this would also be a good place for P3P-like data to be available. One of the notable failures of P3P was that it was never easy to view, so it never caught on.

The browser would then remember that I forbade googleads.g.doubleclick.net from storing data beyond my browser session. This would apply to googleads.g.doubleclick.net regardless of which website it was used on.
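Here is a rough sketch of how the indicator and the remembered override could fit together. The storage times are invented numbers in days (0 meaning session-only), `news.example` is a placeholder site, and the override is modeled as a cap on what the host asks for, which is a design assumption and not something specified above.

```python
from typing import Dict


def effective_days(host: str, requested: int, overrides: Dict[str, int]) -> int:
    """A user override caps what a host may keep, whichever page embeds it."""
    if host in overrides:
        return min(requested, overrides[host])
    return requested


def page_badge(page_hosts: Dict[str, int], overrides: Dict[str, int]) -> int:
    """Longest storage time, in days, of any host that makes up the page."""
    return max(effective_days(h, d, overrides) for h, d in page_hosts.items())


# Hypothetical requests, in days, from the hosts on two different pages.
safepasswd_page = {"safepasswd.com": 2, "googleads.g.doubleclick.net": 30}
news_page = {"news.example": 7, "googleads.g.doubleclick.net": 30}

# One override, keyed by hostname only, so it follows the host across sites.
overrides = {"googleads.g.doubleclick.net": 0}
```

With the override in place the ad host no longer drives the indicator on either page, since the policy is keyed by hostname and not by the site that embeds it.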

This model works better than the “click to confirm cookie” model that only a handful of people on earth ever had the patience for. It provides easy access to control and view information with minimal click-throughs.

It also makes a web page much more transparent to an end-user who could then easily see who they are interacting with when they visit one webpage with several ads, widgets, social media integration points etc.

One click to view data policies, two clicks to customize, three to save.

HTML5 is not a risk here. The web moving to HTML5 is like going from the lawless land to a civilized society where structure and order rule.


8 thoughts on “On HTML5 And The Future Of Privacy”

For the geolocation part, people are rightfully worried but aim their attacks at the wrong target: the browser. The scenario goes like this: 1. An app requests to share your location at time t inside your browser. 2. “Yeah, ok, I’m fine, I really want to have the driving directions” 3. The browser sends along the MAC address of your computer along with other information that helps geolocate you. 4. The Web service *keeps* this information in their database. 5. There is no way to opt out the MAC address of your laptop. Yes, the laptop, not the router.

This one is a difficult issue. Most people might find it cool to be able to geolocate themselves, but not cool to be geolocated. What is the status of the MAC address on our laptops: private, public, etc.? There are plenty of interesting issues around that.

I understand the frustration with html5 being used for everything; every marketing department uses it for good or bad purposes, and then the crowd is confused. Maybe the best way would be to create tools using “html5”, or more exactly APIs, that help users have better control. For now all add-ons and browsers are suboptimal at helping us in this task.

The MAC address of your laptop really doesn’t give away much more than the manufacturer of your NIC. It’s unique, but hardly locates you (in fact it’s the same no matter where you go). What’s really important is the MAC address of the WiFi access point you connect to, which may be in a database that can help locate where the hotspot you are using is. That, however, is disposable. Once you leave that location, you leave that access point. Knowing the address is of no use to anyone, and no privacy concern to the user. MAC addresses are only shared with the next hop on the network, so no website can track you by it. The only ones who can possibly track/ID you are a hotspot provider or your ISP, school, or whoever you connect through.

This is hardly a security risk. It’s just a misunderstanding of how the technology works. Your MAC address is not your IP address. It doesn’t reveal your location. It can’t be shared with websites unless you opt in every time via the geolocation API. Again, it only travels one hop. This is a fundamental part of the OSI model.

If you choose to allow it, the Firefox Location-Aware Feature first collects one or more of the following relevant location markers: (i) location provided by a GPS device built into or attached to your computer or device and/or geolocation services provided by the operating system; (ii) the wifi routers closest to you; (iii) cell ids of the cell towers closest to you; (iv) the signal strength of nearby wireless access points and/or cellular phone towers; and/or (v) your computer or device’s IP address. Next, it attempts to determine your location using these location markers. Any information Firefox uses, receives or sends as part of this Location-Aware Feature is not received by any Mozilla servers or by Mozilla. Firefox does not track or remember your location. Firefox does remember a random client identifier, the temporary ID assigned by our third party provider to process your request, for two weeks.

That’s an XSS attack against a vulnerable FiOS router. It’s not indicative of HTML5 or 99.999% of the world’s infrastructure. In fact that’s not even using the geolocation API. It’s taking the MAC address and passing it into the Google Location Services API to get a location. Again, not HTML5.

And yes, you are correct regarding what Firefox collects. I believe “wifi routers closest to you” includes the MAC address of those routers. Just like what you’d find in NetStumbler or iStumbler.

Because of existing security problems, I run browsers and other software on a non-persistent Virtual Machine. When done, I power down, thereby erasing all file changes. I advise anyone engaged in online banking to do the same.

Eh, geolocation these days can use just the IP address to pinpoint almost anyone to a general location. With a simple cross-reference table you can get down to the city, or the division of a large city, just from the IP address. So tracking you is just too easy when you combine a few already existing methods: data cookies, IP address, etc. It’s simple to know where you are.

Now my problem with HTML5 is that there are holes / leaks in the html5 security model. They even mention it in the specifications and ask for resolution suggestions. The problem is that it’s easy to fake the host and steal all the data stored. Not to mention that running a user through a proxy server really starts messing up the local storage with html5. We have all seen the drive-by proxy attack, where suddenly any time you launch the browser you are spammed by ads, etc. It’s drive-by malware, which will become even easier to set up with html5 because you will have to allow javascript to run in order to view html5 sites (assuming the site wants to use html5 for animations, etc.). Instead of using the proxy for spamming ads, use it to fake the host location to the browser. It will be invisible to the end user, minus possibly slow internet, yet any locally stored html5 data can be easily extracted.

Now is that different than what has existed? No, there is no real new threat coming from the browser. The THREAT IS THE WEB DEVELOPER. What I mean by this is… 90% of the current data store is all about just tracking you for marketing data. With html5, your Joe Blow web developer who doesn’t understand cross-domain attacks etc. will store lots of personal data in the html5 storage. He doesn’t know better, and you as the user don’t know it’s being stored, how, or what is being stored. That is the problem: the fly-by-night developer who puts a user’s last 10 orders on their site into your local html5 storage, which any request coming from their host has access to. Well, lucky me, I have started a javascript applet that you code into your site, and for every visitor I pay you one penny. We will leave out that the javascript steals all of that locally stored data and ships it to my server. That is why I worry about html5 and its storage options. You just took the “simple” web and made it into a security nightmare of checking anything and everything that goes onto your site / host, because it’s looking to exploit. For a real web developer, not a big deal. For the webpress site where the user throws in random modules they got for free that look cool and flashy but are actually stealing local storage from html5 data, it is. Not a problem, except that the new log-on module Jeff (with all good intentions) built uses html5 storage to keep all your profile information handy, along with your username and password.

That is why I worry. Now yes, it still exists in today’s world and nothing new is happening, except that html5 and its proponents keep pushing all of these “great” features without talking about all the negative impacts they can have. They keep showing the shiny side of the coin, and forget to go into detail about how all of the new html5 improvements have a negative side, and some / a major chunk of them are serious. The problem is simple: I have already had about 10 of my web clients come to me wanting a pure html5 site because their friends said the new features can remove the need for the back-end server and database. (I charge $5 a month for a sql database back end, but they still want to save.) Technically yes, it can be done. Is it at all smart? Nope. So now I have clients thinking I am trying to rip them off, because they heard all of these great things about html5 and how it’s replacing current technology, without hearing why it’s bad. So I have had to set up numerous meetings to cover the problems.

I think html5 is great and is needed, but it’s being marketed in a way that makes it a dangerous security nightmare.

Most of these can be counter-acted quite easily (with the big exception of browser finger-printing which can be tricked by changing the user agent of the browser and other values that it gives out to all websites. This is very easy for Firefox.)

Use Ghostery for Firefox to block known web bugs (up to 408 now). Even easier, use Flashblock, Adblock, and NoScript to block ads, flash objects, and JavaScript.

Do all this and regularly clear your cookies (or manually set expiration policies) as well as cache, history, and all other forms of browser saved data. You will be reasonably protected against all but the most stubborn ad companies and other goons interested in violating your privacy.

A proxy can also be used to hide your IP address but this is slow and overkill.

Oh yeah, don’t use HTML5 if you are worried about privacy since no browser offers a decent method of clearing HTML5 storage yet.

I just listed several broad methods to increase your privacy on the Internet. Basically, use a browser that does not use HTML5, don’t use JavaScript or Flash except on a case-by-case basis or while clearing LSOs, use NoScript / Flashblock to regulate JavaScript and Flash, regularly clear browser information like cookies/web history/cache, use a proxy to hide your IP address, and modify the browser to trick browser fingerprinting.