Posted
by
CmdrTaco
on Monday October 11, 2010 @07:45AM
from the are-you-scared-yet dept.

Hugh Pickens writes "The NY Times reports that in the next few years, HTML5 will provide a powerful new suite of capabilities to Web developers that could give marketers and advertisers access to many more details about computer users' online activities. The new Web language and its additional features present more tracking opportunities because the technology uses a process in which large amounts of data can be collected and stored on the user's hard drive while online. Because of that process, advertisers and others could, experts say, see weeks or even months of personal data that could include a user's location, time zone, photographs, text from blogs, shopping cart contents, e-mails and a history of the Web pages visited. 'HTML5 opens Pandora's box of tracking in the Internet,' says Pam Dixon, the executive director of the World Privacy Forum. Meanwhile Ian Jacobs, head of communications at the World Wide Web consortium, says the development process for HTML5 will include a public review. 'There is accountability,' Jacobs says. 'This is not a secret cabal for global adoption of these core standards.'"

Browsers are still going to be the ones in charge of that kind of storage, just like history, cookies and other current ways of tracking user information. It's just going to require users to CONTINUE being careful about their web usage. I don't see that anything is changing.

I think the XP commands [microsoft.com] still work, I don't use them all but some of them are fun. Try: Taskkill /im [program_image_name] /f as a batch file to kill those programs that want to stay running in the background.

What features does HTML5 include that let one server access any data other than that created by that server, or by the client user through the HTML GUI sent by that server? Why should any client state be available to the server, except the same kind of client-side feature list of supported media types and browser version that we've had since HTML1.0?

1- Browsers not enforcing restrictions, or bugs allowing short-circuiting them, so even if only originating sites "should" see their own databases, maybe they won't, in reality. What really happens when a page loads an ad frame which launches a Flash applet that tries some HTML 5 gimmicks?

2- Ad servers and web sites collaborating to circumvent restrictions. Since I activated Opera's "only accept cookies from the site I'm visiting", I've been getting somewhat flaky behavior from some sites. Will the same kind of workarounds appear for HTML5 storage?

well, html5 local storage is one more venue where sites, advertisers and trackers can store data. Not different in theory from browser and flash cookies, but still, in practice, yet another mechanism open to the abuses I listed?

You're referring to the "same origin policy" and you're right. There are 3 new mechanisms in HTML5 for remembering something across page loads (added to the older 4th mechanism, cookies), and all 4 of them are subject to the same origin policy.
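A tiny sketch of what that partitioning means in practice, using a plain in-memory model (real browsers enforce this internally; the `OriginPartitionedStorage` class and the example origins are invented for illustration):

```javascript
// Model of same-origin storage partitioning: every read and write is scoped
// to the requesting page's origin, so one site can never see another's data.
class OriginPartitionedStorage {
  constructor() {
    this.partitions = new Map(); // origin -> that origin's key/value store
  }
  setItem(origin, key, value) {
    if (!this.partitions.has(origin)) this.partitions.set(origin, new Map());
    this.partitions.get(origin).set(key, value);
  }
  getItem(origin, key) {
    const store = this.partitions.get(origin);
    return store ? (store.get(key) ?? null) : null; // other origins see nothing
  }
}

const storage = new OriginPartitionedStorage();
storage.setItem('https://foo.com', 'userId', '12345');

// foo.com can read its own data back...
console.log(storage.getItem('https://foo.com', 'userId')); // "12345"
// ...but fubar.com gets nothing, exactly as with cookies.
console.log(storage.getItem('https://fubar.com', 'userId')); // null
```

The point of the sketch is that localStorage, sessionStorage, and the client-side databases all sit behind the same origin check that cookies always have.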

Many of the new features of HTML5 exist to allow browsers to do the same things as plug-ins. A poorly written plug-in is a much bigger security vulnerability than the well-thought-out new features of HTML5, which were largely contributed by browser vendors themselves.

I agree with you. It's easier to ensure a single browser is secure from attacks than some ever-changing collection of plugins from different developer groups. And it's much more likely that someone will spot a hole in Firefox's open-source implementation of an HTML5 feature than that anyone will spot every hole of that kind in each of the plugins, which are more likely to contain at least one closed-source component.

Why not just have a script that clears it out every so often, ie once a week or whenever your reboot? Your internet connection may be fast, but using the cache where applicable is presumably still faster.
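The "clear it out every so often" idea is easy to express as logic. A hedged sketch, with a `Map` standing in for `window.localStorage` and the `lastWipe` key invented for the example:

```javascript
// Wipe an origin's client-side storage if the last wipe was over a week ago.
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

function wipeIfStale(store, now = Date.now()) {
  const last = Number(store.get('lastWipe')) || 0;
  if (now - last > WEEK_MS) {
    store.clear();                    // drop everything this origin kept
    store.set('lastWipe', String(now));
    return true;                      // a wipe happened
  }
  return false;                       // still fresh, keep the cache
}

const store = new Map([['lastWipe', '0'], ['cached', 'stuff']]);
console.log(wipeIfStale(store, 8 * 24 * 60 * 60 * 1000)); // true: 8 days later
console.log(store.has('cached'));                         // false: wiped
```

In real life the same effect is simpler to get from the browser's own "clear private data on exit" setting, which covers all the storage mechanisms at once.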

For one, it continues the schizophrenic dissonance of trying to separate content from presentation on the one hand while merging content and presentation on the other. It needs to be simplified, not get yet another layer of lard.

It's a very similar problem to the privacy concerns over Flash about 6 or 7 years ago. When people realized you could store a lot of information separate from the standard browser cache, people started taking advantage of the situation until it was patched. Similar things will happen with HTML5: breaches will be discovered, then much later get patched, after the damage is done.

It's a very similar problem to the privacy concerns over Flash about 6 or 7 years ago. When people realized you could store a lot of information separate from the standard browser cache, people started taking advantage of the situation until it was patched. Similar things will happen with HTML5: breaches will be discovered, then much later get patched, after the damage is done.

It's not similar at all. The problem with Flash is that users who clicked "Clear my cookies", or even used their browser's privacy mode, would still not clear Flash's data – because Flash is a totally separate program. The HTML5 storage mechanisms are part of the browser and integrated into it, so clearing localStorage/sessionStorage/WebDB/etc. is no additional effort when you clear your cookies.

Try it yourself and see. On Chrome, for instance, it's wrench -> Preferences -> Under the Hood -> Clear browsing data.

Is the change that there is more space than for cookies? Up till now it's been like a few hundred KB, right?

Roughly speaking, yes. Cookies are sent on every HTTP request, so they can't be longer than a few kilobytes in practice (100 KB is unlikely to work). localStorage and friends are typically more in the ballpark of a few megabytes per site, and since the data is only accessible to local JavaScript rather than being sent over the network, it could really be unlimited. It's only kept to a few megs by default so that you don't get zillions of sites each storing a bunch of data and eating up your disk space. So without that default cap, sites could quietly eat your disk.
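A back-of-the-envelope sketch of that size distinction: cookies ride in the header of every request to the origin, while localStorage never leaves the machine. The cookie names and values below are made up for illustration:

```javascript
// Count the bytes a Cookie header adds to a single HTTP request.
function cookieHeaderBytes(cookies) {
  const header = 'Cookie: ' +
    Object.entries(cookies).map(([k, v]) => `${k}=${v}`).join('; ');
  return Buffer.byteLength(header, 'utf8');
}

const cookies = { session: 'abc123', tracker: 'x'.repeat(4000) };
const perRequest = cookieHeaderBytes(cookies);
console.log(perRequest > 4000);   // true: roughly 4 KB resent with EVERY request
console.log(perRequest * 50);     // rough header cost of a 50-request page load

// localStorage, by contrast, is read locally by scripts and adds zero bytes
// to any request, which is why a few megabytes there costs nothing per page.
```

That per-request overhead, not disk space, is what keeps cookies tiny.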

Because of that process, advertisers and others could, experts say, see weeks or even months of personal data that could include a user's location, time zone, photographs, text from blogs, shopping cart contents, e-mails and a history of the Web pages visited.

Folks, I don't think this is new at all. Don't cookies do the same thing? I have a cookie that will 'never' expire unless I delete it. What am I missing?

Flash has already become a problem, as in those zombie cookies that Adobe didn't feel inclined to offer a way of getting rid of, or of declining in the first place. Being able to store things with Flash is fine, as long as the end user is aware of it and gets to decide.

That's the first result for a Google search on 'flash prefs', but that is pretty much an incantation, not something most people will think of right away. Getting rid of existing flash cookies requires visiting another page there:

The quick and painless answer would be to download a Flash Cleaner [cnet.com] from a reputable host site. There are quite a few that have popped up in the last year.

I cannot really comment on the efficacy b/c the one I use [softpedia.com] never finds anything thanks to SandboxIE & NoScript. I run it to double-check occasionally, it's portable...no install required.

But does the uninstaller of Flash Player for Windows remove LSOs and other Flash settings (like apt-get --purge remove packagename)? Or does it remove only the plug-in and leave the LSOs and settings behind (like apt-get remove packagename)?

Creatures such as Flash should never be able to store or read anything. They should be locked in their sandboxes with only the input the browser chooses to give them.

The browser chooses to give them a sandbox within whose confines they can store or read what they want. It's called "offline support". Otherwise, web applications would stop working when the client machine disconnects from the Internet.

If corporations want an offline app, they have to develop offline software.

How do I develop offline software and deploy it to Windows users, Mac OS X users, and GNU/Linux users without tripling the development budget? Then how do I deploy software to people who aren't root on the machines that they regularly use?

I know how to get to Google, but I don't always know how to choose the right keywords given that so many words have synonyms. Nor does Google have an index for the reliability of sources found in search results, apart from whether or not they cause malware to be downloaded.

HTML5 can generate and save 5 megs of data locally, but as far as I know there isn't a limit to how many megs of resources a site can cache from the server. The server could easily save tons of information in the form of images and text files or whatever you want in your local cache. I don't think it can be read by other sites, though.

i also don't really understand the html5 part of this. for example, one claim is that websites will be tracking your gps location via html5. Ok. some people don't want that tracking.

He said that some thing stores exponentially more data than some other thing. He certainly didn't say that Y stores more data than it did at time X ago. If there is a mention of time in the post then kindly point me to it.

It's like saying that a machine gun causes exponentially more casualties than a crossbow[1]. You might interpret that as meaning that "through history the lethality y of weapons has increased as a function y = k**x where x = the number of years since yada_some_event". However nobody will actually read it that way.

It just needs a couple hundred bytes to insert a URL to your personal tracking record.

Not all portable devices have cellular data plans, especially in the United States of America. PDAs and netbooks, unlike smartphones, usually disconnect from the Internet when used by passengers in a vehicle. So a web application needs a lot more than a couple hundred bytes to save the objects that the user has chosen to download for offline use. It can use the rest of the space to collect statistics on what the user does inside the offline application.

Genuine question - if people honestly don't care, then is it really a problem?

Is it that they don't care, or don't understand?

If people honestly don't understand the problem, then it's up to a government to protect the people, or up to the producer of a particular product to protect its customers (enforced by laws to protect the people).

Privacy is an abstract concept, which is difficult to understand for most people. Privacy for most people still means "to be able to close the curtains at night", and has nothing to do with the internet or any other digital technology.

I don't think this is the case. I think a lot of people do understand that the internet can be a danger to their privacy. It's not that they don't care about it, it's that they are taking a (reasonable) calculated risk. For the vast, vast majority of people the value of hassle-free surfing far outweighs the dangers. The lack of privacy on the internet has never seriously damaged them, and realistically never will (regardless of what a bunch of tinfoil hat-wearing libertarian nerds might say).

Genuine question - if people honestly don't care, then is it really a problem?

The problem is that users are given a tradeoff: either they enable cookies and let people track them, or disable cookies and break all the sites they use. Offered that decision, most people will rationally opt for the former. The goal is to give them a third option: let sites work properly without privacy or security problems.

Web standards try to give apps as much power as possible without hurting privacy or security more than before, so you don't have to trade off here-and-now features to fend off abstract future risks.

We're not talking about a civil rights issue, we're talking about an option you can turn on or off in your browser. It's not a problem for most people, so they don't turn it off. It's there to be turned off if you like. We're not even talking about getting rid of that option, we're just discussing sane defaults.

Can you give a decent explanation of how this relates to police brutality?

If users attempt to protect their privacy by clearing cookies without also clearing data stored in the local storage area, sites can defeat those attempts by using the two features as redundant backup for each other. User agents should present the interfaces for clearing these in a way that helps users to understand this possibility and enables them to delete data in all persistent storage features simultaneously.
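That "redundant backup" warning is concrete enough to sketch. A minimal model of the respawning trick the spec describes, with plain `Map`s standing in for `document.cookie` and `localStorage` so the logic is visible outside a browser (the `uid` key and ID values are invented for the example):

```javascript
// A site mirrors its tracking ID in two stores; whichever copy survives a
// partial clear silently restores the other one.
function syncTrackingId(cookieStore, localStore, makeId) {
  let id = cookieStore.get('uid') ?? localStore.get('uid');
  if (id == null) id = makeId();      // only a genuinely new visitor gets here
  cookieStore.set('uid', id);         // re-plant the ID into BOTH locations
  localStore.set('uid', id);
  return id;
}

const cookies = new Map(), local = new Map();
const first = syncTrackingId(cookies, local, () => 'id-1');

cookies.clear();                      // the user clears cookies only...
const second = syncTrackingId(cookies, local, () => 'id-2');
console.log(first === second);        // true: the ID "respawned" from localStorage
```

This is exactly why the spec asks browsers to offer one control that clears every persistent storage feature at once.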

The browsers should come out of the box with those settings. There is no good reason for 3rd party anything (cookies, flash, images) other than bad web development, injection of bad content or tracking for nefarious purposes. Same with HTML5. There is no reason that website x needs to be able to read the content of website y. It also doesn't need to access your browser settings or anything outside of the window where the website renders (that is buttons, history, other cookies, preferences or bookmarks).

The browsers should come out of the box with those settings. There is no good reason for 3rd party anything (cookies, flash, images) other than bad web development, injection of bad content or tracking for nefarious purposes.

This might have been tenable if it had been the policy since day one, but now there are billions of sites that expect third-party content to work. Browsers can't just disable that, or their users will say "All my websites don't work anymore!" and switch to a competitor, or refuse to upgrade.

Same with HTML5. There is no reason that website x needs to be able to read the content of website y. It also doesn't need to access your browser settings or anything outside of the window where the website renders (that is buttons, history, other cookies, preferences or bookmarks).

I'm glad you think so, because HTML5 doesn't allow any of those things, any more than any previous web technologies did. The exceptions are minor and carefully crafted: e.g., websites can communicate using postMessage(), and only when both pages cooperate.
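The opt-in nature of postMessage() is worth spelling out: the receiving page must check `event.origin` and ignore anything it didn't expect. A sketch modeled as a plain handler function so the policy runs anywhere (the origins are made-up examples):

```javascript
// Receiver-side policy for cross-origin messages: drop untrusted senders.
const TRUSTED_ORIGIN = 'https://foo.com';   // assumed example origin

function handleMessage(event) {
  if (event.origin !== TRUSTED_ORIGIN) {
    return null;                            // silently ignore anyone else
  }
  return `accepted: ${event.data}`;
}

console.log(handleMessage({ origin: 'https://foo.com', data: 'hi' }));      // "accepted: hi"
console.log(handleMessage({ origin: 'https://evil.example', data: 'hi' })); // null
```

In a real page this function would be registered with `window.addEventListener('message', ...)`; the origin check is the part that makes the exception "carefully crafted" rather than a hole.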

The browsers should let users control their data and privacy settings. Let users disable the new features just like the users who are truly concerned shut off 3rd party cookies and JS.

I'm guessing they do. If you restrict cookies, I'm going to bet that most browsers will apply the exact same restrictions to other forms of client storage that they control. The same button to clear cookies clears localStorage and so on, you can check that.

So you're saying that a more powerful internet will require more powerful internet security?!? Dear god, we can't have that, it would be too much like progress. Quick, everyone smash the magic box before it steals your soul through the webcam (to support terrorists)!!!

Article reads like it was written by someone who has no idea about the time and effort taken to sandbox sites from each other. Sounds like he's talking about localStorage or client-side DBs, which can hold more data but are no more of a privacy risk than a single unique ID stored in a cookie linked to an unlimited REMOTE database. Accessing web history is not a part of HTML5, more FUD there, and browser vendors are working to block JS from being able to access that information. They also seem to refer to geolocation, which in Chrome at least has to be explicitly granted to sites unless you turn it on globally.

The "supercookie" thing is perhaps the one legitimate thing mentioned but browsers should (or probably will if they don't already) clear out most of those locations (except Flash, but you can't blame the browsers for that really) when you clear your private data, which at least Firefox and Chrome can do for you.

As for "buckets to put tracking information into" why bother relying on "buckets" on the client which may or may not exist, are limited in size, may change or be emptied at any time, etc, when you can buy as many "buckets" as you want server-side and store virtually unlimited data about them?

browsers should (or probably will if they don't already) clear out most of those locations (except Flash, but you can't blame the browsers for that really) when you clear your private data

This is the only part of your post that I disagree with - if a browser allows a plugin to write to a location on disk in any form, then the browser should be responsible for further access to that location, and the maintenance of that location, not the plugin. Saying it's Flash's fault that these things don't get removed is simply excusing the browser from its responsibilities.

The browser doesn't allow the plugin to write to the disk, the OS does. Plugins are just libraries - they can do anything that any binary can do. If you are using nspluginwrapper on *NIX, you can make plugins run in a chroot and clean up after them, but file accesses do not go via the browser and 'modern' operating systems do not provide any facilities for running subprocesses that validate system calls via the parent.

Bullshit. Seriously, bullshit. The browser provides the interface through which the plugin can work - just because currently plugins have near free rein on most browsers does not mean that that is acceptable.

Javascript is blocked from writing to disk, and indeed doing a lot of things in certain circumstances (IE blocks a lot of JS when the page is opened locally and not through a remote server).

So again, to say it's not the browser's fault is falsely excusing it from blame - the browser can certainly lay down a strict set of rules by which the plugins can and cannot work, and that certainly includes local file access.

Bullshit. Seriously, bullshit. The browser provides the interface through which the plugin can work

No it doesn't. It provides a set of interfaces that allow the plugin to interface with the browser, but as long as the plugin is native code it can issue system calls. If it can execute an interrupt instruction, it can do anything that any other application can do.

There are only two possible ways of preventing this. One is to require plugins to be compiled by the browser using a language that does not allow 'unsafe' operations. At a minimum, such a language would need to be garbage collected (otherwise dangling pointers could be used to escape) and have no pointer arithmetic. Good luck getting plugin writers to rewrite their entire codebase in such a language.

The other alternative is for the operating system to provide a mechanism for isolating the plugin. UNIX provides chroot(), but it requires root privileges, so you'd need a plugin launcher that was setuid root, which makes it a very attractive target for exploits.

Javascript is blocked from writing to disk, and indeed doing a lot of things in certain circumstances

Entirely different. The limitations of JavaScript are inherent in the source language. There is no way for JavaScript code to issue interrupts or to make system calls. There is no way for it to call arbitrary C functions in the current address space. The browser's interpreter or compiler for JavaScript simply does not produce any code that can escape the sandbox (modulo bugs).

the browser can certainly lay down a strict set of rules by which the plugins can and cannot work, and that certainly includes local file access.

As you so eloquently put it: bullshit. The browser can make any rules that it wants, but it can't enforce them - that was my point. Unless it is intercepting any system calls that the plugin makes (most operating systems don't provide a convenient facility for doing this - you could do it via ptrace(), but the performance hit would be horrible), then it can't prevent a plugin from accessing the filesystem.

Microsoft got shat on for this a long time ago about ActiveX.

Plugins are an entirely different issue. The problem with ActiveX was that it was downloading arbitrary untrusted code from the Internet and running it with normal app privileges. Plugins, in contrast, are supposed to be trusted code. Installing a plugin requires user action, just like installing an app. If you don't trust the plugin author, you can simply not run their plugin.

But unless the plug-in is run in a process with sufficient privileges, its system calls will fail with "Permission denied".

And then it nags the user and demands higher privileges, which he will of course grant, since he wants the plugin to work.

What's needed is a mechanism for "transient success": allow creation of an execution context where all changes made within the context are not visible outside of it and will be discarded completely as soon as the context exits. In other words, make it easy for the browser to throw away everything the plugin did when it finishes.

And then it nags the user and demands higher privileges, which he will of course grant, since he wants the plugin to work.

I'm not sure how this works on Linux, but it definitely doesn't work the way you describe on Windows (Vista/7). If your process does not have sufficient privileges for a system call, then you'll just get an error result without any popups nagging the user etc. All the UAC stuff is there because the application explicitly issues a special system call that pops up the elevation dialog. That's why old apps written in pre-Vista days never display those, but just fail silently (with a few hacks that come from the compatibility layer).

I'm not sure how this works on Linux, but it definitely doesn't work the way you describe on Windows (Vista/7). If your process does not have sufficient privileges for a system call, then you'll just get an error result without any popups nagging the user etc. All the UAC stuff is there because the application explicitly issues a special system call that pops up the elevation dialog.

Yes, so once the plugin gets back "permission denied" or whatever when trying to store a cookie, it uses those calls to get the elevation prompt.

UNIX provides chroot(), but it requires root privileges, so you'd need a plugin launcher that was setuid root, which makes it a very attractive target for exploits.

UNIX also provides inetd, which runs setuid root to listen on well-known ports but drops privileges as soon as they're no longer needed. Likewise, a plugin container can chdir(), chroot(), and setuid() before processing any untrusted input.

'modern' operating systems do not provide any facilities for running subprocesses that validate system calls via the parent.

Uh, hello? Have you never heard of Apparmor and SELinux?

I have an Apparmor wrapper for Flash which prevents it from doing pretty much anything other than playing videos. It literally cannot write flash cookies to the local disk because the kernel only allows it to write to its own config directory.

And given the insane amount of denials I see for access attempts to random files (it even tries to write to a root-owned font directory), that's a really good idea.

... when you can buy as many "buckets" as you want server-side and store virtually unlimited data about them?

Because it costs money? My fear is what spammers may or may not do with this local storage. I'm not opposed to local storage, but I think it needs more user notification about when and what is accessing it. Not requiring user intervention, but knowledge about who and what is storing that data. I would rather a browser let me know if some no-name advertiser were storing data there than, say, Slashdot or New York Times doing something to better my reading experience. I welcome it. It just needs to have that kind of notification built in.

say, Slashdot or New York Times doing something to better my reading experience.

You must be new here :-p

Seriously, we already have latency problems caused by multiple sites doing their crap on every page load (look at the source for any page that includes tracking and ad JavaScript includes). We don't need web sites sifting through 5 megs of local storage (which they'll grow to 100 megs, just like the original cookie limits quickly succumbed to hyperinflation) because they'll want to store information about everything you do.

The "supercookie" thing is perhaps the one legitimate thing mentioned but browsers should (or probably will if they don't already) clear out most of those locations (except Flash, but you can't blame the browsers for that really) when you clear your private data, which at least Firefox and Chrome can do for you.

Also, they have nothing to do with HTML5, and can be implemented in any Flash-enabled browser.

As for "buckets to put tracking information into" why bother relying on "buckets" on the client which may or may not exist, are limited in size, may change or be emptied at any time, etc, when you can buy as many "buckets" as you want server-side and store virtually unlimited data about them?

Mainly, caching and offline access. When I access Gmail from my Android phone, I can read my e-mail offline, and browsing my inbox is instantly responsive even if my connection is very slow. This is because it just caches my whole inbox, any other tags I tell it to, and all mail from the last few days. If the Gmail website did that, which it can using localStorage, I wouldn't have to wait ten seconds sometimes to flip to the next e-mail while it retrieves it from the server.
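The cache-first pattern described above is short to sketch. Here a `Map` stands in for `window.localStorage` and `fetchMailFromServer` is a made-up placeholder for a real (slow) API call:

```javascript
// Cache-first read: check local storage before going to the network.
const mailCache = new Map();

function fetchMailFromServer(id) {       // pretend this takes ten seconds
  return { id, body: `message ${id}` };
}

function getMail(id) {
  const key = `mail:${id}`;
  if (mailCache.has(key)) {              // fast path: already stored locally
    return { ...JSON.parse(mailCache.get(key)), source: 'cache' };
  }
  const msg = fetchMailFromServer(id);   // slow path: first visit only
  mailCache.set(key, JSON.stringify(msg));
  return { ...msg, source: 'network' };
}

console.log(getMail(1).source);          // "network" the first time
console.log(getMail(1).source);          // "cache" on every later read
```

Serializing through JSON mirrors what real localStorage forces on you: it only stores strings, so structured data goes in and out through stringify/parse.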

Browsers should no longer be allowed to frisk about in the general operating system, scattering data willy-nilly throughout your computer into wildly obscure folders.

I propose robust sandboxes. You want to delete all the tracking information? Delete the sandbox. Honest websites won't be spending their efforts to break out of the box, and malicious websites were going to pwn you anyway, so does it matter if they do?

I'm not proposing sandboxes as a security measure, merely a way to keep all the cruft from your browser & plugins locked down in one (easily deletable) place.

This neo-luddite fear-mongering must end!!! Properly secured browsers negate these "new" threats. The only "problem", as I see it, is the likelihood that as browser manufacturers (Apple, Google, Microsoft, Mozilla, Opera, etc.) rush to get these new capabilities out, they'll put security on the back burner and we'll have a few years of this nonsense. This is no reason not to implement compelling features. It just raises the stakes for people to do it right. Having spent some time developing some HTML5, I feel pretty good about where it's headed.

HTML5 -- is it a new language? Is it a set of extensions to HTML, Javascript, or is it more of a concept/phenomenon, like "Web 2.0"?

I read it as an extension of the HTML standard, but quite often it's treated as a "new language" as opposed to an extension, upgrade, etc. I wonder if that's half the problem -- I think generally speaking, people are a little wary of many new things, technology-wise, and failure to cast this as more of an upgrade than a wholly new entity (even if the new features make it so) probably has a lot to do with some of the scaremongering associated with it.

There are two HTML5s, if you will. First, there's the 5th revision to the Hyper-Text Mark-up Language, which includes extended mark-up for semantic organization of HTML documents, among other things. Second, there's HTML 5, the concept/phenomenon. The latter, just like "Web 2.0", is a catch-all phrase encompassing various new features, technologies, and concepts used in modern web applications.

As much as I appreciate their intended purpose... they should really get a talking head that has a clue about technology. Their previous fear-mongering topics have been RFID, cloud computing, social networking, etc. The one thing their "warnings" have in common is that all seem to have been put together by someone with a complete lack of understanding of how things already work.

HTML 5 will certainly allow for more flexibility for developers, but it will also allow browser vendors to provide better security simultaneously.

i don't have a problem with a website seeing everything i do on that website. i have a problem with a website seeing what i do on other websites

let foo.com have evercookies on my computer about everything i do... at foo.com. not a problem. but i don't ever want foo.com to see what i do at fubar.com, and vice versa

of course, foo.com can sell my info to fubar.com through different channels, but that's a problem that predates the internet, and has nothing to do with browser privacy. and i know if doubleclick has their ads on foo.com, they can infer certain things about my activities at foo.com... actually, now that i think about it, that's a fatal hole in any browser privacy: if a webpage is serving content from another website, such as with advertising networks, we're pretty much doomed no matter what the markup language, aren't we?

to really have browser privacy, you'd have to destroy the entire possibility of webpages serving content from other domains. how the heck do you enforce that? a rule like "when loading content from foo.com, everything on this page must come from foo.com"? is that a viable concept? no more google analytics, no more iframes... i don't know, we're just doomed

but... even if you had that rule, foo.com could just agree with double click to proxy their ads, running them through their servers, so everything is coming from one domain, even though it really isn't. then they can simply see how one particular ip address walks across the web where they have similar agreements with other sites. no escape. you'd have to spoof your ip with every request, which breaks all sorts of functionality on most websites. maybe you could have a new ip for every tab, every session... what a nightmare

basically, the concept of privacy on the internet is void. if you type it on the web, it is known, end of discussion. crap

actually, now that i think about it, that's a fatal hole in any browser privacy: if a webpage is serving content from another website, such as with advertising networks, we're pretty much doomed no matter what the markup language, aren't we?

Yep. If the sites you go to can store info about you, and they include ads, the ads can also store info about you, unless the site takes efforts to stop it (which the ad companies wouldn't allow).

The last thing corporate interest wants is a video format which is open and available to everyone. Expect the barrage of crap over HTML5 to continue. The article says nothing about the details of what's so "bad" about HTML5. The best they could come up with is:

"which large amounts of data can be collected and stored on the user's hard drive while online. Because of that process, advertisers and others could, experts say, see weeks or even months of personal data. That could include a user's location, time zone, photographs, text from blogs, shopping cart contents, e-mails and a history of the Web pages visited."

So will IPv6, the Semantic Web, the Social Web, facial recognition, and any P2P protocols coming in the future seriously invade our privacy. Neither did HTTP, IPv4, nor SMTP care about privacy.

Get over the privacy FUD and face reality: we the programmers who design the architecture of the Internet don't care about privacy. Tell me, brilliant slashdotters, if you had the manpower and time, how would you redesign IPv6, the Semantic Web, or any other protocol from the ground up to protect users' privacy, and whether anyone would actually use it.

The article is nonsense. Every privacy problem mentioned either doesn't exist or predates HTML5. Every browser has a security team that carefully reviews any new features for privacy breaches and reports problems back to the standards bodies before implementation. Everyone involved in web standards is well aware of all of these issues and tries to head them off at the pass. No website can read another website's data, none can store things without the user's permission, and nothing stops users from clearing their data whenever they like.

Didn't the 90s (and early 2000s) teach us anything? If HTML isn't implemented in essentially the same way across all browsers, the Internet will stagnate again and we will turn to cross-platform plugins like Flash to actually get stuff done.