Hard sciences simply lend themselves a lot better to repeatability. Where I think we go wrong is assigning the same certainties to the claims of the soft scientists.

Granted, the hard sciences are probably more reliable, but unfortunately a lot of the research even there is shaky. I overheard roughly the following conversation between a graduate student in mathematics and his thesis adviser one summer, while I was doing undergraduate summer math research at the CUNY Graduate Center on an NSF grant (RTG):

Student: So I looked into the paper by Smith, and when I did the same computations, I got a different answer. I haven't been able to figure out what I'm doing differently. Do you think I should e-mail him?

Adviser: No. If the results are inconsistent, pretend they don't exist. Don't use them, but don't tell anyone you got different results either. If you do, then they'll just suspect that your results are wrong.

Student: Yeah, I suspect that too.

Adviser: But don't contact him, because people don't like being proven wrong. You can point out errors in people's papers once you've got tenure – it's not something you want to do as a grad student. You don't want to make this guy your enemy.

Student: Oh, okay . . .

Even if high-profile results are more reliable in the hard sciences, your average paper is still unreproducible garbage. The problem is the system, which forces everyone to publish as much as possible without heed to quality; and the journals, which publish only positive results. Researchers need to publish all their results publicly, including registering their hypotheses before they even begin the study. Universities need to take a stand by not focusing on quantity of publications. More emphasis must be placed on repeatability.

The people who treat this kind of finding as an attack on science are perpetuating the problem. We should be looking to make the scientific process ever better and more accurate as we come to understand its pitfalls better, not shrug off its inadequacies as inevitable.

As a business tool, it's all but useless. Google provides no mechanism for installing even standard Linux VPN software, which most companies provide for their remote employees. Or any other software, for that matter. Also, no company with a brain in their head is going to allow employees to store internal data on another company's servers. This might be a little more useful if a company could customize it to use internal servers rather than Google's, but as far as I've been able to tell, that option just doesn't exist.

This was addressed in the December 7 launch event. Google has deals worked out with a number of businesses (they had a list, it was at least one or two dozen) that are extremely interested in replacing as many of their machines as possible with Chrome OS notebooks. In addition to Google Apps and internal company websites, it will support Citrix, which they had a demo of: you can log into the company's servers and run Office or any of dozens of other major corporate applications remotely. (I can't find a link to the video, sadly; I watched it live and the URL stopped working after it was over.)

The major advantage is zero maintenance burden. By design, Chrome OS notebooks are totally interchangeable. You can chuck it out the window, grab another one, enter in your Google login info, and you won't be able to tell the difference. It also will never get viruses; it updates itself without user intervention; and the lack of customizability means it's unlikely users will be able to mess it up to the point that IT has to get involved. There go all your OS-level customer support costs.

Chrome (thus presumably Chrome OS too) is also deploying features that will allow corporate admins to lock down the browser, the way IE has supported for years. The employer can require that certain extensions be installed, disable the password manager, and much more, with more features to be added as time goes on. And you can't get around it by installing another browser, because you can't install anything.

Although I should add that, unfortunately, most software does use only one or two iterations of a hash function with no special tricks, so you could still brute-force even mediocre passwords on cheap hardware. But still not "superb" ones.

With an offline hash attack, you have total control over the hashes, and the only limiting factor in how fast you can attack them is your computer (and hash attacks generally parallelize really well). Here, the difference between a terrible password and a merely mediocre one will likely be less than the refresh rate of the attacker's monitor, and the difference between an OK password and a superb one will still be fairly small. Only a password so good that it is basically a nonstandardized type of private key will be of any use.

Actually, that's not true at all. If by "superb" you mean a 10-character password that uses ASCII mixed case, numbers, and punctuation, there are 95^10 ~= 6x10^19 passwords. Even if you use a fast-to-execute hash on a couple thousand dollars worth of hardware, you're only going to get in the ballpark of a billion hashes per second, which means on the order of 10^10 seconds -- i.e., centuries.
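A quick back-of-envelope check of those numbers (the hash rate is the assumption stated above, not a benchmark):

```python
# Sanity-checking the estimate: 95 printable ASCII characters, 10-character
# password, assumed ~1 billion hashes/second on a couple thousand dollars
# of hardware.
charset = 95
length = 10
keyspace = charset ** length
print(f"{keyspace:.2e}")  # ~5.99e19 candidate passwords

hashes_per_second = 1e9   # assumed rate for a fast, unstrengthened hash
seconds = keyspace / hashes_per_second
years = seconds / (365 * 24 * 3600)
print(round(years))       # on the order of centuries (well over a millennium)
```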

Moreover, if you choose the right hash function, brute-force becomes impractical for any but the very weakest of passwords. For instance, you can take the PBKDF approach and just iterate your hash function 10,000 times or whatnot, so that it takes (say) 100 ms to evaluate on consumer hardware instead of less than a microsecond. That knocks off two or more characters from what you can practically brute-force, without noticeably affecting user experience or server load (100 ms extra CPU time per login/registration).
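The iterated-hash idea is available directly in Python's standard library as `hashlib.pbkdf2_hmac`. A minimal sketch (the iteration count and timing target are illustrative; tune the count so one evaluation takes roughly 100 ms on your hardware):

```python
import hashlib
import os
import time

def hash_password(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    # PBKDF2 iterates an HMAC construction `iterations` times, making each
    # guess proportionally more expensive for an attacker.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)

salt = os.urandom(16)
start = time.perf_counter()
digest = hash_password("correct horse", salt)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"one evaluation took about {elapsed_ms:.0f} ms")  # hardware-dependent
```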

If it takes 100 ms per hash, then with 16 cores you can only do 160 hashes per second. Even for passwords using only lowercase ASCII letters, it will take a day on that hardware to crack a five-character password. Want to price out even a day of computing time on 16 cores on EC2? It's called "not worth it" for most hackers. If you throw in mixed case, it's a month. If you throw in punctuation too, then even a four-character password is several days.
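Those figures are easy to reproduce: 16 cores at 100 ms per hash gives 160 hashes/second, and exhausting a keyspace is just size divided by rate.

```python
# Reproducing the estimates above: 16 cores, 100 ms per hash = 160 hashes/s.
RATE = 160  # hashes per second

def days_to_exhaust(charset_size: int, length: int) -> float:
    return charset_size ** length / RATE / 86_400

print(f"{days_to_exhaust(26, 5):.1f}")  # 5 lowercase letters: ~0.9 days
print(f"{days_to_exhaust(52, 5):.1f}")  # 5 mixed-case letters: ~27.5 days
print(f"{days_to_exhaust(95, 4):.1f}")  # 4 chars, full printable ASCII: ~5.9 days
```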

(GPUs can do more than an order of magnitude better here for the same price, to be fair. But if you use properly-designed hash strengthening, you can mess up GPUs too. As it happens, I'm currently working on just such a hash strengthening design for a final project in university. Looks promising so far, but I haven't gotten the current iteration of my OpenCL program to actually work yet, thanks to the vagaries of NVIDIA's OpenCL compiler. For plain SHA1, I was able to get 220 million hashes/second on an NVIDIA GTX 285. With a PBKDF scheme, it was about 10,000. I'm targeting <50 hashes/second with my design.)

Needless to say, all the above applies if you have only one password to crack. If you have a whole database of hashes, and they're salted, you need to repeat all this for every user whose password you want to crack.

I will concede that sqrt(x) means the principal square root if you will agree to always put the +/- in front of it.

I will not. That is not the conventional definition. sqrt(x) is conventionally defined as the unique nonnegative real number whose square is x; as a function it is injective and strictly increasing from the nonnegative reals to the nonnegative reals. You can find this definition in any book on basic algebra, or in the second paragraph of the Wikipedia article, or any other source you like. If x is a positive real, then its positive square root is denoted sqrt(x), the negative root is -sqrt(x), and if you want to refer to both at once (like in the quadratic formula), you use (+/-)sqrt(x).

FTFY. There are other ways of constructing real numbers from rational numbers. You can even avoid constructing real numbers entirely, and just use an axiomatic approach (this is what is done in Apostol's calculus text).

Although in that case you aren't proving that the reals exist and are unique in any particular set theory like ZFC, which means you can't justify set operations rigorously. If you want to be able to formally prove anything about sets of real numbers, you need to prove that such sets exist in some particular set theory. Axioms for the real numbers alone won't tell you that unions/intersections/etc. of sets of real numbers exist, let alone give you the axiom of choice (which is quite critical in analysis, at least in countable form).

As it turns out, 0.999... = 1 is true regardless of how you approach the real numbers.

As long as your definition winds up being equivalent to the conventional one. If you use a nonstandard definition, like the intuitive definition "all decimal numbers, with operations defined as you were taught in grade school", then you might wind up with something that's not even a group, let alone a complete ordered field. It's not immediately obvious to a layman why the standard definition is the most sensible one.

But even if you could not find a number between them, does it mean they have to be equal? Does your real number system (either by an axiom, or as a result of construction) require that between any two real numbers, there must be another real number?

Yes, this is a consequence of the axioms for an ordered field. One of those axioms is that for any x, y, and z, if x < y, then x + z < y + z. Thus if x < y, then x + x < x + y < y + y, applying the axiom with z = x and then z = y. Another axiom of an ordered field is that if x < y and z > 0, then xz < yz. Since 1/2 > 0, we therefore have (x + x)/2 < (x + y)/2 < (y + y)/2. But of course x = (x + x)/2 and y = (y + y)/2, so x < (x + y)/2 < y, and we've constructed a number lying strictly between them.
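The chain of inequalities above can be written compactly:

```latex
% Density of an ordered field: if x < y, the midpoint lies strictly between.
x < y
  \;\Rightarrow\; x + x < x + y < y + y
    \quad\text{(add } z = x \text{, then } z = y\text{)}
  \;\Rightarrow\; \tfrac{x+x}{2} < \tfrac{x+y}{2} < \tfrac{y+y}{2}
    \quad\text{(multiply by } \tfrac{1}{2} > 0\text{)}
  \;\Rightarrow\; x < \tfrac{x+y}{2} < y .
```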

Of course, this doesn't help you if someone makes up a system that they want to call the real numbers but that isn't an ordered field. Like, say, "the set of all decimal expansions with finitely many digits before the decimal place and infinitely many after, with operations defined as you'd expect". In this set, 0.999... != 1, but there's no number in between. On the other hand, it's not even an additive group, since 1 - 0.999... doesn't exist.

You're confusing the computer world with the mathematical world. In math, the square root of one has two solutions, +1 and -1. Square root is a short form of "Solve for x where x^2 -1 = 0." Because of the ^2, there are always two solutions, although one may be an imaginary number.

In math as well as computing, sqrt(x) is taken to be positive whenever x is a positive real number. It's known as the "principal square root". However, this convention breaks down for negative or complex numbers, and so when you're dealing with anything other than positive reals you have to be very careful when taking roots. There are many possible square root functions, and you have to specify one. Also, identities like sqrt(xy) = sqrt(x) sqrt(y) and sqrt(x^2) = x only work for positive reals. The error in the computation was assuming that sqrt(x^2) = x when x is not a positive real number (in this case x was -1).
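You can watch the identity fail using Python's principal complex square root, `cmath.sqrt`, with x = y = -1:

```python
import cmath

# sqrt(xy) = sqrt(x)*sqrt(y) holds for positive reals but fails here:
x = y = -1
lhs = cmath.sqrt(x * y)              # sqrt(1)  = 1
rhs = cmath.sqrt(x) * cmath.sqrt(y)  # i * i    = -1
print(lhs, rhs)                      # (1+0j) (-1+0j)
```

This is exactly the step that "proofs" of 1 = -1 smuggle in.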

Surely the problem is that you're assuming sqrt(1) = 1 when actually it is +- 1? You're throwing away the sign change in that step.

No, sqrt(1) is 1, and only 1. It's the "principal square root" of 1.

Had you said +-sqrt(1)=+-1, you'd be correct.

However, there is no single sensible principal square root function that works for general complex numbers. In particular, if f(z)^2 = z for all complex numbers z, f is not continuous. So as soon as you're taking the square root of anything other than nonnegative reals, you can no longer treat sqrt as a well-defined function without further specification (e.g., specifying a branch cut).

The article is nonsense. Every privacy problem mentioned either doesn't exist or predates HTML5. Every browser has a security team that carefully reviews any new features for privacy breaches and reports problems back to the standards bodies before implementation. Everyone involved in web standards is well aware of all of these issues and tries to head them off at the pass. No website can read another website's data, none can store things without the user's permission, and nothing stops users from clearing all private data at any time.

Let's look at this systematically. First of all:

The new Web language and its additional features present more tracking opportunities because the technology uses a process in which large amounts of data can be collected and stored on the user’s hard drive while online. Because of that process, advertisers and others could, experts say, see weeks or even months of personal data. That could include a user’s location, time zone, photographs, text from blogs, shopping cart contents, e-mails and a history of the Web pages visited.

Web Storage, Web SQL Database, and IndexedDB are three of the standards commonly lumped in with HTML5, and all of them do indeed allow larger amounts of data to be stored client-side than ever before. What the article doesn't mention is that this data is only available to the site that stored it, and users can clear it as easily as cookies. It poses absolutely no privacy threat beyond cookies: if a server wants to store data on your computer, it can already just store it on the server and store a short identifying key as the cookie.

What the unnamed "experts" here say is therefore crazy. Nothing in HTML allows advertisers to see your location or time zone without your consent, let alone shopping cart contents or e-mail. Since the article doesn't deign to specify what HTML5 technologies are supposed to be able to do this magic, I can't refute it beyond saying it's just nonsense.

The new Web language “gives trackers one more bucket to put tracking information into,” said Hakon Wium Lie, the chief technology officer at Opera, a browser company.

Håkon knows what he's talking about – he's a notable figure in the web standards community, editing such high-profile standards as CSS 2.1. But look at what he says carefully: trackers get "one more bucket". One more just like all the others, which can be controlled and cleared along with all the others, thus no greater privacy risk. I'd bet good money that this quote of his is taken completely out of context, and that he was dismissing the reporter's fearmongering.

Then there's mention of evercookie. But nothing that evercookie does relies on any HTML5 feature. Yes, it stores things in four different types of HTML5 storage, but again, those are cleared just like cookies. Try it yourself: create an evercookie on that page, clear your cookies from your browser's menus, and then click to rediscover cookies. You'll see that the four HTML5 methods (localData, globalData, sessionData, dbData) are all cleared too.

(There is one other mention of HTML5 on evercookie's page, but it's a red herring. The pngData mechanism uses HTML5 canvas, but if you look at how it works, it would work just as easily by storing a JavaScript file or even a plain text file, and retrieving it via <script> or XMLHttpRequest.)

It's worth emphasizing, by the way, that using your browser's "private browsing mode" (whatever it's called) will completely defeat evercookie. So this is not some earth-shattering problem that no one's thought of.

The article goes on:

Each browser has different privacy settings, but not all of them have obvious settings for removing data created by the new Web language. Even the most proficient software engineers and developers acknowledge that deleting that data is tricky and may require multiple steps.

Again, this is patent nonsense. All browsers clear the new data sources whenever you clear cookies. The Web Storage spec explicitly advises this: "User agents should present the interfaces for clearing these in a way that helps users to understand this possibility and enables them to delete data in all persistent storage features simultaneously." But if you don't believe it, just try the evercookie test I suggested above and see for yourself.

The browsers should come out of the box with those settings. There is no good reason for 3rd party anything (cookies, flash, images) other than bad web development, injection of bad content or tracking for nefarious purposes.

This might have been tenable if it had been the policy since day one, but now there are billions of sites that expect third-party content to work. Browsers can't just disable that, or their users will say "All my websites don't work anymore!" and switch to a competitor, or refuse to upgrade.

Same with HTML5. There is no reason that website x needs to be able to read the content of website y. It also doesn't need to access your browser settings or anything outside of the window where the website renders (that is buttons, history, other cookies, preferences or bookmarks).

I'm glad you think so, because HTML5 doesn't allow any of those things, any more than any previous web technologies did. The exceptions are minor and carefully crafted: e.g., websites can communicate using postMessage(), but only if both of them cooperate, so there's no security or privacy breach.

The browsers should let users control their data and privacy settings. Let users disable the new features just like the users who are truly concerned shut off 3rd party cookies and JS.

I'm guessing they do. If you restrict cookies, I'm going to bet that most browsers will apply the exact same restrictions to other forms of client storage that they control. The same button that clears cookies clears localStorage and so on; you can check that yourself.

actually, now that i think about it, that's a fatal hole in any browser privacy: if a webpage is serving content from another website, such as with advertising networks, we're pretty much doomed no matter what the markup language, aren't we?

Yep. If the sites you go to can store info about you, and they include ads, the ads can also store info about you, unless the site takes efforts to stop it (which the ad companies wouldn't allow).