Posted
by
CmdrTaco
on Tuesday November 16, 2010 @12:28PM
from the sekrit-p4ssw0rd dept.

suraj.sun writes "As of Nov. 15, 2010, Amazon EC2 is providing what they call 'Cluster GPU Instances': an instance in the Amazon cloud that provides you with the power of two NVIDIA Tesla 'Fermi' M2050 GPUs... Using the CUDA-Multiforcer, I was able to crack all hashes from this file with a password length from 1-6 in only 49 minutes (one hour costs $2.10, by the way). This is just another demonstration of the weakness of SHA1 — you really don't want to use it anymore."

While this article really has nothing to do with the security of SHA-1, SHA-1 does have weaknesses that should make anybody think twice before using it.

And I really hate it when people say "Oh, well, it isn't good for this, but how about this?! I mean, we can't toss out a perfectly good algorithm!". What possesses people to hang onto algorithms that are broken when there are essentially drop-in replacements that aren't?

Hash algorithms are really tricky to use correctly, and knowing when you can and can't use one that has a specific weakness is not a trivial determination to make. Replacing the stupid thing, on the other hand, is pretty simple.

So just get over it already and drop the bad algorithm. How hard can it be?

0) What algorithms do you propose as replacements?

1) How hard can it be? Maybe you can "walk the talk" by deleting/disabling all the CA certs in your browser that use bad algorithms, e.g. algorithms that you did not propose in 0). Same goes for not using browsers, ssh servers and clients that do not support the algorithms in 0).

That is tricky. Currently the SHA-2 family (SHA-256, SHA-384, SHA-512) offers the only viable replacements, but I would say that's only provisional; we should wait for the results of the NIST hash algorithm competition.

1) How hard can it be? Maybe you can "walk the talk" by deleting/disabling all the CA certs in your browser that use bad algorithms- e.g. algorithms that you did not propose in 0). Same goes for not using browsers, ssh servers and clients that do not support algorithms in 0).

Don't be surprised if you find that some CAs are still using MD2!

This is a completely different issue than saying "Oh, it's still perfectly good for some things, you don't have to stop using it!". If someone you need to interoperate with is using it, you don't have any choice besides not interoperating or implementing the algorithm yourself.

Salt has absolutely nothing to do with collisions if you have the target hash you're trying to collide with. Finding collisions means they don't need to know the original input; it means they found some other input that creates the same hash. Salting only helps against dictionary attacks on the password that created the hash.

Not true... with salted password hashes you're trying to find a password that the application will think is the correct one. It concatenates the salt with the password and checks whether the hashes match (a simplified explanation, but that is what many implementations such as crypt do). That means you're trying to find a collision where the salt is at the start of the input that causes the collision. That's a small subset of the inputs that generate the same hash, so it does make finding collisions harder.
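That crypt-style check can be sketched in a few lines (a toy illustration using Python's hashlib; the salt value and the salt$digest storage layout are made up here for illustration, not what crypt actually emits):

```python
import hashlib

def make_entry(salt: str, password: str) -> str:
    # Store the salt in the clear next to hash(salt + password),
    # roughly what crypt-style schemes do.
    digest = hashlib.sha1((salt + password).encode()).hexdigest()
    return salt + "$" + digest

def check_password(entry: str, candidate: str) -> bool:
    # Re-derive the hash with the stored salt and compare.
    salt, digest = entry.split("$")
    return hashlib.sha1((salt + candidate).encode()).hexdigest() == digest

entry = make_entry("x7Qp", "hunter2")
```

Because the salt is prepended before hashing, a colliding input that doesn't begin with the stored salt never even gets tested as-is; the system always hashes salt + candidate.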

I think you misunderstand what AndrewNeo was saying. When you have the hash itself, you can then try to find some input that also produces that hash (a collision). You don't have to know anything about the original password or the salt.

As far as I can tell, salting only helps against rainbow table attacks. OP wasn't using those; he was computing the hashes (and thus finding collisions) using only the EC2 GPU instance. He was generating the tables themselves. Salt won't help you in that case. It just requires more compute power, which has now become available thanks to the EC2 GPU instances that Amazon is offering.

Salt won't help you in that case. It just requires more compute power which has now become available thanks to the EC2 GPU instances that Amazon is offering.

Not using a salt means you can check a complete file of passwords in the same run, since you just hash the password and check whether it matches any entry in the file. Using a salt means you can only check one password at a time, since the password will hash to a completely different value when used with a different salt (absent blind luck finding a collision).
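That cost difference can be shown with a toy example (a hypothetical three-entry password file; the counters stand in for hash computations, which are the expensive part):

```python
import hashlib

def h(s: str) -> str:
    return hashlib.sha1(s.encode()).hexdigest()

targets = ["abc", "dog", "zzz"]   # hypothetical leaked passwords
guesses = ["cat", "dog", "abc"]   # attacker's candidate list

# Unsalted: hash each guess once and test it against the whole file.
unsalted_file = {h(p) for p in targets}
unsalted_cost = 0
found_unsalted = set()
for g in guesses:
    digest = h(g)
    unsalted_cost += 1            # one hash checks every entry at once
    if digest in unsalted_file:
        found_unsalted.add(g)

# Salted: each entry has its own salt, so every (guess, entry) pair
# costs a fresh hash computation.
salted_file = [(s, h(s + p)) for s, p in zip(["s1", "s2", "s3"], targets)]
salted_cost = 0
found_salted = set()
for g in guesses:
    for salt, digest in salted_file:
        salted_cost += 1          # one hash per guess per entry
        if h(salt + g) == digest:
            found_salted.add(g)
```

Both runs recover the same passwords, but the salted run costs guesses × entries hashes instead of just guesses, which is the 50,000x factor described below.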

If you want to crack 50,000 passwords, then you'll have to buy 50,000 times more power to do that if they're salted than if they're not.

Correct me if I'm wrong, but yes, what you are saying is true for hashes without salt, or for systems that allow you to provide an already hashed password (why would you do such a thing?), but in those cases you do not need a collision; the hash itself will do.

In a system that correctly applies the salt, your new input will not generate the same hash. I.e.: the user sets a password, and the password is hashed with the salt (e.g., passwordHash = hash(salt + password)). You discover the resultant hash and find a collision that produces it, but when you submit that colliding input, the system prepends the salt before hashing, so the result will not match passwordHash.

As far as I can tell, salting only helps against rainbow table attacks.

Adding a salt also helps to increase the original keyspace, which increases the time to brute force.

As an example, in TFA the source input of the hashes were all 6 bytes. If the hash had been created by adding a 2-byte salt, that means the original user-generated password was only 4 bytes...definitely not very secure.

Note that TFA did not exhaustively compute all SHA1 hashes in 49 minutes, but it appears to have exhaustively searched the 6-byte keyspace in 49 minutes. If you use 8-byte passwords with a 4-byte salt, the input to the hash is 12 bytes, and exhaustively searching that space is far beyond what was demonstrated here.


Not to get too specific, but I use SHA-1 to generate a globally sorta-unique ID for a datapoint in multiple locations, using multiple implementations, by basically concatenating the relevant parts of the datapoint together and then computing the SHA-1 hash. How is SHA-1 "broken" for this application, other than being faster and available on more of the systems and languages that my data importers run on than other hash functions? I really couldn't care less if it can be reversed, but I am very interested in how the known weaknesses would actually affect this use.

I agree the story could have been framed better. There is in any case some story here. For certain computational tasks, the linear performance scaling that vanished in a puff of Prescott has returned from the grave.

And not only that: instead of spending $20,000 to buy a Fermi-class workstation and getting your result in a year, you can throw the same $20,000 at the cloud and have 10,000 machines deliver your result in an hour, given a large enough cloud.

This applies to a class of computational tasks denominated in CPU cycles where you can cut a wide swath.

Because it has become easy to create 2 plaintexts that both hash out to the same SHA-1 value [wikipedia.org]. See the section titled "SHA-1" which talks about attacks on the hash function.

This means that SHA-1 and MD5 are not suitable for "signing" usage, where you have a plaintext and want to prove that the original has not been changed. It's too easy for an attacker to alter the plaintext in an easily hidden manner so that the hash stays the same.

Is it still useful for the storage of passwords? Yes, but the writing has been on the wall for SHA-1 and MD5 for close to a decade now. When one weakness is discovered in an algorithm, it's the safe bet to assume that future weaknesses will be discovered, and those may make the hash algorithm unsuitable for storing passwords. Better to move to one of the newer, more complex algorithms while you have time to plan over the course of a few years, rather than have to switch suddenly in the space of a month or three after an attack is discovered.

Because it has become easy to create 2 plaintexts that both hash out to the same SHA-1 value.

This means that SHA-1 and MD5 are not suitable for "signing" usage, where you have a plaintext and want to prove that the original has not been changed. It's too easy for an attacker to alter the plaintext in an easily hidden manner so that the hash stays the same.

Your use of the word "easy" is a little peculiar, in that you don't get to choose the matching plaintext, except in the most limited fashion.

Most of the time you're signing a hash of a document instead of signing the document itself. Example: take my hundred-page mortgage packet, hash it down, and sign the hash to prove it's mine, basically sticking a virtual notary stamp on it.

It is a math breakthrough to be able to generate "a file" that hashes to the same signed value faster than random guess-and-check. The problem is that collision attacks on MD5, and increasingly on SHA-1, show that barrier starting to crumble.

It's impossible for a hash algorithm not to have collisions. You're mapping an arbitrarily large problem space down into just a handful of bits. There are infinitely more possible inputs to the algorithm than there are outputs. That said, it's supposed to be computationally prohibitive to find those collisions, and that's where MD5 and SHA1 are failing.

This isn't finding collisions, it's a dictionary attack to find the original inputs.

A collision is where you find two different inputs, A and B, such that hash(A) = hash(B). A collision attack is where you are able to control both A and B, and you manage to compute an A and B such that hash(A) = hash(B). A collision attack is now possible in MD5, but, as far as I know, not SHA1.

A preimage attack is where you have a fixed A or a fixed hash(A) and you try to compute a B such that hash(A) = hash(B). That is, the difference is that you can't modify A. There is no known preimage attack for MD5 or SHA1 that is more efficient than brute force. The effectiveness of a brute-force attack is mitigated by having a larger hash output size, as that dramatically reduces the probability of finding a collision. So, moving from SHA1 to SHA2 would decrease the effectiveness of a brute-force attack. However, it's still computationally unreasonable to perform a preimage attack on MD5, much less SHA1.

However, this is talking about a dictionary attack to find the original input. That's where you have hash(A) and you try various possibilities A', computing hash(A') until you find an A' where hash(A') = hash(A). This looks pretty similar to a preimage attack, but in a preimage attack, you don't care about the nature of A. You just want to find some B, any B, that hashes to the same value. Brute-force preimage attacks take far, far too long. In a dictionary attack, you're trying to use your knowledge of the likely properties of A to recreate likely values for A and compute their hashes. The properties of the hash function are largely irrelevant for this attack. It can be any function; they all work equally well. The important thing is the properties of A. If A is no more than 6 alphanumeric characters, that's a very small space to search through.

So, to summarize. In a brute-force collision attack, the properties of the hash function matter. In a dictionary attack, the properties of the possible inputs (passwords) matter.

Imagine they used only MD5 for hashing. If you tried to perform a collision attack, you'd need to compute on the order of 2^128 MD5 hashes. If you tried to perform a dictionary attack on passwords of 1-6 alphanumeric characters, you'd need to compute on the order of 72^6 ~= 2^37 MD5 hashes.
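Those two orders of magnitude can be checked directly (the 72-character alphabet is the figure used above):

```python
from math import log2

# Work for a brute-force collision search on MD5's 128-bit output space
collision_work = 2 ** 128

# Work for a dictionary sweep of all 1-6 character passwords over a
# 72-character alphabet
alphabet = 72
dictionary_work = sum(alphabet ** n for n in range(1, 7))
```

The dictionary sweep comes in just over 2^37 total candidates, roughly 2^91 times less work than attacking the hash itself, which is the whole point: the password, not the hash, is the weak link here.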

You need passwords of at least 20 alphanumeric characters (high-entropy ones, at that) before the strength of MD5 is a security weakness. You need 26-character passwords for SHA1 to be weaker than your password.

By definition any hash function has collisions if the passwords you are storing have more bits than the hash does (more possible passwords exist than possible hash values). The problem is when it collides in fewer bits.

By definition any hash function has collisions if the passwords you are storing have more bits than the hash does (more possible passwords exist than possible hash values). The problem is when it collides in fewer bits.

Collisions aren't a problem per se; the problem is if they're predictable. A cryptographic hash that never produces collisions is like a random number generator that never produces the same output twice in a cycle, i.e., not very random, and if it's not random, it's predictable. This is different from other types of hash functions, like those used for data structures, where you ideally want to randomly shuffle the input space into the output space.

Collisions are a problem if they occur frequently, or early in a brute-force. If any relatively short password can collide with another password, collisions are a problem in your cryptographic hash. In a good hash function, no password of <n bits should collide with any other password of <n bits. The higher n is, the better the hash function is.

Let me explain gently. If a hash function produces an n-bit digest (output) for any given input, then inputs longer than n bits MUST produce digests that collide with those of inputs of n bits or less, even though the inputs are dissimilar.

Example: for each letter of this sentence, choose either a 0 or a 1. You are a 1-bit hashing function. How many collisions did you create after only 3 inputs?
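The same pigeonhole argument as a runnable toy (the parity rule below is an arbitrary stand-in for a 1-bit hash):

```python
# A "hash" with a 1-bit output cannot keep three inputs collision-free,
# because there are only two possible digests.
def one_bit_hash(s: str) -> int:
    return len(s) % 2   # arbitrary hypothetical 1-bit hash

inputs = ["first sentence", "another one", "and a third"]
digests = [one_bit_hash(s) for s in inputs]
has_collision = len(set(digests)) < len(digests)
```

With two possible outputs and three inputs, a collision is guaranteed no matter how the 1-bit hash is defined; real hashes just push the same inevitability out to astronomically large input counts.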

Yes, that seems to be the case here. If he used SHA-256 it would still break like that; but with 7-character passwords he'd be doing 4-5 bits more just for lower case letters, 5-6 for lower/upper and numbers, and almost 7 bits for upper/lower/numbers/symbols. At 4 bits that's 16 hours, just from adding one lower case letter. With complex passwords of 8 characters, 16384 hours, or about 2 years. The average case is half that, of course. Good luck spending a year to break 8 characters.

Are you kidding? Everyone who isn't a 'computer person' is still using their daughter's name or their favorite sports car brand as a one-word, all-lower-case password for all sites, and always will. The best security advancements don't come from new theoretical math; they come from making security easy and convenient for average people.

And how I do! Just two days ago I told a friend about xkcd and browsed to exactly that "Exploits of a Mom" strip later that day because of it. It always makes me laugh so hard; that's the kind of girl to marry!

Good thing then that the goal for most people is not the absolute most secure setup possible. It's about finding a good balance between usability and security. If you're *only* worried about security then unplug it, fill it with cement and bury it in your back yard. (And don't tell anyone that you did it.)

Indeed. Pretty much everybody that cares about password security is stuck using a password manager anyways. So you may as well use a 20 char password when allowed to. I mean that would only take what like a millennium to break at that rate?

Indeed. Pretty much everybody that cares about password security is stuck using a password manager anyways. So you may as well use a 20 char password when allowed to. I mean that would only take what like a millennium to break at that rate?

Well, the point is SHA-1 outputs a 160-bit hash, so superficially you can find "a" collision in about 2^160 / 2 = 2^159 guess-and-checks. But it's been broken, so you can find a collision in only about 2^50-something operations.

So guessing 20 characters at about 7 bits/char (unless you're going all UTF-8 on us) puts 140 bits in for a hash that could be "worth" 160 bits, but is now only worth 140 bits. What you don't know is that SHA-1 is only "worth" about 50 bits, which rounds up to about 7 ASCII characters.

Instead of someone breaking your 20 character password, all they have to do is find a password that hashes to the same value as your SHA1 hash. Because of weaknesses in the SHA1 algorithm, any password effectively contains only approximately 8 bytes (8 characters) of data. Put another way, until we move off SHA1 it is not particularly useful to have a password over 8 characters, because it's cheaper to crack the hash than the password anyway.

AFAIK, the NT password hash is up to 255 UTF-16 characters (two bytes per character) hashed using MD4, which is even weaker than SHA1 or MD5. Not that you necessarily need to crack the hash, because many Windows networking protocols let you pass it directly.

Maybe he wanted a proof of concept without having to spend lots of money doing it? So he can crack a bunch of 6 character passwords in an hour or so, extrapolating up, and estimating a 100 fold increase in the search space for each extra character, you might end up spending several hundred years cracking a 10 character password. Now, what's handy is that you're just renting the equipment, I don't know how many GPU setups that Amazon has available, but it doesn't seem unlikely that you could rent several hundred, possibly even several thousand, of them at a time, cutting the time to crack a significant password down to under a year, which still seems pretty secure, especially given the cost of renting that many platforms.

But what happens in 5-10 years, after the performance per price ratio has doubled a few more times? Now you're down to maybe a single month for a wealthy individual to be able to crack a significant, real-world password. Give it another few generations of hardware and you're not even talking about a wealthy individual any more. Good luck convincing the average Joe that he needs to start remembering 15+ character passwords, especially if you're going to enforce truly random ones that aren't susceptible to more direct attacks.

But what happens in 5-10 years, after the performance per price ratio has doubled a few more times? Now you're down to maybe a single month for a wealthy individual to be able to crack a significant, real-world password.

Since an exhaustive keyspace search increases the time by a factor of up to 256 for each extra character (and TFA shows about a 100x increase, probably because he didn't use every character), then even if performance doubled every year (unlikely), you'd still be at about 10 CPU-days to crack an 8-character password, and over 1000 CPU-days to crack a 10-character password. So, yeah, only if computers increase in speed by much more than their current rate, and Amazon (or some other company) has enough of them to rent out, does this become a practical threat against longer passwords.
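The exponential scaling being argued over can be sketched with back-of-envelope numbers (49 minutes for the 6-character space and roughly 100x growth per extra character are TFA's observed figures; nothing here is a measurement):

```python
# Extrapolate the article's rate: the 6-character space fell in 49 minutes,
# and each extra character multiplies the space by ~100 (256 for full bytes).
BASE_MINUTES = 49
GROWTH_PER_CHAR = 100

def minutes_for_length(n: int) -> float:
    return BASE_MINUTES * GROWTH_PER_CHAR ** (n - 6)

hours_for_8 = minutes_for_length(8) / 60
years_for_10 = minutes_for_length(10) / (60 * 24 * 365)
```

At that rate, 8 characters already costs on the order of 8,000 machine-hours and 10 characters thousands of machine-years, which is why the argument turns on how cheaply that work can be parallelized and rented.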

I don't see how password length makes any difference here. Most applications naïvely store hash_function(password) in the database. If you manage to find a 4-char string whose hash is the same as the one stored in the database, it doesn't matter if the original password has 300 characters. The best course of action for any application is to store hash_function(password + secret_salt) in the database.

The kind of hashes used in password databases are a lot longer than 4 characters (~24 bits) though. The total number of hashes produced by 4-character passwords is so much less than the total number of possible hashes that the chances of a hash collision like you describe are negligible.

Since the salt and the hash are typically stored in the same place, someone running this kind of attack most likely knows the salt too. If it's possible to try all 6-character passwords in less than an hour, that will still be true with the salt known; it just has to be redone for each salt.

That attack breaks down -- badly -- when random salts are used. You have to compute the hash for each salt+plaintext combination that you want to attack. If a random 32-bit salt is used, you need about 77,000 passwords before you would expect to find two that use the same salt (see http://en.wikipedia.org/wiki/Birthday_attack [wikipedia.org] for the statistics).
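The 77,000 figure follows from the standard 50% birthday bound, which a couple of lines can confirm:

```python
from math import log, sqrt

# The 50% birthday-collision threshold for N equally likely values is
# approximately sqrt(2 * N * ln 2); with a random 32-bit salt that lands
# near 77,000 passwords before two are expected to share a salt.
n_salts = 2 ** 32
threshold = sqrt(2 * n_salts * log(2))
```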

This just shows one more time that SHA1 is deprecated — You really don't want to use it anymore.

No, it doesn't show anything. Your "attack" would only have been marginally slower with SHA-2, because SHA-2 is a bit slower than SHA-1. You didn't exploit any weakness of SHA-1 in this brute-force attack.

No, it doesn't. For any other hashing algorithm of similar speed, the same results could be obtained. It's not a weakness of the algorithm, it's a weakness of only checking for passwords of 6 characters and less. That's not a very big space.

If your password is "password", no hash is going to save you from that. The cracker takes "password", feeds it to the hash, then compares the result to every line in the hashed password file, to check if it matches anybody's.

Hashing itself has to be fast, since not only passwords get hashed. Sometimes you need to hash a DVD.iso, would you want that to take a week?

Now, you can do things like making the encoding be hash(hash(hash...(password))) with such a depth that it takes a second for a single check. You can't make it much longer than that because then the users will get tired of waiting. But even then it won't save you if you're dumb enough to have "password" or your username for the password. If the attacker has 10000 accounts, it takes about 3 hours worst case (with salting) to check if any of them use "password". And with that many, chances are pretty good that at least one is. So it's still not a license to use a crappy password. That's if they're not determined enough to get a botnet to work on it.
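That hash(hash(hash...(password))) stretching looks roughly like this (a naive sketch for illustration only; real systems should use a vetted construction such as PBKDF2 or bcrypt rather than a hand-rolled loop):

```python
import hashlib

def stretched_hash(password: str, salt: str, rounds: int) -> str:
    # Feed the digest back into the hash `rounds` times, so each guess
    # costs an attacker `rounds` hash computations instead of one.
    digest = (salt + password).encode()
    for _ in range(rounds):
        digest = hashlib.sha1(digest).digest()
    return digest.hex()

stored = stretched_hash("password", "s1", 100_000)
```

Verification simply recomputes the same chain with the stored salt; the round count is tuned so one check takes a human-tolerable fraction of a second.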

If the cracker HAS the hashed password file then your security has already been breached. There's about three things on any system that need access to that file: the login system, the password change system and possibly the admin.

The general idea of hashing passwords is that even if an adversary gains access to the password file, it can still be secure. This accounts for things that you might not have foreseen, such as intercepted communications. They don't have to access it in storage, necessarily. Now you can always say that your users should be using encrypted connections to your server, but the point of the hash is to have a second line of defense, since a password is a very unique type of information where the data doesn't need to be recoverable in its original form.

I think "able to brute-force thousands of passwords in an hour" qualifies as a weakness in SHA-1.

Then you must also think that this weakness applies to all hashing algorithms, and thus is not a weakness in SHA-1 but in hashing algorithms in general.

"able to brute force thousands of passwords in an hour" means nothing. The ability to brute force something given less computational steps than intended means something. Throw enough CPU at any algorithm and you'll see the same brute force time-frame results from any hashing algorithm.

I think "able to brute-force thousands of passwords in an hour" qualifies as a weakness in SHA-1.

Maybe that would be seen as a weakness. But what this guy demonstrated was not "thousands of passwords in an hour", it was "14 passwords with a maximum length of 6 in 49 minutes". Can you scale up, and crack thousands in an hour? Sure. But at the rate he did it, one password every 3.5 minutes, to get 1000 passwords in 60 minutes you would need a nearly 60-fold increase in power. He paid $2.10 for his 49 minutes, which means you're going to need to pay about $126 per hour if you want the power to crack that many.

No, it doesn't show anything. Your "attack" would only have been marginally slower with SHA-2, because SHA-2 is a bit slower than SHA-1. You didn't exploit any weakness of SHA-1 in this brute-force attack.

He exploited the "is fast to calculate" weakness.

Clearly, we need hash functions which take long amounts of time to compute.

Clearly, we need hash functions which take long amounts of time to compute.

You're being facetious, but this is basically what the apr1 algorithm used in the Apache webserver does. It's a modified variant of MD5, where the hashing step is repeated 1000 times in order to slow down the creation of dictionary hashes:

/*
 * And now, just to make sure things don't run too fast..
 * On a 60 MHz Pentium this takes 34 msec, so you would
 * need 30 seconds to build a 1000 entry dictionary...
 */
for (i = 0; i < 1000; i++) {
    apr_md5_init(&ctx1);
    ...

I don't know whose bright idea that was... the comment about the speed of this routine on a 60 MHz CPU speaks for itself. But regardless of how effective such "improvements" are, we're now stuck with this algorithm if we want to support the password hashes used in conjunction with .htaccess files, for example.

If only there was some sort of Password-Based Key Derivation Function that could use salted, fast-to-compute hashes and apply them many times to increase the cost of computation so that dictionary attacks are proportionally more difficult.

You laugh, but PBKDF2 [wikipedia.org] is the most common way to derive an encryption key from a password and it calls for looping a hash function 1000+ times, for the specific reason of making brute-force attacks take longer to do.
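Python's standard library exposes exactly this construction; a minimal sketch (the salt bytes and iteration count here are illustrative choices):

```python
import hashlib

# PBKDF2-HMAC-SHA1: the iteration count is the knob that makes each
# brute-force guess proportionally more expensive.
salt = b"0123456789abcdef"   # in practice, use os.urandom(16)
key = hashlib.pbkdf2_hmac("sha1", b"correct horse", salt, 100_000)
```

The derived key is deterministic for a given password, salt, and iteration count, so verification just re-derives and compares; raising the count later re-tunes the cost as hardware improves.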

Agreed - the only thing brute force shows is the importance of good passwords.

Aren't most hashing algorithms linear in time based on the input? In that case, all such algorithms would only vary by a constant factor, not really a difference in terms of security worthiness anyway.

Correct me if I'm wrong, but isn't the main point of a hashing algorithm to make it unlikely that two different messages would have the same hash (in particular, to make it difficult to coerce this effect and have the second message be something meaningful)?

That's exactly what I thought. SHA-1 has been demonstrated to have weaknesses, not trivially exploitable ones right now, but weaknesses all the same. But what this person is doing doesn't exploit any of them. They don't get to blame the ease with which they cracked passwords on SHA-1.

Though, as I understand it, there are algorithms that involve multiple rounds of hashing with a bit of salt added each time. Those would be good because there is no clear way to compute them faster, and you can have a few hundred rounds without inconveniencing legitimate users.

This just shows one more time that SHA1 is deprecated — You really don't want to use it anymore

Or you could, you know, use a salt (like any competent password system). And require eight-character passwords (like any competent password system). That will stave off obsolescence for maybe another decade.

That increases the amount of time required to brute-force the passwords significantly. Also, a database of precomputed hashes is largely worthless, since each password in the list would have a completely unique hash. For the sake of brute-forcing the data, short passwords don't matter (on the other hand, brute-forcing login to the application is not affected). Having a different salt for each password makes the time spent on one password completely worthless once the cracker gets to the next item in the list.

To improve on that, we can, say, hash the result 1000 times in a row. Someone trying to brute-force the hash would then spend 1000x the CPU resources on each candidate, while it's mostly not a big deal to run the hash 1000 times when creating the database entry or authenticating the user.

Of course, SHA1 and MD5 are still broken when it comes to file integrity checking (when it comes to tampering), since there are documented collisions. For this case, cryptographic signatures are where it's at. You can guarantee not only that the file was not tampered with, but also that the person who supplied the signature was who they say they were. Gotta love public key encryption.

While I concurred with your point somewhere else in this discussion (regarding the usage of salt), I wonder if there is any possibility that an attacker, having a sufficiently large corpus of your stored hashes, would be able to extrapolate what salt your application is using.

All of your suggestions only increase the brute-force cracking time by a linear factor.

They are useless. Adding another character or two to the minimum password length, requiring more distinct characters, or requiring more character classes will all have a significantly greater effect on brute-force attacks, with much less effort and less CPU time for legitimate password entries.

Salts protect against rainbow tables, not necessarily short passwords. In most situations, the salt needs to be known by both parties and is sent in the transmission so that the salt is not a secret. Don't count on the salt being a secret. You still need to choose a good password. Using a salt just means an attacker won't be able to look up the hash in a rainbow table.
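A toy demonstration of why a precomputed table misses salted hashes (the table contents and the salt are made up for illustration):

```python
import hashlib

def h(s: str) -> str:
    return hashlib.sha1(s.encode()).hexdigest()

# Hypothetical precomputed (rainbow-style) table of unsalted hashes.
table = {h(p): p for p in ["password", "123456", "letmein"]}

# The salt is stored beside the hash and is not secret; it just wasn't
# known when the table was built, so the salted lookup misses.
hit_unsalted = table.get(h("password"))
hit_salted = table.get(h("x9Tq" + "password"))
```

The unsalted hash is an instant lookup, while the salted one forces the attacker back to per-guess hashing, even though "password" is still a terrible choice.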

Obviously this service will be used by pirates (and not the "arrgh matey" kind), hackers and terrorists and anyone else that gets labelled as a bad person (tm), so we better pre-emptively ban Amazon as they are the ones offering it up.

Oops, typo. The number '72' came from A-Z, a-z, 0-9, and the punctuation above 0-9. If you count the other punctuation on a standard keyboard the number goes up to 94, and depending on the app you might be able to use things like é and ñ which would really raise the character count.

Not to mention an exhaustive rainbow table search would've taken about 5 minutes on an average desktop, and as a bonus you'd likely get all passwords up to 8 chars (depending on your particular table).

As part of my graduate studies in Computer Science at Texas A&M University, I built out a LAM/MPI - CUDA cluster. With this configuration we had access to all the CPUs/GPUs on all the systems in the lab. Although it requires knowledge of both APIs, it can be extremely powerful. I'd love to see a cloud based system based upon this configuration. Now that would be worth paying by the hour to use!!!

896 CUDA Cores (2 x NVIDIA Tesla C2050 (Fermi) cGPU) is nice but imagine the power of a data center filled with these!!!

Sure, CPUs include an FPU these days, but in the early days, alongside the 8086/8088 you had the 8087 FPU, 286's had the 287, 386's the 387, and even 486SX's could have a 487 added (DX's had it built in). The Pentium-class CPUs were the first to have all models include an FPU. Since then, all CPUs have included one.

But now, for more intensive items, we have "physics" cards, GPU cards (which at first glance appear to be FPU's?) etc. So, is the FPU as an addon on its way back? Perhaps.

Placing the dollar sign after the value isn't necessarily a mistake; it's valid usage in some parts of the world. This might also indicate that the author's first language isn't English, which might excuse some of the other mistakes.

All those were caused by Slashdot? Wow! I am impressed at the power of Slashdot and its ability to travel backwards through time. Now, that is what we should be calling the Slashdot effect. I could have sworn problems with "loose" and "opps" existed before Slashdot (likely the others as well).

Are you sure you are not just blaming Slashdot for all the language woes like the sitting President is at fault for all the country's woes? Personally, I think the problem is with the written language and the inc

Cracking things like hashed passwords or encrypted data is computationally difficult. It's natural that, given this type of computational power (i.e. distributed in "the cloud" rather than in one system), it's one of the first things people try.

GPUs are very specialized processors, therefore they will always outperform the general purpose CPUs in their domain of computational problems (graphics, physics, other massively parallel/pipelineable problems like password cracking). However they would really suck at doing "normal" tasks like running the OS and other applications. GPUs having their own memory and other components directly wired to and optimized for them on graphics cards gives them additional advantages. Finally most gamers forget that the