Google as a password cracker

One of the steps used by the attacker who compromised Light Blue Touchpaper a few weeks ago was to create an account (which he promoted to administrator; more on that in a future post). I quickly disabled the account, but while doing forensics, I thought it would be interesting to find out the account password. WordPress stores raw MD5 hashes in the user database (despite my recommendation to use salting). As with any respectable hash function, it is believed to be computationally infeasible to discover the input of MD5 from an output. Instead, someone would have to try out all possible inputs until the correct output is discovered.

So, I wrote a trivial Python script which hashed all dictionary words, but that didn’t find the target (I also tried adding numbers to the end). Then I switched to a Russian dictionary (because the comments in the shell code the attacker installed were in Russian), but that didn’t work either. I could have found or written a better password cracker, which varies the case of letters and makes common substitutions (e.g. o → 0, a → 4), but that would have taken more time than I wanted to spend. I could also have improved efficiency with a rainbow table, but that needs a large database, which I didn’t have.
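For readers who want to try the same first step, here is a minimal sketch of such a dictionary attack. It is not the author’s actual script. It assumes hex-encoded MD5 digests and a newline-separated word list, and it tries each word on its own and then with the numbers 0 to 99 appended. The function name `crack_md5` and the suffix range are illustrative choices.

```python
import hashlib
from typing import Iterable, Optional


def crack_md5(target_hex: str, words: Iterable[str], max_suffix: int = 99) -> Optional[str]:
    """Return the first candidate whose MD5 hex digest equals target_hex, or None."""
    target_hex = target_hex.lower()
    for word in words:
        word = word.strip()  # word lists usually carry trailing newlines
        if not word:
            continue
        # Try the bare word first, then the word followed by a number.
        candidates = [word] + [f"{word}{n}" for n in range(max_suffix + 1)]
        for candidate in candidates:
            if hashlib.md5(candidate.encode("utf-8")).hexdigest() == target_hex:
                return candidate
    return None
```

To use it, pass an open word-list file (for example `/usr/share/dict/words`, if it exists on your system) as `words`. Every word costs about a hundred hashes, so a full dictionary run takes seconds, not hours.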

Instead, I asked Google. I found, for example, a genealogy page listing people with the surname “Anthony”, and an advert for a house, signing off “Please Call for showing. Thank you, Anthony”. And indeed, the MD5 hash of “Anthony” was the database entry for the attacker. I had discovered his password.
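Once a search returns a page that contains the hash, the pre-image is often a word on that same page. The sketch below shows that last step under some assumptions: the page text has already been downloaded, and the hash is hex-encoded MD5. It tries every word in the text as written, in lower case, and capitalized. The function name `find_preimage_in_text` is hypothetical.

```python
import hashlib
import re
from typing import Optional


def find_preimage_in_text(target_hex: str, page_text: str) -> Optional[str]:
    """Hash every word (plus simple case variants) in page_text; return a match if any."""
    target_hex = target_hex.lower()
    seen = set()
    for word in re.findall(r"\w+", page_text):
        for candidate in (word, word.lower(), word.capitalize()):
            if candidate in seen:  # skip words already tried
                continue
            seen.add(candidate)
            if hashlib.md5(candidate.encode("utf-8")).hexdigest() == target_hex:
                return candidate
    return None
```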

In both webpages, the target hash was in a URL. This makes a lot of sense; I’ve even written code which does the same. When I need to store a file indexed by a key, a simple option is to make the filename the MD5 hash of the key. This avoids the need to escape any potentially dangerous user input and is very resistant to accidental collisions. If there are too many entries to store in a single directory, creating a directory for each hash prefix gives an even distribution of files. MD5 is quite fast, and while it’s unlikely to be the best option in all cases, it is an easy solution which works pretty well.
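Here is a minimal sketch of that filing scheme. The two-character prefix and the name `path_for_key` are assumptions for illustration. The hex digest contains only the characters 0-9 and a-f, so the key never needs escaping. Because MD5 output is roughly uniform, the files spread evenly over the 256 prefix directories.

```python
import hashlib
from pathlib import Path


def path_for_key(root: Path, key: str, prefix_len: int = 2) -> Path:
    """Map an arbitrary (possibly hostile) key to root/<prefix>/<md5 hex>."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # The first prefix_len hex characters pick one of 16**prefix_len subdirectories.
    return root / digest[:prefix_len] / digest
```

A key such as `"../../etc/passwd"` simply becomes a 32-character hex filename, so it cannot escape `root`.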

Because of this technique, Google is acting as a hash pre-image finder, and more importantly it finds pre-images of things that people have hashed before. Google is doing what it does best: storing large databases and searching them. I doubt, however, that its designers envisaged this use.

135 thoughts on “Google as a password cracker”

Notably, some Danbooru clones (anime image repositories notorious for copyright apathy and a tendency to disappear without warning) store images by their hash. For example, a picture with the hash 75f630020aeb08d6dd65bdc5098783f3 would be renamed 75f630020aeb08d6dd65bdc5098783f3.jpg and stored in the folder 75f630. In addition to dividing up the images into roughly even groups, this prevents duplicate file uploads – a plus, considering how scarce moderator presence tends to be.
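The deduplication effect can be sketched as a content-addressed store. This assumes the hash is taken over the file’s bytes, which is what makes duplicate uploads land on the same name. It also follows the comment’s example layout: a folder named after the first six hex characters. The function name `store_upload` is hypothetical. Note that someone deliberately crafting MD5 collisions could defeat this, but accidental clashes are not a practical concern.

```python
import hashlib
from pathlib import Path
from typing import Tuple


def store_upload(root: Path, data: bytes, ext: str = "jpg") -> Tuple[Path, bool]:
    """Save data as root/<first 6 hex chars>/<md5>.<ext>; return (path, was_new)."""
    digest = hashlib.md5(data).hexdigest()
    path = root / digest[:6] / f"{digest}.{ext}"
    if path.exists():
        # Identical bytes were uploaded before, so this upload is a duplicate.
        return path, False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path, True
```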

Salting is really easy. Don’t just hash the password; concatenate a secret phrase and then hash. When a user enters a password, perform the same algorithm and compare the results. Each site should have a unique, secret phrase. If a site is compromised, the unique phrase prevents attacks with pre-computed rainbow tables.

Shouldn’t hashing be stopped? Rainbow tables have rendered this type of secrecy moot. Any size limitation of rainbow tables (120 GB) is meaningless given broadband and reasonable drive space.

No! You don’t want one salt for your whole site, you want separate salts for every input. Otherwise it would be very simple to find one user’s password, and know every other user with the same password.
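The problem with a single site-wide salt can be shown directly. With one shared phrase, equal passwords still produce equal stored hashes, so anyone holding the table can group accounts by password without cracking anything. In the sketch below, `SITE_SECRET` and both function names are hypothetical. MD5 is used only to match the discussion here, not as a recommendation. With a fresh random salt per user, the same grouping would find nothing.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List

SITE_SECRET = b"example-site-secret"  # hypothetical site-wide phrase


def site_hash(password: str) -> str:
    """Hash with one phrase shared by every account on the site."""
    return hashlib.md5(SITE_SECRET + password.encode("utf-8")).hexdigest()


def users_sharing_passwords(table: Dict[str, str]) -> List[List[str]]:
    """Group usernames whose stored hashes are identical (i.e. same password)."""
    groups = defaultdict(list)
    for user, digest in table.items():
        groups[digest].append(user)
    return [sorted(users) for users in groups.values() if len(users) > 1]
```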

In other words, take the md5 of the password, then concatenate that with a secret string. Then take the md5 of that concatenation.

If you have unique user ids (like an auto-increment id from a database table) adding that on top of the site-specific string would make it even better, because then you wouldn’t get the same password hashing to the same value for two different users.

Adding a reasonably sized salt into the calculation will prevent this type of attack, and make rainbow tables ineffective too. Thomas Ptacek wrote an article about password calculation, including salting, on SecurityFocus.

As David points out, the salt should be user-specific rather than site-specific. This forces the attacker to break each password individually, rather than doing them all at once.

Designing a secure password hashing scheme is difficult, and there are additional considerations, other than salting. It’s best to use an existing design and preferably implementation, such as bcrypt or Poul-Henning Kamp’s MD5 based one.

A while back I wrote a tiny 5-line script that would run from a to ZZZZZ, writing out permutation -> hash until 101k of data had been reached (since it was rumored that Google only indexes the first 101k of a document). The document would end with a link to the same CGI with a different seed, the obvious plan being that once Google got the first page, it would keep following links until it had run through the entire space (https://secure.sensepost.com/sp-hash/a).

I only planned a limited test, but strangely enough I eventually found that some pages didn’t show up in results consistently. I suspect getting some sort of cross-linking would help convince Google that it is a valuable page.
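A rough reconstruction of the page generator described above is sketched below. It is not the original script. The letter ordering (a-z, then A-Z), the `seed` query parameter, the HTML layout, and the function names are all assumptions. Each page lists “word md5” lines until the rumored 101k limit is near, then links to the next seed.

```python
import hashlib
import string
from typing import Tuple

ALPHABET = string.ascii_letters  # "a".."z" then "A".."Z"; assumed ordering
PAGE_LIMIT = 101 * 1024          # the rumored 101k indexing limit


def index_to_word(index: int, alphabet: str = ALPHABET) -> str:
    """Bijective numbering: 0 -> 'a', 51 -> 'Z', 52 -> 'aa', 53 -> 'ab', ..."""
    base = len(alphabet)
    chars = []
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, base)
        chars.append(alphabet[rem])
    return "".join(reversed(chars))


def make_page(start: int, base_url: str, limit: int = PAGE_LIMIT) -> Tuple[str, int]:
    """Emit 'word md5' lines from `start` until the page is nearly `limit` bytes,
    then a link to the next page. Returns (html, next_start)."""
    lines = []
    size = 0
    index = start
    footer_reserve = 200  # bytes kept free for the closing link
    while True:
        word = index_to_word(index)
        line = f"{word} {hashlib.md5(word.encode()).hexdigest()}<br>\n"
        if size + len(line) + footer_reserve > limit:
            break
        lines.append(line)
        size += len(line)
        index += 1
    lines.append(f'<a href="{base_url}?seed={index}">next</a>\n')
    return "".join(lines), index
```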

Unlike the real stuff, a hashed pepper is only useful together with a per-user salt. Its use is that you store it somewhere other than the per-user salt, in case your database is compromised but not your application, or that sort of thing.

Ummm, no. That defeats the purpose of salt, which is that, if I set my password to “flowers”, I can’t just pop out to /etc/passwd and see whose hash matches mine. (Yes, shadow files also come into play; work with me here, people.)

Salt is randomly generated, then stored with the salted, hashed password. $Hash = Hash( $salt + $pwd), store $salt, $Hash in the database. To authenticate, you grab salt from the table, hash it with the user-provided password, and see if the resultant Hash’ matches the stored value of Hash.
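The scheme just described (random salt, store `salt` and `Hash(salt + pwd)`, recompute on login) might look like this in Python. SHA-256 stands in for the generic `Hash`, and the function names are hypothetical. The comparison uses `hmac.compare_digest` so its timing does not leak how many characters matched. A single fast hash like this still permits rapid brute force, a point later comments take up.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def hash_password(password: str) -> Tuple[str, str]:
    """Return (salt, hash) to store, with hash = SHA-256(salt + password)."""
    salt = secrets.token_hex(16)  # 16 random bytes, hex-encoded
    digest = hashlib.sha256((salt + password).encode("utf-8")).hexdigest()
    return salt, digest


def check_password(password: str, salt: str, stored_hash: str) -> bool:
    """Recompute SHA-256(salt + candidate) and compare in constant time."""
    candidate = hashlib.sha256((salt + password).encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```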

Such things are easy for Google to protect against. Well-designed crawlers don’t follow links forever. For any given page, there is a certain probability that a crawler will follow a link on the page, or else jump to a random URL from its database of known URLs. This means that the greater the depth, the less likely a crawler is ever to reach it.
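Under the simple model described above, a crawler that follows a link with probability p and otherwise jumps away reaches depth d of a chain of pages with probability p^d. The sketch computes that probability and the depth at which it drops below a threshold. The value p = 0.85 used in the test is an assumption, borrowed from the damping factor in the original PageRank description. Real crawlers are more complicated.

```python
import math


def reach_probability(p_follow: float, depth: int) -> float:
    """Chance of following `depth` consecutive links without jumping away."""
    return p_follow ** depth


def depth_for(p_follow: float, threshold: float) -> int:
    """Smallest depth whose reach probability is below `threshold`."""
    return math.floor(math.log(threshold) / math.log(p_follow)) + 1
```

With p = 0.85, fewer than one walk in a hundred gets past about 29 pages, which is a tiny fraction of a chain that tries to enumerate every string up to five letters.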

Of course, this raises the question: “Why the hell are all of you still using MD5 hashes anyway?” MD5 is broken. Period. Don’t use it, even salted.

Using salted SHA-1 would be a major improvement and is supported natively by PHP since 4.3 and MySQL since 4.0.2. Even SHA-1 is broken-ish (collisions can be found with far less work than the generic 2^80 birthday bound), but until there are native implementations of SHA-256 et al., it’s good enough for the time being.

“Here’s a pair of valid X.509 certificates that have identical signatures. The hash function used is MD5. … And here’s a paper demonstrating a technique for finding MD5 collisions quickly: eight hours on 1.6 GHz computer.”

I have the most secure password storage around. I have a trained monkey that waits for the password to pop up on the screen, then compares it to a list of passwords written on paper. If they match, he hits the button; if they don’t, he throws poo at the screen. I was going to replace him with an MCSE because they are cheaper, but alas, the monkey is part of a union…

It was /usr/dict/words on Linux. This doesn’t contain proper nouns and so isn’t a very good password testing list, but it is handy. Why spend the effort getting something better when Google works so well?

The nonce (salt) your function generate_salt() generates is too short (and limited in range, too, without reason) and of questionable quality (PHP’s rand() function is rarely implemented as a cryptographically secure PRNG).

Using a cryptographically secure hash function to encrypt passwords isn’t state of the art anymore (vid. http://www.securityfocus.com/blogs/262 for a short summary of some more recent methods). But these hash functions are “good enough” for almost all of the simple cases where a failure doesn’t cost much and/or isn’t the weakest link. It doesn’t make much sense to encrypt your website’s stored passwords very carefully if the whole messages are transported as plaintext because you were too cheap to buy a certificate and install an SSL server, or to use AUTH-DIGEST.

I don’t know what the distrust about salting is all about, so I will explain why it works. ‘Salting’ means that you generate a random sequence of, say, 10 characters. This is concatenated with the password. Then both the salt and password are hashed. It is stored as a tuple:

(salt, HASH(password | salt))

This has the advantage of defeating all types of tables if the salt is long enough (longer salts make it less likely that a table could contain it). The order does matter, so that even if Eve knew your salt and password, she would have to compute the hash digest from the beginning each time. For example, if I hash X, it is very easy to compute the hash of Xy. Hashing zX requires me to start from the beginning. Think about it.
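The “X versus Xy” point can be seen with Python’s `hashlib`. After hashing a prefix, you can `copy()` the internal state and extend it with different suffixes without reprocessing the prefix. A string that puts the varying part first, like zX, gets no such head start. One caveat: MD5 processes input in 64-byte blocks, so the saving only covers whole blocks of the prefix and matters mostly when the prefix is long. The function name is hypothetical.

```python
import hashlib
from typing import Dict, List


def hash_with_shared_prefix(prefix: bytes, suffixes: List[bytes]) -> Dict[bytes, str]:
    """Hash `prefix` once, then extend a copy of that state for each suffix."""
    base = hashlib.md5(prefix)  # the prefix is processed a single time
    results = {}
    for s in suffixes:
        h = base.copy()         # clone the state reached after the prefix
        h.update(s)
        results[s] = h.hexdigest()
    return results
```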

Fun fact: if a password is salted and takes N seconds to find a collision, it would take on the order of N/log(N) seconds to find a collision of an unsalted password (if I did my math right).

Just a suggestion, but maybe you should think twice before you publish an article (which the attacker responsible may even read) that describes vulnerabilities of your database and even how it could be decoded, especially if you haven’t fixed the security hole they used to get in.

Also, anyone trying to test their own hash for vulnerability should search Wikipedia for “rainbow table”.

I did consider that, but believe that in this case the benefits of full disclosure far outweigh the costs. Attackers know about rainbow tables and other more powerful techniques than the one I presented here (which is more a neat hack than a practical idea). In the case of WordPress, you don’t even need to reverse the hash to break into a blog, and attackers know this too.

Here’s a useful guide for anybody who wants to use salting in PHP. A nice article that has really helped me. Concerning the story though, who’d have thought Google would have come to the rescue. After all, you can find all the MP3s you ever need via Google. Little blackhat site that it is!

You could’ve used MDCrack (mdcrack.openwall.net) or any of the various MD5 hash cracking sites, or RainbowCrack using rainbow tables (as you mentioned). You can download/buy gigabytes of rainbow tables somewhere.

I believe that support for unsalted MD5 (not raw MD5, but most likely hex, in this case) can be compiled into John the Ripper. There may be a patch for it, if not. I can’t really recall. MDCrack is pretty good though. There are others, but MDCrack is the fastest I’ve used.

As far as Google’s infrastructure goes: I don’t know how well suited it would be to MD5 cracking, especially in parallel. You’d also have to take into account the amount of normal traffic they’ve got passing through at any given moment. But just linearly extrapolating my (rather old) processor’s rate of about 1.0 x 10^7 h/s to, for example, BlueGene/L’s 131,072 processors (this is probably more now), you get something like 62^9/(1.0*10^7*131072)/3600, or roughly three hours. Google probably has that kind of raw processing power on hand. If I recall, they have something in the neighborhood of 400,000 servers.
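The back-of-the-envelope figure can be reproduced. The calculation assumes a 62-character alphabet (upper- and lower-case letters plus digits), passwords of exactly nine characters, and perfectly linear scaling across processors, as the comment does. The function name is illustrative.

```python
def brute_force_hours(alphabet_size: int, length: int,
                      hashes_per_sec: float, processors: int) -> float:
    """Worst-case hours to try every string of exactly `length` characters."""
    keyspace = alphabet_size ** length
    return keyspace / (hashes_per_sec * processors) / 3600
```

`brute_force_hours(62, 9, 1.0e7, 131072)` comes out at about 2.87 hours, which matches the “roughly three hours” estimate.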

There is nothing insecure about salting. MD5 might be “broken” in a sense, but it is not deprecated, and it is just fine for use as a password storage hashing function. The number of people here espousing technical authority while having no idea what they’re talking about is astonishing. I would recommend you take what has been said up to this point mostly with a pinch of salt (pun intended). For instance, Christoph Zurnieden is mostly right, except he has no idea what the difference between encryption and hashing is. This is not encryption.

The amount of PHP code and pseudocode flying around here on how to securely create salted passwords is high, and most of it is wrong. Don’t rely on ridiculous, inefficient, and largely insecure methods of generating “salt” by concatenating a number of different md5()s together. Please check your system’s libraries and use crypt() where you can. And use a proper salt generation library.

The PHP script that one guy posted is very flawed. I don’t see any input checking or SQL validation or anything. The salt is very short and weakly generated with an insecure rand(). I would recommend bcrypt and Solar Designer’s phpass framework, which is available at http://www.openwall.com.

He is a professional programmer, and well versed in security; he is also the creator of John the Ripper.

This is the reason why hash-based password encryption should always use a salt… AND your password hashes should be considered very sensitive information anyways (a cracker using rainbow tables can crack your MD5’s pretty fast anyways).

yeah.. google can block it, but again this can be beaten by distribution.. at any rate, it was an experiment, and with the cost of disks today, and the good work done by the very bright ppl at objectif securite, not necessary..

Yeah. Everyone must be wrong and you must be right, because you’re super smart and you’ve got it all figured out. Why don’t you just do some research before posting? Any time you’re so confident you’re right, do a double-take, and go find out how you’re wrong.

The value of salt has nothing to do with it being hidden. The value of salt is in its ability to increase password complexity, defeat rainbow tables and pre-computed hashes, and make it difficult to know what password others are using just by managing to reverse a single hash.

And to anon: it is nowhere near infeasible to brute force > 5 chars. I can do this in 12 minutes on my old computer (ca. 2003 hardware) using MDCrack on unsalted MD5 for the entire printable ASCII keyspace.
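The size of that keyspace is easy to check. The sketch assumes the 95 printable ASCII characters and borrows the rate of about 1.0 x 10^7 hashes per second quoted earlier in this thread. Both are assumptions, not measurements. It counts every length from one character up to the maximum.

```python
def exhaustive_minutes(max_len: int, alphabet_size: int = 95, rate: float = 1.0e7) -> float:
    """Minutes to try every string of length 1..max_len at `rate` hashes per second."""
    total = sum(alphabet_size ** n for n in range(1, max_len + 1))
    return total / rate / 60
```

Under these assumptions, every password of up to five characters takes about 13 minutes, which is in line with the 12-minute figure. Adding a sixth character multiplies the work by roughly 95, to about 20 hours.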

If you want to have a truly secure password, there’s no FIXED LIMIT on the minimum number of characters you should use. It is always changing. I would suggest pass phrases as a somewhat acceptable alternative, but passwords are mostly obsolete. That’s not to say they aren’t used and won’t continue to be used, but they’re inherently insecure.

If you really need to use passwords or passphrases and want to remain as impervious to brute force, dictionary, and pre-computed attacks as possible, then use something with an adjustable cost, like bcrypt, as I suggested before. For PHP applications, again, use Solar Designer’s PHPass framework: http://www.openwall.com/phpass/
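bcrypt and phpass are not part of Python’s standard library, so this sketch swaps in PBKDF2 (`hashlib.pbkdf2_hmac`), which also has an adjustable cost: its iteration count. The record stores the iteration count next to the salt, so the cost can be raised later for new passwords. The default of 200,000 iterations and the function names are illustrative assumptions, not recommendations from the thread.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

Record = Tuple[bytes, int, bytes]  # (salt, iterations, derived key)


def derive(password: str, salt: bytes, iterations: int) -> bytes:
    # PBKDF2-HMAC-SHA256: every extra iteration makes each guess more expensive.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def new_record(password: str, iterations: int = 200_000) -> Record:
    salt = secrets.token_bytes(16)  # fresh random salt per password
    return salt, iterations, derive(password, salt, iterations)


def verify(password: str, record: Record) -> bool:
    salt, iterations, expected = record
    return hmac.compare_digest(derive(password, salt, iterations), expected)
```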

Daryl:
> except he has no idea what the difference between encryption and hashing is. This is not encryption.

I don’t speak English natively, so my informal posts most probably contain some errors and even more ambiguities, but that is not the case here. Using a hash algorithm for encryption is still encryption; it doesn’t become hashing just because you say so. You can use a lot of algorithms to encrypt something, be it Rijndael, Blowfish, base64, rot13, or /bin/rm, alone or in combination, for good or bad; it doesn’t matter. The meaning of the word “encryption” is far too broad to be useful for anything formal. Please define it exactly (which requires a formal description; you may use LaTeX2e+amsmath.sty) before you attack me for using a word inappropriately.

It’s so funny: I looked for the word “puta” (“bitch” in Spanish), with MD5 1ac461a2e12a77ad54c67128b5060f28, and the only match was a paper on how to use MD5 passwords to protect access.
Actually the only result I can find is your own post. 8D

I don’t know what you’re talking about, Christoph. You don’t use hashing algorithms for encryption, because the concepts of encryption and hashing are completely different. Hash algorithms are not supposed to be reversed. Encryption is. That’s it.

I also made a slight error in my last comment that I’d like to correct. I said “The value of salt is in its ability to increase password complexity.”
I meant to say that it makes it impossible to try candidate passwords against multiple hashes at once (reading from http://www.openwall.com/lists/john-users/2005/12/18/1).

Daryl:
>You don’t use hashing algorithms for encryption, because the concepts of encryption and hashing are completely different. Hash algorithms are not supposed to be reversed. Encryption is.

That is not a sufficiently distinctive feature. I can (and once did) use one of the algorithms you called “supposed to be reversed” to build a nonreversible encryption. You can build the key pair of an asymmetric encryption algorithm (for example one of the Diffie-Hellman variants) and delete the private key and all the data necessary to rebuild the pair. Now you can encrypt anything irreversibly but repeatably with the remaining public key. If you add a signature to that algorithm, you can build a system that doesn’t involve any transport of a password for authentication (at least in theory).

You can do the same with hashing algorithms: build a reversible encryption. The simplest algorithm is probably to build a PRNG from a hash function seeded with a key and XOR that stream with the message. (Don’t try it at home! It has a lot of weaknesses if not done right!) And it is in theory possible to reverse a hash of an input whose length is less than or equal to the length of the hash, because up to that amount no information is lost. (This holds only if the hashing algorithm is deterministic, of course. Some of the cryptographic hashing algorithms take that into account by implementing a special information dump that loses a bit here and a bit there. OK, that simplifies the techniques used quite a lot, but I don’t think a blog post is the right place for the level of math involved. I fear that would stretch the patience of our dear host a bit too much.)
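Here is a toy version of the hash-based PRNG-and-XOR construction mentioned in the comment above, heeding its warning not to use it for real data. It assumes SHA-256 in counter mode, with the key and a nonce as input. It has no authentication, and reusing a nonce with the same key leaks the XOR of the two plaintexts. The function names are hypothetical. Because XOR is its own inverse, the same function both encrypts and decrypts.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Concatenate SHA-256(key || nonce || counter) blocks: a toy hash-based PRNG."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; calling it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))
```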

But (long spiel, short meaning) you can’t distinguish encryption algorithms from hashing algorithms in cryptography, because they have far too much in common, especially in the underlying math. You have to describe the usage every time you mention one or both. In this case of password encryption we’re looking for a “nonreversible but repeatable encryption algorithm”. That can be done with a cryptographically secure hashing algorithm (which can be described as the aforementioned “nonreversible but repeatable encryption algorithm”) or, as shown above, with an asymmetric encryption algorithm (which seems(!) to be even better).

So it all comes down to the implementation whether an encryption function is reversible and a hashing function is not, or vice versa. That’s it.

Don’t know why I’m even bothering to post, considering there are so many (wrong) technical posts here, but anyway: John the Ripper, a good custom dictionary, good custom rules and a decent GPU (yes, I said GPU; even throw in a CPU or two) and you have yourself a mighty fine password cracker, almost regardless of the algorithm used. John has the ability to generate password lists based on rules and custom dictionaries.

You could also consider Medusa for parallel password cracking… but then again, maybe I’m just giving you guys really bad ideas (not technically bad, just plain BAD).

Using a plain word as a password is just plain stupid if you have confidential info; searching for or cracking an MD5 hash whose input contains !@#$%^&*(), 0-9, a-z and A-Z becomes a lot harder.
I’ll give ya credit for using Google to look up the MD5 hash, but the hacker ain’t that smart…
Now let’s all start using Google to search for hashes, yay…

As john y says in the first reply, there are several reverse MD5 databases, and even an aggregator of them, http://md5.noisette.ch, which looks up MD5 hashes in several of those databases.

You can’t seriously believe that there is no fundamental difference between encryption and hashing?

Encryption produces a 1-to-1 mapping between plaintext and ciphertext, for a given cipher and key (or key pair where appropriate). That is, there is only ONE ciphertext for a given plaintext, and vice versa, for a given cipher and key.

Hashing produces a MANY-to-1 mapping between input and output. There are MANY possible inputs that produce the same output, for the same hash function. This is fundamentally different to encryption IMHO.
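The many-to-1 property follows from the pigeonhole principle. MD5 maps inputs of any length to 128 bits, so some outputs must have many inputs. Finding a full 128-bit collision by brute force is impractical, so this sketch truncates the digest to its first 20 bits (an arbitrary choice for the demo). At that size, two colliding inputs turn up after about a thousand tries, as the birthday bound predicts. The function name is hypothetical.

```python
import hashlib
from itertools import count
from typing import Tuple


def find_truncated_collision(bits: int = 20) -> Tuple[bytes, bytes]:
    """Find two different inputs whose MD5 digests agree in the first `bits` bits."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        prefix = int.from_bytes(hashlib.md5(msg).digest(), "big") >> (128 - bits)
        if prefix in seen:
            return seen[prefix], msg  # two inputs, one (truncated) output
        seen[prefix] = msg
```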

> You can’t seriously believe that there is no fundamental difference between encryption and hashing?

This is not a theological seminar; there’s absolutely nothing based on faith here (ens metaphysicum non est necesse).

> Encryption produces a 1-to-1 mapping between plaintext and ciphertext, for a given cipher and key (or key pair where appropriate). That is, there is only ONE ciphertext for a given plaintext, and vice versa, for a given cipher and key.

That is only an assumption, and it is indeed possible to find algorithms that have many ciphertexts for a given plaintext; the salted hash discussed in many places above is such an algorithm. Given a specific plaintext (the password), it can output many different ciphertexts. If the word “hash” disturbs you, you can easily exchange it for something more encryption-like, e.g. Rijndael, the algorithm used in AES:
Encrypt the salt with AES, using the password as the key.

> Hashing produces a MANY-to-1 mapping between input and output. There are MANY possible inputs that produce the same output, for the same hash function.

You can also find algorithms that hash every input into exact one output — $f(x)=x$, the mathematical equivalent of a short piece of wire is such an algorithm.

> This is fundamentally different to encryption IMHO.

Sorry, but there is no fundamental difference here; the distinction is arbitrary and has to be stated in the prolegomena of your paper before use. Everyone does it; see e.g. http://citeseer.ist.psu.edu et al. for a lot of examples.

But you are partly right: the colloquial meaning of “hash” or “hash function” is indeed that of a mapping from the Kleene closure $K^\ast$ of the alphabet $K = \{0,1\}$ to a proper subset of it, mostly of fixed size (but not necessarily fixed; see for example “RadioGatún”):
$h\colon K^\ast \to \{0,1\}^n$ with $K = \{0,1\}$ and $|\{0,1\}^n| < |K^\ast|$
The colloquial meaning of “encryption” is that of a mapping from the finite set $S_1$ to the finite set $S_2$ with a function $f$ such that
$f\colon S_1 \to S_2$
and that an inverse function $f'$ exists such that
$f'\colon S_2 \to S_1$
In detail (let $m$ be the message, $k_1$ and $k_2$ the keys, and $c = f(m, k_1)$ the ciphertext; all are elements of the finite universe $\{0,1\}^n$):
$f'(c, k_2) = f'(f(m, k_1), k_2) = m$
where $f = f'$ and $k_1 = k_2$ for symmetric encryption and $k_1 \neq k_2$ for asymmetric encryption.
But that difference is not imperative, it’s just colloquial and also overly simplified. If you want a special distinction: describe it formally or give at least references.

In terms of collisions: If MD5 is a random hash function (all outputs are equally likely) then adding a salt would not change the time required to get a collision. It’s just that a collision is far less likely to have useful information.

E.g. you may discover that S1:P1 has the same hash as S2:P2. I don’t see how this is a win.

@102
We weren’t debating which terms we like best. We were debating what are the actual meanings of those terms.

If a person referred to HTML as a “procedural programming language”, would you tell him that he had that wrong? Or would you say, “Oh, that’s the term he prefers, it’s just a personal thing, I shouldn’t argue the point with him”?

Salt is useful since it forces the off-line password search to be repeated anew for each password you want to break, rather than the attacker being able to use a pre-computed database to attack many passwords in parallel. And yes, it should be unique per account (or better, per password–new password means new salt).

There’s another technique: make the hash algorithm SLOW. For a simple example, iterate the hash 10,000 times. The performance impact on your system is negligible, but the performance impact on the attacker is huge.
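Here is a simple illustration of the “iterate the hash 10,000 times” idea. It is a sketch, assuming SHA-256 and a per-user salt, and it feeds the salt and password back in each round, which is a design choice rather than anything the comment specifies. In practice a standard construction such as PBKDF2, bcrypt or scrypt is preferable to a home-made loop. The function name is hypothetical.

```python
import hashlib


def stretched_hash(password: str, salt: bytes, rounds: int = 10_000) -> str:
    """Apply SHA-256 `rounds` times in total, mixing in salt and password each round."""
    pw = password.encode("utf-8")
    digest = hashlib.sha256(salt + pw).digest()  # round 1
    for _ in range(rounds - 1):                  # rounds 2..rounds
        digest = hashlib.sha256(digest + salt + pw).digest()
    return digest.hex()
```

A legitimate login pays for 10,000 hashes once. An attacker pays that same factor for every single guess.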

And so does this site.
For example, when you go to the login page, you can ‘test’ whether there is a username ‘admin’ (or any other name). If so, brute-forcing could be tried.
Also, WordPress folders such as
/wp-content/plugins/ or /wp-admin/ are often readable.
This shows which plugins you use, and directly accessing the PHP files often reveals the full path of your files.

You got pretty lucky that the password was a common/proper dictionary word. I just did a few tests by adding numbers onto some simple random words, and google didn’t have anything indexed.

Another vote for John The Ripper though. The rules engine is very good and will find things like that quickly.

An interesting project would be to develop a tool that would test your clear text password against all the common variation algorithms and help you to pick a password with a high probability of needing a pure brute force crack to discover. Much better than the simple “one letter, one number, one symbol” requirement that a lot of services are doing these days.
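Here is a rough sketch of such a checker. It undoes a few common tricks (case changes, a small table of character substitutions, trailing digits or symbols) and checks whether what remains is a dictionary word. The substitution table, the stripping rule, and the function name are illustrative assumptions. Real cracking tools such as John the Ripper apply much larger rule sets, so passing this check does not mean a password is strong.

```python
import re
from typing import Iterable, Optional

# Common substitutions, reversed (digit/symbol -> letter); illustrative, not exhaustive.
UNLEET = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a",
                        "5": "s", "7": "t", "@": "a", "$": "s"})


def weak_base_word(password: str, dictionary: Iterable[str]) -> Optional[str]:
    """Return the dictionary word a password reduces to after undoing common tricks."""
    words = {w.strip().lower() for w in dictionary if w.strip()}
    lowered = password.lower()
    # Strip trailing digits/symbols first (e.g. "anth0ny123" -> "anth0ny").
    stripped = re.sub(r"[\d\W_]+$", "", lowered)
    for candidate in (lowered, stripped, lowered.translate(UNLEET), stripped.translate(UNLEET)):
        if candidate in words:
            return candidate
    return None
```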

I own HashHack.Com, the online MD5 cracker, and it handles hashes with substitutions quite well. By this evening my database will have 20 million MD5 hashes, so it’s worth a look for all those people with hashes to crack.