Why passwords have never been weaker—and crackers have never been stronger

Thanks to real-world data, the keys to your digital kingdom are under assault.

Attack of the dictionaries

This sort of password cracking entered the public consciousness thanks largely to the 1980s hacking thriller The Cuckoo's Egg, in which author Cliff Stoll chronicles his real-life pursuit of a hacker who breaks into US computer systems and steals sensitive military and security documents on behalf of the Soviet KGB.

The long history of passwords

The first recorded use of secret words to authenticate a human being dates at least as far back as ancient Rome, according to Joseph Bonneau, a University of Cambridge student who recently completed a PhD thesis on passwords and personal identification numbers, titled "Guessing human-chosen secrets." The Roman military developed a careful procedure for circulating daily watchwords known as signa to prevent infiltration by enemy soldiers.

Secret authentication words also appear in the tale of "Ali Baba and the Forty Thieves," included in some versions of the One Thousand and One Nights collection of folk tales, when the protagonist uses the famous phrase "open sesame" to unseal a magical cave.

Bernardo in Shakespeare's Hamlet may also be invoking a passcode when, at the opening of the play, he identifies himself to castle guards with the words "Long Live the King!"

The first use of passwords for a computer system is believed to have taken place in the 1960s with the Compatible Time-Sharing System at the Massachusetts Institute of Technology, according to Bonneau (with additional color from Wired reporter Robert McMillan). A password for each user account was stored in an unencrypted master file and was used to ration scarce computing time. According to both accounts, a doctoral student at the college admitted to what's likely to be the first-ever password compromise so he could increase the time available for his own projects.

The system saw what may be the first-ever password database leak in 1965 when a bug sent the file to a public printer, requiring administrators to manually reset every password.

Stoll's book is packed with people in high places who undermine national security with poor password hygiene: an account on the network of defense contractor SRI Inc. with a user name and password of "SAC", for example, or a super-user account for Lawrence Berkeley Labs whose password hadn't been changed in years.

"When money was stored in vaults, safe-crackers attacked the combination locks," writes Stoll, who as a displaced astronomer becomes the book's unlikely hacker-hunting protagonist. "Now that securities are just bits in a computer's memory, thieves go after the passwords."

Stoll's account was one of the first to show how a hacker armed with little more than a dictionary and a Unix computer could crack any password drawn from the English language, even when the passcode was stored only as a hash on a hacked machine. At one point, Stoll compares the crypto function (which was then based on the now-antiquated Data Encryption Standard, or DES) to a one-way meat grinder that converts each human-readable word into unique ciphertext.

"Did this hacker have a magic decryption formula?" Stoll asks. "If you turn the crank of a sausage machine backwards, pigs won't come out the other end."

Only later would Stoll learn that the hacker was feeding each word of the dictionary—starting with aardvark and ending with zymurgy—into the same DES hash function the hacked Unix systems used. The intruder then compared the output to the ciphertext contained in the intercepted password files.
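The attack Stoll describes can be sketched in a few lines of Python. This is a minimal illustration, not the intruder's actual tooling: SHA-256 from the standard library stands in for the DES-based Unix hash, and the word list and "stolen" hashes are hypothetical.

```python
import hashlib

def hash_password(word: str) -> str:
    """Stand-in one-way function (SHA-256), assumed here in place of the DES-based hash."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()

def dictionary_attack(stolen_hashes, wordlist):
    """Hash every candidate word and record which stolen hashes it matches."""
    targets = set(stolen_hashes)
    cracked = {}
    for word in wordlist:
        digest = hash_password(word)
        if digest in targets:
            cracked[digest] = word
    return cracked

# Hypothetical data: a tiny dictionary and a copied password file.
wordlist = ["aardvark", "password", "sesame", "zymurgy"]
stolen_hashes = [hash_password("sesame"), hash_password("Tr0ub4dor&3")]

cracked = dictionary_attack(stolen_hashes, wordlist)
for digest, word in cracked.items():
    print(f"{digest[:16]}... -> {word}")
```

The hash is never reversed. The attacker simply runs guesses forward through the same function and looks for matching output, which is why a password that appears in the dictionary falls and one that doesn't survives.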

"This was serious stuff," Stoll wrote. "It meant that every time I'd seen him copy a password file, he could now figure out legitimate users' passwords. Bad news."

Stoll didn't know it at the time, but even as the intruder was using a dictionary to guess his users' passwords, cryptographers were fashioning a new type of attack that would ultimately be able to crack orders of magnitude more hashes in a fraction of the time.

The rainbow connection

The germ of this new approach originated with Martin E. Hellman. In 1980, Hellman published a paper titled "A Cryptanalytic Time-Memory Trade-off" that proposed what came to be called Hellman tables. These tables were compiled ahead of a password attack and worked by using precalculated data stored on disk. Hellman tables reduced the computing resources required to crack a DES hash from about $5,000 to just $10. In 2003, fellow cryptographer Philippe Oechslin proposed refinements to Hellman's technique that vastly improved its effectiveness.

The result is what's now known as rainbow tables. Almost overnight, they changed the way people went about cracking large numbers of password hashes. Like earlier time-memory tradeoffs proposed by Hellman, the concept was simple. Rather than asking a computer to enumerate each possible password in real time and compare it against a targeted hash, precalculated data was stored in memory or on disk in a highly compressed form to speed up the process and lower the computing requirements needed to brute force huge numbers of hashes.

While earlier techniques had also tried this approach, they produced tables that were unnecessarily large and therefore unwieldy for cracking passwords. The genius of rainbow tables is a complex mathematical formula that expresses virtually every possible password combination without requiring each one to be stored in memory or on disk. Each table targets a specific algorithm and keyspace, and it contains a collection of chains. Each chain starts with an arbitrary password on one side and ends with a single hash value on the other end. The beginning password is put through the algorithm to generate its hash, and that value is then passed through one of many different "reduction functions" to generate a new password guess. The new password is then hashed.

From cryptographer Kestas Kuliukas: A rainbow table chain starts with an arbitrary plaintext, hashes it, reduces the hash to another plaintext, hashes the new plaintext, and so on. The table stores only the starting plaintext and the final hash, and so a chain "containing" millions of hashes can be represented with only a single starting plaintext, and a single finishing hash.
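A toy version of the chain idea, sketched in Python under loud assumptions: the keyspace is just three lowercase letters, MD5 is the target algorithm, and the reduction function, chain length, and starting words are all invented for illustration rather than taken from any real rainbow table tool.

```python
import hashlib
import string

ALPHABET = string.ascii_lowercase
PASSWORD_LEN = 3     # toy keyspace: 26**3 passwords
CHAIN_LEN = 100      # number of hashes "contained" in each chain

def md5_hex(plaintext: str) -> str:
    return hashlib.md5(plaintext.encode("ascii")).hexdigest()

def reduce_hash(digest: str, position: int) -> str:
    """Map a hash back into the keyspace; adding the position gives each link its own reduction."""
    value = int(digest, 16) + position
    chars = []
    for _ in range(PASSWORD_LEN):
        value, idx = divmod(value, len(ALPHABET))
        chars.append(ALPHABET[idx])
    return "".join(chars)

def walk_chain(start: str):
    """Yield (plaintext, hash) for every link of the chain that begins at start."""
    plaintext = start
    for position in range(CHAIN_LEN):
        digest = md5_hex(plaintext)
        yield plaintext, digest
        plaintext = reduce_hash(digest, position)

def build_table(starts):
    """Store only the final hash of each chain, mapped to its starting plaintext(s)."""
    table = {}
    for start in starts:
        final_hash = None
        for _, final_hash in walk_chain(start):
            pass
        table.setdefault(final_hash, []).append(start)
    return table

def lookup(table, target):
    """Assume the target sits at each possible position, then replay any matching chain."""
    for pos in range(CHAIN_LEN - 1, -1, -1):
        digest = target
        for i in range(pos, CHAIN_LEN - 1):
            digest = md5_hex(reduce_hash(digest, i))
        for start in table.get(digest, []):
            for plaintext, link_hash in walk_chain(start):
                if link_hash == target:
                    return plaintext
    return None

starts = ["aaa", "cat", "dog", "sun", "zip"]      # arbitrary starting plaintexts
table = build_table(starts)

# Pick a password buried in the middle of a chain; it is never stored in the table.
secret, target = list(walk_chain("dog"))[42]
print("recovered:", lookup(table, target))
```

Five stored pairs stand in for up to 500 hashes here. The price is extra computation at lookup time, since the cracker has to replay part of a chain to recover the plaintext, which is the time-memory trade-off in miniature.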

The breakthrough wasn't just the speed with which the tables could crack passwords; it was also their ability to crack almost every possible password as long as it didn't fall outside the targeted keyspace. Rainbow tables are believed to get their name because each chain link uses a different reduction function, but all chains follow the same pattern—much as each color in a rainbow is different but all rainbows follow the ROYGBIV pattern.

The space savings alone are huge. Storing a table of every possible 10-character password with only lowercase letters, along with its corresponding MD5 hash, would require about 3,108 terabytes of disk space. A rainbow table expressing 99.9 percent of those combinations, by contrast, requires just 167 gigabytes.
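A back-of-the-envelope check of the scale involved, as a rough sketch only. It assumes each entry is stored as a raw 10-byte password plus a 16-byte MD5 digest with no indexing overhead, which is not necessarily how the quoted figure was derived; the exact total shifts with the encoding and with how a terabyte is defined.

```python
ALPHABET_SIZE = 26     # lowercase letters only
PASSWORD_LEN = 10
MD5_BYTES = 16         # an MD5 digest is 128 bits

keyspace = ALPHABET_SIZE ** PASSWORD_LEN
bytes_per_entry = PASSWORD_LEN + MD5_BYTES    # assumption: raw bytes, no overhead
full_table_tb = keyspace * bytes_per_entry / 1e12

print(f"passwords in keyspace: {keyspace:,}")
print(f"naive full table: about {full_table_tb:,.0f} TB")

# Ratio between the two figures quoted above (3,108 TB versus 167 GB).
quoted_full_tb, quoted_rainbow_gb = 3108, 167
savings = quoted_full_tb * 1000 / quoted_rainbow_gb
print(f"quoted savings: roughly {savings:,.0f}x")
```

Under these assumptions the naive table lands in the same few-thousand-terabyte range as the figure cited, and the rainbow table comes out more than four orders of magnitude smaller.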

In the era of Windows XP, the results were devastating. Microsoft's underlying LAN Manager capped passwords at 14 characters, split them into two seven-character halves that were hashed separately, and converted all letters to uppercase. In 2003, hackers released Ophcrack, an open-source program that used rainbow tables to crack most Windows passwords in just minutes. Even more powerful cracking applications quickly followed.

"The fact that you can have this thing that anyone can download that will crack literally any Windows XP password hash was really cool," said Marlinspike, who has designed CloudCracker, a service that takes about 20 minutes to check a WiFi password against 300 million possible words. "It's not like I got 20 percent, or 50 percent, or even 80 percent. You got all of them. That was a major thing."

The huge advances in GPU-assisted password cracking have diminished many of the advantages of rainbow tables, however. Passwords with six or fewer characters can be brute-force cracked with less fuss using GPU-powered computers, while passwords longer than nine or 10 characters require rainbow tables with unwieldy file sizes. That leaves only a small sweet spot of seven or eight characters where rainbow tables are especially useful these days.

Still, the tables maintain their status as a useful, if niche, tool for some hackers. Witness Free Rainbow Tables, a project that allows volunteers to donate spare computer cycles to generate publicly available tables that crack hashes returned by algorithms including SHA1, MD5, and NTLM. Its organizers have already amassed six terabytes worth of data. And with the participation of more than 3,900 volunteer computers, Free Rainbow Tables adds an estimated 36 megabits of table data every second, according to James Nobis, one of the developers behind the project.

Needs more salt

An updated version of LAN Manager known as NTLM was introduced with Windows NT 3.1. It lowered the susceptibility of Windows passwords to rainbow table attacks, but didn't eliminate the risk. To this day, the authentication system still doesn't apply cryptographic "salt" to passwords to render such attacks infeasible.

Salting appends several unique characters to each account password before running it through a cryptographic function, a process that blunts the value of rainbow tables and other types of precomputed attacks. A 16-bit salt, for example, requires 65,536 (or 2^16) separate tables to be defeated. A random salt of 32 bits makes rainbow table attacks even more impractical by pushing the number of tables required to more than four billion. (The salt must be saved for each user and is usually stored beside the user name and password hash, so the information is available during each user login. Salt is rarely kept apart from the hash. Even when known, its virtue lies in its uniqueness, which defeats pre-computation of results.)
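A minimal sketch of the idea in Python, under stated assumptions: a single pass of SHA-512 stands in for whatever function a real system uses, the salt is 32 random bits, and the function names and the two users are hypothetical.

```python
import hashlib
import hmac
import os

def hash_with_salt(password: str, salt: bytes) -> str:
    """Append the salt to the password, then hash (single SHA-512 as a stand-in)."""
    return hashlib.sha512(password.encode("utf-8") + salt).hexdigest()

def create_record(password: str, salt_bytes: int = 4):
    """Return (salt, hash); both are stored side by side for later logins."""
    salt = os.urandom(salt_bytes)    # 4 bytes = a 32-bit random salt
    return salt, hash_with_salt(password, salt)

def verify(password: str, salt: bytes, stored_hash: str) -> bool:
    """Recompute the hash with the saved salt and compare."""
    return hmac.compare_digest(hash_with_salt(password, salt), stored_hash)

# Number of distinct salts an attacker would need precomputed tables for.
for bits in (16, 32):
    print(f"{bits}-bit salt: {2 ** bits:,} possible values")

# Two hypothetical users who pick the same password.
salt_a, hash_a = create_record("open sesame")
salt_b, hash_b = create_record("open sesame")
print("same password, same stored hash?", hash_a == hash_b)
print("login check:", verify("open sesame", salt_a, hash_a))
```

Because the salt is stored in the clear next to the hash, it adds nothing to the secrecy of any one password. What it removes is the attacker's ability to reuse one precomputed table across every account.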

To illustrate what this looks like in practice, we created a new Linux account for "testuser." The operating system stored the login data in a single long line of text (kept in /etc/shadow, where Linux stores passwords):

The line is broken up by colons—first comes the username, then the lengthy password section, then data about when the password was last changed, how old it is, when the account expires, and more.

The important bit for our purposes is the password section, which is internally divided by $ symbols. First comes the number that identifies the hashing algorithm used—in this case, 6 corresponds to the SHA-512 algorithm. Next is the salt, 2lvEhpi5. Finally, there's the hash itself, a long string of letters and symbols.
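The layout described above can be pulled apart with a short Python sketch. The example line is hypothetical: the user name, algorithm ID, and salt come from the text, while the hash and the trailing aging fields are placeholders rather than real values. The parser also assumes the password section has exactly three parts, with no optional parameters in between.

```python
# Only the algorithm ID mentioned in the text is mapped here.
ALGORITHMS = {"6": "SHA-512"}

def parse_shadow_line(line: str) -> dict:
    """Split a shadow-style entry into the user name and the parts of its password section."""
    fields = line.rstrip("\n").split(":")
    username, password_field = fields[0], fields[1]
    # The section looks like $id$salt$hash, so splitting on "$" yields
    # an empty string first, followed by the three parts.
    _, algorithm_id, salt, digest = password_field.split("$")
    return {
        "username": username,
        "algorithm": ALGORITHMS.get(algorithm_id, f"unknown ({algorithm_id})"),
        "salt": salt,
        "hash": digest,
        "aging_fields": fields[2:],
    }

# Hypothetical entry; PLACEHOLDERHASH is not a real hash value.
example = "testuser:$6$2lvEhpi5$PLACEHOLDERHASH:15000:0:99999:7:::"

parsed = parse_shadow_line(example)
for key, value in parsed.items():
    print(f"{key}: {value}")
```

Anyone who obtains the file gets the salt along with the hash, so the salt slows precomputed attacks without hiding anything from an attacker willing to guess passwords one account at a time.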

In addition to making rainbow-table attacks infeasible, salting can also significantly add to the resources required to carry out more traditional cracking attacks, since it ensures that each stored hash is unique even if two users choose the same passcode. That, in turn, requires each hash in a compromised table to be cracked separately, even if they mask one or more identical plaintext passwords.

Despite the benefit of the technique, and the relative ease of implementing it, a surprising number of websites—including LinkedIn, Yahoo, and eHarmony—didn't use it when they were recently breached. Hashes derived from NTLM, because they never use salting, are among the easiest to crack.

To the detriment of millions of Internet users, going without salt is only one of the many sins that popular websites routinely commit against password security.