Passphrases only marginally more secure than passwords because of poor choices


Passwords that contain multiple words aren't as resistant as some researchers expected to certain types of cracking attacks, mainly because users frequently pick phrases that occur regularly in everyday speech, a recently published paper concludes.

Security managers have long regarded passphrases as an easy-to-remember way to pack dozens of characters into the string that must be entered to access online accounts or to unlock private encryption keys. The more characters, the thinking goes, the harder it is for attackers to guess or otherwise crack the code, since there are orders of magnitude more possible combinations.

But a pair of computer scientists from Cambridge University has found that a significant percentage of passphrases used in a real-world scenario were easy to guess. Using a dictionary containing 20,656 phrases of movie titles, sports team names, and other proper nouns, they were able to find about 8,000 passphrases chosen by users of Amazon's now-defunct PayPhrase system. That's an estimated 1.13 percent of the available accounts. The promise of passphrases' increased entropy, it seems, was undone by many users' tendency to pick phrases that are staples of the everyday lexicon.

"Our results suggest that users aren't able to choose phrases made of completely random words, but are influenced by the probability of a phrase occurring in natural language," researchers Joseph Bonneau and Ekaterina Shutova wrote in the paper (PDF), which is titled "Linguistic properties of multi-word passphrases." "Examining the surprisingly weak distribution of phrases in natural language, we can conclude that even 4-word phrases probably provide less than 30 bits of security which is insufficient against offline attack," the paper says.

The "30 bits of security" means the chances of a single guess cracking a four-word passphrase would be one in 2^30. What's more, the two-word phrases cracked in the study provided just 20.8 bits of security (2^20.8 ≈ 20,656/0.0113). Another way of expressing the same finding is that a dictionary of slightly fewer than 21,000 phrases is enough to guess the login credentials that slightly more than 1 percent of people in the real world will use.
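The arithmetic behind these figures is short: bits of security are the base-2 logarithm of the number of equally likely guesses an attacker has to cover. This Python sketch reproduces the 20.8-bit figure from the dictionary size and hit rate reported above. Treating 20,656/0.0113 as an "effective" search space is the article's simplification, not a full guessing model.

```python
import math

def bits(guesses: float) -> float:
    """Bits of security for a space of `guesses` equally likely choices."""
    return math.log2(guesses)

# Four-word phrases per the paper: under 30 bits, i.e. about 1-in-2**30 per guess.
print(f"2**30 = {2**30:,} guesses")

# Amazon PayPhrase result: a 20,656-phrase dictionary hit ~1.13% of accounts.
dictionary_size = 20_656
success_rate = 0.0113
effective_space = dictionary_size / success_rate
print(f"effective space ~ {effective_space:,.0f} -> {bits(effective_space):.1f} bits")
```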

The study by Bonneau and Shutova is among the first to examine passphrases used by real-world people to access accounts. While it concludes phrases are harder to guess, the increased entropy isn't enough to withstand offline attacks, in which a stolen database of hashed passwords may be subjected to hundreds of millions of guesses in an attempt to find the right combination of characters.

"The most important thing about this paper is it provides some hard data on how people create passphrases when they are forced to use passphrases instead of passwords due to policy requirements," said independent security researcher Matt Weir, who focused on password cracking for his PhD at Florida State University. "That's actually a big deal because organizations can start using those findings when creating their own password policies. It makes it easier to estimate the actual effectiveness of security controls vs just saying 'it's more of a pain so it has to be secure.'"

Heard that before

To understand why passphrases failed to live up to their potential, the researchers extracted two-word phrases from sources including the British National Corpus and compared them to the phrases they had cracked from Amazon's PayPhrase system. They found most of the overlap involved common nominal modifier-noun phrases such as "bedtime story" or adverbial-modifier verb relations such as "never leave."

They ran another comparison using the Google n-gram Corpus, which harvests vast numbers of words and phrases published online. To evaluate how Amazon users may have chosen their passphrases, they compared them to a ranked list of the most common phrases and found a high correlation. "This leads us to conclude that users don't stray far from natural language patterns when choosing passphrases," their paper states.

Similarly, the researchers crawled Facebook's public index from 2010 and pulled 10,000 randomly selected names. A full four percent of the names were chosen as passphrases by Amazon users.

The findings are preliminary because the researchers could find passphrases only when they attempted to register an account and used a combination of characters that had already been selected. Unlike studies involving the RockYou compromise and other breaches, they didn't have the opportunity to analyze all the credentials. What's more, the Amazon service required users to combine their passphrase with a four-digit PIN, and that may have influenced how phrases were selected. The paper suggests further collaboration between security and linguistics researchers.

The inquiry into the added security benefits of passphrases comes as a report from security firm Trusteer concludes that passwords are the weakest link in the IT security chain, particularly those used to secure network administrative controls. The findings also arrive as some people are looking to passphrases to tame the problems that result when users authenticate themselves on smartphones, which have input interfaces that aren't ideal for entering many non-alphanumeric characters. The research is also important to those who use passphrases to protect private encryption keys used to encrypt email and SSH sessions.

"These finding suggest that multi-word phrases, if chosen naively according to natural language tendencies, are not as effective at mitigated guessing attacks as alternate choices, such as choosing 2 random words or choosing a personal name at random," the paper warns.

The nice thing about passphrases is that the computer can, with a massive dictionary, generate a nice combination of 4+ words and it will be quite easy for most people to remember in comparison to a 10+ random alphanumeric securely generated password.
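A minimal sketch of that idea in Python, assuming the words come from a large list and are picked by a cryptographically secure generator (`secrets.choice`) rather than by the user. The short `WORDS` list here is a hypothetical placeholder for demonstration; the strength comes from uniform random selection over a big list, which is exactly what the user-chosen PayPhrase phrases lacked.

```python
import math
import secrets

# Hypothetical tiny wordlist for illustration only; a real generator should
# draw from a large published list (e.g. the 7,776-word Diceware list).
WORDS = ["salami", "pavement", "gracious", "volvo", "chime", "tile",
         "tar", "paper", "sheriff", "prison", "orbit", "lantern"]

def generate_passphrase(words: list[str], count: int = 4, sep: str = " ") -> str:
    """Pick `count` words uniformly at random using a CSPRNG."""
    return sep.join(secrets.choice(words) for _ in range(count))

def passphrase_bits(wordlist_size: int, count: int) -> float:
    """Entropy in bits when every word is chosen independently and uniformly."""
    return count * math.log2(wordlist_size)

print(generate_passphrase(WORDS))
print(f"{passphrase_bits(7776, 4):.1f} bits for 4 words from 7,776")
```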

HA! Right. Passphrases are the way to go. But use uncommon words in nonsensical (but easy to remember!) order, and throw in at least one random element (not a dictionary word) and use punctuation marks.

Basically use a sentence with correct punctuation and capitalization and make said sentence easy to remember but atypical of common speech.

mary had a little lamb > better than a dumb password but not that good. This is exactly what the article is about!

Sorry, but all this talk about passphrases is just talk until everyone starts embracing the same standard. Fidelity, for example, limits passwords to 12 letters and no, you can't have spaces - or any special characters, for that matter. So I can't have a long password and I can't have a completely random password. My bank does not allow special characters either. Sure, they force you to use capital letters and numbers and force a password reset after 3 invalid attempts, but still!

Aren't natural language passphrases vulnerable to dictionary attacks anyway? At least as long as you're using only the basic forms (singular, infinitive etc.), attackers could use an algorithm that uses words as the smallest entity to greatly reduce the complexity as opposed to one that uses characters.

Still, there are many more words than there are characters in the charset, so it's still quite an improvement. Plus, if you mix a memorable nonsense word into the phrase (e.g., anything that has alternating consonants and vowels should work), you should be able to resist any purely dictionary-based attack.

Passphrases require a bit of caution exactly like passwords, but good passphrases are vastly easier to remember than good passwords. The problem is still that you rarely know the length limits of wherever you want to use a passphrase, and they're often surprisingly tight.
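Rather than inventing the nonsense word yourself (and risking the human bias the paper describes), you can generate it. This Python sketch alternates randomly chosen consonants and vowels; treating "y" as a consonant is an arbitrary assumption. Each consonant-vowel pair adds log2(21 × 5) ≈ 6.7 bits, so three syllables add roughly 20 bits on top of the dictionary words.

```python
import math
import secrets

CONSONANTS = "bcdfghjklmnpqrstvwxyz"  # 21 letters ("y" counted as a consonant)
VOWELS = "aeiou"                      # 5 letters

def nonsense_word(syllables: int = 3) -> str:
    """Build a pronounceable word of alternating consonants and vowels."""
    return "".join(secrets.choice(CONSONANTS) + secrets.choice(VOWELS)
                   for _ in range(syllables))

def nonsense_bits(syllables: int) -> float:
    """Each consonant-vowel pair contributes log2(21 * 5) bits."""
    return syllables * math.log2(len(CONSONANTS) * len(VOWELS))

word = nonsense_word(3)
print(word, f"~{nonsense_bits(3):.1f} bits")
```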

HA! Right. Passphrases are the way to go. But use uncommon words in nonsensical (but easy to remember!) order, and throw in at least one random element (not a dictionary word) and use punctuation marks.

Basically use a sentence with correct punctuation and capitalization and make said sentence easy to remember but atypical of common speech.

mary had a little lamb > better than a dumb password but not that good. This is exactly what the article is about!

Sheriff town, 4 Prison keep? > that's pretty good!

"correct horse battery staple" is also a pretty bad password.

Baumi wrote:

Aren't natural language passphrases vulnerable to dictionary attacks anyway? At least as long as you're using only the basic forms (singular, infinitive etc.), attackers could use an algorithm that uses words as the smallest entity to greatly reduce the complexity as opposed to one that uses characters.

That's what I'm thinking. If you have a dictionary, you can just try different permutations and combinations of words.

If you swap out a few letters in your phrase for symbols and numbers and Capitalize a few but not all letters you are again just working with a very long complex password, which is of course harder to crack than a short complex password.

It feels like the problem is easily solved with length, but I'd love to see some data on it. Sure "New York Yankees" is a crappy password, but it's also only 16 characters. I'm very curious to see the shape of the curve as lengths go up. How secure is "Paper tile tar chime" vs "In case of emergency, break glass" vs "rollingbacktransactionstakesforever"?
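Length alone doesn't settle it, because an attacker who guesses whole words never pays for individual characters. The Python sketch below prints two rough estimates for each example: a naive per-character bound and a per-word bound assuming a hypothetical 20,000-word dictionary with uniformly random words. Both are upper bounds; as the paper found, a stock phrase like "In case of emergency, break glass" is likely to sit near the top of a phrase list and be far weaker than either number suggests.

```python
import math
import string

# (characters in class, class size); punctuation plus space counts as 33
CLASSES = [
    (string.ascii_lowercase, 26),
    (string.ascii_uppercase, 26),
    (string.digits, 10),
    (string.punctuation + " ", 33),
]

def charset_bits(s: str) -> float:
    """Naive brute-force bound: length * log2(size of the character classes used)."""
    pool = sum(size for chars, size in CLASSES if any(c in chars for c in s))
    return len(s) * math.log2(pool) if pool else 0.0

def word_bits(word_count: int, dictionary_size: int = 20_000) -> float:
    """Upper bound if each word were an independent, uniform dictionary pick."""
    return word_count * math.log2(dictionary_size)

for phrase, words in [("Paper tile tar chime", 4),
                      ("In case of emergency, break glass", 6),
                      ("rollingbacktransactionstakesforever", 5)]:
    print(f"{phrase!r}: chars {charset_bits(phrase):.0f} bits, words {word_bits(words):.0f} bits")
```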

Pass-phrases are just long pass-words with spaces between the words. Just get sites/systems to allow longer passwords (some limit the length to 12 characters--or even less) and passphrases. Or use tokens. (And don't force people to change them every month or two... or at all--offers as much security as the hard-to-remember/easy-to-guess stuff.)

Am I the only one who has to deal with some websites and password-authenticating sources that limit my password to 12 or 16 characters? I mean, if given unlimited characters, I usually just choose phrases from songs, which are sometimes nonsensical enough to get by.

Examples:
If you need a little loving 634 5789 Thats my number <--- easy to remember numbers
Fight! for your right! to PARTY! <--- Punctuation and Caps
Never gonna give you up Never gonna let you down Never gonna run around and desert you Never gonna make you cry Never gonna say goodbye Never gonna tell a lie and hurt you <--- You done got Pass-rolled!

Don't most online accounts flag and refuse to allow further attempts after a certain number of failed attempts? With a 24-hour reset delay, wouldn't it take them forever, even with a more intelligent guesser like this? Won't this only matter for local logins or services where unlimited guesses are allowed without locking? I always wondered why they harp on creating these really complex passwords; it seems to me the problem should be looked at in reverse: why are they accepting so many incorrect responses, and why are we still building software that allows that?

"These finding suggest that multi-word phrases, if chosen naively according to natural language tendencies, are not as effective at mitigated guessing attacks as alternate choices, such as choosing 2 random words or choosing a personal name at random"

Translation: Naive users choose weak passwords.

...is this surprising? To say that passphrases conceptually aren't really stronger than passwords is just wrong. People simply need to try a little harder. How hard is it to avoid natural language? Here, I'll do it right now!

It's been estimated that normal text contains only about 2-2.5 bits of entropy per character, so 20 bits for a short two-word passphrase isn't much lower than you would expect. The problem (as with passwords) is that the name suggests to the user that it should be something it isn't, i.e. a meaningful phrase rather than 4 or 5 random words. I was once asked to put together a high-security login system (but the password strength requirements were silly), so rather than let the user choose a poor password, it auto-generated a passphrase of five memorable words from a list of about 8000. That gives you about 64 bits of security (roughly equal to 11 truly random mixed-case alphanumeric characters, or 12 to 13 lowercase-plus-digit ones). If the user couldn't remember the first choice, they could regenerate it a limited number of times, and as a bonus for the user it didn't expire (it was strong enough with the additional 80000 rounds of SHA1 key stretching).
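A rough Python sketch of the numbers in that design. Five words from an 8,000-word list give 5 × log2(8000) ≈ 64.8 bits; how many random characters that equals depends on the alphabet assumed (about 11 for 62 mixed-case letters and digits). For the key stretching, PBKDF2 with HMAC-SHA1 is used here as a stand-in; the comment doesn't say which construction the system actually used, and the sample passphrase is a made-up placeholder.

```python
import hashlib
import math
import os

WORDLIST_SIZE = 8000   # "about 8000" words, per the comment
WORDS_PER_PHRASE = 5
ROUNDS = 80_000        # SHA1 key-stretching rounds mentioned in the comment

phrase_bits = WORDS_PER_PHRASE * math.log2(WORDLIST_SIZE)
alnum_equiv = phrase_bits / math.log2(62)   # 62 = a-z, A-Z, 0-9
print(f"{phrase_bits:.1f} bits ~ {alnum_equiv:.1f} random alphanumeric chars")

def stretch(passphrase: str, salt: bytes, rounds: int = ROUNDS) -> bytes:
    """PBKDF2-HMAC-SHA1: each guess now costs `rounds` HMAC computations."""
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode("utf-8"), salt, rounds)

salt = os.urandom(16)  # random per-user salt
digest = stretch("example five word passphrase here", salt)
print(digest.hex())
```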

I wonder how turning a long phrase into a shorter password would compare. E.g., turn the phrase "The hills are alive with the sound of music" into "Thaawtsom!", taking the first letter of each word to create the password. It's certainly not anything one would find in a standard dictionary.
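The transformation itself is mechanical, which matters: anyone with a list of well-known phrases can apply the same rule to every entry. A minimal Python sketch of the first-letter rule described above, with the trailing "!" treated as an optional suffix:

```python
def acronym_password(phrase: str, suffix: str = "!") -> str:
    """First letter of each word, keeping the original capitalization, plus a suffix."""
    return "".join(word[0] for word in phrase.split()) + suffix

print(acronym_password("The hills are alive with the sound of music"))  # Thaawtsom!
```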


Baumi wrote:

Aren't natural language passphrases vulnerable to dictionary attacks anyway? At least as long as you're using only the basic forms (singular, infinitive etc.), attackers could use an algorithm that uses words as the smallest entity to greatly reduce the complexity as opposed to one that uses characters.

That's what I'm thinking. If you have a dictionary, you can just try different permutations and combinations of words.

Sure, but there are something like 500,000 English words. Discarding 90% of those (because people will presumably favor words they've at least heard before), a passphrase of five random words is ~10^23 possibilities. Add in the possibility of proper nouns

An eight-character random password drawn from upper, lower, number, and symbol characters is something like ~10^15 possibilities. Each additional character gets you another two orders of magnitude, of course, so by the time you get to 12 random characters you've broken even.
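The break-even point can be checked with exact integer arithmetic. This Python sketch assumes, as the comment does, 50,000 usable words picked at random, and takes the character alphabet to be the 95 printable ASCII characters (an assumption for "upper, lower, number, symbol").

```python
import math

def space(alphabet: int, length: int) -> int:
    """Number of possible sequences of `length` symbols from `alphabet` choices."""
    return alphabet ** length

words = space(50_000, 5)        # 10% of ~500,000 English words, five words
for n in (8, 10, 12):
    chars = space(95, n)        # 95 printable ASCII characters
    print(f"{n} chars: {chars:.1e} vs 5 words: {words:.1e}")

# Smallest random-character length whose space covers the five-word space.
break_even = math.ceil(math.log(words) / math.log(95))
print(break_even)
```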

But the "12 random character" approach to passwords has proven to be almost totally ineffective, since it means people write them down, reuse them, etc.

Obviously, it's possible to make bad passwords in any system. But if we accept that the way users treat their passwords - blithely trading away security for convenience - is a larger portion of the problem than the strength of the passwords themselves, passphrases still (IMO) win out.

Don't most online accounts flag and refuse to allow further attempts after a certain number of failed attempts? With a 24-hour reset delay, wouldn't it take them forever, even with a more intelligent guesser like this? Won't this only matter for local logins or services where unlimited guesses are allowed without locking? I always wondered why they harp on creating these really complex passwords; it seems to me the problem should be looked at in reverse: why are they accepting so many incorrect responses, and why are we still building software that allows that?

This is true. However, when there is a more widespread security failure (like an SQL Injection), the attackers can often grab the portion of the database that stores the encrypted passwords. Once they have that, the login controls on the website no longer matter. They can operate on the database directly and do thousands or millions of attempts per minute. Strong password policies are made to hedge against this case.

WHICH MEANS that, when they insist on super-strong passwords, they are implicitly confessing that they are incapable of keeping their databases secure. The site that requires strong password policies is basically saying "we can't do our jobs; please do it for us."

Actually it's excellent. You have four common words that are not normally found together. Each word gives you 10-12 bits of entropy (assuming a 2000-word dictionary), for about a 44-bit password.

44 bits is quite strong, depending on the application. If you're encrypting your computer with TrueCrypt, the password will be repeatedly hashed, so a fast cluster of computers can only try about 1000 per second in the worst-case scenario. At that rate, this group of computers would need about 25 days to exhaust a 31-bit password, but over 500 years to exhaust a 44-bit password like "correcthorsebatterystaple", so you can forget about them cracking it.

Edit: an easy way to calculate how many bits a password needs to survive 1000 years of guessing: log2(Passwords_per_second*3600*24*365*1000)
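A small Python sketch of that arithmetic. Bits are a base-2 logarithm of the search space; using the natural log (ln) instead gives a number about 0.69 times too small (roughly 31 instead of roughly 45 for 1000 guesses/second over 1000 years). The 1000 guesses/second rate is the comment's worst-case assumption for a heavily stretched TrueCrypt key, not a measured figure.

```python
import math

SECONDS_PER_YEAR = 3600 * 24 * 365

def years_to_exhaust(bits: float, guesses_per_second: float) -> float:
    """Worst-case time to try every password in a 2**bits space."""
    return 2 ** bits / guesses_per_second / SECONDS_PER_YEAR

def bits_needed(guesses_per_second: float, years: float) -> float:
    """Bits required so that exhausting the space takes at least `years`."""
    return math.log2(guesses_per_second * SECONDS_PER_YEAR * years)

print(f"{years_to_exhaust(44, 1000):.0f} years for 44 bits at 1000 guesses/s")
print(f"{bits_needed(1000, 1000):.1f} bits for 1000 years at 1000 guesses/s")
```

On average an attacker finds the password after searching about half the space, so typical times are roughly half of these worst-case figures.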

Aren't natural language passphrases vulnerable to dictionary attacks anyway? At least as long as you're using only the basic forms (singular, infinitive etc.), attackers could use an algorithm that uses words as the smallest entity to greatly reduce the complexity as opposed to one that uses characters.

Conceivably, yes. But even modest wordlists can produce more combinations than you would get just by going through alphabetic permutations. I calculate 30,229,200 (C(7776, 2), i.e. unordered pairs of distinct words) for a two-word diceware passphrase vs. 5,311,735 (C(26, 10)) for a lower-case 10-character passphrase. Three words put you at 7.8 * 10^10 combinations (C(7776, 3)) compared to 1.6 * 10^10 (C(52, 10)) for mixed-case 10-character passphrases. You may be better able to attack a passphrase by searching for phonetic combinations. I'm not sure.
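The figures in that comment match the binomial coefficients C(7776, 2), C(26, 10), C(7776, 3) and C(52, 10): unordered picks with no repeats. An attacker guessing passwords has to cover ordered sequences with repeats allowed (n**k), and by that count 10 random lowercase letters (about 1.4 * 10^14) outnumber three Diceware words (about 4.7 * 10^11). A Python sketch computing both counts, assuming the standard 7,776-word Diceware list:

```python
from math import comb

def unordered(n: int, k: int) -> int:
    """Ways to choose k distinct items from n, ignoring order (C(n, k))."""
    return comb(n, k)

def ordered_with_repeats(n: int, k: int) -> int:
    """Ways to build a length-k sequence from n symbols, repeats allowed (n**k)."""
    return n ** k

DICEWARE = 7776  # size of the standard Diceware word list
for label, n, k in [("2 diceware words", DICEWARE, 2),
                    ("10 lowercase chars", 26, 10),
                    ("3 diceware words", DICEWARE, 3),
                    ("10 mixed-case chars", 52, 10)]:
    print(f"{label:>20}: C(n,k)={unordered(n, k):.2e}  n**k={ordered_with_repeats(n, k):.2e}")
```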

The chief problem in any cryptosystem is human sensibility about what is and is not random. Passphrases, passwords, and even one-time-pad systems are only secure as long as each element is randomly selected via a process that minimizes human input. If you're pulling your passphrase from human language, or selecting your tokens to better match human language, then you have a big problem because human language is structured and word frequencies follow Zipf's Law.

The primary advantage to random passphrases is that they're much easier to remember than the equivalent random password.

"These finding suggest that multi-word phrases, if chosen naively according to natural language tendencies, are not as effective at mitigated guessing attacks as alternate choices, such as choosing 2 random words or choosing a personal name at random"

Translation: Naive users choose weak passwords.

...is this surprising? To say that passphrases conceptually aren't really stronger than passwords is just wrong. People simply need to try a little harder. How hard is it to avoid natural language? Here, I'll do it right now!

"salami pavement gracious volvo"

Good luck brute forcing that.

It's good until the source that harvests phrases picks it up as relevant to the password-cracking topic. I suspect that the correcthorsebatterystaple and its simple variations is already a part of some dictionary. If you've got a good method for generating passwords, keep it to yourself, lest you find it is subject to subtle psychological bias that can be attacked by knowing the method ("there are no grey elephants in Denmark").


David Kahn in The Codebreakers suggests that the CIA and NSA *might* have been able to partially crack Soviet one-time pad systems by exploiting the tendency for the little old ladies typing the carbons to make predictable mistakes and "correct" the random sequence.

Bank site developers are idiots. Throw up a picture on a second page with the password field and then you're supposed to be safe. Right. Like they don't crack your account by knowing your username first and getting the password later.

It's security theater. Let me use long passphrases and spaces and I'll be happy to go that way. But they don't care about real security, just the appearance of security.

I have the same issue noted by others. My banking accounts and other logins don't allow the same number of characters, or even types of characters (e.g. "must start with a number"), and the restrictions vary widely. Until we get more freedom of choice I can't use any one system of passwords or pass phrases.

I wonder how turning a long phrase into a shorter password would compare. E.g., turn the phrase "The hills are alive with the sound of music" into "Thaawtsom!", taking the first letter of each word to create the password. It's certainly not anything one would find in a standard dictionary.

Using lowercase, uppercase, numbers, and the more common symbols (let's go with 10) gives you 72 characters, and "Thaawtsom!" is 10 characters long. You do the math by taking the possibilities for each character (72) and raising it to the length (10), which gives you 72^10 similar combinations for 10 or fewer characters. Converted roughly to scientific notation, that's 3*10^18 combinations.

The passphrase, on the other hand, is a bit harder to calculate. If we just do something similar (a bit of a simplification) and are pessimistic and assume we can only use a dictionary of 2000 words, combined with the length of 9 words (ignoring the capital at the start), that gives you 2000^9, or 5*10^29 combinations of words of the same length or less.

5*10^29 is far, far more than 3*10^18. Both are equally susceptible to people guessing that your password will be a sound of music reference (if they try out the passphrase, they'll more than likely also try the password version)

For this example, just for curiosity's sake, how small would we have to make the dictionary for the password version to be superior to the passphrase version? It appears that about 115 words is the largest dictionary that wouldn't beat the password. The XKCD comic's calculations are based on far superior math to the (quick and dirty) calculations I did in this post, but this is just a quick illustration of how it plays out. Picking a common word or phrase will always screw over your security, even with tweaks to it.
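That comparison can be checked directly with exact integer arithmetic. This Python sketch keeps the comment's simplifications (72 symbols and 10 characters for the password, nine words drawn uniformly from a dictionary for the phrase) and searches for the crossover dictionary size.

```python
import math

password_space = 72 ** 10          # 72 symbols, 10 characters
phrase_space = 2000 ** 9           # 2000-word dictionary, 9 words
print(f"{password_space:.1e} vs {phrase_space:.1e}")

# Largest dictionary size d with d**9 <= 72**10, i.e. where the acronym
# password is still at least as strong as the nine-word phrase.
d = math.floor(password_space ** (1 / 9))   # float estimate, then fix exactly
while (d + 1) ** 9 <= password_space:
    d += 1
while d ** 9 > password_space:
    d -= 1
print(d)
```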

As has already been mentioned in this thread, though, I'd say the biggest issue with security right now is the crazy requirements being put on passwords, primarily length limits. Putting a minimum length is quite reasonable... but a max length that a user could reasonably type in a few seconds is just absurd. It makes me worry that they aren't properly securing the password, possibly simply shoving it directly into a database as-is, without hashing it (with a large salt, and stretching!).
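For comparison, a sketch of salted, stretched storage in Python using the standard library's PBKDF2-HMAC-SHA256. The iteration count is an illustrative assumption; pick a value from current guidance and your hardware. With this approach the stored key is 32 bytes whatever the input length, so a long passphrase with spaces costs the site nothing extra to store.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # illustrative work factor, not a recommendation

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived key) using PBKDF2-HMAC-SHA256 with a random salt."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, key: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, key)

salt, key = hash_password("salami pavement gracious volvo")
print(verify_password("salami pavement gracious volvo", salt, key))
```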