With passphrases becoming more and more common based on length being more important than complexity, I'm assuming there must be some work going on involving techniques aimed specifically at cracking / auditing passphrases which would differ to some degree from the traditional methods for attacking passwords, perhaps involving Natural Language Processing, grammar, syntax, stemming, etc.

I realize there are different kinds of passphrases and that those invented by humans should be more susceptible to attack than those randomly selected from set wordlists such as suggested in a recent XKCD web comic.

Are there some resources I can read on the web to get myself up to date on what people are trying and having success with?

Your question falls in between current practice and theory, so I'm not sure you will get an acceptable answer. For current practice it is moderately easy to find out what the good gals/guys are using and moderately difficult to find out what the bad gals/guys are using. Most password breaking theory I have encountered looks for side channels first to reduce the keyspace to a manageable size and then brute forces what remains. I haven't seen any theory about word phrases.
– this.josh, Aug 16 '11 at 20:41

4 Answers

I don't know how "state-of-the-art" this is, but the password recovery software Passcape uses a dictionary attack against passphrases. This is very similar to a simple brute force character-permutation attack against a password, except permutations of dictionary words are used instead (once common phrases and quotes from movies, books, and poems are exhausted).
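As a rough illustration of the two stages described above (known phrases first, then arrangements of dictionary words), here is a minimal Python sketch. It is not Passcape's implementation: the function name `crack_passphrase`, the space separator, and the use of an unsalted SHA-256 digest are all assumptions made for the example.

```python
import hashlib
from itertools import permutations


def crack_passphrase(target_hash, known_phrases, words, max_words=3, sep=" "):
    """Return the phrase whose SHA-256 hex digest equals target_hash, or None.

    Hypothetical helper for illustration only; real tools use salted,
    slow hashes and far larger dictionaries.
    """
    def digest(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    # Stage 1: common phrases and quotes (movies, books, poems).
    for phrase in known_phrases:
        if digest(phrase) == target_hash:
            return phrase

    # Stage 2: ordered arrangements of dictionary words, shortest first.
    for n in range(1, max_words + 1):
        for combo in permutations(words, n):
            candidate = sep.join(combo)
            if digest(candidate) == target_hash:
                return candidate
    return None


if __name__ == "__main__":
    toy_words = ["correct", "horse", "battery", "staple"]
    target = hashlib.sha256(b"horse staple correct").hexdigest()
    print(crack_passphrase(target, ["to be or not to be"], toy_words))
```

Note that `itertools.permutations` never repeats a word within a candidate; swapping in `itertools.product(words, repeat=n)` would also cover phrases that reuse a word.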

As it says in the linked article:

It wouldn't be an overestimation to say that 99 percent of the success in the recovery of a password with a dictionary attack depends on the quality of the dictionaries.

As such, it seems like a simple defense against this type of attack would be to include words that wouldn't appear in any dictionary: misspell words, leave out vowels, substitute letters, and so on.

Yes, there is definitely more scope for making a passphrase difficult to guess in cases where you get to choose your own passphrase. That doesn't mean most people will actually do this, so many will still have passphrases with rather low entropy that can be exploited.
– hippietrail, Aug 16 '11 at 18:20

You previously asked for some documents on using Markov chains to perform brute-force attacks, and I'm still assuming you are looking for techniques to find the low-hanging fruit.

There are no obstacles to performing a brute-force attack using Markov chains against passphrases. You just use words instead of characters as states, in what you could call a finite state machine, which is what a Markov chain really is.
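To make the "words as states" idea concrete, here is a small Python sketch of a first-order, word-level Markov model. The names `train`, `candidates`, and `START`, as well as the three-sentence toy corpus, are invented for illustration; a real attack would train on a large text corpus and would not enumerate every phrase in memory before sorting.

```python
from collections import Counter, defaultdict

START = "<s>"  # hypothetical marker for the beginning of a phrase


def train(sentences):
    """Estimate word-to-word transition probabilities from example sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        prev = START
        for word in sentence.lower().split():
            counts[prev][word] += 1
            prev = word
    model = {}
    for prev, following in counts.items():
        total = sum(following.values())
        model[prev] = {word: c / total for word, c in following.items()}
    return model


def candidates(model, length):
    """Return (probability, phrase) pairs of `length` words, most likely first."""
    results = []

    def walk(prev, words, prob):
        if len(words) == length:
            results.append((prob, " ".join(words)))
            return
        # Only transitions seen in training are ever followed.
        for word, p in model.get(prev, {}).items():
            walk(word, words + [word], prob * p)

    walk(START, [], 1.0)
    results.sort(key=lambda item: (-item[0], item[1]))
    return results


if __name__ == "__main__":
    model = train(["i love my dog", "i love my cat", "i love you"])
    for prob, phrase in candidates(model, 3):
        print(f"{prob:.2f}  {phrase}")
```

With this toy corpus the three-word guesses come out as "i love my" followed by "i love you". A phrase such as "i hate my dog" is never produced, because the transition from "i" to "hate" does not occur in the training sentences.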

Consider passphrases consisting of random English words. (There are approximately 600,000 of them, but to be realistic, we only choose from 10,000 that would be easy to remember.) This gives an n-word passphrase a keyspace of f(n) = 10000 ** n, or about 13.3 bits of entropy per word. Comparing a 4-word passphrase to an 8-character password using only letters (upper and lower case) and digits, the passphrase is about 45 times harder to crack by pure brute force (given that the attacker and victim are using the same dictionary).
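The factor of 45 can be checked with a few lines of Python. The sketch assumes that "letters and digits" means 26 lowercase letters, 26 uppercase letters, and 10 digits (62 symbols), which is the alphabet size that reproduces the stated ratio; the constant names are made up for the example.

```python
import math

WORDLIST_SIZE = 10_000        # easy-to-remember words, as in the text
ALPHABET_SIZE = 26 + 26 + 10  # assumption: upper case, lower case, digits

passphrase_space = WORDLIST_SIZE ** 4  # 4 random words
password_space = ALPHABET_SIZE ** 8    # 8 random characters

ratio = passphrase_space / password_space

print(f"4-word passphrase: {passphrase_space:.3e} ({math.log2(passphrase_space):.1f} bits)")
print(f"8-char password:   {password_space:.3e} ({math.log2(password_space):.1f} bits)")
print(f"ratio: {ratio:.1f}")
```

This works out to roughly 53 bits for the passphrase against roughly 48 bits for the password, a ratio of about 45.8.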

But, back to the low-hanging fruit. If a user chooses sentences rather than random words, a Markov chain used to generate sequences of words would produce the word combinations most likely to form sentences. Grammar, syntax, stemming and such would all be included implicitly.

We have no information about whether users tend to use random words or sentences for passphrases, so it is hard to decide which method would be more effective (dictionary brute force or Markov chain brute force). This matters because a Markov chain brute force will probably never guess a sequence of words that is not represented in its training data.

Indeed. There could be some tweaks to Markov models expressly to fit this problem. For instance generating longer phrases internally but leaving out small grammatical stopwords. But it's such a large area it would be hard to know where to focus without a resource of leaked/hacked passphrases to see what people really choose in practice.
– hippietrail, Sep 6 '11 at 12:04

The fact that there are a near-infinite number of possible sentences is (I think) a basic property of human language. So a passphrase containing at least one unusual (or perhaps made-up) word would be, for a generator using Markov chains of common words, like a password with unknown characters. It would force the attacker back into character-by-character bruteforcing.
– Nathan Long, Dec 7 '11 at 19:35

@NathanLong, you can use Markov chains to generate made-up words that have never been written anywhere before. But if you train the Markov chain on a dictionary containing all infinitive words, the Markov chain will itself become a brute-force generator...
– Dog eat cat world, Dec 21 '11 at 15:41

The authors use Natural Language Processing over some large word corpora to come up with "tag-rules", which are sequences like Pronoun, Noun, Adjective or Proper Noun, Verb, Adjective, Adjective. They use the same corpora to build dictionaries of words tagged by part of speech, then do a brute-force search using the most common tag rules and dictionary words.
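The general shape of a tag-rule search can be sketched in Python as below. This is not the authors' code: the function name `tag_rule_candidates`, the tag names, and the tiny hand-written dictionaries are assumptions for illustration, whereas the approach described above derives both the rules and the tagged dictionaries from large corpora.

```python
from itertools import product


def tag_rule_candidates(tag_rules, tagged_words, sep=" "):
    """Yield candidate phrases, taking the tag rules in the order given.

    tag_rules:    sequences of part-of-speech tags, most common rule first
    tagged_words: mapping from tag to the dictionary words carrying that tag
    """
    for rule in tag_rules:
        # Skip any rule that uses a tag we have no dictionary for.
        if any(tag not in tagged_words for tag in rule):
            continue
        word_lists = [tagged_words[tag] for tag in rule]
        for combo in product(*word_lists):
            yield sep.join(combo)


if __name__ == "__main__":
    # Toy, hand-tagged dictionaries (hypothetical data).
    tagged = {
        "PRONOUN": ["my", "your"],
        "NOUN": ["dog", "cat"],
        "ADJECTIVE": ["happy"],
    }
    rules = [("PRONOUN", "NOUN", "ADJECTIVE")]
    for phrase in tag_rule_candidates(rules, tagged):
        print(phrase)
```

Each rule only ever combines words that fit its slots, so the search space per rule is the product of the per-tag dictionary sizes rather than the full dictionary size raised to the phrase length.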

They compared the effectiveness against crackers like John the Ripper and Hashcat. Their approach was comparable to existing crackers, but much more memory friendly. It also cracked ~10% of passphrases not cracked by any of the other crackers. The authors limited the number of attempts to 40E12, which is pretty low, so I'd expect with more time their algorithm would do significantly better than traditional character based crackers and rule sets.

Insidepro.com's PasswordsPro allows you to do that with their "Combined Dictionary Attack": During this attack, passwords are made of several words taken from different dictionaries. That allows to recover complex passwords like "superadmin", "adminadmin", "secretpassword", "supersecretpassword", etc.
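A combined dictionary attack of this kind can be approximated in a few lines of Python. This is only a sketch of the idea, not PasswordsPro's actual behaviour; the function name `combined_candidates` and the example dictionaries are invented.

```python
from itertools import product


def combined_candidates(dictionaries):
    """Yield every concatenation of one word from each dictionary, in order."""
    for combo in product(*dictionaries):
        yield "".join(combo)


if __name__ == "__main__":
    prefixes = ["super", "secret"]   # hypothetical first dictionary
    bases = ["admin", "password"]    # hypothetical second dictionary
    print(list(combined_candidates([prefixes, bases])))
```

Passing three dictionaries instead of two yields candidates such as "supersecretpassword", and passing the same dictionary twice covers repeats like "adminadmin".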