I often see passphrase suggestions written as a sentence, with spaces. In that format, are they more susceptible to a dictionary attack because each word is on its own, as opposed to a large unbroken 20+ character 'blob'?

5 Answers

The passphrases "correct horse battery staple" and "correcthorsebatterystaple" are equivalent entropy-wise. Putting spaces in unexpected places, or sometimes including spaces and sometimes not, will give you a few extra bits of entropy, but it's not worth the extra difficulty of remembering it. A weird spacing pattern gains you a few bits for the entire passphrase, while just adding another word adds about 13 bits (assuming a Diceware dictionary of 7776 words, corresponding to 5 rolls of a six-sided die; note that lg(6^5) ≈ 12.92, lg being the base-2 logarithm). (There's no disagreement between my answer and Thomas's: an attacker would have to check for passphrases both with and without spaces unless he had extra information about whether you tend to use spaces in your passphrases.)
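The per-word figure is easy to check. A minimal sketch, assuming every word is drawn uniformly and independently from the 7776-word list (the function name is mine, for illustration only):

```python
import math

# 7776 words: one per outcome of five rolls of a six-sided die.
DICEWARE_WORDS = 6 ** 5

def passphrase_bits(num_words: int, list_size: int = DICEWARE_WORDS) -> float:
    """Entropy in bits of num_words picked uniformly and independently."""
    return num_words * math.log2(list_size)

bits_per_word = passphrase_bits(1)
gain_from_fifth_word = passphrase_bits(5) - passphrase_bits(4)

print(f"bits per word:          {bits_per_word:.2f}")
print(f"4-word passphrase:      {passphrase_bits(4):.1f} bits")
print(f"gain from a fifth word: {gain_from_fifth_word:.2f} bits")
```

This only holds for words picked by dice or another real random source; words you choose yourself are worth less.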

Beware the distinction between random words and meaningful sentences. The passphrase "quantum mechanics is strange" has much lower entropy than, say, "heat fudge scott canopy". Why? Meaningful English has patterns: certain words combine frequently (quantum mechanics), and certain structures must appear for a sentence to be grammatical (subject, predicate, subject complement). In principle these could be exploited by a sufficiently sophisticated attacker (even though I am not aware of any cracking algorithms that currently do this). The information entropy of grammatically correct written English is about 1 bit per character, so the first passphrase has ~30 bits [1], while the second has about 4×12.9 ≅ 52 bits of entropy, so it would take about 2^22 ≅ 4 million times longer to crack.
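To make the comparison concrete, here is a rough sketch using the rounded figures from the paragraph above. The 1 bit per character rate is an estimate, not a measured property of this particular phrase, and the variable names are just for illustration:

```python
import math

meaningful = "quantum mechanics is strange"

# Assumed entropy rate for grammatical English (estimate, see text).
BITS_PER_CHAR_ENGLISH = 1.0

english_bits_raw = len(meaningful) * BITS_PER_CHAR_ENGLISH  # 28 characters
random_bits_raw = 4 * math.log2(7776)                       # four Diceware words

# The answer rounds these to ~30 and ~52 bits respectively.
english_bits = 30
random_bits = 52

# Each extra bit doubles the expected cracking effort.
work_factor = 2 ** (random_bits - english_bits)

print(f"unrounded estimates: {english_bits_raw:.0f} vs {random_bits_raw:.1f} bits")
print(f"work factor with rounded figures: {work_factor:,}")
```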

Be wary of incorrect analyses like http://www.baekdal.com/insights/password-security-usability that make many fundamental information-theory mistakes. E.g., "this is fun" is incredibly weak, being composed of some of the most common English words in a very common, syntactically correct sentence ('this' ~ 23rd most common; 'is' ~ 7th most common; 'fun' ~ 856th most common word) [2]. If you tested just three random words from the top 1000 English words, it would take only about 1 second to crack, assuming a modern GPU and that you have acquired the (salted) hash. This is roughly equivalent to 5 random alphanumeric characters (not counting special symbols). If you search Google for the quoted phrase "this is fun", it appears 228 million times.
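The two search spaces can be compared directly. In this sketch the guess rate of 10^9 hashes per second is an assumption chosen to match the "1 second" figure above; real rates depend heavily on the hash function and hardware:

```python
import math

TOP_WORDS = 1000                 # size of the "common words" list
ALNUM_CHARS = 62                 # a-z, A-Z, 0-9
GUESSES_PER_SECOND = 1e9         # assumed GPU rate, not a benchmark

three_word_space = TOP_WORDS ** 3       # any three words from the top 1000
five_char_space = ALNUM_CHARS ** 5      # any five random alphanumerics

seconds_to_exhaust = three_word_space / GUESSES_PER_SECOND

print(f"three common words: {three_word_space:,} (~{math.log2(three_word_space):.1f} bits)")
print(f"five alphanumerics: {five_char_space:,} (~{math.log2(five_char_space):.1f} bits)")
print(f"time to try every three-word phrase: {seconds_to_exhaust:.1f} s")
```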

EDIT: Minor exception: in the rare case when consecutive words in your passphrase form another word in your dictionary (or your attacker's dictionary), then not having spaces (or another separator) between words lowers your passphrase's entropy significantly. For example, if the random words forming your passphrase were "book case the rapist" and you had no spaces, an attacker could get in by trying all combinations of just two words 'bookcase therapist'.
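A small sketch of that collision, assuming the attacker's word list happens to contain "bookcase" and "therapist" (the list size is a placeholder; only the ratio matters):

```python
# Hypothetical list size, used only to compare the two search costs.
LIST_SIZE = 7776

four_words = ["book", "case", "the", "rapist"]
two_words = ["bookcase", "therapist"]

# Without a separator, both word sequences give the same string.
collides_without_spaces = "".join(four_words) == "".join(two_words)

# With a separator, the strings differ, so the shortcut disappears.
collides_with_spaces = " ".join(four_words) == " ".join(two_words)

two_word_guesses = LIST_SIZE ** 2    # enough to hit the concatenated form
four_word_guesses = LIST_SIZE ** 4   # what the passphrase was meant to cost

print(collides_without_spaces, collides_with_spaces)
print(f"effort lost: factor of {four_word_guesses // two_word_guesses:,}")
```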

"in principle could be exploited by a sufficiently sophisticated attacker" is there any actual evidence of this? I always view such claims rather skeptically since there are so many ways to construct a sentence.
–
Jeff Atwood♦ Jan 20 '12 at 20:01

@JeffAtwood - I'm not aware of these attacks in the field, where one iterates over computer-constructed meaningful English. The possibility is there, as the intrinsic randomness of the phrase is (relatively) low and comparable to things that are crackable. I agree this sort of attack would be more difficult to construct than simply brute-forcing a character set or running a dictionary attack, but it wouldn't surprise me if the NSA or others have done research along these lines.
–
dr jimbob Jan 20 '12 at 20:54

@Jeff, DrJ - I believe this loss of entropy is used in text-message autocompletion assistance in many/most cellphones.
–
Ed Staub Jan 20 '12 at 20:59

@EdStaub, Jeff - Google auto-complete can (for my content bubble) figure out "quan tum m echanics isstr ange" by only typing in the bold letters, while it can only fill in the ge of "heatfud ge scott canopy" in the Diceware passphrase, so it's at least computationally/algorithmically feasible to reduce its complexity significantly (to ~10 chars in patterns that can start words). I doubt most password cracking algorithms would reach a bizarre low-entropy password like 666666555554444333221, but that's security by obscurity vs. utilizing true randomness.
–
dr jimbob Jan 20 '12 at 21:19

Spaces in a passphrase add entropy exactly insofar as they could have been left out. An important point is that an attacker cannot test for a partial match on a password; contrary to what Hollywood movies tend to suggest (in a most graphic way), there is no such thing as a "partial decryption" (where the text is partly legible, but blurred) or a "partial password". Either the attacker has the exact expected password, down to the last comma, or he has nothing at all. This is a login system, not a game of Mastermind.

For instance, suppose that you make passwords by randomly selecting four words from a list of 2048 "common" words, and appending them (the "correct horse battery staple" method). We assume that any attacker knows that you are selecting passwords that way (e.g. that's the "official password selection method" promulgated by the sysadmin). How much entropy is there in such a password? That's easy to compute (assuming you are really selecting things "randomly", with dice, not with your brain): there are 2048*2048*2048*2048 = 2^44 possible passwords, which all have the same probability of being selected. Hence, 44 bits of entropy.

Now, suppose that the selection process also states: "You shall concatenate the four words without any space". There are 2^44 possible passwords, so 44 bits of entropy. Assume now that the rules say: "You shall always put a single space between two words". There still are 2^44 possible passwords, so still 44 bits of entropy. But suppose that the rules say: "You shall either separate the words with spaces, or concatenate them all together (throw a coin to decide one way or another when you choose your password)"; then there suddenly are 2^45 possible passwords (still with equal probability): entropy is now 45 bits.

More generally, if the password selection process entails throwing a coin three times, to decide for each of the three slots between two words whether there should be a space or not, then entropy rises to 47 bits (44 bits for the words plus one bit per coin). But note that this is not "free": you get more entropy, but you have to remember more, too (namely where you put the spaces).
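The counts for the four spacing rules above can be tabulated in a few lines. This is only a sketch of the arithmetic, assuming uniform random choices and ignoring the rare case where two different word sequences concatenate to the same string:

```python
import math

LIST_SIZE = 2048      # words in the list
NUM_WORDS = 4         # words per password
GAPS = NUM_WORDS - 1  # slots between consecutive words

word_choices = LIST_SIZE ** NUM_WORDS  # 2^44

# Number of equally likely passwords under each selection rule.
rules = {
    "always concatenated": word_choices * 1,
    "always spaced": word_choices * 1,
    "one coin: all spaces or none": word_choices * 2,
    "one coin per gap": word_choices * 2 ** GAPS,
}

entropy_bits = {name: math.log2(count) for name, count in rules.items()}

for name, bits in entropy_bits.items():
    print(f"{name:30s} {bits:.0f} bits")
```

A fixed rule, whichever one it is, adds nothing; only the coin flips add bits, one per flip.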

On a practical note: on a typical keyboard, the space bar emits a slightly different sound when pressed. If your office colleague has a keen ear, he may notice whether you use it or not, and possibly at what places. Also, your colleague knows perfectly well the password selection rules advertised in your organization, since he is, by definition, in the same organization as you. So I would advise against using spaces as a source of entropy, especially if you use the "four words rule" and not all words in the list have the same length: the keen-eared colleague may deduce the length of each word by hearing the spaces when they are typed.

The simple answer is yes, but not very much. Think about the character space: if you are looking at alphanumerics including upper and lower case, that gets you 62 chars (a-z, A-Z, 0-9). Adding {space} means 63 chars, so you have enlarged the character set by 1/62.

Contrast that with adding an extra character, which grows the number of possible passwords exponentially.
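A quick sketch of that contrast for passwords made of randomly sampled characters (this does not apply to word-based passphrases; the function names are just for illustration):

```python
def gain_from_allowing_space(length: int) -> float:
    """How many times larger the search space gets with 63 symbols instead of 62."""
    return (63 / 62) ** length

def gain_from_extra_char(length: int, charset: int = 62) -> int:
    """How many times larger the search space gets with one more character."""
    return charset ** (length + 1) // charset ** length

space_gain_8 = gain_from_allowing_space(8)
extra_char_gain_8 = gain_from_extra_char(8)

print(f"8 chars, space allowed: x{space_gain_8:.2f}")
print(f"8 chars -> 9 chars:     x{extra_char_gain_8}")
```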

You answered for a password that randomly samples characters from a 62-character set (my answer assumed spaces in a passphrase like let me in vs letmein). The sample space of a random N-character password with spaces allowed would be (63/62)^N ~ 1 + N/62 times larger than one without spaces; e.g., an 8-character password would be about 1.14 times more secure if spaces were allowed. Adding an extra character would make the password (62^(N+1))/(62^N) = 62 times more secure.
–
dr jimbob Jan 20 '12 at 23:34

@drjimbob: When let me in becomes a possibility as well as letmein entropy has increased by definition.
–
GregS Jan 21 '12 at 21:05

@GregS - Sure. If you forbid users from using spaces in their passphrases, you've cut down the attacker's checking time. But if user A has passphrase correct horse and user B has correcthorse, you can't say user A's or user B's passphrase is stronger (while user C's correct horse battery staple is higher entropy due to the extra two words).
–
dr jimbob Jan 21 '12 at 21:58

Well, to be fair, correct horse is slightly stronger than correcthorse, but only slightly.
–
Rory Alsop♦ Jan 22 '12 at 11:02

You can't use a dictionary attack, because you can't try each word individually. This is a passphrase. You have to get it all to get it right. I'm sure your dictionary might contain "mike", but it doesn't contain "mike is the smartest man alive". Those are two different passwords. Adding spaces adds more characters. The more characters, the more combinations, and the more entropy.

This is misguided. Say you have a (salted) SHA-1 hash of 'mike is the smartest man alive'. Being meaningful written English, it has an information entropy of about 30 bits (i.e., about 2**30 possibilities), so a single modern GPU with a well-written routine should be able to break it in about 1 second.
–
dr jimbob Jan 20 '12 at 16:17

I realize what you were saying now. Could you direct me to an example of a well written routine that assumes common English uses when guessing passwords?
–
k to the z Jan 20 '12 at 17:10

"The entropy rate of English text is between 1.0 and 1.5 bits per letter, or as low as 0.6 to 1.3 bits per letter, according to estimates by Shannon based on human experiments." 1. You have an extra trillion in your math; your rate corresponds to a sample size of 1.4 x 10^51. Even the most naive analysis of the password, as 30 random characters from a 27-character set, gives only 27^30 ~ 9x10^42 ~ 142 bits. But that's an absurd way to compare; the letters weren't random.
–
dr jimbob Jan 20 '12 at 17:21

The letters weren't chosen at random; they form full English words, all of which appear in standard Diceware dictionaries. So even if the words were all randomly chosen you'd only have an entropy of ~78 bits, which according to the Diceware FAQ should be within the range of large organizations to break by 2014. But that figure is for six random words, which would look like 'greta tort cocky sewn cult river' and have much higher entropy than a meaningful English phrase like 'mike is the smartest man alive' (which can be found on Google).
–
dr jimbob Jan 20 '12 at 17:22

It depends. Of course it matters what the circumstances are in detail.

If the attacker does not know that you're combining words, and does a brute-force attack over the whole space of possibilities, the blank enlarges the alphabet.

If the attacker knows or assumes that you're using a sentence or a list of words, but does not know whether you use blanks or not, the number of sentences with or without blanks is nearly twice as big as the number without blanks only, with bookcase / book case, as mentioned by dr jimbob, being the exception of course.

Twice sounds like a lot, but if you think about it, an attacker who runs a brute-force attack for 1 month will probably run it for 2 months too.

The question of sound is an interesting thought. I have another one:

If you enter your password in a text field of a GUI, maybe in the browser, a blank should always work. But if you happen to pass it from a shell, in an indirect process:

foologin -u JoeDoe -p you will never guess it

might not work, because shells tend to use blanks as separators. You would need to use

foologin -u JoeDoe -p "you will never guess it"
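The word splitting can be illustrated without a real shell. A minimal sketch using Python's shlex module, which follows POSIX-like quoting rules (foologin is the made-up command from above, and nothing is actually executed):

```python
import shlex

# Unquoted: every blank separates arguments, so -p only receives "you".
unquoted_args = shlex.split("foologin -u JoeDoe -p you will never guess it")

# Quoted: the whole passphrase arrives as a single argument.
quoted_args = shlex.split('foologin -u JoeDoe -p "you will never guess it"')

print(unquoted_args)
print(quoted_args)

# shlex.quote shows a safe way to embed the passphrase in a command line.
print(shlex.quote("you will never guess it"))
```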

My idea is that the error might help an attacker guess its cause, and therefore leak information about the way the password is built. There might be a rule requiring at least 8 characters, and "you" happens to have only 3, so people who use sentences without quoting might be typical victims of this error message.

Of course this is very speculative.

On the other hand, using blanks may help you memorize a very long passphrase. As a French user on a US site you may use a Spanish sentence, or insert foreign names like Kolmogorov and spelling errors you remember easily, which will be hard to plan for in brute-force attacks. :) Of course, you can do that without blanks too.