Random pronounceable passwords

Simon Sapin, 2011-02-11

It is often advised that passwords should be long (8 characters
is considered good) and contain various kinds of characters (not just lower-case
letters). Such a password is stronger against dictionary or brute-force
attacks.

The strongest password would be a completely random one. Generating
one is quite easy:

    $ head -c 12 /dev/random | base64
    RU0aq07R9ZVK8LR1
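For comparison, here is a rough Python sketch of the same idea. The function name `random_password` is made up for this example, and `os.urandom()` asks the operating system for random bytes rather than reading `/dev/random` directly, so treat it as an approximation of the shell pipeline, not an exact equivalent.

```python
import base64
import os


def random_password(num_bytes=12):
    """Base64-encode num_bytes of OS randomness (12 bytes give 16 characters)."""
    return base64.b64encode(os.urandom(num_bytes)).decode('ascii')


print(random_password())
```

Every 3 random bytes become 4 base64 characters, which is why 12 bytes give a 16-character password.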

However, such a password is very hard to memorize (at least for me.) It is also
not so easy to type.

I find that I remember words (and people’s names!) much better if I know
how to pronounce them. They do not have to be pronounced out loud; I just
remember the sound they would make more than each individual letter. This
means that a “pronounceable” password would be much easier to memorize
(at least for me, again). Most words in most languages are easy to pronounce,
so we could just pick one, but that would be a very weak password against
dictionary attacks. We need something random.

So, what is pronounceable? Real words that are hard to pronounce often have
many consecutive consonants. We could just alternate consonants and vowels;
that’s easy enough:

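    import itertools
    import random


    def pronounceable_password():
        # I omitted some letters I don't like
        vowels = 'aiueo'
        consonants = 'bdfgjklmnprstvwxz'
        # Infinite generator: consonant, vowel, consonant, vowel, ...
        while True:
            yield random.choice(consonants)
            yield random.choice(vowels)


    # Take the first 14 characters of the infinite stream
    print(''.join(itertools.islice(pronounceable_password(), 14)))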

The Japanese language is made of a well-known set of syllables (sounds),
most of which consist of a consonant followed by a vowel when romanized
(written in the Latin alphabet). This is why Japanese is mostly easy to
pronounce for Westerners, but many foreign words are distorted in Japanese.
For example, they use the international word “taxi”, but it’s pronounced
more like ta-ku-shi.

Anyway. Using Markov chains,
we can generate text that “sounds” Japanese. Markov chains have many
interesting mathematical properties, but the basic idea is that they represent
a system that moves between states, where the next state depends only on the
current state and not on the past. In other words, for text, each character is
chosen with a probability that depends on the previous character.
To determine these probabilities, we count pairs of consecutive characters
in a sample text.

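    import random
    from collections import defaultdict


    def pairwise(iterable):
        # Not shown in the original; assumed to behave like the itertools
        # recipe: 'abc' -> ('a', 'b'), ('b', 'c')
        items = list(iterable)
        return zip(items, items[1:])


    class MarkovChain(object):
        def __init__(self, sample):
            # counts[current][next_state]: how often next_state follows current
            self.counts = counts = defaultdict(lambda: defaultdict(int))
            for current, next_state in pairwise(sample):
                counts[current][next_state] += 1
            self.totals = dict(
                (current, sum(next_counts.values()))
                for current, next_counts in counts.items())

        def next(self, state):
            nexts = self.counts[state].items()
            # Like random.choice() but with a different weight for each element
            rand = random.randrange(0, self.totals[state])
            for next_state, weight in nexts:
                if rand < weight:
                    return next_state
                rand -= weight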
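To make the idea concrete, here is a small self-contained sketch of walking such a chain to produce a password. The names `build_counts` and `generate` and the tiny sample string (a handful of romanized Japanese words run together) are placeholders chosen for this example; a real sample text would be much longer. It relies on `random.choices()`, which needs Python 3.6 or later.

```python
import random
from collections import defaultdict

# Placeholder sample: arigatou, konnichiwa, sayonara, sakura, kawaii, sushi, tempura
SAMPLE = 'arigatoukonnichiwasayonarasakurakawaiisushitempura'


def build_counts(sample):
    """counts[a][b] is the number of times character b follows character a."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(sample, sample[1:]):
        counts[current][following] += 1
    return counts


def generate(counts, length, rng=random):
    """Walk the chain for `length` characters, starting from a random state."""
    states = sorted(counts)
    state = rng.choice(states)
    result = [state]
    while len(result) < length:
        followers = counts.get(state)
        if followers:
            letters = sorted(followers)
            weights = [followers[letter] for letter in letters]
            # Weighted pick: frequent successors in the sample are more likely
            state = rng.choices(letters, weights=weights)[0]
        else:
            # Dead end (a character only seen at the very end): start over
            state = rng.choice(states)
        result.append(state)
    return ''.join(result)


counts = build_counts(SAMPLE)
print(generate(counts, 14))
```

Every pair of neighbouring characters in the output also appears somewhere in the sample, which is where the Japanese-sounding flavour comes from.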

This is subjective, but I like the passwords this generates better than the
plain alternating ones. (Could be because I’m learning Japanese.)
Maybe considering the two or more previous characters instead of just one would
yield better results. This is left as an exercise for the reader ;)
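For the curious, one possible take on that exercise is sketched below: the state becomes the last `order` characters instead of a single one. The function names and the short sample string are again placeholders for this example, and whether the results actually sound better is not something this sketch shows.

```python
import random
from collections import defaultdict

# Placeholder sample: a few romanized Japanese words run together
SAMPLE = 'arigatoukonnichiwasayonarasakurakawaiisushitempura'


def build_model(sample, order=2):
    """model[context][c] counts how often c follows the `order`-character context."""
    model = defaultdict(lambda: defaultdict(int))
    for i in range(len(sample) - order):
        model[sample[i:i + order]][sample[i + order]] += 1
    return model


def generate(model, length, order=2, rng=random):
    """Generate `length` characters, each chosen from the previous `order` ones."""
    contexts = sorted(model)
    text = rng.choice(contexts)
    while len(text) < length:
        followers = model.get(text[-order:])
        if followers:
            letters = sorted(followers)
            weights = [followers[letter] for letter in letters]
            text += rng.choices(letters, weights=weights)[0]
        else:
            # Context never seen with a successor: restart from a known context
            text += rng.choice(contexts)
    return text[:length]


model = build_model(SAMPLE, order=2)
print(generate(model, 14, order=2))
```

A longer context copies the sample more faithfully, so with a short sample the output has less variety; that trade-off is worth keeping in mind when picking `order`.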

This algorithm produces passwords with only lower-case letters, which is
generally considered a bad idea, but this is compensated by the length.
Using only lower-case letters also makes the password easier to type.

If we mix 26 lower-case letters, as many upper-case letters, ten digits and
about a dozen other symbols, that’s roughly 72 possible characters. Picking 8
of them at random gives 72^8 possible passwords, or about 49 bits of
entropy.
It is possible to calculate the exact entropy for a Markov chain, but the math
is non-trivial. I guesstimated that this pseudo-Japanese has about the same
entropy as alternating 15 or so consonants with 5 vowels. So for
14-character passwords, that’s 15^7 × 5^7 possible
passwords, or about 43 bits of entropy, which I decided was good enough for me.
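The arithmetic can be checked with a few lines of Python. The sketch below also includes one way to compute the exact entropy of a fixed-length string drawn from a first-order Markov chain, using the chain rule (entropy of the first character plus the average entropy of each following character given the previous one). The function names are made up for this example, and the transition probabilities are assumed to be given as plain dictionaries.

```python
import math


def uniform_bits(alphabet_size, length):
    """Entropy in bits of `length` symbols drawn uniformly and independently."""
    return length * math.log2(alphabet_size)


def distribution_bits(dist):
    """Shannon entropy in bits of a {outcome: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def markov_bits(start, transitions, length):
    """Entropy in bits of `length` characters from a first-order Markov chain.

    start: {state: probability} for the first character.
    transitions: {state: {next_state: probability}}.
    """
    total = distribution_bits(start)
    current = dict(start)
    for _ in range(length - 1):
        # Chain rule: add the average entropy of the next character
        total += sum(p * distribution_bits(transitions[s])
                     for s, p in current.items())
        # Push the distribution over states one step forward
        following = {}
        for s, p in current.items():
            for t, q in transitions[s].items():
                following[t] = following.get(t, 0.0) + p * q
        current = following
    return total


print(round(uniform_bits(72, 8), 1))                       # 72^8: prints 49.4
print(round(uniform_bits(15, 7) + uniform_bits(5, 7), 1))  # 15^7 x 5^7: prints 43.6

# Cross-check: strict consonant/vowel alternation is itself a Markov chain
consonants, vowels = 'bdfgjklmnprstvwxz', 'aiueo'
to_vowel = {v: 1 / len(vowels) for v in vowels}
to_consonant = {c: 1 / len(consonants) for c in consonants}
transitions = {c: to_vowel for c in consonants}
transitions.update({v: to_consonant for v in vowels})
print(round(markov_bits(to_consonant, transitions, 14), 1))
```

The last line uses the 17 consonants from the alternating generator above rather than the rounded figure of 15, so it comes out a little higher than 43 bits. Feeding `markov_bits` the probabilities estimated from a real sample text would give the figure for the pseudo-Japanese passwords.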