Understanding Password Complexity

What does it mean to have a “strong” password? How can you be sure your password can’t be compromised? How do websites know when your password is complex enough?

These are questions I started asking myself at the beginning of February. And while it may sound trivial, I wanted to understand how those strength bars know how good my password is.

So let’s start with entropy. Entropy, in the information theory sense, measures how unpredictable a piece of text is, based on the character set it draws from and its length. To answer these questions, we’ll look at password entropy, which measures the strength of the process that generates a password. In other words, the result of our calculation tells us how hard a password is to guess under certain assumptions. This will all make sense once we look at how password entropy is calculated.

Let’s Talk About Math

Now let’s build out our function. We’ll need two variables: one for the length of our password (denoted as L) and another for the number of symbols in our dictionary (denoted as N). Dictionary, in this sense, means the character set used to generate the password, such as the digits (0-9) for a PIN or the typical ASCII printable characters (a-z, A-Z, 0-9, and special characters).

So, to calculate the number of possible passwords, we’ll need to raise N to the power of L, or N^L. This is the foundation of our equation. (N.B.: If we increase L or N, we’ll increase the number of possible passwords, which will in turn increase the strength of the passwords we generate. This will be demonstrated a bit later.)

Assuming that each symbol within the password is produced independently and with equal probability, we’ll take the binary logarithm of our foundation, since we want the entropy H measured in bits. So:

$$H = \log_2\left(N^L\right)$$

With a little bit of manipulation, we get the following result:

$$H = L \log_2 N$$

Taking a step back, our equation is a special case of Shannon’s entropy, which sums the contribution of every possible symbol, weighted by how likely that symbol is to appear. When all N symbols are equally likely, each position in the password contributes log2(N) bits, and L independent positions contribute L times that. In short, Shannon’s entropy is what our password entropy equation grew out of.
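To make the equation concrete, here is a minimal Python sketch. The function names are my own, and it assumes every symbol is drawn independently and uniformly from the dictionary, exactly as the math above does. The second function is the general Shannon form, included only to show that it collapses to log2(N) per symbol when all symbols are equally likely.

```python
import math


def password_entropy_bits(length: int, dictionary_size: int) -> float:
    """Entropy H = L * log2(N) for L symbols drawn uniformly from N choices."""
    if length < 0 or dictionary_size < 1:
        raise ValueError("length must be >= 0 and dictionary_size must be >= 1")
    return length * math.log2(dictionary_size)


def shannon_entropy_bits(probabilities) -> float:
    """General Shannon entropy of one symbol: -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# A 4-digit PIN: L = 4, N = 10
print(f"4-digit PIN: {password_entropy_bits(4, 10):.2f} bits")

# With 10 equally likely digits, Shannon's formula gives log2(10) per digit
print(f"Per digit:   {shannon_entropy_bits([0.1] * 10):.2f} bits")
```

Note that doubling the length doubles the entropy, while doubling the dictionary size only adds one bit per symbol.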

How Easily Can Your Password be Guessed?

Well, it depends on the dictionary your password uses. So let’s put our newfound powers to use!

Let’s take, for example, your normal password requirements of 8 to 16 characters, with at least one capital letter, a number, and a special character. What’s the max entropy we can generate? Time to crank through our equation. Our dictionary set is 95 characters, and we’ll assume the maximum password length and that the characters are generated independently of each other:

$$H = 16 \log_2 95 \approx 105.1$$

Hence, the entropy of this password is about 105 bits. Well, that’s all well and good, but what does this result mean? I’m glad you asked.

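To interpret the result, we’ll use the following equation (you’ll notice it’s the inverse of our password entropy equation), which turns entropy back into the number of possible passwords:

$$\text{possible passwords} = 2^H$$

In 2012, a password-cracking expert unveiled a computer that could cycle through 350 billion guesses per second. So, as an extreme case, let’s use this number to see how long it would take to try every possibility:

$$\frac{2^{105}}{3.5 \times 10^{11}\ \text{guesses per second}} \approx 1.16 \times 10^{20}\ \text{seconds} \approx 3.7 \times 10^{12}\ \text{years}$$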

Geez! 105 bits of entropy seems relatively safe. Well, this all assumes the attack is happening offline or against something encrypted locally (most server-side security prevents repeated failed guesses), and that there isn’t a faster password cracker out there by now. Notice that the entropy itself doesn’t depend on the attacker. What changes is the time it buys you, which shrinks as the guesses per second go up.
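The arithmetic behind that estimate can be sketched in a few lines of Python. The 350 billion guesses per second comes from the text above. The function name and the choice of a 365.25-day year are my own assumptions. The result is the time to try every possible password, so an attacker would need about half that on average.

```python
import math

GUESSES_PER_SECOND = 350e9             # the 2012 figure quoted above
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: 365.25-day year


def years_to_exhaust(entropy_bits: float,
                     guesses_per_second: float = GUESSES_PER_SECOND) -> float:
    """Years needed to try all 2**H passwords at a fixed guess rate."""
    total_guesses = 2.0 ** entropy_bits
    return total_guesses / guesses_per_second / SECONDS_PER_YEAR


# 16 characters drawn from a 95-character dictionary
bits = 16 * math.log2(95)
print(f"{bits:.1f} bits: {years_to_exhaust(bits):.2e} years")

# The same attacker against smaller amounts of entropy
for h in (40, 64, 80):
    print(f"{h} bits: {years_to_exhaust(h):.2e} years")
```

Every extra bit of entropy doubles the attacker’s work, which is why the numbers climb so quickly.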

So how much entropy is recommended for everyday use? Well, that depends on the application and its use. There seems to be a general consensus around 80 bits of entropy for everyday use. Again, the stronger the better here. It should also go without saying that known weak passwords should be excluded from whatever your password generator produces.

How Can I Get a Really High Entropy Password?

Use a big dictionary. Remember that the more entries your dictionary has, the higher your potential entropy.

One of the biggest dictionaries is Diceware, with 7776 short words. With this method, you can generate a password yourself with nothing but a die: five rolls pick one word from Diceware’s dictionary (e.g., a roll of 16656 corresponds to claw), and you repeat the process for each word you want. These words are short so that you can easily remember them. The XKCD comic on password strength eloquently makes the same point.
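Here is a rough Python sketch of the Diceware idea. It assumes a sorted word list with exactly 7776 entries (6^5, one per possible five-die roll). The real Diceware list is not included, so `PLACEHOLDER_WORDS` is a made-up stand-in and all function names are hypothetical. The rolls come from Python’s `secrets` module rather than a physical die.

```python
import math
import secrets

DICE_PER_WORD = 5
WORDLIST_SIZE = 6 ** DICE_PER_WORD  # 7776

# Stand-in for a real word list (assumption: replace with actual words)
PLACEHOLDER_WORDS = [f"word{i:04d}" for i in range(WORDLIST_SIZE)]


def roll_to_index(roll: str) -> int:
    """Map a roll such as '16656' to a 0-based position in a sorted list."""
    if len(roll) != DICE_PER_WORD or any(c not in "123456" for c in roll):
        raise ValueError("a roll is five digits, each between 1 and 6")
    index = 0
    for c in roll:
        index = index * 6 + (int(c) - 1)  # treat the roll as a base-6 number
    return index


def random_roll() -> str:
    """Simulate five fair die rolls with a cryptographic random source."""
    return "".join(str(secrets.randbelow(6) + 1) for _ in range(DICE_PER_WORD))


def passphrase(wordlist, n_words: int) -> str:
    """Pick n_words independent words, one five-die roll per word."""
    if len(wordlist) != WORDLIST_SIZE:
        raise ValueError(f"word list must have {WORDLIST_SIZE} entries")
    return " ".join(wordlist[roll_to_index(random_roll())] for _ in range(n_words))


def passphrase_entropy_bits(n_words: int) -> float:
    """Each word contributes log2(7776) bits, about 12.9."""
    return n_words * math.log2(WORDLIST_SIZE)


print(passphrase(PLACEHOLDER_WORDS, 6))
print(f"{passphrase_entropy_bits(6):.1f} bits")
```

The entropy figure only holds if every word is chosen at random. Picking words you like from the list defeats the purpose.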

Let’s take a moment though to create a table of the number of symbols (characters, or words in the case of Diceware) needed to achieve a certain amount of entropy [1]:

| Desired Entropy | Numbers (0-9) | Hexadecimal (0-9, A-F) | Case insensitive Latin alphabet (a-z or A-Z) | Case insensitive alphanumeric (a-z or A-Z, 0-9) | Case sensitive Latin alphabet (a-z, A-Z) | Case sensitive alphanumeric (a-z, A-Z, 0-9) | All ASCII printable characters (without space) | All extended ASCII printable characters | Diceware word list |
|---|---|---|---|---|---|---|---|---|---|
| 8 bits | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 |
| 32 bits | 10 | 8 | 7 | 7 | 6 | 6 | 5 | 5 | 3 |
| 40 bits | 13 | 10 | 9 | 8 | 8 | 7 | 7 | 6 | 4 |
| 64 bits | 20 | 16 | 14 | 13 | 12 | 11 | 10 | 9 | 5 |
| 80 bits | 25 | 20 | 18 | 16 | 15 | 14 | 13 | 11 | 7 |
| 96 bits | 29 | 24 | 21 | 19 | 17 | 17 | 15 | 13 | 8 |
| 128 bits | 39 | 32 | 28 | 25 | 23 | 22 | 20 | 17 | 10 |
| 160 bits | 49 | 40 | 35 | 31 | 29 | 27 | 25 | 21 | 13 |
| 192 bits | 58 | 48 | 41 | 38 | 34 | 33 | 30 | 25 | 15 |
| 224 bits | 68 | 56 | 48 | 44 | 40 | 38 | 35 | 29 | 18 |
| 256 bits | 78 | 64 | 55 | 50 | 45 | 43 | 40 | 33 | 20 |

Clearly, the winner here, with the fewest symbols needed, is Diceware (though its symbols are whole words rather than characters), with the ASCII character sets coming in second, as one would expect.
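A table like this can be recomputed with a short Python sketch: for each dictionary, find the smallest number of symbols L such that N^L is at least 2^H. The function name is my own. The dictionary sizes are the usual ones for each character set, and the 218 used for extended ASCII printable characters is an assumption on my part. Integer arithmetic is used so that rounding never pushes a value the wrong way.

```python
DICTIONARY_SIZES = {
    "Numbers (0-9)": 10,
    "Hexadecimal": 16,
    "Case insensitive Latin alphabet": 26,
    "Case insensitive alphanumeric": 36,
    "Case sensitive Latin alphabet": 52,
    "Case sensitive alphanumeric": 62,
    "ASCII printable (without space)": 94,
    "Extended ASCII printable": 218,  # assumption: size not given in the text
    "Diceware word list": 7776,
}


def symbols_needed(target_bits: int, dictionary_size: int) -> int:
    """Smallest L such that dictionary_size**L >= 2**target_bits."""
    if dictionary_size < 2:
        raise ValueError("dictionary_size must be at least 2")
    target = 2 ** target_bits
    count, space = 0, 1
    while space < target:
        space *= dictionary_size  # exact integer arithmetic, no rounding
        count += 1
    return count


for bits in (40, 80, 128):
    row = ", ".join(f"{name}: {symbols_needed(bits, n)}"
                    for name, n in DICTIONARY_SIZES.items())
    print(f"{bits} bits -> {row}")
```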

Measuring entropy this way relies on a randomly generated string, and people are notoriously bad at generating random passwords. Our “randomness” will most likely come from things we use most often, like vowels. One analysis showed that across 3 million eight-character passwords, the letter “e” was used over 1.5 million times, whereas the letter “f” was only used 250,000 times. Ideally, if characters were evenly distributed throughout these passwords, each character would only be used around 900,000 times.
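One way to see the effect of that skew is to measure the per-character Shannon entropy of a pile of passwords directly from the character counts. The sketch below uses made-up sample strings, not the data from the analysis mentioned above, and the function name is hypothetical. It also treats characters as independent, which flatters human-chosen passwords.

```python
import math
from collections import Counter


def per_character_entropy_bits(passwords) -> float:
    """Empirical Shannon entropy per character, from observed frequencies."""
    counts = Counter("".join(passwords))
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical samples: evenly spread letters vs. letters heavy on "e"
even_sample = ["abcd", "efgh"]
skewed_sample = ["eeee", "eeea"]

print(f"even:   {per_character_entropy_bits(even_sample):.2f} bits per character")
print(f"skewed: {per_character_entropy_bits(skewed_sample):.2f} bits per character")
```

The more lopsided the character counts, the further the measured value falls below the log2(N) that the length-times-dictionary formula assumes.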