Update. The documentation as well as this post have been updated recently.

Introduction

Password Score is a JavaScript library designed to give a realistic estimate of the strength of an arbitrary password. The strength of a password is measured in terms of entropy. To estimate the strength of a password, the library may rely on several data sources: dictionaries, common passwords, keyboards, first and last names and much more.

Password Score can be found on GitHub and includes documentation as well as a demo to score arbitrary passwords - try it out!

Entropy

Entropy is a term used in information theory. Usually it is used to describe the uncertainty in a random variable - i.e. a random experiment. In the context of this article we will use entropy as a measure for the strength of a password. Given a password $p$ with length $|p|$, we will define the entropy as $\log(n^{|p|})$ where $n$ is the number of possible characters and $\log$ the base-2 logarithm.

Given the entropy $H(p) := \log(n^{|p|})$ of the password $p$, we can calculate the maximum number of attempts needed for brute-forcing the password: $2^{H(p)} = 2^{\log(n^{|p|})} = n^{|p|}$.

Naive Approach

As described above, the naive approach to scoring a password, i.e. estimating its strength, is to calculate its entropy. This is what the following code snippet does:
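A minimal sketch of the naive computation, assuming the alphabet size $n$ is guessed from the character classes that occur in the password. The names naiveEntropy and guessAlphabetSize are hypothetical and not part of Password Score:

```javascript
// Hypothetical helper: guess the alphabet size n from the character
// classes used. The class sizes are assumptions (printable ASCII only).
function guessAlphabetSize(password) {
    var size = 0;
    if (/[a-z]/.test(password)) size += 26;
    if (/[A-Z]/.test(password)) size += 26;
    if (/[0-9]/.test(password)) size += 10;
    if (/[^a-zA-Z0-9]/.test(password)) size += 33; // ASCII symbols incl. space
    return size;
}

// Naive entropy H(p) = log2(n^|p|), computed as |p| * log2(n) so that
// long passwords do not overflow the intermediate power.
function naiveEntropy(password, alphabetSize) {
    if (password.length === 0) {
        return 0;
    }
    return password.length * Math.log2(alphabetSize);
}

var example = 'david1992';
// 9 characters from an alphabet of 36 symbols: about 46.5 bits
console.log(naiveEntropy(example, guessAlphabetSize(example)));
```

Every character contributes the same number of bits here, regardless of what precedes it.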

The problem with this approach is the user: the user will not choose every character of the password independently of the previous ones. This means that, given a character sequence $p_1 \ldots p_k$, some characters are more likely to follow than others. As an example, consider the string helloworl. The probability of the next character being a d is pretty high, especially if the user is familiar with programming languages and the English language.

Assumption

Given the password $p$, we assume that we know the form of the password, that is, how the password is made up. For example, given the password david1992, we know that the first $5$ characters make up a first name while the following $4$ characters make up a year. With this knowledge the password is easily brute-forced because we would simply try all first names in combination with all years after 1900. The entropy is then given by the base-2 logarithm of the product of the number of first names and the number of years after 1900. Of course there are a lot of possible first names, but we could try the most common ones first and crack this password within minutes when using multiple cores in parallel.
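To make the gap concrete, here is a small sketch comparing the structured estimate with the naive one for david1992. The list sizes (5000 first names, 119 years) are illustrative assumptions, and patternEntropy is a hypothetical helper, not a library function:

```javascript
// Entropy of a password whose form is known: the attacker tries every
// combination of choices, so the counts multiply and their logarithms add.
function patternEntropy(choicesPerPattern) {
    return choicesPerPattern.reduce(function (sum, choices) {
        return sum + Math.log2(choices);
    }, 0);
}

var firstNames = 5000; // assumed size of a first-name list
var years = 119;       // assumed range: 1900 through 2018

// log2(5000 * 119), roughly 19 bits
var structured = patternEntropy([firstNames, years]);
// 9 characters drawn from a-z and 0-9, roughly 46.5 bits
var naive = 9 * Math.log2(36);

console.log(structured, naive);
```

Under these assumptions the structured estimate is less than half of the naive one, which corresponds to a search space that is smaller by many orders of magnitude.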

Patterns

Knowing the form of the password $p$, we search $p$ for patterns. In the above example the first pattern is a first name and the second one is a year (or, more generally, a date). In the course of this article we will come across different kinds of patterns such as dates, English words, German words or country names.

Password Score will search a password for patterns. Using these patterns, Password Score tries to give a more realistic estimation of the password strength. The following patterns are considered:

Dictionary words: A dictionary may be any collection of words, or of strings in general. Most commonly we will use an English dictionary (or German, or whatever language is used). But Password Score treats every dictionary the same way. Thus, we may also use a list of common passwords, first names, last names, city names or country names.

Sequences: Sequences are substrings of the alphabet or 0123456789.

Repetitions: Repetitions of single characters as well as repetitions of a group of characters easily increase the password's length - but not its strength.

Dates: Unfortunately, dates may be of many formats. A date may only be a year or consist of a year, a month and a day in some local format.

Keyboard patterns: On the list of the most common passwords, qwerty will be in the top 100 because it is easy to remember on the keyboard. Using an adjacency matrix of an arbitrary keyboard, Password Score is able to identify these patterns.
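As an illustration of how one of these patterns might be located, the following sketch finds repetitions with a back-referencing regular expression. The function findRepetitions and the fields it returns are hypothetical; the library may detect repetitions differently:

```javascript
// Find substrings that consist of a unit repeated at least twice,
// e.g. 'aaaa' (unit 'a') or 'abcabc' (unit 'abc').
function findRepetitions(password) {
    // (.+?) captures the shortest possible unit, \1+ requires that the
    // same unit follows at least once more.
    var pattern = /(.+?)\1+/g;
    var matches = [];
    var m;
    while ((m = pattern.exec(password)) !== null) {
        matches.push({
            start: m.index,
            end: m.index + m[0].length, // exclusive end index
            unit: m[1],
            count: m[0].length / m[1].length
        });
    }
    return matches;
}

console.log(findRepetitions('abcabcabc1111'));
```

Each match records where the repetition sits in the password, which is what a later scoring step would need to combine it with other patterns.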

Idea

After collecting all the patterns, Password Score scores them. As patterns may overlap, Password Score tries to minimize the overall score of the password by dividing the whole password into disjoint (that is, non-overlapping) patterns and taking the sum of their individual scores. As a result, the strength of the password will be underestimated. But as we want to encourage the user to choose a strong password, this can only be seen as an advantage of this method.
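One way to find such a minimal division is dynamic programming over the password positions. The sketch below is an assumption about how this could be done, not the library's actual implementation; in particular, charging characters that no pattern covers a fixed bitsPerChar is a choice made for this example:

```javascript
// matches: array of { start, end, entropy } with an exclusive end index.
// Returns the minimum total entropy over all ways of covering the
// password with non-overlapping matches and single random characters.
function minimumEntropy(length, matches, bitsPerChar) {
    // best[i] is the minimum entropy of the first i characters
    var best = [0];
    for (var i = 1; i <= length; i++) {
        // Option 1: treat the character at index i - 1 as random
        best[i] = best[i - 1] + bitsPerChar;
        // Option 2: let a pattern end exactly at position i
        matches.forEach(function (m) {
            if (m.end === i) {
                best[i] = Math.min(best[i], best[m.start] + m.entropy);
            }
        });
    }
    return best[length];
}

// 'david1992' with three overlapping candidate patterns (made-up scores)
var candidates = [
    { start: 0, end: 5, entropy: 12 }, // 'david' as a first name
    { start: 5, end: 9, entropy: 7 },  // '1992' as a year
    { start: 3, end: 7, entropy: 30 }  // an overlapping, expensive match
];
console.log(minimumEntropy(9, candidates, Math.log2(36)));
```

With these made-up scores the first two patterns are chosen and the overlapping third one is ignored, because any division that uses it costs more.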

Basic Usage

Password Score has no dependencies; however, the example provided with the documentation uses jQuery for visualization. Include Password Score as follows:

password-score-options.js provides the default options of Password Score, including several dictionaries. The sections below explain the available configuration options in detail; however, the default options are often sufficient:

English and German dictionaries;

Lists of last names, female first names and male first names;

Lists of countries (English and German);

A list of cities;

All dictionaries are checked for leet speak;

Password Score searches for dates, sequences and repetitions.

To use Password Score, simply fetch a password and create a new Score():

var password = 'qwerty';
var score = new Score(password);

To get the actual entropy score using the default options, use the calculateEntropyScore() method:

console.log(score.calculateEntropyScore(options));

This method accepts two parameters: a list of options, and whether these options should be appended to the default options or replace them instead. After calling calculateEntropyScore(), Password Score will store some of the results in score.cache:

console.log(score.calculateEntropyScore(options));
// These are the patterns which contribute to the minimum entropy:
console.log(score.cache.minimumMatches);

Dictionaries

As mentioned above, Password Score can be configured to match against custom dictionaries the following way:

A dictionary is an object where the keys represent the words and the corresponding values represent scoring values used to calculate the entropy of the word when it is considered as a pattern within a password. Besides the usual dictionaries for English or German, Password Score benefits from lists of common passwords, first and last names as well as country or city names:

The scoring value will be used to determine the entropy by taking the base-2 logarithm. Therefore, the scoring value can be used to differentiate between common patterns and less common patterns - password is the most common password whereas d9ebk7 is not that common.
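The following sketch shows how a dictionary of this shape could be matched against a password. The dictionary contents, the function findDictionaryMatches and the lowercasing step are assumptions made for illustration and are not taken from the library:

```javascript
// Illustrative dictionary: keys are words, values are scoring values
// (here a made-up rank, 1 being the most common).
var commonPasswords = {
    'password': 1,
    'qwerty': 4,
    'dragon': 10
};

// Check every substring of the password against the dictionary and
// score each hit with the base-2 logarithm of its scoring value.
function findDictionaryMatches(password, dictionary) {
    var lower = password.toLowerCase(); // assumption: case is ignored
    var matches = [];
    for (var start = 0; start < lower.length; start++) {
        for (var end = start + 1; end <= lower.length; end++) {
            var word = lower.slice(start, end);
            if (Object.prototype.hasOwnProperty.call(dictionary, word)) {
                matches.push({
                    start: start,
                    end: end, // exclusive end index
                    word: word,
                    entropy: Math.log2(dictionary[word])
                });
            }
        }
    }
    return matches;
}

console.log(findDictionaryMatches('myPassword1', commonPasswords));
```

A scoring value of 1 yields an entropy of 0 bits, so the most common entry contributes nothing to the overall strength.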

Leet Speak

Using a leet speak translation table, Password Score can search dictionaries for words which occur in leet speak within the password. This translation table looks like this:

Given a word in leet speak, Password Score generates a list of all possible substitutions using the translation table. All possible substitutions are matched against a given dictionary. To use this feature for a specific dictionary use:
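A minimal sketch of generating the possible substitutions. The table below is a small illustrative assumption, not the table shipped with Password Score, and leetCandidates is a hypothetical helper:

```javascript
// Assumed translation table: leet character -> letters it may stand for.
var leetTable = {
    '4': ['a'], '@': ['a'],
    '3': ['e'],
    '1': ['i', 'l'],
    '0': ['o'],
    '5': ['s'], '$': ['s'],
    '7': ['t']
};

// Generate every string obtained by keeping or replacing each character.
// The number of candidates grows exponentially with the number of
// ambiguous characters, so this only suits short words.
function leetCandidates(word) {
    var candidates = [''];
    word.split('').forEach(function (ch) {
        // A character may stand for itself or for any mapped letter
        var options = [ch].concat(leetTable[ch] || []);
        var next = [];
        candidates.forEach(function (prefix) {
            options.forEach(function (option) {
                next.push(prefix + option);
            });
        });
        candidates = next;
    });
    return candidates;
}

console.log(leetCandidates('p4ss')); // [ 'p4ss', 'pass' ]
```

Each candidate can then be looked up in a dictionary like any other word.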

Keyboard Patterns

qwerty will always be within the top ten of the most common passwords because it is easy to remember when using a QWERTY keyboard. A keyboard pattern is defined as a path on the keyboard when the keyboard is considered as an undirected graph. The QWERTY and QWERTZ keyboards are already provided. Password Score can be configured to search for keyboard patterns the following way:
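To illustrate the graph view, the sketch below builds an undirected adjacency map for the three letter rows of a QWERTY keyboard and checks whether a string is a path in it. The row stagger model (each key touches the two keys above it) and both function names are assumptions for this example; the keyboards provided with the library may be represented differently:

```javascript
// Build an undirected adjacency map from keyboard rows. Each key is
// connected to its right neighbour and to the two keys above it.
function buildAdjacency(rows) {
    var graph = {};
    function connect(a, b) {
        if (a === undefined || b === undefined) {
            return; // neighbour falls off the edge of the keyboard
        }
        (graph[a] = graph[a] || {})[b] = true;
        (graph[b] = graph[b] || {})[a] = true;
    }
    rows.forEach(function (row, r) {
        for (var i = 0; i < row.length; i++) {
            connect(row[i], row[i + 1]);
            if (r > 0) {
                connect(row[i], rows[r - 1][i]);     // upper left key
                connect(row[i], rows[r - 1][i + 1]); // upper right key
            }
        }
    });
    return graph;
}

// A keyboard pattern is a path: consecutive characters must be adjacent.
function isKeyboardPath(text, graph) {
    if (text.length < 2) {
        return false;
    }
    for (var i = 1; i < text.length; i++) {
        var neighbours = graph[text[i - 1]];
        if (!neighbours || !neighbours[text[i]]) {
            return false;
        }
    }
    return true;
}

var qwerty = buildAdjacency(['qwertyuiop', 'asdfghjkl', 'zxcvbnm']);
console.log(isKeyboardPath('qwerty', qwerty)); // true
console.log(isKeyboardPath('hello', qwerty));  // false
```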

Dates

The number of possible dates depends on the format. In general we take $31 \cdot 12 \cdot y$, where $y$ is the number of years being considered. If $y$ is chosen too large, the result no longer differs from that of a random eight (or six) digit number. Therefore, choosing $y$ between $100$ and $200$ is a realistic choice.
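A short sketch of this estimate, assuming at most 31 days per month and 12 months per year. Both function names are hypothetical:

```javascript
// Entropy of a date when `years` different years are considered.
// Assumption: 31 possible days and 12 possible months.
function dateEntropy(years) {
    return Math.log2(31 * 12 * years);
}

// Entropy of a string of `digits` independent random decimal digits.
function randomDigitsEntropy(digits) {
    return digits * Math.log2(10);
}

console.log(dateEntropy(100));       // about 15.2 bits
console.log(dateEntropy(200));       // about 16.2 bits
console.log(randomDigitsEntropy(8)); // about 26.6 bits
```

With 100 to 200 years a date scores roughly ten bits below a random eight digit number, which reflects how much easier it is to guess.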

Sequences and Repetitions

Password Score is able to search for number sequences and substrings of the alphabet (and does so by default):
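Independent of the library's configuration, the detection itself can be sketched as follows. The function findSequences, the minimum length of three and the lowercasing step are assumptions made for this example:

```javascript
var SEQUENCES = ['abcdefghijklmnopqrstuvwxyz', '0123456789'];

// True if `b` directly follows `a` in one of the reference sequences.
function follows(a, b) {
    return SEQUENCES.some(function (sequence) {
        var index = sequence.indexOf(a);
        return index !== -1 && sequence[index + 1] === b;
    });
}

// Find maximal ascending runs such as 'abc' or '1234'.
function findSequences(password, minLength) {
    minLength = minLength || 3; // assumed default
    var lower = password.toLowerCase();
    var matches = [];
    var start = 0;
    while (start < lower.length) {
        var end = start + 1;
        // Extend the run while each character follows its predecessor
        while (end < lower.length && follows(lower[end - 1], lower[end])) {
            end++;
        }
        if (end - start >= minLength) {
            matches.push({ start: start, end: end, text: lower.slice(start, end) });
        }
        start = end;
    }
    return matches;
}

console.log(findSequences('abc123xyz9'));
```

This sketch only finds ascending runs; descending ones such as 321 would need the reversed reference strings as well.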


18th October 2018, David Stutz
