Date: Mon, 12 Mar 2018 16:19:53 -0400
From: Matt Weir <cweir@...edu>
To: passwords@...ts.openwall.com
Subject: Submitting Partial Password Hashes to Pwned Password Lookup
Background:
With over 500 million passwords in Troy Hunt’s Pwned Passwords V2,
Troy and Cloudflare have partnered to provide an API lookup so sites
don’t have to download the full list. Here is an example:
https://api.pwnedpasswords.com/range/21BD1
Users send the first five characters of their password hash to the
pwned passwords server, and the server returns a list of all the
hashes matching it. The user can then look to see if their full hash
is in the list. The most prominent tool to use this capability so far
is 1Password, though it’s starting to pop up in other places as well.
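To make the lookup flow concrete, here is a minimal Python sketch of the
client side. It assumes the hash is SHA-1 rendered as uppercase hex and
that the response body holds one SUFFIX:COUNT pair per line. The function
names split_hash and count_in_range are made up for illustration, and the
HTTP request itself is left out.

```python
import hashlib


def split_hash(password):
    """Return (prefix, suffix) of the uppercase hex SHA-1 of the password.

    Only the 5-character prefix would be sent to the range endpoint.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix, body):
    """Look for our suffix in a range response.

    Assumes each line of `body` looks like SUFFIX:COUNT. Returns the
    count for a match, or 0 if the suffix is not listed.
    """
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix.upper():
            return int(count)
    return 0
```

The point to notice is that the full hash never leaves the client. Only
the prefix does, and the comparison against the returned suffixes happens
locally.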
Reference links:
https://www.troyhunt.com/ive-just-launched-pwned-passwords-version-2/
https://www.troyhunt.com/i-wanna-go-fast-why-searching-through-500m-pwned-passwords-is-so-quick/
https://blog.cloudflare.com/validating-leaked-passwords-with-k-anonymity/
https://blog.agilebits.com/2018/02/22/finding-pwned-passwords-with-1password/
Concerns:
I just want to start by saying that I have the highest respect for
Troy, Cloudflare and 1Password. Full disclaimer, I’m a happy user of
HaveIBeenPwned and 1Password. As an additional disclaimer, in the past
I’ve argued for investigating the use of partial password hashes to
protect users from server compromises (link:
http://www.openwall.com/lists/crypt-dev/2012/12/12/3 ).
The question raised by this approach is, “what is the risk to
end users?” One way to grapple with risk is to use the “cyber attack
lifecycle” methodology. Breaking out the tuple that is required for a
successful online attack, (usernames, passwords, sites), the chance
that an adversary obtains all three by exploiting this service, when
they couldn’t obtain them in another way, is likely manageable. My
concern stems from the fact that this type of risk management approach
isn’t how the security has been framed. Walking through the
steps required for a successful exploitation can bring up additional
security checks. For example, looking at the problem this way raises
the question “Are all of the replies from the pwned passwords service
padded to the same size to protect against passive sniffing attacks?”
Instead the conversation has focused on the idea of k-anonymity.
To be blunt, I’ve become more and more convinced that k-anonymity is
not the way to model the security of this system. The best analogy I
can think of is the past use of Shannon entropy to measure password
strength. I’ll agree that longer passwords “on average” are stronger
than shorter passwords, but we’ve all seen Shannon entropy used to
justify some completely unfounded security claims. The same goes for
k-anonymity, where a focus on that property can potentially lead to
some undesired outcomes.
Justification:
I’ve talked with several other experts and we all struggled with how
to apply k-anonymity to this problem.
1) One could argue that k-anonymity is being applied to the pwned
passwords list. The username and sites associated with password hashes
have been stripped. Unfortunately, the raw lists are available for
most researchers/attackers if they know where to look. That’s how they
made it into the list in the first place.
2) Likewise, it doesn’t fit to say that k-anonymity is being applied
to the user submissions. Since the attacker knows which user (or
site/IP) is doing the submission, it isn’t anonymous. Yes,
their query leaves open collisions with multiple passwords. But this
better resembles a classic data leakage issue than an anonymity issue.
To put it another way, if the user were submitting the first character
of their plaintext password (don’t do this), we’d model this as a data
leakage/keyspace reduction problem rather than a k-anonymity problem.
This is a longer way of saying I'm doubtful that modeling the risk via
k-anonymity tells the defender the security risk of potentially
leaking data about the first K characters of the user’s password
hash. My gut says other methods have a better chance at this, such as
Honeywords, password modeling techniques like PCFGs, and, ironically
enough, Shannon entropy.
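As a rough illustration of the data leakage/keyspace reduction view, the
Python sketch below does two things. It works out how many bits a
5-character hex prefix reveals and how many known hashes share a prefix on
average, and it shows how an observer who sees a prefix could prune a
guess list. The corpus size is rounded from the "over 500 million" figure
above, SHA-1 is assumed as the hash, and the names bucket_stats and
surviving_guesses are hypothetical.

```python
import hashlib

PREFIX_HEX_CHARS = 5
CORPUS_SIZE = 500_000_000  # assumption: rounded from "over 500 million"


def bucket_stats(corpus_size=CORPUS_SIZE, prefix_len=PREFIX_HEX_CHARS):
    """Return (number of prefixes, bits revealed, average hashes per prefix)."""
    buckets = 16 ** prefix_len   # each hex character has 16 possible values
    bits_leaked = 4 * prefix_len  # each hex character carries 4 bits
    return buckets, bits_leaked, corpus_size / buckets


def surviving_guesses(guesses, observed_prefix):
    """Keep only the guesses whose SHA-1 starts with the observed prefix."""
    keep = []
    for guess in guesses:
        digest = hashlib.sha1(guess.encode("utf-8")).hexdigest().upper()
        if digest.startswith(observed_prefix.upper()):
            keep.append(guess)
    return keep
```

Under these assumptions there are 1,048,576 possible prefixes, each query
reveals 20 bits of the hash, and a prefix matches about 477 known hashes
on average. How much that helps an attacker depends on their guess list,
which is why a password-modeling view seems like a better fit than a
single k value.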
This e-mail has already grown too large as it is, but I’d be
interested in other people’s thoughts on this subject. Am I
misunderstanding the use of k-anonymity? How should we look at the
security of this approach?
Cheers,
Matt