How Broken is SHA-1?

05 Jan 2014

Back in February 2005, SHA-1 was broken. The core of
what "broken" means in this context is described very well by
Bruce Schneier in his post announcing the attack:

If you hashed 280 random messages, you'd find one pair
that hashed to the same value. That's the "brute force" way of
finding collisions, and it depends solely on the length of the hash
value. "Breaking" the hash function means being able to find
collisions faster than that. And that's what the Chinese did. They
can find collisions in SHA-1 in 269 calculations, about
2,000 times faster than brute force.

This was a major concern to me. It turns out that it's best to avoid
SHA-1 in a variety of contexts,
including cryptographic keys. Nevertheless, a bunch of
systems that rely on cryptographic security use SHA-1, including
checksums for Git objects and OpenPGP key fingerprints.

Even prior to the 2005 attack, Schneier pointed out that "It's time
for us all to migrate away from SHA-1." It was on this basis that I
almost switched to using SHA224 in my latest software. Almost.

To understand why I stuck with SHA-1, let's take a look at how
cryptographic hashes can be attacked.

Collision Attacks

The attack documented in 2005, like most hash attacks, was a
collision attack. Schneier didn't use that phrase when describing
it, but that's what he describes when he talks about finding a pair of
inputs that hash to the same value.

Collision attacks are the most common kind of attack against
cryptographic hashes because they don't restrict the set of inputs over
which the search for such a pair might be conducted. The reason
collision attacks are relatively easy is because any pair of inputs
that hash to the same value are acceptable; no restrictions are placed
on what those inputs are, or what the value of the hash is.

Aside: The Birthday Problem

A related concept is the birthday attack, which is related to the
birthday problem. The birthday problem is simple: how many people can
gather at a party before it becomes probable that two share the same
birthday?

You can find the number fairly easily by calculating the likelihood
that, as the size of the party grows, two people won't share the
same birthday. Your first guest can have any birthday at all without
fear of sharing that birthday with another guest. But your second
guest can have any day except the day of the first guest's birthday,
leaving 364/365 possibilities. The third guest only has 363/365
possibilities, and so on. In imperative Python 3 code, it looks
something like this:

# The number of guestsguestcount=1# The chance of all guests having unique birthdaysuniquechance=1# Keep adding guests until the chance of them all being# unique is less that 50%whileuniquechance>0.5:uniquechance*=(365-guestcount)/365guestcount+=1print(guestcount)

This code returns 23, meaning that once you have 23 guests at your
party, it's more than 50% likely that two guests will share a
birthday. The number is surprisingly small because we haven't
specified any day at which the collision will occur, in the same way
that the 2005 attack on SHA-1 doesn't restrict collisions to any
particular hash.

Preimage Attacks

Unlike collision attacks (and the birthday attack), for many practical
applications of hashing, the value of the input and/or the hash
matters. When the input to the hash function constrains the search
for a collision, the hash must be broken using a preimage attack.
There are two types of preimage attack:

(Preimage Attack) Given a hash, generate an input that hashes to
it.

(Second Preimage Attack) Given an input, generate a second input
that hashes to the same value as the given input.

A preimage attack only requires that the attacker find a single input
that hashes to a given hash. Even so, it is a substantially harder
attack to mount than a collision attack.

A second preimage attack adds an additional constraint: given an input
and its hash, find a second input that hashes to the same value.
Because there are more constraints on the solution, this attack is
even more difficult.

So how broken is SHA-1?

So why did I stick with SHA-1? Well, the software I'm working on is
concerned with identifying public keys using a digest of the key
itself. When used in this context, such a digest is called a
fingerprint.

OpenPGP uses a SHA-1-based hash to generate the fingerprint for a
public key. One of the worst-case scenarios is that a user, given the
fingerprint for key, attempts to retrieve it from a remote source and
receives the wrong key, but with a fingerprint that matches, allowing
an attacker to read encrypted messages intended for someone else.
Since the attacker is given both a public key and its fingerprint and
needs to generate another public key that has the same fingerprint,
such an attack is a second preimage attack.

So, while SHA-1 is technically "broken" because of the 2005 collision
attack that reduced the search space by a factor of 2000, there are
still no known feasible preimage attacks (or second preimage attacks)
on SHA-1 that weaken it when it is used for key fingerprinting. I was
particularly hesitant to move away from SHA-1 for key fingerprinting
using OpenPGP because it is part of the standard, and I don't have any
interest in rewriting well-established cryptographic routines. I'd
rather reuse existing standards with an eye toward making them
accessible in new contexts. So, for now, SHA-1 it is.