Wednesday, November 5, 2014

Colliding MD5 Images go crazy

I nearly choked on my cornflakes yesterday morning when I looked at the blog stats and saw several thousand page views. The referring URLs showed that a link had been posted on Hacker News, a site I browse most days. It was on the front page at number five, and anything on the front page is BIG. Later on it got posted to a couple of reddit.com subreddits. I was getting a lot of traffic, all quite unexpected.

Originally I wrote the blog post for a couple of reasons: firstly to aid my own memory of what I had done and how to repeat it, and secondly to explain to a few of my friends how I made the images, which I had already put on Twitter. I didn't really see it as Hacker News material; I thought anyone with an interest in hash functions would have done this already.

Here are my answers to a couple of things people have said.

This isn't new

Agreed, it certainly is not. All the papers and software I referenced and used are at least four years old. The original blockbuster attack by Wang is ten years old. Even before that, in 1996, just four years after its release, MD5 was known to be weak and was not recommended.

My guess is that two things caught people's attention:

Images are much easier to comprehend as collisions than random byte strings.

The fact I was clearly a rank amateur with nothing more than an AWS account and basic knowledge.

Can you do the same thing with SHA1 and SHA2?

No, not really. In the case of SHA1 there are theoretical attacks that work the same way, which means it should not be used either. Marc Stevens has published an attack which he estimates to have a complexity of 2^61 compression function calls for same prefix collisions and 2^77 for chosen prefix collisions. In fact HashClash contains some code for finding differential paths and near collision blocks for SHA1. However, coming up with a collision would take vast computing effort and (unless you control a botnet) expense. I'm sceptical about the exactness of these numbers, but the order of magnitude is certainly right. As if to prove the point, there are no published collisions in the full SHA1. So with those sorts of numbers I'm not going to stick it on an AWS instance backed by my credit card.
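To get a feel for what those exponents mean, here is a rough back-of-the-envelope sketch. The 2^61 and 2^77 figures are the estimates quoted above; the rate of 2^30 compression function calls per second per device is purely my assumption for illustration, not a measured figure for any real hardware, and the helper name device_years is made up for this example.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def device_years(log2_work: int, log2_rate: int) -> float:
    """Years a single device needs to perform 2**log2_work compression
    function calls when it manages 2**log2_rate calls per second."""
    return 2 ** (log2_work - log2_rate) / SECONDS_PER_YEAR


# ASSUMPTION: 2**30 (about a billion) calls per second per device.
# This is an illustrative number only, not a benchmark.
ASSUMED_LOG2_RATE = 30

for label, log2_work in (("same prefix", 61), ("chosen prefix", 77)):
    years = device_years(log2_work, ASSUMED_LOG2_RATE)
    print(f"{label}: 2^{log2_work} calls is roughly {years:,.0f} device-years")
```

Whatever rate you plug in, the chosen prefix attack costs 2^16 (65,536) times more than the same prefix one, which is why the exact numbers matter less than the order of magnitude.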

SHA1 seems to be surprisingly resistant to differential analysis attacks; this is one reason pretty much nobody has moved to using SHA3 yet. I like SHA3, or Keccak as it is also known.

SHA2 looks like even more of a challenge than SHA1 for differential analysis.

Is this practical?

Hopefully not; no one should be using MD5 for anything. However, old habits die hard, and once upon a time MD5 seemed like a fast and secure hash function. You can still find it in use in package managers and on download pages, and it is probably used server side in many applications to verify file uniqueness. I haven't found anywhere still using it in SSL certs following Flame.
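If you do have code that relies on MD5 to decide whether two files are the same, a minimal sketch of the problem and the fix looks something like this. It only uses Python's standard hashlib module; the function names file_digest and is_md5_collision are my own, and I'm assuming SHA-256 as the replacement hash. Note that on some locked-down (FIPS) Python builds, requesting "md5" may raise an error.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks so that
    large files do not have to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_md5_collision(path_a, path_b):
    """True when two files share an MD5 digest but differ under SHA-256,
    i.e. MD5 claims they are identical while the contents are not."""
    same_md5 = file_digest(path_a, "md5") == file_digest(path_b, "md5")
    same_sha256 = file_digest(path_a) == file_digest(path_b)
    return same_md5 and not same_sha256
```

Calling is_md5_collision on a pair of colliding images should return True, while for any two ordinary files it should return False. A uniqueness check keyed on file_digest(path) with the default algorithm would not be fooled by such a pair.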