
Avoid Malware Scanners That Use Insecure Hashing

In this post I’m going to discuss a major problem that exists with several WordPress malware scanners: The use of weak hashing algorithms for good and bad file identification. Some malware and antivirus scanners outside of WordPress suffer from this same issue.

For brevity, I’m going to refer to this as the “weak hash scanner” issue.

This issue may allow an attacker to hide malware that is undetectable to scanners using the MD5 hashing algorithm. Below I will explain how hashes are used in the security industry, what the problem is and how to solve it. I’ll also point you to research demonstrating this issue, suggest further reading, and describe how Wordfence uses a secure hashing algorithm for our malware scanner.

How we use hashes in the security industry to find bad things

In the security world we have a commonly used process of running a file through a piece of logic, called an algorithm, and generating a number of fixed length. For practical purposes that number uniquely identifies the file. The logic is called a hashing algorithm and the number it produces is called a ‘hash’.

We use hashing algorithms for all kinds of really cool and useful stuff. We can take a piece of malware, create a hash for it and then store that hash. Later, we can create a hash of a file we’re scanning to check if it contains malware. If that hash matches the hash of the malware we created earlier, then we know the file is that malware.
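The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not Wordfence's implementation: the function names, the sample "malware" bytes and the in-memory set of hashes are all assumptions made for the example, and SHA-256 stands in for whichever SHA-2 variant a real scanner uses.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known_malware(path: Path, known_bad_hashes: set[str]) -> bool:
    """True if the file's hash matches a hash stored for known malware."""
    return sha256_of_file(path) in known_bad_hashes


# Hypothetical sample standing in for a real malware file.
SAMPLE_MALWARE = b"pretend these bytes are a piece of malware"
# Step 1: hash the malware once and store only the hash.
KNOWN_BAD_HASHES = {hashlib.sha256(SAMPLE_MALWARE).hexdigest()}

# Step 2: later, hash the files being scanned and compare.
with tempfile.TemporaryDirectory() as tmp:
    infected = Path(tmp) / "infected.php"
    clean = Path(tmp) / "clean.php"
    infected.write_bytes(SAMPLE_MALWARE)
    clean.write_bytes(b"<?php echo 'hello'; ?>")
    infected_flagged = is_known_malware(infected, KNOWN_BAD_HASHES)
    clean_flagged = is_known_malware(clean, KNOWN_BAD_HASHES)

print("infected.php flagged:", infected_flagged)
print("clean.php flagged:", clean_flagged)
```

Only the hash of the malware is stored, so the scanner never needs to keep a copy of the malicious file itself.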

We can also use hashes to identify “known good” files. At Wordfence we have created hashes of every file we know is safe in the WordPress universe: every file in every version of WordPress core ever released, and every file in every version of every theme and plugin ever released.

Right now Wordfence tracks hashes for:

205,146 WordPress core files that Wordfence knows are safe.

5,967,361 WordPress theme files that Wordfence knows are safe.

23,527,261 – yes that’s 23 Million – WordPress plugin files that are known to Wordfence to be safe. This is every version of every file in every plugin ever released.

Hashes are a way for security companies like us to store a small piece of data that uniquely identifies known bad or good files, and then use that data to check if those files exist on a system we’re scanning. Then we can make a decision about whether to preserve the file or get rid of it.
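As a rough sketch of that decision, assuming the known-good and known-bad hashes are held in two simple sets (a real scanner would query a much larger database), the classification could look like this. The `Verdict` names and the sample file contents are hypothetical.

```python
import hashlib
from enum import Enum


class Verdict(Enum):
    KNOWN_GOOD = "known good"
    KNOWN_BAD = "known bad"
    UNKNOWN = "unknown"


def classify(content: bytes, known_good: set[str], known_bad: set[str]) -> Verdict:
    """Classify file content by looking its SHA-256 hash up in both sets."""
    file_hash = hashlib.sha256(content).hexdigest()
    if file_hash in known_bad:
        return Verdict.KNOWN_BAD
    if file_hash in known_good:
        return Verdict.KNOWN_GOOD
    # Neither list knows this file; a scanner would inspect it further.
    return Verdict.UNKNOWN


# Hypothetical file contents used only for this example.
core_file = b"<?php // an unmodified core file"
malware_file = b"pretend these bytes are malware"
custom_file = b"<?php // a file the site owner wrote"

known_good = {hashlib.sha256(core_file).hexdigest()}
known_bad = {hashlib.sha256(malware_file).hexdigest()}

for name, content in [("core", core_file), ("malware", malware_file), ("custom", custom_file)]:
    print(name, "->", classify(content, known_good, known_bad).value)
```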

The diagram below illustrates how malware scanners use hashing to identify good and bad files.

Not all hashing algorithms are equal: MD5 vs SHA-2

There are various ways to create a hash. When you run a file through one of these hashing ‘algorithms’, it creates a number of a fixed length that is meant to be unique to that file. MD5 is a hashing algorithm that was created in 1991 by Professor Ron Rivest at MIT. It was incredibly useful but is now quite old and has some problems.

Another newer and much more secure hashing algorithm called SHA-2 was developed by the National Security Agency and released by the National Institute of Standards and Technology in 2001. Today SHA-2 is widely used and considered secure enough for commercial use.
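To make "a number of a fixed length" concrete, here is a short Python sketch that hashes the same content with MD5 and with SHA-256 (one member of the SHA-2 family, chosen here as an assumption since the post does not say which variant is used). The sample strings are made up for the example.

```python
import hashlib

original = b"<?php echo 'Hello, world'; ?>"
modified = b"<?php echo 'Hello, World'; ?>"  # a single letter changed

md5_original = hashlib.md5(original).hexdigest()
sha256_original = hashlib.sha256(original).hexdigest()
sha256_modified = hashlib.sha256(modified).hexdigest()

# Each hex character encodes 4 bits, so the lengths below are the
# fixed output sizes of the two algorithms regardless of input size.
print(len(md5_original) * 4, "bit MD5 hash:   ", md5_original)
print(len(sha256_original) * 4, "bit SHA-256 hash:", sha256_original)

# Changing one letter of the input produces a completely different hash.
print("hash changed after edit:", sha256_original != sha256_modified)
```

MD5 always produces 128 bits and SHA-256 always produces 256 bits, whether the input is one byte or an entire plugin archive.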

The problem with MD5 is something called ‘collisions’. It’s easy to understand the issue: With MD5, it’s possible to create two different files that have the same MD5 hash, or unique signature. This could be used, for example, to fool a malware scanner into thinking a malware file is actually a known-good file.

That is why we use SHA-2 in Wordfence to track known good files. It prevents an attacker from creating a bad file that has the same hash as a known good file and avoiding detection.
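The effect of a collision on a known-good list can be shown with a toy example. The sketch below does not attack MD5. It uses a deliberately weak stand-in hash (the first 16 bits of SHA-256, an assumption made purely so a collision can be found in well under a second), whereas real MD5 collisions exploit structural weaknesses in the algorithm rather than a short output. All function names and file contents are hypothetical.

```python
import hashlib
from itertools import count


def weak_hash(content: bytes) -> str:
    """Toy 16-bit hash: the first 4 hex characters of SHA-256. This is NOT MD5."""
    return hashlib.sha256(content).hexdigest()[:4]


def strong_hash(content: bytes) -> str:
    """Full SHA-256 digest."""
    return hashlib.sha256(content).hexdigest()


def find_colliding_pair() -> tuple[bytes, bytes]:
    """Craft a harmless file and a malicious file that share a weak hash.

    The attacker controls both files, so they keep generating variants
    of each until any harmless variant matches any malicious variant.
    """
    seen_good: dict[str, bytes] = {}
    seen_bad: dict[str, bytes] = {}
    for i in count():
        good = b"<?php /* harmless %d */" % i
        bad = b"<?php /* malicious %d */" % i
        seen_good[weak_hash(good)] = good
        seen_bad[weak_hash(bad)] = bad
        if weak_hash(good) in seen_bad:
            return good, seen_bad[weak_hash(good)]
        if weak_hash(bad) in seen_good:
            return seen_good[weak_hash(bad)], bad


harmless, malicious = find_colliding_pair()

# The harmless file becomes trusted and its hash goes on the known-good list.
weak_allowlist = {weak_hash(harmless)}
strong_allowlist = {strong_hash(harmless)}

# Later the attacker swaps in the malicious file.
fooled = weak_hash(malicious) in weak_allowlist
caught = strong_hash(malicious) not in strong_allowlist

print("weak-hash scanner treats malicious file as known good:", fooled)
print("SHA-256 scanner notices the file changed:", caught)
```

The scanner logic is identical in both cases. The only difference is whether an attacker can find two files with the same hash.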

The weak hash scanner problem

Unfortunately not all security products do this. In the WordPress space, some malware scanners use plain old MD5 to hash files when searching for malware. Sucuri’s WordPress plugin and “Shield WordPress Security”, for example, use MD5 to detect core file changes. The way they do this is they grab the newest MD5 hashes from api.wordpress.org.
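A core file check of this kind can be sketched as follows. This is an illustration of the general approach, not code taken from either plugin. It assumes the MD5 checksums have already been downloaded and parsed into a dictionary mapping each relative file path to its expected MD5 hex string, and it does not reproduce the API's actual request or response format. The function names and file contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Return the MD5 hex digest of a file."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def find_modified_core_files(site_root: Path, checksums: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose MD5 differs from the expected value."""
    modified = []
    for relative_path, expected_md5 in sorted(checksums.items()):
        candidate = site_root / relative_path
        if not candidate.is_file() or md5_of_file(candidate) != expected_md5:
            modified.append(relative_path)
    return modified


with tempfile.TemporaryDirectory() as tmp:
    site_root = Path(tmp)
    (site_root / "index.php").write_bytes(b"<?php // core index")
    (site_root / "wp-login.php").write_bytes(b"<?php // core login")

    # Stand-in for the checksum list a scanner would download.
    checksums = {
        "index.php": md5_of_file(site_root / "index.php"),
        "wp-login.php": md5_of_file(site_root / "wp-login.php"),
    }

    # Simulate an attacker tampering with one core file.
    (site_root / "wp-login.php").write_bytes(b"<?php // core login plus a backdoor")
    modified = find_modified_core_files(site_root, checksums)

print("modified core files:", modified)
```

A check like this catches ordinary tampering. Its weakness is that it trusts MD5, so a file crafted to share the expected MD5 hash would not appear in the list.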

At Wordfence we use SHA-2 and this is one of the reasons we have created our own API endpoint that we use for malware scanning. Doing this allows us to use a cryptographically strong hash function to ensure that malware can’t evade detection by exploiting weak hash algorithms. We have been using SHA-2 since 2012, when the very first version of Wordfence was released as version 1.1.

In 2014 Nat McHugh showed how to create two different PHP files and two different image files with the same MD5 hash. This demonstrates the concept in PHP: an attacker can create a friendly file which becomes trusted and later replace it with a malicious file that avoids detection by MD5 scanners.

This research has actually been around for some time now. In the security industry the attack is called a ‘chosen prefix’ collision attack on MD5. The collision weakness behind it first came to light in a 2005 paper by Xiaoyun Wang and Hongbo Yu at Shandong University in China, in which they describe a modular differential attack on MD5.

In 2007, Marc Stevens created an open source toolkit as part of his master’s thesis which actually exploited this weakness in MD5. These are the tools that were used in the research above to create different files with identical MD5 hashes.

This research demonstrates that it’s already possible for an attacker to exploit MD5 to provide a safe file and later replace it with a malicious file that will avoid detection by scanners using MD5. It may soon be possible to create a malicious file that shares the same MD5 hash as a legitimate WordPress core file. For this reason it is important that malware scanners avoid MD5 and use strong cryptographic hash functions to verify file integrity.

What to do about this

The goal of today’s blog post is to encourage two things:

If you are a customer of a security product, make sure your product is using SHA-2 or another secure hashing algorithm for malware scanning and other checks. If a product uses MD5, it risks being fooled into thinking a file is safe when it is dangerous.

If you are a security vendor and have not already switched to SHA-2 or another secure hashing algorithm, it’s time to do so now in the interests of your customers’ security.

It is a problem. We think they're using it for backward compatibility. They use about 8000 rounds of salted MD5 for password authentication. Increasing the number of rounds is called 'stretching' a hashing algorithm. It's an attempt at making a weak hashing algorithm more compute intensive and therefore harder to crack. Salts serve a related goal: they stop an attacker from cracking many passwords at once using precomputed tables.

Unfortunately it's incredibly fast to crack WordPress salted-stretched MD5. We do it many times every day using a custom-built GPU cluster in our data center. You can find more about Wordfence password auditing here. The photo in that post is a pic I took of our actual GPU cluster. It's 8 liquid-cooled consumer GPUs in a custom-built enterprise-grade chassis. Was a fun project and continues to be.
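The salting and stretching described above can be sketched as follows. This is a simplified illustration of the idea and not WordPress's actual password hashing code: the round count of 8192 is an assumption standing in for "about 8000", and the salt length, function names and output format are made up for the example. It shows the technique under discussion, not a recommendation.

```python
import hashlib
import hmac
import os

ROUNDS = 8192  # assumption: "about 8000 rounds", rounded to a power of two


def stretched_md5(password: bytes, salt: bytes, rounds: int = ROUNDS) -> bytes:
    """Salt the password, then feed the MD5 output back through MD5 `rounds` times."""
    digest = hashlib.md5(salt + password).digest()
    for _ in range(rounds):
        # Each extra round adds one more MD5 computation per password guess.
        digest = hashlib.md5(digest + password).digest()
    return digest


def verify(password: bytes, salt: bytes, expected: bytes) -> bool:
    """Recompute the stretched hash and compare it in constant time."""
    return hmac.compare_digest(stretched_md5(password, salt), expected)


salt_a = os.urandom(8)
salt_b = os.urandom(8)
stored_a = stretched_md5(b"correct horse", salt_a)
stored_b = stretched_md5(b"correct horse", salt_b)

print("correct password accepted:", verify(b"correct horse", salt_a, stored_a))
print("wrong password rejected:", not verify(b"wrong guess", salt_a, stored_a))
# The same password with different salts gives different stored hashes,
# so a cracked hash for one user does not reveal another user's password.
print("salts give distinct hashes:", stored_a != stored_b)
```

Stretching multiplies the cost of every guess by the round count, but each MD5 round is so cheap on a GPU that 8192 of them still leaves an attacker with a very large number of guesses per second.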

As far as I'm aware, Wordfence is the only scanner currently using SHA-2 for malware scanning. The scanners I analyzed for this article, Sucuri and the other one I mention, both use MD5 for scanning. I looked at iThemes Security and didn't see any usage of SHA-2 but did see that they generate MD5 hashes, so while I didn't fully analyze their code (which is why I omit any mention of them) it looks like they're using MD5 too.

If you or anyone else is aware of another scanner besides Wordfence that uses SHA-2 for scanning (or another secure hashing algorithm) please let me know. As far as I'm aware we are the only scanner currently that uses a secure hashing algorithm for file comparisons.

WordPress only provides the api.wordpress.org API which is being used by many of these scanners. They created that API for WordPress updates, not for malware scanning. So it's being used for something it wasn't designed for.

So it's not really up to WP to fix this. The scanners should do what we did: Get your own servers, mirror the WordPress repository, then generate new secure SHA-2 hashes whenever an update occurs. Then use that API to do your scanning. That's what we've been doing since 2012 when we first launched. It's a lot more work, but it's the proper way to do malware scanning if you want to stay secure.
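The "generate new secure SHA-2 hashes whenever an update occurs" step can be sketched like this. It is a minimal illustration under stated assumptions, not a description of Wordfence's servers: it assumes a release has already been mirrored and unpacked into a local directory, uses SHA-256 as the SHA-2 variant, and the function name and sample files are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def build_sha256_manifest(release_dir: Path) -> dict[str, str]:
    """Map every file in an unpacked release to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(release_dir.rglob("*")):
        if path.is_file():
            relative = path.relative_to(release_dir).as_posix()
            manifest[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a freshly mirrored plugin release.
    release_dir = Path(tmp) / "example-plugin-1.0"
    (release_dir / "includes").mkdir(parents=True)
    (release_dir / "plugin.php").write_bytes(b"<?php // plugin entry point")
    (release_dir / "includes" / "helpers.php").write_bytes(b"<?php // helpers")

    manifest = build_sha256_manifest(release_dir)

for relative_path, sha256_hex in manifest.items():
    print(sha256_hex, relative_path)
```

A manifest like this would be regenerated for each new release and served from the vendor's own API, so scans compare against SHA-2 hashes instead of the MD5 hashes intended for updates.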

Another slightly tangential question. I'm interested in your thoughts on WordPress's nonce algorithm and implementation. Many times I use MD5 hashes in conjunction with salts and secret keys, instead of a WP nonce, to validate and expire links and form submissions. In this use case, would a WP nonce and MD5 hash provide the same level of authentication? Is one better than the other?

Neither our malware scanner, our Sucuri Firewall nor any of our core services use MD5. That article exaggerates the issue a bit, and there is no risk to anyone using it. The only time we use MD5 is on our Free WordPress plugin to compare the current core files with the ones provided by the WordPress API. Again, there is no security risk on using it for this purpose.

Their WordPress plugin uses MD5 to verify the integrity of WordPress core files. As I mention in the article: "This research demonstrates that it’s already possible for an attacker to exploit MD5 to provide a safe file and later replace it with a malicious file that will avoid detection by scanners using MD5. It may soon be possible to create a malicious file that shares the same MD5 hash as a legitimate WordPress core file."

So it's important they switch away from MD5 soon and use SHA-2 or another secure hashing algorithm to verify core file integrity.

There are many technical variables that need to be considered here but let's not waste time and go straight to the point of the "issue".

1 - The article mentioned (http://natmchugh.blogspot.com.br/2014/10/how-i-made-two-php-files-with-same-md5.html) uses 2 files, both controlled by the person trying to create the collision. That won't be the case if you are trying to replace a WordPress core file, for instance. I'm not saying it is impossible, but there are no known attacks that allow someone to replace a WordPress core file and keep the same MD5. The Wordfence article confuses preimage attacks with the ability to create collisions between two files controlled by a possible attacker.

2 - Overall, even if a preimage attack is possible, it will take a long, long time, and the only risk you will run is that the integrity checking in our free WordPress plugin will not detect a change. Our paid services are not affected by it.

In short, you are secure with us. That is just another perfect example of what we discussed here: [Link redacted]

Yeah, as I mentioned in a previous post, we're direct competitors. I'm a little surprised seeing a security company come out in defense of MD5. I think the 1990s called and they want their algorithm back.

Seriously though, our post is very clear. There is a current attack that allows the creation of two different files with the same MD5 hash. It's also quite possible that malware will emerge very soon that has the same MD5 hash as a core file, in which case Sucuri will be in a world of trouble that will take time and significant effort and investment to fix. If they want to stick to their guns and use MD5 to verify core files, that's their choice. We made a different choice and made it years ago.

Wow! Very interesting stuff. As always I am so grateful for your work and for educating people like me.
One thing I could never grasp is why someone using Wordfence would use another provider like BulletProof Security or, as someone mentioned in a comment, iThemes Security. Is there really a benefit to using both?

No, there is not. Wordfence is the best firewall and malware scanner for WordPress by a significant margin these days. Our firewall alone is far superior to anything else on the market. Our malware scan is excellent, and we release new detection capability every week in real-time via our threat defense feed.

Thanks for the education. That was helpful. I have been moving all my sites to Wordfence. On some I have it running parallel with iThemes Security Pro, but I have never really compared the differences. I have also used Sucuri.

One of the things I appreciate from Sucuri is their Online Scan of any website entered into the scanner. I have run many potential client sites, and my sites, through that before proposing web design solutions. It would really be nice if Wordfence had something similar that was available for open use!

From Ron's response from Sucuri: "The only time we use MD5 is on our Free WordPress plugin to compare the current core files with the ones provided by the WordPress API. Again, there is no security risk on using it for this purpose."

For Wordfence to recognise the shortcomings of MD5 back in 2012 when they started, and to put in the extra hard work to use the more robust SHA-2 from the early days, certainly shows the professionalism of the Wordfence team and their drive to be the best they can be... the best!

I have to admit I'm struggling to see this moving beyond theoretical risk into the real world.

In the linked research, the researchers have control in the creation of both "wolf" and "sheep". This is very different from using their toolset to masquerade as a pre-existing core OS file. Whilst I would hope that migrating off MD5 for fingerprints happens before malware is found with the same hash as 'ntdll.dll', for example, the research hasn't quite stretched as far as to say MD5 for fingerprints is as dangerous as it is for passwords.

Hi Neil. We expect malware to emerge in the very near future (several months to within 3 years) that has the same MD5 hash as a WordPress core file. The parallel computational power that a cluster of consumer GPUs provides is absolutely staggering. To put it in perspective, our own custom-built 8 GPU cluster can perform computations at a rate of 36 teraflops. That is more powerful than the most powerful computer on earth in 2002, the NEC Earth Simulator in Japan, which ran at 35 teraflops.

When you consider that Rivest developed MD5 in 1991, and the massive acceleration in development of parallel computing power in consumer GPUs, you begin to understand the challenge that MD5 faces when operating in a 2016 security environment.

GPUs are getting even faster. Attacks on MD5 are getting even smarter. Combine those two, and you have a recipe for disaster for anyone using MD5 for anything security related.

Great article. Just to continue my education... what makes SHA-2 so much more secure than MD5? And, as the power of computers seems to be expanding at a relentless pace (even quantum computers!), will this method become insecure in a few years? If not, what makes it different?

And also... while I'm here... is there another *even more* secure hashing algorithm above SHA-2?

And one more thing (as Columbo would say), I really enjoy your blog posts. Keep up the great work.

Thanks for this update. This article is too technical for laymen like us to understand. Can we get a specific comparison showing which malware scanners you don't recommend? I believe you guys did the same with Cloudflare, and that was an excellent post.

It's good to see you are using what is today considered a sufficiently secure hashing algorithm. But with your start in 2012, anything different would just show contempt for security. After all, 2012 was also the year that Poul-Henning Kamp officially laid to rest his widely adopted, md5-based password library md5crypt (his farewell post under the title "Md5crypt Password scrambler is no longer considered safe by author" can still be found on his website http://phk.freebsd.dk/sagas/md5crypt_eol.html ). In 2012 it was already clearly visible that md5 had no future - at least not for security purposes.