Updates on CyberSecurity, WordPress and what we're cooking in the lab today.

Gravatar Advisory: How to Protect Your Email Address and Identity

Update: We’ve added comments at the end of the post pointing out that the National Institute of Standards and Technology (NIST) considers an email address to be personally identifiable information or PII.

Gravatar is a service that provides users with a profile image that can appear on many sites across the Net. It is integrated with WordPress.com (the version of WordPress hosted by Automattic) and with WordPress.org, the self-hosted version of WordPress. Gravatar is also used by many other popular services on the web, such as StackOverflow.com.

If you sign up for a website on WordPress.com and publish a blog post, a Gravatar icon appears on your site as your profile photo, indicated by the red arrow below. You can visit gravatar.com to customize that icon and upload a photo of your own.

If you use WordPress.org, Gravatars are an option you can enable for your users, and they are widely used. Each user’s Gravatar will either show their profile photo, if they have gone to Gravatar.com to create one, or a default image. You can select from several kinds of default images.

Other services like StackOverflow, one of the most popular sites on the web, also use Gravatar for profile images.

Even if you haven’t signed up for a custom profile image at Gravatar.com, a hash of your email address still appears in the source code of any website that integrates this service.

You can see in the screenshot below how Gravatar loads your profile image using a hash of your email address:

The value that appears after /avatar/ above is: fe967ccdc7b3caa33e0480bb95ae6588
That is a hexadecimal number: the MD5 hash of the email address that I used to create a WordPress.com website. The email I used is gravhashtest@wordfence.com.

I can run a PHP instruction to verify that. If I run the following PHP code, it produces the above hash:

<?php
echo md5('gravhashtest@wordfence.com');

This prints the value: fe967ccdc7b3caa33e0480bb95ae6588
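For readers who prefer Python, here is a minimal sketch of the same calculation, extended to build the image URL. It assumes the normalization Gravatar documents (trim whitespace, convert to lowercase, then MD5) and the public `https://www.gravatar.com/avatar/` endpoint with its `s` size parameter. The function names are my own and are only illustrative.

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """Return the MD5 hex digest of the trimmed, lower-cased address."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def gravatar_url(email: str, size: int = 80) -> str:
    """Build the avatar URL that ends up in a page's source code."""
    return f"https://www.gravatar.com/avatar/{gravatar_hash(email)}?s={size}"


# The hash is the part of the URL that follows /avatar/.
print(gravatar_url("gravhashtest@wordfence.com"))
```

Because of the normalization step, ` Name@Example.com ` and `name@example.com` produce the same hash, so a single identifier follows the address wherever it is used.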

Using Gravatar and GPU cracking to steal email addresses

If I want to steal a lot of email addresses, I need to turn those hashes back into email addresses somehow. If I can figure out a way to do that, I can crawl wordpress.com, all the self-hosted wordpress.org websites and a lot of other services like StackOverflow and harvest a huge number of email addresses for spamming. I may also be able to reveal the email addresses of people who want to remain anonymous.

In 2013, Dominique Bongard presented a talk at PasswordsCon in Las Vegas where he demonstrated that he could reverse engineer 45% of Gravatar hashes into email addresses. He targeted a well known political forum in France which uses Gravatar for user profile pictures.

The big difference in Dominique’s approach is that he used Hashcat, which is a password cracking tool. He repurposed it so that he could reverse engineer Gravatar hashes into email addresses. The reason this is important is that Hashcat executes significantly faster because it uses consumer graphics processing units, or GPUs, which are used by gamers to accelerate game graphics performance. Cracking hashes with GPU acceleration increases performance by a factor of several thousand.
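To make the idea concrete, here is a minimal CPU-only sketch of a guess-and-compare attack. It is not Dominique’s tooling and it is not how Hashcat is implemented. It only illustrates the principle: generate plausible email addresses, hash each one, and check whether the result is in the set of harvested hashes. The name lists, address patterns and function names are assumptions made up for this example.

```python
import hashlib
from itertools import product


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def candidate_emails(first_names, last_names, domains):
    """Yield a few common local-part patterns for each name and domain."""
    for first, last, domain in product(first_names, last_names, domains):
        for local in (f"{first}.{last}", f"{first}{last}", f"{first[0]}{last}"):
            yield f"{local}@{domain}"


def recover_emails(target_hashes, candidates):
    """Map each harvested hash to the guessed address that produces it."""
    targets = set(target_hashes)
    found = {}
    for email in candidates:
        digest = md5_hex(email)
        if digest in targets:
            found[digest] = email
    return found


# Two "harvested" hashes: one predictable address, one hard-to-guess address.
harvested = [md5_hex("jane.doe@example.com"), md5_hex("q7Zk2mX9@example.org")]
guesses = candidate_emails(["jane", "john"], ["doe", "smith"], ["example.com"])
print(recover_emails(harvested, guesses))
```

Only the predictable address is recovered. A GPU changes the economics, not the method: it lets an attacker try vastly more guesses in the same amount of time.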

At Wordfence we have done a significant amount of experimentation with GPUs and hash cracking and we even provide a commercial service as part of Wordfence Premium that uses a GPU cluster to perform a password audit on your WordPress website. We launched this service over a year ago. The photo below is the password cracking cluster we designed for this service. Those are liquid cooled chrome GPU pipes in the photo. They look even better in real life.

When Dominique gave his talk in 2013 on using Hashcat to turn Gravatar profile hashes back into email addresses, the Nvidia GeForce GTX Titan GPU, released that year, provided 5045 gigaflops of processing power.

In May of this year Nvidia launched the GeForce GTX 1080, which comes with 8873 gigaflops of processing power. In just three years the processing power available on a single consumer card has grown by roughly 75%.

When you consider that three years ago a single researcher reverse engineered 45% of the Gravatar hashes he targeted into email addresses, it’s quite possible that a criminal group armed with a modern GPU cluster, like the one shown above, could reverse engineer a far higher percentage today. The problem will only get worse.

Email hashes may expose your identity across the Web

The use of email address hashes has a further problem. If you view the source of a website using Gravatar profile photos, extract the hash and then google that hash in quotes, you can find other websites and services that are used by the individual you are researching.

For example: A user may be comfortable having their full name and profile photo appear on a website about skiing. But they may not want their name or identity exposed to the public on a website specializing in a medical condition. Someone researching this individual could extract their Gravatar hash from the skiing website along with their full name. They could then Google the hash and determine that the individual suffers from a medical condition they wanted to keep private.
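The correlation step in this example needs no cracking at all. Here is a minimal sketch, assuming someone has already collected hash-to-display-name pairs from two sites. The site names, hashes and display names below are invented for illustration.

```python
from collections import defaultdict


def correlate_profiles(sites):
    """Find Gravatar hashes that appear on more than one site.

    `sites` maps a site name to {gravatar_hash: display_name}.
    Returns {gravatar_hash: {site_name: display_name}}.
    """
    seen = defaultdict(dict)
    for site, profiles in sites.items():
        for digest, name in profiles.items():
            seen[digest][site] = name
    return {digest: where for digest, where in seen.items() if len(where) > 1}


collected = {
    "skiing-forum.example": {
        "1" * 32: "Pat Example",   # full name shown publicly
        "2" * 32: "SnowFan",
    },
    "health-forum.example": {
        "1" * 32: "anon_4711",     # same hash, supposedly anonymous
        "3" * 32: "quiet_reader",
    },
}
print(correlate_profiles(collected))
```

The shared hash ties the full name on one site to the pseudonym on the other, even though the email address itself was never recovered.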

To demonstrate this issue, we have created the form below which you can use to do a Google search of the MD5 hash of your own email address. We don’t log anything. This simply uses pure JavaScript to open a new window or tab with a Google search of the hash of your email in quotes. Enter your email in the text field below and click the link to do the search. You should note that Google doesn’t index all Gravatar hashes because they appear in page source. But you may find a few interesting results that help illustrate the problem.

The above can be used to Google an MD5 hash of anything. Try entering in your domain name or common passwords (not passwords you actually use). Let us know what you find in the comments.
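If you would rather build the search link yourself, here is a rough Python sketch of what such a form does. It is not the code behind our form. It assumes the standard `https://www.google.com/search?q=` query URL, and the function name is hypothetical.

```python
import hashlib
from urllib.parse import urlencode


def hash_search_url(text: str) -> str:
    """Return a Google search URL for the quoted MD5 hash of `text`."""
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    # Wrapping the hash in double quotes asks for an exact-match search.
    return "https://www.google.com/search?" + urlencode({"q": f'"{digest}"'})


print(hash_search_url("gravhashtest@wordfence.com"))
```

Everything here runs locally. The only thing sent to the search engine is the hash, and only if you open the resulting URL.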

What to do to protect your email address and identity

To solve the identity and spam problem that Gravatar presents, the most effective option is to use a unique email address to register on each website you are a member of. The email address should be hard to reverse engineer.

If you use an @gmail.com address, Gmail provides a feature whereby you can append a plus sign to your email address and anything after it is ignored. If your email address is yourname@gmail.com, you can change it to yourname+junkGoesHere@gmail.com and you will still receive the email.

Using this technique makes it much harder for a spammer to reverse engineer your email address from a Gravatar hash. Try to make your email address at least 20 characters long and include upper and lower-case letters and numbers in the suffix after the plus sign. If you have uploaded a custom Gravatar profile image, you should note that this has the side effect of not displaying that image on the websites where you make this change. Instead you will get a default profile image.
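Here is a small sketch of one way to generate such an address. It assumes a provider that supports plus-suffixes. The function name and the default suffix length of 16 are my own choices, not a recommendation from any standard.

```python
import hashlib
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 possible characters


def plus_address(base_email: str, suffix_length: int = 16) -> str:
    """Insert a random +suffix before the @ of `base_email`."""
    local, _, domain = base_email.partition("@")
    if not local or not domain:
        raise ValueError("expected an address like name@example.com")
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(suffix_length))
    return f"{local}+{suffix}@{domain}"


address = plus_address("yourname@gmail.com")
print(address)
# The hash a website exposes is now unrelated to the hash of the base address.
print(hashlib.md5(address.encode("utf-8")).hexdigest())
print(hashlib.md5(b"yourname@gmail.com").hexdigest())
```

A 16-character suffix drawn from 62 characters has roughly 4.8 × 10^28 possible values, so guessing the full address becomes impractical even if the part before the plus sign is known. Because the suffix is random, check that the result contains the mix of upper-case letters, lower-case letters and numbers you want, and store it somewhere safe, such as a password manager.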

Receiving extra spam is an inconvenience. It can be a minor inconvenience if you have an excellent spam filter in place. However, having your identity exposed on a website where you assumed your identity was private can be embarrassing at best and can have far more serious consequences at worst. We therefore suggest that you switch to using a plus-suffix on any website where it is important to maintain your personal privacy.

What should Gravatar do?

This presents a significant challenge for a service that is as widely used as Gravatar. They can’t simply upgrade their own systems. Web applications that have integrated Gravatar rely on the fact that they can request an image with an MD5 hash of a user email address and get a profile photo in return. These applications all need to be updated too, and there are thousands of them, quite possibly tens of thousands.

Even if Gravatar switch to SHA-2 or a longer and stronger hashing algorithm, they are still vulnerable to GPU accelerated email cracking attacks. The identity problem will also still exist.

They could consider switching to a more computationally intensive hashing algorithm like bcrypt. That would provide significant resistance to reverse engineering. But it comes with the obvious cost that it is computationally intensive. Gravatar need to generate a lot of hashes to provide the service they do. Developers who integrate Gravatar into their products also need to generate hashes from email addresses. Both will suffer from increased resource usage if they start using bcrypt. It also doesn’t solve the identity problem.
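To illustrate the trade-off, here is a sketch that uses PBKDF2 from Python’s standard library as a stand-in for bcrypt, which is not in the standard library. The fixed salt and the iteration count are assumptions for this example only. A fixed, service-wide salt is assumed because the identifier must be reproducible by every site that requests an image.

```python
import hashlib

# Hypothetical service-wide salt: every integrator must derive the same value.
FIXED_SALT = b"hypothetical-service-wide-salt"


def fast_identifier(email: str) -> str:
    """Today's scheme: one MD5 computation per address."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def slow_identifier(email: str, iterations: int = 200_000) -> str:
    """Deliberately expensive alternative (PBKDF2-HMAC-SHA256 stand-in)."""
    normalized = email.strip().lower().encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", normalized, FIXED_SALT, iterations).hex()


print(fast_identifier("someone@example.com"))
print(slow_identifier("someone@example.com"))
```

Each guess now costs an attacker `iterations` rounds of work instead of one, but each legitimate lookup pays the same price. The identifier is also still identical on every website, which is why this approach does not solve the identity problem.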

There are other options available like using a shared secret between developers and the Gravatar servers to generate hashes. These come with their own implementation challenges and performance implications. This option may solve the identity issue because it could generate unique hashes across websites that are also hard to reverse engineer.
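One way the shared secret idea could look is a keyed hash (HMAC) where each website holds its own secret. This is only a sketch under that assumption. It is not an existing Gravatar API, and the secrets and function name are invented.

```python
import hashlib
import hmac


def site_scoped_identifier(email: str, site_secret: bytes) -> str:
    """Derive a per-site identifier from the address and that site's secret."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(site_secret, normalized, hashlib.sha256).hexdigest()


skiing = site_scoped_identifier("someone@example.com", b"secret-for-skiing-site")
medical = site_scoped_identifier("someone@example.com", b"secret-for-medical-site")
print(skiing)
print(medical)
print(skiing == medical)  # False: the two sites can no longer be linked by hash
```

Without the secret, an attacker cannot test guesses offline, and the same address yields unrelated identifiers on different sites. The cost is that the service must know which secret belongs to which site in order to resolve an image, which is one of the implementation challenges mentioned above.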

A final option is to switch to locally hosted images and move away from hashes or global unique identifiers of any kind. This will introduce more complexity for developers who want to integrate Gravatar into websites, but has the benefit of doing a better job of protecting user privacy and avoids disclosing email addresses.

Further comments on privacy

This is a complex problem and there is unfortunately not an easy fix for Gravatar. In my opinion, the most important issue here is the potential exposure of user identities. I think the medical example that I provided above illustrates how much damage can be done if a user identity is exposed under certain conditions.

That is why the privacy implications of this problem cause the most concern. If you aren’t particularly technical you may simply trust a website owner who says that your full name and personal information won’t be exposed. With the current way Gravatar works, you run the risk of having that information exposed.

As always I welcome your comments below and will respond as time permits.

Very informative. Thanks! The suggestion to use the + in theory makes sense, but I wonder if the spammers can also catch on to that and simply create a formula to remove content after the + in an email which would render this strategy no longer helpful. (Says the person who does NOT understand programming code and formulas like this.)

No that wouldn't work Kassy. They would need to reverse your email address WITH the data after the plus sign first, before they could remove that. Adding that data after the plus sign for any website profile, where the website uses Gravatar, significantly increases the difficulty of grabbing your email address.

Thanks Sam. I got a handful of results for two of my 3 addresses. Also got none for one I've used a lot. I don't think the Google search is very effective. I included it as a curiosity. If someone wanted to expose identities, they would target the websites directly and wouldn't rely on Google search results.

Another suggestion to prevent cross-referencing of Gravatar usage is to make sure that your real identity is never used with your Gravatar image. It is of course still possible to link your interests and exposed details, deduce what sort of person you are (as major search engines do already) and guess who you might be.

I use my gravatar image on a lot of pages (and it's not connected to the above email address...). Using your offer to google the MD5 doesn't list any of the sites...
The hash given for my email address matches the one that is shown on gravatar and on all sites I checked for the existence of my gravatar logo...

Thanks, Mark, for letting us know that the MD5 search would only reveal the most easily found results. I tried all three of my email addresses, and a couple of passwords, and nothing came up.

When you say bad actors will scrape individual sites, how much more work is involved for them? Google is tuned to gather public info, I expect, so they may be less efficient at gathering private info. How wide a search would the bad people have to use, how could they make their searches more efficient?

I guess the advice is to make up an identity to go with Gravatar icons, or not use Gravatar on sites where confidentiality matters, such as sites related to personal health conditions.

It would not take much effort at all. Any of our analysts including me could write code to scrape gravatar hashes from, for example, stackoverflow and wordpress.com within an hour or two. Then we would just need to launch hashcat on a machine with a GPU and let it run for a few days to reverse a high percentage. We could also correlate hashes between the two sites to find bloggers on wordpress.com who also have a stackoverflow account. That's just an example of two websites. Including several more and cross-correlating is easy to do.

Not totally convinced this is a major problem. Yes, it does potentially expose users to the surveillance phase of "hacking". But a lot of the same information can also be gathered from simply knowing someone's email address, first and last name, etc. One other point could be made about the MD5 hashes... Marketing companies also use lists of hashed email addresses to identify potential customers segments. They will sometimes use the hashed email lists as a way to share lists with 3rd parties while preserving user privacy.

I do agree with the notion of having a service like Gravatar use a stronger encryption algorithm like bcrypt.

I have my own domains and mailserver and for years I have used a different email address for every website that needs an address. For example, here I use wordfence.com@myowndomain.com. For sites I like and comment on more than once, I add that email address to my Gravatar account. Two added benefits (ok, now three ;) of working like this are:
1) When I start to receive spam I can block that email address so the spam is gone forever!
2) I know where I used that email address, so I know they either sold my email address or they have been hacked. I always contact them (it doesn't happen a lot though). Surprising replies I get...

That's a great question. If you're not concerned with someone being able to tie your use of all those sites to an individual, and if you're also not concerned about your email address becoming public, then you don't need to do anything. If you do want to ensure someone can't connect you to all those separate accounts, then you have a lot of work ahead. You would need to use the plus-suffix technique we suggest in the post and update your email accounts on all the websites you've used with unique email addresses.

According to the article, yes it does.. "The feature also works with hosted Gmail addresses where you use your own domain."

Although I haven't actually tested this out on my own Google Apps domains, it's very likely that Google have enabled this feature across the board on all Google email services where Google's own servers are the mailservers for that account.

Interesting article.
I think the best option is to use a different shared secret for each website in combination with a stronger hashing algorithm like SHA-1 or SHA256. They can phase the old API out and force website owners to transition to the new system by first setting up a shared secret.

This way they can keep the current method of using an API to fetch the profile image but using a unique hash, so the identity can't be linked to other sites. The stronger hashing mechanism in combination with the added use of a shared secret in the hash will make it much more difficult to reverse engineer the e-mail address.

The suggestion of using a unique e-mail address for each site is good for people that want to be sure of anonymity but totally breaks the use of a gravatar profile image. That's something I would only use if really needed, but it's nice to know the option exists. I didn't know Gmail and Outlook.com support this.

GMail happily reports back whether the email is taken or not. It's dead simple to automate this in order to harvest all possible combinations of valid GMail addresses. Does this make GMail insecure? No, because your email address is not secret data, and it was never intended to be.

Knowing the existence of an email address is not what we're discussing. It is the email in combination with other data. So for example, you can use this to reverse an email address and associate it with comments, forum posts, the use and membership of a website, and other content, data and behaviors.

When I entered an email address I use to receive ezines, a long string of numbers and letters appeared on the google page that popped up. I'm guessing the string of numbers and letters is the MD5 hash of my email. There were no listings in the results.

Hiya, one overlooked problem with myname+facebook@gmail.com is that when someone sends you an email and you confirm you have received it, the receiving party will actually see your real email address as myname@gmail.com, so the story is not complete. I have not tested it with an auto-responder but I guess it's the same. Try it ;-)

This is a Return Receipt for the mail that you sent to myname@gmail.com.

Note: This Return Receipt only acknowledges that the message was displayed on the recipient's computer. There is no guarantee that the recipient has read or understood the message contents.

I tried my two gmail accounts (only one of which has a gravatar associated with it) and got back the long hash tag but no documents associated with it. I have other web based mail but no gravatars associated with them. Doesn't look like a problem to me--or at least for me.

Excellent post, Mark, thank you. On a side note, I have been pondering the idea that perhaps Gravatar should be removed from the core and turned into a plugin (similarly to Akismet) that could be deactivated in order to manage user avatars locally. This can be easily achieved by combining a couple of plugins ("WP User Avatar" and "Disable User Gravatar") but it seems surprising to me that a feature as basic as managing user avatars currently can only be done via a third party service. I would be very interested in hearing your thoughts on this issue.

Why have you published this and caused an undue panic like this, have you discussed these issues with wordpress.com & Matt in order that they could review your information & possibly change their hashing policies before making it public like this?

I wouldn't consider this a critical or severe issue, even though I fully understand the privacy concerns, but I would've thought you would have spoken with someone at wordpress.com beforehand and given them the chance to update the hashes or whatever fixes they can deliver before going public with it.

As we pointed out in the post, this has been covered extensively and much discussed, in public forums, within the security community, for several years now. They're well aware of the issue and there has been no movement on this for over 7 years.

I didn't click on all the links in the article, it was a long read & I pretty much scanned through as I already know the issues with md5 hashing, rainbow tables and hash collisions etc which is what makes md5 so insecure, i've even questioned the md5 hashing routine for passwords, even though it uses stretching & rehashing, it's still md5 & there are collisions.

& you are right, this issue should've been fixed and sorted nearly a decade ago.

My apologies for seeming to be harsh in my initial response.

Though I do think the article makes it sound like a critical security flaw, I don't think it's as severe as it sounds in this regard, tracking cookies, and social media sites give up far more of your information to spammers & Profilers than this does, and they're a lot more aggressive & damaging.

Vaughan Montgomery - I think what you and many people in here aren't getting are the potential side effects of violating 6sigma rules or HIPAA laws.

Let's just say a client of yours uses a gravatar on a site you set up for them.
Then let's say said client develops AIDS.
Then let's say said client goes to a website of a famous doctor that helps AIDS patients.
Then let's say said client of yours is a famous Church pastor.

Then let's say some scumbag cracks his gravatar hash

Then let's say said scumbag traces this pastor's online activities based on cracking the email from the gravatar and finds out said client is a famous pastor of a big well known church and would like to screw over the pastor somehow and put the word out that said client of yours frequents a website of a doctor who helps aids patients.
The pastor is totally innocent of wrong doing and acquired aids while in Haiti helping Haitian Hurricane victims because of an accident he needed a blood transfusion and got tainted blood.

Well let's say said scumbag puts all over the net that the pastor your client that used the gravatar you set up for him on his church website is gay and had gay sex and got aids. Totally untrue but his reputation as we know it is ruined and he probably didn't want it getting out for obvious reasons.

Sure, because he's a good Christian pastor he may not sue you, but your reputation is shot as a consultant, and the federal government will fine you millions for serious violations of the HIPAA act. Who wants to do business with you then?

Still think this is something minor and nothing to get concerned about?

Mark Maunder is not a flake and if he posts a concern you can bloody well bet it's something that everyone should be concerned about. This issue has grave implications... not only for the client but for everyone in the web design/seo/online marketing industry.

This is also why there are serious issues with the WordPress REST API that now is forced to be on and available for every WP site with the latest 4.7 release. Anyone can grab everyone's MD5 hashed gravatar email using the publicly accessible REST API. (Example: https://yoast.com/wp-json/wp/v2/users?per_page=100). And what's worse is that the 4.7 release deprecated the filters that allowed users to "turn off" the REST API.
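A minimal sketch of the point made in the comment above: the users endpoint returns an `avatar_urls` field for each user, and the Gravatar hash can be read straight out of those URLs. The sample response below is invented and trimmed to the relevant fields. No network request is made, and the function name is hypothetical.

```python
import json
import re

# A Gravatar hash is the 32 hex characters that follow /avatar/ in the URL.
AVATAR_HASH = re.compile(r"/avatar/([0-9a-f]{32})")


def hashes_from_users_json(body: str):
    """Return {user slug: gravatar hash} from a /wp/v2/users style response."""
    result = {}
    for user in json.loads(body):
        for url in user.get("avatar_urls", {}).values():
            match = AVATAR_HASH.search(url)
            if match:
                result[user.get("slug", "")] = match.group(1)
                break  # every size variant carries the same hash
    return result


sample = json.dumps([
    {
        "id": 1,
        "slug": "example-author",
        "avatar_urls": {
            "24": "https://secure.gravatar.com/avatar/0123456789abcdef0123456789abcdef?s=24&d=mm&r=g",
            "96": "https://secure.gravatar.com/avatar/0123456789abcdef0123456789abcdef?s=96&d=mm&r=g",
        },
    },
    {"id": 2, "slug": "no-avatar", "avatar_urls": {}},
])
print(hashes_from_users_json(sample))
```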

There is a problem with using unique email addresses for every website - what happens if you forget your password?

Most password recovery processes involve entering your email address to receive an email with a recovery link or one time password. However, if you have forgotten your password, then in all probability, you have also forgotten the unique email address too. This will make lost password recovery impossible.

Interesting... my search yielded a single result: a photo. Turns out a mobile app I use makes use of Gravatar and associates my account there with photos taken while using the app. In my case, no big deal, but that could be even more damaging in certain situations.

I found this when I searched - "..No 'Prompter: Was it Something I Said?
cvelardi.blogspot.com/2011/10/was-it-something-i-said.html
Oct 4, 2011 - akismet-3d51c601e978cc31fed44face153ddc2 October 4, 2011 at 12:34 PM. Don needs to get a life.... ReplyDelete. Add comment ......."

Having sold insurance and having to abide by HIPAA, I can also tell you that now knowing this and doing nothing about it quickly leaves Gravatar open to major fines and penalties by the Federal Government, and I can only imagine how many very expensive lawsuits for violating HIPAA, not to mention 6Sigma.

Mark,
I'm wondering if anyone else has half seriously wondered if WP as big as they are has joined with the part of the Federal Government that wants to take away freedoms of Internet. Every time you fix security flaws WP seems to almost intentionally push out more.