A constant combined compute power of 150 GH/s (measured on SHA1 brute force) was used throughout the contest. This figure peaked at about 190 GH/s, which is the rough equivalent of 35 GTX 980 Ti cards. Around 130 CPU cores were reserved solely for GPU-unfriendly algorithms; this burst to a maximum of 300 cores for a short period. An additional 100 CPU cores were used for all other algorithms, peaking at 250 cores.

Strategy

Free-for-all approach

Have fun

Utilize resources efficiently

Surprise the other teams

Before the contest

We redeveloped our hash management system and ensured it was fully functional prior to the contest. In addition, we had the pleasure of beta testing a personal project of one of our members: an improved distributed hashcat system dubbed Hashtopussy (a fork of the hashtopus project). Its numerous improvements include a revamped interface, multi-user and user-rights-management support, optimized hash handling and, of course, support for Hashcat3. Keep an eye out for this project, as it will be released soon.

Hashtopussy instances were deployed, allowing team members to remotely manage and deploy tasks across clusters of compute nodes, voluntarily donate compute cycles, and streamline the cracking process. As hashcat is now open source (big thanks to the hashcat developers), we were able to easily apply minor changes to ensure it played nicely in a distributed environment.

During the contest

We started off by probing all algorithms, looking for any signs of patterns, and tackled the bcrypts immediately by running extremely simple checks against common passwords. We recovered about 20 bcrypts within the first hour on our CPU cluster and were able to feed it with enough test candidates to yield hits consistently.

MDXfind was used to quickly test algorithms which hashcat couldn’t initially handle, namely DCC, with Waffle quickly adding WBB support. Once we knew these hashes were valid, support for both algorithms was swiftly added to hashcat.

As there is already a write-up regarding the patterns for the generated hashes, we won’t go into them, other than saying that we spotted some, missed others, and discovered some too late into the contest. Eleven hours into the contest, we had hits for every algorithm except phpbb3_gen, which we didn’t waste too much time pursuing. This was a pretty good starting point and kept us busy through the remainder of the time.

To make it up to some individuals who have complained that our large submission towards the end of the contest would have skewed any pretty graphs, we have decided to provide analytics gathered by our hash management system. The graphs should reflect the actual crack progression for each individual hashlist throughout the contest. This should provide some insight on how we tackled each hashlist.

Graphs for real hashlists

Graphs for generated hashlists

Interesting observations

As a portion of the hashes came from a real environment, there is always the chance that hashes are mislabeled. We identified some DoubleMD5 hashes labelled as MD5; we tackled these by cracking the initial MD5 list as DoubleMD5, then performing a single MD5 on the password prior to submission. We also identified vBulletin <3.8.5 hashes which were mislabeled MD5:pass, with the salt being the plain for this MD5. There was no possible way to submit these since they were technically solved.
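The DoubleMD5 workaround described above can be sketched as follows. This is only an illustration, not the tooling used during the contest: the function names are made up, and it assumes the common convention where the outer MD5 is taken over the lowercase hex digest of the inner MD5.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Lowercase hex MD5 digest of raw bytes."""
    return hashlib.md5(data).hexdigest()


def double_md5_hex(password: str) -> str:
    """md5(md5(password)).

    ASSUMPTION: the outer MD5 hashes the lowercase hex string of the
    inner digest, which is the usual DoubleMD5 convention.
    """
    inner = md5_hex(password.encode("utf-8"))
    return md5_hex(inner.encode("ascii"))


def plain_for_md5_list(password: str) -> str:
    """Value to submit when a DoubleMD5 hash is listed as plain MD5.

    A single MD5 of the real password is, by construction, the
    'plain' whose MD5 equals the listed hash.
    """
    return md5_hex(password.encode("utf-8"))


if __name__ == "__main__":
    listed_hash = double_md5_hex("hunter2")      # hash as it appears in the list
    submission = plain_for_md5_list("hunter2")   # what gets submitted
    # The submission verifies against the list under the (wrong) MD5 label.
    assert md5_hex(submission.encode("ascii")) == listed_hash
    print(listed_hash, submission)
```

The point of the second function is that the submitted "plain" is itself a 32-character hex string, which is why the list could be solved even though it was labelled with the wrong algorithm.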

Once again, since these were real-world hashes, some may have become corrupted during extraction or transport. A feature of hashcat is that it does not match every bit of the hash, allowing it to essentially detect a mistyped hash. We encountered a small portion of these which we assumed were most likely corrupted. As there wasn’t a large number of these, we simply ignored them.

While GPUs are extremely powerful in parallel hash cracking, it was surprising to see that the top scorer in our team predominantly used CPUs.

Final remarks

A huge thanks to Bitcrack and Hashkiller for organizing an almost flawless contest. We had plenty of fun and very little sleep. We can only imagine the amount of time and effort put into arranging this contest to ensure it ran so smoothly. Congratulations to Team Hashcat on their second place; we’re glad we were finally able to beat our rivals. Congratulations to the FCHC, I’m in your Wifi, LeakedSource and all other teams who participated.

When we obtained the Myspace data, we didn’t think too much of it for several reasons. In addition to being a fairly old data-set, the passwords were also truncated to length ten and converted to lowercase prior to being hashed with the SHA-1 algorithm. This means that some of the passwords recovered would be ambiguous and incomplete. This is no longer the case for roughly 68M of the hashes.

The total data-set of 360,213,049 lines contained 359,005,905 usable hashes. This data was de-duplicated to 116,822,086 SHA-1 hashes. Roughly 97% of these hashes were recovered by our group, totaling 113M hashes. As the passwords were all pre-processed before hashing, the plain-texts which we recovered did not exceed length ten and were all lower-cased.
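To make the effect of this pre-processing concrete, here is a minimal sketch. The function names are hypothetical, and the UTF-8 encoding and the truncate-then-lowercase order are assumptions (for plain ASCII passwords the order makes no difference).

```python
import hashlib


def recoverable_plain(password: str, max_len: int = 10) -> str:
    """The most that can ever be recovered from the unsalted hash:
    the first ten characters, lower-cased."""
    # ASSUMPTION: truncation happens before lower-casing.
    return password[:max_len].lower()


def myspace_style_sha1(password: str, max_len: int = 10) -> str:
    """Unsalted SHA-1 over the pre-processed password.

    ASSUMPTION: the pre-processed string is encoded as UTF-8.
    """
    return hashlib.sha1(recoverable_plain(password, max_len).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = "CorrectHorseBattery"
    b = "correcthorsestaple"
    # Different passwords, identical hash: both reduce to 'correcthor'.
    print(recoverable_plain(a), recoverable_plain(b))
    print(myspace_style_sha1(a) == myspace_style_sha1(b))
```

Any two passwords that share their first ten characters, ignoring case, end up with the same hash, which is why the recovered plain-texts are ambiguous.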

Since the plain-text passwords aren’t in their original form, they are less interesting, as they do not allow us to gather much useful information. Being truncated, they do, however, give us a glimpse of some longer passwords we may previously not have been able to recover.

Interestingly, user ‘frekvent’ over at the hashes.org forum made an amazing discovery. It appears that for some users there exists an additional salted SHA-1 hash that contains the password in its original form, without being truncated or lower-cased. This hash is generated by salting the password with the userid prior to being hashed with SHA-1.

Rather than directly recover the salted SHA-1 hashes, we can take a shortcut. For all users who have this secondary salted SHA-1 hash, we can now case-correct the plain-text we previously recovered by testing it against that hash. It also means we can derive the actual password for these users prior to length ten truncation.

Out of the entire data-set, about 68M users contain the secondary salted SHA-1 password hash. Of these 68M users, we were able to pair 66M up with the recovered password. This 66M list was then divided into two groups, ‘non-user pass’ which are users containing system generated passwords (14M) and ‘meaningful passes’, those which belong to users (51.6M). We were only able to pair 66M of the total 68M hashes as we have not fully recovered all the SHA1 hashes, but only 97% of them.

Using our tools, we performed a case toggle attack, a length extension attack, or both for each of the salted hash pairs. We have successfully verified over 45M plain-texts against their salted SHA-1 counterpart. The case toggle refers to toggling the case of all passes of length ten or less and checking each variant against the salted SHA-1. The length extension attack (here meaning extending the candidate's length, not the cryptographic attack of the same name) involves cycling through all possible characters, appending them to the plain-text derived from the recovered normal SHA-1, and checking the result against the salted SHA-1 hash.
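The two attacks can be sketched together as below. This is a simplified illustration rather than our actual tooling: `salted_sha1` assumes the salt is simply prepended as `sha1(userid + password)`, which is a guess at the layout, and the character set and maximum suffix length are arbitrary illustrative defaults.

```python
import hashlib
import string
from itertools import product
from typing import Callable, Iterator, Optional


def salted_sha1(userid: str, password: str) -> str:
    # ASSUMPTION: the userid is prepended to the password. The real
    # layout may differ; pass another hash_fn below if so.
    return hashlib.sha1((userid + password).encode("utf-8")).hexdigest()


def case_variants(plain: str) -> Iterator[str]:
    """Yield every upper/lower case combination of `plain`."""
    choices = [
        (c.lower(), c.upper()) if c.lower() != c.upper() else (c,)
        for c in plain
    ]
    for combo in product(*choices):
        yield "".join(combo)


def recover_original(
    userid: str,
    truncated_plain: str,
    target: str,
    charset: str = string.ascii_letters + string.digits,
    max_extra: int = 2,
    trunc_len: int = 10,
    hash_fn: Callable[[str, str], str] = salted_sha1,
) -> Optional[str]:
    """Case-correct (and, if truncated, extend) a recovered plain-text."""
    # Only plains that hit the truncation length can have lost characters.
    truncated = len(truncated_plain) >= trunc_len
    suffix_lengths = range(max_extra + 1) if truncated else range(1)
    for n in suffix_lengths:
        for suffix_chars in product(charset, repeat=n):
            suffix = "".join(suffix_chars)
            for prefix in case_variants(truncated_plain):
                if hash_fn(userid, prefix + suffix) == target:
                    return prefix + suffix
    return None


if __name__ == "__main__":
    target = salted_sha1("1337", "Password123")
    print(recover_original("1337", "password12", target))
```

The cost grows as 2^(letters in the prefix) times |charset|^(suffix length), so the brute-force suffix search only stays cheap for short extensions; longer tails would need wordlists or rules instead.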

Having both variations of the password hashes has made cracking the longer passwords quite easy since we can first recover the length 10 representation and use this in length extension attacks to obtain the full length password. It would appear that the Myspace data may have some usefulness after all.

Note: The salted hashes can be paired up with their corresponding plaintext data and arranged such that they can be recovered using off-the-shelf software. However, this won't work for case correction, and you will also need to reparse the final output.

Friday, September 11, 2015

We would like to present some statistics based on our current finds of roughly 11.7 million passwords. Firstly, we would like to state that we are predominantly targeting a 15 million subset of the 36 million potential passwords. Secondly, bear in mind that we still haven't cracked about 4 million tokens, all of which could affect the findings presented here.

Total password entries = 11,716,208

Total unique password entries = 4,867,246

The majority of passwords that we have cracked so far appear to be quite simple, either being lowercase with numbers or just lowercase. We also observed some UTF-8 encoded passwords. Passwords containing purely numbers also appear to be relatively popular. Note that we crack passwords in gradually increasing complexity, so it is normal that we have recovered most of the simpler ones first.

The shortest password we cracked was length 1, while the longest was length 28. We would normally expect to see more length 7 passwords, but as evident from the above results, this was not the case. It is possible that we found fewer length 7 passwords compared to length 6 and 8 because we covered larger bruteforce attacks for the length 6 keyspace. We also observed some extremely long passwords, some of which were caused by users using either their email address or their lengthy usernames as their password.

Going beyond the 15 million vulnerable hashes and another interesting find

User data as passwords

We were curious as to how many users use their username as their password. A full run against all 36 million users was conducted in parallel and we discovered that there were over 630,000 matches. We tried each username against its corresponding bcrypt hash and performed some simple case toggling. This number shows that even without using the discoveries outlined in our previous blog post, more than 630,000 bcrypt hashes could have been easily recovered. We would like to note that this search was not exhaustive, as we only tried common case mutations. We suspect that this figure would have been higher if we had tried more upper and lower case combinations, though this would have taken much longer. It is also worth noting that a similar approach can be tried, but using the email address or other user data.
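A rough sketch of the username check is shown below. The exact set of case mutations we ran is not listed here, so the five used in `common_case_mutations` are illustrative assumptions, as are all function names. To keep the example self-contained, the demo verifier uses SHA-256 as a stand-in; in practice `verify` would wrap a bcrypt comparison against the user's stored hash.

```python
import hashlib
from typing import Callable, List, Optional


def common_case_mutations(username: str) -> List[str]:
    """A few cheap case mutations, de-duplicated, in a fixed order.

    ASSUMPTION: this particular set is illustrative only.
    """
    raw = [
        username,
        username.lower(),
        username.upper(),
        username.capitalize(),
        username.swapcase(),
    ]
    seen = set()
    out = []
    for cand in raw:
        if cand not in seen:
            seen.add(cand)
            out.append(cand)
    return out


def username_as_password(
    username: str, verify: Callable[[str], bool]
) -> Optional[str]:
    """Return the first mutation accepted by `verify`, else None."""
    for cand in common_case_mutations(username):
        if verify(cand):
            return cand
    return None


def make_demo_verifier(real_password: str) -> Callable[[str], bool]:
    # Stand-in for a bcrypt check, used only so the demo runs
    # without third-party libraries.
    stored = hashlib.sha256(real_password.encode("utf-8")).hexdigest()
    return lambda cand: hashlib.sha256(cand.encode("utf-8")).hexdigest() == stored


if __name__ == "__main__":
    verify = make_demo_verifier("JOHNDOE")
    print(username_as_password("JohnDoe", verify))
```

Because bcrypt is deliberately slow, keeping the candidate list to a handful of mutations per user is what makes a run over tens of millions of accounts feasible; a full case toggle of each username would multiply the work by 2^(number of letters).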

Suspicious accounts

Our very brief analysis of the passwords suggests that the possible ‘suspicious’ accounts used the following passwords:

asdferfa324

hello

DEFAULT

123456

asdfg

superman

iloveyou

111111iwillneverdoitagain

welcome

Top Interesting passwords

Rather than bore everyone with the standard top 10/50/100 lists, one of our members has kindly put together a list of top interesting passwords, classified by various categories, purely for your entertainment.

Those that think adding a few more words to the word password makes it harder to crack:

A package has been sent out to the press containing all the statistical analysis and data derived from the cracked passwords. If you are affiliated with the media, reporting on this story or related stories and wish to acquire these statistics, then please email us.