Prizes increased again

As of 2018-07-26:07:44:10 UTC we have increased the prizes again, doubling the previous prize total (which was already doubled from the original). This doubling is a little different than the previous one. We want to keep more people in the race, and so are distribution the increases a bit differently than our previous doubling. The new prize schedule is

@jpgoldberg Given that the contest has been live for coming up on three months, do you still expect to achieve all the objectives of this experiment?

Also, I'm curious whether you have some smart ways to "correct" for the randomness of when a password might be found in a random distribution. It's possible that someone could get "lucky" and find a password after only searching a small amount of the keyspace. Similarly, if there are only a small number of people attempting this challenge, it's also possible that all of them might be relatively "unlucky" and have to search >50% of the keyspace before cracking the first password.

Just curious how you account for this. Are you going to rely primarily on the written report from the crackers (where presumably they could indicate their cracking hashes / second and percentage of the keyspace they searched) or is there another way?

Finally, will you make any attempt to extrapolate the findings from this contest to the (presumed) capabilities of nation state actors, who must have massively more resources than even a large and sophisticated cryptocurrency miner?

Given that the contest has been live for coming up on three months, do you still expect to achieve all the objectives of this experiment?

We have largely achieved our principle objective, in getting a much better estimate of how costly these things are to crack. Though once we get write-ups, we will be able to make this more precise. But another goal was to be fair to participants. We intend to award every cent of the prize money. So if this latest prize increase doesn't yield winners, we will start offering hints.

Our Master Password generation assistance defaults to four word passwords, but it allows people to go for three word passwords. One of our internal questions was whether we should continue to allow that. We have learned that the three word passwords probably eight times more expensive to crack than our original estimate.

So while there is still more to learn, and which we would get from a report of a crack, we've mostly learned what we want to learn. But as I said, we also want to be fair to people who have been paying an opportunity cost and paying for electricity and cooling costs over the past months.

Also, I'm curious whether you have some smart ways to "correct" for the randomness of when a password might be found in a random distribution. It's possible that someone could get "lucky" and find a password after only searching a small amount of the keyspace. Similarly, if there are only a small number of people attempting this challenge, it's also possible that all of them might be relatively "unlucky" and have to search >50% of the keyspace before cracking the first password.

Any "smart" correcting would weaken things. We cannot presume to know which order people search through. Even if we did, then the crackers would adjust for our adjustment. Suppose we believed that people were to go down the list alphabetically. We could bias our generation toward picking things from the middle of the list. But if we did that, people would bias their guessing strategy to start there. Picking uniformly is the only smart way for us to generate passwords.

Are you going to rely primarily on the written report from the crackers (where presumably they could indicate their cracking hashes / second and percentage of the keyspace they searched) or is there another way?

Exactly. We would like them to report how much of the space they searched through. If they went through just 10% of the space before they hit a solution or whether they went through %90 will affect our estimate of the cost.

Finally, will you make any attempt to extrapolate the findings from this contest to the (presumed) capabilities of nation state actors, who must have massively more resources than even a large and sophisticated cryptocurrency miner?

Yes and no. This is why we are trying to estimate of total costs instead of time. If one person dedicates $5000 dollars worth of equipment for two months with $200 per month running costs, and another person dedicates $1000 worth of equipment for three months at $500 per month running cost to make the same number of guesses, we will be in a position to look at the depreciation of the fixed assets and come up with a range of costs.

From each report we should be able to come up with an approximate number that represents total cost per 1 billion guesses. That number will differ from report to report, but if we have multiple ones of these, we should, at least in the range we are talking about, be able to see if economies of scale play a role.

But any attempt to estimate economies of scale to the resources available to major governments will still be guesswork.

Has anyone actually tested the sample passwords with hashcat and gotten a successful crack? Setting the whole thing up seemed a snap, and after having hashcat running for days, a sudden curiousity grew in my mind regarding whether I had made a bad assumption about the hash format. I am using hash format:

10900 PBKDF2-HMAC-SHA256

which if you check the hashcat doc example hashes shows that format with the salt and hash in base64 format, such as:

sha256:1000:MTc3MTA0MTQwMjQxNzY=:PYjCU215Mi57AYPKva9j7mvF4Rc5bCnt

or

My assumption was just base64 encoding the salt and hash and writing in the format above would suffice. That does not work on the samples with a dictionary containing the passwords given. I've read the 1Password source, and it appears the salt and hash are just bytes -> hex, but I do not know what the innards of hashcat are expecting under the base64 representation.

Anyone have any ideas, or can you share a hashcat command / hash format which works on the samples? I think the essence of this challenge is optimizing dealing with the dictionary, which is something different, so I hope this is adequately in the realm of common discussion.

Hi @AxzHandul, several people have checked the three samples, and I would assume that every participant would check those themselves as a test of their configuration (and as a test that we didn't get things wrong when producing these).

These were not designed specifically according to hashcat's or John the Ripper's formats, but they are designed to be easy to transform. I don't know hashcat well enough to know exactly what you need to do, but I presume others in this discussion do and may be able to offer help.

Thanks for the reply @jpgoldberg. I wasn’t questioning the validity of the sample hashes — but what Hashcat is doing underneath the hood with the format. I wasn’t referring to checking the validity of the samples with the go tools posted as part of the challenge — I was referring to checking successful crack on a specific Hashcat format. Hopefully someone can enlighten me — unless I missed it, the Hashcat doc doesn’t have much to say on it.

From what you've said, hashcat expects the "hash" to be base64, but our salts and derived keys are represented as hexadecimal, so that is certainly something that will have to be modified.

And this is where the samples come in. Forget about testing the validity in Go, that isn't relevant. What you need to do is make sure that you can get the expected results with hashcat (or with whatever tool you are using). There is no reason to waste computer time trying to crack the unknown ones if your setup can't "crack" the known ones.

@jpgoldberg -- yeah, that's the gist of it. I did base64 encode them -- both from hex and converting them back to decimal and then base64 encoding them -- but the samples don't crack. It's most certainly a formatting issue.

@AxzHandul, It sounds to me like you may be stuck on understanding the nature and roles of these encodings.

I did base64 encode them -- both from hex and converting them back to decimal and then base64 encoding them

Hexadecimal and base64 are two different ways of encoding a sequence of bytes. Decimal is yet a third way to do so. You need to turn the hex back into a sequence of bytes (something that can't be directly printed, which is why we use encodings) and then re-encode that data.

Here are the base64 encodings of the derived key and salt for sample 3UOKUEBO:

One way to help see if your results look plausible is that given the number of bytes underlying data, hex will use two characters per byte. The "5f" in the hex encoding represents one byte of underlying data. So if you have 32 bytes of data (as we do for the derived key), then the hex representation should 64 bytes long (as it is).

Base64 is more compact (which is why it is often preferred over hex). It will use four characters for every three bytes of data (and add "="s when the data isn't a multiple of 3 bytes long). The "Xzej" in the base64 encoding represents three bytes of underlying data. So 32 bytes of underlying data (as we have with our Derived key) will result in 32 * 4/3 (rounded up and without the "="s). So that is 43 characters.

But if you base64 encode a hex or decimal representation of you will end up with something longer than the representation you started with (approximately 4/3 longer). So remember, we are using 32 byte derived keys and 16 byte salts. If your base64 encodings come out the wrong length, you've done it wrong. (Coming out the right length isn't a guarantee that you've done it right, but at least you are encoding the right number of bytes.)

Thanks for the explanation — I understand well the nature of encoding (but appreciated the explanation nonetheless — sanity check always helps, especially since I haven’t had my coffee today :-). My question was really getting at some things in hashcat internals, somewhat encouraged by the new version release of hashcat and some things posted on the hashcat forums. My question really probably belonged there, but given others have very likely the answer I posted here on the off-chance they tread similar territory. Thanks for talking through it with me....cheers!

hashcat will not (by default) put the words in the dictionary in "word1 word2 word3" format; creating combinations will be necessary. hashcat was able to crack the sample passwords in a few seconds when given a combination dictionary including only the password wordlist.

Exhausting the entire space for one salt/hash pair, costing around $1/day on our hardware, would take around 15 years. It would only require about $5000 in electricity at current prices, which is technically lower than the total prize value (before taxes). If 1Password continues increasing the prize value, eventually it will surpass the cost of exhausting the search space with hashcat on AWS or similar, and someone will front the money even if they only make a few hundred dollars of profit. That the cost of an AWS GPU instance to crack one of the passwords is almost $200k is mind-boggling; no wonder AWS is Amazon's profit center. As someone pointed out, though, that's peanuts to anyone with real infrastructure. Once you get to four seed words as suggested by 1Password, you're almost if not certainly out of the reach of nearly everyone.

It's not clear that an FPGA implementation is worthwhile. If it is, it'll most likely need to be run on an AWS f1.16xlarge instance; the cost of purchasing one of those VU9 FPGA cards (the f1.16xlarge has 8 of them) is on the order of $15k. But they can be rented for about $4/hour. It seems like naive reimplementations of pbkdf2_hmac sha256 don't have much better performance than a GTX 1080, but are a little bit more power efficient. edit: reference

Not sure how it makes sense that Chick3nman hasn't cracked one of the passwords yet. It's been (almost?) three months, and he expected to be done in a few weeks. Did he run out of money or just configure his dictionaries wrong?

Not sure how it makes sense that Chick3nman hasn't cracked one of the passwords yet. It's been (almost?) three months, and he expected to be done in a few weeks. Did he run out of money or just configure his dictionaries wrong?hatl

We did the math wrong on that. See some of the later discussion.

Exhausting the entire space for one salt/hash pair, costing around $1/day on our hardware, would take around 15 years. It would only require about $5000 in electricity at current prices, which is technically lower than the total prize value (before taxes). If 1Password continues increasing the prize value, eventually it will surpass the cost of exhausting the search space with hashcat on AWS or similar, and someone will front the money even if they only make a few hundred dollars profit.

We were hoping that someone would spin up to EC2 instances or whatever. This way we would would have a very nice picture of what it cost to succeed.

If we knew when we set this up what we have learned over the past few months then we certainly would have priced things differently. (Or actually just used slightly weaker passwords.) We really wanted the whole thing wrapped up in a month or six weeks for all prizes. But if we knew then what we know now, then we wouldn't have had to run the contest at all. We can say that the total cost is likely to be within $8000 and $16000 (electricity isn't the only cost). This is about eight times my initial estimate. (We originally priced things as twice my initial estimate.) One way to look at this is that I was only wrong by three bits!

But that three bits means that we are now committed to giving away four times as much money as originally planned. (That's four times instead of eight times because we already started by offering rewards that were double what we guessed the cost would be.)
Fortunately the boss isn't docking my pay.

Our next move will be to offer hints. See my August 4th post to see what the hints look like. Each round of hints should halve the amount of work needed.

I'm not really used to all this but already learned quit a lot just testing and setting up hashcat. Right now I'm in the process of creating a wordlist with all possible combinations from the given text file. According to my calculations, this would take up around 200 Terabytes of data. Is that possible?

Am I maybe doing this wrong? should I just feed the permutations directly to hashcat? And if I do that, how do I resume?

@nagele: It isn't a sure thing that they will. As Goldberg said, "if this latest prize increase doesn't yield winners, we will start offering hints." Don't hold back, or someone may beat you to the prize, and then we won't offer any hints at all.

I'm not telling, and not just because I don't know. We want to incentive people to keep working on it. And we certainly couldn't do it during KoreLogic's Crack Me If You Can contest at DEFCON. (We'd really wanted everything wrapped before then to avoid a conflict.)

Your more than $13 per hour reflects an expensive way to do this. Your crack rates are indeed in line with what others have posted, but people who already have cracking rigs can work on this at lower cost, even taking into account the depreciation of their equipment. It should be noted that during recent (and previous) Crack Me If You Can competitions no teams (well at least none of the major ones) used AWS hardware as far as I know.

It also appears that you weren't (initially) aware of how hashcat can pipe candidates into its guessing system. But I don't claim to understand those details either.

But again, the crucial thing here is that I don't know the costs. This is why we are running the contest. So I am not ruling out that you are correct.

It's clear that I under estimated the costs when first setting things up. But note that making an error of three bits means an eight-fold error in costs. We know now that I was off by at least three bits. (My estimate was about $2000 to crack a 42 bit password, so initially set the top prize at around twice that). It would be really nice if you are correct. I would love for three word passwords from our generator to be that strong. It will be inconvenient to wrap up this contest and fairly award all of the money we've promised, but the hint scheme that was developed earlier in this discussion should take care of that.

This estimate isn't anywhere near correct. This is a permutation problem, not a combination problem. "Imagine for this test that there is an urn containing 18328 balls, 3 of them colored green, and you draw 250 balls out. What are the odds of drawing all 3 balls?" fails in the case that the three words are in the wrong order or are repeated, so it is actually nine times harder than he has calculated. If the password consists of three words without repetition, and he divides the original wordlist such that those three words are not in the same 250-word partial list (which is way more likely than not!), he will never crack the password even after spending $400k. It's not like there aren't ways to divide the wordlist that don't destroy your odds of ever cracking the password; he literally just picked the only one that does.

I can't imagine anybody being stupid enough to spend $200k or $400k on AWS hours without considering buying a few $20k servers themselves. There are people on the hashcat forums who will rent their cracking rigs to you for probably $5/hour.

Fun fact: If all 12+ million GPUs in the Ethereum mining network were directed at cracking the password, the 3-word passwords would take a little under 87 seconds each. The super-secure 4-word master passwords that 1P uses would take about 18 days, but good luck bribing the entire network to forfeit their chances of getting $65 million worth of Ether they'd have mined over that time.

@bb8 -- actually, my problem stemmed from nothing more than a tool whose output was just plain wrong. I had everything on my end right all along -- the tool was bad. I tried a different tool, changing nothing else, and everything went as expected.