sulfericacid has asked for the
wisdom of the Perl Monks concerning the following question:

Hello fellow monks, it's sure been a while.

I'm preparing for the CompTIA Security+ exam on the 15th of this month and, after going through the resources, brute force password breaking is a fairly big issue. I've never been too interested in playing with it myself but I figured it'd be a good learning experience to see the difference in breaking a 3 character password vs a 5+ character password.

Below is my first attempt. It works fairly well on passwords 1-3 characters long. A 3 character password took about 20 minutes (sometimes as much as 40). And to my surprise I was able to snatch a 4 character password in just over 12 hours (it took 12,000,000 password tries to solve it).

Not happy yet, I decided to try with a 5 character password. After about 16 hours it locked up saying "OUT OF MEMORY".

Anyway, I have a few questions and please keep in mind it's not perfect (it doesn't read a dictionary as I want this to be a totally random brute force and it doesn't have every character a password can have).

1) I have a hash set up that stores every attempted password which seemed good for short 1-3 character passwords. I know this is why the password solver ran out of memory but would the script work the same without it? My initial assumption is it could ultimately take infinite tries to crack the password unless it's told to find new ones. What are your thoughts on this?

2) I never got into multithreading or anything of that nature but would this be a prime example of something that could be improved by using it?

3) Share your experiences in doing this with Perl. How fast has yours solved your passwords for you? Anything you can share will help me find a base line to improve this script and give me more experience/knowledge for my Security+ exam.

Your charset (taking the above faux pas into account) has 62 chars. For a 3-char password that gives just 238,328 possibilities. But you had to try 288,994 before you found it because you are generating duplicates.

You will run out of space using a hash as a duplicates detection system.

For 6 chars: 62 ** 6 = 56,800,235,584 possibilities (and this one would take 2 terabytes!)
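
To make the growth concrete, here is a minimal sketch that prints the number of candidates for each length. The sub name keyspace_size is made up for illustration, and the 62-symbol alphabet is the figure quoted above. It only counts candidates; it makes no claim about how much memory a Perl hash of them would need.

```perl
use strict;
use warnings;

# Number of candidate passwords of exactly $len characters
# drawn from an alphabet of $charset_size symbols.
sub keyspace_size {
    my ($charset_size, $len) = @_;
    return $charset_size ** $len;
}

# Print the keyspace for lengths 1 through 6 with a 62-symbol alphabet.
for my $len (1 .. 6) {
    printf "%d chars: %.0f candidates\n", $len, keyspace_size(62, $len);
}
```

Each extra character multiplies the work by 62, which is why a scheme that remembers every guess falls over so quickly.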

You need to find another way to detect duplicates. And the easiest way to do that is to not generate them.
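
One way to never generate a duplicate, sketched under assumptions: treat a plain counter as a number written in base 62 and turn each value into exactly one string. The charset below and the names index_to_password and sequential_search are illustrative, not taken from the original script.

```perl
use strict;
use warnings;

my @chars = ('a' .. 'z', 'A' .. 'Z', 0 .. 9);   # 62 symbols (assumed charset)

# Map a counter in 0 .. 62**$len - 1 to a unique string of length $len
# by writing the counter in base scalar(@chars).
sub index_to_password {
    my ($index, $len) = @_;
    my $pw = '';
    for (1 .. $len) {
        $pw    = $chars[ $index % @chars ] . $pw;
        $index = int($index / @chars);
    }
    return $pw;
}

# Walk the keyspace in order; return the number of tries needed,
# or undef if the target is not in the keyspace.
sub sequential_search {
    my ($target) = @_;
    my $len  = length $target;
    my $last = scalar(@chars) ** $len - 1;
    for my $i (0 .. $last) {
        return $i + 1 if index_to_password($i, $len) eq $target;
    }
    return;
}

print "found after ", sequential_search('p3R'), " tries\n";
```

No hash is needed, because the counter itself records which candidates have been tried.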

Shuffling an array to create your passwords is highly inefficient, especially using a pure Perl shuffle.

Update: Besides, in the real world, the slow bit is not generating the possibilities--assuming you use sensible methods--it is testing each possibility. You obviously do not have the actual password to directly compare to (else you wouldn't need to do this:), so you have to inject the password into the application or remote interface (along with the account name or user id). That involves IO which means it will invariably take far longer than even the suckiest password generation algorithm.

Also, in the real world, any authentication mechanism that doesn't detect rapid and repeated failed login attempts should be justification for having the programmers ritually disembowelled in public with a rusty spoon! At the very least they should double the time before another attempt may be made to log in with each failure. And in this world, people being what they are, some relatively low limit on the number of consecutive failed attempts should lock out the password for human supervised verification and reset.
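
A rough sketch of the doubling rule and lockout described above. The constants and the sub name login_delay are assumptions picked for the example, not values from any real authentication system.

```perl
use strict;
use warnings;

use constant {
    BASE_DELAY   => 1,   # seconds to wait after the first failure (assumed)
    MAX_FAILURES => 5,   # consecutive failures before lockout (assumed)
};

# Seconds a client must wait after $failures consecutive failed logins,
# or undef once the account is locked pending human review.
sub login_delay {
    my ($failures) = @_;
    return 0     if $failures <= 0;
    return undef if $failures >= MAX_FAILURES;
    return BASE_DELAY * 2 ** ($failures - 1);
}

for my $n (1 .. 5) {
    my $delay = login_delay($n);
    print "failure $n: ", (defined $delay ? "wait ${delay}s" : "locked out"), "\n";
}
```

Even this small delay makes an online brute force attack far slower than any guess generator.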


If you seriously wanted to crack passwords, you would write the code in C (or the like) and not Perl. Additionally, using a dictionary and variations before going to brute force is certainly something you want in your arsenal. I say variations because people tend to substitute zero for the letter O, put a symbol like _ or - between two short dictionary words, or add numbers to the end. This human nature flaw is one you should exploit*. The divide and conquer approach is good but there is only so much improvement that can be done on a single CPU. Writing distributed algorithms is better suited to other languages like Erlang. If you don't know what rainbow tables are then look them up; they will make your attempt at caching all possible 3 character passwords seem like child's play.

* Update: Lest anyone assume I feel cracking passwords is ethical, let me be clear - I don't. sulfericacid has indicated that this is for educational purposes and that is all I support. In Re: Character Combinations I point out how silly brute force cracking can be (555 years for the problem presented). I in no way think any of these techniques should be used to gain unauthorized access nor do I feel you should engage in such practices without authorization even if you feel proving the weakness is for everyone's own good. You can ask a well known monk how well that worked out for him.
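
As an illustration of the variations mentioned above (zero for O, a joining symbol, trailing digits), here is a small hypothetical generator. The sub name variations and the exact set of rules are assumptions; real word-mangling rules are far richer.

```perl
use strict;
use warnings;

# Generate a few common human variations from two dictionary words:
# o/O replaced by 0, the words joined by '_' or '-', one digit appended.
sub variations {
    my ($first, $second) = @_;
    my @out = ($first);

    (my $zeroed = $first) =~ tr/oO/00/;
    push @out, $zeroed if $zeroed ne $first;

    push @out, map { $first . $_ . $second } ('_', '-');
    push @out, map { $first . $_ } 0 .. 9;
    return @out;
}

print "$_\n" for variations('root', 'dog');
```

A list like this is tiny compared with the full keyspace, so it is worth trying before any exhaustive search.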

It seems like you are not really skipping anything -- you are just printing a message; this IO might be slowing you down a little.

If you don't print your progress message ("Guessing: ...) that may also save a tiny bit of time.

I wonder if your shuffle sub could be more efficient if you eliminated all the array ref and wantarray checks, since you always seem to be passing an array and returning an array. Or maybe try shuffle from the Core List::Util module.
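
For comparison, a minimal sketch of both options: picking each character with rand, and using shuffle from List::Util when a shuffled alphabet really is wanted. The charset and the name random_guess are assumptions, since the original sub is not shown here.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @chars = ('a' .. 'z', 'A' .. 'Z', 0 .. 9);   # assumed charset

# One random guess: pick each character independently with rand.
# No shuffle of the whole alphabet is needed per character.
sub random_guess {
    my ($len) = @_;
    return join '', map { $chars[ rand @chars ] } 1 .. $len;
}

# If a shuffled copy of the alphabet is wanted, the core module does it.
my @shuffled = shuffle @chars;

print random_guess(5), "\n";
print join('', @shuffled[0 .. 4]), "\n";
```

Note the two are not identical: rand can repeat a character within a guess, while taking the first few elements of a shuffled list cannot.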

Agreed. Normally, the CPU-intensive part of a password cracker is not listing the passwords to guess, but performing the password calculation for the application. Since you are just comparing strings, the program should be far faster than your algorithm allows.

In this case, I suspect the majority of your time is spent on making duplicate guesses. The further along the program gets, the more likely that a random guess has already been tried. At some point you'll have a fraction of a percent chance of guessing a password that you haven't already guessed, meaning that more than 99% of your guesses are a waste of time. Every increment you apply to the length will make this exponentially a more wasteful algorithm.

Your guess generator needs to stop guessing strings that it's already tried. You could try to pre-generate all of the guesses in a shuffled list and pop them off. Or you could try a sequential list (but randomly define what that sequence is). In fact a smart sequence might attempt common characters before less common ones. Another solution might be to keep track of partial character strings and mark them as dead ends as you go, deleting the children to save memory.
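
The "sequential list with a randomly defined sequence" idea might look like the sketch below: shuffle the alphabet once, then step through every string like an odometer. The name make_guesser is hypothetical. To try common characters first, a frequency-ordered list could replace the shuffle.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Return an iterator that yields every string of length $len over
# @alphabet exactly once, in an order fixed by one up-front shuffle.
sub make_guesser {
    my ($len, @alphabet) = @_;
    my @order  = shuffle @alphabet;
    my @digits = (0) x $len;      # odometer positions into @order
    my $done   = 0;

    return sub {
        return if $done;
        my $guess = join '', @order[@digits];

        # Advance the odometer, rightmost position first.
        my $pos = $len - 1;
        while ($pos >= 0 && ++$digits[$pos] == @order) {
            $digits[$pos--] = 0;
        }
        $done = 1 if $pos < 0;    # every position wrapped: keyspace exhausted
        return $guess;
    };
}

my $next = make_guesser(2, 'a' .. 'c');
while (defined(my $guess = $next->())) {
    print "$guess\n";
}
```

Memory use is constant: only the current odometer state is kept, not the list of past guesses.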

These are just ideas. I see your goal in taking a random approach at the guesses, but you need to step it up a bit, because right now, someone doing a sequential scan, and making all the wrong guesses first, will out-perform you.

And yes, I/O is slow. Definitely don't print every single guess. Maybe print counters every hundred or thousand cycles. This might give you a clearer picture of how the algorithm slows down over time.
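
A small sketch of printing a counter only every so often. The interval, the loop bound, and the sub name should_report are assumptions for the example; the guess-and-test step is left as a comment.

```perl
use strict;
use warnings;

# True only on every $every-th try, so the loop prints rarely.
sub should_report {
    my ($tries, $every) = @_;
    return $tries % $every == 0;
}

my $start = time;
for my $tries (1 .. 5_000) {
    # ... generate and test one guess here ...
    if (should_report($tries, 1_000)) {
        printf STDERR "%d tries, %d s elapsed\n", $tries, time - $start;
    }
}
```

Watching the elapsed time between reports grow is a quick way to see the algorithm slowing down.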

1) If I understand that sub correctly you are creating your password by shuffling the whole list of available chars 5 times to generate a 5 char password (why not just pick a random char with rand(@char) instead?), then you put that string into a rapidly growing hash that will have to be swapped out to hard disk as soon as the main memory is exhausted. You should try to profile it but I guess that takes at least 100 times longer than the one string comparison to test if you found the password.

Let's assume you have searched 99.9% of all possible passwords, so practically all passwords you generate were already tried (the worst case). Then dropping the duplicate check and simply comparing every guess would make your program just 1% slower. And that doesn't even take into account that you could eliminate the hash, which would certainly give a speedup of more than the 1% you lost.

2) Now I guess you made this as a test case and had a real encryption in mind so that testing the password means applying a costly function with permutations and math operations. But then the big search space becomes a problem. With 4 character passwords (and about 70 different characters) you have a search space of 24 million, with 5 characters 1.68 billion...

Leaving aside that Perl hashes use a lot of memory and hashes are only efficient when they are mostly empty, you get to the limits of your memory very fast even if you use a more efficient hash storage. If you put the hash on disk instead of into memory you gain not even two more characters of password length, but then testing for duplicates costs more than testing the password, which makes it absolutely useless again.

As described in The golf course looks great, my swing feels good, I like my chances (Part III) (see "Eliminating Exponentiation" section
and the C/assembler search program), searching for
this golfic magic formula seems to be essentially equivalent to brute
force password cracking.
In this case, it took me six months to exhaustively md5 search a six
character magic string (with 180 different characters in it).
The most time-critical piece was the calculation of the md5 hash, and switching
to an assembly language md5 routine made a huge difference to the speed
of the brute force searcher. It would have been impossible for me to
solve it in pure Perl.
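
For readers who want to see the shape of such a search in pure Perl, here is a toy sketch using the core Digest::MD5 module. It is not the C/assembler program referred to above; the tiny lowercase alphabet and the name crack_md5 are assumptions chosen to keep the demo fast.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my @chars = ('a' .. 'z');   # deliberately tiny alphabet for the demo

# Exhaustively search all strings of length $len for one whose MD5
# hex digest equals $target_hex; returns the string, or undef if none.
sub crack_md5 {
    my ($target_hex, $len) = @_;
    my $total = scalar(@chars) ** $len;
    for my $i (0 .. $total - 1) {
        # Build the candidate for counter $i (base-26 digits).
        my ($n, $candidate) = ($i, '');
        for (1 .. $len) {
            $candidate .= $chars[ $n % @chars ];
            $n = int($n / @chars);
        }
        return $candidate if md5_hex($candidate) eq $target_hex;
    }
    return;
}

my $found = crack_md5(md5_hex('abc'), 3);
print defined $found ? "found: $found\n" : "not found\n";
```

Here the digest call dominates the loop, which matches the observation that a faster MD5 routine matters more than the candidate generator.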

In researching this problem, I stumbled upon
some interesting links related to password cracking:

Though I haven't yet had a chance to play around with CUDA and OpenCL,
these new technologies seem to have fundamentally changed the password
cracking playing field. Nowadays, you see, a humble PC containing four
or more high-end NVIDIA graphics cards, in harness with CUDA/OpenCL, is
effectively a personal supercomputer. And since password cracking is highly
parallelizable, a farm of these cheap supercomputers would make a formidable
password cracking weapon.
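
The reason the problem parallelizes so well is that the keyspace can be cut into independent slices. A minimal sketch of that split follows. The name partition_keyspace is hypothetical, and handing each slice to a thread, process or GPU kernel is not shown.

```perl
use strict;
use warnings;

# Split the index range 0 .. $total-1 into at most $workers contiguous
# slices, returned as [first, last] pairs. Earlier slices absorb any
# remainder, and empty slices are skipped.
sub partition_keyspace {
    my ($total, $workers) = @_;
    my $base  = int($total / $workers);
    my $extra = $total % $workers;
    my ($start, @slices) = (0);
    for my $w (0 .. $workers - 1) {
        my $size = $base + ($w < $extra ? 1 : 0);
        next if $size == 0;
        push @slices, [ $start, $start + $size - 1 ];
        $start += $size;
    }
    return @slices;
}

# Example: the 3-character, 62-symbol keyspace split four ways.
printf "%d .. %d\n", @$_ for partition_keyspace(62 ** 3, 4);
```

Each worker needs only its own first and last index, so no communication is required until one of them finds a match.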

Trying to remember all the passwords is the killer here. If you really want to do it in some random way instead of in a more ordered way, why not pick the first three characters at random, and then try all combinations of the next two characters? Of course, that still doesn't scale, and will probably break down after 6 characters.

Note also that in real life, the allowable set of characters is much more than the set you are using, making you run out of memory even faster.

You may want a much smarter data structure, for instance based on a trie and bitvectors as leaves to keep track of the tried passwords. It'll take more work (codewise) to insert and search, but you'll save on memory.

I wouldn't even dream of using Perl for any such brute force work. This seems like a job for C. Especially since if you're going to use anything like this in practice, you only have the encrypted password.
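
A simplified sketch of the bit vector half of that idea, using Perl's built-in vec: one bit per candidate instead of one hash entry. The trie is omitted, the charset is assumed, and the names password_index, mark_tried and was_tried are made up for the example. It handles a single fixed length only, since strings of different lengths would share indexes.

```perl
use strict;
use warnings;

my @chars = ('a' .. 'z', 'A' .. 'Z', 0 .. 9);   # assumed charset
my %value;
@value{@chars} = 0 .. $#chars;                  # character -> digit value

# Position of a password among all strings of the same length,
# reading it as a base-62 number.
sub password_index {
    my ($pw) = @_;
    my $index = 0;
    $index = $index * @chars + $value{$_} for split //, $pw;
    return $index;
}

my $tried = '';   # bit string, grows on demand: one bit per candidate

sub mark_tried { vec($tried, password_index($_[0]), 1) = 1 }
sub was_tried  { vec($tried, password_index($_[0]), 1) }

mark_tried('abc');
print "abc tried: ", (was_tried('abc') ? "yes" : "no"), "\n";
print "abd tried: ", (was_tried('abd') ? "yes" : "no"), "\n";
```

At one bit each, all 62 ** 5 five-character candidates need roughly 115 MB, and the cost still grows 62-fold with every extra character.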

In Re^2: Improve password solver, tprocter says "I see your goal in taking a random approach at the guesses...", and this seems to be generally accepted as a valid approach although perhaps rather expensive in terms of memory and execution time.

I don't see the point of this. Why would generating guesses at random be preferable to sequentially iterating through every possible combination, e.g., for the character set given in the OP, from 'aaa' to '   ' (three spaces) for a three letter password? The only advantage I can imagine is that it might fool a login monitoring program that was set up to detect a series of sequential passwords.

I was about to launch into a defense of the concept here, but I basically proved myself wrong. The primary effect of randomizing the test sequence would be to encourage people to use stronger passwords, which is good, but not usually a goal of an attacker. However, a cracker could be at a disadvantage if it consistently starts with 'a' and the majority of solved passwords start with 'Z'. A middle ground might be to randomly select where you start from in a character sequence, and sequentially test all variations for that length in a circular style of sequence.
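
The middle ground described above could be sketched like this: choose a random starting index, then walk the keyspace sequentially and wrap around. The name circular_scan and the callback interface are assumptions. The callback stands in for "turn the index into a password and test it".

```perl
use strict;
use warnings;

# Visit every index in 0 .. $total-1 exactly once, beginning at a random
# offset (or at $start if given) and wrapping around. Calls
# $visit->($index) and returns the first index for which it is true,
# or undef if none is.
sub circular_scan {
    my ($total, $visit, $start) = @_;
    $start = int rand $total unless defined $start;
    for my $step (0 .. $total - 1) {
        my $index = ($start + $step) % $total;
        return $index if $visit->($index);
    }
    return;
}

# Example: look for index 3 in a keyspace of 10, from a random start.
my $hit = circular_scan(10, sub { $_[0] == 3 });
print "found index $hit\n";
```

Every candidate is still tried exactly once, so the worst case is no worse than a plain sequential scan.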