I've written a hash function which creates a 32-byte hash from a password. Are there any ways I can test it for collisions etc.? I want to implement this function in an encryption program I am writing, so all ideas are welcome!

I know it's a bit long, but I think it could work. I've written a program around it which generates hash + password pairs to a file like:

a-<hash>b-<hash>...aa-<hash>ab-<hash>...aaa-<hash>...zzzz-<hash>

I did this up to zzzz. Then I imported the complete text file (part by part) into an Excel file with one DWORD value per cell. Then I sorted all hashes per DWORD and checked whether any two DWORD values in a column were the same. In 65,535 different hashes, not a single DWORD value in any row is the same, so this looks promising.
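For what it's worth, the Excel step can be skipped entirely: keep every hash you've seen in a dictionary and report a collision the moment one repeats. A minimal sketch in Python, with md5 standing in for your hash function (swap in your own):

```python
import hashlib
from itertools import product
from string import ascii_lowercase

# Enumerate every password from 'a' to 'zzzz' and check for duplicate hashes.
# hashlib.md5 is only a stand-in for the custom hash under test.
seen = {}          # hash -> first password that produced it
collisions = []    # (earlier password, colliding password) pairs
for length in range(1, 5):
    for chars in product(ascii_lowercase, repeat=length):
        pw = "".join(chars)
        h = hashlib.md5(pw.encode()).hexdigest()
        if h in seen:
            collisions.append((seen[h], pw))
        else:
            seen[h] = pw

print(len(seen), "distinct hashes,", len(collisions), "collisions")
```

This covers all 475,254 passwords from 'a' to 'zzzz' in one pass, with no sorting or manual inspection needed.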

Attached you will find the source for a program which prints the hash to stdout and takes a password as a command-line parameter. Any ideas or comments on the algorithm are highly appreciated, since I'm still pretty new to cryptography.

As for the algo, I've tested it with 65,535 different hashes at a time. However, I am unable to test all the hashes that can be created and therefore unable to know what the collision rate would be. I know that for every hash in MD5 you should have 3-4 different passwords (if I remember correctly). I'm wondering what that would be in this case and how one would calculate that.
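For the "how would one calculate that" part: the standard tool is the birthday bound. For an ideal n-bit hash, the expected number of collisions among k random inputs is roughly k(k-1)/2^(n+1). A quick sketch (256 bits corresponds to your 32-byte output; these numbers assume an ideal hash, which a custom one may not be):

```python
# Birthday-bound estimate: expected collisions ~ k*(k-1) / (2 * 2^n)
# for k random inputs into an ideal n-bit hash.
def expected_collisions(k, n_bits):
    return k * (k - 1) / (2 * 2**n_bits)

# 65,535 test inputs into a 256-bit (32-byte) hash: astronomically small.
print(expected_collisions(65535, 256))
# The same 65,535 inputs compared on a single 32-bit DWORD: about 0.5,
# so even one matching DWORD among your columns wouldn't have been alarming.
print(expected_collisions(65535, 32))
```

This also shows why testing 65,535 inputs says almost nothing about a 256-bit output space: you would need on the order of 2^128 inputs before a collision becomes likely in an ideal hash.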

Thanks, it's an interesting read indeed. But like you said, I haven't got a clue how to apply it here. I don't really care about speed, since it only has to hash once for every password/file that is encrypted. So whether the hash takes 0.2 or 0.9 seconds to complete isn't really important.

The first function I created for this wasn't a hash function and produced the same output for 'a' as for 'aa' as for 'aaa', etc. This one doesn't. The result of a-z is this:

I've added the : sign between every DWORD so it is easier to import them into Excel or compare them one DWORD at a time. As you can see, the result looks promising (IMO), but I'm still not sure how to test it to be absolutely sure. The biggest thing I'm worried about is that this hash is going to be the input of a function which generates a double-byte stream to pick bytes out of a table file, which are used to XOR the bytes in a plaintext file. It would be useless if there are too many passwords generating the same hash, since that would weaken the complete program.

I'm indeed writing a secure hash, and I was already advised not to write one on my own, but I still want to. The complete program will be fully written in my own code; nothing is borrowed from anyone (except the GUI part). I know I will probably end up spending a whole lot of time optimizing the algorithms, but IMO it's worth the result. I will use some dictionary files to run through the program, that's no problem, but once I have the hashes in a file, how do I compare so many of them? Excel is worthless, I already noticed that. Do you have any ideas for this?

I'm indeed writing a secure hash, and I was already advised not to write one on my own, but I still want to.

:) - it's an interesting project, but I wouldn't trust myself to write a secure hash before doing a LOT of research.

I will use some dictionary files to run through the program, that's no problem, but once I have the hashes in a file, how do I compare so many of them? Excel is worthless, I already noticed that. Do you have any ideas for this?

Good question, really.

Perhaps looking at the entropy of the output list of hashes is a good idea? Also, try sorting the {input,hash} pairs by {hash} - if it looks like some items are sorted by {input}, then the hash isn't good.
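The entropy suggestion can be tried directly: pool all the output bytes of the hash list and measure their Shannon entropy; a good 32-byte hash should come out close to 8 bits per byte. A sketch, using sha256 as a placeholder for the hash under test:

```python
import hashlib
import math
from collections import Counter

# Byte-level Shannon entropy of a list of hash digests, in bits per byte.
# A uniform distribution over all 256 byte values would give exactly 8.0.
def byte_entropy(blobs):
    counts = Counter(b for blob in blobs for b in blob)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# sha256 is only a stand-in; feed in the digests of the hash being tested.
hashes = [hashlib.sha256(str(i).encode()).digest() for i in range(10000)]
print(byte_entropy(hashes))  # close to 8.0 for a well-mixed hash
```

A result noticeably below 8 (say, 6-7 bits per byte) would mean the output bytes are biased, which is a red flag even before any collision shows up.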

That's the idea behind it, but how would I write a program that can handle that many items to sort? Excel can handle 65,535 different items and that's not sufficient. I think I would need a whole lot of memory, or an (almost) endless loop, to sort such a thing. Any ideas?

It takes about 50 seconds to sort 10^7 DWORDs using Shellsort on a 933 MHz PIII. With a faster algo and CPU it would be possible to sort a considerably larger number of hashes in a reasonable time. The problem will be checking all those values visually afterwards. :shock:

As for writing the program... Here is one solution: allocate two blocks of memory. Put all the strings you want to hash in the first block. Hash each string and put the hash value, along with a pointer to the original string, in the second block. Sort the hashes while moving the pointers accordingly. Obviously you'll need to implement some progie to display the result (breaking the second block into smaller parts could help).
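The two-block scheme above can be sketched in Python, with lists standing in for the raw memory blocks and an index playing the role of the pointer:

```python
import hashlib

# Block 1 holds the original strings; block 2 holds (hash, index) pairs,
# so each hash keeps a "pointer" back to its input while being sorted.
words = ["apple", "banana", "cherry", "damson", "elder"]        # block 1
pairs = [(hashlib.sha256(w.encode()).digest(), i)               # block 2
         for i, w in enumerate(words)]
pairs.sort()   # sorts by hash; the index travels along with each hash

for h, i in pairs:
    print(h.hex()[:16], "->", words[i])
```

After sorting, collisions (if any) sit on adjacent rows, and each one can be traced straight back to the password that produced it.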

Thanks for the explanation, but I just realized that the Windows sort program should be able to help me sort them. Indeed, the manual checking will be a pain in the butt, but it's worth it... I'll let you know the results.

There's a really good website for sorting algorithms. They're in Java, but that's an easy enough syntax to grasp, even if you don't know the language itself: http://www.cs.ubc.ca/~harrison/Java/sorting-demo.html

It details many different sorting algorithms and even gives a visual representation of them and their comparative speeds; note that these speeds are much slower than they would be without the visual representation. The Fast Quick Sort on that page is the fastest sort algo they have there, and it's pretty damned fast, so if you want to speed up the sorting, you should definitely look into using that algo.

http://en.wikipedia.org/wiki/Sort_algorithm#Quicksort gives some nice information regarding what exactly you're doing, and what the ideal speed of a quicksort is.

http://en.wikipedia.org/wiki/Big_O_notation explains the O(log n) notation and the other efficiency info there.

Thanks, I will check them some other time. I've used the Windows sort function, which took about 15 minutes to sort a 3.4 GB file, so that's sufficient for now.

I just tested the hash extensively with a lot of words.

Here's what i did:

I downloaded every dictionary from http://www.picozip.com/prt/index.html and combined all the dictionaries into one big dictionary. Total size: 49 MB.

I have to say I had to build an extra check into the hash, because with one particular input word the result was a divide by 0, which crashed the program. After updating the code I ran the following tests:

I wrote a program which hashed every word in that dictionary file and wrote it to another file in the format <hash>-<word>,

a total of 4,485,600 words (a 3.4 GB file!). Then I sorted the complete file by hash using the Windows sort.exe tool.

After this I wrote another program which compares every line to the next:
- if they are the same, skip the line.
- if they are not the same, compare only the hashes.
- if the hashes match, print both lines to stdout.
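That compare-adjacent-lines pass can be sketched like so, assuming the <hash>-<word> format and hash-sorted input described above:

```python
# Scan a hash-sorted file for adjacent lines that share a hash.
# Assumed line format: <hash>-<word>, already sorted by hash.
def find_duplicate_hashes(lines):
    dupes = []
    prev_hash, prev_line = None, None
    for line in lines:
        if line == prev_line:          # identical line: same word twice, skip
            continue
        h = line.split("-", 1)[0]
        if h == prev_hash:             # different words, same hash: collision
            dupes.append((prev_line, line))
        prev_hash, prev_line = h, line
    return dupes

sample = ["aaaa-cat", "aaaa-cat", "bbbb-dog", "bbbb-fox", "cccc-owl"]
print(find_duplicate_hashes(sample))   # [('bbbb-dog', 'bbbb-fox')]
```

Because the file is already sorted by hash, a single linear pass like this is enough; no collision can hide anywhere except on adjacent lines.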

This resulted in 0 double hashes.

So in this case you would say that with 4.5 million tested passwords the hash looks quite nice ;)

Yes indeed. I needed a way to use the inputted password to generate a stream of pseudo-random DWORDs to pick bytes from the table file. The first one I wrote simply repeated the password: 'passwordpasswordpasswordpass' etc... Of course this wasn't sufficient, so I decided to write something else, which is this function. I never tried to make it this efficient, but as time progresses it might be worth doing so. :D