I have some spare time, and a few hundred DJB2-hashed values sitting around. I thought I'd try to do something "useful" and invert DJB2, such that I could calculate the plaintext of the hashes (which has long since been lost, a fact that is often bemoaned).

The text is a string of ASCII characters, so DJB2 has a lot of collisions, a fact that I knew full well going into this. Fortunately, the plaintext I have has characteristics that will allow me to use heuristic filtering, so false positives shouldn't be much of an issue :)

In other words, find the remainder of the hash modulo 33. Then, for each ASCII value $c$ from 65 to 120, check whether $c$ has the same remainder modulo 33. If it does, subtract it from the hash, divide the hash by 33, and continue the algorithm. In this way, we only investigate promising paths, because we know that subtracting $c$ must leave a number that is evenly divisible by 33.

For example, here is the algorithm working to decode a simple hash:

$h = 177676$

$y = h \bmod 33 = 4$

Values of $c$ where $c \equiv 4 \pmod{33}$: $70$ and $103$

I can now pursue only those two values of $c$. (In this case, $103$, or 'g', was correct)
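The filtering step above can be sketched in Python as follows. This is only an illustration under assumptions not stated in the question: arbitrary-precision integers (so no overflow occurs), the conventional DJB2 starting value of 5381 (which is consistent with the example, since $5381 \cdot 33 + 103 = 177676$), and the helper names `candidates` and `invert`, which are made up here.

```python
SEED = 5381        # assumed DJB2 starting value (conventional; not stated in the question)
LO, HI = 65, 120   # candidate character codes, as in the question

def candidates(h):
    """Character codes c in [LO, HI] with c congruent to h modulo 33."""
    return [c for c in range(LO, HI + 1) if c % 33 == h % 33]

def invert(h):
    """Yield every string over [LO, HI] whose un-overflowed hash is h."""
    if h == SEED:
        yield ""           # reached the starting value: a complete preimage
        return
    if h < SEED:
        return             # no string hashes below the seed: dead end
    for c in candidates(h):
        # (h - c) is divisible by 33 by construction
        for prefix in invert((h - c) // 33):
            yield prefix + chr(c)

print(candidates(177676))    # [70, 103]
print(list(invert(177676)))  # ['g']
```

The branch for $c = 70$ dies out after one more step, leaving 'g' as the only preimage in this range.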

Thus, I know my algorithm works to reverse the hashing process. Unfortunately, I ran into a nasty problem: DJB2 rapidly overflows the bounds of an int, often with plaintext as short as four characters. This overflow essentially results in r being implicitly modded by $2^{32}$.

Normal division won't work. I asked a question on programming SE about division in this case, and was informed about the multiplicative inverse of 33. Unfortunately, I don't need a division operation (yet), I need a remainder operation! This is proving to be much trickier, and I'm not sure it's even possible.

Here's an illustrative example:

$h = 2090289493$ <-- h is actually $6385256789 \bmod 2^{32}$ because of the overflow

$y = h \bmod 33 = 28$ <-- Incorrect! Should be 32
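To see the failure concretely, the following sketch hashes the same string with and without 32-bit wraparound. It assumes the conventional DJB2 seed of 5381 and uses the string "gfab", which is not named in the question but reproduces the value of $h$ above under that assumption; `djb2` is a hypothetical helper.

```python
MASK = 0xFFFFFFFF  # keep the low 32 bits, mimicking unsigned 32-bit overflow

def djb2(text, wrap=True, seed=5381):
    """DJB2 hash; with wrap=False the running value is never truncated."""
    r = seed
    for ch in text:
        r = r * 33 + ord(ch)
        if wrap:
            r &= MASK
    return r

full = djb2("gfab", wrap=False)
wrapped = djb2("gfab")
print(full, full % 33)        # 6385256789 32
print(wrapped, wrapped % 33)  # 2090289493 28
print(2**32 % 33)             # 4
```

Since $2^{32} \equiv 4 \pmod{33}$, the true remainder is $(h \bmod 33 + 4k) \bmod 33$, where $k$ is the unknown multiple of $2^{32}$ that was discarded. Here $k = 1$, which turns 28 into 32; without knowing $k$, the wrapped value alone does not determine the remainder.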

Here is an example of the algorithm using the operation $\Omega$ and an $h$ that has overflowed. This is what I'd like to do, but I don't know what operation $\Omega$ represents.

$h = 2090289493$

$y = h\ \Omega\ {33} = 32$

Values of $c$ where $c \equiv 32 \pmod{33}$: $65$ and $98$

I can now pursue only those two values of $c$.

Am I on the right track with reversing DJB2 (can it be reversed?)? Is there some way of finding the remainder of a large number that has been modded by $2^{32}$?

Hint: if I tell you that the number I'm thinking of is even (0 modulo 2), that's not going to help you know whether I'm thinking of 2, 4, or 34857188414. That information is lost when you reduce your input modulo $2^{32}$: only the remainder remains, and the original value is lost (yes, forever).
– Thomas, Jul 26 '14 at 5:59

@RickyDemer I'm using mod 33 to get the remainder, so I know which values of $c$ to investigate.
– Xcelled194, Jul 26 '14 at 6:20

@Thomas what about the multiplicative inverse? It provides a way to recover the original number. Nothing like that exists for a remainder? If not, do you have any suggestions for reversing this hash?
– Xcelled194, Jul 26 '14 at 6:23

But, but, how is that supposed to help you figure out which values of $c$ to investigate?
– Ricky Demer, Jul 26 '14 at 6:27

2 Answers

Am I on the right track with reversing DJB2 (can it be reversed?)? Is there some way of finding the remainder of a large number that has been modded by $2^{32}$?

You were on the right track to explaining why it can't be easily inverted.

Given an arbitrary $h_i$, every letter of the alphabet will give you another potential $h_{i-1}$ that the value was before that letter was concatenated. Subtract the letter's value, then invert the multiply. That's a possible hash of some string.

Just six or seven letters will allow you to construct most numbers modulo $2^{32}$ as hash values. For strings longer than that it is impossible to tell what the last letter could have been just from looking at the possible hash values before that letter.
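A rough counting check of the "six or seven letters" figure: the sketch below finds the shortest length at which the number of strings over an alphabet reaches $2^{32}$. The alphabet sizes are assumptions chosen for illustration (26 for one letter case, 52 for both), and `min_length` is a made-up helper. Counting strings only shows when collisions become unavoidable, not how evenly the hash values are covered.

```python
def min_length(alphabet_size, modulus=2**32):
    """Smallest n with alphabet_size**n >= modulus."""
    n = 0
    while alphabet_size ** n < modulus:
        n += 1
    return n

print(min_length(26))  # 7
print(min_length(52))  # 6
```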

Unless you know the strings are very short, trying to invert the function is unlikely to give you much better performance than an exhaustive search such as you'd need to run for a cryptographically strong hash function. I.e. guess strings that match your expected pattern and see if they give the same hash value.
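A dictionary search of the kind suggested here might look like the sketch below. The word list and target hash are placeholders, the seed 5381 is the conventional DJB2 starting value (an assumption, since the question does not state it), and `djb2_32` and `dictionary_attack` are illustrative names.

```python
def djb2_32(text, seed=5381):
    """DJB2 with unsigned 32-bit wraparound (seed is an assumption)."""
    r = seed
    for ch in text:
        r = (r * 33 + ord(ch)) & 0xFFFFFFFF
    return r

def dictionary_attack(targets, words):
    """Map each target hash to the candidate words that produce it."""
    targets = set(targets)
    hits = {}
    for word in words:
        h = djb2_32(word)
        if h in targets:
            hits.setdefault(h, []).append(word)
    return hits

words = ["Joust", "Renewal", "gfab"]  # placeholder word list
targets = [2090289493]                # placeholder hash to recover
print(dictionary_attack(targets, words))  # {2090289493: ['gfab']}
```

Because several words can share a hash, each target maps to a list; the heuristic filtering mentioned in the question would then be applied to those lists.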

So my best hope is a dictionary or brute force attack?
– Xcelled194, Jul 26 '14 at 16:00

@Xcelled194, yes, that's likely the case. If you update your question with an example pattern of the kind of string you'd be looking for, I might be able to give a more confident answer.
– otus, Jul 27 '14 at 9:45

It's actually really simple: 99% of the plaintext values are English words, or at least follow the format (e.g. "Joust", " Miku", "Renewal"), but there are some occasional misspellings/acronyms.
– Xcelled194, Jul 27 '14 at 17:37

@Xcelled194, in that case a dictionary search should be very easy. Brute force should work if not all of them are dictionary words. Because it's a bytewise hash, you can share some of the cost between strings with a common prefix, e.g. "Renewal" and "Renege".
– otus, Jul 28 '14 at 7:05

If your ints are unsigned, then the code r = (r * 33) + (int)c and the fact that you're
using 32-bit integers yield the equation $\text{new_r} \equiv (\text{old_r} \cdot 33) + \text{(int)c} \pmod{2^{32}}$.
Since 33 is odd and $2^{32}$ is a power of two, 33 is a unit mod $2^{32}$. I used WolframAlpha to determine
that the multiplicative inverse of 33 mod $2^{32}$ is 1041204193. Then I solved that linear
equation for $\text{old_r}$ and arranged everything into a sequence of equalities and congruences.
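
Solved for old_r, the congruence becomes $\text{old_r} \equiv (\text{new_r} - c) \cdot 1041204193 \pmod{2^{32}}$. A minimal Python sketch of that single step follows; the function name `previous_r` is made up for illustration, and the character $c$ must already be known or guessed.

```python
MOD = 2**32
INV33 = 1041204193  # inverse of 33 modulo 2**32, as given above
assert (33 * INV33) % MOD == 1

def previous_r(new_r, c):
    """Undo one step r = r * 33 + c of the 32-bit hash."""
    return ((new_r - c) * INV33) % MOD

print(previous_r(2090289493, ord("b")))  # 193492627
print(previous_r(177676, ord("g")))      # 5381
```

Every choice of $c$ yields some valid-looking predecessor, so this step alone does not narrow down which characters to try.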

Soo... You're saying it won't work because of this? Or is this an alternative to my algorithm?
– Xcelled194, Jul 27 '14 at 0:33

This is an alternative to your algorithm (and one that actually accounts for r being implicitly modded by $2^{32}$).
– Ricky Demer, Jul 27 '14 at 0:38

Could you edit your post to include an explanation, example, etc? For instance, where'd new_r come from? How did you work out this algorithm? I'll play with your equation on my own, but I can't accept this until it's improved a bit.
– Xcelled194, Jul 27 '14 at 1:41

This appears to simply reverse the math to get the old_r. I can do this; however, that means I then have to follow every $c$, which is not computationally viable. I was looking for a way to find a subset of $c$ to follow.
– Xcelled194, Jul 27 '14 at 2:23

If you can do that, then you should have said so, since your reverse algorithm doesn't.
– Ricky Demer, Jul 27 '14 at 2:35