3 Answers
3

If it were a reversible encryption, zip and tar.gz formats would be quite verbose. :)

The reason it doesn't help hackers too much (obviously knowing how one is made is beneficial) is that if they find a password to a system that is hashed, e.g. 2fcab58712467eab4004583eb8fb7f89, they need to know the original string used to create it, and also if any salt was used. That is because when you login, for obvious reasons, the password string is hashed with the same method as it is generated and then that resulting hash is compared to what is stored.

Also, many developers are migrating to bcrypt which incorporates a work factor, if the hashing takes 1 second as opposed to .01 second, it greatly slows down generating a rainbow table for you application, and those old PHP sites using md5() only become the low hanging fruit.

MD5 is a hash function -- it "mixes" input data in such a way that it should be unfeasible to do a number of things, including recovering the input given the output (it is not encryption, there is no key and it is not meant to be inverted -- rather the opposite). A handwaving description is that each input bit is injected several times in a large enough internal state, which is mixed such that any difference quickly propagates to the whole state.

MD5 is public since 1992. There is no secret, and has never been any secret, to the design of MD5.

MD5 is considered cryptographically broken since 2004, year of publication of the first collision (two distinct input messages which yield the same output); it was considered "weak" since 1996 (when some structural properties were found, which were believed to ultimately help in building collisions). However, there are other hash functions, which are as public as MD5 is, and for which no weakness is known yet: the SHA-2 family. Newer hash functions are currently being evaluated as part of the SHA-3 competition.

The really troubling part is that there is no known mathematical proof that a hash function may actually exist. A hash function is a publicly described efficient algorithm, which can be embedded as a logic circuit of a finite, fixed and small size. For the practitioners of computational complexity, it is somewhat surprising that it is possible to exhibit a circuit which cannot be inverted. So right now we only have candidates: functions for which nobody has found weaknesses yet, rather than function for which no weakness exists. On the other hand, the case of MD5 shows that, apparently, getting from known structural weaknesses to actual collisions to attacks takes a substantial amount of time (weaknesses in 1996, collisions in 2004, applied collisions -- to a pair of X.509 certificates -- in 2008), so the current trend is to use algorithm agility: when we use a hash function in a protocol, we also think about how we could transition to another, should the hash function prove to be weak.

One of the criteria of good cryptographic operations is that knowledge of the algorithm should not make it easier to break the encryption. So an encryption should not be reversible without knowledge of the algorithm and the key, and a hash function must not be reversible regardless of knowledge of the algorithm (the term used is "computationally infeasible").

MD5 and other hash function (like SHA-1 SHA-256, etc) perform a one-way operation on data that creates a digest or "fingerprint" that is usually much smaller than than the plaintext. This one way function cannot be reversed to retrieve the plaintext, even when you know exactly what the function does.

Likewise, knowledge of an encryption algorithm doesn't make it any easier (assuming a good algorithm) to recover plaintext from ciphertext. The reverse process is "computationally infeasible" without knowledge of the encryption key used.