This is a theoretical question. I'd like to know whether it's possible (and what the consequences would be), not that I'm going to do it in one of my projects. ;)

The first hashing functions created were based on a symmetric cipher (just like the first Unix crypt was based on DES). The idea was simple. Use the password as key to encode a constant.

I'd like to know if it's still possible to convert any modern symmetric cipher into a hashing function (and second question: into a cryptographic hashing function). Is it possible (or easier) to guess the key when you have both the encoded and the decoded message?

7 Answers

The usual hash functions (MD5, SHA-1, SHA-256...) use the Merkle-Damgård construction, which relies on a block cipher E. A running state r is initialized to a conventional value. Then the input data is split into a number of chunks, each chunk being used as key for the block cipher: r is encrypted with E, using the current chunk as key; the result is added to (or XORed with) r, and this yields the new state r for the next chunk. The last state obtained is the hash output.
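As a rough illustration of that loop, here is a minimal Python sketch. Everything in it is an assumption made for the example: the cipher is an XTEA-style toy (64-bit block, 128-bit key) written inline so the code is self-contained, the all-zero initial state is arbitrary, and the names `xtea_encrypt`, `compress`, `pad` and `toy_md_hash` are hypothetical. With a 64-bit output it offers no real security; it only shows the data flow.

```python
import struct

MASK32 = 0xFFFFFFFF
DELTA = 0x9E3779B9

def xtea_encrypt(key: bytes, block: bytes, rounds: int = 32) -> bytes:
    """Encrypt one 8-byte block under a 16-byte key (XTEA-style toy cipher)."""
    v0, v1 = struct.unpack(">2I", block)
    k = struct.unpack(">4I", key)
    s = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + k[s & 3]))) & MASK32
        s = (s + DELTA) & MASK32
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + k[(s >> 11) & 3]))) & MASK32
    return struct.pack(">2I", v0, v1)

def compress(state: bytes, chunk: bytes) -> bytes:
    """New state = E_chunk(state) XOR state; the data chunk is the cipher key."""
    encrypted = xtea_encrypt(chunk, state)
    return bytes(a ^ b for a, b in zip(encrypted, state))

def pad(data: bytes, chunk_size: int = 16) -> bytes:
    """Append 0x80, the fewest zero bytes needed, and the 8-byte bit length."""
    zeros = (-(len(data) + 1 + 8)) % chunk_size
    return data + b"\x80" + b"\x00" * zeros + struct.pack(">Q", 8 * len(data))

def toy_md_hash(data: bytes, iv: bytes = b"\x00" * 8) -> bytes:
    """Chain the compression function over all key-sized chunks of the input."""
    state = iv  # assumed conventional initial value for this sketch
    padded = pad(data)
    for i in range(0, len(padded), 16):
        state = compress(state, padded[i:i + 16])
    return state

print(toy_md_hash(b"hello world").hex())
```

Note that the chunk size (16 bytes) is the key size of the cipher, while the state and output size (8 bytes) is its block size.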

If you want to apply the Merkle-Damgård construction to a standard block cipher such as AES, you will run into the following problems:

The size of the internal state is the size of the hash output, and it is also the block size of the cipher. AES is a 128-bit block cipher, leading to a 128-bit output, which is a bit too small with regard to today's technology (it would imply resistance to collisions only up to $2^{64}$ evaluations, which is expensive but doable).

The MD construction exercises unusual features of the block cipher. In particular, the input data is used as key, and collision attacks (a major concern for hash functions) correspond to related-key attacks on the underlying cipher. Related-key attacks are not a problem for block ciphers when they are used for what they were designed for, i.e. encryption. AES is known to have slight weaknesses with regard to related keys, but this is not a problem for encryption. It will become a problem if AES is used as a building block in an MD-based hash function. (Resistance to related-key attacks was not a design criterion of the AES competition.)

Therefore, MD-based hash functions use custom block ciphers which are designed to be especially robust against related-key attacks (or, at least, so we hope). In the case of SHA-1, the inner block cipher was blessed with a name of its own, SHACAL.

Another example is Whirlpool, based on the block cipher "W", an AES derivative with a larger block size (512 bits) and a revamped key schedule to make it much stronger against related keys (and, unfortunately, it makes it much slower too). This addresses the problems explained above. (Whirlpool does not use the Merkle-Damgård construction, but Miyaguchi-Preneel, a distinct construction which has its own quirks.)

Yet another example is Skein, one of the SHA-3 candidates. It builds on an internal block cipher called Threefish; again, a block cipher with large blocks (512 bits in the "standard" Skein, extensible to 1024 bits).

A few SHA-3 candidates reused not the AES, but parts of the AES; in particular, ECHO and SHAvite-3. The idea was to be able to optimize the hash function with hardware meant to speed up AES, especially the AES-NI opcodes of recent x86 processors. But that's not reusing the block cipher itself.

What must be remembered is that building a hash function out of a block cipher is hard. This is not the same usage context. A fundamental difference is that a block cipher has a key which the attacker does not know (and tries to guess), while a hash function has no secret (when trying to build a collision, the attacker knows everything about every operation in the hash function).

Each of the PGV schemes (64 in total, only some of which are secure) is some permutation of supplying a message block and a current hash value into the key and plaintext inputs of a block cipher, with various feedback/feedforward manipulation of the block cipher input/output values.

For example, in Davies-Meyer, each message block is used as the key to encrypt the previous hash value with the block cipher, and the result is XORed with the previous hash value to produce the next hash value.
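Written out as formulas, Davies-Meyer and two other well-known members of this family look as follows. The notation is an assumption for this illustration: $E_K(X)$ is the encryption of block $X$ under key $K$, $M_i$ is a message block, $H_i$ is the current hash value, and $g$ is a hypothetical function that adapts a hash value to the cipher's key size when the two differ.

$$
\begin{aligned}
\text{Davies-Meyer:}\quad & H_{i+1} = E_{M_i}(H_i) \oplus H_i \\
\text{Matyas-Meyer-Oseas:}\quad & H_{i+1} = E_{g(H_i)}(M_i) \oplus M_i \\
\text{Miyaguchi-Preneel:}\quad & H_{i+1} = E_{g(H_i)}(M_i) \oplus M_i \oplus H_i
\end{aligned}
$$

The three differ only in which value goes into the key input, which goes into the plaintext input, and what is fed forward into the output.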

These one way compression functions are then typically used with the Merkle–Damgård chaining construction, to construct a useful hash function.

The one way compression and chaining constructs are also often combined in hash functions with other techniques such as prefix free padding to prevent extension attacks, and the use of tweaks and other parameters for various security and personalization reasons.

Yes, we can build a hash function from a block cipher, and that is common, although usually with block ciphers designed for that purpose. In the following I focus on AES, which was mentioned in the (different) question that motivated the present answer [which got moved here because said question was found to be a duplicate].

the message is padded to form $n$ blocks $M_i$ the same size as the key of the block cipher (customarily a single 1 bit is appended to the message, then some number of 0 bits, then the original length of the message in bits, encoded on say 64 bits; the number of 0 bits is the lowest nonnegative integer such that the end result is a multiple of the block cipher's key size);

for $i$ from $0$ to $n-1$, compute $H_{i+1}=\text{ENC}_{M_i}(H_i)\oplus H_i$ (where the subscript of $\text{ENC}$ is the key);

the hash is $H_n$.
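The padding step can be sketched in Python as below. This is only an illustration under stated assumptions: blocks are 256 bits (the AES-256 key size), the length field is 64 bits, byte-oriented messages are padded with `0x80` (a 1 bit followed by seven 0 bits), and the names `padded_bit_length`, `block_count` and `pad_message` are hypothetical. For these parameters the block count works out to $\lfloor(l+320)/256\rfloor$ for a message of $l$ bits, which the last line checks.

```python
def padded_bit_length(l: int, block_bits: int = 256, length_bits: int = 64) -> int:
    """Bit length after appending a 1 bit, the fewest 0 bits, and the length field."""
    minimum = l + 1 + length_bits
    zeros = (-minimum) % block_bits  # lowest nonnegative count reaching a multiple
    return minimum + zeros

def block_count(l: int, block_bits: int = 256) -> int:
    """Number n of key-sized blocks for a message of l bits."""
    return padded_bit_length(l, block_bits) // block_bits

def pad_message(message: bytes, block_bytes: int = 32) -> list[bytes]:
    """Pad a byte string and split it into blocks M_0 .. M_{n-1}."""
    length_field = (8 * len(message)).to_bytes(8, "big")
    zeros = (-(len(message) + 1 + 8)) % block_bytes
    padded = message + b"\x80" + b"\x00" * zeros + length_field
    return [padded[i:i + block_bytes] for i in range(0, len(padded), block_bytes)]

# The closed form for 256-bit blocks and a 64-bit length field.
assert all(block_count(l) == (l + 320) // 256 for l in range(4096))
```

Encoding the length at the end is what makes the padding unambiguous: two different messages never pad to the same block sequence.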

If we use AES-256, we obtain a $128$-bit hash, with $n=\lfloor(l+320)/256\rfloor$ AES encryptions (and sub-key derivations) where $l$ is the message length in bits. There are other constructions, but this is the least inefficient we have when using one of the three AES ciphers. Its main drawback is that the output width is limited, making brute-force collision search a real issue (it requires about $2^{64}$ encryptions and can be efficiently distributed, see Parallel Collision Search with Cryptanalytic Applications). Also, there is a second-preimage attack with cost $2^{129}/n+2^{65}$ encryptions, generic to Merkle–Damgård hashes (attributed to R. D. Dean in his 1999 thesis, section 5.3.1, and also described by J. Kelsey and B. Schneier in Second Preimages on n-bit Hash Functions for Much Less than $2^n$ Work).
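To get a feel for why a narrow output makes collision search practical, here is a small Python experiment. It does not use AES at all: as a stand-in for an $n$-bit hash it swaps in SHA-256 truncated to 24 bits (an assumption chosen so the search finishes instantly), and `truncated_hash` and `find_collision` are hypothetical names. By the birthday bound one would expect a collision after a number of tries on the order of $2^{12}$, the same square-root effect that gives about $2^{64}$ for a 128-bit output.

```python
import hashlib

def truncated_hash(message: bytes, n_bytes: int = 3) -> bytes:
    """Stand-in for a narrow hash: the first n_bytes of SHA-256."""
    return hashlib.sha256(message).digest()[:n_bytes]

def find_collision(n_bytes: int = 3):
    """Hash successive counters until two of them share a truncated digest."""
    seen = {}
    counter = 0
    while True:
        message = counter.to_bytes(8, "big")
        digest = truncated_hash(message, n_bytes)
        if digest in seen:
            return seen[digest], message, counter + 1
        seen[digest] = message
        counter += 1

m1, m2, tries = find_collision()
print(f"collision after {tries} hashes: {m1.hex()} and {m2.hex()}")
```

This table-based search needs memory proportional to the number of tries; the parallel collision search technique cited above avoids that, which is part of why the $2^{64}$ figure is considered within reach.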

By building the compression function by the Hirose construction, we can double the output width to $256$-bit (for more robust collision-resistance), at the price of about four times as many encryptions (and only twice as many key derivations, but that still means quite a slow hash compared to SHA-256, except perhaps if AES-256 is hardware-accelerated):

the message is padded to form $n$ blocks $M_i$ as above, except that the block size is the width of the block cipher's key minus the width of the block cipher's block (thus we use a $128$-bit message block size with AES-256; that would be $64$-bit with AES-192, and we can't use AES-128).

$G_0$ and $H_0$ are set to some nothing-up-my-sleeves values of the width of the block cipher's block;

Both Davies-Meyer and Hirose compression functions have security arguments that in a Merkle–Damgård hash they are collision-resistant within about the collision-attack limit implied by their output size, under the assumption that the block cipher is computationally indistinguishable from a random one. I do not see that the known minor related-key weaknesses of AES are a practical threat [Update: but those known attacks can only get better; thus some have given more prudent advice; and I have opened a question about that].

The (different) question that motivated the present answer asked if CBC would be usable. No, CBC is not directly appropriate for constructing a hash from a block cipher. CBC encryption would not give a fixed-width output. In CBC-MAC, what would the key common to all the encryptions be? It is bound to be known to the adversary (there is no secret in a hash) and independent of the blocks past the beginning of the message, which allows manipulating the end of the message to have whatever desired result.
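The manipulation of the end of the message can be sketched as follows. This is an illustration under several assumptions: the cipher is an XTEA-style toy (64-bit block, 128-bit key) written inline to keep the code self-contained, the IV is all zeros, messages are whole multiples of the block size with no padding, and `cbc_hash` and `forge_message` are hypothetical names. Because the key is public, anyone can run the decryption direction and solve for a final block that yields any chosen output.

```python
import struct

MASK32 = 0xFFFFFFFF
DELTA = 0x9E3779B9

def xtea_encrypt(key: bytes, block: bytes, rounds: int = 32) -> bytes:
    v0, v1 = struct.unpack(">2I", block)
    k = struct.unpack(">4I", key)
    s = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + k[s & 3]))) & MASK32
        s = (s + DELTA) & MASK32
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + k[(s >> 11) & 3]))) & MASK32
    return struct.pack(">2I", v0, v1)

def xtea_decrypt(key: bytes, block: bytes, rounds: int = 32) -> bytes:
    """Inverse of xtea_encrypt: undo the rounds in reverse order."""
    v0, v1 = struct.unpack(">2I", block)
    k = struct.unpack(">4I", key)
    s = (DELTA * rounds) & MASK32
    for _ in range(rounds):
        v1 = (v1 - ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + k[(s >> 11) & 3]))) & MASK32
        s = (s - DELTA) & MASK32
        v0 = (v0 - ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + k[s & 3]))) & MASK32
    return struct.pack(">2I", v0, v1)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_hash(key: bytes, message: bytes) -> bytes:
    """CBC with a zero IV; the last ciphertext block is the 'hash'."""
    state = b"\x00" * 8
    for i in range(0, len(message), 8):
        state = xtea_encrypt(key, xor(state, message[i:i + 8]))
    return state

def forge_message(key: bytes, prefix: bytes, target: bytes) -> bytes:
    """Append one block to prefix so that the result hashes to target."""
    state = cbc_hash(key, prefix)
    last_block = xor(xtea_decrypt(key, target), state)
    return prefix + last_block

PUBLIC_KEY = bytes(range(16))  # no secret in a hash: the adversary knows this
original = b"pay Bob 10 coins"
target = cbc_hash(PUBLIC_KEY, original)
forged = forge_message(PUBLIC_KEY, b"pay Eve 99 coins", target)
assert forged != original and cbc_hash(PUBLIC_KEY, forged) == target
```

The forged message costs one decryption and a handful of encryptions, regardless of the block size of the cipher.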

Yes, it is possible to construct a hash function, or even a message authentication code (MAC), from a block cipher. The easiest way is to simply encrypt your input data with a pre-selected key, in a chaining mode such as CBC, and use the last output block of the cipher as your hash. However there are problems with this simple approach. Depending on the properties you need from your hash function, there are various approaches. An excellent treatment of the subject can be found in the Handbook of Applied Cryptography, chapter 9, starting especially in section 9.4.1, "Hash functions based on block ciphers."

Beware, however, that constructing a cryptographically-strong hash is now considered more difficult than it was when the above-mentioned book was written. I would strongly recommend that you use SHA256, SHA384, SHA512, SHA-3, or one of the SHA-3 candidates — or a MAC based on one of these — unless you have a very good reason not to.

CBC can be used to turn a cipher into a MAC, but not into a hash. In particular, if the attacker knows the key for a CBC-MAC, he can trivially find collisions/pre-images.
– CodesInChaos♦ Feb 25 '13 at 11:55

Yes, it definitely is possible. For an example that is widely in use today, see bcrypt, a password hashing algorithm based on the Blowfish cipher.

To quote from Wikipedia,

Provos and Mazières took advantage of this, and took it further. They developed a new key setup algorithm for Blowfish, dubbing the resulting cipher "Eksblowfish" ("expensive key schedule Blowfish"). The key setup begins with a modified form of the standard Blowfish key setup, in which both the salt and password are used to set all subkeys. Then there are a number of rounds in which the standard Blowfish keying algorithm is applied, using alternately the salt and the password as the key, each round starting with the subkey state from the previous round. Cryptotheoretically, this is no stronger than the standard Blowfish key schedule, but the number of rekeying rounds is configurable; this process can therefore be made arbitrarily slow, which helps deter brute-force attacks upon the hash or salt.

A very common approach for constructing hash functions is using Enc(k, m) xor m as a compression function that maps the input k||m to a shorter output, where Enc encrypts a single block with a block cipher.

The block size of this cipher corresponds to the output size of the hash, and should be at least 256 bits to be secure.

The cipher should not suffer from related key attacks, and rekeying must be fast.

Many common hash functions, including SHA-1, SHA-2, BLAKE and Skein, are based on such a construction (with minor variations). SHA-3 is a notable exception, since it uses an unkeyed permutation and omits the feed-forward (XORing the message, which makes the compression irreversible).

AES is not a good choice here, since by default it has only 128-bit blocks (Rijndael supports 256-bit blocks), and suffers from related-key attacks.

What does not work is simply encrypting the message in CBC mode using a fixed IV and using this as a hash. This construction is known as CBC-MAC. It can be used as a secure MAC, but not as an unkeyed hash. For further reading on this, check out: Matt Green: Why I hate CBC-MAC.

A "cryptographic" hash function commonly has to fulfill two properties:

It is collision resistant, meaning that there is no efficient (probabilistic polynomial-time) adversary who can find two different messages that map to the same hash value.

It is compressing, meaning that it takes a 'long' string and outputs a shorter hash value.

Simply encrypting a string with a block cipher and using the output as the hash value does not necessarily work: the encryption requires a secret key, the output will not necessarily be shorter than the input, and collision resistance is not guaranteed. In general these are simply two different crypto primitives with different "requirements".