I have noticed that some programs used for file encryption will tell you if an entered key is wrong when you try to decrypt. It seems (to me at least) that this would mean that the key is somehow written into the encrypted file. And the algorithms that I know of will produce an output even if the key is wrong.

How does one build this type of key validation into an algorithm? Is it just a matter of encrypting the key along with the original text?


First off, many block modes of operation require a message to be padded so that its length is evenly divisible by the block size of the cipher. CBC mode (Cipher Block Chaining), for instance, typically uses PKCS#7 padding: the message is extended to the next block boundary with a given number of bytes, each of those bytes being set to the number of bytes added as padding (one byte set to 0x01, two bytes set to 0x02, etc). If the message is already exactly divisible by the block size, an entire extra block of padding is added (for AES's 16-byte blocks, 16 bytes set to 0x10), so the padding can always be stripped unambiguously.
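A minimal sketch of that padding scheme in Python, assuming a 16-byte block size as for AES (the function names `pkcs7_pad` and `pkcs7_unpad` are illustrative, not from any particular library):

```python
BLOCK_SIZE = 16  # bytes; AES uses 128-bit blocks


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    # Always add 1..block_size bytes, so a whole extra block is added
    # when len(data) is already a multiple of block_size.
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    if not padded or len(padded) % block_size:
        raise ValueError("length is not a multiple of the block size")
    n = padded[-1]
    # Every padding byte must equal the padding length.
    if not 1 <= n <= block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]
```

For example, `pkcs7_pad(b"abc")` appends thirteen `0x0d` bytes, while a 16-byte input gains a whole block of `0x10` bytes.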

Now, you have a checksum of sorts built into the message. If the last block of the message, when decrypted, doesn't have valid padding, then the decryption has failed: either the end of the message was corrupted in transit (in CBC decryption, a damaged ciphertext block garbles its own plaintext block and flips the corresponding bits of the next one, so damage to either of the last two blocks can break the padding), or the key used to decrypt was incorrect. A file encryption application that uses some other sort of integrity check, such as mirroring/parity, could verify that the data is OK, and thus the only other explanation is that you used the wrong key.
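To see the padding check act as a key check, here is a rough sketch of CBC encryption/decryption where decryption reports failure when the last block's padding is invalid. Python's standard library has no AES, so the block cipher is a hypothetical stand-in: a 4-round Feistel network built from HMAC-SHA256 (`toy_encrypt_block`/`toy_decrypt_block` are made-up names, not a real API, and the construction is for illustration only).

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes
HALF = BLOCK // 2


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, rnd: int, half: bytes) -> bytes:
    # Feistel round function: HMAC-SHA256 truncated to half a block.
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:HALF]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # TOY 4-round Feistel cipher standing in for AES. Illustration only.
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, _xor(left, _round(key, rnd, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    # Undo the rounds in reverse order.
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _round(key, rnd, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    n = BLOCK - len(plaintext) % BLOCK
    padded = plaintext + bytes([n]) * n  # PKCS#7
    prev, out = iv, b""
    for i in range(0, len(padded), BLOCK):
        prev = toy_encrypt_block(key, _xor(padded[i:i + BLOCK], prev))
        out += prev
    return out


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes):
    """Return the plaintext, or None if the PKCS#7 padding check fails."""
    prev, out = iv, b""
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out += _xor(toy_decrypt_block(key, block), prev)
        prev = block
    n = out[-1]
    if not 1 <= n <= BLOCK or out[-n:] != bytes([n]) * n:
        return None  # wrong key, or the final blocks were damaged
    return out[:-n]


key, wrong_key, iv = os.urandom(16), os.urandom(16), os.urandom(16)
ct = cbc_encrypt(key, iv, b"meet me at noon")
print(cbc_decrypt(key, iv, ct))        # b'meet me at noon'
print(cbc_decrypt(wrong_key, iv, ct))  # almost always None
```

With a wrong key the final block decrypts to effectively random bytes, so roughly 1 time in 256 it will still happen to end in valid padding; the check is a quick filter, not proof that the key is right.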

As an aside, a system that reveals whether a particular ciphertext was properly padded is known as a "padding oracle", and it is a vulnerability of modes like CBC: the cipher (initialized by the legitimate user with the proper key) can be fed a series of "chosen ciphertexts" to decrypt, each a modification of the real ciphertext (typically with bytes of the preceding block altered), and the valid/invalid responses analyzed to recover the real plaintext byte by byte.

More advanced cipher modes build message authentication into the encryption, which causes decryption to fail in the same way with either a bad key or a corrupted ciphertext. CCM, which is Counter with CBC-MAC, is one of these modes. First, the message is "hashed" by running it through CBC encryption with the given key, keeping only the last block of the ciphertext (remember how the chaining makes every ciphertext block depend on all of the plaintext before it? That's a beautiful way to calculate a "keyed hash" of the message). The message and its MAC are then encrypted in Counter mode to produce the ciphertext that is transmitted or persisted. Counter mode works differently from CBC: instead of chaining ciphertext blocks, it encrypts a sequence of counter blocks built from a nonce and an incrementing counter, and XORs the resulting keystream with the plaintext, so no padding is needed.

To decrypt, the ciphertext is decrypted in Counter mode with the key, then the recovered message is run through CBC-MAC with the same key and the result compared to the decrypted MAC. If the MACs don't match, an error is given. Again, as used in a file encryption application, if there is an independent method of verifying that file integrity is good, the only other explanation is that the wrong key was used.
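A simplified sketch of the CCM idea: CBC-MAC for authentication, then Counter mode for encryption, with counter block 0 masking the MAC. It is NOT interoperable CCM. Real CCM, as specified in NIST SP 800-38C and RFC 3610, formats flags, the nonce and the message length into the first CBC-MAC block and into the counter blocks, and can also authenticate associated data. The block cipher is again a hypothetical toy Feistel stand-in for AES, and `seal`/`unseal` are illustrative names.

```python
import hashlib
import hmac
import os

BLOCK, HALF = 16, 8


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, rnd: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:HALF]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # TOY 4-round Feistel cipher standing in for AES. Illustration only.
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, _xor(left, _round(key, rnd, right))
    return left + right


def cbc_mac(key: bytes, msg: bytes) -> bytes:
    # Prefix the length (real CCM also encodes it, in a different format),
    # zero-pad, CBC-encrypt with a zero IV and keep only the final block.
    data = len(msg).to_bytes(BLOCK, "big") + msg
    data += bytes(-len(data) % BLOCK)
    state = bytes(BLOCK)
    for i in range(0, len(data), BLOCK):
        state = toy_encrypt_block(key, _xor(state, data[i:i + BLOCK]))
    return state


def keystream(key: bytes, nonce: bytes, n_blocks: int) -> bytes:
    # Counter mode: encrypt nonce || counter; the cipher never sees the plaintext.
    return b"".join(toy_encrypt_block(key, nonce + i.to_bytes(4, "big"))
                    for i in range(n_blocks))


def seal(key: bytes, nonce: bytes, msg: bytes):
    ks = keystream(key, nonce, 1 + (len(msg) + BLOCK - 1) // BLOCK)
    tag = cbc_mac(key, msg)
    # Counter block 0 masks the MAC; blocks 1.. encrypt the message.
    return _xor(msg, ks[BLOCK:]), _xor(tag, ks[:BLOCK])


def unseal(key: bytes, nonce: bytes, ciphertext: bytes, masked_tag: bytes) -> bytes:
    ks = keystream(key, nonce, 1 + (len(ciphertext) + BLOCK - 1) // BLOCK)
    msg = _xor(ciphertext, ks[BLOCK:])
    tag = _xor(masked_tag, ks[:BLOCK])
    if not hmac.compare_digest(tag, cbc_mac(key, msg)):
        # Identical error for a wrong key and for a tampered ciphertext.
        raise ValueError("authentication failed")
    return msg


key, nonce = os.urandom(16), os.urandom(12)  # 12-byte nonce + 4-byte counter
ct, tag = seal(key, nonce, b"transfer 100 to bob")
print(unseal(key, nonce, ct, tag))  # b'transfer 100 to bob'
```

Because Counter mode needs no padding, the only failure signal left is the MAC comparison, which is the property the next paragraph relies on.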

Continuing the aside, the beauty of this mode is that there's no way to turn it against itself as a padding oracle: if a ciphertext has been tampered with, or if it was decrypted with the wrong key, the MACs won't match up, and since that is the test for proper decryption (and thus the error given), a decryption failure gives an attacker much less information (pretty much every attempt except one using the correct key and an untampered ciphertext fails with exactly the same error). Another similar mode is Galois/Counter Mode (GCM), which behaves the same way but offers better performance and parallelization, because its authenticator (GHASH, a polynomial hash over $\mathrm{GF}(2^{128})$) is cheaper than CBC-MAC and, unlike CBC-MAC, can be computed in parallel.

It's not a security problem but a necessary feature. It's not an exact science to distinguish a "good decryption" from a "bad decryption". What if the user had encrypted random data? You would not be able to tell whether the key is correct from that information alone, since in both cases the decrypted output would look completely random!

Similarly, what if the user typed his key wrong? You want to be able to inform him he made a typo ("invalid password") instead of blindly decrypting garbage and waiting for the user to realize he typed it wrong and try again.

What most programs do is store the hash of the key (or an HMAC of the encrypted file, using the encryption key) so that they can verify if the key is correct (or if the file is corrupt - you get free integrity/authentication with the HMAC method) without disclosing the key at all. An attacker would need to find an exact preimage to obtain the actual encryption key, which is designed to be infeasible.
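The HMAC variant can be sketched with Python's standard `hmac` module. Assumptions: the ciphertext has already been produced by some cipher, the tag is simply appended to the end of the file, and `add_file_tag`/`verify_file_tag` are hypothetical names. Following the answer, the encryption key is used for the MAC as well; many designs instead derive a separate MAC key from the same secret.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def add_file_tag(key: bytes, ciphertext: bytes) -> bytes:
    # Append HMAC(key, ciphertext) so the key can be checked before decrypting.
    return ciphertext + hmac.new(key, ciphertext, hashlib.sha256).digest()


def verify_file_tag(key: bytes, stored: bytes) -> bytes:
    ciphertext, tag = stored[:-TAG_LEN], stored[-TAG_LEN:]
    expected = hmac.new(key, ciphertext, hashlib.sha256).digest()
    # Constant-time comparison; a mismatch means wrong key OR corrupted file.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("invalid password or corrupted file")
    return ciphertext
```

Only once `verify_file_tag` succeeds would the program go on to decrypt the returned ciphertext.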

In essence, you use a one-way function to store the key's image somewhere in the file, so that anyone without the key cannot invert the function and retrieve the key, but anyone who has the key can easily feed it into the one-way function and compare the result with the value that's in the file. If it matches, it's the correct key! (with very high probability)
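The one-way check itself is only a few lines. This sketch assumes the key is already a random 256-bit key rather than a password (see the comments below), stores SHA-256 of it as a hypothetical header field, and compares in constant time with `hmac.compare_digest`:

```python
import hashlib
import hmac
import os


def key_check_value(key: bytes) -> bytes:
    # One-way image of the key: cheap to recompute, infeasible to invert
    # (assuming the key itself is random, not a guessable password).
    return hashlib.sha256(key).digest()


def key_is_correct(candidate: bytes, stored_check: bytes) -> bool:
    return hmac.compare_digest(key_check_value(candidate), stored_check)


key = os.urandom(32)
header = key_check_value(key)                  # written into the file header
print(key_is_correct(key, header))             # True
print(key_is_correct(os.urandom(32), header))  # False (with overwhelming probability)
```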

You should not store a plain hash of the key, but preferably (as mentioned) encrypt or MAC something with the key, which can then be checked.
– Paŭlo Ebermann♦ Jan 7 '13 at 17:07

@Paŭlo: While storing a plain hash of a low-entropy password would be bad, I'm not aware of any issues with storing a hash of an encryption key properly derived from the password using a key-stretching KDF. If there's something wrong with that that I'm missing, please do let me know.
– Ilmari Karonen Jan 10 '13 at 1:11
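Ilmari's suggestion (stretch the password with a KDF, then store a hash of the derived key) might look like this with Python's `hashlib.pbkdf2_hmac`. The header layout (a 16-byte salt followed by a SHA-256 check value), the function names and the iteration count are illustrative assumptions, not recommendations.

```python
import hashlib
import hmac
import os

SALT_LEN = 16
ITERATIONS = 200_000  # illustrative work factor only; tune for real use


def derive_key(password: str, salt: bytes) -> bytes:
    # Key stretching: PBKDF2-HMAC-SHA256 yields a 32-byte encryption key.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def create_header(password: str):
    salt = os.urandom(SALT_LEN)
    key = derive_key(password, salt)
    # Store the salt plus a hash of the *derived* key, never of the password.
    return key, salt + hashlib.sha256(key).digest()


def unlock(password: str, header: bytes) -> bytes:
    salt, check = header[:SALT_LEN], header[SALT_LEN:]
    key = derive_key(password, salt)
    if not hmac.compare_digest(hashlib.sha256(key).digest(), check):
        raise ValueError("invalid password")
    return key
```

For example, `key, header = create_header("correct horse")` followed by `unlock("correct horse", header)` returns the same key, while a wrong password raises `ValueError`. An attacker holding the header still has to pay the full PBKDF2 cost for every password guess.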

By using padding, one can tell whether the decryption is correct. Padding extends the message to a multiple of the block size. You append predictable data at the end of the message (one "1" bit followed by several "0" bits, for example) and then encrypt it. If you find the correct "1000..." sequence at the end of the decrypted message, the key was very likely correct.
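The byte-oriented form of this "1000..." padding is a single 0x80 byte followed by zero bytes (ISO/IEC 7816-4 style). A minimal sketch, assuming 16-byte blocks, with hypothetical names `pad_80`/`unpad_80`:

```python
BLOCK = 16


def pad_80(data: bytes, block: int = BLOCK) -> bytes:
    # One 0x80 byte (bits 1000 0000), then zeros up to the block boundary.
    padded = data + b"\x80"
    return padded + bytes(-len(padded) % block)


def unpad_80(padded: bytes, block: int = BLOCK) -> bytes:
    if not padded or len(padded) % block:
        raise ValueError("length is not a multiple of the block size")
    stripped = padded.rstrip(b"\x00")
    # The last non-zero byte must be the 0x80 marker,
    # and at most block - 1 zero bytes may follow it.
    if not stripped or stripped[-1] != 0x80 or len(padded) - len(stripped) >= block:
        raise ValueError("invalid padding")
    return stripped[:-1]
```

Because the marker byte is always present, messages that legitimately end in zero bytes survive the round trip unchanged.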

Note that most symmetric padding schemes will fail to detect a wrong key or corruption with probability around $\frac{1}{256}$ in the worst case, so padding is really just a sanity check and not a substitute for proper integrity checking.
– Thomas Jan 11 '13 at 21:09
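Thomas's estimate can be checked for PKCS#7 with a short exact calculation. Assume that a wrong key turns the final block into uniformly random bytes. The check then passes if the last byte is some n from 1 to 16 and the n - 1 bytes before it also equal n, which happens with probability $256^{-n}$:

```python
from fractions import Fraction

BLOCK = 16

# Sum over every valid padding length n = 1..16 of P(last n bytes all equal n).
p_valid = sum(Fraction(1, 256 ** n) for n in range(1, BLOCK + 1))

print(float(p_valid))  # ≈ 0.00392, i.e. about 1 in 255
```

The n = 1 case (a final byte of 0x01) dominates. The total lands just above $\frac{1}{256}$ and just below $\frac{1}{255}$.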