I have a file I'd like to encrypt with a symmetric key using AES-256, and then use an HMAC for integrity. So that I don't "roll my own" scheme, which I know is a big mistake, what are the standards for using the HMAC once I've created it? Is this what GCM is for? Can I just store the HMAC with the keys, or should it be included in the encrypted file somehow, by either prepending or appending it to the final message? I've seen a lot about how to create the HMAC and what data to calculate it on, but nothing about what to do with it once I've got it. Perhaps I'm just looking in the wrong place or making things too complicated.

3 Answers

Combining MAC and encryption is hard. Depending on how you do it, the result will be secure, or not, or only "mostly secure" modulo a zillion implementation details. GCM is an Authenticated Encryption mode which does all the hard work for you, so it is warmly recommended that you use GCM (or an equivalent mode like EAX) instead of designing your own protocol.
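
A minimal sketch of what "GCM does the work for you" looks like in Python, assuming the third-party `cryptography` package and its `AESGCM` class; `seal` and `open_sealed` are hypothetical helper names. There is no separate HMAC to place anywhere: `AESGCM.encrypt` appends its 16-byte authentication tag to the ciphertext, and the nonce (the GCM IV) is simply stored in the clear in front of it.

```python
import os
from typing import Optional

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit nonce, the length GCM is designed around


def seal(key: bytes, plaintext: bytes, associated_data: Optional[bytes] = None) -> bytes:
    """Encrypt and authenticate; returns nonce || ciphertext || 16-byte tag."""
    nonce = os.urandom(NONCE_LEN)  # must never repeat under the same key
    # AESGCM.encrypt returns the ciphertext with the tag already appended
    return nonce + AESGCM(key).encrypt(nonce, plaintext, associated_data)


def open_sealed(key: bytes, blob: bytes, associated_data: Optional[bytes] = None) -> bytes:
    """Verify and decrypt; raises InvalidTag if any byte was modified."""
    nonce, ct_and_tag = blob[:NONCE_LEN], blob[NONCE_LEN:]
    return AESGCM(key).decrypt(nonce, ct_and_tag, associated_data)


key = AESGCM.generate_key(bit_length=256)  # AES-256 key
blob = seal(key, b"file contents")
assert open_sealed(key, blob) == b"file contents"

tampered = blob[:-1] + bytes([blob[-1] ^ 1])  # flip one bit of the tag
try:
    open_sealed(key, tampered)
except InvalidTag:
    print("tampering detected")  # prints "tampering detected"
```

The whole `blob` can be written to the output file as-is; only `key` has to be kept secret.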

For research purposes only: the MAC value will be part of the encrypted data if you use MAC-then-encrypt, whereas it has to be transmitted or stored outside the encrypted data (for instance, next to the ciphertext) if you use encrypt-then-MAC or encrypt-and-MAC. Beyond that, it is only a matter of encoding. Encrypt-then-MAC is recommended notably because the MAC operates over the encrypted data: it cannot reveal anything about the plaintext, so the MAC value can be shown to everybody with no ill effect.

I think part of my confusion comes from what exactly a "protocol" is, but that might be a different question altogether. To make sure I understand you correctly: if I used GCM or another AEAD mode, I would only be responsible for ensuring I'm using unique (secret) keys and unique (public) IVs. Are there any security implications of transferring the IV in the clear along with the encrypted message when using an AEAD mode? Thanks for the assistance.
– jeffaudio, Mar 6 '13 at 19:04

The IV need not be secret (otherwise we would call it a key). You may show the IV to the whole world.
– Thomas Pornin, Mar 6 '13 at 19:45

Well, GCM is an authenticated mode that accomplishes the same goal, as does CCM (Counter with CBC-MAC), and either of these would be preferable to a home-brew implementation. But if for any reason you don't have access to a built-in library of primitives that supports these modes (.NET's System.Security.Cryptography algorithms, for instance, did not have these at the time of writing, though .NET Core 3.0 later added AesGcm and AesCcm; a variety of aftermarket options are also available), a combination of HMAC and an unauthenticated mode can provide similar message authentication.

The short of it is that it's generally better to encrypt, then compute an HMAC of the ciphertext, and prepend the MAC (which is of fixed length and so can be easily separated when placed first) to the ciphertext. This is known as the "Encrypt-then-Authenticate" approach, and it is a generally accepted way to create an authenticated encryption mode from an unauthenticated one.
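
A minimal sketch of that layout in Python, using only the standard library. It assumes you already have the IV and ciphertext from an AES-256 mode such as CBC (random bytes stand in for them below), a separate random MAC key, and HMAC-SHA256; `add_mac` and `split_blob` are hypothetical helper names. The MAC covers the IV as well as the ciphertext, which is generally advised.

```python
import hashlib
import hmac
import os
from typing import Tuple

TAG_LEN = 32  # HMAC-SHA256 output size in bytes
IV_LEN = 16   # AES block size, used as the CBC IV length


def add_mac(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Encrypt-then-MAC framing: returns tag || iv || ciphertext.

    The tag is computed over the IV and the ciphertext, never the plaintext.
    """
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return tag + iv + ciphertext


def split_blob(blob: bytes) -> Tuple[bytes, bytes, bytes]:
    """Undo the framing; the fixed-length fields make the split unambiguous."""
    if len(blob) < TAG_LEN + IV_LEN:
        raise ValueError("blob too short")
    return blob[:TAG_LEN], blob[TAG_LEN:TAG_LEN + IV_LEN], blob[TAG_LEN + IV_LEN:]


mac_key = os.urandom(32)     # separate from the AES encryption key
iv = os.urandom(IV_LEN)      # stand-in for the IV your AES-CBC call used
ciphertext = os.urandom(48)  # stand-in for real AES-256-CBC output
blob = add_mac(mac_key, iv, ciphertext)

tag, iv_read, ct_read = split_blob(blob)
assert (iv_read, ct_read) == (iv, ciphertext)
```

On the reading side, the tag is recomputed over `iv_read + ct_read` and compared with `tag` before any decryption takes place.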

Other permutations such as "Authenticate-then-Encrypt" (compute the HMAC of the plaintext and then encrypt HMAC and plaintext into ciphertext; this is SSL/TLS's basic approach) or "Authenticate and Encrypt" (compute the HMAC of the plaintext, then encrypt the plaintext and prepend the HMAC to the ciphertext) have been shown to be more vulnerable in the general case. Certain implementations may still be secure (AtE is secure when using CBC mode or a stream cipher, provided padding and timing attacks are accounted for in CBC).

The EtA method is the safest overall, when properly implemented, and so it is recommended for situations where a developer must implement their own authenticated mode. It will work with any secure cipher and cipher mode, and any secure MAC algorithm. However, some care must be taken. First off, the HMAC must cover more than the ciphertext; it should also include the IV and, in situations requiring "algorithm flexibility", a unique identifier of the combination of crypto primitives used (such as AES128-CBC-HMAC-SHA256). This prevents an attacker from substituting a different IV (which, in CBC mode, would flip bits in the first decrypted block) or a different algorithm without the change being detected.
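
A minimal sketch of such a MAC input in Python, using only the standard library; `mac_input`, `compute_tag` and the exact identifier bytes are hypothetical choices. The 4-byte big-endian length prefix on each field is an assumption of this sketch: because the algorithm identifier has variable length, plain concatenation could let two different (identifier, IV, ciphertext) triples produce the same bytes.

```python
import hashlib
import hmac
import os
import struct

ALG_ID = b"AES128-CBC-HMAC-SHA256"  # hypothetical identifier encoding


def mac_input(alg_id: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Length-prefix every field so the encoding is unambiguous."""
    out = b""
    for field in (alg_id, iv, ciphertext):
        out += struct.pack(">I", len(field)) + field
    return out


def compute_tag(mac_key: bytes, alg_id: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """HMAC-SHA256 over the algorithm identifier, the IV and the ciphertext."""
    return hmac.new(mac_key, mac_input(alg_id, iv, ciphertext), hashlib.sha256).digest()


mac_key = os.urandom(32)
iv, ct = os.urandom(16), os.urandom(32)  # stand-ins for real AES-CBC output
tag = compute_tag(mac_key, ALG_ID, iv, ct)

# Swapping the IV or the algorithm label changes the tag, so either
# substitution fails verification.
other_iv = bytes([iv[0] ^ 1]) + iv[1:]
assert compute_tag(mac_key, ALG_ID, other_iv, ct) != tag
assert compute_tag(mac_key, b"AES256-CBC-HMAC-SHA256", iv, ct) != tag
```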

One other thing to keep in mind is the possibility of a timing attack. EtA has a very fast initial failure mode (the MAC doesn't match the ciphertext), but if the MAC does happen to match, the next possible failure (a padding error) will take longer to discover. An attacker can use the difference in computation time between a MAC error and a padding error to figure out which is which, even if you don't provide this level of detail in your feedback to a client, and can then use this to perform a chosen-ciphertext attack. It's a relatively minor concern, because this scheme makes the chances of even an intelligently engineered change passing the HMAC check extremely low, but to mitigate this possibility, consider building in some sort of delay that ensures the time needed to return an error depends on something other than raw computation time.
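
One way to act on that suggestion, sketched in Python under stated assumptions: every call is padded up to a fixed minimum duration, and every failure is reported as the same generic error. `MIN_RESPONSE_SECONDS`, `DecryptionError`, `open_with_time_floor` and the `decrypt` callback are hypothetical names, and the floor only hides the difference if both failure paths finish within it. The tag comparison uses the standard library's `hmac.compare_digest`, which is designed so the comparison itself does not leak timing.

```python
import hashlib
import hmac
import time

MIN_RESPONSE_SECONDS = 0.05  # hypothetical floor; must exceed the slowest real path


class DecryptionError(Exception):
    """The one generic error reported for every kind of failure."""


def open_with_time_floor(mac_key, tag, iv, ciphertext, decrypt):
    """Verify the HMAC, then call `decrypt` (a hypothetical AES-CBC callback).

    Every outcome takes at least MIN_RESPONSE_SECONDS, so a MAC failure and a
    padding failure that both finish under the floor look the same from outside.
    """
    start = time.monotonic()
    try:
        expected = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
        # compare_digest's running time does not depend on where the bytes differ
        if not hmac.compare_digest(expected, tag):
            raise DecryptionError()
        return decrypt(iv, ciphertext)  # may raise on a padding error
    except Exception:
        # Collapse MAC and padding failures into the same error
        raise DecryptionError() from None
    finally:
        # Pad success and failure alike up to the same minimum duration
        remaining = MIN_RESPONSE_SECONDS - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)
```

A fixed floor is used here rather than a random delay, because random delays can be averaged away by an attacker who repeats the same request many times.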

To extend Thomas Pornin's answer: encrypt-then-MAC is also recommended because the receiver first computes the MAC of the ciphertext and compares it with the MAC the sender sent. You only perform the processor-intensive task of decryption if the two MACs are equal, which shows that the data has not been tampered with. If the two MACs are not equal, there is no point in decrypting the packet, since it has already been shown that the data is not authentic. If you use MAC-then-encrypt, you must first decrypt, then compute the MAC of the recovered message and compare it with the original MAC. Moxie Marlinspike calls the danger of performing cryptographic operations before verifying the MAC "The Cryptographic Doom Principle".
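
A fuller round trip for the file in the question, sketched in Python under assumptions: the third-party `cryptography` package supplies AES-256-CBC and PKCS#7 padding, the encryption and MAC keys are separate random 32-byte keys, and the file layout is tag || IV || ciphertext. `encrypt_then_mac` and `verify_then_decrypt` are hypothetical names. The point is the order inside `verify_then_decrypt`: the MAC is checked first, so a tampered file never reaches the cipher or the padding check.

```python
import hashlib
import hmac
import os

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

TAG_LEN, IV_LEN = 32, 16  # HMAC-SHA256 tag length, AES block-sized IV


def encrypt_then_mac(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """AES-256-CBC encrypt, then HMAC-SHA256 over iv || ciphertext."""
    iv = os.urandom(IV_LEN)
    padder = padding.PKCS7(128).padder()
    padded = padder.update(plaintext) + padder.finalize()
    encryptor = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).encryptor()
    ciphertext = encryptor.update(padded) + encryptor.finalize()
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return tag + iv + ciphertext


def verify_then_decrypt(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Check the tag first; decrypt and unpad only if it matches."""
    tag = blob[:TAG_LEN]
    iv = blob[TAG_LEN:TAG_LEN + IV_LEN]
    ciphertext = blob[TAG_LEN + IV_LEN:]
    expected = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        # Rejected before any decryption or padding work is done
        raise ValueError("authentication failed")
    decryptor = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).decryptor()
    padded = decryptor.update(ciphertext) + decryptor.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()


enc_key, mac_key = os.urandom(32), os.urandom(32)  # AES-256 key, HMAC key
blob = encrypt_then_mac(enc_key, mac_key, b"hello world")
assert verify_then_decrypt(enc_key, mac_key, blob) == b"hello world"
```

Flipping any byte of `blob` makes `verify_then_decrypt` raise `ValueError` before the cipher is touched.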