I need to store encrypted Primary Account Numbers (PANs) in a database. The only encryption option I have available is CFB mode with a fixed (0x00) IV. I can choose my cipher, and have chosen AES-256.

Encryption and decryption are performed without the need for the application to 'know' the key, which is stored securely 'elsewhere'.

For the sake of this exercise, the PANs are 16-byte numbers (16 decimal digits); the attack I'm currently addressing is theft of a copy of the database, with all the encrypted PANs, any of which can be repeated.

I fully understand that the use of a constant IV is a weakness, but this is beyond my control.

Will any of the following options mitigate the weakness?

I have an 8-byte binary clock available whose contents change once every microsecond or better, and which repeats every ~130 years. I prepend this clock to the plaintext PAN before encryption, and discard it after decryption.

The same, except that I prepend and discard two copies, to complete the first 16-byte block.

I generate 8 bytes of random data and prepend.

I generate 16 bytes of random data and prepend.

Specifically, I should like to know:

a. Are 16 bytes sufficient while 8 are not?

b. Are random bytes sufficient while the µsec clock is not?

c. Do these options completely compensate for the weakness of the fixed IV?

At the risk of flogging a dead horse, some additional comments, following the excellent answers (especially from @D.W.)

I have always agreed to the principle of 'Do not roll your own', but now, even more so.

I think it relevant to point out that these 'messages' never transit, but are only stored 'at rest', and that the attack I'm defending against in this exercise is 'they stole my database'. So I'm not sure what

D.W.'s chosen plaintext attack is a real eye-opener.

In this scenario (input, charge, store, perhaps retrieve for refund), I'd like to understand better what advantages authentication would bring me.

I'd welcome further comment on using the TOD clock as an IV. Here are some additional details:

The TOD clock is a 64-bit integer representing the elapsed time since 00:00 on Jan 1, 1900.

The clock is updated in such a way that bit 51 is guaranteed to be updated once per microsecond. Bits 0-51 are the unsigned number of elapsed microseconds.

On faster machines, bits in the range 52-63 are also updated at their corresponding frequencies.

The system in question runs as a single image on a single real machine.

Clock synchronisation is unlikely to set the clock backwards, but let's analyse that case:

Suppose that at peak I'm doing 100 encryptions per second (the real number is a lot less), and that I have the slowest clock (1M ticks per second). Let's also suppose that the clock gets set back by 5 seconds (I believe that to be extraordinarily extreme; it's GMT, by the way, so there's no daylight saving).

Now, I apply the birthday paradox to 1000 encryptions (5*(100+100)) in 5 million; the probability of two or more synonyms is roughly 0.1 ($\approx n^2/2N = 10^6/10^7$).
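
The birthday estimate above can be checked numerically. This is only a minimal sketch: it assumes the 1000 clock readings behave like independent, uniform draws from 5,000,000 microsecond slots, which is an idealisation (real timestamps are not uniform), and the function name is made up for illustration.

```python
import math

def birthday_collision_probability(n: int, slots: int) -> float:
    """P(at least one repeat) among n independent uniform draws from `slots` values."""
    # P(no repeat) = prod_{i=0}^{n-1} (1 - i/slots); sum the logs for numerical stability.
    log_no_repeat = sum(math.log1p(-i / slots) for i in range(n))
    # 1 - exp(x), computed accurately even when x is close to zero.
    return -math.expm1(log_no_repeat)

# Assumed inputs from the scenario: 1000 encryptions, 5 s of 1 MHz clock values.
p = birthday_collision_probability(1000, 5_000_000)
print(f"collision probability: {p:.4f}")
```

The quick rule of thumb is $p \approx n^2 / 2N$ whenever that quantity is small, so the result is easy to cross-check by hand.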

2 Answers

CFB with a fixed IV? Yikes! That is completely insecure: for the first 16 bytes of plaintext, it is even worse than ECB mode, and that's saying something. Please go enlighten whoever thought it was a good idea to expose this as the only mode of encipherment available (or even one among multiple options).

Let me elaborate. It sounds like the baseline is to use CFB mode with a fixed all-zeros IV, to encrypt a 16-byte (one-block) message. In this case, the encryption of the PAN $P$ will be

$$C = E_K(0) \oplus P.$$

This formula will be used for the encryption of every single PAN. Notice that the 16-byte value $E_K(0)$ is the only secret needed to decrypt -- and given a single plaintext/ciphertext pair, you can easily recover $E_K(0)$ via the formula $E_K(0) = P \oplus C$. Therefore, given a single valid PAN and its corresponding encryption, I can decrypt all other PANs. For instance, if I know my own PAN, and my PAN is stored encrypted somewhere in the database, and I get a copy of the database, then I can decrypt everyone else's PAN. That's not good! This highlights why you have to do something.
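
To make the attack concrete, here is a small sketch. It does not use AES at all: since the attack never needs the key, the block $E_K(0)$ is simply modelled as 16 secret random bytes. The PAN values and names such as `encrypt_pan` are hypothetical, chosen only for illustration.

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Stand-in for E_K(0): 16 secret bytes that the attacker never sees directly.
keystream = secrets.token_bytes(16)

def encrypt_pan(pan: bytes) -> bytes:
    # One-block CFB with a fixed zero IV reduces to C = E_K(0) XOR P.
    return xor(keystream, pan)

my_pan = b"4111111111111111"      # attacker's own PAN (known plaintext)
victim_pan = b"5500005555555559"  # someone else's PAN (hypothetical test value)
db = {"me": encrypt_pan(my_pan), "victim": encrypt_pan(victim_pan)}

# Attacker side: one known plaintext/ciphertext pair reveals E_K(0) ...
recovered = xor(my_pan, db["me"])
# ... and E_K(0) decrypts every other row of the stolen database.
stolen = xor(recovered, db["victim"])
print(stolen)
```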

This is why cryptographers are so big on "Don't roll your own", and why they instead recommend you use well-vetted high-level schemes, like GPG for encryption of data at rest or TLS for encryption of data in motion.

You mention some possible workarounds. If you use a random 16-byte prefix, the problem above goes away, as you've rendered this basically equivalent to CFB mode with a random IV. Yes, that works.

No, an 8-byte prefix is not sufficient: it would leave the first 8 bytes of the PAN unprotected and vulnerable to the attack I described above.
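
A sketch of why 8 bytes fall short, assuming full-block (128-bit) CFB. The function `E` below is a keyed hash standing in for AES (CFB only ever uses the cipher in the forward direction, so a stand-in is enough to show the structure); it is not AES, and `store` and the PAN values are hypothetical.

```python
import hashlib
import hmac
import secrets

BLOCK = 16

def E(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES_K(block): a keyed PRF truncated to 16 bytes (NOT real AES)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cfb_encrypt(key: bytes, plaintext: bytes, iv: bytes = bytes(BLOCK)) -> bytes:
    out, feedback = b"", iv
    for i in range(0, len(plaintext), BLOCK):
        c = xor(E(key, feedback), plaintext[i:i + BLOCK])
        out += c
        feedback = c  # the ciphertext block feeds the next keystream block
    return out

key = secrets.token_bytes(32)

def store(pan: bytes) -> bytes:
    # 8 random bytes prepended, then CFB with the fixed zero IV.
    return cfb_encrypt(key, secrets.token_bytes(8) + pan)

my_pan, victim_pan = b"4111111111111111", b"5500005555555559"
mine, theirs = store(my_pan), store(victim_pan)

# Bytes 8..15 of the first block are PAN[:8] XOR E_K(0)[8:16]: a constant keystream.
ks_tail = xor(mine[8:16], my_pan[:8])
leaked = xor(theirs[8:16], ks_tail)
print(leaked)  # first 8 digits of the victim's PAN
```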

Based upon your edits, a 16-byte prefix containing a clock value sounds like it's probably OK. Normally, I'm wary of relying upon clock values to be unique, because it is possible for clock values to repeat (e.g., because you're running two copies of the software in two different VMs, or because you run the software in a VM and do a checkpoint and rollback, or because NTP adjusts your clock and your clock runs backwards for a little while -- yes, it can happen). It sounds like you've addressed those concerns. I might still be a little bit wary, because using a clock is more sensitive to how the software is deployed (it would be easy for some ops person to someday in the future move the software over to the cloud or to a virtualized environment, without realizing that this might invalidate your security analysis). When doing a security design, I find it satisfying if I can reduce the number of external assumptions I'm making, so I would find a design that uses a random 16-byte prefix cleaner (e.g., read 16 bytes from /dev/urandom). However, if there are other constraints that make a clock preferable, you'll probably be OK going with a clock.

No matter what you use as a prefix, you still will be missing authentication for the data. I always recommend using authenticated encryption, not plain encryption. In this case, I would be hard-pressed to identify a benefit of authentication. (It helps prevent tampering with the data in the database, if an attacker gains the ability to read/write your database but not otherwise compromise the integrity of your software -- but it's not clear that this matters in practice.) So, let me admit up front that I can't immediately see any serious risk from omitting authentication, in this scenario. That said, as a design principle, I like to simplify security analysis as much as possible. If adding authentication prevents me from having to think hard about whether there might be any scenario where authentication matters, I'll usually just reflexively apply authentication, as it usually is effectively free (or cheaper than trying to think carefully about the consequences of lack of authentication). That may just be a personal idiosyncrasy/style.

Thank you. In the light of your excellent answer I've (a) Moved my accept tick (again), and (b) added further detail to the question.
– Brent.Longborough, Nov 20 '12 at 10:25

A small detail: if one uses a CFB offset of 8 bits, am I right in thinking it's a little bit more complex than just $C = E_K(0) \oplus P$?
– Brent.Longborough, Nov 20 '12 at 12:57


@Brent.Longborough, The answer will probably depend upon specifics, such as exactly how much known plaintext the attacker has. But in any case, this may be getting too far off-base to explain in a comment thread, so maybe you'll need to post a second question. Sorry about that. Or you could just take it as a bottom line: it's gonna be safer to prepend with a random 16-byte IV, even if you're using 8-bit CFB mode (with a constant IV).
– D.W., Nov 20 '12 at 20:02


@Brent: One notable weakness of 8-bit CFB with a zero IV is that, for every key $K$, there is some byte $B_K$ such that encrypting $B_K$ with CFB-8 will yield a zero byte, and will thus leave the internal state unchanged. Thus, any plaintext beginning with $n$ bytes of value $B_K$ will yield a ciphertext beginning with $n$ zero bytes.
– Ilmari Karonen, Nov 22 '12 at 21:32
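
The CFB-8 observation in the last comment can be reproduced with a short sketch. As an assumption for illustration, `E` is a keyed hash standing in for AES (not the real cipher); the shift-register logic is the standard CFB-8 construction.

```python
import hashlib
import hmac
import secrets

def E(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES_K(block): a keyed PRF truncated to 16 bytes (NOT real AES)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:16]

def cfb8_encrypt(key: bytes, plaintext: bytes, iv: bytes = bytes(16)) -> bytes:
    state, out = iv, bytearray()
    for p in plaintext:
        c = p ^ E(key, state)[0]        # one keystream byte per plaintext byte
        out.append(c)
        state = state[1:] + bytes([c])  # shift the ciphertext byte into the register
    return bytes(out)

key = secrets.token_bytes(32)
b_k = E(key, bytes(16))[0]  # the special byte B_K for this key

# Five copies of B_K keep the all-zero register unchanged, so five zero bytes come out.
ct = cfb8_encrypt(key, bytes([b_k]) * 5 + b"1234")
print(ct[:5].hex())  # prints 0000000000
```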

I recommend that you prepend a random 16-byte prefix. Prepending a random 16-byte prefix, before encrypting with your CFB mode, will be just as good as using a random IV. The argument is pretty similar to the one in "Using CBC with fixed IV".

If we use CFB with an all-zeros IV and a random 16-byte value prefixed to the message before encryption, as you suggested, we get a scheme that looks like this:

$C_0 = 0$

$C_1 = E_K(C_0) \oplus IV$

$C_2 = E_K(C_1) \oplus PAN$

where $E_K(C_0) = E_K(0)$ is a constant and $IV$ is your random 16-byte value. Since a constant xor-ed with a random value is itself random, the above has the same properties as:

$C_1 = IV$

$C_2 = E_K(C_1) \oplus PAN$

which is the standard CFB construction (with a random IV) with different indices.
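
The construction above can be sketched end to end as follows. This is illustrative only: `E` is a keyed hash standing in for AES (CFB only needs the forward direction of the cipher), and `store_pan` / `load_pan` are hypothetical names, not part of any real API.

```python
import hashlib
import hmac
import secrets

BLOCK = 16

def E(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES_K(block): a keyed PRF truncated to 16 bytes (NOT real AES)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cfb_encrypt(key: bytes, plaintext: bytes, iv: bytes = bytes(BLOCK)) -> bytes:
    out, feedback = b"", iv
    for i in range(0, len(plaintext), BLOCK):
        feedback = xor(E(key, feedback), plaintext[i:i + BLOCK])
        out += feedback
    return out

def cfb_decrypt(key: bytes, ciphertext: bytes, iv: bytes = bytes(BLOCK)) -> bytes:
    out, feedback = b"", iv
    for i in range(0, len(ciphertext), BLOCK):
        chunk = ciphertext[i:i + BLOCK]
        out += xor(E(key, feedback), chunk)
        feedback = chunk
    return out

def store_pan(key: bytes, pan: bytes) -> bytes:
    # C_1 = E_K(0) XOR prefix acts as a fresh random IV for the PAN block.
    return cfb_encrypt(key, secrets.token_bytes(BLOCK) + pan)

def load_pan(key: bytes, record: bytes) -> bytes:
    return cfb_decrypt(key, record)[BLOCK:]  # discard the prefix after decryption

key = secrets.token_bytes(32)
pan = b"4111111111111111"  # hypothetical test PAN
r1, r2 = store_pan(key, pan), store_pan(key, pan)
print(r1[BLOCK:] != r2[BLOCK:], load_pan(key, r1) == pan)
```

Storing the same PAN twice gives unrelated ciphertexts, which is the property the fixed IV alone could not provide.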

Strictly speaking, the 16-byte prefixes don't need to be random. The above approach is probably secure as long as the 16-byte prefixes are all unique. If you pull 16 random bytes from a well-seeded crypto PRNG, they should all be unique. But I don't trust your other schemes to produce unique values. Therefore, I don't recommend using any of your other approaches.

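Note that while this technique of prepending a random prefix works for some modes, such as CBC or CFB, other modes, such as OFB or CTR, are totally broken in such a scenario: their keystream depends only on the key and the fixed IV, never on the prefix, so every message ends up encrypted with the same keystream.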
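
For contrast, here is a sketch of the same prefix trick under OFB with a fixed IV (CTR fails in the same way). Again `E` is a keyed hash standing in for AES, and the names and PAN values are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

BLOCK = 16

def E(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES_K(block): a keyed PRF truncated to 16 bytes (NOT real AES)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def ofb_encrypt(key: bytes, plaintext: bytes, iv: bytes = bytes(BLOCK)) -> bytes:
    out, feedback = b"", iv
    for i in range(0, len(plaintext), BLOCK):
        feedback = E(key, feedback)  # keystream never sees plaintext or ciphertext
        out += xor(feedback, plaintext[i:i + BLOCK])
    return out

key = secrets.token_bytes(32)

def store(pan: bytes) -> bytes:
    # Random 16-byte prefix, fixed zero IV: the prefix does not alter the keystream.
    return ofb_encrypt(key, secrets.token_bytes(BLOCK) + pan)

my_pan, victim_pan = b"4111111111111111", b"5500005555555559"
mine, theirs = store(my_pan), store(victim_pan)

# The second keystream block is identical for every record, so one known PAN breaks all.
ks2 = xor(mine[BLOCK:], my_pan)
print(xor(theirs[BLOCK:], ks2))
```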