I can't understand why, in practice, we need CPA security in Cipher Block Chaining (which insists on a random IV). Suppose the encryption is not CPA-secure, i.e., the adversary can tell that two ciphertexts are the same and conclude that the two plaintexts must have been the same. Of what use is this information to the adversary? I see that he learns both messages are the same, but so what? I don't think he can find the plaintext.

In other words, is CPA security strictly an academic definition of security? In practice, can an attacker extract the plaintext from this?

2 Answers

Cryptography is not just about confidentiality of the message, but also confidentiality of information about the message. Given the ciphertext, an attacker should not be able to determine any information about a message without knowing the key.

If you can tell that message A is equal to message B, that's a leak of information. This could be useful when trying to identify the type of message, especially in predictable protocols. It also poses problems when you consider the possibility of replay attacks and side-channel attacks.

Attacks on the key get much easier when you know both the plaintext and ciphertext. If you have a list of known possible plaintexts, you can determine which one corresponds to a particular ciphertext because the output of the cipher is always determined by the plaintext and key. If you use CBC, the position of the message in the stream and the IV come into play, making it infeasible to find which plaintext is associated with which ciphertext.

Repeatedly encrypting the same message to the same ciphertext opens the door to many practical attacks. Encryption is supposed to leak no information about the content of the message other than its length, and there are very real ways to exploit the information leakage you mention. Some of them have to do with the fact that plaintext domains are not always very large. Others have to do with the fact that an attacker who can perform a chosen plaintext attack (CPA) gets to learn a lot about the ciphertext.

Here are some attacks, assuming that we use a CBC-like encryption scheme that is deterministic (and thus not semantically secure):

If the attacker sees the encryption of message A and later learns message A, he can recognize message A every time it is sent, whether in the past or in the future. That's a very real threat if the set of messages that could be sent is reasonably small.

Example: Consider an army that coordinates movement by sending instructions on who is to move where. The attacker might be able to catalog intercepted encrypted communication and then determine what each message meant based on what happened after it was sent. When a duplicate message is intercepted, they will know what happened the last time the message was sent and thus what is likely to happen this time.
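To make the difference concrete, here is a minimal sketch comparing CBC with a fixed IV against CBC with a fresh random IV. Everything named here is an assumption for illustration: `toy_block_encrypt` is a 4-round Feistel network built from HMAC-SHA256 that stands in for a real block cipher such as AES so the example needs only the standard library, and `cbc_encrypt` and the sample order are hypothetical. It is not meant for real use.

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def toy_block_encrypt(key, block):
    """Toy 4-round Feistel permutation from HMAC-SHA256 (stand-in for AES)."""
    left, right = block[:8], block[8:]
    for i in range(4):
        f = hmac.new(key, bytes([i]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right

def cbc_encrypt(key, iv, msg):
    """CBC mode with PKCS#7-style padding; returns IV || ciphertext."""
    pad = BLOCK - len(msg) % BLOCK
    data = msg + bytes([pad]) * pad
    prev, out = iv, [iv]
    for i in range(0, len(data), BLOCK):
        prev = toy_block_encrypt(key, xor(data[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)

key = os.urandom(16)
FIXED_IV = bytes(BLOCK)  # all-zero IV: makes the scheme deterministic
order = b"Third division: advance to the river at dawn"

# The same order sent twice under a fixed IV...
det_1 = cbc_encrypt(key, FIXED_IV, order)
det_2 = cbc_encrypt(key, FIXED_IV, order)

# ...and the same order sent twice with a fresh random IV each time.
rnd_1 = cbc_encrypt(key, os.urandom(BLOCK), order)
rnd_2 = cbc_encrypt(key, os.urandom(BLOCK), order)

print("fixed IV, ciphertexts equal: ", det_1 == det_2)
print("random IV, ciphertexts equal:", rnd_1 == rnd_2)
```

With the fixed IV the two ciphertexts are identical, so an eavesdropper can spot the repeated order. With random IVs they should differ except with negligible probability.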

Plaintexts that begin the same way will have ciphertexts that begin the same way. Even if two ciphertexts don't match, some prefix of them may match. If it does, knowing the contents of one message reveals the beginning of the other. While this isn't much different from the previous case, it means the problem is much worse than simply having identical ciphertexts: it extends to ciphertexts that merely begin the same way.

Example: Consider a style of document with a person's (Alice's) identifying information at the beginning. Even if the contents of two documents differ, the attacker will still know the documents belong to the same person. If Alice ever sends the attacker one document, he learns how to identify any document encrypted by her. If one of those documents is recovered from another person's (Bob's) computer, the attacker learns that Alice likely communicated with Bob. Information about the content of the document is expected to remain confidential under encryption; failing to keep it so is a failure of the encryption.
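The prefix leak can be sketched in a few lines. As an assumption for illustration, the block cipher below is a toy Feistel network built from HMAC-SHA256 (a stand-in for AES so that only the standard library is needed), and the header, document bodies, and helper names are hypothetical. In this sketch the IV is assumed to travel separately, so `cbc_encrypt` returns only the ciphertext blocks.

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def toy_block_encrypt(key, block):
    """Toy 4-round Feistel permutation from HMAC-SHA256 (stand-in for AES)."""
    left, right = block[:8], block[8:]
    for i in range(4):
        f = hmac.new(key, bytes([i]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right

def cbc_encrypt(key, iv, msg):
    """CBC mode with PKCS#7-style padding; returns the ciphertext blocks only."""
    pad = BLOCK - len(msg) % BLOCK
    data = msg + bytes([pad]) * pad
    prev, out = iv, []
    for i in range(0, len(data), BLOCK):
        prev = toy_block_encrypt(key, xor(data[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)

def shared_prefix_blocks(c1, c2):
    """Count how many leading ciphertext blocks are identical."""
    n = 0
    for i in range(0, min(len(c1), len(c2)), BLOCK):
        if c1[i:i + BLOCK] != c2[i:i + BLOCK]:
            break
        n += 1
    return n

key = os.urandom(16)
FIXED_IV = bytes(BLOCK)

# Two different documents that start with the same two-block header.
header = b"Patient: Alice".ljust(2 * BLOCK)
doc_a = header + b"diagnosis A"
doc_b = header + b"diagnosis B"

fixed_shared = shared_prefix_blocks(
    cbc_encrypt(key, FIXED_IV, doc_a), cbc_encrypt(key, FIXED_IV, doc_b))
random_shared = shared_prefix_blocks(
    cbc_encrypt(key, os.urandom(BLOCK), doc_a),
    cbc_encrypt(key, os.urandom(BLOCK), doc_b))

print("shared leading blocks, fixed IV: ", fixed_shared)
print("shared leading blocks, random IV:", random_shared)
```

Under the fixed IV the two ciphertexts share exactly the two header blocks and diverge at the third, which is enough to link both documents to the same header. With random IVs the first blocks should already differ.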

It's also possible that the small domain of the plaintext will pair with the context of the ciphertext and reveal information.

Example: Consider a program like SSH that transmits what a user types, sending small sets of encrypted keystrokes at a time. We would have a limited number of possible ciphertexts because every key maps to the same ciphertext each time. If every set of messages is stand-alone (no ciphertext chaining across gaps in keystrokes), then an attacker could use frequency analysis to determine which ciphertext mapped to which keys. They could then do a decent job of learning what the user had typed. It would be even worse if every message were stand-alone, in which case it would be ECB. (In case it wasn't obvious, note that deterministic CBC has a similar taste to ECB.)
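A rough sketch of the keystroke scenario, under stated assumptions: each keystroke is encrypted as its own stand-alone message with a fixed IV, the block cipher is a toy Feistel network built from HMAC-SHA256 standing in for AES, and the typed text and all helper names are hypothetical. It shows that the histogram of ciphertexts mirrors the histogram of keys, which is what frequency analysis exploits.

```python
import hashlib
import hmac
import os
from collections import Counter

BLOCK = 16  # block size in bytes

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def toy_block_encrypt(key, block):
    """Toy 4-round Feistel permutation from HMAC-SHA256 (stand-in for AES)."""
    left, right = block[:8], block[8:]
    for i in range(4):
        f = hmac.new(key, bytes([i]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right

def cbc_encrypt(key, iv, msg):
    """CBC mode with PKCS#7-style padding; returns the ciphertext blocks only."""
    pad = BLOCK - len(msg) % BLOCK
    data = msg + bytes([pad]) * pad
    prev, out = iv, []
    for i in range(0, len(data), BLOCK):
        prev = toy_block_encrypt(key, xor(data[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)

key = os.urandom(16)
FIXED_IV = bytes(BLOCK)

# What the user types; the eavesdropper sees only `observed`.
typed = "see the tree beneath the eaves"
observed = [cbc_encrypt(key, FIXED_IV, ch.encode()) for ch in typed]

# Eavesdropper's view: one distinct ciphertext per distinct key,
# with the same frequency profile as the underlying keystrokes.
ct_counts = Counter(observed)
key_counts = Counter(typed)
top_ct, top_n = ct_counts.most_common(1)[0]

print("distinct ciphertexts seen:", len(ct_counts))
print("count of most frequent ciphertext:", top_n)
print("matches plaintext profile:",
      sorted(ct_counts.values()) == sorted(key_counts.values()))
```

The attacker never decrypts anything here. Ranking ciphertexts by frequency and comparing against the expected frequency of keys in the language is enough to start guessing the mapping.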

The attacker may be able to inject chosen plaintext into a live system and see the resulting ciphertext. (This is a very real attack that we must be able to withstand and motivation for why we consider chosen plaintext attacks.) He can then brute-force decrypt other ciphertext by simply finding plaintext that encrypts to the known ciphertext. The smaller the plaintext domain, the easier this is.

Example: A bank may encrypt the amount of money involved in a transaction in one block of the ciphertext. The attacker may be able to conduct his own transactions and determine what numbers encrypt to what ciphertexts, then be able to learn about other ciphertexts.
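The bank scenario amounts to a dictionary attack, sketched below. The assumptions are loud ones: `encryption_oracle` is a hypothetical model of the attacker's ability to run his own transactions and observe the resulting ciphertext, amounts are assumed to lie in 0..999, the IV is fixed, and the block cipher is a toy Feistel network built from HMAC-SHA256 standing in for AES.

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def toy_block_encrypt(key, block):
    """Toy 4-round Feistel permutation from HMAC-SHA256 (stand-in for AES)."""
    left, right = block[:8], block[8:]
    for i in range(4):
        f = hmac.new(key, bytes([i]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right

def cbc_encrypt(key, iv, msg):
    """CBC mode with PKCS#7-style padding; returns the ciphertext blocks only."""
    pad = BLOCK - len(msg) % BLOCK
    data = msg + bytes([pad]) * pad
    prev, out = iv, []
    for i in range(0, len(data), BLOCK):
        prev = toy_block_encrypt(key, xor(data[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)

_BANK_KEY = os.urandom(16)  # known only to the bank
_FIXED_IV = bytes(BLOCK)

def encryption_oracle(amount):
    """Hypothetical: the bank encrypts the amount of any transaction."""
    return cbc_encrypt(_BANK_KEY, _FIXED_IV, str(amount).encode())

# A victim's transaction that the attacker intercepts but cannot read.
victim_ciphertext = encryption_oracle(742)

# The attacker makes his own transactions for every plausible amount
# and records which ciphertext each one produces.
table = {encryption_oracle(amount): amount for amount in range(1000)}

# Looking up the intercepted ciphertext reveals the amount, no key needed.
recovered = table.get(victim_ciphertext)
print("recovered amount:", recovered)
```

With a fresh random IV per encryption, the attacker's table entries would not match the victim's ciphertext, so the lookup would fail.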

(Obviously, the specific players in those examples are arbitrary; you could swap them out for any number of other situations.)

The main problem is that we don't know how messages are going to be formatted or what kind of content they will contain, and it's very possible to pick formatting and content that will leak information through the ciphertext. Encryption is supposed to do its job of protecting all the information you give it. Leaving some edge cases vulnerable and telling users that they're only secure if they take extra precautions to avoid those edge cases isn't acceptable. Nobody wants to use something that is insecure in edge cases (especially not when there's the option of eliminating those edge-case vulnerabilities). It would be like your house being stable for everything, except that if you slam the bathroom door the kitchen ceiling falls in. Who'd want that?