I was wondering whether there exist algorithms/enciphering procedures that both compress and encrypt the input data. That means, for starters, the output will be both smaller in size and difficult to decrypt... and if the compression algorithm is good, then the bits will look almost random.

Also, are there any twin-encryption algorithms around? By which I mean: suppose I have two data strings (alphanumeric only, say, for now). Using them both, and an algorithm, I produce the encrypted output -- I take in a pair and produce a pair. The procedure is algorithm-based and not key-based. Any comments on this? And how could this relate to the earlier crypto-compression problem? (One way it would relate is that if the data is not just strings of characters but a large multimedia file, it could be compressed quickly and securely as well as being encrypted.)

The FAQ says "please ask questions that can be answered, not merely discussed", but I cannot be more specific, at least not now: if it had some code in it, I might have posted it on Stack Overflow!

All comments, with links to other Q&As, are welcome. Pointers to other forums are also welcome!

5 Answers

Unlike some crypto tasks such as encryption+authentication, compression and encryption have nothing in common and no synergies, so combining them into one algorithm offers no advantages.

In practice this means you first compress your data and then encrypt it, because encrypted data is incompressible. That way you cleanly separate the two concerns, and you can vary them independently.

A good place to combine them is at the protocol/file-format level. For example, TLS supports compression and encryption, as do most archive formats (zip, rar, 7z, ...).

But compression can hurt security: typical encryption is designed to hide everything about the data except its length. Compression depends on the data itself and affects the length of the data. That means it can leak information about the data through the length, bypassing the encryption. This is particularly severe if you compress attacker-chosen data and secret data within the same context. This led to the CRIME attack against TLS compression.
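To make the compress-then-encrypt order concrete, here is a minimal Python sketch. It assumes the third-party `cryptography` package for AES-GCM (the compression uses the standard-library `zlib`); the key and message are made up purely for illustration.

```python
import os
import zlib

# Assumed dependency: the third-party "cryptography" package (pip install cryptography).
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def compress_then_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Compress first, then encrypt; the reverse order is pointless because
    ciphertext is indistinguishable from random data and will not compress."""
    compressed = zlib.compress(plaintext)
    nonce = os.urandom(12)              # 96-bit nonce, must never repeat for the same key
    return nonce + AESGCM(key).encrypt(nonce, compressed, None)


def decrypt_then_decompress(key: bytes, blob: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return zlib.decompress(AESGCM(key).decrypt(nonce, ciphertext, None))


key = AESGCM.generate_key(bit_length=256)
message = b"some plaintext with enough repetition repetition repetition to compress"
blob = compress_then_encrypt(key, message)
assert decrypt_then_decompress(key, blob) == message
print(len(message), len(blob))          # the blob length reflects the *compressed* size
```

Note that the ciphertext length tracks the compressed length: if attacker-controlled data and secrets ever share one compression context, those length differences become an oracle, which is the side channel CRIME exploited and one reason TLS compression ended up disabled in practice.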

I believe encryption+channel coding is likewise not an obviously relevant combination, but there exist schemes, such as code-based schemes, that use this idea for performance. In my opinion the idea of compression+encryption makes more sense, since the outputs of both encryption and compression functions are data with high entropy.
– Habib Nov 7 '14 at 12:48

Also, are there any twin-encryption algorithms around? By which I mean: suppose I have two data strings (alphanumeric only, say, for now). Using them both, and an algorithm, I produce the encrypted output -- I take in a pair and produce a pair. The procedure is algorithm-based and not key-based.

One fundamental fact (or perhaps I should say "assumption") in cryptography is that you cannot securely encrypt data without some unknown secret being involved. This is known as Kerckhoffs's principle. In a scheme such as you propose here, any adversary with access to the algorithm could reverse the process and decrypt the data trivially, because the algorithm is simply a map between two inputs and two outputs. We have to assume that the adversary knows the algorithm (perhaps it is publicly available, perhaps they craftily reverse-engineered it), and so they will be able to perform the reverse mapping.
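To illustrate the point, here is a hypothetical key-free "twin" transform of the kind the question describes, invented purely for this example: a fixed, public pair-to-pair mapping. Because no secret is involved, anyone who knows the algorithm can invert it.

```python
def twin_scramble(a: bytes, b: bytes) -> tuple[bytes, bytes]:
    """A made-up, key-free pair-to-pair transform: XOR one input into the other
    and rotate each byte of the second. Invertible by anyone who reads this code."""
    assert len(a) == len(b)
    out1 = bytes(x ^ y for x, y in zip(a, b))
    out2 = bytes(((y << 3) | (y >> 5)) & 0xFF for y in b)    # rotate each byte left by 3
    return out1, out2


def twin_unscramble(out1: bytes, out2: bytes) -> tuple[bytes, bytes]:
    """The inverse transform: no secret is needed, so any attacker can run it too."""
    b = bytes(((y >> 3) | (y << 5)) & 0xFF for y in out2)    # rotate each byte right by 3
    a = bytes(x ^ y for x, y in zip(out1, b))
    return a, b


a, b = b"SECRETMSG", b"OTHERDATA"
assert twin_unscramble(*twin_scramble(a, b)) == (a, b)       # recovered without any key
```

However elaborate you make the mixing, as long as the only inputs are the data and a public algorithm, this remains obfuscation rather than encryption.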

What if the algorithm depended on the data (or some part of the data itself)?
– pratchit Apr 6 '12 at 5:35

How are you going to decrypt it without that data? I'm not worried about the encryption part, but rather the decryption part. Once you walk away from the data, anyone who wants to decrypt it will either need access to some unknown value or they will not. If they do, then for all practical purposes you are using some form of key for the encryption/decryption process. If they do not, you're using obfuscation, not encryption, and they will be able to reverse it.
– B-Con Apr 6 '12 at 6:24

I think a combined algorithm might be less secure. Imagine compressing a long string of 1's with an LZW algorithm, which would reduce it to a very short message, and then encrypting it. The reduction in size would leak information about the nature of the plaintext.
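A quick illustration of this, using Python's standard-library `zlib` (DEFLATE rather than LZW, but the effect is the same): the compressed, and therefore encrypted, length collapses for the redundant message.

```python
import os
import zlib

long_run   = b"1" * 100_000          # extremely redundant plaintext
random_ish = os.urandom(100_000)     # high-entropy plaintext of the same length

print(len(zlib.compress(long_run)))     # on the order of a hundred bytes
print(len(zlib.compress(random_ish)))   # roughly 100,000 bytes (no shrinkage)

# Encrypting the compressed data cannot hide this difference in length,
# so the ciphertext length alone reveals how redundant the plaintext was.
```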

If we consider the case of an unknown-message attack, the attacker won't be aware of the reduction in size, because whether the sent message is long and redundant or short and high-entropy is not known to the attacker. Also, in the case of a known-message attack, the reduction in size won't help the attacker guess anything about the key, if we assume that the reduction in size is independent of the key. Therefore your argument looks flawed to me.
– Habib Nov 7 '14 at 19:14

The usual constraint in cryptography thought experiments is that the attacker knows everything but the keys to the encryption, so he would be aware of the reduction in size. In any case, this is just a thought experiment to demonstrate the concept that more might not be better.
– ddyer Nov 7 '14 at 19:19

Although I do not have a formal proof, I think a compressing encryption algorithm could not be secure. The reason is that there is no way to compress every input (information-theoretically this is impossible). That means a lossless compression algorithm can only compress certain input strings. That in turn means that if our encryption algorithm manages to compress an input, then that fact alone reveals information about the input.

Although I think this is correct for random plaintext, it is not necessarily the case for non-random plaintext. Say, for instance, we know that the input will always be ASCII text encoded into bytes. In that case we could always re-encode the text using a 7-bit character encoding. As the reduction would be identical for any such text, you would not leak additional information through the size of the ciphertext.
– Maarten Bodewes Nov 7 '14 at 18:20
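A sketch of the fixed-ratio re-encoding described in the comment above: packing 8-bit-per-character ASCII into 7 bits per character. Because every valid input shrinks by exactly the same factor, the ciphertext length reveals nothing beyond the plaintext length, unlike general-purpose compression. (The helper below is written just for this example.)

```python
def pack7(text: bytes) -> bytes:
    """Pack ASCII bytes (values < 128) into 7 bits per character."""
    bits, nbits = 0, 0
    out = bytearray()
    for ch in text:
        assert ch < 128, "only 7-bit ASCII is representable"
        bits = (bits << 7) | ch
        nbits += 7
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    if nbits:
        out.append((bits << (8 - nbits)) & 0xFF)   # pad the final partial byte
    return bytes(out)


msg = b"Hello, compression-before-encryption!"
packed = pack7(msg)
print(len(msg), len(packed))   # always an exact 8:7 ratio (up to one byte of padding)
```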

This is also more or less the same point as this answer is trying to make, although your answer puts it a bit more clearly.
– Maarten Bodewes Nov 7 '14 at 18:25

These methods mostly aim to increase performance by combining the two blocks of "source coding" and "encryption". It is worth mentioning that there have been different but successful attempts at combining the blocks of "encryption" and "channel coding". The classic example is the McEliece cryptosystem.

"usually in cryptography it is unwittingly assumed that the input data to encryption block is already fully compressed" -- huh? In cryptography, we explicitly make no assumption what the plaintext looks like; we expect the encryption to be secure whether the plaintext is fully compressed, or is extremely redundant (e.g. consists of all zeros), or somewhere in the middle.
– poncho Nov 7 '14 at 15:55

Maybe the statement falls short because of the term "usually". I'll edit that :). But an example of such an assumption is the deduction from Shannon's perfect secrecy that the key space should be at least as big as the message space. This is acceptable only in the case that the message is fully compressed.
– Habib Nov 7 '14 at 16:53
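For reference, the entropy form of the deduction mentioned above can be sketched as follows; this is the standard information-theoretic argument, not anything specific to compression.

```latex
% Perfect secrecy: I(M;C) = 0.  Correct decryption: H(M | K, C) = 0.
\begin{aligned}
H(M) &= H(M \mid C)                   && \text{(perfect secrecy)} \\
     &\le H(M, K \mid C)              && \text{(adding a variable cannot decrease entropy)} \\
     &= H(K \mid C) + H(M \mid K, C)  && \text{(chain rule)} \\
     &= H(K \mid C)                   && \text{(decryption is deterministic)} \\
     &\le H(K).
\end{aligned}
```

So perfect secrecy forces $H(K) \ge H(M)$. Lossless compression changes only the encoding (and thus the length) of the messages, not $H(M)$ itself, which is the point made in the next comment.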

@Habib: The size of the message space doesn't change with compression, assuming you are using lossless compression (i.e. compression that can be reverted). You just change the encoding to a form which is on average shorter.
– Paŭlo Ebermann Nov 7 '14 at 18:39

@PaŭloEbermann Yes! That's precisely correct. However, the probability distribution of the messages changes after compression and becomes closer to the uniform distribution, which results in higher entropy per symbol. Therefore, in the case of a uniform distribution for the message and the key, the result "key space at least as big as message space" is deducible from Shannon's perfect secrecy theorem. However, :-| I'll admit there shouldn't be such an assumption for every encryption scheme, and that this may not help with the crypto-compression question. Or maybe I should ponder this more.
– Habib Nov 7 '14 at 19:01