Apache Kafka comes with a lot of security features out of the box (at least since version 0.9). But one feature is missing if you deal with sensitive mission critical data: Encryption of the data itself.

Sure, this could simply be accomplished by encrypting the disks on which the Kafka brokers store their data. But the unencrypted form may still reside in memory or other caches and, even worse, anyone who can access the appropriate message can just read it. So it seems that disk encryption solely is not enough for sensitive data. In addition to “just encrypt the data” we need a mechanism to make sure no one altered the data after the producer sent the message. Keep in mind that Kafka, which is a distributed system, have to deal with insecure networks and SSL/TLS may not protect your data in every case under all circumstances (weak cipher suites, man in the middle attacks and openssl bugs, etc.). Furthermore with SSL/TLS enabled Kafka cannot leverage the sendfile syscall anymore (which writes a pagecache directly to a socket).

To achieve all these requirements the producer has to encrypt the messages before pushing them over the wire into Kafka and the Consumer needs to decrypt them upon retrieval. So both endpoints of the communication link need to handle the security aspects which is called End-to-End security (or sometimes also End-to-End encryption). The advantage of this approach is that it is totally transparent for Kafka and all other involved components between Producer and Consumer.

Encryption algorithm for Kafka

Because Kafka is a high volume and low latency message broker we need a fast (but still secure) encryption algorithm which can encrypt an arbitrary amount of data. The obvious choice here is AES (Advanced Encryption Standard) mainly because of the widespread and common hardware support which is available. Modern Intel and AMD processors support AES en-/decryption natively within the CPU which is a lot faster than AES software implementations. The problem with AES in our context is, that it is a symmetric cipher which means that there is only one key which is used for encryption as well as decryption. This leads to the question how a secure key exchange between producer and consumer can be accomplished. The short answer is: it is not possible if you not already have a secure channel (and having that would make establishing another one maybe unnecessary).

So we need another mechanism to conduct secure key exchange. Luckily this challenge is already solved leveraging asymmetric cryptography whereas encryption and decryption is performed with different keys (they belong together and are often referred to as key pair). One key is the so called public key which is used for encryption and the other one is called private key and can decrypt messages encrypted with the corresponding public key. The private key cannot be derived from the public key so it is safe to transfer the public key over an insecure channel.

One could now ask why do we not just use an asymmetric cipher (instead of AES) and we are done. The reason is that asymmetric ciphers are much slower than symmetric ones and they cannot encrypt data bigger than the cipher’s key size. In our case (let’s choose RSA for now as our asymmetric cryptosystem) the typical key length for RSA considered secure is nowadays 4096 bit (=512 byte) but for practical reasons we go with 2048 bits (=256 byte) here. To use RSA as the only one algorithm we would need to chunk our data in pieces of 256 bytes length. That sounds not very practical. But RSA is well suited to encrypt our AES key (which is either 16 or 32 bytes in length). So let’s do this.

Now we need to make the usage of AES and RSA encryption semantically secure. That means that encryption of slightly different messages (which have some identical content) with the same key does not contain informations which allows to derive the key from that. To circumvent this for AES encrypted messages you either have to generate a new random key for every message or to use a the same key but choose a AES mode which support initialization vectors (IV) and use a different IV for every message. For RSA we need to apply a random encryption padding schemes such as Optimal Asymmetric Encryption Padding (OAEP).

As mentioned above RSA en/decryption is pretty slow so we need to avoid it as much as possible. That is why it makes sense to use the same AES key for encrypting more than one message and use an initialization vector to ensure semantically secureness. On the decryption side we need to somehow cache the decrypted AES key. To know which AES key is used for the particular message without submitting it in plaintext (or decrypt it constantly) it is necessary to add a key hash which serves as caching id. To prevent hash collision attacks the hash needs to be cryptographically secure. For now let us use SHA-256 for that.

As a last piece it would be nice if the consumer can detect if a particular message is encrypted or not. In the latter case the decryption is simply skipped. This can be achieved by introducing magic bytes to label encrypted messages.

With that background wiring all this together leads to the following high level message processing chain:

Consumer: 1) Check magic bytes (M). Bypass unencrypted messages 2) Extract hash(K) by looking at L 3) Extract IV by looking at L 4) If hash(K) is in cache get plain AES key (K) 5) If hash(K) is no in cache get decrypt rsa(K) to get plain AES key (and put them into the cache) 6) Decrypt aes(O) with K and IV 7) Replace M-L-hash(K)-rsa(K)-IV-aes(O) with O

So rsa(K) is only necessary when a new AES key is used (by producer) and/or the cache needs to be populated (by consumer). The drawback with caching the AES key is that it resides permanently in producer and consumer JVM memory. But without caching especially the decrypting process is too slow to be useful (see benchmarks).

Byte sequence on an encrypted message

This adds a constant overhead of 309 bytes to each message. 5 bytes for the header, 32 bytes for SHA-256 hash, 256 bytes for the RSA encrypted AES key and 16 byte for the IV. The encrypted message size may also be 15 bytes bigger than the original message due to PKCS5Padding which is used (blocksize 16). So the maximum overhead in total would be 324 byte. That may be a lot especially if you only handle small messages you easily double your data size. There is nothing we can do here but use a weaker RSA key with only 1024 bit key size. This would reduce the maximum overhead by 128 byte to 196 byte. But 1024 bit keys are considered vulnerable. A weaker hash algorithm producing a shorter hash is also not an option. And finally we cannot omit the IV nor put them into the RSA encrypted part (because that would make caching impossible). So increased message size as well as the additional CPU cycles for AES en-/decryption are the costs for security.

But that’s enough theory, let’s look at the implementation:

The basic idea for transparent end-to-end encryption in Kafka is to write a Serializer and Deserializer which wraps the original Serializer and Deserializer and adds the en-/decryption processing transparently.

Implementing a delegating Serializer and Deserializer is easy. We just need to implement two classes for that:

Benchmarks

Limitations

The design of End-to-End security discussed in this article does have some limitations. It provides currently only encryption and no kind accountability or non repudiation (because message are not signed yet). Authentication and authorization is also not covered but can be leveraged by using Kafka’s own mechanisms. It does also not protect against one sitting in den middle (Man in the middle) from dropping, replaying or reordering messages. There is also no forward secrecy present. We will discuss and add some of this features in the part 2 of this article.

Conclusion

This article and the provided implementation demonstrates how transparent end-to-end security can be applied to Kafka and add a enterprise grade security feature. We discussed the nature of symmetric and asymmetric encryption systems, how they can be combined and how much overhead they added. In Part 2 we will discuss optimizations (like batching and compression of messages) and adding cryptographic signatures to accomplish a trusted relationship between various producers and consumers. In Part 3 we will have a look on how non-Java producers and consumers can be made ready for end-to-end security.

Tags

Hendrik Saly is working for codecentric AG as an IT-Consultant and is doing a lot of Elasticsearch consulting and development. He is also involved within the Java Community process (JCP) as an Expert Group Member of JSR 367 and JSR 374 and engaged as an active Apache committer. He spoke already on various conferences, local meetups and customer events. Hendrik is an IT professional since 2001 and has worked for Pixelpark AG, akquinet AG and PTA GmbH and as a freelancer.