The above 3 deletions and 1 addition are the fix for a bug in my decryption routines, which took me 3 days to find. Lets examine the bug in more detail in order to understand what the code does, what went wrong, and why it took me so long to find it.

As I described in an earlier blog post, an encrypted OpenPGP message basically consists of the symmetrically encrypted body and a bunch of public key encrypted session keys. In Bouncycastle, the encrypted session keys are called PGPPublicKeyEncryptedData objects. The are encrypted using the asymmetric public key of a recipient and they contain the symmetric session key which was used to encrypt the actual message. So if Alice writes a message to Bob and Carla, the message will contain at least two PGPPublicKeyEncryptedData elements. One containing the session key encrypted for Bob, the other one for Carla.

In PGPainless, I iterate through all PGPPublicKeyEncryptedData objects of an incoming message in order to get a list of all recipients of the message. I do that because I can imagine it would be cool if Bob can see, that the message was also encrypted for Carla. Also during every iteration I see, if the current PGPPublicKeyEncryptedData object is encrypted for a key of which I do have the secret key available. If so, I extract the private key for later use.

Lets have an example:

Alice encrypts a message to both herself and Bob. Then she sends the message to Bob. Bob tries to decrypt the message.
PGPainless saves the first PGPPublicKeyEncryptedData object in the variable encryptedSessionKey.
This first encrypted session key has Alice’s key ID, so PGPainless adds the id to the list of recipient key ids and goes to the next iteration. Again it stores the encrypted data object in encryptedSessionKey.
This second encrypted session key has Bob’s key ID. PGPainless finds the respective private key, extracts is and saves it to the variable decryptionKey.

Now the iteration ends, as all PGPPublicKeyEncryptedData objects have been processed. Now the decryption can begin. Using the extracted private key, Bob decrypts the encryptedSessionKey to retrieve his session key with which he decrypts the message. Easy Peasy Lemon Squeezy – at least that is what I thought.

For some strange reason however, during integration testing I noticed that every now and then decryption would fail. I thought it had to do with some race conditions at first. I blamed JUnits @Before and @After annotations which I used to delete the key rings from disk after the test was done to fire too early. Coincidentally after I removed the annotations the test would work flawlessly, which encouraged me and seemed to confirm my hypothesis.

However, the more tests I did, the more confused I became, as I could not find the cause of the failing. Bouncycastle has debugging disabled for their library, so I could not follow the calculations step by step. This is done for security reasons I guess. Fortunately there is a debug version which I discovered today. Using this library, I can see, which lines are responsible for throwing exceptions and step through the execution in great detail.

The final hint however came from a forum post. So lets see what exactly went wrong. For that purpose lets assume the same scenario as above, but with a slight twist.

Again, Alice encrypts a message to herself and Bob, but this time she prepends Bobs session key first.
Bob, when decrypting the message, stores the first PGPPublicKeyEncryptedData object in encryptedSessionKey, notices that this is his own session key and extracts his respective private key into the variable decryptionKey.
In the next iteration (remember, Bob wants to know all recipients), he stores the next PGPPublicKeyEncryptedData object in encryptedSessionKey. This is Alices session key.

Now he proceeds with decrypting the encryptedSessionKey with his decryptionKey – and hell breaks lose. Why? Because at this point in time encryptedSessionKey actually contains Alices session key, not Bobs.

The tricky part about this bug is, that it only happens by chance. I’m not sure in which order session keys are prepended to the message by Bouncycastle, but in certain cases this caused my code to fail – while in certain cases it did not. One benefit that OpenPGP has over OMEMO is, that you can write better tests. In OMEMO keys always change when the ratchet advances, so you cannot really test if certain messages decrypt. At least it is much easier using OpenPGP. But this bug told me a lesson, that you have to be very very careful with your JUnit tests. At some point I replaced randomly generated key pairs with some fixed keys to get more reliable test results and to be able to confirm the result using GnuPG. Ironically I was very lucky that as a result my test would reproduce the second scenario above. If instead it would have produced the first scenario, it would have taken me much much longer to discover the cause of the issue.

Fighting this bug took me 3 days. Luckily I didn’t spend 100% of my time bug hunting. I also wrote another integration test, which covers one very cool feature of OpenPGP for XMPP.

Secret Key Synchronization

In OX it is possible to store the secret OpenPGP key in a private PubSub node. That way the key can be easily transported to other devices, so you can read your encrypted messages in a web browser for example. This is one huge benefit over OMEMO, where you can only read messages received *after* you began using the new device.

During my testing, I also found out, that ejabberd despite announcing support for alternative PubSub access models, does not properly handle some requests.
For normal messaging, OX recommends to publish the public keys of users in a PubSub node configured with the ‘open’ access model. That allows users to access your public key, even if they are not in your roster.

Smack however does a disco info query on a PubSub node prior to fetching it in order to get some metadata about it. It turns out that ejabberd does return an error for such a request, stating that you have to be subscribed to the owner of the node in order to fetch information about it. I’m pretty sure though, that this bug will be fixed soon 🙂