Sunday, April 30. 2017

Last week I was working with CRL files, and a very big PEM CRL bundle took several minutes to parse in Java. The file was several megabytes in size, but that alone did not justify such a long time. A Certificate Revocation List (CRL) is a list, issued by a Certificate Authority (CA), of certificates that have been revoked (they are no longer valid). A CRL is usually a few kilobytes in size: when a revoked certificate expires it is usually removed from the CRL (it is invalid anyway), and this practice keeps the CRL at a moderate size. Each revocation entry is just the certificate serial number and the revocation date, so thousands and thousands of them are needed to produce such a big file.

Looking at the code, the method that parses the CRLs is CertificateFactory.generateCRLs. In my case the file was a list of PEM-encoded CRLs (PEM is base64-encoded DER enclosed between a -----BEGIN X509 CRL----- header and an -----END X509 CRL----- footer; DER is a binary format based on ASN.1). The X.509 CertificateFactory implementation (X509Factory in OpenJDK) accepts both formats, DER and PEM, but it seems to do something very odd to deal with PEM (you can see the code in OpenJDK). DER is read directly, but PEM is first copied into a memory buffer, then the base64 data is decoded and the resulting DER is parsed. The buffer is initially 2KB in size and grows in 1KB chunks using Arrays.copyOf, so every time it fills up a new array is allocated and all the data read so far is copied again. Because the buffer grows by a fixed amount instead of proportionally, the total amount of copying grows with the square of the data size. This is fine if the CRL is a few kilobytes in size, but with several megabytes the process is extremely slow.
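To make the growth argument concrete, here is a small model of the arithmetic only; it is not OpenJDK code and does not call any parser. The class BufferGrowth and its methods are hypothetical names. It counts how many bytes Arrays.copyOf would copy while a buffer grows to a given size, assuming the 2KB initial size and 1KB step described above, and compares that with doubling the capacity (the policy ByteArrayOutputStream uses).

```java
// Hypothetical model of the buffer growth described above. It does not parse
// anything; it only counts the bytes Arrays.copyOf would copy while growing.
class BufferGrowth {

    // Fixed increment: every reallocation copies the whole (full) buffer.
    static long copiedLinear(long total, long initial, long step) {
        long capacity = initial, copied = 0;
        while (capacity < total) {
            copied += capacity;
            capacity += step;
        }
        return copied;
    }

    // Doubling: same copy per reallocation, but far fewer reallocations.
    static long copiedDoubling(long total, long initial) {
        long capacity = initial, copied = 0;
        while (capacity < total) {
            copied += capacity;
            capacity *= 2;
        }
        return copied;
    }

    public static void main(String[] args) {
        long total = 30L * 1024 * 1024;   // a single 30MB PEM block (illustrative size)
        System.out.printf("+1KB steps: %,d bytes copied%n", copiedLinear(total, 2048, 1024));
        System.out.printf("doubling:   %,d bytes copied%n", copiedDoubling(total, 2048));
    }
}
```

For a 1MB block the fixed-step policy already copies about 512 times more bytes than doubling, and the gap widens linearly with the size of the block.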

The obvious solution for us was to convert our CRL file from PEM to DER. The load then took just a few seconds instead of the several minutes it took before. Nevertheless, I spent some time trying to improve the performance of the Java method. Since Java 8 there is a standard Base64 encoder and decoder (java.util.Base64) and, more importantly, the decoder can wrap an InputStream and decode directly from it. This way the extra buffer can be avoided and the data is always read directly from the file. I tried the idea and it works. I opened a bug in the Java bug database and it is public now; my patch is there if you need it.
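A minimal sketch of the streaming idea, assuming a file of concatenated PEM CRL blocks. It is not the patch attached to the bug report; PemBodyInputStream, StreamingPem and their methods are hypothetical names. Only Base64.getMimeDecoder() (which skips line breaks) and Base64.Decoder.wrap(InputStream) come from the standard Java 8 API. The helper stream hands the decoder just the body of one block, stopping at the first '-' (a character that never occurs in standard base64), so the DER bytes flow straight into CertificateFactory.generateCRL without a growing PEM buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.cert.CertificateFactory;
import java.security.cert.X509CRL;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper: exposes only the base64 body of one PEM block.
class PemBodyInputStream extends InputStream {
    private final InputStream in;
    private boolean done;

    // Consumes the "-----BEGIN ...-----" header line.
    PemBodyInputStream(InputStream in) throws IOException {
        this.in = in;
        String header = readLine(in);
        if (header == null || !header.startsWith("-----BEGIN ")) {
            throw new IOException("missing PEM header");
        }
    }

    @Override
    public int read() throws IOException {
        if (done) return -1;
        int b = in.read();
        if (b == '-') {          // footer reached: swallow the rest of its line
            readLine(in);
            b = -1;
        }
        if (b == -1) done = true;
        return b;
    }

    // Drains body and footer so `in` is left at the start of the next block.
    void finish() throws IOException {
        while (read() != -1) { }
    }

    // Reads up to '\n' (or EOF); returns null if already at EOF.
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') sb.append((char) b);
        return (b == -1 && sb.length() == 0) ? null : sb.toString().trim();
    }
}

class StreamingPem {
    // Parses the next PEM CRL, decoding base64 on the fly.
    static X509CRL parseOne(InputStream in, CertificateFactory cf)
            throws IOException, GeneralSecurityException {
        PemBodyInputStream body = new PemBodyInputStream(in);
        X509CRL crl = (X509CRL) cf.generateCRL(Base64.getMimeDecoder().wrap(body));
        body.finish();
        return crl;
    }

    // Same streaming decode, collected into a byte[] only so it can be checked.
    static byte[] nextDer(InputStream in) throws IOException {
        PemBodyInputStream body = new PemBodyInputStream(in);
        InputStream der = Base64.getMimeDecoder().wrap(body);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        for (int n; (n = der.read(buf)) != -1; ) out.write(buf, 0, n);
        body.finish();
        return out.toByteArray();
    }

    // Builds a PEM block with 64-character lines (test data only).
    static String toPem(byte[] der) {
        String b64 = Base64.getMimeEncoder(64, new byte[] {'\n'}).encodeToString(der);
        return "-----BEGIN X509 CRL-----\n" + b64 + "\n-----END X509 CRL-----\n";
    }

    public static void main(String[] args) throws IOException {
        byte[] a = new byte[100], b = new byte[99];   // "==" padding vs. no padding
        for (int i = 0; i < a.length; i++) a[i] = (byte) i;
        for (int i = 0; i < b.length; i++) b[i] = (byte) (255 - i);
        InputStream in = new ByteArrayInputStream(
                (toPem(a) + toPem(b)).getBytes(StandardCharsets.US_ASCII));
        System.out.println(Arrays.equals(a, nextDer(in)) && Arrays.equals(b, nextDer(in)));
    }
}
```

Because PemBodyInputStream reads one byte at a time, the file stream underneath should be wrapped in a BufferedInputStream. The finish() call matters when the decoder stops early at '=' padding: it still consumes the footer, so the next call starts at the next BEGIN line.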

With the modification, an enormous 30MB CRL file (around one million revocations) gives the following load times:

DER: 2 seconds
PEM (patched): 14 seconds
PEM (before the patch): 379 seconds

So DER is always faster (there is no Base64 decoding), but at least the PEM processing time is now on the same scale. If you need to handle CRL files in Java, please avoid the PEM format and always use the native DER format. I do not know whether my patch will eventually be included, but it does not really matter: it only improves the situation, and DER will always be faster and have a smaller footprint.
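A one-off conversion of a PEM bundle to DER can be sketched with the standard API alone. PemToDer and its methods are hypothetical names. The sketch decodes every BEGIN/END X509 CRL block and writes the raw DER bytes one after another. It assumes that CertificateFactory.generateCRLs reads several concatenated DER CRLs from a single stream, which is how the OpenJDK provider behaves as far as I can tell from its code; verify it on your JDK.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.cert.CRL;
import java.security.cert.CertificateFactory;
import java.util.Base64;
import java.util.Collection;

// One-off conversion of a PEM CRL bundle into concatenated DER CRLs.
class PemToDer {
    private static final String BEGIN = "-----BEGIN X509 CRL-----";
    private static final String END = "-----END X509 CRL-----";

    // Writes every PEM CRL block from `pem` to `der` as raw DER bytes.
    // Text outside BEGIN/END markers is ignored.
    static void convert(BufferedReader pem, OutputStream der) throws IOException {
        StringBuilder body = null;            // non-null while inside a block
        for (String line; (line = pem.readLine()) != null; ) {
            line = line.trim();
            if (line.equals(BEGIN)) {
                body = new StringBuilder();
            } else if (line.equals(END) && body != null) {
                der.write(Base64.getDecoder().decode(body.toString()));
                body = null;
            } else if (body != null) {
                body.append(line);            // base64 line, whitespace trimmed
            }
        }
    }

    // Loads all CRLs from a DER file through the standard API.
    static Collection<? extends CRL> load(Path derFile)
            throws IOException, GeneralSecurityException {
        CertificateFactory cf = CertificateFactory.getInstance("X.509");
        try (InputStream in = new BufferedInputStream(Files.newInputStream(derFile))) {
            return cf.generateCRLs(in);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java PemToDer crls.pem crls.der");
            return;
        }
        Path pemFile = Paths.get(args[0]), derFile = Paths.get(args[1]);
        // ISO-8859-1 never fails on stray non-ASCII bytes outside the blocks.
        try (BufferedReader pem = Files.newBufferedReader(pemFile, StandardCharsets.ISO_8859_1);
             OutputStream der = Files.newOutputStream(derFile)) {
            convert(pem, der);
        }
        System.out.println(load(derFile).size() + " CRLs loaded from " + derFile);
    }
}
```

Only one block's base64 text is held in memory at a time, and StringBuilder grows proportionally, so even a very large bundle converts in roughly linear time.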