AES New Instructions

AES-NI is the new buzzword for fast cryptography. Or is it? This article shows that it improves throughput on some processors, but that the label alone does not guarantee it.

AES-NI is used in IPFire to encrypt SSL/TLS and VPN traffic. Besides some other uses, these are the major ones, and they also require high throughput if you are connecting to the Internet over a fast connection.

Keep in mind

AES-CBC can only run on one core at a time (AES-GCM can take advantage of multiple cores but is less commonly used).
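
The reason for the single-core limit is the way CBC chains blocks: each block is combined with the previous ciphertext block before it is encrypted. The following is a minimal sketch of that dependency. Note that toy_block_encrypt is a hypothetical stand-in (a truncated keyed hash, not AES and not invertible); it only serves to show the chaining.

```python
import hashlib
from typing import List

BLOCK_SIZE = 16  # AES block size in bytes (128 bits)


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for one AES block encryption (NOT a real cipher)."""
    return hashlib.sha256(key + block).digest()[:BLOCK_SIZE]


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> List[bytes]:
    """Chain blocks the way CBC does: XOR with the previous output, then encrypt."""
    if len(plaintext) % BLOCK_SIZE != 0:
        raise ValueError("this sketch assumes input that is already padded")
    previous = iv
    ciphertext = []
    for offset in range(0, len(plaintext), BLOCK_SIZE):
        block = plaintext[offset:offset + BLOCK_SIZE]
        mixed = bytes(a ^ b for a, b in zip(block, previous))
        # The next iteration cannot start before this result is known,
        # so the loop cannot be split across cores.
        previous = toy_block_encrypt(key, mixed)
        ciphertext.append(previous)
    return ciphertext
```

Changing only the first plaintext block changes every ciphertext block after it, which is why the blocks of one stream have to be processed in order.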

How does AES-NI work?

Each block (128 bits of data) that should be encrypted or decrypted is loaded from memory into a register of the processor. The key is also in memory. An AESENC (for encryption) or AESDEC (for decryption) instruction is executed that performs one round of AES. Then, the key for the next round is derived with the help of some other instructions and the next round is performed. AES-128 requires 10 rounds, AES-256 requires 14.

So it is not one instruction that does it all, but rather a set of instructions that each perform a different step. The idea behind this is that the instruction set remains usable even if the algorithm is slightly modified (e.g. longer keys or more rounds). The instructions can also be used to accelerate tasks other than AES.
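
The round structure described above can be written down as a short sketch. It only lists the instructions that would be issued for one block and does not encrypt anything. Two details are added here that the text does not spell out: the block is first XORed with the initial round key, and the final round uses AESENCLAST instead of AESENC. The operand names block and round_key are illustrative only.

```python
from typing import List


def aes_rounds(key_bits: int) -> int:
    """Number of AES rounds for a key size: key length in 32-bit words plus 6."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192 or 256 bits long")
    return key_bits // 32 + 6


def encrypt_schedule(key_bits: int) -> List[str]:
    """List the instructions issued to encrypt a single 128-bit block."""
    rounds = aes_rounds(key_bits)
    steps = ["PXOR block, round_key[0]"]  # initial key addition
    steps += [f"AESENC block, round_key[{i}]" for i in range(1, rounds)]
    steps.append(f"AESENCLAST block, round_key[{rounds}]")  # final round
    return steps


if __name__ == "__main__":
    for step in encrypt_schedule(128):
        print(step)
```

For AES-128 this yields one PXOR, nine AESENC and one AESENCLAST, which adds up to the 10 rounds mentioned above.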

Performance considerations

The downside of this design is that it requires more operations and puts more load on the memory bus. Among those operations are copies of data from memory into the registers of the processor and back. The operations themselves are also complex and take some time.

Conclusion #1: The performance of AES-NI is highly dependent on clock speed

The faster the processor executes instructions, the faster a single round of AES is. This usually scales quite linearly with the clock speed, as there is little room to optimise further without using too much space on the die. Intel Xeon processors are likely to spend that space, but for smaller processors this would cost too much energy.
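
As a rough illustration of that linear relationship, throughput can be estimated from the clock speed and the number of cycles the processor needs per byte. This is a simplified model, and the cycles-per-byte value used below is a made-up number for demonstration; a real figure has to be measured on the processor in question.

```python
def estimated_throughput_mbytes(clock_hz: float, cycles_per_byte: float) -> float:
    """Estimate throughput in MB/s (10^6 bytes per second) for a single core."""
    if clock_hz <= 0 or cycles_per_byte <= 0:
        raise ValueError("clock speed and cycles per byte must be positive")
    return clock_hz / cycles_per_byte / 1e6


if __name__ == "__main__":
    HYPOTHETICAL_CYCLES_PER_BYTE = 4.0  # assumption, not a measured value
    for clock_ghz in (1.0, 2.0, 3.0):
        rate = estimated_throughput_mbytes(clock_ghz * 1e9, HYPOTHETICAL_CYCLES_PER_BYTE)
        print(f"{clock_ghz:.1f} GHz: {rate:.0f} MB/s")
```

With the same cycles-per-byte figure, doubling the clock speed doubles the estimate.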

Conclusion

AES-NI makes operations faster on the same processor. We have not seen any processor that is slower with AES-NI, or where the improvement is negligible.

There are low-end processors (and many SoCs) that implement AES-NI only in microcode, and where the memory bus is the bottleneck.

So one has to be careful with the AES-NI label on low-end hardware. A similar processor without AES-NI might be as fast as, or even faster than, one that comes with a poor implementation of AES-NI. The advice is therefore not only to check whether AES-NI exists, but also to consult benchmarks to see whether it is actually faster in this class of processors.
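
The first half of that advice, checking whether AES-NI exists, can be sketched as follows. The example assumes a Linux system on an x86 processor, where /proc/cpuinfo lists the processor features on a "flags" line and AES-NI shows up as "aes". The function names are chosen for this example only.

```python
def has_aes_flag(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in the given cpuinfo text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags" and "aes" in value.split():
            return True
    return False


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the kernel's processor description (Linux only)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()


if __name__ == "__main__":
    print("AES-NI advertised:", has_aes_flag(read_cpuinfo()))
```

This only tells whether the processor advertises the instructions. It says nothing about how fast they are, which is what the benchmarks are for.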

Side notes

AES-NI is not VPN throughput

Please note that AES-NI throughput is not equal, or even roughly equal, to VPN throughput. Unfortunately, in most cases the VPN runs on only a single core, which has to compute integrity as well. The lower the performance of this core, the less time is left for AES-NI to sustain good throughput. A processor with a higher clock speed and higher single-core performance is highly advantageous in this situation.
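
A simplified model makes the effect visible. If one core has to encrypt every byte and also compute integrity over it, the time spent per byte adds up, and the combined rate is lower than either rate on its own. The numbers below are hypothetical and the model ignores all other per-packet work, so real VPN throughput will be lower still.

```python
def combined_throughput(cipher_rate: float, integrity_rate: float) -> float:
    """Rate of one core that runs both steps on every byte, one after the other.

    Both rates must use the same unit (for example MB/s); the result uses it too.
    """
    if cipher_rate <= 0 or integrity_rate <= 0:
        raise ValueError("rates must be positive")
    return 1.0 / (1.0 / cipher_rate + 1.0 / integrity_rate)


if __name__ == "__main__":
    aes_rate = 1000.0   # hypothetical AES-NI rate in MB/s
    hmac_rate = 500.0   # hypothetical integrity rate in MB/s
    print(f"{combined_throughput(aes_rate, hmac_rate):.1f} MB/s")
```

In this example the fast cipher rate contributes little: the core is limited mostly by the slower integrity step.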