https://everywhere! Encrypting the Internet

Imagine a world where the Internet is entirely secure and attackers have no place to hide. A major step toward realizing this vision of world-wide security is making sure that all the traffic exchanged between servers and clients is encrypted. This is a very difficult technical challenge because network speeds are very high (10-100 Gbps), whereas cryptographic algorithms can consume millions of processor cycles to execute. Intel® is researching solutions that can accelerate secure Internet transactions by orders of magnitude. First, the latest Core™ micro-architecture (Nehalem) re-introduces Simultaneous Multi-threading Technology (SMT) into the CPU. SMT is ideal for hiding the cycles of compute-intensive public key encryption software under the stall times of network application memory lookups. Following Nehalem, Westmere adds new instructions that can potentially speed up symmetric encryption by a factor of 3-4X. These instructions not only provide better performance but also protect applications against an important class of threats known as side channel attacks. Last, Intel® has developed superior integer arithmetic software that can speed up key exchange and establishment procedures by a factor of 2X.

Simultaneous Multi-threading Technology

The most recent Core™ micro-architecture (Nehalem) developed by Intel® re-introduces hyper-threading (also referred to as Simultaneous Multi-threading Technology, SMT) into the CPU. This represents a major departure from the earlier Core™ micro-architecture, where each core was single-threaded. As part of our research we have demonstrated that simultaneous multi-threading can result in substantial performance improvement for a certain class of workloads, namely those associated with secure web transactions. We propose a new programming model where one compute-intensive thread performs only RSA public key encryption operations and another thread performs memory access-intensive tasks. We have shown that RSA is an ideal companion thread for four representative memory access-intensive workloads when SMT is used, resulting in a potential performance gain ranging from 14% to 2X.

The system benefits most when a thread performing dependent memory lookups is paired with an RSA thread. The throughput of the memory thread almost doubles, reaching the value it has when it is not paired with RSA. Another way to interpret the same result is that the RSA computation comes for free due to SMT. In reality, the RSA computation is hidden under the very long stall times of the memory thread. We also observe that the throughput of a single memory thread increases by approximately 30% when SMT is switched on and the memory thread is multiplexed with another memory thread. The same throughput almost doubles when the memory thread is paired with an RSA thread. These results indicate that RSA is a much better companion thread than a second memory thread, because one workload is memory access-intensive and the other is compute-intensive. The RSA performance also increases by 21%.
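The pairing described above can be sketched in a few lines of Python. This is only an illustration of the shape of the two workloads, not the code used in our measurements: the function names (build_chain, memory_thread, rsa_thread, run_pair) and the toy operand sizes are assumptions made for the example, and because of CPython's global interpreter lock this sketch will not reproduce the SMT gains reported here. What it does show is why the memory thread stalls: each lookup index is only known once the previous lookup has completed.

```python
import random
import threading

def build_chain(n, seed=0):
    """Build a random single-cycle permutation: chain[i] is the next index to visit."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    chain = [0] * n
    for i in range(n):
        chain[order[i]] = order[(i + 1) % n]
    return chain

def memory_thread(chain, steps, out):
    """Dependent lookups: the next index is the result of the previous load."""
    idx = 0
    for _ in range(steps):
        idx = chain[idx]
    out["memory"] = idx

def rsa_thread(base, exponent, modulus, rounds, out):
    """Compute-bound companion: repeated modular exponentiation (toy operands)."""
    acc = base
    for _ in range(rounds):
        acc = pow(acc, exponent, modulus)
    out["rsa"] = acc

def run_pair(chain, steps, base, exponent, modulus, rounds):
    """Run one memory-bound thread and one compute-bound thread side by side."""
    out = {}
    mem = threading.Thread(target=memory_thread, args=(chain, steps, out))
    rsa = threading.Thread(target=rsa_thread,
                           args=(base, exponent, modulus, rounds, out))
    mem.start()
    rsa.start()
    mem.join()
    rsa.join()
    return out
```

On real hardware the two threads would be native code pinned to the two hardware threads of one core, with a lookup table much larger than the cache, so that the arithmetic of the RSA thread fills the cycles in which the memory thread is waiting on a load.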

New Processor Instructions

In addition to SMT, the next generation of the Intel® Core™ micro-architecture following Nehalem (Westmere) will introduce a new set of instructions that enable high performance and secure symmetric encryption and decryption. These instructions are AESENC (AES round encryption), AESENCLAST (AES last round encryption), AESDEC (AES round decryption) and AESDECLAST (AES last round decryption). They accelerate the Advanced Encryption Standard (AES) by a factor of 3-4X. AES is the United States Government standard for symmetric encryption, defined by FIPS Publication 197 (2001). It is used in a large variety of applications where high throughput and security are required. Two additional instructions are also introduced for implementing the key schedule transformation: AESIMC and AESKEYGENASSIST. Together with the AES instructions, Intel will also introduce a new instruction supporting carry-less multiplication, named PCLMULQDQ. The PCLMULQDQ instruction performs carry-less multiplication of two 64-bit quadwords and can support high performance and secure implementation of encryption modes of operation suitable for high speed networking (e.g., AES-GCM).
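To make "carry-less multiplication" concrete, here is a minimal Python sketch of the operation on two 64-bit values. It is a bit-by-bit software model written for illustration, not a description of how the hardware implements PCLMULQDQ; the function name clmul64 is an assumption of this example. The operation is ordinary shift-and-add multiplication in which every addition is replaced by XOR, so no carries propagate between bit positions.

```python
MASK64 = (1 << 64) - 1

def clmul64(a, b):
    """Carry-less product of two 64-bit values; the result fits in 127 bits."""
    a &= MASK64
    b &= MASK64
    result = 0
    while b:
        if b & 1:
            result ^= a   # XOR instead of add: no carry into the next bit
        a <<= 1
        b >>= 1
    return result
```

For example, clmul64(3, 3) is 5 rather than 9, because the two middle partial-product bits cancel under XOR instead of producing a carry. Modes such as AES-GCM use this kind of product, followed by a polynomial reduction, for their authentication step.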

Superior Key Establishment Software

Last, Intel has developed superior integer arithmetic software that can accelerate big number multiplication and modular reduction by at least 2X. Such routines are used not only in RSA public key encryption but also in Diffie-Hellman key exchange and Elliptic Curve Cryptography. Using our software we are able to accelerate RSA 1024 from approximately 1500 signatures per second (OpenSSLg) to potentially 2900 signatures per second on a single Nehalem core. Similarly, we are able to accelerate other popular cryptographic schemes such as RSA 2048 and Elliptic Curve Diffie-Hellman based on the NIST B-233 curve. In summary, Intel® is researching new technologies that can accelerate cryptographic algorithms by orders of magnitude. Our ultimate goal is to make general purpose processors capable of processing and forwarding encrypted traffic at very high speeds, so that the Internet can gradually be transformed into a completely secure information delivery infrastructure.
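To show where big number multiplication and modular reduction appear in a scheme like RSA, the following Python sketch implements textbook left-to-right square-and-multiply modular exponentiation. This is a generic method chosen for illustration and is not the Intel software described above; the function name mod_exp and the toy key in the usage note are assumptions of this example.

```python
def mod_exp(base, exponent, modulus):
    """Compute base**exponent % modulus by left-to-right square-and-multiply."""
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1 % modulus
    base %= modulus
    for bit in bin(exponent)[2:]:
        # One big number multiplication and one modular reduction per bit...
        result = (result * result) % modulus
        if bit == "1":
            # ...plus one more of each for every set bit of the exponent.
            result = (result * base) % modulus
    return result
```

With the toy key n = 3233, e = 17, d = 2753, a "signature" on the message 65 is mod_exp(65, 2753, 3233), and mod_exp of that value with exponent 17 recovers 65. With this simple method a 1024-bit exponent needs roughly 1024 squarings plus one multiplication per set bit, each followed by a reduction, which is why faster multiplication and reduction routines translate directly into more signatures per second.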

Authors: Michael E. Kounavis is a research scientist and Satyajit Grover is a network software engineer working on cryptographic algorithm acceleration in Intel’s Network Technology Lab.

4 Responses to https://everywhere! Encrypting the Internet

Perhaps I misunderstand, but do you sincerely mean encrypt everything? That would be incredibly wasteful.
Certain properties inherent to SSL mean it will always be slow and expensive, i.e. the network round-trips required for negotiation and key exchange, in spite of dedicated hardware (fancy hardware can’t change the laws of physics).
Should the BBC web site encrypt the public news? Consider the waste of dedicated hardware, RAM, CPU cycles etc. to implement this when public key signatures could suffice, with practically zero server-side cost. I can’t help but suspect this is part of a rushed marketing campaign that was insufficiently vetted by real world engineers before release.

We believe that many Internet applications including e-mail, web searches and video distribution will benefit from encryption. In the case of video broadcast one can imagine that higher quality streams are encrypted so that they are accessible to customers who pay some premium fee.
We also understand that network delays will always be there. But network links keep getting faster (e.g., 10 and 100 Gbps). Our work aims to reduce the crypto overheads on the server side. We want a server built from general purpose CPUs to be capable of sending and receiving encrypted traffic at line rates.

Seriously, encrypting everything? Security is not just about randomly and forcefully encrypting everything. Black-box traffic will invalidate all the intermediate infrastructure advantages we have right now.
But hey... This will help me jump over the Great Firewall. So why not?