Spyridon Mastorakis
UCLA
Tahrina Ahmed
Intel Corporation
Jayaprakash Pisharath
Intel Corporation
This work is part of the author’s internship at Intel Corporation.

Abstract

Nowadays, enterprises widely deploy Network Functions (NFs) and server applications in the cloud. However, NFs and applications that process sensitive data or require trusted execution cannot be securely deployed in an untrusted cloud. Cloud providers themselves could accidentally leak private information (e.g., due to misconfigurations), or rogue users could exploit vulnerabilities of the providers’ systems to compromise execution integrity, posing a threat to the confidentiality of internal enterprise and customer data.

In this paper, we (i) identify a number of NF and server application use-cases to which trusted execution can be applied, (ii) identify the assets of each use-case and the impact of compromising its private data and execution integrity, and (iii) leverage Intel’s Software Guard Extensions (SGX) architecture to design Trusted Execution Environments (TEEs) for cloud-based NFs and server applications. We combine SGX with the Data Plane Development Kit (DPDK) to prototype and evaluate our TEEs for a number of application scenarios (Layer 2 frame and Layer 3 packet processing for plain and encrypted traffic, traffic load-balancing, and backend server processing). Our results indicate that NFs involving plain traffic can achieve almost native performance (e.g., ∼22 Million Packets Per Second for Layer 3 forwarding of 64-byte frames), while NFs involving encrypted traffic and server processing can still achieve competitive performance (e.g., ∼12 Million Packets Per Second for server processing of 64-byte frames).

Network Functions (NFs) deployed on routers, switches, and middleboxes are a vital part of today’s network infrastructure, improving network performance, reliability, availability, and security. At the same time, large-scale applications require large amounts of storage and processing power. Both resources are vital for modern enterprises, which have to either deploy their own in-house IT infrastructure or use cloud-based network and server services. Previous work [45] shows that the latter approach is less expensive and less complex to manage for enterprises, and more flexible and resilient to failures. As a result, NFs and applications are nowadays widely deployed in the cloud, and cloud routers, switches, middleboxes, and servers are commonly used by enterprises.

NFs and applications that maintain private data, process sensitive user data, or require trusted execution cannot be securely deployed in the untrusted cloud. In a cloud environment, providers may accidentally leak sensitive user information (e.g., because of a server misconfiguration), or malicious users may exploit vulnerabilities of the providers’ systems [3, 1, 4] to compromise execution integrity. Such concerns about data confidentiality and execution integrity discourage enterprises from moving their entire operation to the cloud [42].

Previous work on securing NFs deployed in the cloud has considered applying network processing directly over encrypted data [46, 29]. In this paper, we explore the approach of leveraging Intel’s Software Guard Extensions (SGX) [19, 20], a set of Instruction Set Architecture (ISA) extensions, to create Trusted Execution Environments (TEEs) for NF and server processing. To prototype our TEEs, we combine SGX with the Data Plane Development Kit (DPDK) [21], which allows for rapid prototyping of high-performance data plane applications. Previous work on combining SGX and DPDK to provide a secure middlebox framework for cloud-based NFs [51] does not consider packet switching, server processing, or NFs that process encrypted traffic (e.g., VPN endpoints based on IPsec [26] and MACsec-capable switches [41]).

The limited study of and experimentation with SGX-based TEEs for NFs and server processing motivated our work. We aim to contribute to a better understanding of the performance and trade-offs of applying SGX to cloud-based network and server solutions. Our contribution is twofold: (i) we discuss NF and server processing use-cases to which SGX can be applied, along with the assets of each use-case and the impact of compromising its private data and execution integrity, and (ii) through an experimental study, we present and evaluate proof-of-concept designs for a number of application scenarios (Layer 2 frame and Layer 3 packet processing for plain and encrypted traffic, traffic load-balancing, and backend server processing).

The rest of this paper is organized as follows: in section 2, we give some background on the SGX and DPDK frameworks and discuss related work. In section 3, we discuss NF and server application use-cases to which trusted execution can be applied. In section 4, we present our design approach. Section 5 describes implementation-specific details of our work, and section 6 presents our experimental evaluation study. In section 7, we describe the lessons learned and open issues of the current work that we plan to address in the future, and, finally, section 8 concludes our work.

In this section, we present a brief overview of SGX and DPDK to help the reader gain a better understanding of the rest of the paper. We also discuss related work.

2.1 Background

SGX

The Intel SGX architecture offers a set of x86-64 ISA extensions that enable applications to instantiate a secure software container, called an enclave: an area in the virtual address space of the application that the processor protects from access by any software not residing in it (e.g., other applications, the OS, the BIOS).

The enclave data is stored in a reserved region of memory, called the Enclave Page Cache (EPC). The Memory Encryption Engine (MEE) encrypts the enclave data in EPC to protect against memory attacks (e.g., memory snooping). To access enclave data in EPC, the processor enters a new CPU mode, called enclave mode, which applies additional hardware checks to each memory access. Specifically, enclave data is decrypted only once it enters the CPU package in enclave mode, and it is encrypted again when it leaves the CPU package to be written back to EPC.

Untrusted code can make incoming calls (ECALLs) to trusted enclave functions defined and exposed by developers, while enclave code can make outgoing calls (OCALLs) to untrusted code. When enclave execution is interrupted by asynchronous events, such as interrupts and exceptions, the processor state is securely saved inside the enclave to prevent any leakage of secrets. After the event is serviced, the processor state is restored and enclave execution resumes at the point where it was interrupted.

An enclave can prove that it has been properly instantiated on a platform through CPU-based attestation. There are 2 attestation categories: local and remote. Local attestation enables 2 enclaves instantiated on the same platform to authenticate each other, while remote attestation enables an enclave to prove to a remote attestation provider that it is “trusted”, so that secrets can be provisioned to it [22].

DPDK

DPDK consists of a set of libraries and optimized Network Interface Card (NIC) drivers for highly scalable, fast packet processing, and is designed to run on multiple processor architectures. It avoids the overhead imposed by Linux kernel processing (e.g., system calls, context switching on blocking I/O, copying data from kernel to user space, interrupts) and achieves high performance by: 1) leveraging processor affinity, 2) allocating huge memory pages to avoid swapping and reduce TLB misses, 3) placing device drivers in user space to achieve zero-copy packet processing, 4) accessing all devices by polling, 5) achieving synchronization without locks, and 6) handling packets in large batches and distributing them among processing threads.

Each DPDK process (application) occupies one CPU core in full, but can actually use one or more of its logical cores. To exchange data among logical cores, lock-less First-In-First-Out (FIFO) ring structures are used. Applications can use DPDK libraries for network packet buffer management and packet forwarding, and can implement TCP/IP protocol processing on top of them.
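The lock-less ring idea can be illustrated with a minimal single-producer/single-consumer ring of pointers. This is a hedged sketch using C11 atomics, not DPDK’s `rte_ring`, which additionally supports multiple producers and consumers through compare-and-swap; the names `spsc_ring`, `ring_enqueue`, and `ring_dequeue` and the 8-slot capacity are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8u /* hypothetical capacity; must be a power of two */

/* Hypothetical SPSC ring of pointers: only the producer writes head,
 * only the consumer writes tail, so no lock is needed. */
struct spsc_ring {
    void *slots[RING_SIZE];
    _Atomic size_t head; /* next slot to write (producer-owned) */
    _Atomic size_t tail; /* next slot to read (consumer-owned) */
};

static void ring_init(struct spsc_ring *r)
{
    assert((RING_SIZE & (RING_SIZE - 1)) == 0);
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: returns false when the ring is full. */
static bool ring_enqueue(struct spsc_ring *r, void *obj)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE) /* free-running indices, unsigned wrap */
        return false;
    r->slots[head & (RING_SIZE - 1)] = obj;
    /* Release: the slot write is visible before the new head value. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool ring_dequeue(struct spsc_ring *r, void **obj)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *obj = r->slots[tail & (RING_SIZE - 1)];
    /* Release: the slot is read before the producer may reuse it. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Because each index has a single writer, the release store of one side paired with the acquire load of the other is enough to hand a slot over safely.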

2.2 Related Work

In this section, we discuss software and hardware-based approaches that protect applications against unauthorized access. We also present some related work based on SGX and a few approaches that study the application of NFs directly to encrypted data.

Software-Based Protection

One of the earliest works towards protecting applications and their sensitive data from unauthorized access by privileged software is NGSCB [37]. NGSCB used virtualization to run trusted and untrusted OSs simultaneously on the same machine, enabling critical applications to use the trusted OS. A similar approach was taken by Proxos [50], which requires application developers to specify which system calls are sensitive, so that they are forwarded to a trusted private OS, protecting applications against an untrusted OS.

Approaches such as Overshadow [13], Virtual Ghost [15], and InkTag [18] assumed a trusted virtualization layer to protect sensitive application data and aimed to reduce the size of the Trusted Computing Base (TCB). Specifically, Overshadow offers different views of physical memory depending on the context of each access: an application has a normal view of its resources, while the OS has an encrypted one. While Overshadow focuses on isolating applications from the OS, InkTag allows applications to use the services of an untrusted OS and to define their own access control policies on secure files. Virtual Ghost uses compiler support to protect applications from an untrusted OS and creates secure memory that the OS can neither read nor write. MiniBox [32] is a two-way sandbox that protects critical applications from a malicious OS, as well as the OS from malicious applications.

All these approaches are purely software-based and do not require any special hardware support. Therefore, they can be used in cases where hardware-based solutions, such as SGX, cannot be deployed.

Hardware-Based Protection

Several systems have used trusted hardware to protect the applications running on them, along with their sensitive data, from unauthorized access.

Trusted Platform Modules (TPMs) [17] provide a dedicated micro-controller for the secure generation of cryptographic keys and restrict access to those keys. TPMs also support remote attestation and data sealing functions. However, there are privacy concerns associated with the Direct Anonymous Attestation (DAA) scheme used by TPMs when a small number of keys is used for the entire platform lifetime [31]. To address these concerns, SGX extends DAA by using an Enhanced Privacy ID (EPID) key during remote attestation [22].

Secure co-processors [48, 16] offer hardware that can be trusted even under physical attacks, so that trusted computations can be performed on untrusted remote devices. However, they are expensive, and their performance is limited due to thermal throttling issues.

ARM TrustZone [7] is a security technology for System-on-a-Chip (SoC) and CPU systems based on the concept of physically separated trusted and untrusted worlds. It has been used to build an embedded virtualization system on commodity hardware [38] and a multi-layer security architecture for mobile devices [30]. TrustZone is mainly used by embedded systems and does not offer memory encryption; therefore, attacks are possible given physical access to DRAM.

AMD’s memory encryption technology [23] is integrated into the x86 CPU architecture and offers a security subsystem for key generation, platform boot, off-chip storage of sensitive data, protection against physical memory attacks, and support for encrypted virtual machines. However, it specializes in memory encryption and does not provide a framework to run applications in trusted mode.

SGX-Based Approaches

SGX offers CPU features that enable applications to instantiate secure and trusted enclaves. Research areas, to which SGX has been applied, include networked and distributed systems, cloud systems and applications.

In Network Function Virtualization (NFV) environments, designs of an enclavized NAT, a policy control and intrusion detection application, and an HTTP and web caching proxy [47], as well as an extension that enclavizes the Click modular router [14], have been presented. These approaches do not study the performance and overhead of applying SGX to frame processing and to trusted encryption/decryption of IPsec and MACsec traffic, and the experimental results they provide are limited.

Prior work has also explored how SGX could strengthen the security and privacy of peer-to-peer anonymity networks, such as Tor [28, 27], network protocols, such as TLS [8], and distributed services, such as Apache ZooKeeper [11]. These protocols and services operate at higher layers of the TCP/IP protocol stack, and the performance of pure network forwarding and switching is not evaluated. Slick [51] proposes a trusted middlebox framework to deploy network functions on untrusted servers. This work tackles a problem related to ours, and some of its design decisions and optimizations could be used to further enhance our approach and vice versa.

Haven [9] protects the confidentiality and integrity of applications and their associated data from the untrusted cloud on which they run, while VC3 [43] allows users to keep their data and secrets safe during the execution of distributed MapReduce computations in the cloud. The idea of an inverted cloud infrastructure has been discussed [49], where mini providers use SGX to secure confidential information, so that they can join forces to provide cloud services instead of services being provided by a single major provider. SGX has also been used to secure content-based routing mechanisms [39] and Database Management Systems (DBMS) [6] operating in the cloud. These works focus on trusted cloud applications and services running on top of today’s networks, rather than on the performance of the underlying network infrastructure itself.

Network Functions Over Encrypted Data

With APLOMB [45], the idea of outsourcing NF processing to the cloud emerged, although its security implications were not taken into account. BlindBox [46], extending the approach of APLOMB, performs deep packet inspection directly over encrypted network traffic. Later on, Embark [29] added support for a wider set of NFs over encrypted network data. These works focus exclusively on protecting network traffic through encryption, rather than on securing the execution of the NF software itself. Moreover, the studied middleboxes operate at the network layer and above, without considering link layer devices or server-side applications.

Each application and NF deployment processes private data and executes software vital for its secure and legitimate operation. In Table 1, we present a few example use-cases along with the assets of each case that can be protected through SGX, to show that trusted execution applies to a wide set of systems and applications with different security concerns. We categorize the assets into data and data structures, and software. Since SGX provides each enclave with encrypted EPC memory that no untrusted entity can access, the data and data structures crucial for the operation of each use-case can be stored in EPC. Crucial pieces of software can be executed in enclave mode, fully protected and isolated from the untrusted part of the system.

To avoid exposing topology information and routing and management policies to attackers, a router should: 1) protect its internal data structures (e.g., routing table, Domain Name System (DNS) cache, access control lists) by storing them in EPC, and 2) protect the software that performs operations on those structures (e.g., longest or exact IP prefix match, DNS cache lookup) by executing it in enclave mode. To avoid exposing forwarding policy information, legacy switches can store their forwarding table in EPC and execute all the related operations in enclave mode, while Software Defined Networking (SDN) switches can protect their flow table and the related operations from attackers in the same way.

The same applies to servers; for instance, DNS servers can leak information about the mapping of domain names to IP addresses, therefore, their DNS cache should be stored in protected enclave memory, while software that performs operations (e.g., lookups, insertions, deletions, updates) over this cache should be executed in enclave mode. Similarly, in a cloud multi-tenant environment, a server hosting multiple Virtual Machines (VMs) can secure each tenant’s application instance and data in an enclave.

Middleboxes, such as load balancers and firewalls, can reveal to attackers their load balancing policies and information about blocked and accepted traffic, respectively. To prevent this, they can store their policies in EPC and enforce them on incoming traffic in enclave mode.

Since the source and destination IP addresses of a packet are typically in plaintext, on-path eavesdroppers can easily identify its source and destination. In a LAN, a Network Interface Card (NIC) on the same shared medium can listen to frames transmitted by other NICs regardless of their destination MAC address, identifying the source and destination MAC addresses of those frames. To provide security directly at the network and link layers of the TCP/IP stack, security extensions to the legacy IP and MAC protocols have been specified as the IPsec and MACsec protocols, respectively. VPN endpoints and MACsec-capable switches use these protocols to encrypt network packets and Ethernet frames, respectively. Since traffic encryption and authentication keys could be leaked, they should be stored in EPC, and encryption and authentication operations should be executed in enclave mode.

We should note that the size of EPC is currently limited to 128MB across all the enclaves, therefore, the data to be stored in it should be carefully selected. Exceeding the EPC size results in EPC paging (a mechanism for secure paging to the unprotected memory supported by SGX), which imposes additional execution performance overhead. We discuss how the performance impact from EPC paging can be alleviated in section 7.2.

Our design consists of 2 major building blocks: DPDK and SGX. We first present a baseline approach utilizing a DPDK application and a single SGX packet processing enclave as well as the work-flow facilitated by this approach. We also present an approach to scale trusted processing and an approach to implement a trusted processing pipeline by leveraging multiple local enclaves and the SGX local attestation feature.

4.1 Baseline Approach

Following the SGX application design principles [19], we divide our DPDK application into 2 parts: a trusted part, implemented within a processing enclave to protect vital data structures and code against unauthorized access, and an untrusted part, which consists of application code not protected by SGX. Our baseline design approach is illustrated in Figure 1. The enclave is assigned to a dedicated logical core, while one or more logical cores are assigned to the untrusted part of the application. The integrity of the code inside an enclave can be verified through the SGX remote attestation feature mentioned in section 2.1.1.

Initially, packets are received by the untrusted part of the application through one or more receive (Rx) queues and are placed in DPDK buffers. The memory address (pointer) of each buffer is enqueued to a receive (Rx) DPDK ring. The enclave repeatedly dequeues pointers from the Rx ring, so that the buffers are processed by a secure module within the enclave. Once processing is done, the buffer pointers are enqueued to a transmission (Tx) DPDK ring. The untrusted part of the application dequeues them from the Tx ring and transmits the buffers through one or more transmission (Tx) queues. We use the user-space DPDK I/O mechanisms to read buffers from the NICs and to enqueue/dequeue them to/from the Rx and Tx rings, since these libraries have been specifically optimized for high traffic rates.

This design can be used for programmable routing and switching applications. Our experimental results (section 6.2.1) indicate that trusted applications based on this design can achieve a forwarding rate of ∼22 million packets/frames per second for 64-byte frames.

Figure 1: Secure Packet Processing Design
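The baseline workflow can be sketched as a single-threaded simulation in which only buffer pointers move between the rings while the packets themselves are modified in place. The `struct pkt` layout, the fixed-capacity `ptr_fifo` standing in for the DPDK Rx/Tx rings, the burst size of 32, and the TTL decrement used as placeholder “secure processing” are all hypothetical; in the actual design, the enclave polls DPDK rings from its dedicated logical core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet buffer; real DPDK buffers are rte_mbuf structures. */
struct pkt { uint8_t ttl; int processed; };

#define Q_CAP 64 /* FIFO capacity (power of two, so indices wrap cleanly) */
#define BURST 32 /* pointers moved per dequeue, like a DPDK burst */

/* Hypothetical stand-in for a DPDK ring: a FIFO of buffer pointers. */
struct ptr_fifo { struct pkt *items[Q_CAP]; size_t head, tail; };

static size_t fifo_push_burst(struct ptr_fifo *q, struct pkt **in, size_t n)
{
    size_t i = 0;
    while (i < n && q->head - q->tail < Q_CAP)
        q->items[q->head++ % Q_CAP] = in[i++];
    return i;
}

static size_t fifo_pop_burst(struct ptr_fifo *q, struct pkt **out, size_t n)
{
    size_t i = 0;
    while (i < n && q->tail != q->head)
        out[i++] = q->items[q->tail++ % Q_CAP];
    return i;
}

/* One iteration of the trusted polling loop: dequeue a burst of pointers,
 * process the packets in place (zero-copy), and enqueue the same pointers
 * to the Tx FIFO. Returns the number of packets handed to Tx; pointers
 * rejected by a full Tx FIFO would have to be dropped or retried. */
static size_t enclave_poll_once(struct ptr_fifo *rx, struct ptr_fifo *tx)
{
    assert(rx != NULL && tx != NULL);
    struct pkt *burst[BURST];
    size_t n = fifo_pop_burst(rx, burst, BURST);
    for (size_t i = 0; i < n; i++) {
        if (burst[i]->ttl > 0)
            burst[i]->ttl--; /* placeholder for secure processing */
        burst[i]->processed = 1;
    }
    return fifo_push_burst(tx, burst, n);
}
```

Moving pointers in bursts, rather than copying packets, keeps the per-packet cost of crossing between the untrusted and trusted parts low.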

4.2 Parallel Packet Processing Approach

Depending on the requirements of each application, an enclave might have to perform costly operations, which can result in considerable performance degradation. To enable applications to scale their performance in such cases, multiple processing enclaves that operate in parallel can be instantiated (transition from sequential to parallel processing). To this end, in Figure 2, we present our design approach to scale packet processing by utilizing multiple SGX enclaves, which implement the same processing logic within the same DPDK application. The integrity of the code executed by each enclave can be authenticated through remote attestation. Similar to section 4.1, a separate logical core is assigned to each enclave, while the untrusted part of the application may be assigned one or more logical cores. Each enclave dequeues buffer pointers from the Rx ring and processes the associated buffers. Finally, it enqueues the buffer pointers to the Tx ring for transmission.

This design can be used for multi-threaded applications and systems (e.g., load balancers, Spark, Hadoop, and other data processing systems) and, more generally, in scenarios where a single processing enclave results in low performance (e.g., because of EPC paging). Examples of such scenarios are the processing of encrypted L2 and L3 traffic (similar to MACsec and IPsec) presented in sections 6.2.2 and 6.2.3, where this design achieves a 3-4x performance gain compared to the baseline sequential design.

Figure 2: Secure Parallel Packet Processing Design
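The scaling idea of Figure 2, several identical workers draining one shared Rx queue, can be sketched with POSIX threads. In this hypothetical model, each thread stands in for a processing enclave on its own logical core, and an atomic fetch-and-add on a shared index replaces DPDK’s multi-consumer ring dequeue; the packet count, the number of workers, and the per-packet counter used as placeholder work are assumptions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define N_PKTS 10000 /* hypothetical number of queued packets */
#define N_WORKERS 4  /* hypothetical number of processing enclaves */

/* Shared Rx queue: pkts[i] counts how often packet i was processed. */
struct shared_rx {
    int pkts[N_PKTS];
    _Atomic size_t next; /* index of the next unclaimed packet */
};

static void *worker(void *arg)
{
    struct shared_rx *rx = arg;
    for (;;) {
        /* Claim one packet; no two workers can obtain the same index. */
        size_t i = atomic_fetch_add(&rx->next, 1);
        if (i >= N_PKTS)
            break;
        rx->pkts[i]++; /* placeholder for costly work, e.g. decryption */
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns 0 if all were created. */
static int run_parallel(struct shared_rx *rx)
{
    assert(rx != NULL);
    pthread_t tids[N_WORKERS];
    int created = 0;
    atomic_init(&rx->next, 0);
    while (created < N_WORKERS &&
           pthread_create(&tids[created], NULL, worker, rx) == 0)
        created++;
    for (int t = 0; t < created; t++)
        pthread_join(tids[t], NULL);
    return created == N_WORKERS ? 0 : -1;
}
```

Since workers claim packets independently, packets may finish out of order in this sketch; whether that is acceptable depends on the NF.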

4.3 Packet Processing Pipeline Approach

In Figure 3, we present the design of a secure packet processing pipeline consisting of a number of stages. Each stage, implemented as a separate SGX enclave, applies a specific action to packets and forwards them to the subsequent processing stage. The code integrity of the first-stage enclave is verified through remote attestation, while each subsequent stage enclave and its predecessor authenticate each other and establish a protected communication channel through local attestation. Each stage enclave is assigned to a separate logical core.

The first-stage enclave dequeues buffer pointers from the Rx ring and initiates secure processing. The output of the first stage is the input of the second stage, and so forth. The final stage of the pipeline enqueues the processed buffer pointers to the Tx ring for transmission.

This design can be used in networking and systems environments that require an action sequence to be applied to received packets, such as Software Defined Networking (SDN) switches [34], Deep Packet Inspection (DPI) systems [5], and cloud applications following the microservice architecture [35]. We expect that it would allow for more flexible and diverse processing than the previous approaches, but the performance evaluation of this design is left to our future work.

Figure 3: Secure Packet Processing Pipeline Design
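The stage chaining of Figure 3 can be sketched as an array of function pointers, where each function stands in for one stage enclave and the output of stage i is the input of stage i + 1. Local attestation and the protected channels between stages are not modeled; the `struct pkt` fields and the three example stages (sanity check, TTL decrement, and a toy routing policy) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet metadata seen by the pipeline stages. */
struct pkt { uint32_t dst_ip; uint8_t ttl; uint16_t out_port; };

/* A stage returns 1 to pass the packet on, or 0 to drop it. */
typedef int (*stage_fn)(struct pkt *p);

static int stage_sanity(struct pkt *p) { return p->ttl > 1; }
static int stage_ttl(struct pkt *p) { p->ttl--; return 1; }
static int stage_route(struct pkt *p)
{
    /* Toy policy: 10.0.0.0/8 goes to port 1, everything else to port 0. */
    p->out_port = (p->dst_ip >> 24) == 10 ? 1 : 0;
    return 1;
}

/* The output of stage i is the input of stage i + 1; a drop ends the chain.
 * Returns 1 if the packet survives every stage. */
static int run_pipeline(const stage_fn *stages, size_t n, struct pkt *p)
{
    assert(stages != NULL && p != NULL);
    for (size_t i = 0; i < n; i++)
        if (!stages[i](p))
            return 0;
    return 1;
}
```

In the enclave-based design, each call boundary above would instead be a hand-off of buffer pointers between enclaves on separate logical cores.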

4.4 Threat Model

We focus on cases where systems and applications process confidential information and therefore leverage trusted execution to secure crucial data and perform critical processing operations. Such cases can occur either in an untrusted cloud setup, due to an accidental leak of confidential information or to malicious tenants that try to compromise the execution integrity of others, or in setups where resources are distributed at the edge of the network. Similar to the use-cases mentioned in section 3, attackers may attempt to learn routing and forwarding policies, compromise encryption and authentication keys, monitor encrypted traffic, etc. According to the SGX threat model [33], we assume that an attacker can compromise software components, including privileged code (e.g., OS and BIOS), and launch physical attacks.

Following the SGX threat model [33] and prior related work [27, 28], DoS and DDoS attacks are out of the scope of our work, since compromised software or hardware can deny the service at any point of the execution (e.g., restart or crash the system or flush all the unprotected DPDK memory). The same assumption applies to side channel attacks against SGX (e.g., page fault and cache-based side channel attacks). Software techniques to protect SGX-capable applications against attacks aiming to exploit bugs (e.g., buffer overflows, synchronization bugs, etc.) [52, 44] are also out of the scope of our work.

In this section, we describe implementation-specific details on our effort to combine DPDK and SGX, as well as the developed application scenarios.

5.1 Combining DPDK and SGX

We used DPDK version 17.02 and implemented our applications as enclaves using SGX version 1.8. To make the enclave ECALL/OCALL API accessible to the DPDK context, we first compiled the enclavized application and then compiled the DPDK context, which used the enclave API as a shared library. To use more DPDK code in enclave mode, we had to modify the DPDK codebase to reduce its coupling with the standard C library (e.g., by modifying functions for log collection and printing, and converting inline functions to macros), since only a limited subset of standard C library functionality (e.g., memory allocation and deallocation) is supported in enclave mode. However, in some cases the coupling was so tight (e.g., DPDK libraries for hash and flow table implementation and lookup) that an OCALL had to be executed. We discuss further details of these cases, and our optimizations to alleviate the resulting overhead, in section 5.2.

The API of our application enclaves exposes a single ECALL, which is required to be called only once during the enclave instantiation. No further ECALLs are required during the execution, since the communication between DPDK and the enclave is achieved through DPDK rings, a structure optimized for performance. Most of the developed codebase, including all the performance-sensitive enclave components, is written in C instead of C++ for compatibility with the DPDK codebase and performance purposes.

5.2 Application Scenarios

We implemented a few DPDK application scenarios to experiment with. The enclavized components of each application scenario are summarized in Table 2.

Layer 2 (L2) forwarding: A frame processing application implementing the operation of a network switch. We use a single processing enclave for the trusted application, following the design of section 4.1. The untrusted application part receives frames and enqueues them to the Rx ring. The processing enclave dequeues frames from this ring and performs a number of frame sanity checks, a destination MAC address lookup, and source MAC address rewriting. Once processing is done, the enclave enqueues the frames to the Tx ring.
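The per-frame work of the L2 enclave (a sanity check, the destination MAC lookup, and the source MAC rewrite) can be sketched as follows. The header struct, the linear-scan forwarding table (a real switch would use a hash table such as DPDK’s `rte_hash`), the port-MAC array, and the choice to drop multicast/broadcast and unknown destinations instead of flooding them are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal Ethernet header and forwarding-table entry. */
struct eth_hdr { uint8_t dst[6]; uint8_t src[6]; uint16_t ether_type; };
struct fwd_entry { uint8_t mac[6]; int port; };

/* Returns the output port for the frame, or -1 to drop it. On success,
 * the source MAC is rewritten to the MAC of the output port. */
static int l2_forward(struct eth_hdr *h, const struct fwd_entry *tbl,
                      size_t n, const uint8_t port_macs[][6])
{
    assert(h != NULL && tbl != NULL && port_macs != NULL);
    if (h->dst[0] & 0x01) /* group (multicast/broadcast) bit set: drop */
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (memcmp(h->dst, tbl[i].mac, 6) == 0) {
            memcpy(h->src, port_macs[tbl[i].port], 6);
            return tbl[i].port;
        }
    }
    return -1; /* unknown destination: dropped in this sketch */
}
```

In the trusted variant, both the forwarding table and this function would reside inside the enclave, so the MAC-to-port mapping never appears in untrusted memory.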

Layer 3 (L3) forwarding: A packet processing application implementing the operation of a network router. The enclave operations include a few packet header sanity checks and the longest prefix match lookup of the destination address of a packet.
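The longest prefix match performed by the L3 enclave can be sketched with a linear scan over a small route table. This is a hypothetical illustration only: DPDK’s `rte_lpm` library uses a DIR-24-8 table layout for fast lookups, and the `struct route` layout and next-hop values below are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route entry: prefix/depth -> next hop. */
struct route { uint32_t prefix; uint8_t depth; int next_hop; };

/* Network mask for a prefix length of 0..32 bits. */
static uint32_t prefix_mask(uint8_t depth)
{
    assert(depth <= 32);
    return depth == 0 ? 0 : UINT32_MAX << (32 - depth); /* avoid shift by 32 */
}

/* Returns the next hop of the longest matching prefix, or -1 if none. */
static int lpm_lookup(const struct route *rt, size_t n, uint32_t dst)
{
    int best_hop = -1, best_depth = -1;
    for (size_t i = 0; i < n; i++) {
        uint32_t m = prefix_mask(rt[i].depth);
        if ((dst & m) == (rt[i].prefix & m) && rt[i].depth > best_depth) {
            best_depth = rt[i].depth;
            best_hop = rt[i].next_hop;
        }
    }
    return best_hop;
}
```

For example, with routes 10.0.0.0/8, 10.1.0.0/16, and a default route, the address 10.1.2.3 matches the /16 entry because it is the longest matching prefix.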

Encrypted L2 forwarding: A frame processing application for encrypted Ethernet traffic implementing the operation of a MACsec-capable switch. We use the MACsec frame format [41], where the Integrity Check Value (ICV) field consists of a 128-bit CMAC tag, and multiple processing enclaves following the design of section 4.2. The untrusted application part receives encrypted frames and enqueues them to the Rx ring. A processing enclave dequeues and decrypts them, generates the ICV of the raw frame data, and compares it with the ICV of the received frame to verify its integrity. If the integrity verification succeeds, the raw frames are processed (MAC table lookup and MAC address rewriting) and their new ICV is generated. Finally, the enclave encrypts the processed frames and enqueues them to the Tx ring to be forwarded.
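One detail of the integrity check worth making explicit is how the computed and received ICVs are compared. A minimal sketch, assuming a 16-byte (128-bit) tag, is a comparison without early exit, so that its running time does not depend on the position of the first mismatching byte. The function name `icv_equal` is hypothetical, and the tags themselves would come from AES-CMAC computed inside the enclave with a key held in EPC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ICV_LEN 16 /* assumed 128-bit tag, the output size of AES-CMAC */

/* Compares two tags byte by byte without returning early, accumulating
 * differences with OR so every byte is always examined. Returns 1 if the
 * tags are equal, 0 otherwise. */
static int icv_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
    assert(a != NULL && b != NULL);
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);
    return diff == 0;
}
```

A frame would only be processed and re-tagged if `icv_equal` returns 1 for the ICV computed over its decrypted data and the ICV it carried.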

Encrypted L3 forwarding: A packet processing application for encrypted network-layer traffic implementing the operation of a VPN endpoint (router). We use multiple processing enclaves, following the design of section 4.2, and the packet format of the Encapsulating Security Payload (ESP) protocol of IPsec [26], where the Integrity Check Value (ICV) field consists of a 128-bit CMAC tag. The workflow is similar to the one explained for the encrypted L2 forwarding application.

Load-balancing & backend server processing: An application implementing the operation of a load balancer that distributes traffic to multiple backend server processes (either VMs or containers) running on the same physical machine. The load-balancing process maintains a flow table (with 1 million flow entries) and classifies the received packets into flows based on their destination IP address. The backend server processes filter and forward the distributed packets based on their destination IP address (hash-based forwarding). The load balancer forwards traffic to the backend processes through a number of DPDK rings (one ring per backend process).
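A hedged sketch of how a destination IP address can be mapped to one of the per-backend rings follows. The MurmurHash3 32-bit finalizer used here is a stand-in for the DPDK hash and flow-table libraries used by the prototype, and `pick_backend` is a hypothetical name; the key property is that all packets with the same destination IP reach the same backend.

```c
#include <assert.h>
#include <stdint.h>

/* MurmurHash3 32-bit finalizer: a cheap, well-mixing integer hash. */
static uint32_t fmix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Maps a packet to a backend ring index by destination IP, so that all
 * packets of the same flow are handled by the same backend process. */
static unsigned pick_backend(uint32_t dst_ip, unsigned n_backends)
{
    assert(n_backends > 0);
    return fmix32(dst_ip) % n_backends;
}
```

A stateless hash like this avoids a lookup per packet, whereas the flow table described above can also pin flows to backends explicitly at the cost of the lookup.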

An enclave of a load balancing or server process has to make an explicit OCALL for every batch of packets dequeued from the Rx ring. The vanilla DPDK application uses DPDK libraries for the flow table and the hash table lookups that leverage standard C libraries, which are not trusted and cannot be used in enclave mode. To allow the DPDK libraries to access the enclave buffer containing the keys for the lookups, this buffer has to be copied during the OCALL transition from the enclave’s EPC to untrusted memory. To return the lookup results back to the enclave, a separate buffer has to be copied from the untrusted memory to the enclave’s EPC when the OCALL returns. In addition to the copy operation itself, additional checks have to be performed by SGX to ensure that the full memory range of the buffers passed to the untrusted application part is within the enclave.

To enhance system performance and alleviate overheads, we performed the following optimizations:

Increased the number of packets that a server or load balancing enclave dequeues as a batch from its corresponding ring, to reduce the total number of OCALLs performed.

Used untrusted memory for the buffers of the lookup keys and results, to avoid memory copies from/to the EPC and the additional checks. Essentially, we traded security for performance, as explained in section 7.1.

Following the design presented in section 4.2, enabled each backend process to instantiate two enclaves implementing the same processing logic.
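The effect of the first optimization can be quantified with simple arithmetic: if an enclave makes one OCALL per dequeued batch, processing n packets in batches of b packets costs ⌈n/b⌉ OCALLs, so doubling the batch size roughly halves the number of enclave transitions. The helper below is a hypothetical illustration of that count, not code from the prototype.

```c
#include <assert.h>
#include <stdint.h>

/* Number of OCALLs needed when one OCALL is issued per batch:
 * ceil(n_pkts / batch), computed with integer arithmetic. */
static uint64_t ocalls_needed(uint64_t n_pkts, uint64_t batch)
{
    assert(batch > 0);
    return (n_pkts + batch - 1) / batch;
}
```

For example, 1,000,000 packets require 31,250 OCALLs with batches of 32 packets, but only 15,625 with batches of 64.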

We first discuss our experimental setup, methodology and workload and then present the results of our study to evaluate the performance and the SGX related overhead of our design for the application scenarios mentioned in Section 5.2.

6.1 Experimental Setup

Testbed: For our experimental evaluation, we use a testbed consisting of 2 machines (Figure 4). On the first one, we run the DPDK packet generator (version 3.2), and on the second one, the application under test (trusted and vanilla DPDK applications). Each machine is equipped with an Intel Xeon CPU E3-1240L v5 (2.10GHz, 4 cores), 32GB of RAM, and a dual-port Intel XL710 40GbE NIC. On each machine, the NIC is connected to the processor through a PCIe Gen3 x8 bus. Ports 0 and 1 of the generator are connected to ports 0 and 1, respectively, of the machine running the application under test through 40 GbE Ethernet links.

Methodology: The packet generator creates traffic with frame sizes ranging from 64 to 1500 bytes (or up to the size that saturates the available link capacity) and sends it through port 0 of its NIC to port 0 of the application's NIC. The application processes the received traffic and forwards it through port 1 of its NIC to port 1 of the packet generator's NIC. We quantify performance by measuring the received packet rate in Million Packets Per Second (MPPS) and the throughput in Gbps at port 1 of the packet generator's NIC. We run each experiment 10 times (each run lasts 1 minute) and report the average across runs, since the run-to-run variation is negligible.

To calculate the SGX overhead, we use the following equation, where the “Vanilla App (MPPS)” term refers to the measured MPPS processed by the vanilla DPDK application and the “SGX App (MPPS)” term to the measured MPPS processed by the trusted DPDK application:

$$\text{Overhead}(\%) = \frac{\text{Vanilla App (MPPS)} - \text{SGX App (MPPS)}}{\text{Vanilla App (MPPS)}} \times 100$$
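The overhead equation translates directly into code, which makes it easy to recompute the overheads from the measured packet rates. A minimal sketch; `sgx_overhead_pct` is a hypothetical helper name.

```c
#include <assert.h>

/* SGX overhead (%) of the trusted application relative to the vanilla DPDK
 * application: (vanilla - sgx) / vanilla * 100. */
static double sgx_overhead_pct(double vanilla_mpps, double sgx_mpps) {
    assert(vanilla_mpps > 0.0);
    return (vanilla_mpps - sgx_mpps) / vanilla_mpps * 100.0;
}
```

With the 64-byte L2 forwarding rates from Table 3 (21.80 vs. 21.33 MPPS), this gives ≈2.16%, and with the L3 rates (21.89 vs. 21.50 MPPS) ≈1.78%. A negative value (e.g., L3 forwarding at 256 bytes, 13.83 vs. 13.84 MPPS) simply means that the trusted run measured marginally faster, i.e., the difference is within measurement noise.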

Workload: For this initial evaluation of our design, we use a synthetic traffic workload rather than real-world traffic traces for 2 reasons: 1) to study the impact of increasing frame sizes on SGX performance, and 2) to stress the specific fields and layer of the TCP/IP protocol stack on which processing and forwarding are based in each scenario. For the L3-related scenarios and the load balancing & server processing scenarios, our workload consists of traffic flows with 1 million distinct source and destination IP addresses, with the MAC addresses of the generator's and the application's port 0 as source and destination, respectively. For the L2-related scenarios, our workload consists of 1 million distinct source and destination MAC addresses.
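One simple way to obtain 1 million distinct addresses per flow field is to map the flow index directly into an address range. This is a sketch under assumptions, not the generator configuration we used: the base addresses (10.0.0.0 and 10.128.0.0), the MAC layout, and the names `l3_flow` and `l2_mac` are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define N_FLOWS 1000000u

/* Hypothetical bases: 10.0.0.0/9 for sources and 10.128.0.0/9 for
 * destinations; each range holds 2^23 addresses, enough for N_FLOWS. */
#define SRC_BASE 0x0A000000u
#define DST_BASE 0x0A800000u

typedef struct {
    uint32_t src_ip, dst_ip;  /* host byte order */
} flow_t;

/* L3 workload: flow i gets the i-th address of each range, so all source
 * addresses are distinct, and so are all destination addresses. */
static flow_t l3_flow(uint32_t i) {
    assert(i < N_FLOWS);
    flow_t f = { SRC_BASE + i, DST_BASE + i };
    return f;
}

/* L2 workload: locally administered unicast MACs of the form
 * 02:00:SS:xx:xx:xx, where SS is 0x00 for sources and 0x01 for destinations
 * and the low 24 bits encode the flow index (2^24 > N_FLOWS). */
static void l2_mac(uint32_t i, int is_dst, uint8_t mac[6]) {
    assert(i < N_FLOWS);
    mac[0] = 0x02;  /* locally administered, unicast */
    mac[1] = 0x00;
    mac[2] = is_dst ? 0x01 : 0x00;
    mac[3] = (uint8_t)(i >> 16);
    mac[4] = (uint8_t)(i >> 8);
    mac[5] = (uint8_t)i;
}
```

Because the mapping is a bijection between flow index and address, distinctness holds by construction and no deduplication pass is needed.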

6.2 Experimental Results

6.2.1 L2 & L3 Forwarding

In Figure 5, we present the performance overhead of trusted L2 and L3 forwarding applications compared to the corresponding vanilla DPDK L2 and L3 forwarding applications. For 64-byte frames, the overhead is ∼2.1% (L2) and ∼1.8% (L3), and it decreases to ∼1.8% and ∼1.2%, respectively, for 128-byte frames. When we increase the frame size to 256 and 512 bytes, the transmission and reception delays become the bottleneck of the overall system; therefore, the performance of the trusted applications matches that of the vanilla ones.

In Table 3, we present the performance (in MPPS) and the achieved wire throughput (computed from the frame size plus 8 bytes of preamble and 12 bytes of interframe gap) for both the vanilla and the trusted applications. For small frames, we cannot saturate the available link capacity, because the achievable packet rate is limited by the per-core CPU power, which is not sufficient for our network bandwidth. DPDK performance evaluation studies with a similar setup have reported similar results [24, 25]. As the frame size increases, our applications have to forward fewer frames, and for 512-byte frames we approach saturation of the available bandwidth.

Figure 5: Trusted L2 & L3 Forwarding Overhead

Frame Size (Bytes) | L2 Forwarding, Vanilla | L2 Forwarding, Trusted | L3 Forwarding, Vanilla | L3 Forwarding, Trusted
64  | 21.80/14.65 | 21.33/14.33 | 21.89/14.71 | 21.50/14.45
128 | 21.34/25.30 | 20.95/24.80 | 21.45/25.30 | 21.19/25.09
256 | 13.75/30.37 | 13.75/30.36 | 13.83/30.54 | 13.84/30.56
512 | 8.67/36.91  | 8.68/36.93  | 8.68/36.93  | 8.68/36.93

Table 3: L2 & L3 Forwarding Performance & Wire Throughput (each cell: Performance (MPPS)/Wire Throughput (Gbps))
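The wire throughput column of Table 3 and the theoretical packet rate of a 40GbE link can both be computed from the frame size plus the 20 bytes of per-frame overhead (8-byte preamble and 12-byte interframe gap). A minimal sketch; `wire_gbps` and `line_rate_mpps` are hypothetical helper names.

```c
#include <assert.h>

/* Per-frame wire overhead on Ethernet: 8-byte preamble (7-byte preamble plus
 * 1-byte start frame delimiter) and 12-byte interframe gap. */
#define WIRE_OVERHEAD_BYTES 20.0

/* Wire throughput (Gbps) for a packet rate in MPPS and a frame size in bytes. */
static double wire_gbps(double mpps, double frame_bytes) {
    return mpps * (frame_bytes + WIRE_OVERHEAD_BYTES) * 8.0 / 1000.0;
}

/* Maximum packet rate (MPPS) that a link of link_gbps can carry at a given
 * frame size. */
static double line_rate_mpps(double link_gbps, double frame_bytes) {
    assert(frame_bytes > 0.0);
    return link_gbps * 1000.0 / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8.0);
}
```

For example, 21.80 MPPS of 64-byte frames corresponds to ≈14.65 Gbps on the wire, while a 40GbE link could carry up to ≈59.52 MPPS at that frame size. This gap is consistent with the observation that small-frame performance is CPU-bound rather than link-bound.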

6.2.2 Encrypted L2 Forwarding

In this scenario, we vary the number of enclaves used for processing encrypted L2 traffic. Our goal in this experiment is not to replicate the cipher suite used by MACsec or the specifics of the protocol, but rather to study a trusted application that uses the mechanisms provided by SGX for encryption/decryption.

In Figures 6 and 7, we present the performance (MPPS) and the wire throughput (Gbps), respectively, of trusted L2 forwarding for encrypted traffic. The results show that the system performance is low; however, it scales as we increase the number of enclaves. The potential bottlenecks in this scenario are: 1) the SGX encryption/decryption overhead, and 2) the integrity verification process and the generation of a new ICV. To further investigate these bottlenecks, we repeated the same experiment without integrity verification and ICV generation, isolating the overhead of SGX encryption/decryption. The results are presented in Figures 8 and 9. We observe a speedup of ∼2.2x; however, the results do not approach those for plain traffic. Therefore, we conclude that the performance is dominated by the SGX encryption/decryption overhead.

6.2.3 Encrypted L3 Forwarding

In this scenario, we vary the number of enclaves used for processing encrypted L3 traffic. In Figures 10 and 11, we present the performance (MPPS) and the achieved wire throughput (Gbps), respectively. The performance is again low, but it scales as we increase the number of enclaves. To further investigate the performance bottleneck, we repeated the same experiment without integrity verification and ICV generation. The results were similar to those in section 6.2.2, with the same speedup of ∼2.2x, leading us to conclude that the system bottleneck is again SGX encryption/decryption.

6.2.4 Load Balancing & Backend Server Processing

In Figures 12 and 13, we present the overhead of using trusted and untrusted memory, respectively, for the lookup key and result buffers (i.e., with and without copying the buffer memory during OCALLs). With trusted memory, each packet requires copying 4 bytes from EPC to untrusted memory and 1 byte from untrusted memory to EPC for load balancing, and 4 bytes in each direction for server processing.

We observe that for load balancing only (no server processes running), the SGX overhead is ∼10.1% and ∼6.5% for 64-byte frames (trusted and untrusted memory, respectively), and decreases slightly to ∼9.7% and ∼6.2% for 128-byte frames. As we increase the number of server processes, the overhead increases and the performance degrades, since more enclaves have to run on the same physical machine and the number of OCALLs per dequeued packet batch doubles (one OCALL by the load balancing enclave and one by the corresponding server enclave).

In Tables 4 and 5, we present the performance (MPPS) and the achieved wire throughput (Gbps) for both the vanilla and trusted applications. As we increase the frame size, our applications process fewer frames, while the achieved throughput approaches the maximum link capacity.

In this section, we summarize the lessons we have learned through our work, and identify a number of open issues that we are planning to address in our future work.

7.1 Lessons Learned

Our work highlighted a number of important technical points that we would like to share with the broader community. There is a clear trade-off between the security and trustworthiness of a system and the performance it can achieve, even when execution integrity is directly integrated into its architecture. There are many different threat models and attack scenarios; building a system that provides confidentiality and security against multiple attack scenarios at the same time comes with a performance penalty.

Our experimental evaluation indicated that there is a considerable ECALL/OCALL transition overhead. Transferring code execution into or out of enclave mode does not come for free, since multiple hardware-level checks have to be performed. For example, as described in section 6.2.4, enclavized applications that require access to standard C libraries (other than the memory allocation and deallocation libraries, which are the only ones supported by SGX in enclave mode) have to invoke explicit OCALLs, which impacts their performance. To achieve better amortized transition performance, the number of ECALLs and OCALLs should be minimized, so that the application executes work inside the enclave for as long as possible.
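Why minimizing transitions pays off can be seen from a simple amortization model: a fixed transition cost is paid once per batch and spread across all packets in it. The model and its parameters are illustrative assumptions, not measurements: `cycles_per_packet` and `transition_share` are hypothetical names, and the cycle counts in the usage note are placeholders.

```c
#include <assert.h>

/* Assumed cost model: each enclave transition costs transition_cycles, each
 * packet needs work_cycles of useful processing, and one transition is paid
 * per batch of `batch` packets. */
static double cycles_per_packet(double transition_cycles, double work_cycles,
                                unsigned batch) {
    assert(batch > 0);
    return work_cycles + transition_cycles / (double)batch;
}

/* Fraction of per-packet time spent on transitions under the same model. */
static double transition_share(double transition_cycles, double work_cycles,
                               unsigned batch) {
    assert(batch > 0);
    double t = transition_cycles / (double)batch;
    return t / (work_cycles + t);
}
```

With placeholder values of 10,000 cycles per transition and 100 cycles of work per packet, per-packet transitions spend over 99% of the time crossing the boundary, while batches of 32 bring the cost down to 412.5 cycles per packet and batches of 256 bring the transition share below 30%.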

Multiple checks are also performed when memory is copied to/from the EPC during ECALLs and OCALLs. As shown in section 6.2.4, such checks impose an additional performance penalty of 5-10%. One way to avoid this penalty is for an enclave to use untrusted memory, so that no memory has to be copied from/to the EPC for ECALLs and OCALLs. This, however, raises security concerns: the data in untrusted memory is not encrypted and can be accessed by malicious actors, potentially leaking enclave secrets, and it can also be altered at any point during execution without notice.
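When lookup results are left in untrusted memory, one common defensive pattern (which the design described here does not claim to implement) is to fetch each value exactly once into enclave-private storage and validate it before use. This prevents a concurrent modification of the untrusted buffer from changing the value the enclave acts on. It does not restore confidentiality, and a malicious host can still return a wrong but in-range result. `read_backend_index` is a hypothetical name, and the 1-byte backend index is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one lookup result written by untrusted code into untrusted memory.
 * The value is fetched once into a private local copy and validated there,
 * so later changes to the untrusted buffer cannot affect the decision.
 * Returns the backend index, or -1 if the value is out of range. */
static int read_backend_index(const volatile uint8_t *untrusted_results,
                              size_t i, uint8_t n_backends) {
    assert(n_backends > 0);
    uint8_t v = untrusted_results[i];  /* single fetch into private copy */
    if (v >= n_backends)
        return -1;                     /* reject invalid or tampered values */
    return v;
}
```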

7.2 Open Issues

In our current work, we focused on evaluating the performance and trade-offs of the baseline and packet processing scaling design approaches (sections 4.1 and 4.2). The evaluation of our packet processing pipeline approach (section 4.3), which could be used to design SDN-capable switches, DPI systems, and cloud applications based on a microservice architecture, is left to future work. For further evaluation, we would also like to use workloads consisting of real-world traffic traces, such as the CAIDA anonymized Internet traces [2] for backbone Internet traffic and the IMC 2010 Data Center Measurements [10] for cloud and data center traffic. We also plan to evaluate more trusted server applications, such as Hadoop and Spark, and traffic load balancing algorithms that could boost performance [40, 12].

To eliminate the need to invoke explicit OCALLs to access standard C libraries, we are planning to implement trusted standard C-like libraries, which should boost overall system performance. In cases where the enclave size and the required protected data and software exceed the maximum EPC size, exit-less services [36] can be used to alleviate the enclave exit overhead imposed by EPC paging. This is achieved by a secure virtual memory abstraction that implements application-level paging inside the enclave.
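The basic idea behind exit-less calls, namely having a thread outside the enclave serve requests posted to shared memory instead of leaving the enclave, can be illustrated with POSIX threads and C11 atomics. This is a sketch under assumptions, not the design of [36]: `call_slot_t`, `untrusted_worker`, `exitless_call` and `exitless_demo` are hypothetical, both sides run as ordinary threads, and the "service" only doubles its argument.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

enum { SLOT_EMPTY, SLOT_REQUEST, SLOT_REPLY, SLOT_STOP };

/* One shared request slot (in untrusted memory in a real deployment). */
typedef struct {
    atomic_int state;
    uint32_t arg;
    uint32_t result;
} call_slot_t;

/* Untrusted worker: polls the slot and serves requests, so the caller never
 * has to exit the enclave to get the service performed. */
static void *untrusted_worker(void *p) {
    call_slot_t *s = p;
    for (;;) {
        int st = atomic_load_explicit(&s->state, memory_order_acquire);
        if (st == SLOT_STOP)
            return NULL;
        if (st == SLOT_REQUEST) {
            s->result = s->arg * 2u;  /* stand-in for the real service */
            atomic_store_explicit(&s->state, SLOT_REPLY, memory_order_release);
        }
    }
}

/* Caller side (would run inside the enclave): post a request and spin until
 * the reply is ready, instead of performing an OCALL. */
static uint32_t exitless_call(call_slot_t *s, uint32_t arg) {
    s->arg = arg;
    atomic_store_explicit(&s->state, SLOT_REQUEST, memory_order_release);
    while (atomic_load_explicit(&s->state, memory_order_acquire) != SLOT_REPLY)
        ;  /* busy-wait */
    uint32_t r = s->result;
    atomic_store_explicit(&s->state, SLOT_EMPTY, memory_order_release);
    return r;
}

/* Issues n calls through the slot and returns the sum of the replies. */
static uint32_t exitless_demo(uint32_t n) {
    call_slot_t slot;
    pthread_t t;
    uint32_t sum = 0;
    atomic_init(&slot.state, SLOT_EMPTY);
    slot.arg = slot.result = 0;
    int rc = pthread_create(&t, NULL, untrusted_worker, &slot);
    assert(rc == 0);
    (void)rc;
    for (uint32_t i = 0; i < n; i++)
        sum += exitless_call(&slot, i);
    atomic_store_explicit(&slot.state, SLOT_STOP, memory_order_release);
    pthread_join(t, NULL);
    return sum;
}
```

Compile with `-pthread`. The trade-off is a dedicated polling core on the untrusted side in exchange for avoiding enclave exits on the calling side.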

In this paper, we presented proof-of-concept designs of TEEs for NFs and server applications deployed on the untrusted cloud. Our designs are based on Intel SGX, which we combine with DPDK to prototype high-performance trusted applications. Through this work, we learned several valuable lessons and identified remaining open issues, which we shared with the broader community. Our experimental evaluation showed that NFs involving plain traffic can achieve close to native performance, while NFs involving encrypted traffic and server processing can still achieve competitive performance.