TCP offload is a technique to improve the TCP/IP networking performance of a networked computer system by moving some or all TCP processing from the host processor to the network interface. There are several ways to achieve offload. Typical full offload moves all TCP functionality to the network interface, so TCP processing is performed exclusively there. However, when the network interface has limited processing power, full offload creates a bottleneck at the network interface and degrades system performance. In contrast, TCP offload based on connection handoff allows the operating system to move a subset of connections to the network interface. This way, both the host processor and the network interface perform TCP processing, and the operating system controls how much work each of them performs. Thus, by using connection handoff, the system can fully utilize the processing power of the network interface without creating a bottleneck in the system. The goal of this project is to create a more effective framework for using the network interface (or another coprocessor) to accelerate TCP processing. By using connection handoff, the operating system can maintain control of the network subsystem while still utilizing the processing and storage capabilities of the network interface to improve networking performance. Similarly, network interface data caching can improve overall performance by storing frequently transmitted data directly on the network interface.
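
The idea of handing off only as many connections as the network interface can absorb can be illustrated with a small sketch. This is not the project's actual policy: the `Connection` and `HandoffPolicy` names, the per-connection load estimate, and the fixed capacity figure are all hypothetical, chosen only to show the operating system keeping a connection on the host once the network interface would otherwise become the bottleneck.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    conn_id: int
    packets_per_sec: float  # hypothetical estimate of this connection's load


@dataclass
class HandoffPolicy:
    """Hypothetical OS-side policy: hand off while the NIC has spare capacity."""
    nic_capacity: float              # assumed max packet rate the NIC can process
    nic_load: float = 0.0            # load already handed off to the NIC
    on_nic: set = field(default_factory=set)

    def place(self, conn: Connection) -> str:
        # Hand the connection to the NIC only if it fits within the NIC's
        # remaining capacity; otherwise the host keeps processing it.
        if self.nic_load + conn.packets_per_sec <= self.nic_capacity:
            self.nic_load += conn.packets_per_sec
            self.on_nic.add(conn.conn_id)
            return "nic"
        return "host"


policy = HandoffPolicy(nic_capacity=100.0)
placements = [policy.place(Connection(i, rate))
              for i, rate in enumerate([60.0, 60.0, 40.0], start=1)]
print(placements)
```

With these made-up rates, the second connection stays on the host because it would push the network interface past its assumed capacity, while the third still fits. A real policy would also need to account for memory on the network interface and for handing connections back to the host.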

Reconfigurable and Programmable Gigabit Ethernet NICs

Networking has become an integral part of modern computer systems. While the network interface has traditionally been a simple device that forwards raw data between the network and the operating system, its role is changing. The wide variety of services that are migrating to the network interface clearly motivates the need for directed research into the most effective services and abstractions that can be implemented by the network interface. However, existing programmable network interfaces do not provide enough computational power or memory capacity to implement these services efficiently. We are developing a reconfigurable and programmable Gigabit Ethernet NIC using an FPGA to surpass the performance limitations of these existing NICs. This will enable exploration of processor architectures for network interfaces, as well as the implementation of these new services on an actual network interface. This new network interface will be made freely available for use in research and education.

Network Subsystem Design for Scalable Internet Servers

As technology trends push future microprocessors toward chip multiprocessor designs, operating system network stacks must be parallelized in order to keep pace with improvements in network bandwidth. The most efficient network stacks in modern operating systems are single-threaded, which restricts the network stack to running on a single processor core at a time. However, for network-intensive applications, parallelism within the operating system is all but required in order to exploit the parallel nature of modern and future hardware and saturate ever-increasing network bandwidths. The goal of this project is to explore the range of network stack parallelization strategies on modern parallel architectures and to improve upon the best organizations by redefining the hardware/software interface between the network interface and the operating system appropriately.
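
The summary above does not say which parallelization strategies are studied. As one illustrative possibility only, the sketch below shows connection-level parallelism, where every packet of a given connection is steered to the same core so that per-connection state is never shared between cores. The function name `core_for_flow` and the use of CRC32 as the hash are assumptions made for this example.

```python
import zlib


def core_for_flow(src_ip: str, src_port: int,
                  dst_ip: str, dst_port: int, num_cores: int) -> int:
    """Map a connection's 4-tuple to a core index (hypothetical helper).

    Hashing the 4-tuple keeps all packets of one connection on one core,
    while different connections spread across the available cores.
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_cores


# Every packet of this connection is steered to the same core.
first = core_for_flow("10.0.0.1", 40000, "10.0.0.2", 80, num_cores=4)
again = core_for_flow("10.0.0.1", 40000, "10.0.0.2", 80, num_cores=4)
print(first == again)
```

In a design like this, the steering decision could be made either in the operating system or on the network interface itself, which is one way the hardware/software interface might change.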

I/O Virtualization for Virtual Machine Monitors

Virtual machine monitors (VMMs) allow multiple virtual machines running on the same physical machine to share hardware resources such as a disk, video display, or network interface card (NIC). To provide network support, for example, a VMM must present each virtual machine with a software interface that is multiplexed onto the actual physical NIC. While sharing the physical device, the VMM must prevent one virtual machine from altering data in another virtual machine through the hardware device, either maliciously or through programmer error. Additionally, the VMM should at minimum provide each virtual machine an approximately equal opportunity to use the physical interface. The overhead incurred by a purely software-based virtualization approach can significantly degrade performance. The goal of this project is to develop efficient I/O virtualization architectures that use both hardware and software techniques to minimize the overhead of multiplexing, data protection, and flexible resource scheduling.
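
The requirement that each virtual machine gets an approximately equal opportunity to use the physical interface can be sketched as round-robin multiplexing of per-VM transmit queues. This is a minimal software-only illustration, not the project's architecture; the function `round_robin_transmit`, the queue layout, and the per-round `budget` are hypothetical.

```python
from collections import deque


def round_robin_transmit(vm_queues: dict, budget: int) -> list:
    """Pick up to `budget` packets for the physical NIC, one per VM per turn.

    vm_queues maps a VM identifier to a deque of pending packets. Queues are
    consumed in place; a VM with nothing left to send drops out of the rotation.
    """
    sent = []
    order = deque(vm_queues)  # rotation order over VM identifiers
    while order and len(sent) < budget:
        vm_id = order.popleft()
        queue = vm_queues[vm_id]
        if queue:
            sent.append((vm_id, queue.popleft()))
            if queue:
                order.append(vm_id)  # still has packets: back of the line
    return sent


queues = {"vm_a": deque(["a1", "a2", "a3"]), "vm_b": deque(["b1"])}
print(round_robin_transmit(queues, budget=3))
```

Each packet is tagged with the identifier of the virtual machine that owns it, which hints at where per-VM data protection checks would go. Doing this per packet in software is the kind of overhead the project aims to reduce.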

Embedded Systems Architecture

In spite of the increasing capacity of embedded memories on current and future SoCs, application, cost, and time-to-market requirements will continue to necessitate the use of external commodity memories in many embedded systems. These commodity memories and their associated interconnect can dissipate as much or more power than the SoC. The passive nature of these commodity memories motivates the development of novel solutions to manage power and energy within the memory controller on the SoC. This project aims to reduce the power and energy consumption of the memory system in order to address the requirements of future low power and high performance embedded systems.

Furthermore, existing architectural simulators are not well-suited to embedded system designs. First, embedded systems with programmable processors also incorporate nonprogrammable units such as direct memory access (DMA) and medium access control (MAC) units that asynchronously interact with the host I/O interconnect, external networks, and local memory. Second, most embedded systems are I/O intensive, and the workload consists not only of the firmware to implement those tasks, but also the I/O interactions with the external world. This project aims to produce a flexible simulation infrastructure that allows accurate simulation of such embedded systems.

High-performance MPI using TCP/IP

MPI applications, like other parallel applications, perform two distinct functions: computation and communication. The computation is mostly performed by the application directly, whereas the MPI library provides communication support to the application. Thus, the overall performance of an MPI application depends as much on the computation power of individual nodes as on the communication substrate used and the library support available for communication over that substrate. As the computation power of individual nodes has increased with faster processors over the past several years, the focus of attention for improving MPI performance on workstation clusters has gradually shifted towards the communication medium and the MPI library. TCP/IP over Ethernet has significant advantages as a messaging substrate in MPI: TCP is ubiquitous, highly portable, and extremely robust. Furthermore, Ethernet-based solutions are relatively inexpensive compared to existing specialized solutions. This project aims to overcome the drawbacks of TCP/IP over Ethernet compared to specialized networks as a messaging substrate for MPI applications.

Thomas Barr, Alan L. Cox, and Scott Rixner. SpecTLB: A mechanism for speculative address translation. In Proceedings of the International Symposium on Computer Architecture (ISCA), San Jose, CA, June 2011.

In Proceedings of the International Symposium on Distributed Autonomous Robotic Systems (DARS), Lausanne, Switzerland.

In Proceedings of the Symposium on Architectures for Networking and Communications Systems (ANCS), La Jolla, CA, October 2010.

In Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS), White Plains, NY, March 2010.

In Proceedings of the Symposium on Architectures for Networking and Communications Systems (ANCS), La Jolla, CA, October 2010.

In Proceedings of the International Symposium on Computer Architecture (ISCA), Saint Malo, France, June 2010.

Peter Mattson, William J. Dally, Scott Rixner, Ujval J. Kapasi, and John D. Owens. Communication scheduling. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), November 2000.