CARIBOU: SMART DISTRIBUTED STORAGE FOR THE DATACENTER

Assistant Research Professor at the IMDEA Software Institute
Madrid, Spain

There is a widening gap in the datacenter between data growth and stagnating CPU performance. This gap limits our ability to solve more complex problems and prompts us to revisit both the architecture of servers and the ways we manage and process data. In my work, I aim to narrow this gap using specialization and hardware/software co-design. As a specific example, I will talk about building energy-efficient distributed storage for large-scale data processing applications.
Most modern data-intensive applications are designed to run on tens or hundreds of nodes, often splitting them between "compute" and "storage" layers. This separation increases scalability but also introduces data movement bottlenecks. We show how these bottlenecks can be lifted without increasing the nodes' power consumption by pushing down computation into storage nodes built using specialized hardware (FPGAs). Caribou implements data management, data processing, and replication for fault tolerance in a micro-server footprint. It delivers network-bound throughput and, even though it is based on specialized hardware, it can be efficiently shared by multiple tenants.
Caribou is open-source and serves as a platform for exploring ideas in two directions: distributed data management using specialized hardware, and near-data processing for emerging data science workloads.
Presenter bio: Zsolt Istvan is an Assistant Research Professor at the IMDEA Software Institute in Madrid, Spain. Before that, he earned his PhD in the Systems Group at ETH Zurich, Switzerland, working with FPGAs and distributed storage. In his research, he explores ideas around specialization as a way of lifting bottlenecks in distributed systems and databases.