Abstract

Active storage clouds are an attractive platform for executing the large, data-intensive workloads found in many fields of science. However, active storage presents new system management challenges. A large system of fault-prone machines with local persistent state can easily degenerate into a mess of unreferenced data and runaway computations. To address this challenge, we advocate adapting the notion of distributed transactions from traditional databases. We demonstrate the use of distributed transactions in the context of DataLab, a software system for executing data-parallel workloads on active storage clouds. We detail the underlying capabilities required from each node, explain how transactions are coordinated, and demonstrate the robust scaling of the system to 250 nodes while running an image-processing application.