Publication

Cloud systems such as Hadoop, Spark and Zookeeper are
frequently written in Java or other garbage-collected languages.
However, GC-induced pauses can have a signifi-
cant impact on these workloads. Specifically, GC pauses
can reduce throughput for batch workloads, and cause
high tail-latencies for interactive applications.
In this paper, we show that distributed applications
suffer from each node’s language runtime system making
GC-related decisions independently. We first demonstrate
this problem on two widely-used systems (Apache
Spark and Apache Cassandra). We then propose solving
this problem using a Holistic Runtime System, a distributed
language runtime that collectively manages runtime
services across multiple nodes.
We present initial results to demonstrate that this
Holistic GC approach is effective both in reducing the
impact of GC pauses on a batch workload, and in improving
GC-related tail-latencies in an interactive setting.