Abstract

Evaluating the performance of large compute clusters requires benchmarks with
representative workloads. At Google, performance benchmarks are used to obtain
performance metrics such as task scheduling delays and machine resource
utilizations to assess changes in application code, machine configurations, and
scheduling algorithms. Existing approaches to workload characterization for high
performance computing and grids focus on task resource requirements for CPU,
memory, disk, I/O, and network. Such requirements describe how much of each
resource a task consumes.

However, in addition to resource requirements, Google workloads commonly include
task placement constraints that determine which machine resources are consumed by
tasks. Task placement constraints arise because of task dependencies such as
those related to hardware architecture and kernel version.

This paper develops methodologies for incorporating task placement constraints
and machine properties into performance benchmarks of large compute clusters. Our
studies of Google compute clusters show that constraints increase average task
scheduling delays by a factor of 2 to 6, which often results in tens of minutes
of additional task wait time. To understand why, we extend the concept of
resource utilization to include constraints by introducing a new metric, the
Utilization Multiplier (UM). UM is the ratio of the resource utilization seen by
tasks with a constraint to the average utilization of the resource. UM provides a
simple model of the performance impact of constraints in that task scheduling
delays increase with UM. Finally, we describe how to synthesize representative task
constraints and machine properties, and how to incorporate this synthesis into
existing performance benchmarks.
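The UM definition above can be illustrated with a short sketch (this is not the paper's implementation, and the machine data below is hypothetical): for a given constraint, UM is the average utilization over the machines that satisfy the constraint, divided by the average utilization over all machines.

```python
# Hypothetical machine pool: (cpu_utilization, set of machine properties).
# The properties echo the constraint examples in the abstract
# (hardware architecture, kernel version); values are made up.
machines = [
    (0.90, {"arch:x86_64", "kernel:2.6"}),
    (0.70, {"arch:x86_64"}),
    (0.30, {"arch:arm"}),
    (0.50, {"arch:x86_64", "kernel:2.6"}),
]

def utilization_multiplier(constraint: str) -> float:
    """UM for `constraint`: ratio of the utilization seen by tasks with
    the constraint (i.e., on machines satisfying it) to the average
    utilization of the resource across all machines."""
    eligible = [u for u, props in machines if constraint in props]
    avg_all = sum(u for u, _ in machines) / len(machines)
    avg_eligible = sum(eligible) / len(eligible)
    return avg_eligible / avg_all

# Tasks constrained to kernel:2.6 machines see above-average utilization,
# so their UM exceeds 1 and, per the model, their scheduling delay grows.
print(round(utilization_multiplier("kernel:2.6"), 2))  # → 1.17
```

A UM of 1 means the constraint is performance-neutral; larger values indicate the constraint funnels tasks onto busier machines, which is the mechanism behind the increased scheduling delays reported below.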

Using synthetic task constraints and machine properties generated by our
methodology, we accurately reproduce performance metrics for benchmarks of Google
compute clusters with a discrepancy of only 13% in task scheduling delay and 5%
in resource utilization.