Google Cluster Workload Traces 2019

Dataset type

Dataset size

License type

Publication year

Related publications

Description

This is a trace of the workloads running on eight Google Borg compute clusters for the month of May 2019. The trace describes every job submission, scheduling decision, and resource usage data for the jobs that ran in those clusters.

It builds on the May 2011 trace of one cluster, which has enabled a wide range of research on advancing the state-of-the-art for cluster schedulers and cloud computing, and has been used to generate hundreds of analyses and studies.

Since 2011, machines and software have evolved, workloads have changed, and the importance of workload variance has become even clearer. The new trace allows researchers to explore these changes. The new dataset includes additional data, including:

CPU usage information histograms for each 5 minute period, not just a point sample;

information about alloc sets (shared resource reservations used by jobs); and

job-parent information for master/worker relationships such as MapReduce jobs.

Just like the last trace, these new ones focus on resource requests and usage, and contain no information about end users, their data, or access patterns to storage systems and other services.

The trace data is being made available via Google BigQuery so that sophisticated analyses can be performed without requiring local resources. This site provides access instructions and a detailed description of what the traces contain.