HPC systems are becoming more complex and heterogeneous, and this trend is likely to continue. It is therefore important to have accurate tools to analyse and understand the performance of these machines under operational conditions. However, existing benchmarking tools target HPC subsystems in isolation, for example benchmarking compute separately from I/O. As such, they struggle to capture the consequences of resource contention and of interactions between subsystems and between jobs.
Kronos aims to model and execute a workload based on real-life profiling data, but in a machine-agnostic and easily portable manner. The modelled workload can be used to benchmark and evaluate HPC systems by exercising all of the subsystems simultaneously, including compute, I/O, network and scheduling.
The workload is modelled using machine learning algorithms, and a meta-schedule is generated consisting of idealised tasks. This meta-schedule can be scaled up or down in each of its components for the purpose of benchmarking HPC systems of different scales. The executor can then generate concrete synthetic applications and submit them to the HPC scheduling system.
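The component-wise scaling of a meta-schedule can be illustrated with a minimal sketch. It is not Kronos's actual data model or API: the `IdealisedTask` class, its fields (`flops`, `io_bytes`, `network_bytes`, `n_procs`), the `scale_schedule` function and all numeric values are hypothetical, chosen only to show how each component of an idealised task might be scaled by an independent factor.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IdealisedTask:
    """Hypothetical idealised task; the field names are illustrative assumptions."""
    name: str
    flops: float          # compute work
    io_bytes: float       # data read and written
    network_bytes: float  # data exchanged between processes
    n_procs: int          # number of processes requested


def scale_task(task, factors):
    """Return a copy of the task with each component scaled independently.

    Components missing from `factors` are left unchanged (factor 1.0).
    """
    return replace(
        task,
        flops=task.flops * factors.get("flops", 1.0),
        io_bytes=task.io_bytes * factors.get("io_bytes", 1.0),
        network_bytes=task.network_bytes * factors.get("network_bytes", 1.0),
        # Keep at least one process so the scaled task remains runnable.
        n_procs=max(1, round(task.n_procs * factors.get("n_procs", 1.0))),
    )


def scale_schedule(schedule, factors):
    """Apply the same per-component factors to every task in the schedule."""
    return [scale_task(task, factors) for task in schedule]


# Made-up meta-schedule, scaled down for a smaller target machine.
meta_schedule = [
    IdealisedTask("task-0", flops=2.0e12, io_bytes=4.0e9, network_bytes=1.0e9, n_procs=64),
    IdealisedTask("task-1", flops=5.0e11, io_bytes=8.0e9, network_bytes=2.0e8, n_procs=16),
]
small = scale_schedule(meta_schedule, {"flops": 0.5, "io_bytes": 0.25, "n_procs": 0.25})
```

Treating the scaling factors as a plain mapping keeps the components independent, so the compute load can be halved while the I/O volume is quartered and the network traffic is left untouched.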