Abstract: High performance computing (HPC) serves many large-scale applications that rely on simulation or measurement of real-time experiments, such as medical research, weather prediction, and experiments with high-energy subatomic particles. Large distributed storage systems must be efficient to deliver sufficient performance and scale for demanding scientific workloads. Poor use of system resources leads to contention between workloads, which reduces the maximum throughput of a system. Additionally, software bugs, faulty hardware, or shifting workloads can reduce performance, and the size and complexity of large storage systems inhibit rapid diagnosis and resolution of problems that degrade overall system performance.

We propose Geomancy, a tool that autonomously analyzes the placement of data within a distributed storage system. It suggests changes in the layout of data throughout the system to reduce contention and increase overall throughput. As workloads change over time, Geomancy continuously evaluates storage system performance and builds in-depth knowledge of how workload variations affect performance. The design and hardware differences between production HPC environments demand a machine learning algorithm that can quickly adapt to changes in system hardware while remaining usable across many discrete platforms. Using a combination of LSTM and convolutional layers, Geomancy forecasts when a bottleneck may occur due to changing workloads and suggests changes in the data layout that mitigate or eliminate the bottleneck. Our approach to optimizing throughput offers numerous benefits for storage systems.
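As an illustration of the class of model the abstract describes, the sketch below combines a convolutional layer with an LSTM to forecast throughput from a window of recent performance metrics. This is a minimal sketch, assuming PyTorch; the layer sizes, feature count, and class name are illustrative assumptions, not details taken from Geomancy itself.

```python
# Hypothetical sketch of a conv + LSTM throughput forecaster
# (assumption: PyTorch; all sizes and names are illustrative).
import torch
import torch.nn as nn

class ThroughputForecaster(nn.Module):
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        # 1-D convolution extracts short-range patterns along the time axis.
        self.conv = nn.Conv1d(n_features, hidden, kernel_size=3, padding=1)
        # LSTM captures longer-range workload trends.
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # predicted throughput

    def forward(self, x):
        # x: (batch, time, features)
        h = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        out, _ = self.lstm(h)
        # Forecast from the final time step's hidden state.
        return self.head(out[:, -1])

model = ThroughputForecaster(n_features=4)
window = torch.randn(8, 16, 4)  # 8 samples, 16 time steps, 4 metrics
pred = model(window)            # one throughput forecast per sample
```

A predicted drop in throughput for a candidate layout would then signal an impending bottleneck, prompting a suggested data placement change.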