
In the HPC landscape, the number of cores in client machines is continually increasing. However, even in parallel applications, single-threaded Input/Output (I/O) remains very common. Lustre was originally optimized for many I/O operations running at the same time, but a single-threaded application cannot take advantage of this optimization: its I/O rate is bounded by the speed of one core, and that core may be slow. Therefore, the time spent in single-threaded I/O within a parallel application cannot be reduced simply by adding more compute nodes.
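The scaling argument above can be made concrete with a small Amdahl-style calculation. This is a minimal sketch, not a measurement: the function name `total_runtime` and the timings (60 s of single-threaded I/O, 600 s of compute on one node) are hypothetical values chosen for illustration, and the compute phase is assumed to scale perfectly with node count.

```python
def total_runtime(serial_io_s: float, compute_s: float, nodes: int) -> float:
    """Estimated run time when only the compute phase scales with node count.

    serial_io_s: time spent in single-threaded I/O (does not shrink with nodes)
    compute_s:   compute time on a single node (assumed to scale perfectly)
    """
    return serial_io_s + compute_s / nodes


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    for nodes in (1, 4, 16, 64):
        print(f"{nodes:3d} nodes: {total_runtime(60.0, 600.0, nodes):.1f} s")
```

Under these assumptions the estimated run time approaches the 60 s I/O cost as nodes are added but never drops below it, which is why the single-threaded I/O path itself has to be made faster.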

The Lustre client can be optimized to provide a significant performance gain for single-threaded applications and for parallel applications that contain large amounts of single-threaded I/O. Each stage of the Lustre I/O flow has been analyzed, and this talk presents an overview of a potential solution that would substantially improve utilization of the many-core, multiple-network-interface architectures found in today's clusters.

A proof-of-concept solution has been developed and tested with a real-world Hybrid Coordinate Ocean Model (HYCOM) application to demonstrate the significant performance gains that can be realized on many-core architectures. The HYCOM application reads a large amount of data at launch, before beginning the compute-intensive operations that analyze the data. If this I/O phase is slow on a many-core architecture, it extends the total run time and becomes a primary bottleneck to increasing application throughput. With the proof-of-concept solution presented here, the I/O time for this application improved significantly, and the application's performance was no longer restricted by I/O operations.

This development is targeted for the community 2.10 release; full details can be found in the JIRA ticket: https://jira.hpdd.intel.com/browse/LU-8964