Recent forecasts in high-performance computing predict that future programming models will be asynchronous. However, opportunistic execution of available work can interfere with segments of the computation that should execute synchronously.

This paper describes a scheduling methodology that tightly synchronizes parts of an otherwise asynchronous parallel algorithm to obtain higher performance. Specifically, we apply exclusive scheduling classes to both asynchronous collectives and application-specific work units.

Our exploration of exclusive scheduling classes and other techniques arises from implementing a dense LU solver in a message-driven programming model and scaling it on modern supercomputers. The other techniques include mapping schemes beyond the traditional block-cyclic distribution and a method for reducing network contention through ad hoc agglomeration of data requests. Our findings suggest that future programming models will be hybrid: asynchrony is beneficial, but these models must incorporate mechanisms that allow highly synchronous operations to perform efficiently.