Abstract in another language

Over the last decades, parallel computer systems with tremendous computational power, thousands of processors, and deep and complex memory hierarchies have been developed. Even though modern parallel architectures provide great opportunities to solve computationally intensive problems, their complexity confronts software developers with immense challenges, since the performance of programs strongly depends on the characteristics of the target platform, such as multi-core processor design and cache architecture. To obtain optimal performance, applications would have to be tuned for each specific target architecture. But the rapidly growing variety of parallel platforms and the short time to market make such manual tuning a time-consuming and costly process. The problem of manual tuning is compounded by the fact that the performance of many programs depends on the input data. Consequently, such programs would have to be tuned for every possible set of input data.
Auto-tuning is a promising way to avoid manual tuning. The key idea of auto-tuning is to generate several implementation variants of an algorithm by applying program transformations and optimization techniques such as loop interchange, loop tiling, and scheduling. Then, the variant with the best performance on the target machine is selected from the set of generated implementation variants.
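As a minimal illustration of this idea, the following sketch generates three variants of a matrix-vector product (baseline loop order, loop interchange, and loop tiling) and selects the fastest one by measurement. The kernel, the function names, and the default tile size are assumptions chosen for illustration; they are not the ODE methods or variants treated in this work.

```python
import time

def variant_ij(a, x):
    """Baseline loop order: outer loop over rows, inner loop over columns."""
    n, m = len(a), len(x)
    y = [0.0] * n
    for i in range(n):
        for j in range(m):
            y[i] += a[i][j] * x[j]
    return y

def variant_ji(a, x):
    """Loop interchange: outer loop over columns, inner loop over rows."""
    n, m = len(a), len(x)
    y = [0.0] * n
    for j in range(m):
        for i in range(n):
            y[i] += a[i][j] * x[j]
    return y

def variant_tiled(a, x, tile=16):
    """Loop tiling: the column loop is split into blocks of `tile` iterations."""
    n, m = len(a), len(x)
    y = [0.0] * n
    for jj in range(0, m, tile):
        for i in range(n):
            for j in range(jj, min(jj + tile, m)):
                y[i] += a[i][j] * x[j]
    return y

def select_best(variants, a, x, repeats=3):
    """Time every variant and return (fastest variant, best time per variant)."""
    timings = {}
    for f in variants:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            f(a, x)
            best = min(best, time.perf_counter() - t0)
        timings[f.__name__] = best
    winner = min(variants, key=lambda f: timings[f.__name__])
    return winner, timings
```

All three variants compute the same result; only the iteration order differs, which is what changes the memory access pattern and hence the runtime on a given machine.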
This work targets auto-tuning for the efficient solution of initial value problems (IVPs) of systems of ordinary differential equations (ODEs) on modern computer systems.
As an example of solution methods for IVPs, a class of explicit predictor--corrector methods of Runge--Kutta type is considered, which possesses a deeply nested loop structure with potential for different loop transformations and different types of parallelism.
In this work, online auto-tuning algorithms for the sequential and the parallel execution of a given method are presented. Online auto-tuning is supported by offline benchmarks and exploits the time-stepping nature of ODE methods to select, at runtime during the first time steps, suitable parameters for the variants and the best implementation variant from a candidate pool.
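The online selection can be sketched as follows. The example uses the explicit Euler method for the toy problem y' = -y as a stand-in for the predictor--corrector methods considered here, and the two step functions, the `trials` parameter, and the selection rule (minimum measured step time) are assumptions made for illustration. The important property is that the timed steps already advance the solution, so the tuning phase does not repeat any work.

```python
import time

def step_loop(y, h):
    """One Euler step for y' = -y, written as an explicit loop."""
    out = [0.0] * len(y)
    for i in range(len(y)):
        out[i] = y[i] + h * (-y[i])
    return out

def step_comprehension(y, h):
    """The same Euler step, written as a list comprehension."""
    return [yi + h * (-yi) for yi in y]

def integrate_with_online_tuning(y, h, n_steps, variants, trials=2):
    """Advance n_steps; time each variant on the first steps, then keep the fastest.

    Returns (final state, chosen variant). The chosen variant is None if
    n_steps is too small to finish the tuning phase.
    """
    timings = {v.__name__: [] for v in variants}
    # Tuning schedule: every variant executes `trials` of the first time steps.
    schedule = [v for v in variants for _ in range(trials)]
    chosen = None
    for step in range(n_steps):
        if step < len(schedule):
            v = schedule[step]
            t0 = time.perf_counter()
            y = v(y, h)  # a timed step still advances the solution
            timings[v.__name__].append(time.perf_counter() - t0)
        else:
            if chosen is None:
                chosen = min(variants, key=lambda v: min(timings[v.__name__]))
            y = chosen(y, h)
    return y, chosen
```

A call such as `integrate_with_online_tuning([1.0, 2.0], 0.1, 100, [step_loop, step_comprehension])` spends the first four steps on measurements and performs the remaining 96 steps with the faster variant.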
The auto-tuning algorithms include the selection of suitable tile sizes for implementation variants containing tiled loops. Suitable tile sizes are selected by combining an empirical search with an analytical model that is based on the working spaces of the loops and takes the cache organization of the target platform into account.
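The combination of model and search might look like the sketch below. The working-space model used here (a tile touches a fixed number of vectors with `tile` elements each, and this working space must fit into the cache) is a simplified assumption for illustration and not the model developed in this work; the function and parameter names are likewise hypothetical. The model restricts the candidates to multiples of the cache line length that fit into the cache, and the empirical search then times the remaining candidates.

```python
import time

def candidate_tile_sizes(cache_bytes, line_bytes, vectors_per_tile, elem_bytes=8):
    """Tile sizes (in elements) whose assumed working space fits in the cache.

    Assumed model: one tile touches `vectors_per_tile` vectors of `tile`
    elements each, so tile * vectors_per_tile * elem_bytes <= cache_bytes.
    """
    elems_per_line = line_bytes // elem_bytes
    max_tile = cache_bytes // (vectors_per_tile * elem_bytes)
    # Only multiples of the cache line length are considered.
    return list(range(elems_per_line, max_tile + 1, elems_per_line))

def empirical_search(run_with_tile, candidates):
    """Time `run_with_tile(tile)` for each candidate; return the fastest tile size."""
    best_tile, best_time = None, float("inf")
    for tile in candidates:
        t0 = time.perf_counter()
        run_with_tile(tile)
        elapsed = time.perf_counter() - t0
        if elapsed < best_time:
            best_tile, best_time = tile, elapsed
    return best_tile
```

For a 32 KiB cache with 64-byte lines and four vectors per tile, the model admits the tile sizes 8, 16, ..., 1024, so the empirical search has to time at most 128 candidates instead of every possible tile size.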