Many breakthroughs in computational science and engineering (CSE) are expected as unprecedented computational power is brought to bear on critical problems. Yet parallel programming is an increasingly challenging task, with new considerations such as heterogeneity, power, temperature, and load imbalance. The challenge is further exacerbated by the trend toward multi-physics codes and by the increasing use of sophisticated algorithms in CSE as higher-resolution simulations are attempted. I will present a parallel programming approach that we have developed over the past twenty years and discuss its successes and its utility in this context.
The foundational ideas in this approach are over-decomposition, message-driven execution, and migratability of work units and data units. These features of the programming model empower an adaptive runtime system (RTS) that controls the assignment of these units to processors and their scheduling. This separation of concerns automates load balancing and communication optimizations, and supports latency tolerance, compositionality, modularity, and fault tolerance. Charm++ and AMPI are programming systems based on this RTS. More recently, we have developed higher-level notations that capture common patterns of interaction. I will argue that such a model is necessary as we move to exascale.
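The three foundational ideas can be illustrated with a toy sketch (this is not Charm++ itself, and all class and method names here are hypothetical): work is over-decomposed into many more "chares" than processors, each chare runs only when a message for it is scheduled, and the runtime may migrate a chare from an overloaded processor to a lightly loaded one.

```python
from collections import deque

class Chare:
    """A migratable work unit: executes only in response to messages."""
    def __init__(self, cid, cost):
        self.cid = cid
        self.cost = cost   # abstract per-message work, used by the balancer
        self.done = 0

    def on_message(self):
        self.done += 1     # "execute" one message worth of work

class Runtime:
    """Toy adaptive RTS: owns chare placement, scheduling, and migration."""
    def __init__(self, num_procs, chares):
        self.num_procs = num_procs
        self.chares = {c.cid: c for c in chares}
        # over-decomposition: len(chares) is typically >> num_procs
        self.location = {c.cid: c.cid % num_procs for c in chares}
        self.queues = [deque() for _ in range(num_procs)]

    def send(self, cid):
        # message-driven execution: the sender need not know where the
        # chare lives; the RTS routes to its current processor's queue
        self.queues[self.location[cid]].append(cid)

    def step(self):
        # each processor schedules the next available message, if any
        for q in self.queues:
            if q:
                self.chares[q.popleft()].on_message()

    def load(self, p):
        return sum(self.chares[cid].cost
                   for cid, loc in self.location.items() if loc == p)

    def balance(self):
        # migratability: move the cheapest chare off the heaviest
        # processor, but only if that strictly reduces the peak load
        heavy = max(range(self.num_procs), key=self.load)
        light = min(range(self.num_procs), key=self.load)
        movable = [cid for cid, loc in self.location.items() if loc == heavy]
        if not movable or heavy == light:
            return
        cid = min(movable, key=lambda c: self.chares[c].cost)
        c = self.chares[cid].cost
        if max(self.load(heavy) - c, self.load(light) + c) < self.load(heavy):
            self.location[cid] = light
```

Because placement is owned by the runtime rather than the programmer, the same `send` call works before and after a migration; this is the separation of performance concerns the model relies on.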
I will describe our experience with applications collaboratively developed using this approach that have scaled to more than 200,000 processors and are used routinely by scientists. These include (1) NAMD, used for biomolecular simulations, (2) OpenAtom, a Car-Parrinello code for quantum mechanical simulations used for modeling nanotechnology and material properties, and (3) ChaNGa, for computational astronomy, as well as other applications, including BRAMS, a weather forecasting code. I will illustrate various aspects of our programming model with examples from these and other new applications.