Parallel programming is certainly more difficult than sequential programming because of the additional issues a parallel program raises. One has to decide what computations to perform in parallel, which processors will perform which computations, and, at least in the predominant distributed-memory programming models, how to distribute the data. To alleviate this difficulty, one should try to automate as much of the parallel programming effort as possible. But which aspects of parallel programming should be automated? A reasonable strategy is to automate what the “system” can do well, while leaving to programmers what they can do well. This “optimal division of labor between the system and the programmer” is a guiding principle behind Charm++. In particular, we believe that programmers should decide what to do in parallel, while the runtime system should decide how data and computations are assigned to processors, and the order in which processors execute their assigned work. This approach has led to Charm++: an intuitive parallel programming system with a clear cost model and a highly adaptive, intelligent runtime system (RTS), along with an ambitious research agenda for runtime optimization.
In this study, we first briefly describe Charm++ and Adaptive MPI (AMPI), an implementation of MPI that leverages the adaptive RTS of Charm++. Next, we describe multiple adaptive aspects of Charm++/AMPI, including adaptive overlap of communication and computation, dynamic measurement-based load balancing, communication optimizations, adaptations to available memory, and fault tolerance. Finally, we present an overview of a few applications and describe some recent higher-level parallel programming abstractions built using Charm++.