Automatic parallelization for clusters is a promising alternative to
time-consuming, error-prone manual parallelization. However, automatic
parallelization is frequently limited by the imprecision of static analysis.
Moreover, due to the inherent fragility of static analysis, small changes to
the source code can significantly undermine performance. Replacing static
analysis with speculation and profiling makes automatic parallelization more
robust and more widely applicable. However, naive automatic speculative
parallelization does not scale on distributed-memory clusters because of the
high bandwidth required to validate speculation. This work presents the first
automatic speculative DOALL (Spec-DOALL) parallelization system for clusters.
We have implemented a
prototype automatic parallelization system, called Cluster Spec-DOALL, which
consists of a Spec-DOALL parallelizing compiler and a speculative runtime for
clusters. Since the compiler optimizes communication patterns, and the runtime
is optimized for the cases in which speculation succeeds, Cluster Spec-DOALL
minimizes the communication and validation overheads of the speculative
runtime. Across eight benchmarks, Cluster Spec-DOALL achieves a geometric-mean
speedup of 43.8x on a 120-core cluster, whereas DOALL without speculation
achieves only 4.5x. This demonstrates that speculation makes scalable, fully
automatic parallelization for clusters possible.
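
To make the idea of speculative DOALL execution with validation concrete, the following is a minimal single-machine sketch, not the Cluster Spec-DOALL implementation. It assumes a loop over a shared list, runs iterations on private views that buffer writes and log reads, validates that no worker wrote a location another worker touched, and either commits the buffered writes or falls back to sequential execution. The names `TrackedArray` and `spec_doall` are hypothetical, and threads stand in for cluster nodes purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class TrackedArray:
    """Private view of shared data: logs reads, buffers writes (hypothetical helper)."""

    def __init__(self, base):
        self._base = base
        self.reads = set()   # indices read from the shared state
        self.writes = {}     # index -> speculatively written value

    def __getitem__(self, i):
        if i in self.writes:          # a worker sees its own earlier writes
            return self.writes[i]
        self.reads.add(i)
        return self._base[i]

    def __setitem__(self, i, value):
        self.writes[i] = value        # never touches shared state directly


def spec_doall(body, data, n_iters, n_workers=4):
    """Speculatively run body(i, view) for i in range(n_iters) as a DOALL loop.

    Returns True if speculation succeeded and the buffered writes were
    committed, or False if a cross-worker conflict forced sequential
    re-execution. Either way, data holds the sequential result.
    """

    def run_chunk(worker):
        view = TrackedArray(data)
        # Cyclic distribution: worker w runs iterations w, w + n_workers, ...
        for i in range(worker, n_iters, n_workers):
            body(i, view)
        return view

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        views = list(pool.map(run_chunk, range(n_workers)))

    # Validation: no worker may have written a location that another
    # worker read or wrote.
    for a, va in enumerate(views):
        written = set(va.writes)
        for b, vb in enumerate(views):
            if a != b and written & (vb.reads | set(vb.writes)):
                # Misspeculation: discard all buffered writes and recover
                # by running the loop sequentially on the untouched data.
                for i in range(n_iters):
                    body(i, data)
                return False

    # Commit: speculation held, so the write sets are disjoint.
    for view in views:
        for index, value in view.writes.items():
            data[index] = value
    return True
```

In this sketch, a loop such as `a[i] = a[i] * 2` validates and commits, whereas a loop with a cross-iteration dependence such as `a[i] = a[i - 1] + 1` is detected as misspeculation and re-executed sequentially. The all-pairs comparison of read and write sets shown here is the kind of naive validation whose communication cost grows quickly on a distributed-memory cluster.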