Irregular applications, i.e., programs that manipulate pointer-based
data structures such as graphs and trees, constitute a challenging target for
parallelization because the amount of parallelism is input dependent and changes
dynamically. Traditional dependence analysis techniques are too conservative to
expose this parallelism. Even manual parallelization is difficult, time-consuming,
and error-prone. The Galois system parallelizes such applications using an
optimistic approach that exploits higher-level semantics of abstract data types.

In this paper, we study the performance and scalability of a Galoised (i.e.,
automatically parallelized) version of Delaunay mesh refinement (DR) on a
shared-memory system with 128 CPUs. DR is an important irregular application
that is used, e.g., in graphics and finite-element codes. The parallelized program
scales to 64 threads, where it reaches a speedup of 25.8. For large numbers of
threads, performance is hampered by load imbalance and nonuniform memory
latency, both of which grow as the number of threads increases. While
these two issues will have to be addressed in future work, we believe our results
already show the Galois approach to be very promising.