Abstract

Petascale systems will present several new challenges to
performance and correctness tools. Such machines may contain
millions of cores, requiring that tools use scalable data
structures and analysis algorithms to collect and to process
application data. In addition, at such scales, each tool itself
will become a large parallel application – already, debugging
the full BlueGene/L (BG/L) installation at the Lawrence
Livermore National Laboratory requires employing 1664 tool
daemons. To reach such sizes and beyond, tools must use a
scalable communication infrastructure and manage their own tool
processes efficiently. Some system resources, such as the file
system, may also become tool bottlenecks.

In this paper, we present challenges to petascale tool
development, using the Stack Trace Analysis Tool (STAT) as a
case study. STAT is a lightweight tool that gathers and merges
stack traces from a parallel application to identify process
equivalence classes. We use results gathered at thousands of
tasks on an InfiniBand cluster and at up to 208K processes
on BG/L to identify current scalability issues as well as
challenges that will be faced at the petascale. We then present
implemented solutions to these challenges and show the resulting
performance improvements. We also discuss future plans to meet
the debugging demands of petascale machines.
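
To make the notion of process equivalence classes concrete, the following
is a minimal sketch of grouping tasks by stack trace. It is an
illustration only, not STAT's implementation: the function names, the
prefix-tree layout, and the sample traces are all hypothetical, and the
traces are assumed to be already gathered as lists of frame names with
the outermost frame first.

```python
from collections import defaultdict


def equivalence_classes(traces):
    """Group task ranks whose stack traces are identical.

    traces maps a task rank to its list of frame names, outermost first.
    Returns a dict mapping each distinct trace (as a tuple) to the sorted
    list of ranks that share it.
    """
    classes = defaultdict(list)
    for rank in sorted(traces):
        classes[tuple(traces[rank])].append(rank)
    return dict(classes)


def merge_prefix_tree(traces):
    """Merge all traces into one call-prefix tree.

    Each node records the set of ranks whose trace passes through it, so
    traces that share a common prefix share the corresponding nodes.
    """
    root = {"ranks": set(), "children": {}}
    for rank, frames in traces.items():
        node = root
        node["ranks"].add(rank)
        for frame in frames:
            node = node["children"].setdefault(
                frame, {"ranks": set(), "children": {}}
            )
            node["ranks"].add(rank)
    return root


# Hypothetical sample data: three tasks wait in a barrier, one is computing.
sample_traces = {
    0: ["main", "solve", "MPI_Barrier"],
    1: ["main", "solve", "MPI_Barrier"],
    2: ["main", "solve", "compute"],
    3: ["main", "solve", "MPI_Barrier"],
}

classes = equivalence_classes(sample_traces)
tree = merge_prefix_tree(sample_traces)
```

In this sketch, a user would inspect one representative task per class
(here, one of ranks 0, 1, and 3, plus rank 2) rather than every task. The
merged tree keeps the shared call prefix only once, which is one way such
data could stay compact as the task count grows.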