Abstract

The design of high-performance computing architectures demands performance analysis of large-scale parallel applications to derive various parameters concerning hardware design and software development. The process of performance analysis and benchmarking an application can be done in several ways with varying degrees of fidelity. One of the most cost-effective ways is to do a coarse-grained study of large-scale parallel applications through the use of program skeletons. The concept of a “program skeleton” that we discuss in this article is an abstracted program that is derived from a larger program where source code that is determined to be irrelevant is removed for the purposes of the skeleton. In this work, we develop a semiautomatic approach for extracting program skeletons based on compiler program analysis. Finally, we demonstrate correctness of our skeleton extraction process by comparing details from communication traces, as well as show the performance speedup of using skeletons by running simulations in the SST/macro simulator.

@article{osti_1183091,
title = {Static analysis techniques for semiautomatic synthesis of message passing software skeletons},
author = {Sottile, Matthew and Dagit, Jason and Zhang, Deli and Hendry, Gilbert and Dechev, Damian},
abstractNote = {The design of high-performance computing architectures demands performance analysis of large-scale parallel applications to derive various parameters concerning hardware design and software development. The process of performance analysis and benchmarking an application can be done in several ways with varying degrees of fidelity. One of the most cost-effective ways is to do a coarse-grained study of large-scale parallel applications through the use of program skeletons. The concept of a “program skeleton” that we discuss in this article is an abstracted program that is derived from a larger program where source code that is determined to be irrelevant is removed for the purposes of the skeleton. In this work, we develop a semiautomatic approach for extracting program skeletons based on compiler program analysis. Finally, we demonstrate correctness of our skeleton extraction process by comparing details from communication traces, as well as show the performance speedup of using skeletons by running simulations in the SST/macro simulator.},
doi = {10.1145/2778888},
journal = {ACM Transactions on Modeling and Computer Simulation},
number = 1,
volume = 26,
place = {United States},
year = {2015},
month = {6}
}

The need to process streaming data, which arrives continuously at high-volume in real-time, arises in a variety of contexts including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions which operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH which enables a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH can run on top of either message-passing via MPI or sockets via ZMQ. The former means streaming computations can be run on any parallel machine which supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how PHISH can support streaming MapReduce operations, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. Lastly, we also provide benchmark timings for MPI versus socket performance of several kernel operations useful in streaming algorithms.

Event traces are valuable for understanding the behavior of parallel programs. However, automatically analyzing a large parallel trace is difficult, especially without a specific objective. We aid this endeavor by extracting a trace's logical structure, an ordering of trace events derived from happened-before relationships, while taking into account developer intent. Using this structure, we can calculate an operation's delay relative to its peers on other processes. The logical structure also serves as a platform for comparing and clustering processes as well as highlighting communication patterns in a trace visualization. We present an algorithm for determining this idealized logical structure from traces of message passing programs, and we develop metrics to quantify delays and differences among processes. We implement our techniques in Ravel, a parallel trace visualization tool that displays both logical and physical timelines. Rather than showing the duration of each operation, we display where delays begin and end, and how they propagate. As a result, we apply our approach to the traces of several message passing applications, demonstrating the accuracy of our extracted structure and its utility in analyzing these codes.

Here, we present Magellan/IMACS, Anglo-Australian Telescope/AAOmega+2dF, and Very Large Telescope/GIRAFFE+FLAMES spectroscopy of the Carina II (Car II) and Carina III (Car III) dwarf galaxy candidates, recently discovered in the Magellanic Satellites Survey (MagLiteS). We identify 18 member stars in Car II, including two binaries with variable radial velocities and two RR Lyrae stars. The other 14 members have a mean heliocentric velocity $${v}_{\mathrm{hel}}=477.2\pm 1.2$$ $$\mathrm{km}\,{{\rm{s}}}^{-1}$$ and a velocity dispersion of $${\sigma }_{v}={3.4}_{-0.8}^{+1.2}$$ $$\mathrm{km}\,{{\rm{s}}}^{-1}$$. Assuming Car II is in dynamical equilibrium, we derive a total mass within the half-light radius of $${1.0}_{-0.4}^{+0.8}\times {10}^{6}$$ $${M}_{\odot }$$, indicating a mass-to-light ratio of $${369}_{-161}^{+309}$$ $${M}_{\odot }$$/$${L}_{\odot }$$. From equivalent width measurements of the calcium triplet lines of nine red giant branch (RGB) stars, we derive a mean metallicity of $${\rm{[Fe/H]}}=-2.44\pm 0.09$$ with dispersion $${\sigma }_{{\rm{[Fe/H]}}}={0.22}_{-0.07}^{+0.10}$$. Considering both the kinematic and chemical properties, we conclude that Car II is a dark-matter-dominated dwarf galaxy. For Car III, we identify four member stars, from which we calculate a systemic velocity of $${v}_{\mathrm{hel}}={284.6}_{-3.1}^{+3.4}$$ $$\mathrm{km}\,{{\rm{s}}}^{-1}$$. The brightest RGB member of Car III has a metallicity of $${\rm{[Fe/H]}}\,=-1.97\pm 0.12$$. Due to the small size of the Car III spectroscopic sample, we cannot conclusively determine its nature. Although these two systems have the smallest known physical separation ($${\rm{\Delta }}d\sim 10\,\mathrm{kpc}$$) among Local Group satellites, the large difference in their systemic velocities, $$\sim 200\,\mathrm{km}\,{{\rm{s}}}^{-1}$$, indicates that they are unlikely to be a bound pair. One or both systems are likely associated with the Large Magellanic Cloud (LMC), and may remain LMC satellites today. No statistically significant excess of γ-ray emission is found at the locations of Car II and Car III in eight years of Fermi-LAT data.

This paper explores key differences of MPI match lists for several important United States Department of Energy (DOE) applications and proxy applications. This understanding is critical in determining the most promising hardware matching design for any given high-speed network. The results of MPI match list studies for the major open-source MPI implementations, MPICH and Open MPI, are presented, and we modify an MPI simulator, LogGOPSim, to provide match list statistics. These results are discussed in the context of several different potential design approaches to MPI matching–capable hardware. The data illustrate the requirements for different hardware designs in terms of performance and memory capacity. Finally, this paper's contributions are the collection and analysis of data to help inform hardware designers of common MPI requirements and highlight the difficulties in determining these requirements by only examining a single MPI implementation.