Tuning codes for GPGPU architectures is challenging because few
performance tools can pinpoint the exact causes of execution
bottlenecks. While profiling an application can reveal its execution behavior
on a particular architecture, the abundance of collected information
can also overwhelm the user. Moreover, performance counters provide
cumulative values but do not attribute events to code regions,
which makes identifying performance hot spots difficult. This research
focuses on characterizing the behavior of GPU application kernels and
their performance at the node level by providing a visualization and
metrics display that indicates the behavior of the application with
respect to the underlying architecture. We demonstrate the effectiveness
of our techniques with LAMMPS and LULESH application case studies on a
variety of GPU architectures. By sampling instruction mixes for kernel
execution runs, we reveal a variety of intrinsic program characteristics
relating to computation, memory and control flow.
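
As a rough illustration of this kind of instruction-mix analysis, sampled instructions can be binned into computation, memory, and control-flow categories and reported as fractions of the total. The sketch below is hypothetical: the opcode-to-category mapping and the sample data are illustrative placeholders, not the study's actual classification or measurements.

```python
from collections import Counter

# Illustrative opcode-to-category mapping; real GPU ISA opcodes and
# their categories vary by architecture and toolchain.
CATEGORIES = {
    "FADD": "compute", "FMUL": "compute", "FFMA": "compute", "IMAD": "compute",
    "LDG": "memory", "STG": "memory", "LDS": "memory", "STS": "memory",
    "BRA": "control", "EXIT": "control",
}

def instruction_mix(sampled_opcodes):
    """Aggregate sampled opcodes into compute/memory/control fractions."""
    counts = Counter(CATEGORIES.get(op, "other") for op in sampled_opcodes)
    total = sum(counts.values()) or 1  # avoid division by zero
    return {category: n / total for category, n in counts.items()}

# Made-up samples standing in for one kernel execution run.
samples = ["FFMA", "FFMA", "LDG", "FADD", "BRA", "STG", "LDG", "IMAD"]
print(instruction_mix(samples))
```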