Abstract

In today's multi-core systems, cache contention due to true and false sharing can
cause unexpected and significant performance degradation. A detailed understanding
of a given multi-threaded application's behavior is required to precisely identify
such performance bottlenecks. Traditionally, however, such diagnostic information
can only be obtained after lengthy simulation of the memory hierarchy.

In this paper, we present a novel approach that efficiently analyzes interactions
between threads to determine thread correlation and detect true and false
sharing. It is based on the following key insight: although the slowdown caused
by cache contention depends on factors including the thread-to-core binding and
parameters of the memory hierarchy, the amount of data sharing is primarily a
function of the cache line size and application behavior. Using memory shadowing
and dynamic instrumentation, we implemented a tool that obtains detailed sharing
information between threads without simulating the full complexity of the memory
hierarchy. The runtime overhead of our approach, a 5x slowdown on average
relative to native execution, is significantly lower than that of detailed
cache simulation. The information collected allows programmers to identify the
degree of cache contention in an application, the correlation among its threads,
and the sources of significant false sharing. Using our approach, we were able to
improve the performance of some applications by up to a factor of 12. For other
contention-intensive applications, we were able to shed light on the obstacles
that prevent their performance from scaling to many cores.
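
To make the key insight concrete, the following is a minimal sketch of shadow-memory bookkeeping that classifies sharing using only the cache line size and the observed accesses. It is an illustration under stated assumptions, not the tool described above: the class `SharingTracker`, its `access` method, and the per-byte shadow layout are hypothetical names and choices, and the sketch counts conflicting accesses cumulatively rather than modeling cache invalidations.

```python
from collections import defaultdict

LINE_SIZE = 64  # assumed cache line size in bytes


class SharingTracker:
    """Hypothetical shadow-memory sketch; not the paper's implementation."""

    def __init__(self, line_size=LINE_SIZE):
        self.line_size = line_size
        # Shadow state: line index -> byte offset -> {thread id: has written}
        self.shadow = defaultdict(dict)
        self.true_sharing = 0
        self.false_sharing = 0

    def access(self, tid, addr, size, is_write):
        """Record one memory access and classify it against earlier ones."""
        touched = range(addr, addr + size)
        for line in {a // self.line_size for a in touched}:
            offsets = {a % self.line_size for a in touched
                       if a // self.line_size == line}
            record = self.shadow[line]
            same_bytes = other_bytes = False
            for off, users in record.items():
                for other, wrote in users.items():
                    if other == tid:
                        continue
                    # Two reads never conflict; at least one side must write.
                    if not (is_write or wrote):
                        continue
                    if off in offsets:
                        same_bytes = True
                    else:
                        other_bytes = True
            if same_bytes:
                self.true_sharing += 1   # threads touch the same bytes
            elif other_bytes:
                self.false_sharing += 1  # same line, disjoint bytes
            # Update the shadow state for the bytes this access touched.
            for off in offsets:
                users = record.setdefault(off, {})
                users[tid] = users.get(tid, False) or is_write
```

Feeding the same access trace to trackers with different `line_size` values shows why the amount of sharing depends on the line size: two threads writing adjacent 4-byte words conflict on a 64-byte line but not on a 4-byte line, with no cache simulation involved.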