This paper describes the new features available in the SimPoint 3.0 release. The release provides two techniques for drastically reducing the run-time of SimPoint: faster searching to find the best clustering, and efficiently clustering large numbers of intervals. SimPoint 3.0 also provides an option to output only the simulation points that represent the majority of execution, which can reduce simulation time without much increase in error. Finally, this release provides support for correctly clustering variable length intervals, taking into consideration the weight of each interval during clustering. This paper describes SimPoint 3.0’s new features, explains how to use them, and points out some common pitfalls.
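As a rough illustration of the "majority of execution" option mentioned in the abstract, the sketch below keeps the heaviest simulation points until a chosen fraction of execution is covered. It is not SimPoint's code: the function name `select_by_coverage`, the `(interval, weight)` tuple layout, and the 0.9 default are assumptions made for this example.

```python
def select_by_coverage(points, coverage=0.9):
    """Keep the heaviest simulation points until `coverage` is reached.

    points   -- list of (interval_index, weight) pairs; weights are assumed
                to be fractions of total execution (hypothetical layout).
    coverage -- fraction of execution the kept points must represent.
    Returns (kept_points, covered_fraction).
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")

    # Heaviest clusters first, so the fewest points reach the target.
    ranked = sorted(points, key=lambda p: p[1], reverse=True)

    kept = []
    covered = 0.0
    for interval, weight in ranked:
        kept.append((interval, weight))
        covered += weight
        if covered >= coverage:
            break
    return kept, covered


if __name__ == "__main__":
    demo = [(10, 0.5), (42, 0.3), (7, 0.15), (99, 0.05)]
    kept, covered = select_by_coverage(demo, coverage=0.9)
    print(kept, covered)
```

Dropping the lightest points trades a small amount of represented execution for fewer intervals to simulate, which matches the trade-off the abstract describes.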

...ults in this paper we use an interval size of 10 million instructions. 4.1.1 Support for Variable Length Intervals. Ideally we should align interval boundaries with the code structure of a program. In [24], we examine an algorithm to produce variable length intervals aligned with the procedure call, return and loop transition boundaries found in code. A Variable Length Interval (VLI) is represented by ...
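One way to picture clustering with per-interval weights is a centroid in which each interval's vector counts in proportion to the instructions it covers. The sketch below shows only that weighted mean. The names are hypothetical, and weighting by instruction count is an assumption for illustration, not a description of the algorithm in [24].

```python
def weighted_centroid(vectors, lengths):
    """Weighted mean of interval vectors.

    vectors -- list of equal-length lists of floats, one per interval
               (hypothetical per-interval feature vectors).
    lengths -- instruction count of each interval, used as its weight
               (an assumption made for this sketch).
    """
    if len(vectors) != len(lengths) or not vectors:
        raise ValueError("need one length per vector and at least one vector")
    total = sum(lengths)
    if total <= 0:
        raise ValueError("total weight must be positive")

    dims = len(vectors[0])
    # Each dimension is the length-weighted average across intervals.
    return [
        sum(vec[d] * n for vec, n in zip(vectors, lengths)) / total
        for d in range(dims)
    ]


if __name__ == "__main__":
    # A 30-unit interval pulls the centroid three times harder than a 10-unit one.
    print(weighted_centroid([[1.0, 0.0], [0.0, 1.0]], [30, 10]))
```

With fixed-length intervals all weights are equal and this reduces to the ordinary mean.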

by
Priya Nagpurkar, Michael Hind Ch
- In The International Symposium on Code Generation and Optimization, 2006

Today’s virtual machines (VMs) dynamically optimize an application as it is executing, often employing optimizations that are specialized for the current execution profile. An online phase detector determines when an executing program is in a stable period of program execution (a phase) or is in transition. A VM using an online phase detector can apply specialized optimizations during a phase or reconsider optimization decisions between phases. Unfortunately, extant approaches to detecting phase behavior rely on offline profiling or hardware support, or are targeted toward a particular optimization. In this work, we focus on the enabling technology of online phase detection. More specifically, we contribute (a) a novel framework for online phase detection, (b) multiple instantiations of the framework that produce novel online phase detection algorithms, (c) a novel client- and machine-independent baseline methodology for evaluating the accuracy of an online phase detector, (d) a metric to compare online detectors to this baseline, and (e) a detailed empirical evaluation, using Java applications, of the accuracy of the numerous phase detectors.

An essential step in designing a new computer architecture is the careful examination of different design options. It is critical that computer architects have efficient means by which they may estimate the impact of various design options on the overall machine. This task is complicated by the fact that different programs, and even different parts of the same program, may have distinct behaviors that interact with the hardware in different ways. Researchers use very detailed simulators, which model every cycle of an executing program, to estimate processor performance. Unfortunately,

...xecution using fixed length intervals. Programs exhibit patterns of repetitive behavior, and these patterns are largely due to procedure call and looping behavior. Our software phase marker approach (Lau et al., 2006) detects recurring call chains and looping patterns and identifies the source code instructions to which they correspond. We then mark specific procedure calls and loop branches, so that when they oc...

CPU vendors are starting to explore trade-offs between die size, number of cores on a die, and power consumption, leading to performance asymmetry among cores on a single chip. For efficient utilization of these performance-asymmetric multi-core processors, application threads must be assigned to cores such that the resource needs of a thread closely match resource availability at the assigned core. This significantly complicates the task of an average programmer. The contribution of this work is a technique for automatically determining the mapping between threads and performance-asymmetric cores of a processor. Our approach, which we call phase-guided thread-to-core assignment, builds on a well-known insight that programs exhibit phase behavior. We first take code sections and group them into clusters such that each section in a cluster is likely to exhibit similar runtime characteristics. The key idea is that with this clustering, characteristics of a small number of representative sections in a cluster give insight into the behavior of the entire cluster. Thus the exhibited characteristics of the representative sections on different types of cores can be used for automating thread-to-core assignment at a lower runtime cost. Variations of our technique show up to an average 150% improvement in throughput over the stock Linux scheduler for systems with a constant feed of jobs, while maintaining comparable fairness and efficiency.

...entified as phase-transition points. Each phase-transition point is statically instrumented to insert a small code fragment, phase mark. The idea of phase marking is similar to the work by Lau et al. [19], however, we do not use a program trace to determine our phase marks and make our selections based on different criteria. A phase mark contains information about the phase type for the current sect...

Many programs go through phases as they execute. Knowing where these phases begin and end can be beneficial. For example, adaptive architectures can exploit such information to lower their power consumption without much loss in performance. Architectural simulations can benefit from phase information by simulating only a small interval of each program phase, which significantly reduces the simulation time while still yielding results that are representative of complete simulations. This paper presents a lightweight profile-based phase detection technique that marks each phase change boundary in the program’s binary at the basic block level with a critical basic block transition (CBBT). It is independent of execution windows and does not explicitly employ the notion of threshold to make a phase change decision. We evaluate the effectiveness of CBBTs for reconfiguring the L1 data cache size and for guiding architectural simulations. Our CBBT method is as effective at dynamically reducing the L1 data cache size as idealized cache reconfiguration schemes are. Using CBBTs to statically determine simulation intervals yields as low a CPI error as the well-known SimPoint method does. In addition, experimental results indicate the CBBTs’ effectiveness on both self-trained and cross-trained inputs, demonstrating the CBBTs’ stability across different program inputs.
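To make the idea of a transition-based marker concrete, the sketch below scans a basic-block trace and reports a phase boundary whenever a (previous block, next block) pair belongs to a given set of critical transitions. How that set is chosen is the substance of the paper and is not reproduced here. The function name, the integer block IDs, and the example set are all hypothetical.

```python
def phase_boundaries(trace, critical_transitions):
    """Return trace positions where a marked transition occurs.

    trace                -- sequence of basic-block IDs in execution order.
    critical_transitions -- set of (prev_block, next_block) pairs that are
                            assumed to have been selected beforehand.
    A boundary at position i means the phase changes when trace[i] starts.
    """
    boundaries = []
    for i in range(1, len(trace)):
        # Both reference points are needed: the block just left and the
        # block just entered.
        if (trace[i - 1], trace[i]) in critical_transitions:
            boundaries.append(i)
    return boundaries


if __name__ == "__main__":
    trace = [1, 1, 1, 2, 3, 3, 3, 4]
    marks = {(1, 2), (3, 4)}
    print(phase_boundaries(trace, marks))
```

Because the check is a set lookup on a pair of block IDs, it needs neither a fixed execution window nor a similarity threshold at detection time.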

...critical basic block transitions (CBBTs). CBBTs mark phase transition points in the program’s binary and are used to delineate the program phases. A CBBT can be thought of as a program’s phase marker [9, 15] that requires two reference points, a previous and a next BB, to signal a phase change. To motivate MTPD and CBBTs, consider the code in Figure 1 once more. The BB working set of the first loop is {B...

The latest trend towards performance asymmetry among cores on a single chip of a multicore processor is posing new challenges. For effective utilization of these performance-asymmetric multicore processors, code sections of a program must be assigned to cores such that the resource needs of code sections closely match resource availability at the assigned core. Determining this assignment manually is tedious, error prone, and significantly complicates software development. To solve this problem, we contribute a transparent and fully automatic process that we call phase-based tuning, which adapts an application to effectively utilize performance-asymmetric multicores. Compared to the stock Linux scheduler we see a 36% average process speedup, while maintaining fairness and with negligible overheads.

...nt for a phase type, all future phase marks for that phase type reduce to simply making appropriate core switching decisions 2. Thus, 1 The idea of phase marking is similar to the work by Lau et al. [13], however, we do not use a program trace to determine our phase marks and make our selections based on different criteria. 2 Huang et al. [14] show that basing processor adaptation on code sections ...

Utility programs, which perform similar and largely independent operations on a sequence of inputs, include such common applications as compilers, interpreters, and document parsers; databases; and compression and encoding tools. The repetitive behavior of these programs, while often clear to users, has been difficult to capture automatically. We present an active profiling technique in which controlled inputs to utility programs are used to expose execution phases, which are then marked, automatically, through binary instrumentation, enabling us to exploit phase transitions in production runs with arbitrary inputs. Experiments with five programs from the SPEC benchmark suites show that phase behavior is surprisingly predictable in many (though not all) cases. This predictability can in turn be used for optimized memory management leading to significant performance improvement.

...mply a better program or a better system. Depending on the use, one type of phase may be better than another type. Program phase analysis takes a loop, subroutine, or other code structures as a phase [3, 16, 20, 23, 25, 26]. For this experiment, we mainly consider procedure phases and follow the scheme given by Huang et al., who picked subroutines by two thresholds, θweights... [figure residue: plot with IPC axis]

...icking per-binary simulation points. SimPoint 3.0 provides support for Variable Length Intervals (VLIs), which allows intervals to represent different amounts of executed instructions as described in [4, 5]. Prior work has not proposed a good method to break a program into VLIs for architecture simulation. The approach we provide in this paper, examined in Section 3, provides the first usable approach f...

It is well known that a program execution exhibits time-varying behavior, i.e., a program typically goes through a number of phases during its execution with each phase exhibiting relatively homogeneous behavior within a phase and distinct behavior across phases. In fact, several recent research studies have been exploiting this time-varying behavior for various purposes. This paper proposes phase complexity surfaces to characterize a computer program’s phase behavior across various time scales in an intuitive manner. The phase complexity surfaces incorporate metrics that characterize phase behavior in terms of the number of phases, its predictability, the degree of variability within and across phases, and the phase behavior’s dependence on the time scale granularity.