Abstract

Providing a sufficient voltage/frequency (V/F) scaling range is critical for effective power management. However, this has become difficult because decreasing nominal operating voltages and increasing manufacturing process variability make it harder to scale the minimum operating voltage (VMIN). In this paper, we first present a resource and core scaling (RCS) technique that jointly scales (i) the resources of a processor and (ii) the number of operating cores to maximize the performance of power-constrained multi-core processors. More specifically, we uniformly scale both the resources associated with each core (e.g., L1 caches and execution units (EUs)) and those shared by all the cores (e.g., the last-level cache (LLC)) to compensate for the lack of a V/F scaling range. Under the maximum power constraint, disabling some resources allows us to increase the number of operating cores, and vice versa. We demonstrate that the best RCS configuration for a given application can improve the geometric-mean performance by 21%. Second, we propose a runtime system that predicts the best RCS configuration for a given application and adapts the processor configuration accordingly at runtime. The runtime system needs to examine only a small fraction of the runtime to predict the best RCS configuration with accuracy well over 90%, and the runtime overhead of prediction and adaptation is small. Finally, we propose to selectively scale the resources in RCS (dubbed sRCS) depending on an application's characteristics and demonstrate that sRCS can offer 6% higher geometric-mean performance than RCS, which uniformly scales the resources.
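
The trade-off between enabled resources and operating cores under a power cap can be illustrated with a minimal sketch. It is not the paper's model or runtime system: the linear power model, the Amdahl-style performance model, and every name and constant below (`config_power`, `config_speed`, `best_config`, `resource_sensitivity`, the power coefficients) are hypothetical and chosen only to show how an exhaustive search over RCS-like configurations might look.

```python
from math import prod


def config_power(cores, level, core_base=1.0, core_scaled=3.0, llc_scaled=4.0):
    """Toy power model (assumption): each core has a fixed part plus a part
    proportional to the fraction of enabled resources; the shared LLC power
    also scales with that fraction."""
    return cores * (core_base + core_scaled * level) + llc_scaled * level


def config_speed(cores, level, parallel_fraction, resource_sensitivity):
    """Toy Amdahl-style performance model (assumption): per-core speed falls
    as resources are disabled, by an application-specific sensitivity."""
    per_core = level ** resource_sensitivity
    time = (1.0 - parallel_fraction) / per_core \
        + parallel_fraction / (cores * per_core)
    return 1.0 / time


def best_config(max_cores, levels, power_budget,
                parallel_fraction, resource_sensitivity):
    """Return the (cores, level) pair with the highest modeled speed among
    all configurations that fit under the power budget."""
    feasible = [
        (cores, level)
        for cores in range(1, max_cores + 1)
        for level in levels
        if config_power(cores, level) <= power_budget
    ]
    return max(
        feasible,
        key=lambda c: config_speed(c[0], c[1],
                                   parallel_fraction, resource_sensitivity),
    )


def geometric_mean(values):
    """Geometric mean, the aggregate the abstract uses for reported gains."""
    return prod(values) ** (1.0 / len(values))


if __name__ == "__main__":
    levels = (0.25, 0.5, 0.75, 1.0)
    budget = config_power(4, 1.0)  # budget of 4 cores with all resources on
    # Highly parallel, resource-insensitive application (hypothetical).
    print(best_config(16, levels, budget, 0.99, 0.2))
    # Mostly serial, resource-sensitive application (hypothetical).
    print(best_config(16, levels, budget, 0.5, 1.0))
```

In this toy setting, a highly parallel application that is insensitive to per-core resources ends up with many cores and few resources each, while a mostly serial, resource-sensitive application keeps fewer cores with all resources enabled. That mirrors the "and vice versa" in the abstract, but only qualitatively.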

Original language

English (US)

Title of host publication

PACT 2014 - Proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques
