Scientists and engineers always want faster and faster computers, and in general faster computers require more energy. With rising energy costs, there is an urgent need for energy-efficient computing at many different levels. This tutorial focuses on energy-aware resource management in heterogeneous parallel and distributed computing systems. We address the problem of assigning tasks to machines in a heterogeneous computing environment, i.e., a collection of machines with different computational capabilities and energy-usage characteristics. These machines execute a workload composed of different tasks, where the tasks have diverse computational and energy requirements. The execution time and energy consumption of each task on each machine depend on how the task’s computational requirements interact with the machine’s capabilities.
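To make the task-machine model concrete, the following minimal sketch represents it with two hypothetical matrices: `ETC[t][m]`, an estimated execution time of task t on machine m, and `APC[t][m]`, an assumed average power draw of machine m while it runs task t. The helper `evaluate` is also hypothetical, not taken from the tutorial. It assumes that tasks on the same machine run back to back and it ignores idle power, so it is a simplification rather than the tutorial's own energy model.

```python
from typing import List, Tuple

# Hypothetical example data: 3 tasks x 2 machines.
# ETC[t][m]: estimated execution time (seconds) of task t on machine m.
ETC = [[4.0, 2.0],
       [3.0, 6.0],
       [5.0, 5.0]]
# APC[t][m]: assumed average power (watts) of machine m while running task t.
APC = [[100.0, 250.0],
       [120.0, 90.0],
       [110.0, 200.0]]


def evaluate(assignment: List[int],
             etc: List[List[float]],
             apc: List[List[float]]) -> Tuple[List[float], float]:
    """Return (finishing time of each machine, total energy in joules).

    assignment[t] is the index of the machine that runs task t.
    Tasks on the same machine run back to back; idle power is ignored.
    """
    finish = [0.0] * len(etc[0])
    energy = 0.0
    for task, machine in enumerate(assignment):
        t = etc[task][machine]
        finish[machine] += t                # machine is busy for t more seconds
        energy += t * apc[task][machine]    # energy = time x average power
    return finish, energy


finish, energy = evaluate([1, 0, 0], ETC, APC)
makespan = max(finish)
print(finish, energy, makespan)
```

For the assignment `[1, 0, 0]`, machine 0 finishes at time 8.0 and machine 1 at 2.0, the total energy is 1410.0 J, and the makespan is the larger of the two finishing times, 8.0.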

In heterogeneous parallel and distributed computing systems, a critical research problem is the energy-aware allocation of resources to tasks so as to optimize some performance objective, possibly subject to a given constraint. Often, these allocation decisions must be made under uncertainty in relevant system parameters, such as the data-dependent execution time of a given task on a given machine, so it is important for system performance to be robust against this uncertainty. We have designed models for defining, deriving, and quantifying the degree of robustness of a resource allocation using history-based stochastic (probabilistic) information about the execution times of tasks on different machines.
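One common way to turn history-based stochastic execution-time information into a robustness value is to describe each task's execution time on its machine as a discrete probability mass function (PMF), convolve the PMFs of the tasks queued on each machine to obtain that machine's completion-time distribution, and then compute the probability that every machine finishes by a common deadline. The sketch below assumes integer time units and independent execution times across tasks and machines; these assumptions and all helper names (`convolve`, `prob_finish_by`, `robustness`) are illustrative and not necessarily those of the models presented in this tutorial.

```python
from collections import defaultdict
from typing import Dict, List

# A discrete PMF: execution time (integer time units) -> probability.
PMF = Dict[int, float]


def convolve(a: PMF, b: PMF) -> PMF:
    """PMF of the sum of two independent random variables."""
    out: Dict[int, float] = defaultdict(float)
    for ta, pa in a.items():
        for tb, pb in b.items():
            out[ta + tb] += pa * pb
    return dict(out)


def prob_finish_by(queue: List[PMF], deadline: int) -> float:
    """P(total execution time of the tasks queued on one machine <= deadline)."""
    total: PMF = {0: 1.0}  # an empty machine finishes at time 0
    for pmf in queue:
        total = convolve(total, pmf)
    return sum(p for t, p in total.items() if t <= deadline)


def robustness(queues: List[List[PMF]], deadline: int) -> float:
    """P(every machine finishes by the common deadline).

    Assumes completion times on different machines are independent,
    so the per-machine probabilities can be multiplied.
    """
    result = 1.0
    for queue in queues:
        result *= prob_finish_by(queue, deadline)
    return result


# Hypothetical history-based PMFs of three tasks on their assigned machines.
task_a = {2: 0.5, 4: 0.5}
task_b = {3: 0.5, 5: 0.5}
task_c = {6: 0.8, 10: 0.2}

# Machine 0 runs tasks a and b; machine 1 runs task c.
r = robustness([[task_a, task_b], [task_c]], deadline=8)
print(round(r, 3))
```

Here machine 0 meets the deadline of 8 with probability 0.75 and machine 1 with probability 0.8, so the robustness of this allocation is about 0.6.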

Energy-aware resource allocation heuristics for four example environments will be presented. The first two environments involve static heuristics, which are executed off-line, where a collection of independent tasks (a “bag-of-tasks”) is to be assigned to the machines of a heterogeneous computing system. In the first of these environments, the goal is to minimize the energy needed to complete these tasks subject to a robustness constraint based on meeting a common deadline. The second environment calls for resource allocation heuristics that maximize robustness, which in this case is the probability that the workload finishes by a common deadline, while maintaining a specified probability that the energy consumed stays within a given energy budget.
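As an illustration of the first static environment, here is a simple greedy heuristic; it is our own sketch, not one of the heuristics presented in the tutorial. Tasks are considered in a fixed order, and each task is placed on the machine with the lowest expected energy among those that keep the probability of all machines meeting the common deadline at or above a threshold `rho`. The PMF inputs, `power` values, and function names are hypothetical, and execution times are assumed to be independent.

```python
from collections import defaultdict
from typing import Dict, List, Optional

PMF = Dict[int, float]  # execution time (integer units) -> probability


def convolve(a: PMF, b: PMF) -> PMF:
    """PMF of the sum of two independent execution times."""
    out: Dict[int, float] = defaultdict(float)
    for ta, pa in a.items():
        for tb, pb in b.items():
            out[ta + tb] += pa * pb
    return dict(out)


def p_by(pmf: PMF, deadline: int) -> float:
    """Probability that a completion-time PMF is <= deadline."""
    return sum(p for t, p in pmf.items() if t <= deadline)


def greedy_min_energy(pmfs: List[List[PMF]], power: List[List[float]],
                      deadline: int, rho: float) -> Optional[List[int]]:
    """Assign tasks in order, each to the lowest-expected-energy machine
    that keeps P(all machines finish by deadline) >= rho.

    pmfs[t][m]: execution-time PMF of task t on machine m.
    power[t][m]: average power of machine m while running task t.
    Returns assignment[t] = machine index, or None if some task cannot be
    placed without violating the robustness constraint.
    """
    n_machines = len(pmfs[0])
    completion = [{0: 1.0} for _ in range(n_machines)]  # per-machine PMFs
    assignment: List[int] = []
    for t, task_pmfs in enumerate(pmfs):
        best, best_energy = None, float("inf")
        for m in range(n_machines):
            trial = convolve(completion[m], task_pmfs[m])
            # Robustness if task t were placed on machine m.
            r = p_by(trial, deadline)
            for other in range(n_machines):
                if other != m:
                    r *= p_by(completion[other], deadline)
            if r < rho:
                continue
            expected_time = sum(x * p for x, p in task_pmfs[m].items())
            energy = expected_time * power[t][m]
            if energy < best_energy:
                best, best_energy = m, energy
        if best is None:
            return None  # no machine keeps the robustness constraint
        completion[best] = convolve(completion[best], task_pmfs[best])
        assignment.append(best)
    return assignment


# Hypothetical 2-task, 2-machine example.
pmfs = [[{2: 1.0}, {1: 1.0}],
        [{3: 0.5, 5: 0.5}, {2: 1.0}]]
power = [[50.0, 200.0],
         [50.0, 200.0]]
print(greedy_min_energy(pmfs, power, deadline=6, rho=0.9))
print(greedy_min_energy(pmfs, power, deadline=6, rho=0.5))
```

With a deadline of 6, a threshold of 0.9 pushes the second task onto the faster but more power-hungry machine 1 (assignment `[0, 1]`), while relaxing the threshold to 0.5 lets both tasks use the low-energy machine 0 (`[0, 0]`). Because adding a task can only lower a machine's probability of meeting the deadline, a partial allocation that violates the threshold cannot become feasible later, so the function returns `None` as soon as no machine satisfies it.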

The last two environments involve dynamic heuristics, which are executed on-line in situations where tasks must be assigned resources as they arrive in the system. In the first of these environments, the goal is to complete as many tasks as possible by their individual deadlines, subject to a constraint on total energy consumption. In the second dynamic environment, each task has an associated utility function that is monotonically decreasing over time. This utility function represents the value of a task as a function of its completion time, and the goal of the heuristics is to maximize the total utility earned from all task completions over an interval of time while satisfying an energy constraint.
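For the second dynamic environment, the sketch below shows one simple on-line policy, assumed here purely for illustration: when a task arrives, it is mapped to the machine that yields the most utility per joule among the machines whose energy cost still fits in the remaining budget, and it is dropped if no such machine earns positive utility. The linear-decay utility function, the `Task` fields, and the utility-per-energy rule are all hypothetical choices, not the heuristics from the tutorial.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Task:
    arrival: float
    max_utility: float
    decay_time: float    # hypothetical: utility reaches 0 this long after arrival
    etc: List[float]     # execution time on each machine (positive)
    power: List[float]   # average power on each machine (positive)

    def utility(self, completion: float) -> float:
        """Utility earned if the task completes at `completion` (non-increasing)."""
        elapsed = completion - self.arrival
        return max(0.0, self.max_utility * (1.0 - elapsed / self.decay_time))


def schedule_online(tasks: List[Task], num_machines: int,
                    energy_budget: float) -> Tuple[List[Optional[int]], float, float]:
    """Map tasks as they arrive; return (decisions, total utility, energy left).

    decisions[i] is the machine chosen for the i-th arriving task, or None
    if the task was dropped.
    """
    ready = [0.0] * num_machines  # time at which each machine becomes free
    remaining = energy_budget
    total_utility = 0.0
    decisions: List[Optional[int]] = []
    for task in sorted(tasks, key=lambda t: t.arrival):
        best_m, best_ratio = None, 0.0
        for m in range(num_machines):
            energy = task.etc[m] * task.power[m]
            if energy > remaining:
                continue  # would violate the energy constraint
            finish = max(task.arrival, ready[m]) + task.etc[m]
            ratio = task.utility(finish) / energy  # utility earned per joule
            if ratio > best_ratio:
                best_m, best_ratio = m, ratio
        if best_m is None:
            decisions.append(None)  # no affordable machine earns positive utility
            continue
        finish = max(task.arrival, ready[best_m]) + task.etc[best_m]
        ready[best_m] = finish
        remaining -= task.etc[best_m] * task.power[best_m]
        total_utility += task.utility(finish)
        decisions.append(best_m)
    return decisions, total_utility, remaining


tasks = [Task(0.0, 10.0, 8.0, [2.0, 1.0], [50.0, 200.0]),
         Task(0.0, 10.0, 8.0, [2.0, 1.0], [50.0, 200.0]),
         Task(1.0, 10.0, 8.0, [2.0, 1.0], [50.0, 200.0])]
print(schedule_online(tasks, num_machines=2, energy_budget=250.0))
```

With a 250 J budget, both tasks arriving at time 0 go to the low-power machine 0 and earn 7.5 and 5.0 utility units; the second task waits for machine 0 because the faster machine 1 would exceed the remaining 150 J. The third task is dropped because only 50 J remain.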

Finally, we provide an analysis framework that allows a system administrator to investigate the tradeoffs between minimizing system energy consumption and optimizing the computing performance achieved by a system, which are typically conflicting goals. This can be modeled as a bi-objective optimization problem. We present two examples of a method for creating a set of different resource allocations that illustrates these tradeoffs. In the first example, the performance metric is the system utility described above, which is to be maximized. In the second example, the performance metric is makespan, the total time it takes for all the tasks in a batch to finish executing, which is to be minimized.
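The tradeoff analysis rests on Pareto dominance: an allocation is of interest only if no other allocation is at least as good in both objectives and strictly better in one. The minimal sketch below filters a list of candidate allocations, each summarized by a hypothetical (energy, makespan) pair with both objectives minimized, down to its non-dominated set. It covers only this filtering step; how the candidate allocations are generated (for example, by a multi-objective genetic algorithm) is outside its scope, and the function name is our own. For the utility example, where utility is maximized, one can pass (energy, -utility) pairs to the same function.

```python
from typing import List, Tuple


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the non-dominated (energy, makespan) pairs, both minimized.

    Points are scanned in order of increasing energy; a point is kept only
    if its makespan is strictly lower than that of every point with lower
    or equal energy already seen. Duplicate points are reported once.
    """
    front: List[Tuple[float, float]] = []
    best_makespan = float("inf")
    for energy, makespan in sorted(points):
        if makespan < best_makespan:
            front.append((energy, makespan))
            best_makespan = makespan
    return front


# Hypothetical (energy, makespan) values of five candidate allocations.
candidates = [(10, 5), (8, 7), (12, 4), (9, 7), (11, 6)]
print(pareto_front(candidates))
```

Of the five candidates, (9, 7) and (11, 6) are dominated, leaving the front `[(8, 7), (10, 5), (12, 4)]`: each step along it buys a shorter makespan with more energy.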

The resource management approaches presented can be applied to a variety of computing and communication system environments, including parallel, distributed, cluster, grid, Internet, cloud, embedded, multicore, content distribution networks, wireless networks, and sensor networks. Furthermore, the approaches can be used with many different system performance metrics and constraints.

LEARNING OBJECTIVES

Learn how to create energy-aware resource management methods that use system energy consumption as a performance measure or a constraint.

Learn how to use bi-objective optimization to derive sets of resource allocation solutions that can be used to analyze the tradeoffs between the conflicting goals of minimizing energy consumption and optimizing system computing performance.

INTENDED AUDIENCE

This course is intended for faculty, graduate students, engineers, and scientists who want to learn how to model and manage resources in parallel and distributed computing systems (including clusters and clouds) in a way that is energy-aware. In particular, energy can be used as a constraint when trying to optimize a system computing performance metric, or energy can be optimized while meeting a computing performance constraint.

BIOGRAPHY OF INSTRUCTOR

H. J. Siegel has been the George T. Abell Endowed Chair Distinguished Professor of Electrical and Computer Engineering at Colorado State University (CSU) since 2001, where he is also a Professor of Computer Science. From 2002 to 2013, he was the first Director of the CSU Information Science and Technology Center (ISTeC), a university-wide organization for enhancing CSU’s activities pertaining to the design and innovative application of computer, communication, and information systems. From 1976 to 2001, he was a professor in the School of Electrical and Computer Engineering at Purdue University. He received two B.S. degrees from the Massachusetts Institute of Technology (MIT), and the M.A., M.S.E., and Ph.D. degrees from Princeton University. He is a Fellow of the IEEE and a Fellow of the ACM. Prof. Siegel has co-authored over 420 published technical papers in the areas of parallel and distributed computing and communications, which have been cited over 12,000 times according to Google Scholar. He was a Coeditor-in-Chief of the Journal of Parallel and Distributed Computing, and was on the Editorial Boards of the IEEE Transactions on Parallel and Distributed Systems and the IEEE Transactions on Computers. For more information, please see www.engr.colostate.edu/~hj