Transcription

iswitch: Coordinating and Optimizing Renewable Energy Powered Server Clusters

Chao Li, Amer Qouneh and Tao Li
Intelligent Design of Efficient Architectures Laboratory (IDEAL)
Department of Electrical and Computer Engineering, University of Florida

Abstract

Large-scale computing systems such as data centers are facing increasing pressure to cap their carbon footprint. Integrating emerging clean energy solutions into computer system design therefore gains great significance in the green computing era. While some pioneering work on tracking a variable power budget shows promising energy efficiency, it is not suitable for data centers because it offers no performance guarantee when renewable generation is low and fluctuating. In addition, our characterization of wind power behavior reveals that data centers designed to track the intermittent renewable power incur up to 4X performance loss due to inefficient and redundant load-matching activities. As a result, mitigating operational overhead while maintaining the desired energy utilization becomes the most significant challenge in managing server clusters on intermittent renewable energy generation. In this paper we take a first step in digging into the operational overhead of renewable energy powered data centers. We propose iswitch, a lightweight server power management scheme that follows renewable power variation characteristics, leverages existing system infrastructure, and applies a supply/load cooperative scheme to mitigate the performance overhead. Compared with state-of-the-art renewable energy driven system designs, iswitch mitigates average network traffic by 75% and peak network traffic by 95%, and reduces job waiting time by 8%, while still maintaining 96% renewable energy utilization. We expect that our work can help computer architects make informed decisions on sustainable and high-performance system design.

1. Introduction

Today, cloud computing is redefining the IT infrastructure.
Data centers have become essential to the operation of business, academic, and governmental institutions. Nevertheless, the power-provisioning problem is challenging data center designers, as the environmental impact of IT becomes a growing concern worldwide. It has been shown that worldwide data centers run the risk of doubling their energy consumption every 5 years [1]. In a recent report on the carbon impact of cloud computing, the environmental activist group Greenpeace called on data center operators to make renewable energy a priority as more data centers are built to meet cloud needs [2]. In addition, governments also impose carbon taxes on energy-hungry IT companies while giving federal tax credits (e.g., 30% of the total cost) for using renewable energy. Consequently, there has been an increasing push towards the vision of renewable energy powered, sustainable computer system design. In this study, we investigate an emerging data center design scheme that integrates on-site renewable energy sources into the data center infrastructure. Such a design scheme has recently drawn considerable attention as the IT industry starts to assume responsibility for supporting long-term computing system sustainability. Internet giants such as Google, Microsoft and Yahoo! all power part of their data centers using renewable energy resources. Using grid utilities as backup, many Web hosting service providers power their data centers with on-site renewable energy sources as well [3, 4, 5, 6]. With these green computing initiatives, each data center is able to eliminate nearly 2, lbs or more of carbon dioxide emissions per year [3, 4]. While many research efforts focus on reducing idle server power [7], lowering provisioned power capacity [8], and optimizing power allocation [9, 10, 11], designing renewable energy powered data centers is still challenging and requires careful exploration.
Due to the intermittent nature of renewable energy, existing designs typically use on-site renewable generation to compensate for part of the total data center power requirement. When renewable energy contributes a large portion of the load power demand (e.g., > 5%), variations in renewable power supply have a significant impact on load operation [2]. Existing power management schemes miss the opportunity to harvest renewable generation since they typically assume a fixed and predictable power supply and cannot handle power variation gracefully. Recent proposals leverage load tuning mechanisms (e.g., DVFS, CPU power states) to track the renewable power variation [3, 4] but incur unnecessary load tuning activities. Over-tuning the load power degrades system response time and provides very limited efficiency return. Figure 1 shows the tradeoff we found between power tuning overhead and energy utilization. The evaluated system tracks the renewable power budget whenever renewable generation decreases (to avoid brownout); it tracks power supply surges with a pre-defined coverage factor (CF). As shown in Figure 1-a, compared to no tracking (i.e., CF = 0), always tracking the power variation (i.e., CF = 1) increases the load tuning overhead by 2X. Nevertheless, the range of energy utilization return is less than 5%, as shown in Figure 1-b.

Figure 1: Power management overhead vs. renewable energy utilization. (a) Normalized control overhead; (b) average energy utilization. Always tracking the power surge yields less than a 5% energy utilization improvement but experiences 2X control overhead, which affects execution latency, downtime, and efficiency. Both figures show average values across four different renewable energy sites, which are detailed in Section 6.

In this paper, we explore the design tradeoffs between energy utilization and load tuning overhead in renewable energy driven computing systems. We propose iswitch, a power management scheme that maintains a desirable balance between renewable energy utilization and data center performance. The novelty of our design is two-fold. First, iswitch is a supply-aware power management scheme. It applies the appropriate power management strategy for each wind variation scenario to achieve the best design tradeoff. Second, iswitch has a built-in supply/load cooperative optimization mechanism that minimizes the performance degradation due to load power tuning overhead while still maintaining high renewable energy utilization and low cost. Although we describe our design in the context of wind power supply, it applies to a variety of intermittent renewable energy sources. This paper makes the following contributions:

Design: We propose the iswitch control architecture, an application-independent hierarchical load tuning scheme that leverages load migration to best utilize renewable energy generation.

Characterization: Our characterization of renewable power variability and data center load fluctuation reveals that power tracking can be done in a less frequent, light-weight manner. In this case, we can significantly reduce the load tuning overhead with negligible efficiency degradation.
Optimization: We propose a supply/load cooperative optimization that not only avoids redundant load tuning activities invoked by severe renewable power supply variation, but also minimizes unnecessary power control activities invoked by stochastic data center load fluctuation. Compared to state-of-the-art renewable energy driven designs, iswitch could reduce job waiting time by 8%, mitigate average network traffic by 75%, and cut rush-hour traffic by 95%, while still maintaining 96% renewable energy utilization.

The rest of this paper is organized as follows: Section 2 provides background and motivation. Section 3 characterizes wind power variation. Section 4 presents an overview of iswitch and its control architecture. Section 5 introduces the two key constituents of iswitch optimization. Section 6 describes our evaluation framework. Section 7 presents experimental results. Section 8 discusses related work and Section 9 concludes the paper.

2. Background and Motivation

Renewable energy supply (RES) is drawing growing attention in today's IT industry. In this section, we describe the motivations of renewable energy driven design and discuss why conventional solutions limit the development of sustainable IT.

2.1. Renewable Energy Driven Design

In spite of the intermittent nature of renewable energy sources, designing renewable energy powered data centers has many benefits beyond a low carbon footprint. For instance, renewable power supplies are highly modular in that their capacity can be increased incrementally to match gradual load growth [5, 6]. This greatly reduces the over-provisioning loss of a data center, since it takes a long time for the server load to catch up with the upgraded provisioning capacity. In addition, the latency between initiation and construction (a.k.a. construction lead time) of renewable generation is significantly shorter than that of conventional power plants, reducing the financial and regulatory risks [5].
Moreover, the price and availability of renewable resources remain stable, simplifying long-term planning for IT companies [5, 7].

2.2. Problem with Conventional Solutions

Conventionally, large-scale battery farms can be used to regulate renewable power supply fluctuation. Such an approach requires a large additional capital investment and is not energy-efficient: the round-trip energy loss of batteries ranges between 5%~25% [8]. Furthermore, frequently charging and discharging a battery accelerates its aging and quickly wears it out [9]. This further increases the environmental burden (i.e., the recycling problem) and the downtime for maintenance. Alternatively, feedback mechanisms such as net metering [2] directly connect the on-site renewable energy source to the local utility grid to gain high power provisioning availability. Nevertheless, net metering is still in its infancy, and aggressively relying on it can be hazardous to utility operation. This is because grid operators are forced to switch their power stations frequently between operation and spinning-standby modes to meet unexpected feedback power surges. In addition, to ensure stability, the maximum renewable power penetration of the

transmission line also has a limit [5]. In the foreseeable future, these problems can be a major hurdle for going green. Waiting for the grid to become smart will unnecessarily delay the long-term goal of sustainable IT. The motivation of this work is that the IT facility itself can be an enabler of sustainability and efficiency. Instead of adding additional power provisioning units and server clusters, our technique leverages existing data center infrastructure and load migration mechanisms to manage the time-varying renewable power. As an alternative to relying entirely on the power grid or energy storage, we exploit intelligent data center self-tuning to manage the IT power provisioning. By reducing the dependence on large-scale batteries and utility power grids, our design improves data center sustainability at low cost while providing operators with more active control of their server clusters.

3. Implications of Wind Power Variability and Intermittency

While the importance of managing renewable energy intermittency is recognized by recent studies [3, 4], characterizing power variation behavior is still an unexplored area in the context of data center design. In this study we primarily focus on wind power variation since wind energy is cheaper and is also the most widely used renewable power technology for large-scale facilities [5]. In this section, we first discuss wind power characteristics. We then demonstrate three critical power management problems in wind energy driven data centers and the associated optimization opportunities.

3.1. Wind Power Characteristics

A wind turbine generates electrical power by extracting kinetic energy from the air flow. While operating, the turbine converts wind energy to mechanical power through a rotor with multiple blades. Figure 2 shows the output characteristics of a GE wind turbine, whose power curve is divided into three regions by the designated operating wind speeds.
The cut-in speed is the minimum speed at which the rotor and blades start to rotate. The cut-off speed is the wind speed at which the turbine ceases generation and shuts down to protect the blade assembly. In Figure 2, we refer to the three regions as the intermittent power outage period (Region-I), the variable power generation period (Region-II) and the stable power generation period (Region-III), respectively. In Region-I, wind power is intermittently unavailable because the wind speed is either too low or too high. In Region-II, the mechanical power delivered to the turbine generator is given by p = 0.5ρAv³Cp [5], where ρ is the air density, A is the swept area of the blades, v is the wind speed and Cp is the power coefficient factor. In Region-III, the wind turbine operates at its designated rated power. Wind power has the highest variability in Region-II. To understand this, we show the probability distribution of wind speed in Figure 3. The variations in wind speed are typically described by the Weibull distribution [5]:

f(v) = (k/c)(v/c)^(k-1) e^(-(v/c)^k),  v ∈ [0, ∞)    (1)

In Equation 1, k is the shape parameter, c is the scale parameter and v is the wind speed. At most wind farm sites, the wind speed follows the Weibull distribution with k = 2, which is specifically known as the Rayleigh distribution [5]. As shown in Figure 3, the Rayleigh distribution function in Region-II is not monotonic. In this region, the wind speed takes values across a wide range with comparably high probability. As a result, the wind turbine is more likely to encounter time-varying wind speed in Region-II. In addition, the wind turbine output curve is steep due to the cubic relation between wind power and wind speed. In this case, a small change in wind speed can lead to a large fluctuation in wind generation. Therefore, renewable power variability is typically significant in Region-II.

3.2. Power Management Regions

Figure 4 shows real traces of wind power supply and data center power demand.
It illustrates the aforementioned three power generation scenarios, namely, the intermittent wind power outage period (Region-I), low renewable generation with frequent fluctuation (Region-II), and full renewable generation with relatively stable output (Region-III). A similar region partitioning method is also applicable to other intermittent renewable energy sources such as solar power and tidal power. In the following paragraphs we discuss data center design considerations for each region.

Region-I: tune wisely. During the low generation period (i.e., Region-I), it is wise to shift data center load from the renewable energy supply side to utility power. To tune the load power footprint, existing practices either put servers into low power states or apply power cycling techniques [4] to the hardware. Although these approaches show impressive power control capability, they sacrifice computing throughput. In addition, it typically takes a long time (tens of minutes in our observations) for renewable energy generation to resume. As a result, for mission-critical systems, putting servers into a sleep state and waiting for the renewable energy to resume is unwise, especially for parallel computing machines with inter-node workload dependency.

Region-II: track wisely. Whenever the load power fluctuates or renewable energy generation varies, load matching is performed as a common practice to handle the power discrepancy [3, 4]. In Region-II, the wind generation oscillates severely. The load power tuning is largely a result of the power supply variation, as shown in Figure 4. However, aggressively matching the load to the supply yields little energy benefit while disturbing normal server operation and degrading the performance of parallel workloads. Therefore, choosing the appropriate tracking timing becomes especially important.
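To make the three regions concrete, the piecewise turbine model and cubic power law of Section 3.1 can be sketched in a few lines of Python. This is an idealized illustration: all turbine parameters below (swept area, power coefficient, rated power, cut-in/rated/cut-off speeds, and the Rayleigh scale used in the example) are invented placeholders, not figures from this paper.

```python
import math

def rayleigh_pdf(v, c):
    """Wind speed pdf: the Weibull distribution of Equation 1 with shape k = 2."""
    k = 2.0
    return (k / c) * (v / c) ** (k - 1) * math.exp(-((v / c) ** k))

def wind_power(v, rho=1.225, area=5000.0, cp=0.4, rated_p=1.5e6,
               cut_in=3.0, rated_v=12.0, cut_off=25.0):
    """Idealized piecewise turbine output (W) over the three regions."""
    if v < cut_in or v >= cut_off:     # Region I: too slow or too fast, no output
        return 0.0
    if v < rated_v:                    # Region II: p = 0.5 * rho * A * v^3 * Cp
        return min(rated_p, 0.5 * rho * area * v ** 3 * cp)
    return rated_p                     # Region III: designated rated power
```

Because output grows with v³ in Region-II, a 20% rise in wind speed (say 5 → 6 m/s) yields roughly 73% more power (1.2³ ≈ 1.728), which is why small speed changes produce large generation swings in that region.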

Figure 2: Wind power output characteristics (wind power in MW vs. wind speed in m/s, with cut-in, rated and cut-off speeds delimiting Regions I-III). The cubic relationship between power and wind speed increases the output variability.

Figure 3: Wind speed variations in most wind farms are typically best described by the Rayleigh distribution.

Figure 4: Power variation scenarios in wind energy driven data centers (x-axis: minutes). Region-II and Region-III are two critical regions that require special care.

Region-III: schedule wisely. When the renewable energy is relatively stable, frequent load fluctuation contributes to a large number of load tuning operations. In Figure 4 (Region-III), although the data center power has a relatively small dynamic range, frequent variation invokes a large number of back-and-forth load migration operations. These tuning activities contribute little to the overall renewable energy utilization but increase network traffic significantly. A well-designed job scheduler that mitigates load power fluctuation helps lower this overhead. We believe a renewable energy powered data center will frequently experience the aforementioned three power management regions throughout its lifetime. To improve overall quality of service, the best design practice should consider these scenarios together and provide a cooperative power management scheme.

Figure 5: iswitch load power balancing in wind energy powered server clusters. It does not require increasing the number of servers to handle workload surges.

4. An Overview of iswitch Architecture

In this section we propose iswitch, a holistic data center coordination and optimization scheme that ensures high renewable energy utilization and low operational overhead.
As shown in Figure 5, iswitch is designed to provide autonomic load power balancing between the conventional utility grid and renewable energy generation. In Figure 5, the on-site renewable power supply provides the data center with clean energy through a separate power line. We choose not to synchronize renewable power to the grid because the fluctuating nature of renewable energy often challenges grid stability [2]. On the other hand, although one could leverage dual-corded servers to utilize two power supplies simultaneously, doing so is not energy-efficient when the computing load is low [22]. In this study, iswitch explores computing load migration as an alternative to energy source integration.

4.1. Switching Mechanism

The basic idea behind iswitch is switching, or the operation of performing a switch. In this study, a switch is defined as a load migration event that leads to a redistribution of load power between different power supplies. As an alternative to load power throttling, iswitch intelligently shifts the computing load from one energy source to another to achieve the best load power matching. We use virtual machine (VM) live migration to implement iswitch since it is the most convenient way to perform load power shifting in a virtualized computing environment. Existing virtual machine power metering [23] also eases the monitoring and coordination of each individual VM. Note that in Figure 5, iswitch does not require increasing the number of servers to meet workload surges. In emergency scenarios, we use backup energy storage to temporarily support the load. According to Fan et al. [24], server clusters can spend more than 80% of the time within 80% of their peak power, and 98% of the time within 90% of their peak power. Therefore, the chance of a workload-triggered emergency is small.
In this study, we assume that the number of renewable energy powered servers is less than 40% of the overall deployed machines, since a data center typically consumes about 60% of its actual peak power [24]. In this case, even if the wind power is extremely low, the utility grid can still take over most of the load.

Figure 6: iswitch global (facility-level) control. Driven by the RES budget, the central switch controller coordinates the iswitch scheduler, the central power controller, the load history records, and the cluster-level switch controllers (alongside PDU-level power controllers and cluster-level utilization monitors) that manage the server racks.

4.2. Control Mechanism

To handle the time-varying, intermittent renewable power, iswitch dynamically allocates and de-allocates (i.e., "switches") the renewable energy powered server load. The supply/load variability makes switch tuning challenging since the control should 1) globally respect the time-varying renewable budget and 2) locally avoid any power failure induced by load fluctuation. To this end, iswitch uses a hierarchical switching control scheme, which can be easily incorporated into existing hierarchical power management methods such as [25].

Facility level: Figure 6 shows a global view of the iswitch control mechanism. The switching operation is controlled by a central switch controller (CSC), which communicates with a central power controller (a typical facility-level data center power controller), a switch scheduler and multiple cluster-level switch controllers. The CSC performs switch tuning based on the discrepancy between the load power consumption and the RES budget. Whenever needed, switching operations are scheduled by the switch scheduler, which stores profiling information for each server load and optimizes switching using load history records.

Cluster level: The switching allocation is assigned to local computing nodes via cluster-level switch controllers, which are counterparts to PDU-level power controllers. The cluster-level switch controller collects the switching outcomes (i.e., the number of switching operations accomplished/failed) of each local computing node and feeds the information back to the CSC for switch scheduler updates. The cluster-level controller improves the manageability of dynamic switching and reduces the overhead of CSC communication traffic.
Rack level: As shown in Figure 7, a rack-level switch controller executes power supply switching and sends the execution outcomes to the CSC via a cluster-level switch controller. It also interacts with the rack-level power controller throughout the switching process to avoid any brownout. For example, whenever the power consumption of a server rack reaches its local renewable power budget, the power controller signals the rack-level switch controller to throttle the switching activities. In addition, the rack-level power controller is able to perform power capping by manipulating the voltage and frequency modulator of the server. This prevents overloading if power switching cannot handle a load surge in time.

Figure 7: iswitch local (rack-level) control

5. Optimizing Load Tuning Activities

This section proposes the supply/load cooperative optimization scheme of iswitch. Our technique features a lazy tracking scheme on the supply side and a power demand smoothing scheme on the load side. This cooperative optimization is readily supported by existing data center architectures and is orthogonal to other system-level control and workload optimizations. The switch scheduler is the key architectural component of iswitch, as shown in Figure 8. It monitors the power provisioning status (i.e., powered by renewable energy or by the utility grid) of each server load (i.e., VM). All the running loads within each cluster are indexed consecutively in a switch allocation buffer (SAB). A switching history table stores the switching frequency of each load. An optimizer computes the optimal switching assignment and a tracking module initiates the switching process. To make load tuning decisions, the iswitch scheduler needs profiling information such as server utilization data from the load history table. The central power controller invokes scheduling activities in response to variations in renewable power supply and load power demand.
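The facility/cluster/rack hierarchy of Section 4.2 can be approximated with a small simulation. The sketch below is a simplification under stated assumptions: uniform per-server power, whole-server switch granularity, integer-watt arithmetic, and no cluster-level layer between the CSC and the racks; all class names, method names and parameters are invented for illustration and are not the paper's implementation.

```python
class RackSwitchController:
    """Rack-level sketch: executes supply switching for one rack while
    respecting the rack's local renewable power budget (integer watts)."""
    def __init__(self, on_res, budget_w, per_server_w):
        self.on_res = on_res          # servers currently on the renewable feed
        self.budget_w = budget_w      # local renewable power budget
        self.per_w = per_server_w     # assumed uniform per-server power

    def execute(self, delta):
        """Move `delta` servers onto (+) or off (-) the renewable feed;
        returns how many operations were actually accomplished."""
        if delta > 0:
            headroom = self.budget_w // self.per_w - self.on_res
            done = min(delta, max(headroom, 0))   # throttle to avoid brownout
        else:
            done = -min(-delta, self.on_res)
        self.on_res += done
        return done

class CentralSwitchController:
    """CSC sketch: compares the RES budget against the renewable-powered
    load and dispatches switch operations across the racks."""
    def __init__(self, racks, per_server_w):
        self.racks = racks
        self.per_w = per_server_w

    def tune(self, res_budget_w):
        load_w = sum(r.on_res for r in self.racks) * self.per_w
        delta = (res_budget_w - load_w) // self.per_w   # servers to (de)allocate
        outcomes = []
        for rack in self.racks:
            if delta == 0:
                break
            done = rack.execute(delta)
            outcomes.append(done)
            delta -= done
        return outcomes
```

For example, with two racks (each a 2000 W budget, 200 W servers) and a 3000 W RES budget, `tune` allocates 15 servers to the renewable feed; if the budget then drops to 1000 W, it de-allocates 10 of them.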
Whenever necessary, the scheduler sends a sequence of load switching commands to the central switch controller for execution.

Figure 8: Architecture of the iswitch scheduler

5.1. Lazy Power Supply Tracking

The first idea of iswitch is to avoid tracking the severely fluctuating renewable power in Region-II (detailed in Section 3). Inside the scheduler, a tracking module manages iswitch power tracking, as shown in Figure 8. We call it lazy tracking because the module only harvests the relatively stable renewable energy generation. Note that iswitch carefully distributes the switching activities evenly across all the loads to avoid local traffic jams.

Figure 9: Histogram of switching frequency using round-robin virtual machine selection during load migration

Lazy tracking: At each fine-grained interval, when switching is triggered by the CSC, an estimated switching assignment is sent to the scheduler to calculate the switch operation balance (i.e., the estimated assignment minus the baseline). If the switch balance indicates a reduced number of server-to-RES connections, the scheduler signals the CSC to schedule the estimated assignment to avoid brownout. On the other hand, if the switch balance suggests an increased number of server-to-RES connections (e.g., due to temporarily decreased load or increased supply), the scheduler signals the CSC only if the balance is larger than a preset threshold (e.g., 10% of the baseline). In this case, we ignore high-frequency switching surges, which bring little benefit to renewable energy utilization but lead to excessive unnecessary load migration.

LRU distribution: Within each cluster, iswitch allocates switch operations with a least recently used (LRU) method, which avoids aggressively tuning a small set of computing nodes. Note that a naive switching allocation can result in an unbalanced switch distribution. In Figure 9 we show the switching distribution that results from round-robin scheduling. The average switching frequency is 2 times per day per VM, yet a small group of VMs is switched up to 4 times per day. As a result, some racks may incur more performance penalty due to high communication traffic. To implement LRU, iswitch uses the switch frequency record stored in the switching history table. The operation of the iswitch scheduler relies on the load history record of the previous control period. This record can be implemented as a round-robin database (circular buffer) with constant storage occupation over time.
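The lazy-tracking dead band and the LRU-style allocation described above can be sketched as follows. The 10% threshold mirrors the example in the text, while the function names and the history-table representation (a dict of per-VM switch counts, approximating LRU by least-switched-first) are invented for illustration.

```python
def should_track(baseline, estimate, threshold=0.10):
    """Lazy tracking decision. `baseline` and `estimate` are numbers of
    servers connected to the renewable supply (RES)."""
    balance = estimate - baseline
    if balance < 0:
        return True    # supply dropped: always track to avoid brownout
    # Ignore small upward surges: little energy benefit, much migration traffic.
    return balance > threshold * baseline

def pick_lru(switch_history, n):
    """Choose the n VMs switched least often (approximating LRU allocation),
    so no small set of nodes absorbs all the migration traffic."""
    return sorted(switch_history, key=switch_history.get)[:n]
```

For instance, with a baseline of 100 RES-connected servers, an estimate of 105 is ignored (within the 10% band) while 90 or 120 triggers a scheduling round.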
Such a round-robin database is easy to reconfigure and manage on most Linux servers with RRDtool [26], an industry-standard, high-performance data logging system.

5.2. Load Power Demand Smoothing

Optimizing the supply-side fluctuation alone cannot achieve significant overhead mitigation (detailed in Section 7.1). To this end, iswitch leverages the heterogeneity of server clusters to minimize the load fluctuation-induced overhead of power management in Region-III (Section 3). Figure 10 illustrates the switch management timeline of iswitch: lazy tracking is fine-grained and demand smoothing is coarse-grained. The controller re-shuffles the renewable energy powered servers at a coarse-grained time interval R (e.g., 5 minutes, the default value in our experiments). During each re-shuffling interval, the average load utilization is recorded at a fine-grained time interval (e.g., 1 minute) and is used to predict the load for the next period. Upon rescheduling, the optimizer in the iswitch scheduler updates the baseline switch operations of each server cluster in the SAB, with the goal of mitigating the likelihood of severe load power fluctuation in the next control period. Each switch tuning invoked by the CSC is then assigned based on the updated SAB. At the beginning of each control period, iswitch recalculates the optimal switch assignment. To simplify the problem, we assume the data center servers are logically divided into c clusters and the load is balanced within each cluster (i.e., almost homogeneous server utilization). Let U_i = [u_i1 u_i2 ... u_ic] denote the average utilization of each cluster at time stamp i. The utilization history record for the previous control period, which consists of m time stamps, is:

U = [ u_11 u_12 ... u_1c
      u_21 u_22 ... u_2c
      ...
      u_m1 u_m2 ... u_mc ]    (2)

Assume a total of N virtual machines are to be connected to the renewable power supply in the next control period.
The migration decision for the next control period is S = [s_1 s_2 ... s_k ... s_c], where s_k is the number of VMs selected to be tuned for cluster k. To reduce unnecessary load tuning in the future, we want the aggregate power consumption of the selected VMs to have small oscillations in the next control period. In other words, the standard deviation of the aggregate utilization should be minimized. The aggregate utilization is given by:

a = U S^T    (3)

The variance of the expected utilization in the next control period can be calculated as:

σ² = (1/m) Σ_{i=1}^{m} (a_i − μ)² = (1/m) Σ_{i=1}^{m} a_i² − ( (1/m) Σ_{i=1}^{m} a_i )²,  where a_i = Σ_{k=1}^{c} u_ik s_k    (4)

In Equation 4, a_i is the aggregate utilization of the renewable energy powered load and μ is the mean utilization during the past control period R. The re-shuffling problem is therefore formulated as:

Objective:   min { (1/m) Σ_{i=1}^{m} a_i² − ( (1/m) Σ_{i=1}^{m} a_i )² }    (5)

Constraint:  Σ_k s_k = N    (6)

We solve this non-linear minimization problem with simulated annealing (SA). Given the utilization history records, our SA solver is capable of finding the desired global extremum quickly. Note that renewable power supply fluctuation typically occurs on a coarse-grained time scale (several minutes). As a result, the execution time of the SA solver (several seconds in our experiments) does not affect the optimization effectiveness. At the beginning of the re-shuffling period, the switch operations are assigned in proportion to the number of servers in each cluster. During the computation, the SA solver iteratively generates a stochastic perturbation of the switching assignment and checks whether the optimum solution has been reached.

6. Evaluation Methodologies

We evaluate our design with trace-driven simulation. We have developed a framework that simulates dynamic load tuning and hierarchical power control in renewable energy powered data centers. For each scheduled job request, we calculate its contribution to the overall data center power consumption based on the number of nodes requested and the job's specific resource utilization statistics. Our framework takes realistic wind energy generation traces as supply-side input.

6.1. Data Center Configurations

We assume a raised-floor data center consisting of 4,800 servers organized as twelve rows, with each row powered by a KW PDU. There are ten 4U server racks in each row, and the server we model resembles an HP ProLiant DL360 G6. The peak and idle power of the modeled server are 86W and 62W, respectively.
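Before going further into the evaluation setup, the re-shuffling problem above (objective (5) under constraint (6)) can be prototyped with a small simulated-annealing loop. This is only a sketch, not the authors' solver: the linear cooling schedule, the one-VM move generator, and all parameter values are invented for illustration.

```python
import math
import random

def aggregate_variance(U, S):
    """Objective (5): variance of a_i = sum_k u_ik * s_k over m time stamps."""
    m = len(U)
    a = [sum(u * s for u, s in zip(row, S)) for row in U]
    mean = sum(a) / m
    return sum(x * x for x in a) / m - mean * mean

def smooth_assignment(U, N, sizes, iters=20000, t0=5.0, seed=0):
    """SA sketch: find S with sum(S) == N (constraint (6)), 0 <= s_k <= sizes[k],
    minimizing the variance of the aggregate utilization."""
    rng = random.Random(seed)
    c = len(sizes)
    S, k = [0] * c, 0
    for _ in range(N):                        # round-robin initial assignment
        while S[k % c] >= sizes[k % c]:
            k += 1
        S[k % c] += 1
        k += 1
    best_S, best = S[:], aggregate_variance(U, S)
    cur = best
    for step in range(iters):
        i, j = rng.randrange(c), rng.randrange(c)
        if i == j or S[i] == 0 or S[j] >= sizes[j]:
            continue                          # infeasible move: skip
        S[i] -= 1
        S[j] += 1                             # move one VM; sum(S) stays N
        new = aggregate_variance(U, S)
        t = t0 * (1.0 - step / iters) + 1e-9  # linear cooling schedule
        if new <= cur or rng.random() < math.exp((cur - new) / t):
            cur = new                         # accept downhill, sometimes uphill
            if new < best:
                best, best_S = new, S[:]
        else:
            S[i] += 1
            S[j] -= 1                         # reject: revert the move
    return best_S, best
```

On a toy two-cluster history where one cluster's utilization is flat and the other oscillates, the solver shifts VMs toward the flat cluster, driving the aggregate variance toward zero.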
We convert the server utilization traces to power consumption using published SPECpower results [27], which have been widely used for data center power evaluation [7, 8]. Since the SPECpower results only report server power at intervals of 10% utilization, we use linear interpolation to approximate the power at intermediate load levels. We evaluate data centers with both homogeneous and heterogeneous load variations. The homogeneous workload configuration assumes that all the servers run the same workload and have similar utilization levels. As shown in Table 1, we generate the homogeneous utilization traces from the raw data provided by the Internet Traffic Archive [28]. We convert the request rate (requests per minute) into server utilization by identifying the maximum request rate (corresponding to 100% loading) and the minimum request rate (corresponding to 0% loading). The server utilization traces we generated represent one week of server load variation, including idle periods, peak hours and daily surges. Our heterogeneous data center configuration is based on an academic high-performance computing (HPC) center, which hosts more than 6 servers. The HPC center has five major clusters (C-I~C-V) with different service targets and loads, as detailed in Table 2. Those clusters have 2 to 4 computing nodes, and their average utilization ranges from 25% to more than 90%. All the clusters are managed with RRDtool [26], which enables autonomic data logging and trace generation. Since we have limited access to industrial data center traces, we collect real-world workload logs from a well-established online repository [29]. The workload logs provide information such as job arrival time, start time, completion time, size in number of computing nodes, etc. We choose the cleaned version of each trace log [30]. These traces have already been scrubbed to remove workload flurries and other anomalous data that could skew the performance evaluation [30].
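A small helper in the spirit of the interpolation step described at the start of this section: given power measurements at 10% utilization steps, estimate power at any load level. The 11-point curve below is a made-up placeholder, not a value set from this paper or from any published SPECpower result.

```python
def interp_power(utilization, spec_points):
    """Linearly interpolate server power (W) between measurements taken at
    10% utilization steps: spec_points[i] is the power at i*10% load."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    x = utilization * 10.0
    i = min(int(x), 9)                 # lower 10%-step index
    frac = x - i                       # position within the step
    return spec_points[i] + frac * (spec_points[i + 1] - spec_points[i])

# Hypothetical 11-point power curve (idle .. peak), in watts:
CURVE = [62, 70, 80, 90, 100, 112, 125, 140, 155, 170, 186]
```

For example, `interp_power(0.25, CURVE)` falls halfway between the 20% and 30% points (80 W and 90 W), giving 85 W.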
Table 3 summarizes our evaluated workload trace combinations. We build various workload mixes to mimic today's data centers, which are composed of many small co-located clusters. Each workload set in Table 3 consists of five traces [29] which run on the aforementioned five clusters (C-I to C-V), respectively. To form representative workload sets, we characterize workload traces based on their average job size and runtime. In Table 3, Mix-High includes traces that have larger job sizes (resulting in >30% average data center utilization) and Mix-Low contains traces that have small job sizes (resulting in <10% utilization). On the other hand, Mix-Stable consists of five traces that feature relatively longer job runtimes and Mix-Bursty consists of traces that have very short job runtimes.

Table 1: Traces of homogeneous server utilization [28]

  Trace   | Description                 | Avg. Loading | Load Level
  Calgary | University Web Server       | 2.8%         | Very low
  U of S  | University Web Server       | 7.5%         | Low
  NASA    | Kennedy Space Center Server | 27.8%        | Moderate
  Clark   | Clark WWW Server            | 33.4%        | High
  UCB     | UC Berkeley IP Server       | 43.2%        | Busy

Table 2: Configuration of heterogeneous HPC clusters

  Cluster ID | % of Overall Deployed Servers | Avg. Loading
  C-I        | 5%                            | 97%
  C-II       | 63%                           | 60%
  C-III      | 7%                            | 57%
  C-IV       | 3%                            | 54%
  C-V        | 2%                            | 25%

Table 3: The evaluated heterogeneous data center workload sets. Each workload set consists of five parallel workload traces [29] which are fed to clusters C-I~C-V shown in Table 2

  Workload Set | Description                | Workload Trace Combination
  Mix-High     | High utilization           | HPC2N + LANL CM5 + LPC EGEE + SDSC BLUE + LLNL Thunder
  Mix-Low      | Low utilization            | DAS2-fs0 + DAS2-fs1 + DAS2-fs2 + DAS2-fs3 + DAS2-fs4
  Mix-Stable   | Stable power demand        | HPC2N + KTH SP2 + LANL CM5 + DAS2-fs + SDSC BLUE
  Mix-Bursty   | Bursty power demand        | DAS2-fs2 + DAS2-fs3 + DAS2-fs4 + LPC EGEE + OSC Cluster
  Mix-Rand     | Random combination         | LLNL Thunder + OSC Cluster + LPC EGEE + LANL CM5 + KTH SP2
  Dept-HPC     | Traces collected from a departmental high-performance computing center

Table 4: The evaluated wind power supply traces [31]. Group-I highlights different wind power generation potentials. Group-II focuses on variation intensity of the supply and is used for characterization purposes

  Group | Abbr. | Wind Energy Potential | Location (Station ID) | Capacity Factor (CF) | Power Density
  I     |       | Low                   | California (925)      | 15%                  | 95 W/m2
  I     |       | Medium                | Arizona (67)          | 25%                  | 338 W/m2
  I     |       | Typical               | Colorado (563)        | 35%                  | 58 W/m2
  I     |       | High                  | Texas (36)            | 45%                  | 78 W/m2
  II    | LVS   | Low variation trace   | Wyoming (5895)        | 5%                   | 2 W/m2
  II    | HVS   | High variation trace  | Utah (967)            | 2%                   | 67 W/m2

Table 5: The evaluated power management schemes

  Abbr.    | Design Philosophy                                               | Tracking | Stored Energy | Load Deferment
  Utility  | Fully relies on utility grid                                    | No       | No            | No
  Battery  | Relies on battery to provide reliable renewable power           | No       | Yes           | No
  Green    | Focuses on sustainability, 100% renewable energy powered        | Yes      | No            | Yes
  Tracking | Maximizes energy utilization with aggressive supply tracking    | Yes      | Yes           | No
  iswitch  | Aims at high sustainability, low overhead and low latency       | Yes      | Yes           | No

6.2. Renewable Energy Supply Traces

We use wind power data traces from the Wind Integration Datasets [31] of the National Renewable Energy Laboratory.
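Table 4's capacity factor, the ratio of actual turbine output to its theoretical peak generation, can be computed directly from a generation trace; a minimal sketch, assuming a trace of kW samples at equal time steps:

```python
def capacity_factor(output_kw, nameplate_kw):
    """CF = mean actual turbine output / theoretical peak (nameplate) output."""
    return sum(output_kw) / (len(output_kw) * nameplate_kw)
```

A turbine that averages 30kW against a 100kW nameplate has a CF of 0.3, the typical value cited below.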
These wind generation datasets are time-series data of wind power output derived from the characteristics of commercially prevalent wind turbines and the manufacturer's rated wind power curve. We carefully selected two groups of traces across different geographic locations; their characteristics are listed in Table 4. In Table 4, capacity factor (CF) is the ratio of the actual wind turbine output to the theoretical peak generation. Since we are interested in supply variation and intermittency, we selected a group of traces with various CF values (i.e., Group-I). While the typical capacity factor of a wind turbine is 30% [5], a higher capacity factor usually represents better wind energy potential, smaller power variation and less generation stoppage. The total installed wind turbine capacity in this study equals the nameplate power of the studied data center. The actual power budget is therefore only affected by the capacity factor. Note that we also evaluate power supply traces of two extreme scenarios (i.e., Group-II): one has very smooth and stable generation and the other has a high output fluctuation rate. All other renewable supply traces can be seen as a combination of these two basic traces plus the intermittently unavailable periods.

7. Experimental Results

This section quantifies the performance and efficiency of iswitch on a wide range of workload configurations. We first characterize the impact of supply/load power variability on data center load matching using homogeneous workload traces. We then compare the performance overhead and energy utilization of iswitch to state-of-the-art approaches. In Table 5, Utility and Battery are two conventional schemes which do not involve supply-driven load matching. Green is the most sustainable design. Tracking represents emerging design approaches [3, 4] which leverage load adaptation to actively track every joule of renewable energy generation.
7.1. Impact of Power Variability

To understand how power variation affects load matching behavior, we monitored the load migration frequency of each VM across different server utilization levels and renewable power supply scenarios. Figure 1 shows the average switching frequency of all VMs over one week. Figure 2 characterizes the standard deviation of the migration activities of a single VM. In Figures 1 and 2, Idle, Typical and Busy are data center traces that have average utilization levels of 10%, 30% and 60%, respectively. We generated these three synthetic data center traces to mimic low load variation data centers.
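The per-VM statistics reported in Figures 1 and 2 can be derived from raw switch-event logs; a minimal sketch, where the `(vm_id, timestamp)` log format is an assumption:

```python
from statistics import mean, pstdev

def switching_stats(events, window_hours):
    """events: (vm_id, timestamp) records of load-switch operations over a
    monitoring window.  Returns (average switches per hour per VM,
    population std-dev of per-VM switch counts)."""
    counts = {}
    for vm_id, _ts in events:
        counts[vm_id] = counts.get(vm_id, 0) + 1
    per_vm = list(counts.values())
    return mean(per_vm) / window_hours, pstdev(per_vm)
```

The mean captures the overall tuning effort while the standard deviation reveals whether switching concentrates on a small group of loads.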

Figure 1: Average load switching frequency

Figure 3: The average network traffic (all the results are normalized to Tracking)

High variation load (HVL): When the supply variation is high (i.e., HVS), heavy switch tuning across a large group of data center servers is common. Therefore, the average switching frequency is high and the standard deviation is not very large. For low supply variation (i.e., LVS), however, the standard deviation increases by 66% since the switching triggered by load oscillations typically stresses a relatively small group of server loads.

Low variation load (LVL): Since the load variation is less severe, the total switching activity is reduced in both cases (i.e., HVS or LVS) and the average switching frequency is small. For example, Typical has similar loading compared to NASA; however, the former reduces the average switching frequency by 30% when the supply variation is high (i.e., HVS) and by 90% when the supply variation is low (i.e., LVS). In Figures 1 and 2, a combination of LVL and LVS manifests the lowest control effort since the mean value and standard deviation of per-VM switching are both small.

To summarize, choosing a subset of server load that has lower total power variation can significantly reduce the load switching demand, especially when the supply variation is low. When the supply variation is high, simply dampening the load power variation has limited impact; in this case switch capping can be used to avoid unnecessary tuning.

7.2. Operational Overhead

Frequent load matching activities result in operational overhead, which is our primary concern in the design of renewable energy powered computing systems.
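This overhead can be estimated as rack-level live-migration traffic, since each migration moves roughly the VM's memory image between hosts; a toy sketch, where the `(rack_id, n_migrations)` event format is an assumption:

```python
from collections import defaultdict

def rack_migration_traffic(migrations, vm_mem_gb=1.7):
    """migrations: (rack_id, n_migrations) records per control interval.
    Each live migration transfers roughly the VM's memory image, so
    rack-level traffic in GB is the migration count times memory size."""
    traffic = defaultdict(float)
    for rack_id, n in migrations:
        traffic[rack_id] += n * vm_mem_gb
    return dict(traffic)
```

Summing these per-rack totals over peak intervals gives the peak-traffic comparison reported below.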
In a virtualized environment, iswitch could effectively reduce the VM migration rate and help save significant data center network bandwidth. The data migration traffic is calculated at the rack level. Each VM live migration transfers approximately the size of the VM's memory between hosts [32]. We assume a VM memory size of 1.7GB in our calculation, which is the default memory size of an Amazon EC2 standard instance.

Figure 2: Standard deviation of the switching operation

Figure 4: The peak network traffic (all the results are normalized to Tracking)

Figure 3 shows the average communication traffic across various workload configurations and wind power supply levels. All the results are normalized to Tracking. We do not show the results of Green because it has the same power tracking frequency as Tracking. As can be seen, iswitch reduces rack-level traffic by 75% on average and therefore significantly relieves the network bandwidth burden. The results are even more impressive for peak traffic hours, when the renewable energy fluctuates severely. In Figure 4 we calculate the communication traffic during the top 10% high-traffic hours. Because iswitch puts a limit on power tracking activities during fluctuant supply periods, it generates only 5% of the network traffic of Tracking.

7.3. Latency per Hour

Another advantage of iswitch is that it reduces the migration frequency of each VM instance and thereby improves job turnaround time. Due to the intermittency of renewable generation, stand-alone systems such as Green experience long waiting times (about tens of minutes as observed in our experiments).
These systems typically leverage deferrable load [33] to meet the power budget or simply perform load shedding [34] to avoid brownout. Even for utility-connected systems such as iswitch and Tracking, latency exists due to the data migration time. For example, a 1GB VM migration takes about 20 seconds to complete [35]. In this study, we use latency per hour (LPH) to evaluate the performance overhead. For example, an LPH of 10 means each individual VM instance experiences 10 seconds of waiting time per hour on average. Figure 5 shows the average LPH across all data center servers. The average LPH of iswitch is about 30 seconds per hour, while the average LPH of Tracking reaches 126 seconds per hour. The average LPH of Green, however, is about 150 seconds per hour, 5 times that of iswitch. Therefore, waiting for renewable energy to resume (e.g., Green) should be the last resort.
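The LPH metric reduces to a simple ratio over the monitoring window; a sketch with the accounting deliberately simplified:

```python
def latency_per_hour(total_wait_s, n_vms, window_hours):
    """LPH: average seconds of waiting each VM instance accrues per hour;
    an LPH of 10 means 10 seconds of waiting per VM per hour."""
    return total_wait_s / (n_vms * window_hours)
```

For instance, 6,000 seconds of total delay spread over 100 VMs during a 2-hour window yields an LPH of 30.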

Figure 5: The average latency of job requests

Figure 6: The normalized overall wind energy utilization

7.4. Renewable Energy Utilization

We evaluate the renewable energy utilization (REU) of data centers with different wind power provisioning capacities and workload behaviors. The REU is defined as the amount of wind energy that is actually utilized by the load divided by the total amount of wind energy generation. A higher REU indicates better supply/demand coordination, which reduces the required on-site energy storage capacity, improves return-on-investment (ROI) and data center sustainability, and eases the initial infrastructure planning. While iswitch uses a lazy power tracking scheme, it does not sacrifice energy utilization significantly. As shown in Figure 6, iswitch achieves an average renewable energy utilization of 94%, higher than Green (92%) but lower than Tracking (98%). The reason Tracking outperforms iswitch on energy utilization is that Tracking chases every joule of wind energy generation aggressively. Note that a 4% decrease in energy utilization does not mean that iswitch is less preferable in our study; iswitch significantly reduces network traffic and improves performance by 4X. In contrast to Tracking, iswitch trades off energy utilization for better job turnaround time.

7.5. Optimization Effectiveness

Improperly setting the iswitch re-shuffling interval leads to degraded optimization effectiveness. To understand the tradeoff, we characterize the design space by varying the load re-shuffling interval. In the following discussion, S-x means iswitch with a re-shuffling interval of x minutes. We analyze the average network traffic under various re-shuffling intervals. In Figure 7-a, all the results are normalized to S-5.
It shows that increasing the re-shuffling interval mitigates the overhead and reduces network traffic. For example, S-120 manifests a 35% traffic reduction compared with S-5. However, an extended re-shuffling interval also degrades iswitch energy utilization due to the decreased adaptivity. We evaluate the impact of long re-shuffling periods on direct renewable energy utilization (DREU), as shown in Figure 7-b. Here, direct renewable energy utilization means the renewable energy directly utilized by the system without passing through batteries. Compared to S-5, S-120 yields about 24% DREU degradation (which means increased battery capacity is required to store the remaining generation).

Figure 7: iswitch with different control intervals (S-5, S-30, S-60 and S-120): (a) normalized traffic and (b) energy utilization. The results show all six workload trace sets. The wind energy trace used is the one with typical wind energy potential

To understand the DREU degradation, we show a fraction of a data center power consumption trace controlled by S-120, which uses a two-hour load history record as prediction input. In Figure 8, iswitch does not react to the gradual renewable power supply increase. We recommend a control period of 15~30 minutes. In this case, iswitch reduces average load tuning by 75% while still maintaining more than 80% direct wind energy utilization (94% overall wind energy utilization if combined with battery).
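The distinction between direct and overall utilization can be sketched with a step-by-step energy balance; the lossless, unbounded battery model below is a deliberate idealization, not the paper's storage model:

```python
def wind_utilization(load_kw, wind_kw, battery=True):
    """Fraction of wind generation utilized over a trace of equal-length
    time steps.  Each step the load absorbs min(load, wind); with the
    idealized battery the surplus is banked and drained later.
    battery=False gives direct utilization (DREU)."""
    used = stored = 0.0
    total = sum(wind_kw)
    for load, wind in zip(load_kw, wind_kw):
        direct = min(load, wind)
        used += direct
        if battery:
            stored += wind - direct           # bank surplus generation
            drain = min(load - direct, stored)
            used += drain                     # serve residual load from storage
            stored -= drain
    return used / total if total else 0.0
```

The gap between the two settings is exactly the energy that must round-trip through the battery, which drives the storage capacity demand discussed next.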

Figure 8: The impact of a long re-shuffling interval on iswitch

In sum, the length of the iswitch re-shuffling interval affects iswitch load matching in two ways. First, short re-shuffling intervals limit load fluctuation mitigation because a short load history record gives less accurate prediction. Second, a shorter time interval means more frequent re-shuffling and therefore magnified overall control overhead. This control overhead arises because the SA solver in iswitch may request additional switch tuning to mitigate the total variation. For example, instead of directly switching virtual machines from utility to RES, the scheduler may first disconnect 2 high-variation wind-powered VM instances and then connect 2 low-variation VM instances to the wind turbine.

7.6. Total Cost of Ownership

Because iswitch is capable of utilizing the renewable power directly, we save a large amount of battery capacity. Otherwise, we would have to use a large-scale battery to store the unused excess generation, which is neither economical nor environment-friendly. In Figure 9 we show the projected annual operating cost of designing a wind energy-powered small-scale data center (i.e., an 890KW server deployment). The average retail price of electricity in the industrial sector is about $0.07/KWh. The estimated energy storage cost is $300/KWh for lead-acid batteries [36]. A lower direct renewable energy utilization rate leads to increased battery capacity demand. In Figure 9, the operating cost of S-120 in the first year is 35% of a utility powered data center (Utility). After three years, the average annual operating cost is only 2% of Utility. The implementation cost is amortized by the renewable energy over the following deployment duration.
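The cost trend in Figure 9 can be approximated with a first-order model; the electricity price, storage price, and amortization period below are illustrative assumptions rather than the paper's exact accounting:

```python
def annual_cost(grid_kwh, battery_capacity_kwh,
                price_per_kwh=0.07, storage_cost_per_kwh=300.0,
                amortization_years=4):
    """First-order annual operating cost: utility energy drawn from the
    grid plus amortized energy-storage capital.  All prices and the
    amortization period are illustrative assumptions."""
    storage = battery_capacity_kwh * storage_cost_per_kwh / amortization_years
    return grid_kwh * price_per_kwh + storage
```

Raising direct renewable utilization shrinks `battery_capacity_kwh`, which is why shorter re-shuffling intervals pay off over multi-year deployments.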
8. Related Work

While power-efficient computer architecture is a well-studied area in both industry and academia, designing renewable energy powered computing systems has gained popularity only recently [3, 4, 37, 38, 39]. In this section we highlight a number of representative works that strive to improve IT efficiency and sustainability in different aspects.

Data center power management: Substantial research has been done on optimizing the power efficiency of data centers. At the server level, the most widely used tuning knobs are dynamic voltage and frequency scaling (DVFS), power capping and power state switching [7, 4]. At the data center level, various mechanisms can be found in recent proposals. For example, some major techniques include virtual machine resource monitoring and capping [23], job scheduling with queueing theory [], dynamic data channel width adaptation [4], and intelligent server consolidation for co-optimization of cooling and idle power [9, ]. Although all those works can be leveraged to track the time-varying renewable power supply, none of them is aware of the supply variation attributes. As a result, they incur significant power management overhead but gain very limited energy efficiency return.

Figure 9: Annual operating cost of various design scenarios

Emerging green computing techniques: Power intermittency is the most significant challenge in a renewable energy powered computing system. To avoid brownout, recent studies propose load deferment [33] and load shedding [34] to match the demand to the supply. Neither approach is suitable for data centers, which have strict performance requirements specified in the service level agreement (SLA).
Although instantaneous performance cannot be guaranteed, one can still use load power adaptation to improve the overall renewable energy utilization and optimize workload performance with the additional power budget [3, 4]. Additionally, one can further leverage various renewable energy integration points [39], energy storage elements [42], and distributed UPS systems [43] to improve renewable energy utilization. In contrast to existing work, this paper explores the benefits of coordinating utility power, energy storage, and load migration together. The key novelty of our design is a supply/load cooperative optimization that significantly reduces existing renewable power management overhead while still maintaining desirable renewable energy utilization.

9. Conclusions

Environmental and energy price concerns have become key drivers in the market for sustainable computing. The advance of renewable energy technologies and continuously decreasing renewable power costs have made renewable energy driven data centers a proven alternative to conventional utility-dependent data centers, and the market is rapidly growing. Matching the variable load power consumption to the intermittent power supply appropriately is the crux of designing a renewable energy powered data center. Conventional workload-driven power management has less adaptivity to the power supply variation while existing
