
DYNAMIC RESOURCE MANAGEMENT IN INTERNET HOSTING PLATFORMS

A Dissertation Presented

by

BHUVAN URGAONKAR

Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

September 2005

Computer Science

ABSTRACT

DYNAMIC RESOURCE MANAGEMENT IN INTERNET HOSTING PLATFORMS

SEPTEMBER 2005

BHUVAN URGAONKAR

B.Tech., INDIAN INSTITUTE OF TECHNOLOGY, KHARAGPUR, INDIA
M.S., UNIVERSITY OF MASSACHUSETTS AMHERST
Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST

Directed by: Professor Prashant J. Shenoy

Internet applications such as on-line news, retail, and financial sites have become commonplace in recent years. Due to the prevalence of these applications, platforms that host them have become an important and attractive business. These platforms, called hosting platforms, typically employ large clusters of servers to host multiple applications. Hosting platforms provide performance guarantees to the hosted applications, such as guarantees on response time or throughput, in return for revenue. Two key features of Internet applications make the design of hosting platforms challenging. First, modern Internet applications are extremely complex. Existing resource management solutions rely on simple abstractions of these applications and therefore fail to accurately capture this complexity. Second, these applications exhibit highly dynamic workloads with multi-time-scale variations. Managing the resources in a hosting platform to realize the often opposing goals of meeting application performance targets and achieving high resource utilization is therefore a difficult endeavor. In this thesis, we present resource management mechanisms that an Internet hosting platform can employ to address these challenges. Our solution consists of resource management mechanisms operating at multiple time-scales. We develop a predictive dynamic capacity provisioning technique for Internet applications that operates at the time-scale of hours or days. A key ingredient of this technique is a model of an Internet application that is used for deriving the resource requirements of the application. We employ both queuing theory and empirical measurements to devise models of Internet applications.
The second mechanism is a reactive provisioning technique that operates at the time-scale of a few minutes and utilizes virtual machine monitors for agile switching of servers in the hosting platform among applications. Finally, we develop a policing technique that operates at a per-request level. This technique allows a hosted application to remain operational even under extreme overloads where the arrival rates are an order of magnitude higher than the provisioned capacity. Our experiments on a prototype hosting platform consisting of forty Linux machines demonstrate the utility and feasibility of our techniques.

LIST OF TABLES

1.1 Summary of contributions
Notation used in describing the MVA algorithm
Performance of VM-based switching; n/a stands for not applicable
A sample service-level agreement
Summary of profiles. Although we profiled both CPU and network usage for each application, we only present results for the more constraining resource. Abbreviations: WS=Apache, SMS=streaming media server, GS=Quake game server, DBS=database server, k=number of clients, dyn.=dynamic, Res.=Resource
Effectiveness of kernel resource allocation mechanisms. All results are shown with 95% confidence intervals
Capsule Placement and Reservations
Capsule Placement and Reservations
Failure Handling Times (with 95% Confidence Intervals)

LIST OF FIGURES

1.1 Hosting platform architecture
A three-tier application
Request processing in an online auction application
Modeling a multi-tier application using a network of queues
Response time of Rubis with 95% confidence intervals. A concurrency limit of 15 for Apache and 75 for the middle Java tier is used. Figure (a) depicts the deviation of the baseline model from observed behavior when the concurrency limit is reached. Figure (b) depicts the ability of the enhanced model to capture this effect
Multi-tier application model enhanced to handle concurrency limits. Since each tier has only one replica, we use only one subscript in our notation
Rubis based on Java servlets: bottleneck at CPU of middle tier. The concurrency limits for the Apache Web server and the Java servlets container were set to 15 and 75, respectively
Rubis based on Java servlets: bottleneck at CPU of database tier. The concurrency limits for the Apache Web server and the Java servlets container were set to 15 and 75, respectively
Rubis based on EJB: bottleneck at CPU of middle tier. The concurrency limits for the Apache Web server and the Java servlets container were set to 15 and 75, respectively
Rubbos based on Java servlets: bottleneck at CPU of middle tier. The concurrency limits for the Apache Web server and the Java servlets container were set to 15 and 75, respectively
Caching at the database tier of Rubbos
Load imbalance at the middle tier of Rubis. (a) and (b) present number of requests and response times classified on a per-replica basis; (c) presents response times classified according to most loaded, second most loaded, and least loaded replicas and overall average response times
Rubis serving sessions of two classes. Sessions of class 1 were generated using workload W1 while those of class 2 were generated using workload W2
Model-based dynamic provisioning of servers for Rubis

2.14 Maximizing revenue via differentiated session policing in Rubis. The application serves two classes of sessions
The workload prediction algorithm
Virtual Machine Based Hosting Platform Architecture
Rubbos: Independent per-tier provisioning
Rubbos: Provision only the Tomcat tier
Rubbos: Model-based multi-tier provisioning
Rubis: Blackbox provisioning
Rubis: Model-based multi-tier provisioning
Provisioning on day 6 (typical day)
Provisioning on day 7 (moderate overload)
Provisioning on day 8 (extreme overload)
The Hosting Platform Architecture
Working of the sentry. First, the class a request belongs to is determined. If the request conforms to the leaky bucket for its class, it is admitted to the application without any further processing. Otherwise, it is put into its class-specific queue. The admission control processes the requests in the various queues at frequencies given by the class-specific delays. A request is admitted to the application if there is enough capacity; otherwise it is dropped
Demonstration of the working of the admission control during an overload
Scalability of the admission control
Performance of the threshold-based admission control. At t = 135 seconds, the threshold was set to reject all Bronze requests; at t = 18 seconds, it was updated to reject all Bronze and Silver requests; at t = 21 seconds it was updated to also reject Gold requests with a probability 0.5; finally, at t = 39 seconds, it was again set to reject only Bronze requests
Dynamic provisioning of sentries. [S=n] means the number of sentries is n now
Dynamic provisioning and admission control: Performance of Applications 1 and 2. D: Default invocation of provisioning, T: Provisioning triggered by excessive drops, [N=n]: size of the server set is n now. Only selected provisioning events are shown
Architecture of a shared hosting platform. Each application runs on one or more nodes and shares resources with other applications
An example of an On-Off trace
Derivation of the usage distribution and token bucket parameters

5.4 Profile of the Apache Web server using the default SPECWeb99 configuration
Profiles of Various Server Applications
Demonstration of how an application overload may be detected by comparing the latest resource usage profile with the original offline profile
Benefits of resource under-provisioning for a bursty Web server application, a less bursty streaming server application, and for application mixes
Effect of different levels of provisioning on the PostgreSQL server CPU profile
An example of the gap-preserving reduction from the Multi-dimensional Knapsack problem to the general offline placement problem
An example of striping-based placement
A bipartite graph indicating which capsules can be placed on which nodes
An example of reducing the minimum-weight maximum matching problem to the minimum-weight perfect matching problem
Sharc architecture and abstractions. Figure (a) shows the overall Sharc architecture. Figure (b) shows a sample cluster-wide virtual hierarchy, a physical hierarchy on a node, and the relationship between the two
Various scenarios that occur while trading resources among capsules
Predictable CPU allocation and trading. Figures (a) and (b) show the CPU allocation for the database server and the Web server capsules; Figure (c) shows the progress of the two bursts processed by these database servers
Predictable network allocation and trading. Figures (a), (b) and (c) depict network allocations of capsules of the File download application
Predictable allocation and resource trading. Figures (a), (b) and (c) depict CPU usages and allocations of capsules residing on a node
Application Isolation in Sharc. The allocations of all capsules on the three nodes are shown (due to space constraints, CPU usages of these capsules have been omitted)
Impact of resource trading. Figure (a) shows the number of playback discontinuities seen by the three clients of the overloaded video server with and without the trading of network bandwidth. Figures (b) and (c) show a portion of the reception and playback of the second stream for the two cases
Overheads imposed by the nucleus
Overheads imposed by the control plane
Impact of tunable parameters on capsule allocations

CHAPTER 1

INTRODUCTION AND MOTIVATION

An Internet application is an application delivered to users from a server over the Internet. A popular class of Internet applications consists of Web applications such as Web-mail, online retail sales, online auctions, wikis, discussion boards, Web-logs, etc. Web applications are popular due to the ubiquity of the Web browser as a client, sometimes called a thin client. The ability to update and maintain Web applications without distributing and installing software on potentially thousands of client computers is a key reason for their popularity. Not all Internet applications are Web-based, however; examples include streaming media servers [16] and game servers [46]. During the past decade we have increasingly come to rely on these applications to conduct both our personal and business affairs. We use the terms Internet application and Internet service interchangeably in this thesis.¹ A data center is a facility used for housing a large amount of electronic equipment, typically computers and communications equipment. As the name implies, a data center is usually maintained by an organization for the purpose of handling the data necessary for its operations. A bank, for example, may have a data center where all its customers' account information is maintained and transactions involving this data are carried out. Practically every mid-sized or larger company has some kind of data center, and large companies often have dozens of data centers. Most large cities have many purpose-built data center buildings that provide data center space in secure locations close to telecommunications services. Due to the prevalence of Internet applications, data centers that host them have become an important and attractive business. We refer to such data centers as hosting platforms. To make an application available to the Internet community, it must be hosted on one or more servers.
For example, a Web site needs to be hosted on a Web server, a powerful computer that can accommodate thousands of requests for the Web site's pages. A Web server has to be connected to the Internet 24 hours a day so that users can access it at any time. The high complexity and cost of maintaining a hosting platform infrastructure has resulted in a growing trend among businesses and institutions to have their applications hosted on platforms managed by another party. A Web hosting provider is an example of such a hosting platform; it sells space on its servers to Web site owners and provides a full-time, high-bandwidth connection to the Internet so that visitors can access the sites easily. An example is Yahoo's Small Business Web hosting service [126]. We list below some examples of the complexity and cost involved in maintaining a hosting platform:

1. Servers and software (Web server, mail server, firewall, virus protection, etc.) can be expensive.

2. The server needs a 24/7 high-speed connection to the Internet, which is relatively costly.

3. Setting up all the configurations, including the mail server, FTP server, and DNS server, can be complicated.

4. Server maintenance requires twenty-four hour support, special skills, and knowledge.

Hosting platforms enable entrepreneurs and emerging organizations to focus on their business rather than technology. Hosting platforms are typically expected to provide performance guarantees to the hosted applications (such as guarantees on response time or throughput) in return for revenue [95]; these contracts are expressed using service-level agreements. Two key features of Internet applications make the design of hosting platforms challenging. First, modern Internet applications are extremely complex. Existing resource management solutions rely on simple abstractions of these applications and therefore fail to accurately capture this complexity.
Second, these applications exhibit highly dynamic workloads with multi-time-scale

¹ Notice that our focus is exclusively on applications based on the client-server model. We do not consider the recently popular peer-to-peer applications [48, 81] in this work.

variations. Managing the resources in a hosting platform to realize the often opposing goals of meeting service-level agreements and achieving high resource utilization is therefore a difficult endeavor. In this thesis, we present resource management mechanisms that an Internet hosting platform can employ to address these challenges. The rest of this chapter is organized as follows. Section 1.1 describes two fundamentally different models of hosting employed by hosting platforms. Section 1.2 discusses the key challenges in the design of a hosting platform and Section 1.3 argues that existing work in this area is inadequate. Section 1.4 summarizes the main contributions of this thesis. In Section 1.5 we present a high-level overview of our hosting platform design and introduce terminology used throughout this thesis. Finally, Section 1.6 describes the organization of the rest of this thesis.

1.1 Models of Hosting

Due to rapid advances in computing and networking technologies and falling hardware prices, server clusters built using commodity hardware have become an attractive alternative to traditional large multiprocessor servers for constructing hosting platforms. Depending on the resource requirements of the applications and the strictness of the performance or resource guarantees they require, a platform may employ a dedicated or a shared model for hosting them. We elaborate on these two models of hosting applications next. Henceforth, we use the terms server and node interchangeably.

1.1.1 Dedicated Hosting

In dedicated hosting, each application runs on a subset of the servers and a server is allocated to at most one application component at any given time. Dedicated hosting is used for running large clustered applications where server sharing is infeasible due to the workload demand imposed on each individual application.
In dedicated hosting, either an entire cluster runs a single application (such as a Web search engine), or each individual processing element in the cluster is dedicated to a single application (as in the managed hosting services provided by some data centers [74]).

1.1.2 Shared Hosting

Shared hosting platforms run a large number of different third-party applications (Web servers, streaming media servers, multi-player game servers, e-commerce applications, etc.), and the number of applications typically exceeds the number of nodes in the cluster. More specifically, each application runs on a subset of the nodes and these subsets may overlap. Whereas dedicated hosting platforms are used for many niche applications that warrant their additional cost, economic considerations of space, power, cooling, and cost make shared hosting platforms an attractive choice for many application hosting environments. For example, nowadays Web hosting is very cheap (usually starting from under $5/month). There are also free Web hosting companies, which recover their costs by showing advertisements on the hosted Web sites.

1.2 Internet Hosting Platform Design Challenges and Requirements

The objective of a hosting platform is to maximize the revenue generated from the hosted applications while satisfying the service-level agreements. Designing a hosting platform is made challenging by the following characteristics of Internet applications and their workloads.

Application and Platform Idiosyncrasies

1. Complex multi-tier software architecture: Modern Internet applications are complex, distributed software systems designed using multiple tiers. A multi-tier architecture provides a flexible, modular approach for designing such applications. Each application tier provides certain functionality to its preceding tier and uses the functionality provided by its successor to carry out its part of the overall request processing.
The various tiers participate in the processing of each incoming request during its lifetime in the system. Additionally, these applications may employ replication and caching at one or

more tiers. These characteristics of Internet applications make inferring requirements and provisioning capacity non-trivial tasks.

2. Dynamic content: An increasing fraction of the content delivered by Internet applications is generated dynamically [11]. Generation of dynamic content is significantly more resource intensive than generation of static content, which accounted for the bulk of Internet traffic a few years ago.

3. Diverse software components: Internet applications are built using diverse software components. For example, a typical e-commerce application consists of three tiers: a front-end Web tier that is responsible for HTTP processing, a middle-tier Java enterprise server that implements core application functionality, and a backend database that stores product catalogs and user orders. These components have vastly different performance characteristics.

4. Heterogeneous hardware: In most hosting platforms, hardware resources get added or removed incrementally, resulting in heterogeneity in the hardware.

Internet Workload Characteristics

1. Multi-time-scale workload variations: Internet applications see dynamically changing workloads that contain long-term variations such as time-of-day effects [53] as well as short-term fluctuations such as transient overloads [1]. Predicting the peak workload of an Internet application and capacity provisioning based on this estimate are known to be notoriously difficult.

2. Extreme overloads: There are numerous documented examples of Internet applications that faced outages due to unexpected overloads. For instance, the normally well-provisioned Amazon.com site suffered a forty-minute down-time due to an overload during the popular holiday season in November. The load seen by on-line brokerage Web sites during the unexpected 1999 stock market crash was several times greater than the normal peak load, resulting in degraded performance and possible financial losses to users.

3.
Session-based workloads: Modern Internet workloads are often session-based, where each session comprises a sequence of requests with intervening think-times. For instance, a session at an online retailer comprises the sequence of user requests to browse the product catalog and to make a purchase. Sessions are stateful from the perspective of the application.

4. Multiple session classes: Internet applications typically classify incoming sessions into multiple classes. To illustrate, an online brokerage Web site may define three classes and may map financial transactions to the Gold class, customer requests such as balance inquiries to the Silver class, and casual browsing requests from non-customers to the Bronze class. Typically, such classification helps the application to preferentially admit requests from more important classes during overloads and drop requests from less important classes.

To meet its goal of maximizing revenue given the above challenges, a hosting platform needs to carefully multiplex its resources among the hosted applications. For this, a hosting platform requires the following mechanisms.

1. Requirement inference: A hosting platform should be able to accurately infer the resource requirements of applications. While underestimating the resource requirements of an application can cause violations of its performance guarantees (e.g., degraded response times), overestimation of requirements will result in wasted platform resources. Requirement inference may be based on analytical models of applications or on empirical observations.

2. Application placement: Application placement refers to the problem of determining where on the cluster the various components of a newly arrived application should run. It is desirable for a hosting platform to employ a placement algorithm that allows it to maximize the revenue generated by the hosted applications.

3. Workload prediction: Being able to predict the workloads of the hosted applications is desirable for determining their changing resource demands. This allows the hosting platform to decide which applications to divert its resources to during a given time period.

4. Dynamic capacity provisioning: A hosting platform should employ mechanisms to dynamically change the allocation of resources to the hosted applications to match their dynamic workloads. In a dedicated hosting platform, this means changing the number of servers assigned to an application; in a shared hosting platform, dynamic capacity provisioning might imply changing the CPU shares (and possibly shares of other resources) of applications on some nodes.

5. Policing: To protect the applications from unanticipated overloads, a hosting platform should employ request policing mechanisms. A policer allows an application to discard excessive requests so that the admitted requests continue to experience the desired performance even during overloads. Further, it is desirable for a hosting platform to preferentially admit more important requests during overloads; this is in accordance with the goal of maximizing the platform's revenue.

6. Appropriate OS mechanisms for resource sharing: A shared hosting platform needs support from the operating systems on the constituent nodes to effectively partition resources such as CPU, network bandwidth, memory, etc., among the hosted application components.

Additionally, a hosting platform should be robust. We elaborate on what we mean by this below.

1. Scalability: The hosted applications should be able to operate even when the request arrival rate is much higher than the anticipated workload.

2. Failure handling: The hosting platform should employ mechanisms to handle the various kinds of software and hardware failures that may occur.
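The class-aware policing sketched in requirement 5 (and in the sentry design of Chapter 4, which uses per-class leaky buckets) can be illustrated with a small token-bucket admission controller. This is a minimal sketch under our own assumptions: the class names, rates, and the `TokenBucket`/`ClassAwarePolicer` interfaces are illustrative, not the platform's actual API.

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens/sec up to a `burst` capacity."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = time.monotonic()

    def admit(self):
        # Refill tokens for the elapsed time, then spend one if available.
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class ClassAwarePolicer:
    """Each class gets its own bucket, so an overload in a low-priority
    class (e.g., Bronze) cannot starve higher-priority classes."""
    def __init__(self, class_rates):
        # class_rates: {class_name: (rate_per_sec, burst)}
        self.buckets = {c: TokenBucket(r, b) for c, (r, b) in class_rates.items()}

    def admit(self, cls):
        bucket = self.buckets.get(cls)
        return bucket.admit() if bucket else False

# Illustrative per-class capacities for the Gold/Silver/Bronze example.
policer = ClassAwarePolicer({"gold": (100, 20), "silver": (50, 10), "bronze": (10, 5)})
```

A real sentry would additionally queue non-conforming requests and process them at class-specific delays rather than dropping them immediately, as described in the figure caption earlier.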
1.3 The Case for a Novel Resource Management Approach: Inadequacies of Existing Work

During the past decade, several researchers have contributed to different facets of the resource management problem in hosting platforms. In this section, (i) we describe the problems that have been solved (and that our thesis builds on) and (ii) we argue that there are several problems that this body of work has either not addressed at all or not solved to satisfaction.

Predictable resource allocation within a single machine is a well-researched topic. Several techniques for predictable allocation of resources within a single machine have been developed over the past decade. New ways of defining resource principals have been proposed that go beyond the traditional approach of equating resource principals with entities like processes and threads. Banga et al. provide a new operating system abstraction called a resource container, which enables fine-grained allocation of resources and accurate accounting of resource consumption on a single server [15]. Scheduling domains in the Nemesis operating system [69], activities in Rialto [6], and Software Performance Units [117] are other examples. Numerous approaches have been proposed for predictable scheduling of CPU cycles and network bandwidth on a single machine among competing applications. These include proportional-share schedulers such as Borrowed Virtual Time [38] and Start-time Fair Queuing [51], and reservation-based schedulers as in Rialto [6] and Nemesis [69]. There has also been work on predictable allocation of memory, disk bandwidth, and shared services in single servers. Verghese et al. [117] address the problem of managing resources in a shared-memory multiprocessor to provide performance guarantees to high-level logical entities (called software performance units (SPUs)) such as a group of processes that comprise a task.
Their resource management scheme, called performance isolation, has been implemented on the Silicon Graphics IRIX operating system for three system resources: CPU, memory, and disk bandwidth. Of particular interest is their mechanism for providing isolation with respect to physical memory, which works by having dynamically adjustable limits on the number of pages that different SPUs are entitled to based on their usage and importance. They also implement some mechanisms for managing shared kernel resources such as spinlocks and semaphores. Reumann et al. [61] propose an OS abstraction called Virtual Service (VS) to eliminate the performance interference

caused by shared services such as DNS, proxy cache services, time services, distributed file systems, and shared databases. VSs provide per-service resource partitioning and management by dynamically deciding resource bindings for shared services in a manner transparent to the applications. Resource bindings for shared services are also delayed until it is known on whose behalf the work is being performed. In our work we build on such single-node resource management mechanisms and extend their benefits to distributed applications running on a cluster.

Current application models are too simplistic. Most of the existing work on modeling Internet applications has looked at single-tier applications such as replicated Web servers [37, 24, 7, 3, 75]. Since these efforts focus primarily on single-tier Web servers, they are not directly applicable to applications employing multiple tiers, or to components such as Java enterprise servers or database servers employed by multi-tier applications. Further, many of the above efforts assume static Web content, while multi-tier applications, by their very nature, serve dynamic Web content. Although a few recent efforts have focused on the modeling of multi-tier applications, many of these either make simplifying assumptions or are based on simple extensions of single-tier models [119, 92, 62]. These models are not sophisticated enough to capture the various application idiosyncrasies we described earlier.

Dynamic capacity provisioning has been studied only in the context of single-tier applications. Several papers have addressed the problem of dynamic resource allocation to competing applications running on a single server. Chandra et al. [25] propose a system architecture that combines online measurements with workload prediction and resource allocation techniques. The goal of their technique is to react to changing workloads by dynamically varying the resource shares of applications. Pradhan et al.
[88] propose an observation-based approach with the goal of designing self-managing Web servers that can adapt to changing workloads while maintaining the QoS requirements of different request classes. While Chandra et al. [25] consider dynamic management of the CPU, Pradhan et al. [88] manage the CPU and the accept queue. Doyle et al. [37] present an approach for provisioning memory and storage resources based on simple queuing-theoretic models of service behavior to predict resource requirements under changing load. All these techniques focus on resource allocation for applications running on a single server and are inadequate for platforms hosting multi-tier applications with components distributed across multiple nodes.

Existing policing mechanisms do not scale with increasing workload. Although considerable research has been conducted on developing admission control algorithms for Internet applications [3, 43, 63, 71, 118, 124], the issue of the scalability of the policer itself has remained unaddressed. During extreme overloads, the policer units can become bottlenecks, resulting in indiscriminate, class-unaware dropping of requests and thus causing loss of revenue.

1.4 Thesis Summary and Contributions

Having discussed the shortcomings of existing work, we describe the contributions made by our thesis. Table 1.1 summarizes the contributions of this thesis.

Analytical Models for Multi-tier Applications

In this thesis, we propose analytical models of multi-tier Internet applications. Modeling single-tier applications such as vanilla Web servers (e.g., Apache) is well-studied [37, 75, 13]. In contrast, modeling multi-tier applications is less well-studied, even though this flexible architecture is widely used for constructing Internet applications and services. Extending single-tier models to multi-tier scenarios is non-trivial. Our models can handle applications with an arbitrary number of tiers and tiers with significantly different performance characteristics.
Our models are designed to handle session-based workloads and can account for application idiosyncrasies such as replication at tiers, load imbalances across replicas, caching effects, and concurrency limits at each tier.

Dynamic Capacity Provisioning in Dedicated Hosting Platforms

Dynamic capacity provisioning is a useful technique for handling the multi-time-scale variations seen in Internet workloads. Dynamic provisioning of resources, that is, the allocation and deallocation of servers to replicated

Resource Management Issue           Our Contribution
----------------------------------  -----------------------------------------
Application model (dedicated)       Multi-tier applications
Application model (shared)          Profiling-based model
Dynamic provisioning (dedicated)    Multi-tier, predictive and reactive, VMMs
Dynamic provisioning (shared)       Multi-tier applications
Overload management                 Scalable policing
Application placement (dedicated)   Trivial
Application placement (shared)      Theoretical properties, online algorithms

Table 1.1. Summary of contributions.

applications, has been studied in the context of single-tier applications, of which clustered HTTP servers are the most common example. However, it is non-trivial to extend provisioning mechanisms designed for single-tier applications to multi-tier scenarios. We design a dynamic capacity provisioning approach for multi-tier Internet applications based on a combination of predictive and reactive mechanisms. We also show how a virtual machine based architecture can enable fast reactive provisioning.

Overload Management

We propose overload management mechanisms that allow a hosting platform to remain operational even under extreme overloads. Our mechanisms allow an application to handle request arrival rates of several thousand requests/sec.

Managing Resources in Shared Hosting Platforms

Shared hosting environments present us with some distinct resource management challenges and opportunities. In particular, unlike dedicated environments, we need mechanisms to isolate collocated application components from each other. Furthermore, it is possible to achieve finer-grain multiplexing of resources in a shared hosting environment. We devise an offline profiling based technique to infer the resource needs of applications and show how a shared platform may improve its revenue by careful under-provisioning of its resources. We formulate the application placement problem that arises in shared hosting platforms. We study the theoretical properties of this problem and develop online algorithms.
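The model-based provisioning contribution rests on queuing-network models of multi-tier applications (the thesis develops its own variant in Chapter 2, using the MVA algorithm noted in the list of tables). As a rough illustration of the underlying idea only, here is a textbook Mean Value Analysis sketch for a closed network of tiers; the function and parameter names are ours, and real models must also handle replication, caching, and concurrency limits.

```python
def mva(num_sessions, service_demands, think_time):
    """Exact MVA for a closed queuing network of single-server tiers.

    service_demands[i]: mean service demand (seconds) per request at tier i.
    think_time: mean client think time (seconds) between requests.
    Returns (mean response time, throughput) with num_sessions concurrent sessions.
    """
    queue_len = [0.0] * len(service_demands)
    resp, thr = 0.0, 0.0
    for n in range(1, num_sessions + 1):
        # Arrival theorem: an arriving request sees the mean queue lengths
        # of the same network with one fewer customer.
        per_tier = [d * (1.0 + q) for d, q in zip(service_demands, queue_len)]
        resp = sum(per_tier)
        thr = n / (resp + think_time)        # Little's law over the whole loop
        queue_len = [thr * r for r in per_tier]
    return resp, thr
```

Sweeping `num_sessions` with such a model shows response time rising sharply once the bottleneck tier saturates, which is exactly the kind of prediction a model-based provisioner uses to decide how many servers each tier needs for a target response time.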
1.5 Overview of Our Hosting Platform Design

We implement all our resource management algorithms in a prototype hosting platform based on a cluster of Linux machines and evaluate them using realistic applications and workloads. We present the architecture of our hosting platform in Figure 1.1 and introduce some terminology that we use throughout this thesis. Our hosting platform consists of two main components, the control plane and the nucleus, that are responsible for managing resources in the cluster. The control plane manages resources on a cluster-wide basis; it implements the application models and the algorithms for application placement and dynamic provisioning. The nucleus is responsible for managing resources on each individual node. It takes various measurements that are needed by the placement, provisioning, and policing algorithms. Architecturally, the nucleus is distinct from the operating system kernel on a node. Moreover, unlike middleware, the nucleus does not sit between applications and the kernel; rather, it complements the functionality of the kernel. We describe the design of these components in Chapters 3 and 7. As shown, an application may consist of multiple tiers. The figure shows a dedicated platform with each tier running on its own server. In a shared platform, we allow multiple application components to share a single server. The rest of the architecture is identical for both hosting models. Each application is guarded by a sentry, which performs admission control to turn away excess requests during overloads. We elaborate on the design of a sentry in Chapter 4.
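The sentry's admission-control role can be illustrated with a minimal sketch: admit sessions while the current allocation has headroom and turn the rest away. The class name, the fixed session capacity, and the simple counting policy are assumptions made for illustration, not the policing mechanism of Chapter 4.

```python
# Minimal sketch of a sentry-style admission controller: admit new
# sessions only while the estimated load stays within the capacity the
# provisioned servers can sustain. Names and thresholds are illustrative.

class Sentry:
    def __init__(self, capacity_sessions):
        # capacity_sessions: maximum concurrent sessions the current
        # server allocation can serve at its response-time target
        self.capacity = capacity_sessions
        self.active = 0

    def admit(self):
        """Called on session arrival; returns True if admitted."""
        if self.active < self.capacity:
            self.active += 1
            return True
        return False  # turned away during overload

    def depart(self):
        """Called when an admitted session ends."""
        self.active = max(0, self.active - 1)

sentry = Sentry(capacity_sessions=2)
print([sentry.admit() for _ in range(3)])  # [True, True, False]
```

Because admission is decided per session rather than per request, an admitted user's subsequent requests are never dropped mid-session, which matches the stateful, session-oriented view of Internet applications taken throughout this thesis.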

Figure 1.1. Hosting platform architecture. [Figure: the control plane oversees a free pool and the hosted applications (Application A with Tiers 1-3, Application B with Tiers 1-2); each node runs a capsule, a nucleus, and a sentry on top of the OS kernel, and sessions arrive at each application's first tier.]

We borrow terminology from Roscoe and Lyles [95] and refer to the component of an application that runs on an individual node as a capsule. Each application has at least one capsule, and more if the application is distributed. Each capsule consists of one or more resource principals (processes, threads), all of which belong to the same application. Capsules provide a useful abstraction for logically partitioning an application into sub-components and for exerting control over the distribution of these components onto different nodes. To illustrate, consider an e-commerce application consisting of a Web server, a Java application server, and a database server. If all three components need to be collocated on a single node, then the application consists of a single capsule with all three components. On the other hand, if each component needs to be placed on a different node, then the application should be partitioned into three capsules. Depending on the number of its capsules, each application runs on a subset of the platform nodes, and in shared hosting these subsets can overlap with one another. Each server in the hosting platform takes one of the following roles: run an application component, run the control plane, run a sentry, or be part of the free pool. The free pool contains all the unallocated servers.

1.6 Dissertation Road-map

The rest of this thesis is structured as follows. Chapters 2-4 are concerned with dedicated hosting platforms. In Chapter 2, we present analytical models for Internet applications. Chapter 3 considers the problem of dynamic capacity provisioning for Internet applications in a dedicated hosting environment. Chapter 4 addresses overload management in dedicated hosting platforms. Chapters 5-7 present resource management solutions unique to a shared hosting environment. We conclude with a summary of our research contributions in Chapter 8.

CHAPTER 2

APPLICATION MODELING

2.1 Introduction

Modern Internet applications are complex software systems that employ a multi-tier architecture and are replicated or distributed on a cluster of servers. This chapter focuses on analytically modeling the behavior of such multi-tier Internet applications.

Motivation

An analytical model of an Internet application is important for the following reasons.

Capacity provisioning: Determining how much capacity to allocate to an application in order for it to service its peak workload.

Performance prediction: Determining the response time of the application for a given workload and a given hardware and software configuration.

Application configuration: Determining various configuration parameters of the application in order to achieve a specific performance goal.

Bottleneck identification and tuning: Identifying system bottlenecks for purposes of tuning.

Request policing: Turning away excess requests during transient overloads.

Modeling single-tier applications such as vanilla Web servers (e.g., Apache [5]) is well studied [37, 75, 13]. In contrast, the modeling of multi-tier applications is less well studied, even though this flexible architecture is widely used for constructing Internet applications. Extending single-tier models to multi-tier scenarios is non-trivial for the following reasons. First, the various application tiers, such as Web, Java, and database servers, have vastly different performance characteristics, and collectively modeling their behavior is difficult. Further, numerous factors complicate the performance modeling of multi-tier applications: some tiers may be replicated while others are not, the replicas may not be perfectly load balanced, and caching may be employed at intermediate tiers. Finally, modern Internet workloads are session-based, where each session comprises a sequence of requests with think-times in between. For instance, a session at an online retailer comprises the sequence of user requests to browse the product catalog and to make a purchase. Sessions are stateful from the perspective of the application, an aspect that must be incorporated into the model. The design of an analytical model that can capture the impact of these factors is the focus of this chapter.

We present a model of a multi-tier Internet application based on a network of queues, where the queues represent different tiers of the application. Our model can handle applications with an arbitrary number of tiers and those with significantly different performance characteristics. A key contribution of our work is that the complex task of modeling a multi-tier application is reduced to that of modeling request processing at individual tiers and the flow of requests across tiers. Our model is designed to handle session-based workloads and can account for application idiosyncrasies such as replication at tiers, load imbalances across replicas, caching effects, and concurrency limits at each tier. We validate the model using two open-source multi-tier applications running on a Linux-based server cluster. We demonstrate the ability of our model to accurately capture the effects of a number of commonly used techniques such as query caching at the database tier and class-based service differentiation. For a variety of scenarios, including an online auction application employing query caching at its database tier, the
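One standard way to evaluate such a network-of-queues model of a closed, session-based workload is Mean-Value Analysis (MVA). The sketch below treats each tier as a single queueing station visited once per request, with sessions alternating between service and think time; the per-tier service demands and the think time are made-up numbers, and the model developed in this chapter handles additional effects (replication, caching, concurrency limits) that this toy omits.

```python
# Mean-Value Analysis (MVA) for a closed queueing network: each tier of
# a multi-tier application is one queueing station, and sessions cycle
# between the tiers and a think-time delay. Service demands and think
# time below are made-up numbers for illustration.

def mva(service_demands, think_time, sessions):
    """Exact MVA for a delay center plus K single-server FCFS stations.
    service_demands: per-request service demand (sec) at each tier
    think_time:      mean user think time (sec) between requests
    sessions:        number of concurrent sessions (closed population)
    Returns (response_time, throughput)."""
    queue = [0.0] * len(service_demands)
    resp = tput = 0.0
    for n in range(1, sessions + 1):
        # An arriving request at tier k sees the mean queue left by the
        # other n-1 sessions (arrival theorem)
        resid = [d * (1.0 + q) for d, q in zip(service_demands, queue)]
        resp = sum(resid)
        tput = n / (resp + think_time)          # Little's law, full cycle
        queue = [tput * r for r in resid]       # Little's law, per tier
    return resp, tput

# Three tiers (web, application, database) with 1-second think time
r, x = mva([0.005, 0.020, 0.010], think_time=1.0, sessions=50)
print(r, x)
```

By construction the solution satisfies Little's law over the full cycle (throughput equals sessions divided by response time plus think time), and throughput saturates at the reciprocal of the bottleneck tier's service demand, which is why identifying the bottleneck tier is central to provisioning decisions.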
