Cluster Load Balancing for Fine-grain Network Services


Kai Shen, Dept. of Computer Science, University of California, Santa Barbara, CA 93106
Tao Yang, Dept. of Computer Science, UC Santa Barbara and Teoma/Ask Jeeves
Lingkun Chu, Dept. of Computer Science, University of California, Santa Barbara, CA 93106

(A shorter version of this paper appears in the 2002 International Parallel & Distributed Processing Symposium.)

Abstract

This paper studies cluster load balancing policies and system support for fine-grain network services. Load balancing on a cluster of machines has been studied extensively in the literature, mainly focusing on coarse-grain distributed computation. Fine-grain services introduce additional challenges because system states fluctuate rapidly for those services and system performance is highly sensitive to various overheads. The main contribution of our work is to identify effective load balancing schemes for fine-grain services through simulations and empirical evaluations on synthetic workloads and real traces. Another contribution is the design and implementation of a load balancing system in a Linux cluster that strikes a balance between acquiring enough load information and minimizing system overhead. Our study concludes that: 1) random polling based load-balancing policies are well-suited for fine-grain network services; 2) a small poll size provides sufficient information for load balancing, while an excessively large poll size may in fact degrade the performance due to polling overhead; 3) discarding slow-responding polls can further improve system performance.

1. Introduction

Large-scale cluster-based network services are increasingly emerging to deliver highly scalable, available, and feature-rich user experiences. Inside those service clusters, a node can elect to provide services and it can also access services provided by other nodes; it serves as an internal server or client in each context respectively. Services are usually partitioned, replicated, aggregated, and then delivered to external clients through protocol gateways. Figure 1 illustrates the architecture of such a service cluster. In this example, the service cluster delivers a discussion group and a photo album service to wide-area browsers and wireless clients through Web servers and WAP gateways. The discussion group service is delivered independently while the photo album service relies on an internal image store service. All the components (including the protocol gateways) are replicated. In addition, the image store service is partitioned into two partition groups.

[Figure 1. Architecture of a service cluster. Wide-area browsers and wireless clients reach replicated Web servers and WAP gateways over external networks; inside the cluster, a high-throughput, low-latency network connects these gateways to replicated and partitioned internal services (photo album, image store, discussion group).]

While previous research has addressed the issues of scalability, availability, extensibility, and service replication support in building large-scale network service infrastructures [3, 16, 18, 19, 26, 28], there is still a lack of comprehensive study on load balancing support in this context. This paper studies the issue of providing efficient load balancing support for accessing replicated services inside the service cluster.

The request distribution between wide-area external clients and geographically distributed service clusters is out of the scope of this paper.

A large amount of work has been done by industry and the research community to optimize HTTP request distribution among a cluster of Web servers [1, 2, 4, 7, 9, 21, 25]. Most load balancing policies proposed in that context rely on the premise that all network packets go through a single front-end dispatcher or a TCP-aware (layer 4 or above) switch, so that TCP-level connection-based statistics can be accurately maintained. In contrast, clients and servers inside the service cluster are often connected by high-throughput, low-latency Ethernet (layer 2) or IP (layer 3) switches, which do not provide any TCP-level traffic statistics. This constraint calls for more complex load information dissemination schemes.

Previous research has proposed and evaluated various load balancing policies for cluster-based distributed systems [6, 8, 11, 12, 17, 23, 24, 29, 30]. Load balancing techniques in these studies are valuable in general, but not all of them can be applied to cluster-based network services. This is because they focus on coarse-grain computation and often ignore fine-grain jobs by simply processing them locally. For example, the job trace used in a previous trace-driven simulation study has a mean job execution time on the order of seconds [30]. In the context of network services, with the trend towards delivering more feature-rich services in real time, large numbers of fine-grain sub-services need to be aggregated within a short period of time. For example, a distributed hash table lookup for keyword search usually takes only a couple of milliseconds. Fine-grain services introduce additional challenges because server workload can fluctuate rapidly for those services. The results from previous simulation studies for load balancing may not be valid because fine-grain services are sensitive to various system overheads which are hard to capture accurately in simulations. Recognizing this limitation, we developed a prototype implementation based on a clustering infrastructure and conducted evaluations on a Linux cluster. Overall, our evaluation methodology is based on simulations as well as experiments with a prototype implementation.

The rest of this paper is organized as follows. Section 1.1 describes the service traces and synthetic workloads that are used in this paper. Section 2 presents our simulation studies on load balancing policies and the impact of various parameters. Section 3 describes a prototype system implementation in a Linux cluster with a proposed optimization. Section 4 evaluates the performance of this prototype system. Section 5 discusses related work and Section 6 concludes the paper.

1.1. Evaluation Workload

We collected the traces of two internal service cluster components from the search engine Teoma [5]; their statistics are listed in Table 1. Both traces were collected across a one-week time span in late July 2001. One of the services provides the translation between query words and their internal representations. The other service supports a similar translation between Web page descriptions and their internal representations. Both services support multiple translations in one access. The first trace has a mean service time of 22.2 ms and we call it the Fine-Grain trace. The second trace has a mean service time of 28.9 ms and we call it the Medium-Grain trace.
We use a peak-time portion (early afternoon hours of three consecutive weekdays) from each trace in our study. Most system resources are well under-utilized during non-peak times, so load balancing is less critical during those periods. Note that the arrival intervals of those two traces may be scaled when necessary to generate workloads at various demand levels during our evaluation.

In addition to the traces, we also include a synthetic workload with Poisson process arrivals and exponentially distributed service times. We call this workload Poisson/Exp in the rest of this paper. Several previous studies on Internet connections and workstation clusters suggested that both the inter-arrival time distribution and the service time distribution exhibit high variance and are thus better modeled by Lognormal, Weibull, or Pareto distributions [13, 20]. We choose the Poisson/Exp workload in our study for the following reasons. First, we believe the peak-time arrival process is less bursty than the arrival process over a long period of time. Secondly, the service time distribution tends to have a low variance for services of the same type. In fact, Table 1 shows that those distributions in our traces have even lower variance than an exponentially distributed sample would have. A minimal sketch of how such a synthetic workload can be generated follows Table 1.

Table 1. Statistics of evaluation traces.

                        Number of accesses        Arrival interval        Service time
    Workload            Total        Peak portion Mean       Std. dev.    Mean      Std. dev.
    Medium-Grain trace  1,55...      ...          ... ms     321.1 ms     28.9 ms   62.9 ms
    Fine-Grain trace    1,171,838    98,...       ... ms     349.4 ms     22.2 ms   1... ms
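The following sketch illustrates how the Poisson/Exp workload described above can be generated. It is an illustration we provide, not code from the paper; the default 5 ms mean service time and 90% load level are example parameters.

    # Illustrative generator for the Poisson/Exp synthetic workload: Poisson
    # arrivals (exponential inter-arrival times) and exponential service times.
    # Parameter defaults are examples, not settings prescribed by the paper.
    import random

    def poisson_exp_workload(num_requests, mean_service_ms=5.0, load_level=0.9):
        """Return a list of (arrival_time_ms, service_time_ms) pairs.

        load_level = mean service time / mean arrival interval, so the mean
        inter-arrival time is mean_service_ms / load_level.
        """
        mean_interarrival_ms = mean_service_ms / load_level
        t = 0.0
        requests = []
        for _ in range(num_requests):
            t += random.expovariate(1.0 / mean_interarrival_ms)   # Poisson arrival process
            service = random.expovariate(1.0 / mean_service_ms)   # exponential service time
            requests.append((t, service))
        return requests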

2. Simulation Studies

In this section, we present the results of our simulation studies. We confine our study to fully distributed load balancing policies that do not contain any single point of failure, because high availability is essential in building a large-scale network service infrastructure. We will first examine the load information inaccuracy caused by its dissemination delay. This delay is generally insignificant for coarse-grain jobs but it can be critical for fine-grain services. We will then move on to study two distributed load balancing policies: 1) the broadcast policy, in which load information is propagated through server-initiated pushing; and 2) the polling policy, in which load information is propagated through client-initiated pulling. We choose them because they represent two broad categories of policies in terms of how load information is propagated from the servers to the clients. In addition, they are both shown to be competitive in a previous trace-driven simulation study [30].

In our simulation model, each server contains a non-preemptive processing unit and a FIFO service queue. The network latency of sending a service request and receiving a service response is set to be half a TCP round-trip latency with connection setup and teardown, which is measured at 516 us in a switched 100 Mb/s Linux cluster.

We choose the mean service response time as the performance index to measure and compare the effectiveness of various policies. We believe this is a better choice than system throughput for evaluating load balancing policies because system throughput is tightly related to admission control, which is beyond the scope of this paper.

2.1. Accuracy of Load Information

Almost all load balancing policies use some sort of load index to measure server load levels. Prior studies have suggested that a linear combination of the resource queue lengths can be an excellent predictor of service response time [14, 29]. We use the total number of active service accesses, i.e. the queue length, on each server as the server load index. In most distributed policies, load indices are typically propagated from the server side to the client side in some way, and each client then uses the acquired information to direct service accesses to lightly loaded servers. Accuracy of the load index is crucial for clients to make effective load balancing decisions. However, the load index tends to be stale due to the delay between the moment it is measured at the server and the moment it is used at the client.

We define the load index inaccuracy for a certain delay Δt as the statistical mean of the difference between the queue lengths measured at an arbitrary time t and at time t + Δt. Figure 2 illustrates the impact of this delay (normalized to the mean service time) on the load index inaccuracy for a single server through simulations on all three workloads. We also show the upper bound for Poisson/Exp as a straight line. Under the assumption that the inaccuracy monotonically increases with Δt, the upper bound is the statistical mean of the queue length difference measured at two arbitrary times t1 and t2. Let ρ be the mean service time divided by the mean arrival interval, which reflects the level of server load. For a Poisson/Exp workload, since the limiting probability that a single-server system has a queue length of k is (1-ρ)ρ^k [22], the upper bound can be calculated as:

    UB = \sum_{i,j=0}^{\infty} (1-\rho)^2 \rho^{i+j} |i-j| = \frac{2\rho}{1-\rho^2}        (1)

See Appendix A for the detailed calculation.
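The closed form in Equation (1) can be sanity-checked numerically. The sketch below is an illustration we added, not part of the paper: it samples two independent queue lengths from the stationary distribution P(Q = k) = (1-ρ)ρ^k and compares the empirical mean absolute difference against 2ρ/(1-ρ²).

    # Numerical sanity check of Equation (1): for P(Q = k) = (1 - rho) * rho^k,
    # the mean absolute difference of two independent samples should approach
    # 2*rho / (1 - rho^2). Illustrative check only, not code from the paper.
    import random

    def sample_queue_length(rho):
        """Draw a queue length with P(Q = k) = (1 - rho) * rho^k."""
        k = 0
        while random.random() < rho:
            k += 1
        return k

    def empirical_upper_bound(rho, samples=200_000):
        total = sum(abs(sample_queue_length(rho) - sample_queue_length(rho))
                    for _ in range(samples))
        return total / samples

    for rho in (0.5, 0.9):
        analytic = 2 * rho / (1 - rho * rho)
        print(f"rho={rho}: empirical={empirical_upper_bound(rho):.3f}, analytic={analytic:.3f}")

At ρ = 0.5 both values come out near 1.33, matching the upper bound quoted below for a moderately busy server.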

[Figure 2. Impact of delay on load index inaccuracy with a single server (simulation). Panel (A): server 90% busy; panel (B): server 50% busy. Each panel plots the load index inaccuracy against the delay (normalized to the mean service time) for the Poisson/Exp workload, the Medium-Grain trace, and the Fine-Grain trace, together with the Poisson/Exp upper bound.]

When the server is moderately busy (50%), the load index inaccuracy quickly reaches the upper bound (1.33 for Poisson/Exp) as the delay increases, but the inaccuracy is moderate even under high delay. This means a RANDOM approach is likely to work well when servers are only moderately busy, and fancier policies do not improve much. When the server is very busy (90%), the load index inaccuracy is much more significant; it can cause an error of around 3 in the load index when the delay is around 10 times the mean service time. This analysis reveals that when servers are busy, fine-grain services require small dissemination delays in order to have accurate load information on the client side.

2.2. Broadcast Policy

In the broadcast policy, an agent is deployed at each server; it collects the server load information and announces it through a broadcast channel at varying intervals. It is important to use non-fixed broadcast intervals to avoid system self-synchronization [15]. The intervals we use are evenly distributed between 0.5 and 1.5 times the mean value. Each client listens on this broadcast channel and maintains the server load information locally. Every service request is then made to the server with the lightest known workload. Since the server load information maintained at the client side is acquired through periodic server broadcasts, this information becomes stale between consecutive broadcasts and the staleness is in large part determined by the broadcast frequency.

Figure 3 illustrates the impact of broadcast frequency through simulations. A 5 ms mean service time is used for the Poisson/Exp workload. Sixteen servers are used in the simulation. The mean response time shown in Figure 3 is normalized to the mean response time under an IDEAL approach, in which all server load indices can be accurately acquired on the client side free of cost whenever a service request is to be made. When servers are 90% busy, we observe that the broadcast policy with a 1 second mean broadcast interval can be an order of magnitude slower than the IDEAL scenario for fine-grain services (Poisson/Exp and the Fine-Grain trace). The degradation is less severe (up to 3 times) when servers are 50% busy, but it is still significant. This problem is mainly caused by the staleness of load information due to the low broadcast frequency. But we also want to emphasize that the staleness is severely aggravated by the flocking effect of the broadcast policy, i.e. all service requests tend to flock to a single server (the one with the lowest perceived queue length) between consecutive broadcasts. The performance under a low broadcast interval is close to the IDEAL scenario. However, we believe the overhead would be prohibitive under such a high broadcast frequency; for instance, a sixteen-server system broadcasting at that rate would force each client to process a broadcast message every 2 ms.

[Figure 3. Impact of broadcast frequency with 16 servers (simulation). Panel (A): servers 90% busy; panel (B): servers 50% busy. Each panel plots the mean response time (normalized to IDEAL) against the mean broadcast interval in milliseconds for the Poisson/Exp (5 ms) workload, the Medium-Grain trace, and the Fine-Grain trace.]

2.3. Random Polling Policy

For every service access, the polling policy requires a client to randomly poll several servers for load information and then direct the service access to the most lightly loaded server according to the polling results. An important parameter for a polling policy is the poll size. A minimal sketch of this mechanism is shown below.
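The following sketch is our own illustration of the random polling policy, not code from the paper: the client polls a random subset of servers for their queue lengths and dispatches the request to the least loaded one. The query_load callback stands in for the actual load inquiry message and is a placeholder name.

    # Illustrative sketch of the random polling policy: poll `poll_size`
    # randomly chosen servers for their current queue length and pick the
    # least loaded one. `query_load(server)` is a placeholder for the real
    # load inquiry (e.g., a UDP request/response); it is not an API from the paper.
    import random

    def pick_server(servers, query_load, poll_size=2):
        """Return the server to use for the next service access."""
        polled = random.sample(servers, min(poll_size, len(servers)))
        loads = [(query_load(s), s) for s in polled]   # (queue_length, server) pairs
        return min(loads, key=lambda pair: pair[0])[1]

With poll_size = 1 this degenerates to the pure RANDOM policy; poll_size = 2 corresponds to the "two choices" configuration discussed next.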
Mitzenmacher demonstrated through analytical models that a poll size of two leads to an exponential improvement over the pure RANDOM policy, but that a poll size larger than two yields much less substantial additional improvement [24]. Figure 4 illustrates our simulation results on the impact of poll size using all three workloads. Policies with poll sizes of 2, 3, 4, and 8 are compared with the RANDOM and IDEAL approaches in a sixteen-server system. A 5 ms mean service time is used for the Poisson/Exp workload. This result basically confirms Mitzenmacher's analytical results in the sense that a poll size of two performs significantly better than the pure RANDOM policy while a larger poll size does not provide much additional benefit.

[Figure 4. Impact of poll size with 16 servers (simulation). Panel (A): Medium-Grain trace; panel (B): Poisson/Exp with mean service time 5 ms; panel (C): Fine-Grain trace. Each panel plots the mean response time in milliseconds against the server load level (50% to 90%).]

Our simulation also suggests that this result is consistent across all service granularities and all server load levels, which makes it a very robust policy. We believe the polling policy is well-suited for fine-grain services because the just-in-time polling always guarantees very little staleness in the load information.

2.4. Summary of Simulation Studies

First, our simulation study shows that a long delay between the load index measurement time at the server and its time of usage at the client can yield significant inaccuracy. This load index inaccuracy tends to be more severe for finer-grain services and busier servers. We then studied two representative policies, broadcast and polling. Our results show that polling-based load balancing policies deliver competitive performance across all service granularities and all server load levels. In particular, the policy with a poll size of two already delivers performance competitive with the IDEAL scenario. As for the broadcast policy, we identify the difficulty of choosing a proper broadcast frequency for fine-grain services. A low broadcast frequency results in severe load index inaccuracy, which in turn degrades the system performance significantly. A high broadcast frequency, on the other hand, introduces high broadcast overhead. Ideally, the broadcast frequency should scale linearly with the system load level to cope with rapid system state fluctuation. This creates a scalability problem because the number of messages under a broadcast policy would then scale linearly with three factors: 1) the system load level; 2) the number of servers; and 3) the number of clients. In contrast, the number of messages under the polling policy only scales with the server load level and the number of servers.

3. Prototype Design and Implementation

We have developed a prototype implementation of the polling policy on top of a cluster-based service infrastructure. The simulation results in Section 2 favor the polling policy so strongly that we do not consider the broadcast policy in the prototype system.

3.1. System Architecture

This implementation is a continuation of our previous work on Neptune, a cluster-based infrastructure for aggregating and replicating partitionable network services [28]. Neptune allows services ranging from read-only to frequently updated to be replicated and aggregated in a cluster environment. Neptune encapsulates an application-level network service through a service access interface which contains several RPC-like access methods. Each service access through one of these methods can be fulfilled exclusively on one data partition.

We employ a flat architecture in constructing the service network infrastructure. A node can elect to provide services and it can also access services provided by other nodes; it serves as an internal server or client in each context respectively. Each node, when it elects to provide services, maintains a service queue and a worker thread pool. The size of the thread pool is chosen to strike the best balance between concurrency and efficiency. Conceptually, for each service access, the client first acquires the set of available server nodes through a service availability subsystem.
Then it chooses one node from the available set through a load balancing subsystem before sending the service request. Our service availability subsystem is maintained around a well-known publish/subscribe channel, which can be implemented using IP multicast or a highly available, well-known central directory. A small illustrative sketch of the client-side view of this subsystem follows.
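The sketch below is our own illustration, under our own naming, of the client-side service/partition mapping table fed by such a channel; it anticipates the soft-state refresh described next. The message fields and the TTL value are assumptions for illustration, not Neptune's actual interface.

    # Illustrative client-side view of the service availability subsystem:
    # announcements received on the publish/subscribe channel populate a
    # service/partition mapping table, and entries expire as soft state unless
    # refreshed. Names and the TTL are our own assumptions, not Neptune's API.
    import time

    SOFT_STATE_TTL = 5.0  # seconds an announcement stays alive without refresh (illustrative)

    class ServiceMappingTable:
        def __init__(self):
            # (service_type, partition) -> {server_address: last_refresh_time}
            self.table = {}

        def on_announcement(self, service_type, partition, server_address):
            """Record or refresh an announcement received on the channel."""
            servers = self.table.setdefault((service_type, partition), {})
            servers[server_address] = time.time()

        def available_servers(self, service_type, partition):
            """Return servers whose announcements are still fresh (soft state)."""
            now = time.time()
            servers = self.table.get((service_type, partition), {})
            return [addr for addr, t in servers.items() if now - t < SOFT_STATE_TTL]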

Each cluster node can elect to provide services by repeatedly publishing the service type, the data partitions it hosts, and the access interface. Published information is kept as soft state in the channel, so it has to be refreshed frequently to stay alive [1]. Each client node subscribes to the well-known channel and maintains a service/partition mapping table.

We implemented a polling policy for the load balancing subsystem. On the server side, we augmented each node to respond to load inquiry requests. For each service access, the client randomly chooses a certain number of servers out of the available set returned from the service availability subsystem. It then sends load inquiry requests to those servers through connected UDP sockets and asynchronously collects the responses using the select system call.

[Figure 5. The client/server architecture in our service infrastructure. A client node contains a service mapping table fed by the publish/subscribe channel of the service availability subsystem, and a load balancing subsystem with a polling agent that sends load inquiries. A server node publishes its services, runs a load index server, and handles service accesses at a service access point backed by a request queue and thread pool.]

Figure 5 illustrates the client/server architecture in our service infrastructure. Overall, both subsystems employ a loosely connected and flat architecture which allows the service infrastructure to operate smoothly in the presence of transient failures and service evolution.

3.2. Discarding Slow-responding Polls

On top of the basic polling implementation, we also made an enhancement by discarding slow-responding polls. Through a ping-pong test on two idle machines in our Linux cluster, we measured that a UDP round trip costs around 29 us. However, it may take much longer than that for a busy server to respond to a UDP request. We profiled a typical run with a poll size of 3, a server load level of 90%, and 16 server nodes. The profiling shows that 8.1% of the polls are not completed within 1 ms and 5.6% of them are not completed within 2 ms. With this in mind, we enhanced the basic polling policy by discarding polls not responded to within 1 ms. Intuitively, this results in a trade-off between spending less polling time and acquiring more load information. However, we also observe that long polls carry inaccurate load information because of their long delay. Discarding those long polls therefore avoids using stale load information, which is an additional advantage, and this advantage tends to be more substantial for fine-grain services. A sketch of this polling loop with a discard timeout follows at the end of this section.

3.3. Linux as a Network Service Platform

Our experience of using Linux as a platform for building large-scale network services has not been a very smooth one. We observed intermittent kernel crashes under some network-intensive workloads. Another problem is that the system only allows around 4000 ports to stay in the TCP TIME_WAIT state at a given time. This caused our testing clients to perform abnormally in some large testing configurations, which forced us to use more machines to run testing clients than otherwise needed. We recently ported our implementation to a Solaris cluster. Our initial experience shows that Solaris has a more reliable and scalable network kernel, and we plan to investigate further in the future.
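As a rough illustration of the polling agent (Section 3.1) and the discarding optimization (Section 3.2), the sketch below sends load inquiries over connected UDP sockets, collects replies with select(), and discards any poll that has not responded within the timeout. The inquiry message format, the reply encoding, and the helper names are our own assumptions, not the prototype's actual protocol.

    # Rough sketch (our own, not the prototype's code) of the client-side polling
    # agent: send UDP load inquiries, collect replies with select(), and discard
    # any poll that has not responded within `timeout` seconds.
    import select
    import socket
    import time

    def poll_servers(server_addrs, timeout=0.001):
        """Return {addr: queue_length} for servers that replied within the timeout."""
        socks = {}
        for addr in server_addrs:
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            s.setblocking(False)
            s.connect(addr)          # connected UDP socket, as in the prototype
            s.send(b"LOAD?")         # illustrative inquiry message, not the real wire format
            socks[s] = addr

        loads = {}
        deadline = time.monotonic() + timeout
        while socks:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break                                   # slow-responding polls are discarded
            readable, _, _ = select.select(list(socks), [], [], remaining)
            for s in readable:
                reply = s.recv(64).decode()             # assume the reply is the queue length
                loads[socks.pop(s)] = int(reply)
                s.close()
        for s in socks:                                 # close sockets of discarded polls
            s.close()
        return loads

Combined with a random choice of poll-size servers from the availability table, the least-loaded entry of the returned map determines the target server for the service access.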
4. Experimental Evaluations

All the evaluations in this section were conducted on a rack-mounted Linux cluster with around 30 dual 400 MHz Pentium II nodes, each of which contains either 512 MB or 1 GB of memory. Each node runs Linux and has two 100 Mb/s Ethernet interfaces. The cluster is connected by a Lucent P550 Ethernet switch with 22 Gb/s of backplane bandwidth. All the experiments presented in this section use 16 server nodes and up to 6 client nodes.

Due to various system overheads, we note that the server load level cannot simply be computed as the mean service time divided by the mean arrival interval. Instead, for each workload on a single-server setting, we consider the server to have reached full load (100%) when around 98% of client requests are successfully completed within two seconds. We then use this as the basis to calculate the client request rate for various server load levels. The service processing on the server side is emulated using a CPU-spinning micro-benchmark that consumes the same amount of CPU time as the intended service time.
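The CPU-spinning emulation can be as simple as the following sketch, which we provide for illustration; it is not the prototype's actual micro-benchmark code.

    # Minimal sketch of a CPU-spinning micro-benchmark: burn roughly `service_ms`
    # milliseconds of CPU time to emulate the processing cost of a service access.
    import time

    def spin_for(service_ms):
        target = service_ms / 1000.0
        start = time.process_time()          # CPU time, not wall-clock time
        while time.process_time() - start < target:
            pass                             # spin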

The IDEAL scenario in our simulation study is achieved when all server load indices can be accurately acquired on the client side free of cost whenever a service request is to be made. For the purpose of comparison, we emulate a corresponding IDEAL scenario in the evaluations of our prototype implementation. This is achieved through a centralized load index manager which keeps track of all server load indices. Each client contacts the load index manager whenever a service access is to be made. The load index manager returns the server with the shortest service queue and increments that queue length by one. Upon finishing one service access, each client is required to contact the load index manager again so that the corresponding server queue length can be properly decremented. This approach closely emulates the actual IDEAL scenario, with a delay of around one TCP round trip without connection setup and teardown (around 339 us in our Linux cluster).

4.1. Evaluation on Poll Size

Figure 6 shows our experimental results on the impact of poll size using all three workloads. We observe that the results for the Medium-Grain trace and the Poisson/Exp workload largely confirm the simulation results in Section 2. However, for the Fine-Grain trace with very fine-grain service accesses, we notice that a poll size of 8 exhibits far worse performance than policies with smaller poll sizes; it is even slightly worse than the pure RANDOM policy. This is caused by excessive polling overhead coming from two sources: 1) longer polling delays resulting from the larger poll size; and 2) less accurate server load indices due to the longer polling delay. These overheads are more severe for fine-grain services. Our conclusion is that a small poll size (e.g. 2 or 3) provides sufficient information for load balancing, and an excessively large poll size may even degrade the performance due to polling overhead, especially for fine-grain services.

[Figure 6. Impact of poll size based on a prototype implementation with 16 servers. Panel (A): Medium-Grain trace; panel (B): Poisson/Exp with mean service time 5 ms; panel (C): Fine-Grain trace. Each panel plots the mean response time in milliseconds against the server load level (50% to 90%).]

4.2. Improvement of Discarding Slow-responding Polls

Table 2 shows the overall improvement and the improvement excluding polling time for discarding slow-responding polls. The experiments are conducted with a poll size of 3 and a server load level of 90%. The experiment on the Medium-Grain trace shows a slight performance degradation due to the loss of load information. However, the results on both the Fine-Grain trace and the Poisson/Exp workload exhibit sizable improvements beyond the reduction in polling time, and this additional improvement is a result of avoiding the use of stale load information. Overall, the enhancement of discarding slow-responding polls can improve the load balancing performance by up to 8.3%. Note that the performance results shown in Figure 6 are without discarding slow-responding polls.

Table 2. Performance improvement of discarding slow-responding polls with poll size 3 and servers 90% busy.

                        Mean response time (mean polling time)    Improvement
    Workload            Original            Optimized             Overall    Excl. polling time
    Medium-Grain trace  282.1 ms (2.6 ms)   283.1 ms (1.0 ms)     -0.4%      -0.9%
    Poisson/Exp         81.8 ms (2.7 ms)    79.2 ms (1.1 ms)      3.2%       1.2%
    Fine-Grain trace    51.6 ms (2.7 ms)    47.3 ms (1.1 ms)      8.3%       5.2%

5. Related Work

This work is a continuation of our previous research on Neptune, a cluster-based infrastructure for aggregating and replicating partitionable network services [28]. Closely related to a group of work on building large-scale network services in cluster environments [16, 18, 19, 26], Neptune provides a scalable, available, and extensible service infrastructure through service partitioning, replication, and aggregation. The study of load balancing policies in this paper complements these service infrastructure efforts by providing efficient load balancing support suitable for all service granularities.

A large body of work has been done to optimize HTTP request distribution among a cluster of Web servers [1, 2, 4, 7, 9, 21, 25]. In particular, locality-aware request distribution (LARD) has been proposed to strike a balance between data locality and dynamic load balancing for content-based request distribution [25].
Most load balancing policies proposed in that context rely on the premise that all network packets go through a single front-end dispatcher or a TCP-aware (layer 4 or above) switch, so that TCP-level connection-based statistics can be accurately maintained. However, clients and servers inside the service cluster are often connected by high-throughput, low-latency Ethernet (layer 2) or IP (layer 3) switches, which do not provide any TCP-level traffic statistics. Our study in this paper shows that an optimized polling policy that does not require centralized statistics can deliver competitive performance, based on a prototype implementation on a Linux cluster.

Previous research has proposed and evaluated various load balancing policies for cluster-based distributed systems [8, 11, 12, 17, 23, 24, 29, 30]. Those studies mostly deal with coarse-grain distributed computation and often ignore fine-grain jobs by simply processing them locally. We focus instead on fine-grain network services by examining the sensitivity to the load information dissemination delay and its overhead; both are minor issues for coarse-grain jobs but they are critical for fine-grain services.

Several recent studies show that network servers based on the Virtual Interface (VI) Architecture provide significant performance benefits over standard server networking interfaces [9, 27]. Generally, advances in network performance improve the effectiveness of all load balancing policies. In particular, such advances have certain impacts on our results. First, a high-performance network layer may allow efficient, high-frequency server broadcasts, which improves the feasibility of the broadcast policy. However, the flocking effect and the scalability issue we raised in Section 2 remain to be solved. Secondly, a reduction in network overhead might change some quantitative results of our experimental evaluations.

For instance, the overhead of the polling policy with a large poll size might not be as severe as that shown in our experiments. Those issues should be addressed when advanced network standards become more widespread.

6. Concluding Remarks

In this paper, we study load balancing policies for cluster-based network services with an emphasis on fine-grain services. Our evaluation is based on a synthetic workload and two traces we acquired from an online search engine. In addition to simulations, we also developed a prototype implementation on a Linux cluster and conducted experimental evaluations with it. Our study and evaluations identify techniques that are effective for fine-grain services and lead us to several conclusions: 1) random polling based load-balancing policies are well-suited for fine-grain network services; 2) a small poll size provides sufficient information for load balancing, while an excessively large poll size may even degrade the performance due to polling overhead; 3) an optimization of discarding slow-responding polls can further improve the performance by up to 8.3%.

Acknowledgment. This work was supported in part by NSF CCR97264 and ACIR82666. We would like to thank Ricardo Bianchini, Apostolos Gerasoulis, Rich Martin, Hong Tang, and the anonymous referees for their valuable comments and help.

A. Calculation for Equation (1)

We start with the definition

    UB = \sum_{i,j=0}^{\infty} (1-\rho)^2 \rho^{i+j} |i-j|.

Let n = i + j; then we have

    UB = (1-\rho)^2 \sum_{n=0}^{\infty} \rho^n \sum_{i=0}^{n} |n-2i|.        (2)

We define F(n) = \sum_{i=0}^{n} |n-2i| and f(n) = \rho^n F(n). When n is an even number such that n = 2k, we have

    F(2k) = \sum_{i=0}^{2k} |2k-2i| = 2 \sum_{i=0}^{k-1} (2k-2i) = 2k(k+1),        (3)

and hence

    f(2k) = 2k(k+1)\,\rho^{2k}.        (4)
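A quick numeric check of the identities used above, added by us for illustration rather than taken from the paper: it verifies F(2k) = 2k(k+1) for small k and confirms that (1-ρ)² Σ_n ρ^n F(n) converges to 2ρ/(1-ρ²).

    # Numeric check (illustrative only) of the appendix identities:
    # F(2k) = sum_{i=0}^{2k} |2k - 2i| = 2k(k+1), and
    # UB = (1 - rho)^2 * sum_n rho^n * F(n) = 2*rho / (1 - rho^2).
    def F(n):
        return sum(abs(n - 2 * i) for i in range(n + 1))

    assert all(F(2 * k) == 2 * k * (k + 1) for k in range(1, 50))

    def ub_partial_sum(rho, terms=2000):
        return (1 - rho) ** 2 * sum(rho ** n * F(n) for n in range(terms))

    for rho in (0.5, 0.9):
        print(rho, ub_partial_sum(rho), 2 * rho / (1 - rho * rho))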
