Static Load Balancing


Load Balancing

Load Balancing

Load balancing: distributing data and/or computations across multiple processes to maximize the efficiency of a parallel program.
- Static load balancing: the algorithm decides a priori how to divide the workload.
- Dynamic load balancing: the algorithm collects statistics while it runs and uses that information to rebalance the workload across the processes.
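A common static scheme is a block partition, where the division of work is fixed before the program runs. The sketch below assumes n_tasks tasks of roughly equal cost and hands each of n_procs processes a contiguous range whose sizes differ by at most one; the function name block_partition is hypothetical.

```python
def block_partition(n_tasks, n_procs):
    """Return the half-open (start, end) task range owned by each process.

    The split is decided a priori: no runtime information is used.
    """
    base, extra = divmod(n_tasks, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` ranks take one additional task each.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

For example, block_partition(10, 4) gives [(0, 3), (3, 6), (6, 8), (8, 10)]. If task costs vary a lot, this fixed split cannot adapt, which is the motivation for dynamic load balancing.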

Dynamic Load Balancing

Dynamic load balancing manages a queue of tasks, known as the workpool.
- Usually more effective than static load balancing.
- Incurs the overhead of collecting statistics while the program runs.
- More complex to implement.

There are two styles of dynamic load balancing:
- Centralized workpool: the workpool is kept at a coordinator process, which hands out tasks and collects newly generated tasks from the worker processes.
- Distributed workpool: the workpool is distributed across the worker processes, and tasks are exchanged between arbitrary processes. This style requires a more complex termination detection technique to know when the program has finished.

Centralized Workpool

[Figure: the coordinator (process P0) holds the centralized work pool as a task queue; worker processes P1, P2, ... request tasks and return results and new tasks.]

The workpool holds a collection of tasks to be performed. A process is supplied with a new task when it finishes its previously assigned task and requests another one; this is what balances the load. Processes can also generate new tasks to be added to the workpool.

Termination: the workpool program terminates when the task queue is empty and every worker process has requested another task without any new tasks being generated.
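The coordinator/worker protocol and its termination rule can be sketched with threads standing in for processes. This is a minimal sketch under stated assumptions, not the course's implementation: the names centralized_workpool and split are hypothetical, work(task) is assumed to return (result, new_tasks), tasks must not be None (None is used as the stop signal), and a real parallel version would use messages between processes (for example MPI) rather than queue.Queue objects.

```python
import queue
import threading
from collections import deque

_NO_RESULT = object()  # marks a worker's first request, which carries no result


def centralized_workpool(initial_tasks, work, num_workers=3):
    """Hypothetical centralized workpool; the calling thread is the coordinator."""
    to_coord = queue.Queue()                                 # worker -> coordinator
    to_worker = [queue.Queue() for _ in range(num_workers)]  # coordinator -> worker

    def worker(wid):
        to_coord.put((wid, _NO_RESULT, []))   # initial request for a task
        while True:
            task = to_worker[wid].get()
            if task is None:                  # termination signal
                return
            result, new_tasks = work(task)
            # Return the result and any new tasks together with the next request.
            to_coord.put((wid, result, new_tasks))

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()

    pool = deque(initial_tasks)   # the workpool (FIFO task queue), owned by the coordinator
    waiting = deque()             # workers with an unanswered request
    results = []
    while True:
        wid, result, new_tasks = to_coord.get()
        if result is not _NO_RESULT:
            results.append(result)
        pool.extend(new_tasks)
        waiting.append(wid)
        while pool and waiting:   # hand out tasks to requesting workers
            to_worker[waiting.popleft()].put(pool.popleft())
        # Termination: queue empty and every worker has asked for another task.
        if not pool and len(waiting) == num_workers:
            break

    for q in to_worker:
        q.put(None)               # tell every worker to stop
    for t in threads:
        t.join()
    return results


def split(n):
    """Example task: task n yields result n and two new tasks n-1 while n > 0."""
    return n, ([n - 1, n - 1] if n > 0 else [])
```

Running centralized_workpool([3], split) processes 15 tasks in total (one 3, two 2s, four 1s, eight 0s), although the order of the returned results depends on thread scheduling. The termination check is only safe because each worker piggybacks its new tasks on its next request: once every worker is waiting, no new task can still be on its way.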

Distributed Workpool

[Figure: distributed task queue; processes exchange requests and tasks.]

The task queue is distributed across the processes. Any process can request a task from any other process, or send a task to it. This approach is suitable when the memory required to store the tasks is larger than can fit on one system.

Distributed Workpool

How does the workload get balanced?
- Receiver initiated: a process that is idle or has a light load asks another process for a task. This works better for a system with a high load.
- Sender initiated: a process that has a heavy load sends task(s) to another process. This works better for a system with a light load.

How do we determine which process to contact?
- Round robin.
- Random polling.
- Structured: the processes can be arranged in a logical ring or a tree.
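The receiver-initiated style, with either round-robin or random-polling target selection, can be illustrated with a small round-based simulation. This is a sketch under assumptions not in the slides: all tasks have unit cost, at least two processes exist, and an idle process that polls a victim takes half of the victim's remaining tasks. The function name simulate_receiver_initiated is hypothetical, and sender-initiated balancing is not modeled.

```python
import random
from collections import deque


def simulate_receiver_initiated(loads, policy="round_robin", seed=0):
    """Round-based sketch of receiver-initiated load balancing.

    loads[i] is the number of unit-cost tasks initially on process i (p >= 2).
    Each round, every busy process executes one task; then every idle process
    polls one other process and takes half of its remaining tasks.
    Returns (rounds, tasks_executed_per_process).
    """
    p = len(loads)
    queues = [deque(range(n)) for n in loads]
    rng = random.Random(seed)
    next_target = [(i + 1) % p for i in range(p)]  # round-robin pointer per process
    executed = [0] * p
    rounds = 0
    while any(queues):
        rounds += 1
        for i in range(p):                      # compute phase
            if queues[i]:
                queues[i].popleft()
                executed[i] += 1
        for i in range(p):                      # balancing phase
            if queues[i]:
                continue                        # only idle processes ask for work
            if policy == "random":              # random polling
                j = rng.choice([k for k in range(p) if k != i])
            else:                               # round robin, skipping itself
                j = next_target[i]
                nxt = (j + 1) % p
                next_target[i] = nxt if nxt != i else (nxt + 1) % p
            for _ in range(len(queues[j]) // 2):  # take half of j's tasks
                queues[i].append(queues[j].pop())
    return rounds, executed
```

With all 12 tasks starting on process 0, simulate_receiver_initiated([12, 0, 0, 0]) finishes in 5 rounds with executed counts [5, 2, 2, 3], compared with 12 rounds if process 0 worked alone.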

Distributed Workpool Termination

Two conditions must hold to terminate a distributed workpool correctly:
- local termination conditions hold on every process, and
- no messages are in transit.

This is tricky; here are two ways to deal with it:
- Tree-based termination algorithm: a tree order is imposed on the processes based on which process first sends a message to each process. At termination, the tree is traversed bottom-up to the root.
- Dual-pass token ring algorithm: a separate phase passes a token around to determine whether the distributed algorithm has finished. The algorithm specifically detects whether any messages were in transit.

Dual-pass Token Ring Termination

[Figure: processes P0, ..., Pj, ..., Pi, ..., P(n-1) arranged in a ring passing a white token; Pi sends a task to the earlier process Pj, so Pi turns the white token black.]

Process 0 becomes white when it has terminated and passes a white token to process 1. Each process passes on the token after meeting its local termination conditions. However, if a process sends a message to a process earlier than itself in the ring, it colors itself black. A black process colors the token black before passing it on; a white process passes the token on unchanged. After passing on the token, a process becomes white again. If process 0 receives a white token, the termination conditions have been met. If it receives a black token, it starts a new pass around the ring with another white token.
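The coloring rules of the dual-pass ring can be sketched in a few lines. This is a minimal, sequential sketch that assumes every process has already met its local termination conditions when the token reaches it and that any task it sent has been handled; it models only the process and token colors, not real message passing. The class and function names are hypothetical.

```python
WHITE, BLACK = "white", "black"


class RingProcess:
    """One process in the ring; only its color matters for detection."""

    def __init__(self, rank):
        self.rank = rank
        self.color = WHITE

    def send_task(self, dest):
        # A task sent to an earlier process could reactivate it after the
        # token has already passed it, so the sender colors itself black.
        if dest.rank < self.rank:
            self.color = BLACK

    def pass_token(self, token):
        if self.color == BLACK:
            token = BLACK        # a black process blackens the token
        self.color = WHITE       # after passing on the token, become white
        return token


def token_pass(procs):
    """P0 sends a white token through P1..P(n-1); return the color it gets back."""
    token = WHITE
    for proc in procs[1:]:
        token = proc.pass_token(token)
    return token


def detect_termination(procs, max_passes=10):
    """Start new passes until P0 receives a white token; return the pass count."""
    for passes in range(1, max_passes + 1):
        if token_pass(procs) == WHITE:
            return passes
    return None
```

If P3 sends a task back to P1, the first pass returns a black token and a second pass is needed; a task sent forward (from P1 to P3) leaves the sender white, because the token has not reached the receiver yet.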

Example: Shortest Paths

Given a directed graph with n vertices and m weighted edges, find the shortest paths from a source vertex to all other vertices. For two given vertices i and j, the weight of the edge between them is given by the weight function w(i, j). The weight is taken to be infinite if there is no edge between i and j.

The graph can be represented in two different ways:
- Adjacency matrix: a two-dimensional array w[0..n-1][0..n-1] holds the weights of the edges.
- Adjacency lists: an array adj[0..n-1] of lists, where the ith list holds the vertices adjacent to the ith vertex, together with the weights of the corresponding edges.

Sequential shortest paths algorithms:
- Dijkstra's shortest paths algorithm uses a priority queue to grow the shortest paths tree one edge at a time, so it has limited opportunities for parallelism.
- Moore's shortest path algorithm works by finding new, shorter paths all over the graph. It allows more parallelism but can do extra work by exploring a given vertex multiple times.
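The two representations can be built from the same edge list. This sketch assumes edges are given as (i, j, w(i, j)) triples and uses math.inf for missing edges; the function names and the small example graph EDGES are hypothetical.

```python
import math


def adjacency_matrix(n, edges):
    """w[i][j] = weight of edge i -> j, or infinity if there is no such edge."""
    w = [[math.inf] * n for _ in range(n)]
    for i, j, weight in edges:
        w[i][j] = weight
    return w


def adjacency_lists(n, edges):
    """adj[i] = list of (j, weight) pairs for the edges leaving vertex i."""
    adj = [[] for _ in range(n)]
    for i, j, weight in edges:
        adj[i].append((j, weight))
    return adj


# A small hypothetical directed graph, given as (i, j, w(i, j)) triples.
EDGES = [(0, 1, 10), (0, 2, 3), (2, 1, 4), (1, 3, 2), (2, 3, 8)]
```

The matrix always uses n * n entries, while the lists use space proportional to n + m, which matters for sparse graphs. Here adjacency_lists(4, EDGES)[2] is [(1, 4), (3, 8)].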

Moore's Algorithm

A FIFO queue of vertices to explore is maintained; initially it contains just the source vertex. A distance array dist[0..n-1] holds the current shortest known distance from the source to each vertex. Initially the distance to the source vertex is zero and all other distances are infinity.

Remove the vertex i at the front of the queue and explore the edges out of it. Suppose vertex j is connected to vertex i. Compare the shortest distance to j that is currently known with the distance going through vertex i: if dist[i] + w(i, j) < dist[j], set dist[j] = dist[i] + w(i, j) and add vertex j to the queue (if it is not already in the queue).

Repeat until the vertex queue is empty.
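The steps above translate directly into a sequential sketch. It assumes the graph is stored as adjacency lists of (j, w(i, j)) pairs and keeps a boolean in_queue array for the "if not in queue already" test; the function name moore_shortest_paths is hypothetical.

```python
import math
from collections import deque


def moore_shortest_paths(adj, source):
    """Moore's algorithm: FIFO queue of vertices whose distance improved."""
    n = len(adj)
    dist = [math.inf] * n
    dist[source] = 0
    queue = deque([source])
    in_queue = [False] * n
    in_queue[source] = True
    while queue:
        i = queue.popleft()          # remove the vertex at the front
        in_queue[i] = False
        for j, w in adj[i]:          # explore every edge out of i
            if dist[i] + w < dist[j]:
                dist[j] = dist[i] + w
                if not in_queue[j]:  # enqueue j only once at a time
                    queue.append(j)
                    in_queue[j] = True
    return dist
```

On the graph adj = [[(1, 10), (2, 3)], [(3, 2)], [(1, 4), (3, 8)], []] with source 0, the result is [0, 7, 3, 9]. Vertices 1 and 3 are each explored twice, because shorter paths to them are found after their first exploration: this is the extra work mentioned for Moore's algorithm.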

Shortest Paths using Centralized Workpool

Task (for shortest paths): one vertex to be explored.

Coordinator process (process 0): holds the workpool, which consists of the queue of vertices to be explored. This queue shrinks and grows dynamically.

Improvements to the Centralized Workpool Solution

- Make each task contain multiple vertices so that the granularity is coarser.
- Instead of sending a new task every time a lower distance is found, wait until all edges out of a vertex have been explored and then send the results together in one message.
- Updating a local copy of the distance array would prevent many tasks from being created in the first place, giving a further improvement.
- Use a priority queue instead of a FIFO queue for the workpool. This should give some further improvement for large enough graphs.
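The effect of a priority-queue workpool can be seen in a sequential sketch where vertices are taken from the pool in order of tentative distance instead of FIFO order. This is an illustration under assumptions, not the course's code: it runs a single worker, keeps stale pool entries and skips them when popped, and uses the hypothetical name priority_workpool_shortest_paths. With one worker and non-negative weights this reduces to Dijkstra's algorithm; with several workers pulling from the pool, some re-exploration can still occur.

```python
import heapq
import math


def priority_workpool_shortest_paths(adj, source):
    """Label-correcting shortest paths with a priority-queue workpool.

    Returns (dist, explored), where explored counts vertex explorations.
    """
    n = len(adj)
    dist = [math.inf] * n
    dist[source] = 0
    pool = [(0, source)]        # (tentative distance, vertex), smallest first
    explored = 0
    while pool:
        d, i = heapq.heappop(pool)
        if d > dist[i]:
            continue            # stale entry: a shorter path was found later
        explored += 1
        for j, w in adj[i]:
            if dist[i] + w < dist[j]:
                dist[j] = dist[i] + w
                heapq.heappush(pool, (dist[j], j))
    return dist, explored
```

On adj = [[(1, 10), (2, 3)], [(3, 2)], [(1, 4), (3, 8)], []] from source 0 this returns ([0, 7, 3, 9], 4): each vertex is explored once, whereas a FIFO queue explores vertices 1 and 3 twice on the same graph.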

Shortest Paths using Distributed Workpool

- Process i searches around vertex i and stores whether or not vertex i is in the queue.
- Process i keeps track of the ith entry of the distance array.
- Process i stores the adjacency matrix row or adjacency list for vertex i.
- When a process receives a message containing a distance, it compares it with its stored value; if the new distance is smaller, it updates its stored distance, computes the resulting distances to its neighbors, and sends them as messages to the corresponding processes.
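The one-process-per-vertex scheme can be simulated in a single thread by treating a FIFO deque as the set of messages in transit. This is a sketch under assumptions: the names VertexProcess and distributed_shortest_paths are hypothetical, the in-queue flag is not modeled, and a real version would use point-to-point messages between processes. Here the loop can simply stop when no messages remain; a genuinely distributed program has no global view of that and needs a termination detection algorithm such as the dual-pass token ring.

```python
import math
from collections import deque


class VertexProcess:
    """Process i: owns dist[i] and the adjacency list of vertex i only."""

    def __init__(self, edges):
        self.edges = edges      # [(j, w(i, j)), ...] for this vertex
        self.dist = math.inf    # this process's entry of the distance array

    def receive(self, d):
        """Handle a distance message; return the messages to send out."""
        if d < self.dist:
            self.dist = d
            return [(j, d + w) for j, w in self.edges]
        return []


def distributed_shortest_paths(adj, source):
    procs = [VertexProcess(edges) for edges in adj]
    in_transit = deque([(source, 0)])   # (destination process, distance)
    messages = 0
    while in_transit:                   # empty: no messages in transit
        dest, d = in_transit.popleft()
        messages += 1
        in_transit.extend(procs[dest].receive(d))
    return [p.dist for p in procs], messages
```

On adj = [[(1, 10), (2, 3)], [(3, 2)], [(1, 4), (3, 8)], []] from source 0 this returns ([0, 7, 3, 9], 7). Several of those 7 messages carry distances that are later superseded, which is what the improvements on the next slide aim to reduce.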

Improvements to the Distributed Workpool Solution

- Make each task contain multiple vertices so that the granularity is coarser.
- Combine messages, and only send known minimums by keeping a local estimate of the distance array.
- Maintain the local copy of the distance array as a priority queue.

In an actual implementation, the distributed workpool solution (with these optimizations) scaled much better than the centralized solution.

Comparison of Various Implementations

[Figure: comparison of the various implementations]

Further Reading

- Pencil Beam Redefinition Algorithm: a dynamic load balancing scheme that is adaptive in nature. The statistics are collected centrally, but the data is rebalanced in a distributed manner. It is based on an actual medical application code.
- Parallel Toolkit Library: a Master's project by Kirsten Allison. This library provides centralized and distributed workpool design patterns that any application programmer can use without having to implement the same complex patterns again and again.

Notes on both are on the class website under lecture notes.
