DISTRIBUTED SCHEDULING ISSUES IN LOAD DISTRIBUTING

Scheduling
refers to the execution of non-interactive processes or tasks at designated
times and places around a network of computer

Distributed
scheduling refers to the chaining of different jobs into a coordinated workflow that spans several computers. For
example: - you schedule a processing job on computer1 and computer2, and when
these are finished you need to schedule a job on computer3, this is distributed
scheduling.

Motivation:

Locally
distributed system consists of a collection of autonomous computers, connected
by a LAN.

In
a locally distributed system, there is a good possibility that several computers
are heavily loaded while others are ideal or lightly loaded.

If
we can move jobs around, the overall performance of the system can be
maximized.

A
distributed scheduler is a resources management component of a distributed
operating system that focuses on judiciously and transparently redistributing
the load of the system among the computers to maximize the overall performance.

A
distributed system may have a mix of heavily and lightly loaded system.

Tasks arrive at random
intervals.

CPUs service time
requirements are also random.

Migrating a task to
share or balance load can help.

System
may be heterogeneous in terms of CPU speed and resources. The distributed
system may be heterogeneous in terms of load in different system.

Even
in homogeneous distributed system a system may be idle even when a task is
waiting for service in other system.

Consider
a system of N identical, independent M/M/1 servers. Let P be the probability
that the system is in state in which at least 1 task is waiting for service and
at least 1 server is idle.

Let
P be the utilization of each servers. We can estimate P using probabilistic
analysis and plot a graph against system utilization.

For
moderate system utilization, Value of P is high, i.e. at least 1 node is idle.
Hence, performance can be improved by sharing of task.

What
is performance? Average response time.

LOAD:

Load
on a system/node can correspond to the queue length of tasks/ processes that
need to be processed.

Queue
length of waiting tasks: proportional to task response time, hence a good
indicator of system loads.

Distributing
load: transfer tasks/processes among nodes.

If
a task transfer (from another node) takes a long time, the node may accept more
tasks during the transfer time.

Causes
the node to be highly loaded. Affects performance.

Solution:
artificially increment the queue length when a task is accepted for transfer
from remote node (to account for the proposed increased in load).

Task
transfer can fail? : use timeouts.

Types of Algorithms

Static load distribution algorithms: Decisions are hard-coded into an algorithm with a
priori knowledge of system.

Dynamic load distribution: use system state information such as task queue
length, processor utilization.

Adaptive load distribution: adapt the approach based on system state.(e.g.)
Dynamic distribution algorithms collect load information from nodes.Even at very high system loads.

Load
information collection itself can add load on the system as messages need to be
exchanged.

Adaptive
distribution algorithms may stop collecting state information at high loads.

Balancing vs.
sharing

Load balancing: Equalize load on the
participating nodes.

Transfer tasks even if a node is not heavily
loaded so that queue length on all

Nodes are approximately equal.

More number of tasks transfers, might degrade
performance.

Load sharing: Reduce burden of an overloaded
node.

Transfer tasks only when the queue length
exceeds a certain threshold.

Less number of task transfers.

Anticipatory task transfer: Transfer from
overloaded nodes to ones that are likely to become idle/highly loaded