Received 4 March 2016; accepted 26 April 2016; published 29 April 2016

ABSTRACT

The hex-cell is one of the interconnection networks used for parallel systems. The main idea of the hex-cell is that the network is constructed from hexagonal cells, each of which has six nodes. The performance of the network is affected by many factors, one of which is load balancing. At the time of writing, there is no load balancing algorithm for this network. The proposed algorithm for dynamic load balancing on the hex-cell is based on the Tree Walking Algorithm (TWA) for load balancing on tree interconnection networks and on the ring all-to-all broadcast.

Keywords:

Hex-Cell, Load Balancing, Tree Walking

1. Introduction

The hex-cell is an interconnection network proposed in 2008 [1]. Few studies have evaluated its performance, and it deserves more attention because it has potential for parallel systems. Since there is no load balancing algorithm for the hex-cell topology at the time of writing, the aim of this paper is to propose a dynamic load balancing algorithm and evaluate it.

The proposed algorithm is based on the Tree Walking Algorithm (TWA) for load balancing on tree interconnection networks and on the ring all-to-all broadcast. It uses the addressing scheme of SBHCR (Section Based Hex-Cell Routing Algorithm), which divides the hex-cell into six sections [2].

The rest of the paper is arranged as follows. Section 2 covers the related work on which ours is built, including the hex-cell network topology, the Tree Walking Algorithm (TWA) and the ring all-to-all broadcast. Section 3 presents the proposed load balancing algorithm and an example that illustrates how it works. Section 4 describes the simulation of the algorithm. Finally, Section 5 gives the conclusion and the future work.

2. Related Works

2.1. Hex-Cell Network Topology

Hexagonal cells form the hex-cell topology; each cell has six nodes. The depth of the network is the number of levels around the innermost cell, and a network of depth d is denoted by HC(d). The innermost cell has a depth of one, the six cells around it form level two, the next twelve cells form level three, and so on [1], as shown in Figure 1.

There are three addressing schemes for the hex-cell topology. The first depends on the number of the line on which the node lies, counted from top to bottom, and the number of the node in that line, counted from left to right [1]. It is denoted by the pair (X, Y), where X is the line number and Y is the node number in that line. So if a node is in the third line and has position ten, its label is (3, 10), as shown in Figure 2.

The second addressing scheme divides the network into six sections labelled from one to six, left to right (clockwise). Each node takes a label consisting of three numbers (S, L, X), where S is the section number, L is the level number and X is the node number in that level, with X no greater than ((2 × L) − 1) [2]. So a node in section three, in level two, with number three in that level has the label (3, 2, 3), as shown in Figure 3.

The third addressing scheme uses the level of the node and the node number in that level. It is denoted by the pair (X, Y), where X is the level number and Y is the node number, with Y no greater than 6 × (2 × L − 1), where L is the node level [3]. So the node in the third level with number thirty has the label (3, 30), as shown in Figure 4.
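As a quick check on the bounds used by the second and third addressing schemes, the following minimal Python sketch counts the nodes per level and validates a section-based label (S, L, X). The function names are illustrative and do not come from the cited papers. The total of 6 × d² nodes follows from summing 6 × (2 × L − 1) over levels 1 to d, and agrees with the 6-node and 600-node networks mentioned in Section 4.

```python
def nodes_in_level(level: int) -> int:
    """Nodes in one level: six sections, each with (2 * level - 1) nodes."""
    if level < 1:
        raise ValueError("level must be >= 1")
    return 6 * (2 * level - 1)


def total_nodes(depth: int) -> int:
    """Nodes in HC(depth): the sum over all levels, which equals 6 * depth**2."""
    return sum(nodes_in_level(level) for level in range(1, depth + 1))


def is_valid_section_label(section: int, level: int, x: int, depth: int) -> bool:
    """Check a section-based label (S, L, X) against the bounds in the text."""
    return (1 <= section <= 6
            and 1 <= level <= depth
            and 1 <= x <= 2 * level - 1)


print(nodes_in_level(3))                    # 30
print(total_nodes(1), total_nodes(10))      # 6 600
print(is_valid_section_label(3, 2, 3, 2))   # True
```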

2.2. Tree Walking Algorithm (TWA)

TWA uses global information collection to learn the accumulated load and the number of nodes in each subtree. The root node then calculates the average load and broadcasts it. Each node then calculates the quota for the subtree of which it is the root. Finally, the nodes exchange tasks [4].

To explain how TWA works, Figure 5 shows a tree with seven nodes, each holding a number of tasks.

First, after the global information collection, each node knows its number of tasks (Wi) and the number of nodes in the subtree rooted at it (Ni). Each node then calculates the total number of tasks in its subtree (SWi). The root node (N0) calculates the average load and the remaining tasks (R) that cannot be evenly divided among the nodes, and broadcasts both values to all the nodes. Next, each node calculates the quota for itself (Qi) and for the subtree rooted at it (SQi). Table 1 shows the values for each node.
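The quantities Ni, SWi, the average load, R, Qi and SQi can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the tree shape and the task counts are hypothetical (they are not the values of Figure 5 or Table 1), and the rule that hands the R leftover tasks to the R lowest-numbered nodes is an assumption, since the text does not say how R is distributed.

```python
from typing import Dict, List

# Hypothetical seven-node tree (node 0 is the root) and task counts;
# these values are illustrative and are not those of Figure 5 or Table 1.
CHILDREN: Dict[int, List[int]] = {0: [1, 2], 1: [3, 4], 2: [5, 6],
                                  3: [], 4: [], 5: [], 6: []}
W = [5, 9, 2, 7, 3, 12, 6]


def subtree_sums(values: List[int], node: int, out: Dict[int, int]) -> int:
    """Accumulate `values` bottom-up; out[i] is the sum over the subtree rooted at i."""
    out[node] = values[node] + sum(subtree_sums(values, c, out) for c in CHILDREN[node])
    return out[node]


N: Dict[int, int] = {}
SW: Dict[int, int] = {}
subtree_sums([1] * len(W), 0, N)   # Ni: number of nodes in each subtree
subtree_sums(W, 0, SW)             # SWi: number of tasks in each subtree

average, R = divmod(SW[0], N[0])   # computed at the root, then broadcast

# Assumption: the R leftover tasks go to the R lowest-numbered nodes.
Q = [average + (1 if i < R else 0) for i in range(len(W))]
SQ: Dict[int, int] = {}
subtree_sums(Q, 0, SQ)             # SQi: quota of each subtree

print(average, R)      # 6 2
print(SW[1], SQ[1])    # 19 19
print(SW[2], SQ[2])    # 20 18
```

In this hypothetical instance the subtree rooted at node 2 holds two tasks more than its quota, so those two tasks would move up to the root during the exchange step.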

All-to-all broadcast on a ring topology needs (n − 1) communication steps, where n is the number of nodes.

To explain how all-to-all broadcast on a ring topology works, Figure 7 shows a ring topology consisting of six nodes.

Since the number of nodes is six, the number of communication steps needed for all-to-all broadcast is five (6 − 1 = 5). In the first step, each node sends its own message to the next node clockwise. In each of the following four steps, each node forwards the message it most recently received from the previous node [5], as shown in Figure 8.
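The step count can be checked with a small simulation. The sketch below is a sequential Python model of the ring, not a parallel implementation; the function name and the message labels are illustrative.

```python
def ring_all_to_all(messages):
    """Simulate all-to-all broadcast on a ring; returns (known, steps)."""
    n = len(messages)
    known = [[m] for m in messages]      # what each node has collected so far
    outgoing = list(messages)            # what each node sends in the next step
    steps = 0
    for _ in range(n - 1):
        # Node i receives from its predecessor (i - 1) on the ring.
        incoming = [outgoing[(i - 1) % n] for i in range(n)]
        for i in range(n):
            known[i].append(incoming[i])
        outgoing = incoming              # forward the most recent message next
        steps += 1
    return known, steps


known, steps = ring_all_to_all(["m0", "m1", "m2", "m3", "m4", "m5"])
print(steps)                 # 5
print(sorted(known[0]))      # ['m0', 'm1', 'm2', 'm3', 'm4', 'm5']
```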

3. LBHC Proposed Algorithm

The proposed load balancing algorithm depends on TWA on trees and on all-to-all broadcast on a ring. The hex-cell interconnection network in [2] is divided into six sections. We treat each section as a tree topology, and the root nodes of the six trees form a ring topology of six nodes, as shown in Figure 9.

3.1. LBHC Algorithm

Phase one: Global information Collection

For each child in the node's children {
    Receive global information from the child
}
If node has parent {
    Send global information to the parent
} Else {
    Calculate average load (accumulated tasks / accumulated nodes)
    Calculate remaining load (accumulated tasks Mod accumulated nodes)
}

Phase Two: All to All ring broadcast

If node is root node {
    For six loops {
        Send node global information to the next root node
        Receive global information from the previous root node
    }
    // Evaluate tasks load
    If (maxAverage − minAverage > 5) {
        Calculate new global average load tasks (total task load / 6)
        Calculate new global remaining tasks (total task load Mod 6)
        Calculate new average load tasks (global average load / tree nodes)
        Calculate new remaining tasks (global average load Mod tree nodes)
    }
}

Phase Three: Broadcast the Average Load

If node has parent {
    Receive global information from the parent
}
For each child in the node's children {
    Send global information to the child
}
Calculate the Quota for each subtree

Phase Four: TWA balancing

// Receiving
If node has parent {
    If (Actual accumulated tasks < Quota) {
        Receive tasks from the parent
    }
}
For each child in the node's children {
    If (Actual accumulated tasks for the child > Quota) {
        Receive tasks from the child
    }
}
// Sending
For each child in the node's children {
    If (Actual accumulated tasks for the child < Quota) {
        Send tasks to the child
    }
}
If node has parent {
    If (Actual accumulated tasks > Quota) {
        Send tasks to the parent
    }
}

Phase Five: Ring Balancing

If node is root node {
    For two loops {
        Send (extra tasks or Zero tasks) to the next root node
        Receive extra tasks from the previous root node
    }
}

Phase Six: Final Balancing

If node has parent {
    If (Actual accumulated tasks < Quota) {
        Receive tasks from the parent
    }
}
For each child in the node's children {
    If (Actual accumulated tasks for the child < Quota) {
        Send tasks to the child
    }
}

3.2. LBHC Phases

3.2.1. Phase One: Global Information Collection

Each tree collects its information separately, as in TWA.

3.2.2. Phase Two: All to All Ring Broadcast

Each root node in the ring topology broadcasts its total task count and average load to all other root nodes in the ring. The average loads of the trees are then compared to check whether global load balancing is worthwhile.

If global balancing is worthwhile, the new global average load and the remaining tasks for each tree are calculated as follows:
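The evaluation step can be sketched in Python as below. This is a minimal illustration that follows the condition and the four calculations of the Phase Two listing in Section 3.1. The function name, the dictionary keys and the sample totals are hypothetical, and the use of integer division for the per-tree averages is an assumption.

```python
def evaluate_global_balancing(tree_totals, tree_nodes, threshold=5):
    """Decide whether to rebalance across the trees and compute the new targets.

    tree_totals: total tasks per section tree, as gathered by the ring broadcast.
    tree_nodes: nodes per tree (assumed equal for all six sections).
    Returns None when the spread of averages does not exceed the threshold.
    """
    averages = [total // tree_nodes for total in tree_totals]
    if max(averages) - min(averages) <= threshold:
        return None
    per_tree, tree_remainder = divmod(sum(tree_totals), len(tree_totals))
    per_node, node_remainder = divmod(per_tree, tree_nodes)
    return {
        "global_average": per_tree,          # total task load / 6
        "global_remaining": tree_remainder,  # total task load Mod 6
        "node_average": per_node,            # global average load / tree nodes
        "node_remaining": node_remainder,    # global average load Mod tree nodes
    }


# Hypothetical totals for six trees of four nodes each.
print(evaluate_global_balancing([40, 72, 60, 36, 44, 64], tree_nodes=4))
print(evaluate_global_balancing([40, 44, 40, 44, 40, 44], tree_nodes=4))  # None
```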

1) Trees with extra tasks apply TWA so that the extra tasks go to the root node.

2) Trees with exactly their quota simply apply TWA.

3) Trees with fewer tasks than their quota apply TWA to fill their nodes from bottom to top with the right number of tasks (the node quota).

3.2.5. Phase Five: Ring Balancing

If the root node has extra tasks, it sends them to the next root node; otherwise it sends zero tasks to the next root node.
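A sequential Python sketch of this phase is given below. It is a simplification under stated assumptions: the root nodes are visited one after another instead of running concurrently, "two loops" is read as at most two laps around the ring, and a root below its quota keeps what it needs from the received tasks and forwards the rest. The function name and the sample surpluses are hypothetical.

```python
def ring_balance(surplus):
    """Pass extra tasks clockwise around the ring of root nodes.

    surplus[i] > 0 means root i holds extra tasks; surplus[i] < 0 means it is
    below its quota. Returns the final surplus list and the list of transfers
    as (from_section, to_section, tasks) with 1-based section numbers.
    """
    balance = list(surplus)
    n = len(balance)
    carry = 0                      # tasks travelling from the previous root
    transfers = []
    for step in range(2 * n):      # at most two laps around the ring
        i = step % n
        if balance[i] > 0:         # add own extra tasks to the received ones
            carry += balance[i]
            balance[i] = 0
        elif balance[i] < 0:       # keep what is needed, forward the rest
            taken = min(carry, -balance[i])
            balance[i] += taken
            carry -= taken
        if carry == 0 and not any(balance):
            break                  # every root has reached its quota
        transfers.append((i + 1, (i + 1) % n + 1, carry))
    return balance, transfers


# Hypothetical surpluses for sections 1..6 (they sum to zero).
final, transfers = ring_balance([-10, 12, 13, -23, -1, 9])
print(final)       # [0, 0, 0, 0, 0, 0]
print(transfers)   # [(1, 2, 0), (2, 3, 12), (3, 4, 25), (4, 5, 2), (5, 6, 1), (6, 1, 10)]
```

When the surpluses sum to zero, the first lap collects every surplus and the second lap fills any deficit that was passed before enough tasks had accumulated, which is why two laps are sufficient in this model.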

3.2.6. Phase Six: Final Balancing

Apply TWA again to balance the received tasks.

3.3. LBHC Example

Here is an example of the proposed LBHC algorithm. Figure 10 shows a hex-cell network in which each node holds a number of tasks.

3.3.1. Phase One: Global Information Collection

Each tree performs the global information collection and computes the total number of tasks and the number of nodes in each subtree. The following six tables (Tables 2-7) show that information.

3.3.2. Phase Two: All to All Ring Broadcast

Each root node sends the maximum average and the minimum average. The task load is then evaluated, and global balancing is worthwhile here since the maximum quota is 18 and the minimum is 13. (Since we have chosen a difference of 5 tasks between the highest average task load and the lowest average task load to check the efficiency to do

Trees 1, 4 and 5 have fewer tasks than their quota, so they apply TWA to fill their nodes from bottom to top with the right number of tasks (the node quota) and wait for more tasks. As an example, we take the tree of Section 4. Figure 13 shows the tree of Section 4, which has fewer tasks than its quota.

The tasks are now exchanged between the nodes as follows:

Each node i waits to receive tasks from its parent if (SWi < Qi) or from its child j if (SWj > Qj); otherwise node i sends tasks to its parent if (SWi > Qi) or to its child j if (SWj < Qj).
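The exchange rule can be sketched in Python as below. The sketch compares each subtree's load SWi with its subtree quota (written SQi in Section 2.2) and moves the difference across the edge to the parent, since that edge is the subtree's only link to the rest of the tree. The function name and the three-node example are hypothetical.

```python
from typing import Dict, List, Tuple


def edge_transfers(parent: Dict[int, int], SW: Dict[int, int],
                   SQ: Dict[int, int]) -> List[Tuple[int, int, int]]:
    """Return (sender, receiver, tasks) for every tree edge that carries tasks."""
    moves = []
    for child, par in parent.items():
        diff = SW[child] - SQ[child]
        if diff > 0:                       # subtree above quota: send up
            moves.append((child, par, diff))
        elif diff < 0:                     # subtree below quota: receive from parent
            moves.append((par, child, -diff))
    return moves


# Hypothetical tree: root 0 with leaf children 1 and 2, quota 6 per node.
parent = {1: 0, 2: 0}
SW = {0: 18, 1: 10, 2: 3}    # the root itself holds 18 - 10 - 3 = 5 tasks
SQ = {0: 18, 1: 6, 2: 6}
print(edge_transfers(parent, SW, SQ))   # [(1, 0, 4), (0, 2, 3)]
```

After these two transfers each of the three nodes holds exactly six tasks.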

the received tasks (total extra tasks 25) and send it to root node in tree Section 4.

4) The root node in tree Section 4 will receive 25 tasks. Since its task load is below its quota, it will take the appropriate number of tasks (23 tasks) and send the rest (2 tasks) to the root node in tree Section 5.

5) The root node in tree Section 5 will receive 2 tasks. Since its task load is below its quota, it will take the appropriate number of tasks (1 task) and send the rest (1 task) to the root node in tree Section 6.

6) The root node in tree Section 6 will receive 1 task. Since it has extra tasks (9 tasks), it will add them to the received task (total extra tasks 10) and send them to the root node in tree Section 1.

Finally, the task load of the hex-cell network is balanced, as shown in Figure 19.

4. Simulation

The simulation was written in Java (version 1.8.0_51, 64-bit), using multi-threading to simulate each node. The hardware and system specifications for our simulation are:

1) Processor: Intel(R) Core(TM) i5-2450 M CPU @ 2.50 GHz.

2) RAM: 6.00 GB.

3) System type: Windows 7/64-bit.

For the simulation, we chose a difference of 5 tasks between the highest average task load and the lowest average task load as the threshold for deciding whether global balancing between the section trees is worthwhile.

The major factor in evaluating a load balancing algorithm is its accuracy. After simulation with different levels and different task loads for each node, LBHC proved to be effective. In all runs, the difference between the highest task load and the lowest task load is one (unless global balancing is not worthwhile, in which case the difference is bounded by the chosen threshold of 5).

Execution time is one of the important factors in evaluating the performance of any algorithm. Figure 20 shows the average execution time of LBHC at various levels, from level one (a hex-cell with 6 nodes) to level ten (a hex-cell with 600 nodes).

Another factor we studied is the number of messages in the network while applying the LBHC algorithm. Figure 21 shows the average number of messages at various levels, from level one (a hex-cell with 6 nodes) to level ten (a hex-cell with 600 nodes).
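The accuracy criterion can be written as a one-line check. The Python sketch below is illustrative only; the function name and the sample loads are hypothetical and are not results from the simulation.

```python
def is_balanced(loads, tolerance=1):
    """True when the highest and lowest task loads differ by at most `tolerance`."""
    return max(loads) - min(loads) <= tolerance


print(is_balanced([15, 16, 15, 16, 15, 15]))   # True
print(is_balanced([13, 18, 15, 16, 15, 15]))   # False
```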

Figure 18. Tree Section 4 applies TWA again to balance the received tasks.

5. Conclusion and Future Work

In this paper, we have proposed a dynamic load balancing algorithm for the hex-cell interconnection network that is based on the Tree Walking Algorithm (TWA) and the ring all-to-all broadcast, using the SBHCR addressing scheme that divides the hex-cell into six sections. As the simulation shows, this algorithm performs well in execution time and number of messages relative to the number of nodes in the network. However, since there are no other dynamic load balancing algorithms for the hex-cell, we could not compare it with any other algorithm. For future work, the next step is to propose another load balancing algorithm and compare it with the proposed one. After that, we could do more research on other aspects of the hex-cell interconnection network to evaluate its performance.