A static parallelization of tree search using pthreads

In our static parallelization, a single thread uses breadth-first search to generate enough partial tours so that each thread gets at least one partial tour.

A static parallelization of tree search using pthreads

In our static parallelization, a single thread
uses breadth-first search to generate enough partial tours so that each thread
gets at least one partial tour. Then each thread takes its partial tours and
runs iterative tree search on them. We can use the pseudocode shown in Program
6.7 on each thread. Note that most of the function calls—for example, Best_tour, Feasible, Add city—need to access the adjacency matrix representing the digraph, so
all the threads will need to access the digraph. However, since these are only read accesses, this won’t result in a race
condition or contention among the threads.

There are only four potential differences
between this pseudocode and the pseudocode we used for the second iterative
serial implementation:

. The use of my stack instead of stack; since
each thread has its own, private stack,

. we use my stack as the identifier for the
stack object instead of stack.

. Initialization of the stack.

. Implementation of the Best tour function.
Implementation of the Update best tour function.

In the serial implementation, the stack is
initialized by pushing the partial tour con-sisting only of the hometown onto
the stack. In the parallel version we need to generate at least thread count partial tours to distribute among the threads. As we discussed
earlier, we can use breadth-first search to generate a list of at least thread count tours by having a single thread search the tree until it reaches a
levelwith at least thread count tours. (Note that this implies that the number
of threads should be less than (n-1)! , which shouldn’t be a problem). Then the threads can

use a block partition to divide these tours
among themselves and push them onto their private stacks. Exercise 6.18 looks
into the details.

To implement the Best
tour function, a thread should compare the cost of
its current tour with the cost of the global best tour. Since multiple threads
may be simul-taneously accessing the global best cost, it might at first seem
that there will be a race condition. However, the Best tour function only reads the global best cost, so there won’t be any conflict with threads
that are also checking the best cost. If a thread is updating the global best
cost, then a thread that is just checking it will either read the old value or
the new, updated value. While we would prefer that it get the new value, we
can’t insure this without using some very costly locking strategy. For example,
threads wanting to execute Best_tour or Update_best_tour could wait on a sin-gle mutex. This would
insure that no thread is updating while another thread is only checking, but
would have the unfortunate side effect that only one thread could check the
best cost at a time. We could improve on this by using a read-write lock, but
this would have the side effect that the readers—the threads calling Best_tour—would all block while a thread updated the best tour. In
principle, this doesn’t sound too bad, but recall that in practice read-write
locks can be quite slow. So it seems pretty clear that the “no contention”
solution of possibly getting a best tour cost that’s out-of-date is probably
better, as the next time the thread calls Best_tour, it will get the updated value of the best
tour cost.

On the
other hand, we call Update_best_tour with the intention of writing to the best tour structure, and this clearly
can cause a race condition if two threads call it simultaneously. To avoid this
problem, we can protect the body of the Update_best_tour function with a mutex.
This isn’t enough, however; between the time a thread completes the test in
Best tour and the time it obtains the lock in
Update_best_tour, another thread may have obtained the lock and updated
the best tour cost, which now may be
less than the best tour cost that the first thread found in Best_tour. Thus, correct pseudocode for
Update_best_tour should look something
like this:

Update best tour function with a mutex. This isn’t enough, however; between thetime a thread completes the test in Best
tour and the time it obtains the lock in Update best tour, another thread may have obtained the lock and updated the besttour cost, which now may be less than the best tour cost that the
first thread found in Best tour. Thus, correct pseudocode for Update best tour should look somethinglike this:

This may seem wasteful, but if updates to the
best tour are infrequent, then most of the time Best
tour will return false and it will only be rarely necessary to make
the “double” call.