Techniques for the Design of Algorithms

Algorithmic Paradigms

General approaches to the construction of efficient
solutions to problems.

Such methods are of interest because:

They provide templates suited to solving a broad range of
diverse problems.

They can be translated into common control and data structures
provided by most high-level languages.

The temporal and spatial requirements of the algorithms which
result can be precisely analysed.

Over the next few lectures we shall examine the following paradigms:

Divide and Conquer.

Dynamic Programming.

Greedy Method.

Backtracking.

Although more than one technique may be applicable to
a specific problem, it is often the case that an algorithm
constructed by one approach is clearly superior to
equivalent solutions built using alternative techniques.

The choice of design paradigm is an important aspect of
algorithm synthesis.

Basic Algorithm Analysis

Questions

How does one calculate the running time of an algorithm?

How can we compare two different algorithms?

How do we know if an algorithm is `optimal'?

1. Count the number of basic operations performed by the
algorithm on the worst-case input.

A basic operation could be:

An assignment

A comparison between two variables.

An arithmetic operation between two variables.

The worst-case input is that input assignment for which
the most basic operations are performed.

Simple Example:

n := 5;
loop
  get(m);
  n := n - 1;
until (m = 0 or n = 0)

Worst-case: 5 iterations

Usually we are not concerned with the number of steps for a
single fixed case
but wish to estimate the running time in terms of the `input size'.

get(n);
loop
  get(m);
  n := n - 1;
until (m = 0 or n = 0)

Worst-case: n iterations
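This worst-case counting can be made concrete with a small Python
sketch (the operation counts charged per iteration, one decrement and
two comparisons, are an assumption chosen for illustration):

def count_operations(n, ms):
    # Model of:  loop get(m); n := n - 1; until (m = 0 or n = 0)
    # ms models the sequence of values returned by successive get(m) calls.
    ops = 0
    for m in ms:
        n = n - 1
        ops += 3              # one decrement plus two comparisons
        if m == 0 or n == 0:
            break
    return ops

# Worst case: no input m is ever 0, so the loop runs n times.
print(count_operations(5, [7, 7, 7, 7, 7]))   # 15 basic operations, 5 iterations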

Examples of `input size':

Sorting:

n == The number of items to be sorted;

Basic operation: Comparison.

Multiplication (of x and y):

n == The number of digits in x
plus the number of digits in y.

Basic operations: single digit arithmetic.

Graph `searching':

n == the number of nodes in the graph or the number of edges
in the graph.


Divide-and-Conquer

This is a method of designing algorithms that (informally)
proceeds as follows:

Given an instance of the problem to be solved, split
this into several smaller sub-instances (of the same problem),
independently solve each of the sub-instances, and then combine
the sub-instance solutions so as to yield a solution for the
original instance.

This description raises the question:

By what methods are the sub-instances to be independently solved?

The answer to this question is central to the concept of Divide-and-Conquer
algorithms and is a key factor in gauging their efficiency.

Consider the following:
We have an algorithm, alpha say, which is known to solve
all problem instances of size n in at most c n^2 steps
(where c is some constant).

We then discover an algorithm, beta say, which solves the same
problem by:

Dividing an instance into 3 sub-instances of size n/2.

Solving these 3 sub-instances.

Combining the three sub-solutions, taking d n steps to do this.

Suppose our original algorithm, alpha, is used to carry out
step 2 (`solving these sub-instances').
Let

T(alpha)( n ) = Running time of alpha

T(beta)( n ) = Running time of beta

Then,

T(alpha)( n ) = c n^2 (by definition of alpha)

But

T(beta)( n ) = 3 T(alpha)( n/2 ) + d n
             = (3/4)(c n^2) + d n

So if d n < (c n^2)/4 (i.e. d < c n/4) then beta is faster
than alpha.

In particular, for all large enough n (n > 4d/c, a constant),
beta is faster than alpha.

This realisation of beta improves upon alpha by just a constant factor.
But if the problem size, n, is large enough then

n > 4d/c
n/2 > 4d/c
...
n/2^i > 4d/c

which suggests that using beta instead of alpha for the `solving
these sub-instances' stage repeatedly, until the sub-sub-sub...sub-instances
are of size n0 <= 4d/c, will yield a still faster algorithm.

Call the algorithm obtained by this repeated application gamma. Its
running time satisfies the recurrence

T(gamma)( n ) = 3 T(gamma)( n/2 ) + d n (for n > n0)
T(gamma)( n ) = c n^2 (for n <= n0)

We shall show how relations of this form can be estimated later in the
course. With these methods it can be shown that

T(gamma)( n ) = O( n^{log 3} ) ( = O( n^{1.59...} ) )

This is an asymptotic improvement upon algorithms alpha and beta.

The improvement that results from applying algorithm gamma is
due to the fact that it maximises the savings achieved by beta.

The (relatively) inefficient method alpha is applied only to
"small" problem sizes.

The precise form of a divide-and-conquer algorithm is
characterised by:

(I) The threshold input size, n0, below which
the problem size is not sub-divided.

(II) The size of sub-instances into which an instance is split.

(III) The number of such sub-instances.

(IV) The algorithm used to combine sub-solutions.

In (II) it is more usual to consider the ratio of initial
problem size to sub-instance size. In our example this was 2. The
threshold in (I) is sometimes called the (recursive) base value.
In summary, the generic form of a divide-and-conquer algorithm is:

procedure D-and-C (n : input size) is
begin
  if n <= n0 then
    Solve problem without further sub-division;
  else
    Split into r sub-instances, each of size n/k;
    for each of the r sub-instances do
      D-and-C (n/k);
    Combine the r resulting sub-solutions to produce
    the solution to the original problem;
  end if;
end D-and-C;

Such algorithms are naturally and easily realised as:

Recursive Procedures

in (suitable) high-level programming languages.
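The skeleton can be rendered directly in, say, Python. The following
sketch is illustrative only: split, solve_directly and combine are
placeholders to be supplied per problem, here instantiated (by way of
example) as merge sort with n0 = 1 and r = k = 2:

def divide_and_conquer(instance, n0, split, solve_directly, combine):
    if len(instance) <= n0:
        return solve_directly(instance)      # recursive base
    sub_instances = split(instance)          # r sub-instances of size ~ n/k
    sub_solutions = [divide_and_conquer(s, n0, split, solve_directly, combine)
                     for s in sub_instances]
    return combine(sub_solutions)            # combine stage

def split_in_half(xs):
    mid = len(xs) // 2
    return [xs[:mid], xs[mid:]]

def merge(parts):
    a, b = parts
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

print(divide_and_conquer([5, 2, 8, 1], 1, split_in_half, lambda xs: xs, merge))
# [1, 2, 5, 8]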

Example 1:

Binary Search

Consider the following problem: one has a directory containing a
set of names and a telephone number associated with
each name.

The directory is sorted by alphabetical order of names. It contains n
entries which are stored in 2 arrays:

names (1..n) ; numbers (1..n)

Given a name and the value n the problem is to find the number
associated with the name.

We assume that any given input name actually does occur in the directory,
in order to make the exposition easier.

The Divide-and-Conquer algorithm to solve this problem is the simplest example
of the paradigm.

It is based on the following observation:

Given a name, X say, either:

X occurs in the middle place of the names array;

or

X occurs in the first half of the names array (U);

or

X occurs in the second half of the names array (L).

U (respectively L) can be true only if X comes before
(respectively after) the name stored in the middle place.
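A minimal sketch of the resulting algorithm in Python (identifier names
are illustrative; as assumed above, the name X is taken to occur
somewhere in names(lo..hi)):

def lookup(names, numbers, x, lo, hi):
    mid = (lo + hi) // 2
    if names[mid] == x:                  # X is in the middle place
        return numbers[mid]
    elif x < names[mid]:                 # case (U): search the first half
        return lookup(names, numbers, x, lo, mid - 1)
    else:                                # case (L): search the second half
        return lookup(names, numbers, x, mid + 1, hi)

names = ["ada", "bob", "eve", "joe", "sue"]
numbers = [111, 222, 333, 444, 555]
print(lookup(names, numbers, "joe", 0, len(names) - 1))   # 444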

Exercise: How should this algorithm be modified to cater for
the possibility that a given name does not occur in the directory?
In terms of the generic form of divide-and-conquer algorithms,
at which stage is this modification made?

We defer analysis of the algorithm's performance until later.

Example 2

Closest Pair

Input:

P = {p(1), p(2) ,..., p(n) }

where p(i) = ( x(i), y(i) ).

A set of n points in the plane.

Output

The distance between the two points that are closest.

Note: The distance DELTA( i, j ) between p(i) and p(j)
is defined by the expression:

DELTA( i, j ) = square root of { (x(i)-x(j))^2 + (y(i)-y(j))^2 }

We describe a divide-and-conquer algorithm for this problem.

We assume that:

n is an exact power of 2, n = 2^k.

For each i, x(i) <= x(i+1), i.e. the points are
ordered by increasing x from left to right.

Consider drawing a vertical line (L) through the
set of points P so that half of the points in P
lie to the left of L and half lie to the right
of L.

Example 3

Integer Multiplication

A step is regarded as a single operation involving two
single-digit numbers, e.g. 5+6, 3*4, etc.

In 1962, A.A. Karatsuba discovered an asymptotically faster
algorithm for multiplying two numbers by using a divide-and-conquer approach.

Writing x = a*10^(n/2) + b and y = c*10^(n/2) + d, where a, b, c, and d
are (n/2)-digit numbers, we also know that the result of multiplying
x and y (i.e. z) is

z = (a*c)*10^n + (a*d + b*c)*10^(n/2) + (b*d)

The terms ( a*c ), ( a*d ), ( b*c ), and ( b*d ) are
each products of

two (n/2)-digit numbers.

Thus the expression for the multiplication of x and y in terms of
the numbers
a, b, c, and d tells us that:

(1) Two single-digit numbers can be multiplied immediately.
(Recursive base: 1 step.)

(2) If n > 1 then the product of 2 n-digit numbers can be expressed
in terms of 4 products of 2 (n/2)-digit numbers.
(Divide-and-Conquer stage.)

(3) To calculate the result of multiplying x and y given the four products
returned involves only addition (can be done in O( n ) steps)
and multiplying by a power of 10 (also can be done in O( n ) steps,
since it only requires placing the appropriate number of 0s at the end
of the number). (Combine stage.)

(1)-(3) therefore describe a Divide-and-Conquer algorithm for multiplying two
n-digit numbers represented in decimal.
However, this scheme by itself yields no asymptotic improvement over the
classical method, as the following exercise indicates.

Moderately difficult:
How many steps does the resulting algorithm take to multiply two
n-digit numbers?

Karatsuba discovered how the product of 2 n-digit numbers
could be expressed in terms of three products each of 2 (n/2)-digit
numbers - instead of the four products that a naive implementation
of the Divide-and-Conquer schema above uses. The key observation is the
identity

a*d + b*c = (a + b)*(c + d) - a*c - b*d

so that the single product (a + b)*(c + d), together with a*c and b*d,
suffices.

This saving is accomplished at the expense of slightly increasing the number
of steps taken in the `combine stage' (Step (3)) (although this will still
only use O( n ) operations).
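Karatsuba's scheme can be sketched in Python as follows (a simplified
illustration working on Python integers rather than digit arrays; n is
assumed to be a power of 2, and the half-sums a+b and c+d may carry an
extra digit, which the recursive base absorbs harmlessly):

def karatsuba(x, y, n):
    if n == 1:
        return x * y                      # recursive base: single digits
    half = n // 2
    a, b = divmod(x, 10 ** half)          # x = a*10^(n/2) + b
    c, d = divmod(y, 10 ** half)          # y = c*10^(n/2) + d
    ac = karatsuba(a, c, half)
    bd = karatsuba(b, d, half)
    # Three half-size products suffice, using
    # a*d + b*c = (a+b)*(c+d) - a*c - b*d
    ad_plus_bc = karatsuba(a + b, c + d, half) - ac - bd
    return ac * 10 ** n + ad_plus_bc * 10 ** half + bd

print(karatsuba(1234, 5678, 4))   # 7006652 = 1234 * 5678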

In the case when the recurrence parameters (the number of sub-instances
and the sub-division ratio) are both constant, as in all the examples we
have given, there is a general method that can be used to solve such
recurrence relations in order to obtain an asymptotic bound for
the running time Tp( n ).

Dynamic Programming

This paradigm is most often applied in the construction
of algorithms to solve a certain class of

Optimisation Problem

That is: problems which require the minimisation
or maximisation of some measure.

One disadvantage of using Divide-and-Conquer is that
the process of recursively solving separate sub-instances
can result in the same computations being performed
repeatedly since identical sub-instances may arise.

The idea behind dynamic programming is to avoid this
pathology by obviating the requirement to calculate the
same quantity twice.

The method usually accomplishes this by maintaining a table
of sub-instance results.

Dynamic Programming is a

Bottom-Up Technique

in which the smallest sub-instances are explicitly solved first
and the results of these used to construct solutions to progressively
larger sub-instances.

In contrast, Divide-and-Conquer is a

Top-Down Technique

which logically progresses from the initial instance down to
the smallest sub-instances via intermediate sub-instances.
We can illustrate these points by considering the problem of
calculating the Binomial Coefficient, "n choose k", i.e. the coefficient
of x^k in the expansion of (1 + x)^n. This satisfies the relationship

n choose k = 1 (if k = 0 or k = n)
n choose k = (n-1 choose k-1) + (n-1 choose k) (if 0 < k < n)

Using this relationship, a rather crude Divide-and-Conquer solution to the problem of
calculating the Binomial Coefficient `n choose k' would be as sketched below:
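In Python, a minimal sketch of this crude recursive method:

def choose(n, k):
    # Direct implementation of the relationship above.
    if k == 0 or k == n:
        return 1
    return choose(n - 1, k - 1) + choose(n - 1, k)

print(choose(5, 2))   # 10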

By contrast, the Dynamic Programming approach uses the same relationship
but constructs a table of all the (n+1)*(k+1) binomial coefficients
`i choose j' for each value of i between 0 and n and each value of j
between 0 and k.
These are calculated in a particular order:

First the table entries corresponding to
the coefficients `i choose 0' and `i choose i' are fixed to the value 1.

The remaining table entries corresponding to the binomial coefficient `i choose j'
are calculated in increasing order of the value of i+j.

It should be noted that since the coefficient `i choose j' requires only
the values of `i-1 choose j-1' and `i-1 choose j', computing the
table entries in increasing order of i+j ensures that the table
entries needed for `i choose j' have already been calculated: the
calculation of a new diagonal of entries is invoked only when all the
table entries `i choose j', for which i+j equals the current value of
sum, have been found. The if statement in the implementation increments
the value of sum and sets up the new values of i and j, as in the
sketch below.
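A minimal Python sketch of this table-filling order (variable names are
illustrative; s plays the role of the sum variable, and k <= n is
assumed):

def choose_dp(n, k):
    # C[i][j] will hold `i choose j' for 0 <= i <= n, 0 <= j <= k.
    C = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):                # base entries:
        C[i][0] = 1                       #   `i choose 0' = 1
        if i <= k:
            C[i][i] = 1                   #   `i choose i' = 1
    for s in range(3, n + k + 1):         # increasing order of i + j
        for j in range(1, k + 1):
            i = s - j
            if 1 <= j < i <= n:
                # Both entries needed have smaller index sums,
                # so they are already in the table.
                C[i][j] = C[i - 1][j - 1] + C[i - 1][j]
    return C[n][k]

print(choose_dp(5, 2))   # 10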

Now consider the differences between the two methods:
The Divide-and-Conquer approach recomputes values, such as
"2 choose 1", a very large number of times,
particularly if n is large and k depends on n, i.e. k is not a constant.

It can be shown that the running time of this method is proportional to
the value of `n choose k' itself; when k = n/2 this grows exponentially
with n.
Despite the fact that the algorithm description is quite simple (it is just
a direct implementation of the relationship given) it is completely
infeasible as a practical algorithm.

The Dynamic Programming method, since it computes each value
"i choose j" exactly once, is
far more efficient. Its running time is O( n*k ), which is
O( n^2 ) in the worst case (again, k = n/2).

It will be noticed that the dynamic programming solution is rather
more involved than the recursive Divide-and-Conquer method; nevertheless
its running time is practical.

The binomial coefficient
example illustrates the key features of dynamic programming algorithms.

A table of all sub-instance results is constructed.

The entries corresponding to the smallest sub-instances are
initiated at the start of the algorithm.

The remaining entries are filled in following a precise order
(that corresponds to increasing sub-instance size) using only
those entries that have already been computed.

Each entry is calculated exactly once.

The final value computed is the solution to the initial problem
instance.

Implementation is by iteration (never by recursion, even
though the analysis of a problem may naturally suggest a recursive solution).

Example: Shortest Path

Input:
A directed graph, G( V, E ), with nodes

V = {1, 2 ,..., n }

and edges E as subset of VxV. Each edge in E has associated with
it a non-negative length.

Output:
An n x n matrix, D, in which D( i, j ) contains the length
of the shortest path from node i to node j in G.

Informal Overview of Method

The algorithm, conceptually, constructs a sequence of matrices:

D0, D1 ,..., Dk ,..., Dn

For each k (with 1 <= k <= n), the (i, j) entry of Dk,
denoted Dk( i,j ), will contain the
Length of the shortest path from node i to node j when only the nodes

{ 1, 2, 3 ,..., k }

can be used as intermediate nodes on the path.

Obviously Dn = D.

The matrix, D0, corresponds to the `smallest sub-instance'.
D0 is initiated as:

D0( i, j ) = 0             if i = j
D0( i, j ) = Length(i, j)  if (i, j) is in E
D0( i, j ) = infinity      if i <> j and (i, j) is not in E

Now, suppose we have constructed Dk, for some k < n.

How do we proceed to build D(k+1)?

The shortest path from i to j with only

{ 1, 2, 3 ,..., k, k+1 }

available as internal nodes

either: does not contain the node k+1,

or: does contain the node k+1.

In the former case:

D(k+1)( i, j ) = Dk( i, j )

In the latter case:

D(k+1)( i, j ) = Dk( i, k+1 ) + Dk( k+1, j )

Therefore D(k+1)( i, j ) is given by

D(k+1)( i, j ) = minimum( Dk( i, j ), Dk( i, k+1 ) + Dk( k+1, j ) )

Although these relationships suggest using a recursive
algorithm, as with the previous example, such a realisation
would be extremely inefficient.

Instead an iterative algorithm is employed.

Only one n x n matrix, D, is needed.

This is because after the matrix D(k+1) has been constructed,
the matrix Dk is no longer needed. Therefore D(k+1)
can overwrite Dk.

In the implementation sketched below, L denotes the matrix of
edge lengths for the set of edges in the graph G( V, E ).
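A minimal sketch of this iterative algorithm in Python (nodes are
numbered from 0 rather than 1, and INF stands for `infinite';
L[i][j] = INF is assumed whenever (i,j) is not an edge):

INF = float("inf")

def shortest_paths(L):
    n = len(L)
    # Initialise D as D0.
    D = [[0 if i == j else L[i][j] for j in range(n)] for i in range(n)]
    # Admit nodes 0..k as intermediates, one k at a time;
    # D(k+1) simply overwrites Dk in place.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Either the path avoids node k, or it passes through it.
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D

L = [[0, 3, INF],
     [INF, 0, 1],
     [5, INF, 0]]
print(shortest_paths(L))   # [[0, 3, 4], [6, 0, 1], [5, 8, 0]]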

The running time of this algorithm is O( n^3 ).

Greedy Algorithms

This is another approach that is often used to
design algorithms for solving

Optimisation Problems

In contrast to dynamic programming, however,

Greedy algorithms do not always yield a genuinely
optimal solution. In such cases the greedy method is frequently
the basis of a heuristic approach.

Even for problems which can be solved exactly by a greedy
algorithm, establishing the correctness of the method
may be a non-trivial process.

In order to give a precise description of the greedy paradigm
we must first consider a more detailed definition of the
environment in which typical optimisation problems occur.
Thus in an optimisation problem, one will have, in the
context of greedy algorithms, the following:

A predicate (solution) to test whether a given set of candidates
gives a solution (not necessarily optimal).

A predicate (feasible) to test if a set of candidates can be extended
to a (not necessarily optimal) solution.

A selection function (select) which chooses some candidate which has not
yet been used.

An objective function which assigns a value to a solution.

In other words:
An optimisation problem involves finding a subset, S, from a
collection of candidates, C; the subset, S, must satisfy
some specified criteria, i.e. be a solution and be such that
the objective function is optimised by S.
`Optimised' may mean

Minimised or Maximised

depending on the precise problem being solved.
Greedy methods are distinguished by the fact that the selection function
assigns a numerical value to each candidate, x, and chooses that
candidate for which:

SELECT( x ) is largest

or

SELECT( x ) is smallest

All Greedy Algorithms have exactly the same general form. A Greedy
Algorithm for a particular problem is specified by describing the
predicates `solution' and `feasible'; and the selection
function `select'.

Consequently, Greedy Algorithms are often very easy to design for
optimisation problems.
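The general form can be sketched in Python (solution, feasible and
select being the problem-specific ingredients described above):

def greedy(candidates, solution, feasible, select):
    S = set()
    C = set(candidates)
    while C and not solution(S):
        x = select(C)            # candidate with the best SELECT value
        C.remove(x)              # x is never considered again
        if feasible(S | {x}):    # keep x only if S can still be extended
            S = S | {x}
    return S if solution(S) else None    # None: no solution was found

In Kruskal's algorithm below, for example, select returns a shortest
remaining edge and feasible tests that no cycle would be created.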

Before proving the correctness of this algorithm (Kruskal's algorithm
for constructing a Minimal Spanning Tree of a weighted graph G( V, E )),
we give an example of it running.

The algorithm may be viewed as dividing the set of nodes, V, into
n parts or components:

{1} ; {2} ; ... ; {n}

An edge is added to the set S if and only if it joins two
nodes which belong to different components; if an edge is
added to S then the two components containing its endpoints are
coalesced into a single component.

In this way, the algorithm stops when there is just a single component

{ 1, 2 ,..., n }

remaining.

Iteration   Edge     Components
0           -        {1}; {2}; {3}; {4}; {5}; {6}; {7}
1           {1,2}    {1,2}; {3}; {4}; {5}; {6}; {7}
2           {2,3}    {1,2,3}; {4}; {5}; {6}; {7}
3           {4,5}    {1,2,3}; {4,5}; {6}; {7}
4           {6,7}    {1,2,3}; {4,5}; {6,7}
5           {1,4}    {1,2,3,4,5}; {6,7}
6           {2,5}    Not included (adds cycle)
7           {4,7}    {1,2,3,4,5,6,7}
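The component-merging scheme traced above can be sketched in Python as
follows (a simple union-find structure stands in for the component
bookkeeping; nodes are labelled 1..n and edges are (length, p, q)
triples, all illustrative choices):

def kruskal(n, edges):
    parent = list(range(n + 1))           # each node is its own component

    def find(v):                          # representative of v's component
        while parent[v] != v:
            v = parent[v]
        return v

    S = []
    for length, p, q in sorted(edges):    # edges by increasing length
        rp, rq = find(p), find(q)
        if rp != rq:                      # endpoints in different components
            S.append((p, q))
            parent[rp] = rq               # coalesce the two components
        # otherwise the edge would add a cycle and is not included
        if len(S) == n - 1:               # a single component remains
            break
    return S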

Question: How do we know that the resulting set of edges forms a
Minimal Spanning Tree?

In order to prove this we need the following result.

For G(V,E) as before, a subset, F, of the edges E is
called promising if F is a subset of the edges in a
minimal spanning tree of G(V,E).

Lemma:
Let G(V,E) be as before and W be a subset of V.

Let F, a subset of E,
be a promising set of edges such that no edge in F
has exactly one endpoint in W.

If {p,q} in E-F is a shortest edge having exactly one
of p or q in W, then the set of edges F union { {p,q} }
is promising.

Proof:
Let T(V,H) be a minimal spanning tree of G(V,E) such that
F is a subset of H. Note that T exists since F is a promising set of
edges.

Consider the edge e = {p,q} of the Lemma statement.

If e is in H then the result follows immediately, so suppose that
e is not in H. Assume that p is in W and q
is not in W, and consider the
graph T' = T(V, H union {e}).

Since T is a tree, the graph T'
(which contains one extra edge) must contain a cycle that includes the
(new) edge {p,q}.

Now since p is in W and q is not in W there must be some edge,
e' = {p',q'} in H which is also part of this cycle
and is such that p' is in W and q' is not in W.

Now, by the choice of e, we know that

Length( e ) <= Length( e' )

Removing the edge e' from T' gives a new spanning tree of G(V,E).

The cost of this tree is exactly

cost( T ) - Length( e' ) + Length( e )

and this is <= cost( T ).

Since T is a minimal spanning tree, the cost of the new tree cannot be
strictly smaller, so e and e' have the same length and the new tree is
also a minimal spanning tree. It contains F union {e} (note that e' is
not in F, since no edge of F has exactly one endpoint in W), and hence
this set of edges is promising as claimed.

Theorem:
Kruskal's algorithm always produces a minimal spanning tree.

Proof:
We show by induction on k >= 0 (the number of edges in S at each stage)
that the set of edges in S is always promising.

Base
(k = 0): S = {} and obviously the empty set of edges is promising.

Step: Suppose S is promising and the algorithm adds the edge e = {p,q},
joining two distinct components. Let W be the set of nodes in the
component containing p. No edge in S has exactly one endpoint in W.
Moreover, e is a shortest edge in E-S with exactly one endpoint in W:
any shorter edge has already been considered and rejected, and so has
both its endpoints inside a single component. By the Lemma, S union {e}
is therefore promising.

On termination, S is promising and forms a spanning tree; since a
promising set is contained in some minimal spanning tree, S must itself
be a minimal spanning tree.

A very simple example shows that the method can fail to deliver an
optimal solution. Consider the (knapsack-style) problem of selecting a
subset, S, of candidates whose total size does not exceed a bound K, so
as to maximise the total value of S, with the greedy method selecting
the candidate of largest value first. Let

U = { u1, u2, u3 ,..., u12 }

s(u1) = 101 ; v(u1) = 102

s(ui) = v(ui) = 10 (for 2 <= i <= 12)

K = 110

Greedy solution: S = {u1}; value is 102.

Optimal solution: S = U - {u1}; value is 110.

Backtracking and Searching

In a number of applications graph structures occur.
The graph may be an explicit object in the problem instance
as in:

Shortest Path Problem

Minimal Spanning Tree Problem

Graphs, however, may also occur implicitly as an
abstract mechanism with which to analyse problems and
construct algorithms for these.
Among the many areas where such an approach has been
used are:

Game Playing Programs

Theorem Proving Systems

Semantic Nets

Hypertext

...

Whether a graph is an explicit or implicit structure in describing
a problem, it is often the case that searching the
graph structure may be necessary.
Thus it is required to have methods which

Can `mark' nodes in a graph which have already been `examined'.

Determine which node should be examined next.

Ensure that every node in the graph can (but not necessarily
will) be visited.

These requirements must be realised subject to the constraint
that the search process respects the structure of the graph.

That is to say,
(With the exception of the first node inspected)

Any new node examined must be adjacent to some node
that has previously been visited.

So, search methods implicitly describe an ordering
of the nodes in a given graph.

One search method that occurs frequently with implicit graphs
is the technique known as

backtracking

Suppose a problem may be expressed in terms of detecting a particular
class of subgraph in a graph.

Then the backtracking approach to solving such a problem would be:

Scan each node of the graph, following a specific order, until

(1) A subgraph constituting a solution has been found;

or

(2) It is discovered that the subgraph built so far cannot be extended
to be a solution.

If (2) occurs then the search process is `backed-up' until
a node is reached from which a solution might still be found.

Simple Example

Knight's Tour

Given a natural number, n, describe how a Knight should be moved
on an n x n chessboard so that it visits every square exactly once
and ends on its starting square.

The implicit graph in this problem has n^2 nodes corresponding
to each position the Knight must occupy. There is an edge between
two of these nodes if the corresponding positions are `one move apart'.
The sub-graph that defines a solution is a cycle which contains each
node of the implicit graph.

Of course it is not necessary to construct this graph
explicitly in order to solve the problem.
The algorithm sketched below recursively searches the graph, labelling
each square (i.e. node) in the order in which it is visited.
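A minimal sketch of this backtracking search in Python (the move
offsets are the eight standard Knight moves; the tour is required to be
closed, so the final square must lie one move away from the start):

MOVES = [(1, 2), (2, 1), (2, -1), (1, -2),
         (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def knights_tour(n):
    board = [[0] * n for _ in range(n)]       # 0 marks an unvisited square

    def search(x, y, step):
        board[x][y] = step                     # label square in visit order
        if step == n * n:
            if (x, y) in [(dx, dy) for dx, dy in MOVES]:
                return True                    # one move from (0,0): closed tour
        else:
            for dx, dy in MOVES:
                u, v = x + dx, y + dy
                if 0 <= u < n and 0 <= v < n and board[u][v] == 0:
                    if search(u, v, step + 1):
                        return True
        board[x][y] = 0                        # back up: undo this move
        return False

    return board if search(0, 0, 1) else None

# knights_tour(6) searches for a closed tour of the 6 x 6 board; no
# closed tour exists for odd n (the Knight alternates square colours),
# and the naive search is exponential, so only small boards are practical.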

If G( V,E ) is a directed graph then it is possible
that not all of the nodes of the graph are reachable from a single
root node. To deal with this, the algorithm is modified by changing the
`Main Program Section' so that the search is restarted from every
node that has not already been visited.
The running time of both algorithms, on input G( V,E ), is O( |E| )
since each edge of the graph is examined only once.
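For explicit graphs, depth-first search with the modified main section
can be sketched in Python (the graph is assumed given as an
adjacency-list dictionary with an entry for every node):

def depth_first_search(graph):
    visited = {}
    counter = [0]                    # visit counter (boxed for the closure)

    def dfs(v):
        counter[0] += 1
        visited[v] = counter[0]      # mark v with the order of its visit
        for w in graph[v]:
            if w not in visited:     # each edge is examined only once
                dfs(w)

    # Modified main section: restart the search from every node not yet
    # reached, so all nodes are visited even when no single root reaches them.
    for v in graph:
        if v not in visited:
            dfs(v)
    return visited

G = {1: [2, 3], 2: [4], 3: [], 4: [], 5: [1]}   # nothing reaches node 5
print(depth_first_search(G))   # {1: 1, 2: 2, 4: 3, 3: 4, 5: 5}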

It may be noted that the recursive form given is a rather
inefficient method of implementing depth-first search for explicit
graphs.

Iterative versions exist, e.g. the method of Hopcroft and Tarjan
which is based on a 19th century algorithm discovered by Tremaux.
Depth-first search has a large number of applications in solving
other graph based problems, for example:

Topological Sorting

Connectivity Testing

Planarity Testing

One disadvantage of depth-first search as a mechanism for
searching implicit graph structures is that expanding
some paths may result in the search process never terminating
because no solution can be reached from these. For example,
this is a possible difficulty that arises when scanning
proof trees in Prolog implementations.

Breadth-first Search is another search method that is
less likely to exhibit such behaviour.