Part IV Advanced Design and Analysis IV.1
Techniques
Outline:
Chapter 15: Dynamic Programming applies to
optimization problems in which a set of
choices must be made to get the optimal
solution. The key idea is to store the
solution to a subproblem that can occur from
more than one set of choices. A dynamic
programming solution can sometimes change an
exponential-time algorithm into a polynomial
time algorithm.
Chapter 16: Greedy algorithms also apply to
optimization problems in which a set of
choices must be made to get the optimal
solution. The key idea here is to make each
choice in a locally optimal way. An example
is coin-changing: to minimize the number of
coins given as change, repeatedly select the
largest-denomination coin that is not
greater than the amount still owed.
Chapter 17: Amortized analysis is a tool for
analyzing algorithms that perform a sequence
of operations drawn from a small set of
operation types. Instead of bounding the cost
of each operation separately, amortized
analysis gives a bound on the entire
sequence of operations.
Chapter 15 Dynamic Programming 15.0.1
Like divide-and-conquer, dynamic programming
solves problems by combining solutions to
subproblems ("programming" refers to a tabular
method, as also used in "linear programming",
not to writing computer code). In contrast to
divide-and-conquer, where the subproblems are
independent, dynamic programming is applicable
when subproblems share common subsubproblems.
Dynamic programming solves each subsubproblem
just once and saves its answer in a table.
Dynamic programming is typically applied to
optimization problems, which often have many
solutions. Each solution has a value, and we
wish to find a solution with the optimal value
(minimum or maximum; several such solutions
may exist). The four steps for developing a
dynamic-programming algorithm are:
1. Characterize the structure of an optimal
solution.
2. Recursively define the value of an optimal
solution.
3. Compute the value of an optimal solution in
a bottom-up fashion (may be desired result).
4. Construct an optimal solution from the
computed information (may be omitted).
15.1 Rod Cutting 15.1.1
Section 15.1 treats the problem of cutting a
rod into smaller rods to maximize their total
value. Section 15.2 asks how to multiply a
chain of matrices to minimize the number of
scalar multiplications. Section 15.3 treats
the theory underlying dynamic programming.
In Sections 15.4 and 15.5 dynamic programming
is used to solve the longest common subsequence
and optimal binary search tree problems.
We want to know how to cut up a steel rod of
length n into integer lengths in order to
maximize revenue, given the prices p_i of a
rod of length i, as in Figure 15.1 page 360:
length i | 1 2 3 4 5 6 7 8 9 10
----------|-----------------------------------
price p_i | 1 5 8 9 10 17 17 20 24 30
Figure 15.2 shows the possibilities for n = 4:
One rod of length 4: revenue 9;
rods of length 1 and 3: revenue 9;
2 rods of length 2: revenue 10 (optimal);
2 rods of length 1, 1 of length 2: revenue 7;
4 rods of length 1: revenue 4.
15.1.2
In general, we can cut a rod in 2^(n-1) ways,
opting to cut or not cut at each distance i
(1 <= i <= n-1) from the left end. We use
addition to denote how to cut a rod: 7 = 2+2+3
indicates a rod of length 7 is cut into pieces
of lengths 2, 2, and 3. If an optimal solution
cuts the rod into k pieces of lengths i_1, i_2,
..., i_k, then:
n = i_1 + i_2 + ... + i_k
and the optimal revenue is:
r_n = p_i1 + p_i2 + ... + p_ik
The optimal revenues for i = 1, 2, ..., 10:
r_1 = 1 from 1 = 1 (no cuts),
r_2 = 5 from 2 = 2 (no cuts),
r_3 = 8 from 3 = 3 (no cuts),
r_4 = 10 from 4 = 2 + 2,
r_5 = 13 from 5 = 2 + 3,
r_6 = 17 from 6 = 6 (no cuts),
r_7 = 18 from 7 = 1 + 6 or 7 = 2 + 2 + 3,
r_8 = 22 from 8 = 2 + 6,
r_9 = 25 from 9 = 3 + 6,
r_10 = 30 from 10 = 10 (no cuts).
In general: 15.1.3
r_n = max( p_n, r_1 + r_(n-1), r_2 + r_(n-2),
..., r_(n-1) + r_1) (15.1)
where p_n corresponds to no cut, and the other
n-1 values r_i + r_(n-i) correspond to a first
cut at i. Thus to maximize r_n, we maximize the
independent subproblems r_i and r_(n-i): the
optimal substructure property. We simplify
the analysis by assuming the piece to the left
of i has no further cuts; only the piece to the
right may be cut again. This simplification
reduces the problem to finding a solution to
only one subproblem. So any decomposition has
revenue p_i for the left piece plus r_(n-i),
the revenue of the decomposed right piece. The
case of no cuts at all is i = n, with r_0 = 0.
We thus obtain the simpler formula, which is
Step 1 of dynamic programming:
r_n = max ( p_i + r_(n-i) ) (15.2)
      1 <= i <= n
Recursive top-down implementation (Step 2)
CUT_ROD(p, n)
   if n == 0
      return 0
   q = -inf
   for i = 1 to n
      q = max(q, p[i] + CUT_ROD(p, n - i))
   return q
CUT_ROD has array p[1..n] of prices 15.1.4
and n as arguments, and returns the maximum
value of the revenue. It can be proved to be
correct by using formula (15.2).
However, CUT_ROD inefficiently calls itself
repeatedly on small inputs. Figure 15.3 (on
page 364) shows what happens when n = 4:
____________(4)____
/ | \ \
/ | \ \
(3) (2) (1) (0)
__/ | \ / \ |
/ | \ | | |
(2) (1) (0) (1) (0) (0)
/ \ | |
(1) (0) (0) (0)
|
(0)
The number of calls is given by: T(0) = 1 and
           n-1
T(n) = 1 + Sum T(j)
           j=0
which has the solution T(n) = 2^n (as Exercise
15.1-1 asks you to prove).
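The call count can be checked experimentally. The following is a minimal Python sketch of CUT_ROD, assuming the price table of Figure 15.1 stored with a dummy entry at index 0; the `counter` argument and the helper `count_calls` are illustrative additions, not part of the pseudocode.

```python
# Prices p_1..p_10 from Figure 15.1; index 0 is a placeholder so PRICES[i] is p_i.
PRICES = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]


def cut_rod(prices, n, counter):
    """Naive top-down recursion for equation (15.2).

    counter is a one-element list that accumulates the number of calls
    (an illustrative device, not part of the original pseudocode).
    """
    counter[0] += 1
    if n == 0:
        return 0
    best = float("-inf")
    for i in range(1, n + 1):
        best = max(best, prices[i] + cut_rod(prices, n - i, counter))
    return best


def count_calls(n):
    """Return (optimal revenue, number of calls) for a rod of length n."""
    counter = [0]
    revenue = cut_rod(PRICES, n, counter)
    return revenue, counter[0]


if __name__ == "__main__":
    print(count_calls(4))   # (10, 16): revenue 10 using 2^4 calls
    print(count_calls(10))  # (30, 1024)
```

For n = 4 the 16 calls correspond to the 16 nodes of the recursion tree in Figure 15.3.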
Using dynamic programming for optimal 15.1.5
rod cutting
For Step 3: we solve each subproblem once,
saving the result in a table, looking it up
when we need it in the future. The memory for
the table is an example of a time-space trade-
off. A dynamic programming solution runs in
polynomial time if the number of distinct
subproblems is polynomial in n and each
subproblem is solvable in polynomial time.
Dynamic programming is implemented in 2 ways:
(1) top-down with memoization, in which the
procedure first checks whether a subproblem has
been solved: if so, the answer is looked up in
the table; if not, it is computed and stored in
the table. Such a procedure has been "memoized".
(2) bottom-up, which solves the smallest
subproblems first, working up to the larger
subproblems. Again, each subproblem is only
solved once.
The two approaches have the same asymptotic
running time, except in rare cases where the
top-down approach does not recurse to all
subproblems. Otherwise the bottom-up approach
often has better constant factors because it
avoids the overhead of recursive calls.
MEMOIZED-CUT-ROD(p,n) 15.1.6
   let r[0..n] be a new array
   for i = 0 to n
      r[i] = -infinity
   return MEMOIZED-CUT-ROD_AUX(p,n,r)
MEMOIZED-CUT-ROD_AUX(p,n,r)
   if r[n] >= 0
      return r[n]
   if n == 0
      q = 0
   else
      q = -infinity
      for i = 1 to n
         q = max(q, p[i] +
                 MEMOIZED-CUT-ROD_AUX(p,n-i,r))
   r[n] = q
   return q
This is just the memoized version of CUT_ROD.
The bottom-up version is even simpler:
BOTTOM-UP-CUT-ROD(p,n)
   let r[0..n] be a new array
   r[0] = 0
   for j = 1 to n
      q = -infinity
      for i = 1 to j
         q = max( q, p[i] + r[j-i] )
      r[j] = q
   return r[n]
Due to the nested for-loops, this runs in
Theta(n^2) time. It is harder to see, but
MEMOIZED-CUT-ROD also runs in Theta(n^2) time.
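Both procedures translate directly into Python. This is a sketch under two assumptions: prices are stored with a dummy entry at index 0, and the memo is a dictionary rather than an array pre-filled with -infinity (a small departure from the pseudocode).

```python
# Prices p_1..p_10 from Figure 15.1; index 0 is a placeholder.
PRICES = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]


def memoized_cut_rod(prices, n, memo=None):
    """Top-down with memoization: each length is solved at most once."""
    if memo is None:
        memo = {}
    if n in memo:                 # already solved: just look it up
        return memo[n]
    if n == 0:
        best = 0
    else:
        best = max(prices[i] + memoized_cut_rod(prices, n - i, memo)
                   for i in range(1, n + 1))
    memo[n] = best
    return best


def bottom_up_cut_rod(prices, n):
    """Bottom-up: solve lengths 1, 2, ..., n in increasing order."""
    r = [0] * (n + 1)             # r[0] = 0
    for j in range(1, n + 1):
        r[j] = max(prices[i] + r[j - i] for i in range(1, j + 1))
    return r[n]


if __name__ == "__main__":
    print([bottom_up_cut_rod(PRICES, n) for n in range(11)])
```

The doubly nested iteration in `bottom_up_cut_rod` performs 1 + 2 + ... + n additions, which is where the Theta(n^2) bound comes from.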
Subproblem graphs 15.1.7
We should understand the set of subproblems
and how they depend on one another. This is
embodied in the subproblem graph. Figure 15.4
(page 367) shows the subproblem graph for the
rod-cutting problem when n = 4:
_(4)
///|
/// V
/ ||(3)
| |\ |\\
| | \V \\
| | (2)||
| \ /| ||
| X | /|
| / \V/ |
\ \ (1) |
\ \ | /
\_\V/
(0)
There is an edge from x to y if finding an
optimal solution to x depends on finding one
for y. The subproblem graph is a "reduced" or
"collapsed" version of the top-down recursion
tree: so Figure 15.4 is the reduced version of
Figure 15.3. In the bottom-up method, we go
back up the graph in a "reverse topological
sort"; the top-down method corresponds to DFS.
The time to solve a subproblem is typically
proportional to the number of edges going out
from it (its out-degree), so the total solving
time is O(E), and each subproblem/vertex must
be solved, so the total running time is usually
Theta(V + E).
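To make the graph concrete, here is a small Python sketch that builds the rod-cutting subproblem graph as adjacency lists and derives a bottom-up order from a depth-first search. The function names are hypothetical, chosen only for this illustration.

```python
def rod_subproblem_graph(n):
    """Vertex j has an edge to j - i for every first-piece size i = 1..j."""
    return {j: [j - i for i in range(1, j + 1)] for j in range(n + 1)}


def reverse_topological_order(graph):
    """DFS post-order: a vertex is listed only after everything it depends on."""
    order, seen = [], set()

    def visit(v):
        if v in seen:
            return
        seen.add(v)
        for w in graph[v]:
            visit(w)
        order.append(v)

    for v in graph:
        visit(v)
    return order


if __name__ == "__main__":
    g = rod_subproblem_graph(4)
    print(sum(len(adj) for adj in g.values()))  # 10 edges for n = 4
    print(reverse_topological_order(g))         # [0, 1, 2, 3, 4]
```

For a rod of length n the graph has n + 1 vertices and n(n+1)/2 edges, consistent with the Theta(n^2) running time of the rod-cutting procedures.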
Reconstructing a solution (Step 4) 15.1.8
The bottom-up method reports the value of the
optimal solution, but not the choices made; we
extend it to record s_j, the optimal size of
the first piece to cut off a rod of length j:
EXTENDED-BOTTOM-UP-CUT-ROD(p,n)
   let r[0..n] and s[0..n] be new arrays
   r[0] = 0
   for j = 1 to n
      q = -infinity
      for i = 1 to j
         if q < p[i] + r[j-i]
            q = p[i] + r[j-i]
            s[j] = i
      r[j] = q
   return r and s
If we call it with n = 10, it returns arrays:
i | 0 1 2 3 4 5 6 7 8 9 10
-----|----------------------------------------
r[i] | 0 1 5 8 10 13 17 18 22 25 30
s[i] | 0 1 2 3 2 2 6 1 2 3 10
The following method prints a list of optimal
piece sizes for a rod of length n.
PRINT-CUT-ROD-SOLUTION(p,n) 15.1.9
   (r,s) = EXTENDED-BOTTOM-UP-CUT-ROD(p,n)
   while n > 0
      print s[n]
      n = n - s[n]
If n = 10, it would just print 10, but if
n = 7, it would print 1 and 6, corresponding
to the optimal decomposition given on page
15.1.2.
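The two procedures can be sketched in Python as follows, assuming the price table of Figure 15.1 with a dummy entry at index 0. Here `cut_rod_solution` returns the piece sizes as a list instead of printing them, which is a choice made for this sketch.

```python
# Prices p_1..p_10 from Figure 15.1; index 0 is a placeholder.
PRICES = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]


def extended_bottom_up_cut_rod(prices, n):
    """Return (r, s): r[j] is the best revenue and s[j] the first piece size for length j."""
    r = [0] * (n + 1)
    s = [0] * (n + 1)
    for j in range(1, n + 1):
        best = float("-inf")
        for i in range(1, j + 1):
            if best < prices[i] + r[j - i]:
                best = prices[i] + r[j - i]
                s[j] = i          # remember the choice that achieved best
        r[j] = best
    return r, s


def cut_rod_solution(prices, n):
    """Follow the recorded choices to list the pieces of an optimal cut."""
    _, s = extended_bottom_up_cut_rod(prices, n)
    pieces = []
    while n > 0:
        pieces.append(s[n])
        n -= s[n]
    return pieces


if __name__ == "__main__":
    print(cut_rod_solution(PRICES, 7))   # [1, 6]
    print(cut_rod_solution(PRICES, 10))  # [10]
```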
Matrix-chain multiplication 15.2.1
Suppose <A_1, A_2, ..., A_n> is a chain of n
matrices to be multiplied to get A_1A_2...A_n.
Due to associativity of matrix multiplication,
we can compute this product in several ways -
we indicate the order in which to perform the
multiplications by fully parenthesizing it.
There are 5 ways to parenthesize if n = 4:
(A_1(A_2(A_3A_4)))
(A_1((A_2A_3)A_4))
((A_1A_2)(A_3A_4))
((A_1(A_2A_3))A_4)
(((A_1A_2)A_3)A_4)
The parenthesization can have a big impact
on the cost of the calculation. Here is the
standard way to multiply two matrices:
MATRIX-MULTIPLY(A,B)
1 if A.columns != B.rows
2    error "incompatible dimensions"
3 else let C be a new A.rows x B.columns matrix
4    for i = 1 to A.rows
5       for j = 1 to B.columns
6          c_ij = 0
7          for k = 1 to A.columns
8             c_ij = c_ij + a_ik*b_kj
9    return C
The number of columns of A must equal the
number of rows of B; if A is a pxq matrix and
B is a qxr matrix, C is a pxr matrix. The main
cost is the "*" in line 8, which is done pqr
times.
For example, consider the chain <A_1, A_2, A_3>
with dimensions 10x100, 100x5, and 5x50. If
we parenthesize as ((A_1A_2)A_3), the cost is
10*100*5 = 5000 to compute A_1A_2 and
10*5*50 = 2500 to compute its product with A_3
for a total of 7500 scalar multiplications.
If we parenthesize as (A_1(A_2A_3)), the cost
is 100*5*50 = 25,000 to compute A_2A_3 and
10*100*50 = 50,000 to compute its product with
A_1, for a total of 75,000 scalar multiplies.
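The arithmetic above can be reproduced with a short Python sketch. The representation is an assumption made for illustration: a parenthesization is a nested pair of 1-based matrix indices, and `DIMS` maps each index to its (rows, columns) shape.

```python
# Shapes of A_1, A_2, A_3 from the example: 10x100, 100x5, 5x50.
DIMS = {1: (10, 100), 2: (100, 5), 3: (5, 50)}


def multiply_cost(tree, dims):
    """Return (rows, cols, scalar multiplications) for a parenthesization.

    tree is either a matrix index (int) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, int):
        rows, cols = dims[tree]
        return rows, cols, 0
    left, right = tree
    l_rows, l_cols, l_cost = multiply_cost(left, dims)
    r_rows, r_cols, r_cost = multiply_cost(right, dims)
    if l_cols != r_rows:
        raise ValueError("incompatible dimensions")
    # Multiplying a p x q matrix by a q x r matrix costs p*q*r.
    return l_rows, r_cols, l_cost + r_cost + l_rows * l_cols * r_cols


if __name__ == "__main__":
    print(multiply_cost(((1, 2), 3), DIMS)[2])  # 7500
    print(multiply_cost((1, (2, 3)), DIMS)[2])  # 75000
```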
The matrix-chain multiplication 15.2.2
problem: given a chain <A_1, A_2, ..., A_n> of
n matrices where A_i has size p_(i-1) by p_i,
fully parenthesize A_1A_2...A_n to minimize
scalar multiplications. Note that the cost to
find this parenthesization is typically much
less than the cost to multiply the matrices.
Counting the number of parenthesizations
Iterating through all parenthesizations is
not efficient. Let P(n) be the number of
parenthesizations of n matrices. Then P(1) is
1; if n > 1, the number of ways to parenthesize
when splitting between the k-th and (k+1)-st
matrices is P(k)*P(n-k), and since we can split
at any k = 1, 2, ..., n-1, the recurrence for
P is
        / 1                    if n = 1
P(n) = <  n-1
        \ Sum P(k)P(n-k)       if n > 1
          k=1
P(n) = C(n-1), where C(n) = B(2n,n)/(n+1) ( =
Theta(4^n/n^1.5) ) is the n-th Catalan number
and B(2n,n) is the central binomial coefficient.
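A quick numerical check of the recurrence against the Catalan formula, as a Python sketch (the function names are chosen for this illustration; `math.comb` requires Python 3.8 or later):

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def num_parenthesizations(n):
    """P(n) from the recurrence: split between matrix k and k+1, k = 1..n-1."""
    if n == 1:
        return 1
    return sum(num_parenthesizations(k) * num_parenthesizations(n - k)
               for k in range(1, n))


def catalan(n):
    """n-th Catalan number, B(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


if __name__ == "__main__":
    print([num_parenthesizations(n) for n in range(1, 8)])
    # [1, 1, 2, 5, 14, 42, 132]
```

The value P(4) = 5 matches the five parenthesizations listed for n = 4.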
Step 1. Characterize the structure of 15.2.3
an optimal solution.
Let A_i..j denote the product A_i*...*A_j.
If A_1..n = (A_1..k)(A_(k+1)..n) is an optimal
parenthesization, then A_1..k and A_(k+1)..n
must also be optimally parenthesized: this
optimal substructure is the first hallmark of
applicability of dynamic programming.
Step 2. Recursively define the value of an
optimal solution.
Let m[i,j] = minimum number of scalar
multiplications to compute A_i..j, then:
         / 0                      if i = j
m[i,j] = < min { m[i,k] + m[k+1,j]
         \       + p_(i-1)*p_k*p_j }  if i < j
           i<=k<j
The matrix dimensions are given as a sequence
p = <p_0, p_1, ..., p_n>, where
p.length = n+1. Another table s[i,j] stores
the index k to split A_i..j to get least cost.
MATRIX-CHAIN-ORDER(p) 15.2.4
   n = p.length - 1
   let m[1..n,1..n] and s[1..n-1,2..n] be tables
   for i = 1 to n
      m[i,i] = 0
   for l = 2 to n            // l = length of chain
      for i = 1 to n - l + 1
         j = i + l - 1
         m[i,j] = infinity
         for k = i to j - 1
            q = m[i,k] + m[k+1,j] + p_(i-1)*p_k*p_j
            if q < m[i,j]
               m[i,j] = q
               s[i,j] = k    // k = best split yet
   return m and s
The minimum cost is m[1,n]. Figure 15.5 shows
an example when n = 6. Since we only use half
of each table, they are rotated 45 degrees
counter-clockwise. The outer loop of the
algorithm fills entries one line at a time
from the bottom (previously the main diagonal)
to the top vertices m[1,n] and s[1,n] (which
tells where to make the first split).
The nested loop structure gives a running
time of O(n^3) since the loops are nested three
deep and each runs at most n times. A careful
count shows that the body of the innermost loop
is executed (1/6)n^3 - n/6 times, so the
running time is actually Theta(n^3).
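A Python sketch of MATRIX-CHAIN-ORDER follows. It assumes square (n+1) x (n+1) lists with row and column 0 unused so that indices match the pseudocode; `inner_loop_iterations` is a hypothetical helper added only to check the loop count stated above.

```python
def matrix_chain_order(p):
    """Return tables (m, s) for dimensions p = [p_0, ..., p_n], 1-based indices."""
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]   # m[i][i] = 0
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):              # length of the chain
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = float("inf")
            for k in range(i, j):               # try every split point
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s


def inner_loop_iterations(n):
    """Count how often the innermost loop body runs for a chain of n matrices."""
    count = 0
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            count += j - i                      # k ranges over i..j-1
    return count


if __name__ == "__main__":
    m, s = matrix_chain_order([10, 100, 5, 50])
    print(m[1][3], s[1][3])                     # 7500 2
    print(inner_loop_iterations(6))             # 35 = (6^3 - 6) / 6
```

On the three-matrix example with dimensions 10x100, 100x5, and 5x50, the sketch recovers the cost 7500 with the first split after A_2, that is ((A_1A_2)A_3).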
Step 4. Constructing an optimal 15.2.5
solution
Each entry s[i,j] tells where to split
A_i..j to obtain the minimal cost. So s[1,n]
tells where to make the first split, and then
recursively s[1,s[1,n]] tells where to split
the left half and s[s[1,n]+1,n] tells where to
split the right half, etc. The following
algorithm prints the optimal parenthesization
with initial call PRINT-OPTIMAL-PARENS(s,1,n).
PRINT-OPTIMAL-PARENS(s,i,j)
   if i == j
      print "A"_i
   else print "("
      PRINT-OPTIMAL-PARENS(s, i, s[i,j])
      PRINT-OPTIMAL-PARENS(s, s[i,j]+1, j)
      print ")"
In the example of Figure 15.5, the call
PRINT-OPTIMAL-PARENS(s,1,6) prints out the
parenthesization ((A_1(A_2A_3))((A_4A_5)A_6))
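The reconstruction can be sketched in Python. The dimension sequence below is an assumption: these notes do not list the dimensions used in Figure 15.5, so the values are the ones recalled from the textbook example (30x35, 35x15, 15x5, 5x10, 10x20, 20x25). The sketch returns a string rather than printing piece by piece.

```python
def matrix_chain_order(p):
    """Bottom-up tables (m, s) for dimensions p = [p_0, ..., p_n], 1-based."""
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = float("inf")
            for k in range(i, j):
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s


def optimal_parens(s, i, j):
    """Build the optimal parenthesization of A_i..A_j from the split table s."""
    if i == j:
        return f"A_{i}"
    k = s[i][j]
    return "(" + optimal_parens(s, i, k) + optimal_parens(s, k + 1, j) + ")"


if __name__ == "__main__":
    # Assumed dimensions for the six-matrix example (not given in these notes).
    p = [30, 35, 15, 5, 10, 20, 25]
    m, s = matrix_chain_order(p)
    print(m[1][6])                  # 15125
    print(optimal_parens(s, 1, 6))  # ((A_1(A_2A_3))((A_4A_5)A_6))
```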
Elements of Dynamic Programming 15.3.1
What is necessary in order to apply dynamic
programming? Answer:
(1) optimal substructure, and
(2) overlapping subproblems
We will also look at the memoization method.
Optimal substructure
Definition: A problem has optimal substructure
if an optimal solution contains optimal
solutions to subproblems.
This is one indication that a problem might
have a dynamic programming solution, though it
might also have a greedy algorithm solution.
We have seen optimal substructure in all the
problems solved by dynamic programming so far.
Here is a pattern to find optimal substructure:
1. Show that the solution consists of making a
choice: where to cut for a rod, a "splitting
index" for a matrix-chain, or an intermediate
vertex in a shortest path.
2. Assume that you are given the choice that
leads to an optimal solution.
3. Given this choice, determine the ensuing
subproblems and how to characterize the
space of subproblems.
4. Show that the solutions to subproblems 15.3.2
within an optimal solution are themselves
optimal by using a "cut-and-paste" argument:
assume that a subsolution is not optimal; then
replacing it with an optimal one would reduce
(or increase) the value of the whole solution,
giving a better value, so the original solution
was not optimal after all - a contradiction.
To characterize the space of subproblems,
we should make it as simple as possible. In
the rod-cutting case, we only need to consider
cutting a rod of length i for each size i; in
the matrix-chain case, the subproblems are of
the form A_i..j, where we allow both i and j
to vary (giving a 2-dimensional space).
Optimal substructure varies in two ways:
(1) how many subproblems are used in an
optimal solution, and
(2) how many choices we have in determining
which subproblems to use in a solution.
The rod-cutting problem uses one subproblem of
size n-i but we must consider n choices for i.
In the matrix-chain case to solve A_i..j,
there are two subproblems A_i..k, and
A_(k+1)..j, and j - i ways of picking k.
So the cost of a dynamic programming 15.3.3
algorithm is roughly the product of the number
of subproblems and the number of choices for
each one. In the rod-cutting case, there
were Theta(n) subproblems and at most n choices
for each one, for Theta(n^2) total cost. For
the matrix-chain case, there were Theta(n^2)
subproblems and at most n-1 choices, for O(n^3)
total cost - actually the cost is Theta(n^3).
Dynamic programming uses optimal substructure
bottom-up: first find optimal solutions to
subproblems, then use them to make choices to
find an optimal solution to the whole problem.
So the value of a solution is the value of its
subproblem solutions plus the cost of the
choice itself. For rod-cutting, we first found
the optimal revenues for rods of length 0, 1,
..., n-1 and then chose the first cut giving an
optimal solution for a rod of length n; the
choice cost is p_i (Equation (15.2)). In the
matrix-chain case, the choice cost was the
term p_(i-1)*p_k*p_j.
Greedy algorithms (Chapter 16) have some
similarities to dynamic programming algorithms
- in particular they both have the optimal
substructure property. The difference is
that greedy algorithms work in a top-down way,
making the best (greedy) choice at the time
_before_ knowing the solutions to the sub-
problems (after making the choice they then
solve the subproblems).
Subtleties 15.3.4
We must use care in identifying optimal
substructure. Consider the following two
problems on a directed graph G = (V,E) and
vertices u and v:
Unweighted shortest path. Find a path from u
to v with the fewest edges (which must be
simple, otherwise we could remove a cycle to
get a shorter path).
Unweighted longest simple path. Find a simple
path from u to v with the most edges (we need
to exclude cycles, otherwise we could go
around them many times to get an arbitrarily
high edge count).
The unweighted shortest path problem has the
optimal substructure property by the usual
argument: if a subpath of an optimal path was
not optimal, it could be replaced by a shorter
subpath, giving a shorter total path than the
original "optimal" path.
However the unweighted longest simple path
problem does not have the optimal substructure
property, as shown by Figure 15.6:
   (q)<-->(r)
    ^      ^
    |      |
    v      v
   (s)<-->(t)
Now q-->r-->t is a longest simple path from
q to t, but neither of its subpaths q-->r or
r-->t is a longest simple path between its
endpoints: q-->s-->t-->r is longer than q-->r,
and r-->q-->s-->t is longer than r-->t.
There is no known good dynamic 15.3.5
programming solution to this problem - in fact
it is NP-complete, which means it probably
can't be solved in polynomial time.
The distinction between these two problems is
that subproblems of the unweighted shortest
path problem are independent - finding a
shortest path from q to r does not affect the
finding of a shortest path from r to t (if
they did share a vertex, we would have a cycle
which we have seen can't happen for a shortest
path). On the other hand, in Figure 15.6 a
longest simple path from q to r uses all of the
vertices, leaving none to use for a longest
simple path from r to t, so finding the first
longest subpath _does_ affect finding the
second subpath.
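The failure of optimal substructure in Figure 15.6 can be confirmed by exhaustive search. The Python sketch below assumes the graph is the four-vertex cycle q, r, t, s with every edge present in both directions; the search is brute force and intended only for tiny graphs.

```python
# Assumed edge set for Figure 15.6: q<->r, q<->s, r<->t, s<->t.
GRAPH = {
    "q": ["r", "s"],
    "r": ["q", "t"],
    "s": ["q", "t"],
    "t": ["r", "s"],
}


def longest_simple_path(graph, u, v, visited=None):
    """Return a longest simple path from u to v as a vertex list, or None."""
    if visited is None:
        visited = (u,)
    if u == v:
        return [v]
    best = None
    for w in graph[u]:
        if w in visited:          # a simple path never revisits a vertex
            continue
        rest = longest_simple_path(graph, w, v, visited + (w,))
        if rest is not None and (best is None or len(rest) + 1 > len(best)):
            best = [u] + rest
    return best


if __name__ == "__main__":
    print(longest_simple_path(GRAPH, "q", "t"))  # ['q', 'r', 't']
    print(longest_simple_path(GRAPH, "q", "r"))  # ['q', 's', 't', 'r']
    print(longest_simple_path(GRAPH, "r", "t"))  # ['r', 'q', 's', 't']
```

Gluing the longest q-to-r path to the longest r-to-t path would repeat vertices, so the combination is not a simple path: the two subproblems compete for the same vertices.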
In the matrix-chain case, multiplying A_i..k
and multiplying A_(k+1)..j are independent.
In the rod-cutting case, we determine the best
way to make the first cut; "sub-cutting" those
two pieces is independent.
Overlapping subproblems 15.3.6
The second ingredient of a problem amenable
to dynamic programming is that it have
"overlapping subproblems" - the natural
recursive algorithm would have to solve the
same problem many times, as we saw in the
rod-cutting case, which uses 2^n computations
to cut a rod of length n. In contrast, the
dynamic programming solution is Theta(n^2).
For dynamic programming to be effective, the
space of subproblems is usually polynomial in
the input size. A dynamic programming
algorithm takes advantage of overlapping
subproblems by solving them once and storing
the result in a table, after which the result
can simply be looked up in constant time.
In the matrix-chain case, the smaller
subproblems are looked up many times. Figure
15.7 shows the case of four matrices, in which
there are only 10 distinct subproblems but the
recursion tree of the following procedure has
27 nodes. Consider:
RECURSIVE-MATRIX-CHAIN(p,i,j)
   if i == j
      return 0
   m[i,j] = infinity
   for k = i to j-1
      q = RECURSIVE-MATRIX-CHAIN(p,i,k)
          + RECURSIVE-MATRIX-CHAIN(p,k+1,j)
          + p_(i-1)*p_k*p_j
      if q < m[i,j]
         m[i,j] = q
   return m[i,j]
We show RECURSIVE-MATRIX-CHAIN needs 15.3.7
Omega(2^n) time to compute m[1,n]. Letting
T(n) be the time to compute m[1,n], we have:
T(1) >= 1
            n-1
T(n) >= 1 + Sum (T(k) + T(n-k) + 1)   for n > 1
            k=1
Note that each T(i), i = 1, ..., n-1, occurs
twice in the sum, once as T(k) with k = i and
once as T(n-k) with k = n-i, and 1 appears n
times altogether (n-1 times inside the sum plus
the leading 1), so we have:
                n-1
T(n) >= n + 2 * Sum T(i)
                i=1
Now we prove T(n) >= 2^(n-1) by induction.
Certainly T(1) >= 1 = 2^(1-1), and for n > 1
                n-1
T(n) >= n + 2 * Sum 2^(i-1)
                i=1
                n-2
      = n + 2 * Sum 2^i
                i=0
      = n + 2(2^(n-1) - 1) = n + 2^n - 2
     >= 2^(n-1)
Thus T(n) = Omega(2^n) 15.3.8
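The exponential growth is easy to observe by counting calls. This Python sketch mirrors RECURSIVE-MATRIX-CHAIN but keeps the running minimum in a local variable and threads a `counter` list through the recursion; both are choices made for this illustration. The number of calls is only one measure of the work, but it already exceeds the 2^(n-1) lower bound.

```python
from math import inf


def recursive_matrix_chain(p, i, j, counter):
    """Plain recursion on the matrix-chain recurrence; counter[0] counts calls."""
    counter[0] += 1
    if i == j:
        return 0
    best = inf
    for k in range(i, j):
        q = (recursive_matrix_chain(p, i, k, counter)
             + recursive_matrix_chain(p, k + 1, j, counter)
             + p[i - 1] * p[k] * p[j])
        best = min(best, q)
    return best


def calls_for_chain(n):
    """Number of calls made for a chain of n matrices (dimensions are irrelevant)."""
    counter = [0]
    recursive_matrix_chain([1] * (n + 1), 1, n, counter)
    return counter[0]


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, calls_for_chain(n), 2 ** (n - 1))
```

For a chain of four matrices the recursion makes 27 calls, although there are only 10 distinct subproblems.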
The bottom-up dynamic programming solution is
more efficient because it takes advantage of
single solutions to the Theta(n^2) different
overlapping subproblems. The recursive
algorithm repeatedly solves the same problem
each time it occurs in the recursion tree. So
whenever the same subproblem occurs repeatedly
in the recursion tree and the total number of
different subproblems is small, there may be
a dynamic programming solution.
Reconstructing an optimal solution
It may be possible to obtain the optimal
choices (Step 4) from the optimal costs in
Step 3, but usually we store the choice we
made in a table as we go along. In the
matrix-chain case, the table s[i,j] tells us
how to choose in Theta(1) time, whereas if we
didn't have s[i,j] we would have to examine
the j-i possibilities for splitting
A_iA_(i+1)...A_j, which takes Theta(j-i) time.
Memoization 15.3.9
It is possible to make the natural recursive
solution to a dynamic programming problem as
efficient as the dynamic programming solution
by memoizing it. The idea is to maintain a
table as usual and to compute an entry the
first time its subproblem is encountered, but
to just look up the result on later encounters.
Here is the memoized RECURSIVE-MATRIX-CHAIN():
MEMOIZED-MATRIX-CHAIN(p)
1 n = p.length - 1
2 let m[1..n,1..n] be a new table
3 for i = 1 to n
4    for j = i to n
5       m[i,j] = infinity
6 return LOOKUP-CHAIN(m,p,1,n)
LOOKUP-CHAIN(m,p,i,j)
1 if m[i,j] < infinity
2    return m[i,j]
3 if i == j
4    m[i,j] = 0
5 else for k = i to j-1
6       q = LOOKUP-CHAIN(m,p,i,k)
            + LOOKUP-CHAIN(m,p,k+1,j)
            + p_(i-1)*p_k*p_j
7       if q < m[i,j]
8          m[i,j] = q
9 return m[i,j]
15.3.10
Figure 15.7 shows how MEMOIZED-MATRIX-CHAIN
saves time compared to RECURSIVE-MATRIX-CHAIN.
Shaded subtrees are values that are looked up
rather than computed.
There are two kinds of calls to LOOKUP-CHAIN:
1. if m[i,j] = infinity, lines 3-9 are done
2. if m[i,j] < infinity, line 2 returns m[i,j]
There are Theta(n^2) calls of the first type,
one per table entry. Each call of the second
type is made from within a call of the first
type, and each call of the first type makes
O(n) recursive calls, so there are O(n^3) calls
in all. The total time is O(n^3) (really
Theta(n^3)), which is the same asymptotically
as the dynamic programming solution. So
memoization converts an Omega(2^n) recursive
algorithm to an O(n^3) algorithm.
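The two kinds of calls can be counted directly. In this Python sketch of MEMOIZED-MATRIX-CHAIN, the `stats` dictionary and the nested-function layout are assumptions made for illustration; the table uses infinity as the "not yet computed" marker, as in the pseudocode.

```python
from math import inf


def memoized_matrix_chain(p):
    """Return (minimum cost, call statistics) for dimensions p = [p_0, ..., p_n]."""
    n = len(p) - 1
    m = [[inf] * (n + 1) for _ in range(n + 1)]
    stats = {"computed": 0, "looked_up": 0}

    def lookup_chain(i, j):
        if m[i][j] < inf:                 # second kind of call: table lookup
            stats["looked_up"] += 1
            return m[i][j]
        stats["computed"] += 1            # first kind of call: fill the entry
        if i == j:
            m[i][j] = 0
        else:
            for k in range(i, j):
                q = (lookup_chain(i, k) + lookup_chain(k + 1, j)
                     + p[i - 1] * p[k] * p[j])
                if q < m[i][j]:
                    m[i][j] = q
        return m[i][j]

    return lookup_chain(1, n), stats


if __name__ == "__main__":
    cost, stats = memoized_matrix_chain([10, 100, 5, 50])
    print(cost, stats)  # 7500 {'computed': 6, 'looked_up': 3}
```

Each of the n(n+1)/2 table entries with i <= j is computed exactly once; every other call is a constant-time lookup.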
In general, if all the subproblems must be
solved at least once, a bottom-up dynamic
programming solution usually beats a memoized
recursive solution by a constant factor since
it is simpler and doesn't have the overhead of
recursive calls.
And there are some problems for which the
time or space requirements of the dynamic
programming solution can be further reduced.
On the other hand if not all subproblems
need be computed, the memoized algorithm
saves time by not computing them.