In practice, value iteration is much faster per iteration, but policy
iteration takes fewer iterations. Arguments have been put forth
for each approach being the better one for large problems.
Puterman's modified policy iteration algorithm [91]
provides a method for trading iteration time for iteration improvement
in a smoother way. The basic idea is that the expensive part of
policy iteration is solving for the exact value of $V_\pi$. Instead
of finding the exact $V_\pi$, we can perform a few steps of a modified
value iteration in which the policy is held fixed over successive
iterations. This can be shown to produce an approximation to $V_\pi$
that converges linearly in $\gamma$. In practice, this can
result in substantial speedups.
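
A minimal sketch of this idea on a tabular MDP may help; the array
shapes (P of shape S x A x S, R of shape S x A) and the parameter m,
the number of fixed-policy backups per outer step, are illustrative
assumptions rather than details taken from [91].

```python
import numpy as np

def modified_policy_iteration(P, R, gamma, m=5, iters=200):
    """Sketch of modified policy iteration on a tabular MDP.

    P: transitions, shape (S, A, S); R: expected rewards, shape (S, A).
    Each outer step improves the policy greedily, then runs only m
    cheap fixed-policy backups instead of solving for V_pi exactly.
    """
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * (P @ V)          # one-step lookahead, shape (S, A)
        pi = Q.argmax(axis=1)            # greedy policy improvement
        for _ in range(m):               # partial policy evaluation
            V = R[np.arange(S), pi] + gamma * (P[np.arange(S), pi] @ V)
    return V, pi
```

Setting m = 1 makes each outer step an ordinary value-iteration
backup, while large m approaches exact policy evaluation; intermediate
values trade per-iteration cost against per-iteration improvement,
which is the knob described above.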

Several standard numerical-analysis techniques that speed the
convergence of dynamic programming can be used to accelerate value and
policy iteration.
Multigrid methods can be used to quickly seed a good initial
approximation to a high resolution value function by initially
performing value iteration at a coarser resolution [93].
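As an illustration of the coarse-to-fine idea, here is a minimal
sketch on a hypothetical toy problem: a deterministic chain on [0, 1]
whose rightmost state yields reward 1, discretized at increasing
resolutions. The problem, grid sizes, and sweep count are assumptions
for illustration, not details from [93].

```python
import numpy as np

def coarse_to_fine_value_iteration(grid_sizes=(8, 32, 128), gamma=0.95,
                                   sweeps=100):
    """Seed fine-grid value iteration with a coarse-grid solution.

    Toy problem (assumed for illustration): states are points on
    [0, 1]; actions move one grid cell left or right; the rightmost
    state yields reward 1. `grid_sizes` runs coarse -> fine.
    """
    V = np.zeros(grid_sizes[0])
    for n in grid_sizes:
        # Interpolate the coarser solution onto the current grid, so
        # iteration at this resolution starts near its fixed point.
        V = np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(V)), V)
        R = np.zeros(n)
        R[-1] = 1.0                                  # reward at right edge
        for _ in range(sweeps):
            left = np.roll(V, 1); left[0] = V[0]     # clamp at the edges
            right = np.roll(V, -1); right[-1] = V[-1]
            V = R + gamma * np.maximum(left, right)  # greedy Bellman backup
    return V
```

Because the interpolated coarse solution is already close to the fine
grid's fixed point, most of the Bellman backups can be done on the
cheap, small grids.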
State aggregation works by collapsing groups of states to a
single meta-state and solving the resulting abstracted problem [15].
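
The sketch below shows one way this might look for a tabular MDP,
assuming transition and reward arrays P (shape S x A x S) and R
(shape S x A) and a given state-to-group labeling; the uniform
within-group weighting is an assumption, exact only when the grouped
states behave identically.

```python
import numpy as np

def aggregate_mdp(P, R, groups):
    """Collapse states into meta-states and build the abstract model.

    groups[s] gives the meta-state containing state s. Outgoing
    behavior is averaged uniformly within each group (an assumption);
    incoming probability mass is summed over each group's members.
    """
    S, A = R.shape
    K = groups.max() + 1
    M = np.zeros((S, K))
    M[np.arange(S), groups] = 1.0   # M[s, k] = 1 iff s is in group k
    W = M / M.sum(axis=0)           # uniform averaging weights per group
    # Abstract transitions: P_abs[k,a,j] = sum_{s in k, t in j} w_s P[s,a,t]
    P_abs = np.einsum('sk,sat,tj->kaj', W, P, M)
    R_abs = W.T @ R                 # abstract rewards, shape (K, A)
    return P_abs, R_abs
```

The smaller abstract model can then be solved with any of the methods
above, for instance the modified_policy_iteration sketch, and its
solution lifted back to the original states via the group labeling.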