Now consider the use of a bound such as the one proposed above. We
have an analytically specified function, f, which we are trying to
optimize starting from an initial point x* =
x1. Figure 6.1 depicts the bound and the function to be
optimized. Note that the contact point, where the function meets the bound,
is at x*.

Figure 6.1: A Parabolic Bound on an Arbitrary Function

We can now find the maximum of that bound and move the contact point
x* to this new locus on the function. This point becomes x2 and is
used to find another parabola which lies below the function f and
touches it at the current operating point, x2. As this process is
iterated, we gradually converge to a maximum of the function, as
illustrated in Figure 6.2. At the function's maximum, however, the
only parabola we can draw that is still a lower bound is one whose
peak is at the function's maximum itself, so the algorithm stops
there: the maximum is a natural fixed point of the iteration. Such an
algorithm will always converge to a local maximum and remain there
once locked on.
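To make this iteration concrete, here is a minimal sketch in Python of one
standard way such a parabolic lower bound can be constructed (the particular
function, the curvature constant c, and all names below are illustrative
assumptions, not the construction used in this chapter). If the second
derivative of f is bounded below by -c everywhere, then the parabola that
matches f's value and slope at the current contact point and has constant
curvature -c lies below f, and its peak gives the next contact point.

```python
import math

def bound_maximize(fprime, x0, c, n_iters=100, tol=1e-8):
    """Maximize f by repeatedly maximizing a parabolic lower bound.

    Assumes f''(x) >= -c for all x, so that
        g(x) = f(x_t) + f'(x_t) * (x - x_t) - 0.5 * c * (x - x_t) ** 2
    touches f at the contact point x_t and lies below f everywhere.
    The peak of g is at x_t + f'(x_t) / c, which becomes the next contact point.
    """
    x = x0
    for _ in range(n_iters):
        step = fprime(x) / c      # move to the peak of the current bound
        if abs(step) < tol:       # fixed point: the bound peaks at x itself
            break
        x += step
    return x

# Illustrative (assumed) objective: f(x) = sin(x) + 0.1*x, whose second
# derivative -sin(x) is bounded below by -1, so c = 1 is a valid curvature bound.
f = lambda x: math.sin(x) + 0.1 * x
fprime = lambda x: math.cos(x) + 0.1
x_star = bound_maximize(fprime, x0=0.0, c=1.0)
print(x_star, f(x_star))          # settles at the local maximum near x = 1.67
```

Because each parabola touches f at the current point and lies entirely below
it, moving to the parabola's peak can never decrease f, which is why the
iteration is monotone and halts only at a fixed point.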

Figure 6.2: Bound Maximization Convergence

In the example, the algorithm skipped over a local maximum on its way
to its destination. This is not because that local maximum is lower,
but because of a peculiarity of the bounding approach: its robustness
to very local attractors. The bound is shaped by the overall structure
of the function, which has a wider and higher local maximum at
xfinal. We will later discuss this trade-off between the amplitude of
a local maximum and its robustness (the width of its basin), and their
relationship to annealing and global optimization.

Note how these bounding techniques differ from gradient ascent, which
is based on a first-order Taylor approximation. A second-order Taylor
approximation is reminiscent of Newton and Hessian methods, which use
higher-order derivative information. However, neither gradient ascent
nor higher-order approximations to the function are lower bounds, and
hence using them to maximize the function is not guaranteed to
converge. Gradient ascent is essentially a degenerate case of the
parabolic bound approach in which the width of the parabola is set to
infinity (i.e. a straight line) and the step size is chosen in an ad
hoc manner by the user (i.e. infinitesimally) instead of in a more
principled way. A parabolic bound's peak is determined by its width,
and if this width is properly estimated, the iteration will never fail
to converge. By contrast, gradient ascent with an arbitrarily selected
step size cannot be guaranteed to converge. Figure 6.3 demonstrates
how an invalid step size can lead gradient approaches astray, and
Figure 6.4 demonstrates how higher-order methods can also diverge away
from maxima due to an ad hoc step size. The parabola estimated using
Taylor approximations is neither an upper nor a lower bound (rather,
it is a good local approximation, which is a different quality
altogether). These techniques are typically only locally valid and may
become invalid for sufficiently large step sizes. Picking the maximum
of this parabola is therefore not provably correct, and again an ad
hoc step-size constraint is needed.
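As a small illustration of this point (the objective and step sizes below are
assumed for the sketch and do not correspond to the figures), consider plain
gradient ascent on the concave function f(x) = -0.5 x^2, whose curvature bound
is c = 1. The step implied by the parabolic bound's width is 1/c and converges,
while an arbitrary larger step overshoots and diverges:

```python
fprime = lambda x: -x             # gradient of the assumed objective f(x) = -0.5 * x**2

def gradient_ascent(step, x=3.0, n_iters=20):
    # Plain gradient ascent with a fixed, user-chosen step size.
    for _ in range(n_iters):
        x += step * fprime(x)
    return x

print(gradient_ascent(step=1.0))  # step = 1/c, as the parabolic bound dictates: lands at the maximum, x = 0
print(gradient_ascent(step=2.5))  # ad hoc step > 2/c: iterates oscillate and blow up
```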

Figure 6.3: Gradient Ascent

Figure 6.4: 2nd Order Approximations

However, gradient and Taylor approximation approaches have remained
popular because of their remarkable ease of use and general
applicability. Bound techniques (such as the Expectation Maximization
or EM algorithm [15]) have not been as widely applicable: almost any
analytic function can be differentiated to obtain a gradient or a
higher-order approximation, whereas a suitable bound must be derived
for each particular function. Nevertheless, we shall show some
important properties of bounds which should illustrate their wide
applicability and usefulness. In addition, some examples of nonlinear
optimization and conditional density estimation will be given as
demonstrations.