11 Optimization

An extreme point of a function is a point x
which either maximizes or minimizes the value of a function.

This topic is broken into two major sub-problems:

Finding an extreme point of a real-valued function of a single variable, and

Finding an extreme point of a real-valued function of a many variables.

We will focus on finding local minima. If you want to adjust any of these methods
for finding local maxima, recall that a local maxima of f(x) is a local minima of -f(x).

There are three techniques which we will cover which may be used to
find the local minima of a univariate (single variable) function:

Golden-mean search

Newton's method

Quadratic optimization

Finding the local minima of a function of more than one variable may be done
by using gradient descent. This technique converts the multi-variable problem
into a univariate problem which we may solve using one of the above three techiques.

Each of these processes listed above are deterministic: given the same problem and the same initial conditions, they will always compute the same steps and converge to the same optimum. While such predictability is desirable in many engineering applications, it also carries a downside: given a problem with multiple local optima and only one global optimum, the methods will always converge to the optimum nearest to its initial conditions regardless of whether it is a local or global optimum.

In many engineering applications, the best system performance can only be achieved by discovering the global optimum. This requires methods that can diverge away from nearby local optima and explore the function space to converge on the global optimum. A deterministic method cannot do that: a method designed to converge will always converge on the first optimum it encounters, and a method designed to diverge from local optima will diverge from all optima.

The alternative to a deterministic method is a stochastic method: an algorithm that includes an element of randomness. This random element is what will allow the method to have two different behaviours, by giving it a chance to escape local optima but not the global optimum. This can be implemented in practice in a number of ways, such as for example by including a random variable in an equation or by having a decision step that includes an element of chance. It should be noted that a random element is not necessarily one whose value is a result of complete and unbiased chance, such as a lottery drawing. Rather, it is simply a term whose result is not known for sure in advance. It is completely acceptable to skew the probabilities towards a preferred outcome, or to change the probabilities as the algorithm progresses to reduce the impact of randomness over time. A flip of a coin with a 99-1 probability of landing on heads is still a stochastic event: even though one outcome is more likely than the other, it is still a result of chance.

Using randomness in our optimization algorithms eliminates some of the certainty we had with deterministic algorithms. As we've established, a stochastic algorithm is no longer guaranteed to converge on the nearest local optimum, which can be a desirable feature. However, this should not be mistaken for a certainty to converge on the global optimum; stochastic algorithms can make no such guarantee. Stochastic algorithms have a chance of avoiding a nearby local optimum and converging on the global optimum, which is more than can be said for a deterministic algorithm under the same conditions, with chance no outcome is certain. Another important difference is that running the same deterministic algorithm twice using the same data will lead to the exact same result, whereas doing this with a stochastic algorithm can lead to two very different results. The reason for this is of course the inclusion of a random element in the algorithm, which can take very different values in its various runs.

Stochastic optimization algorithms are an intense area of on-going research. Dozens of algorithms already exist, and new algorithms, variations, and enhancements are being proposed every year. Popular algorithms include genetic algorithms, ant colony algorithms, particle swarm algorithms, and many others. A complete review of these methods is beyond the scope of this book. We will limit ourselves to two stochastic optimization methods, to give an overview of this class of optimization methods.

Observation

We are looking for local extreme points. It is a very
difficult to find global extreme points.