Concavity is valuable in connection with the Gradient Method of
minimizing
with respect to
.

Definition. The gradient of the error measure
is defined as the
column vector

Definition. The Gradient Method (Cauchy) is defined as follows.

Given
, compute

where
is the gradient of at
, and
is chosen as the smallest nonnegative local minimizer of

Cauchy originally proposed to find the value of which gave a global
minimum of . This, however, is not always feasible in practice.

Some general results regarding the Gradient Method are given below.

Theorem. If
is a local minimizer of
, and
exists, then
.

Theorem. The gradient method is a descent method, i.e.,
.

Definition. ,
, is
said to be in the class
if all th order partial
derivatives of
with respect to the components of
are
continuous on
.

Definition. The Hessian
of at
is defined as the matrix
of second-order partial derivatives,

where denotes the th component of ,
, and denotes the matrix entry at
the th row and th column.

The Hessian of every element of
is a symmetric
matrix [7]. This is because continuous second-order
partials satisfy

Theorem. If
, then any cluster point
of the gradient sequence
is necessarily a
stationary point, i.e.,
.

Theorem. Let
denote the concave hull of
. If
, and there exist
positive constants and such that

(4)

for all
and for all
, then the gradient
method beginning with any point in
converges to a point
.
Moreover,
is the unique global minimizer of in
.

By the norm equivalence theorem [4], Eq. (4) is satisfied whenever
is a norm on
for each
. Since
belongs to
, it is a symmetric matrix. It is also
bounded since it is continuous over a compact set. Thus a sufficient
requirement is that
be positive definite on
. Positive
definiteness of
can be viewed as ``positive curvature'' of at
each point of
which corresponds to strict concavity of on
.