Trinity College Surface Evolver Workshop

Surface Evolver Workshop Day 3 - Convergence and Hessian

Introduction

Whenever one uses an iterative method, such as the Evolver's 'g'
command, there is the question of how close one is to the optimal
solution and when it is reasonable to stop. Today's workshop
is devoted to such questions and related matters. First, we look
at gradient descent and a modified version called conjugate
gradient, which are first-order methods, in the sense that they
only require the evaluation of first derivatives of the energy.
Then we look at second-order methods involving the Hessian, which
you may recall is the matrix of all second partial derivatives.
Hessian methods can basically do a second-derivative test for
a minimum to test convergence.

Gradient descent

Recall that the energy of a surface is a scalar function of all
the vertex coordinates.
Gradient descent iteration seeks the minimum energy configuration
by calculating the gradient of the energy function and then moving
each vertex in the downhill (negative gradient) direction by some
multiple the gradient. The multiplier factor is called the scale,
and is printed with each 'g' iteration.

Optimizing scale. The default mode of gradient descent is to
choose the scale that minimizes the energy in that direction of motion.
This involves a line search along the line of motion for the minimum
energy. Evolver does the search by evaluating the energy for several scales,
raising or lowering the scale until it has three scales for which the
middle energy is lower than for the outer two. Then it uses quadratic
intepolation to get the final scale. The steps of one iteration are
listed in detail
here.

Scale limit. It is possible for the aforementioned line search
to go too far in certain circumstances, so there is an upper bound set
for the scale, the default limit being 1.0. This can be changed by
using the "scale_limit value" phrase in the top of the datafile,
or by setting the scale_limit variable at run time, or in response to
the prompt produced by the "m" command.

Reasonable scales.
Trouble in evolving is usually signaled by a small scale, which means
there is some obstacle to evolution. Of course, that means you have to
know what a reasonable scale is, and that depends on the type of energy
you are using and how refined your surface si. In normal evolution, the
size of the scale is set by
the development of small-scale roughness in the surface. Combined with
a little dimensional analysis, that leads to the conclusion that the
scale should vary as L2-q, where L is the typical edge
length and
the units of energy are lengthq. The dimensional analysis goes
like this: Let D be the perturbation of one vertex away from an equilibrium
surface. In general, energy is quadratic around an equibrium, so

E = D2Lq-2

So the gradient of energy at the vertex is

grad E = 2 D Lq-2

The motion is the scale times the gradient, which we want proportional to D,
so

scale * grad E = scale * 2 D Lq-2 = D

So scale is on the order of L 2-q. Some examples:

Dimensional Dependence of Scale

Energy

Energy dimension

Scale

Example file

Area of soapfilm

L2

L0

quad.fe

Length of string

L1

L1

flower.fe

Squared curvature of string

L-1

L3

elastic8.fe

Squared mean curvature of soapfilm

L0

L2

sqcube.fe

In particular, the scale for area evolution is independent of refinement,
but for most other energies the scale decreases with refinement.

Another common influence on the scale for area evolution is the surface
tension. Doing a liquid solder simulation in a system of units where
the surface tension of facets is assigned a value 470, say, means that
all calculated gradients are multiplied by 470, so the scale decreases
by a factor of 470 to get the same geometric motion. Thus you should
set scale_limit to be the inverse of the surface tension.

Fixed scale.
It can be useful to iterate with a fixed scale. For example, if
you make a change to the volume of a body and want that adjustment
to take effect without the complication of simultaneously trying
to minimize energy, then iterate once with a zero scale. For example,
if you run cube.fe and do

body[1].target := 2
m 0
g
m

you will see pure volume adjustment. The 'm' command toggles back
and forth between optimizing and fixed scale, except when you follow
it by a number it sets a fixed scale.

Another use for fixed scale is simulating grain growth. Here the motion
of grain boundaries is defined to have a velocity proportional to their mean
curvature, so you want
to keep the scale small enough so the linear approximation to the motion
is reasonably good.

Convergence. It is impossible to tell, in general, when gradient
descent is close to convergence. Surfaces can be arbitrarily far from
the minimum, and move toward it arbitrarily slowly. As an example,
run the capillary3.fe datafile. This is a slightly pinched tube with
a soapfilm across it. The minimum energy comes when the film is exactly
at the middle of the neck. Try this evolution:

r
r
r
g 100

Not at the neck yet. Try more g steps. See how many it takes you to
converge to the minimum energy of 3.13262861328124 to, say, 8 decimal places.

Another problem can be saddle points. If you start with a symmetric surface,
then under 'g' iteration the surface should stay symmetric. If the surface
has a symmetric saddle point, then the iteration could approach it and
look like it was converging to a minimum.

Conjugate gradient

Conjugate gradient is a method of accelerating gradient descent.
In ordinary gradient descent, one uses the gradient of energy to
find the steepest downhill direction, then moves along that line
to the minimum energy in that direction. Hence successive steps
are at right angles. However, this can be very inefficient, as
you can spend a lot of time zigzagging across an energy "valley"
without making much progress "downstream". With conjugate gradient,
the search direction is chosen to be in a "conjugate" direction to
the previous direction.
For a mathematical explanation, see any
decent book in numerical analysis,
or J. R. Shewchuk's on-line paper
A Painless Introduction to Conjugate Gradient.

In practice, the conjugate gradient method remembers a cumulative
"history vector", which it combines with the ordinary gradient to
figure out the conjugate gradient direction. The upshot is that
conjugate gradient can converge much faster than ordinary gradient
descent. In the ideal case of a quadratic energy function
in N variables and perfect numerical accuracy, conjugate gradient will
reach the exact minimum within N steps.

Conjugate gradient can be toggled with the U
command, or with the conj_grad toggle.
It should always be used with optimizing scale.

To see the dramatic improvement conjugate gradient can produce, run
capillary3.fe again, with this evolution:

r
r
r
g 10
U
g 100

Conjugate gradient can get in trouble, particularly when you use it too
early when the surface is making large adjustments. To see an example,
run capillary3.fe with this evolution:

r
r
r
U
g 100

and take a close look at the film in the center.

In sum, one should run a few steps of ordinary gradient descent at
the start of evolving a surface or after making big changes, but
the bulk of iteration should be done in conjugate gradient mode.

Notes: The conjugate gradient method is designed for quadratic energy
functions. As long as the energy function is nearly quadratic, as it
should be near an energy minimum, conjugate gradient works well.
Otherwise, it may misbehave, either by taking too big steps or by
getting stalled. Both effects are due to the history vector being
misleading. To prevent too big steps, one should iterate without
conjugate gradient for a few steps whenever significant changes are
made to the surface (refining, changing a constraint, etc.).
On the other hand, if it looks like conjugate gradient is converging,
it may have simply become confused by its own history. See the
catenoid example for a case in point.
A danger signal is the scale factor going to zero. If you are suspicious,
toggle conjugate gradient off and on ("U 2" does nicely)
to erase the history vector and start over.

Hessian

Please work through the Hessian tutorial
from the online documentation. There are a few references to things
we haven't covered yet, such as "quadratic mode", which you can skip.

Exercise - Double Bubble Pipe

Blowing a bubble with a normal round-bowled bubble pipe produces a
spherical bubble which is stable at any volume. But suppose you
had a bubble pipe with two round bowls, blown simultaneously with a
connected air supply? Define a body whose faces are two round unit
disks on the z = 0 plane. (For those of you too lazy to create your
own datafile, use doublepipe.fe.)
Increase the body volume gradually, watching
the eigenvalues with the ritz command. Find the critical volume.
Evolve a little beyond the critical volume, and use saddle to find out what
happens (don't forget to evolve after doing saddle to find the
optimal surface). Do the same with a three-bowled pipe.
HomeHelpDay 2Day 4