Complex step Hessian

The Hessian is the matrix of second-order partial derivatives of a scalar function. It is difficult to compute an accurate Hessian with finite differences: truncation error dominates if the differentiation step is too large, and subtractive cancellation between two nearly equal values dominates if the step is too small. By taking advantage of complex-step differentiation, this code provides the Hessian of a scalar function much more accurately than other finite-difference approaches.
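For readers unfamiliar with the underlying trick, here is a minimal first-derivative sketch in plain Python (the function name `csd` and the step size are illustrative, not part of this submission):

```python
import cmath
import math

def csd(f, x, h=1e-20):
    """Complex-step first derivative: f'(x) ~= Im(f(x + i*h)) / h.

    No subtraction of nearly equal values occurs, so h can be made
    tiny (e.g. 1e-20) without the cancellation error that plagues
    real finite differences."""
    return f(complex(x, h)).imag / h

# Example: d/dx exp(x) at x = 1.5, accurate to machine precision.
d = csd(cmath.exp, 1.5)
```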

As in the Fortran case, one must redefine functions such as abs(), max(), and min(); the genuinely differentiable functions are already defined for complex variables. Correct definitions for the redefined functions can be found in the complexify.f90 module.
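For illustration, a Python analogue of two of those redefinitions (these names are hypothetical, not from complexify.f90 itself): the complexified abs must negate the whole complex number when the real part is negative, so the derivative carried in the imaginary part keeps the correct sign, and max must compare real parts only:

```python
def cabs(z):
    """Complex-step-safe absolute value.

    Built-in abs(z) returns the modulus, a real number, which destroys
    the derivative information in the imaginary part. Negating the
    whole complex number when the real part is negative preserves it."""
    return -z if z.real < 0 else z

def cmax(a, b):
    """Complex-step-safe max: compare real parts only."""
    return a if a.real >= b.real else b

# d/dx |x| at x = -2 via a complex step: the correct answer is -1.
h = 1e-20
d = cabs(complex(-2.0, h)).imag / h   # -> -1.0
```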

The standard transpose operator, the apostrophe ('), poses a problem because it takes the complex conjugate of the matrix elements; one should use the non-conjugate transpose, the "dot apostrophe" (.'), instead.

I assume that with these changes (which may be difficult to implement, as pointed out by Arvind Keerthi), the complex step will work for almost any Hessian.

Scott

Comment only

11 Oct 2008

Scott Miller

Pardon me, my mistake: "if" statements are not implemented in Martins's module, but abs(), <, and similar functions with discontinuous derivatives are.

Scott

Comment only

11 Oct 2008

Scott Miller

The author I was referring to is J. R. R. A. Martins, who in 1999 implemented a Fortran 90 module that provides, among other things, complex-step differentiation with support for "if" statements and a variety of other functions.
A link to this module is currently found at:

http://mdolab.utias.utoronto.ca/resources/complex-step/fortran

Of course this would need some translation, but it would be useful to see it implemented in MATLAB.

Scott

Comment only

28 Apr 2008

Arvind Keerthi

The idea of the complex-step derivative is quite elegant, and Yi Cao has done a very nice job of encoding it.

But buyer beware: this code may NOT be applicable to your particular cost function. For example, if the cost function is defined thus:

function outVal = testFun(x)
b = [3 7 5]';
A = [ 1  2  3;
      4  5  6;
     17  8  9];
outVal = b'*x + 0.5*x'*A*x;

then the Hessian should always equal (A + A.')/2, the symmetric part of A. Yet hessiancsd will not compute the Hessian properly. If instead you make a tiny change to the cost function and rewrite the last line as follows:

outVal = b.'*x + 0.5*x.'*A*x;

then hessiancsd will start working correctly.

If you think about why hessiancsd did not work in the first instance but does in the second, the conclusion is this: your cost function must be such that a change in the imaginary component of one variable is not equivalent to a change in another variable.

In practice this is difficult to achieve. For example, one may already have written a difficult-to-evaluate cost function that treats complex input variables in a way that makes it entirely incompatible with hessiancsd.
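Arvind's observation can be reproduced in a few lines of plain Python (the quadratic and step size below are illustrative): conjugation cancels the imaginary perturbation, so the complex step returns zero instead of the derivative:

```python
h = 1e-20
x = complex(2.0, h)          # x perturbed by an imaginary step

# Non-conjugating square, analogous to MATLAB's x.'*x:
good = (x * x).imag / h               # recovers d/dx x^2 = 2x = 4

# Conjugating square, analogous to MATLAB's x'*x:
bad = (x.conjugate() * x).imag / h    # the imaginary step cancels: 0
```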

02 Jan 2008

Scott Miller

Well commented, flexible and clear.

Having struggled with the same issues as the author, my concerns are mathematical rather than programmatic. The complex-step derivative is indeed often much more accurate, but it takes more time to evaluate, both because the function evaluations are complex and because this type of algorithm is difficult to vectorize; the author has used symmetry to gain some speed. As a result, the user may still want to consider a conventional, less accurate Hessian to improve speed, at least during the earlier stages of an optimization.

The second point is that it is not possible to nest two complex-step derivatives to get the Hessian, as the 'i' terms from the two steps would interact. The author has accounted for this: only the gradient is computed with a complex step. The Hessian is then computed from the gradient using a conventional real finite step, and so suffers an unavoidable loss of accuracy.
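A scalar sketch of that hybrid scheme in Python (the function names and step sizes are illustrative; hessiancsd itself operates on vectors): the inner complex step gives an essentially exact gradient, and the outer real central difference over it reintroduces the cancellation error just mentioned:

```python
import cmath

def grad_cs(f, x, h=1e-20):
    # Inner complex-step gradient: accurate to machine precision.
    return f(complex(x, h)).imag / h

def hess_hybrid(f, x, dx=1e-5):
    # Outer real central difference over the complex-step gradient.
    # This real step is where the unavoidable accuracy loss occurs.
    return (grad_cs(f, x + dx) - grad_cs(f, x - dx)) / (2 * dx)

# Example: d^2/dx^2 exp(x) at x = 1 is e = 2.71828...
d2 = hess_hybrid(cmath.exp, 1.0)
```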

It would, I think, be possible to compute the Hessian using two "complex" steps with a hypercomplex number class, which would be highly accurate in both derivative terms but would probably be even slower. It would be interesting to see such an implementation, and more vectorization if possible.

Finally, although the author warns against conditional constructs such as 'if', there is an implementation of complex derivatives that handles this class of function, though its author's name escapes me. It would require some work to implement, however.