search          Description
--------------------------------------------------------------
"interpolate"   use linear and quadratic interpolation to
                search for an optimal delta
"bracket"       use a bracketed quadratic formula to search
                for an optimal delta
"off"           do not search for an optimal delta
--------------------------------------------------------------
The default is "interpolate" if not set.

Description

These functions compute derivatives of the real function f(p) at the real
parameter values p.

deriv_init() begins the definition of a problem and returns D, a
problem-description handle set to contain default values.

The deriv_init_*(D, ...) functions then allow you to modify those
defaults. You use these functions to describe your particular problem:
to set the identity of function f(), to set parameter values, and the
like.

deriv(D,todo) then computes derivatives depending upon the value of
todo.

deriv(D, 0) returns the function value without computing derivatives.

deriv(D, 1) returns the first derivatives, also known as the gradient
vector for scalar-valued functions (type d and v) or the Jacobian
matrix for vector-valued functions (type t).

deriv(D, 2) returns the matrix of second derivatives, also known as
the Hessian matrix; the gradient vector is also computed. This
syntax is not allowed for type t evaluators.

The deriv_result_*(D) functions can then be used to access other values
associated with the solution.

Usually you would stop there. In other cases, you could compute
derivatives at other parameter values:

deriv_init_params(D, p_alt)
deriv(D, todo)

Aside:

The deriv_init_*(D, ...) functions have two modes of operation. Each has
an optional argument that you specify to set the value and that you omit
to query the value. For instance, the full syntax of deriv_init_params()
is

void deriv_init_params(D, real rowvector parameters)

real rowvector deriv_init_params(D)

The first syntax sets the parameter values and returns nothing. The
second syntax returns the previously set (or default, if not set)
parameter values.
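For readers implementing a similar interface, the set-or-query convention
can be sketched in a few lines. The following is an illustrative Python
sketch, not part of the deriv() API; the sentinel stands in for Mata's
optional-argument handling:

```python
_UNSET = object()  # sentinel, so an explicit None could still be a legal value

class Problem:
    """Toy problem-description handle with a set-or-query accessor."""
    def __init__(self):
        self._params = None  # default: parameters not yet set

    def init_params(self, parameters=_UNSET):
        if parameters is not _UNSET:
            self._params = list(parameters)  # set form: store, return nothing
            return None
        return self._params                  # query form: return current value

D = Problem()
D.init_params((1.0, 2.0))   # set the parameter values
D.init_params()             # query the previously set values
```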

Below we use the functions to compute f'(x) at x=0, where the function is

f(x) = exp(-x^2+x-3)

: void myeval(x, y)
> {
> y = exp(-x^2 + x - 3)
> }

: D = deriv_init()

: deriv_init_evaluator(D, &myeval())

: deriv_init_params(D, 0)

: dydx = deriv(D, 1)

: dydx
.0497870683

: exp(-3)
.0497870684

The derivative, given the above function, is f'(x) =
(-2*x+1)*exp(-x^2+x-3), so f'(0) = exp(-3).

Notation and formulas

We wrote the above in the way that mathematicians think, that is,
differentiate y=f(x). Statisticians, on the other hand, think
differentiate s=f(b). To avoid favoritism, we will write v=f(p) and
write the general problem with the following notation:

Differentiate v = f(p) with respect to p, where

v:  a scalar

p:  1 x np

The gradient vector is g = f'(p) = df/dp, where

g:  1 x np

and the Hessian matrix is H = f''(p) = d^2f/dpdp', where

H:  np x np
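In the scalar case, the Hessian can be approximated by applying central
differences twice. The sketch below is in Python for illustration only and
is not deriv()'s actual algorithm (deriv() also searches for an optimal
delta); it checks the result against the example function used earlier,
whose analytic second derivative at 0 is -exp(-3):

```python
import math

def num_hessian(f, p, d=1e-4):
    """np x np matrix of second derivatives of scalar f at p, central differences."""
    n = len(p)
    def at(i, di, j, dj):
        q = list(p)
        q[i] += di
        q[j] += dj
        return f(q)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            H[i][j] = (at(i, d, j, d) - at(i, d, j, -d)
                       - at(i, -d, j, d) + at(i, -d, j, -d)) / (4 * d * d)
    return H

f = lambda p: math.exp(-p[0]**2 + p[0] - 3)   # the example function from above
H = num_hessian(f, [0.0])                     # analytic f''(0) = -exp(-3)
```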

deriv() can also work with vector-valued functions. Here is the notation
for vector-valued functions:

Differentiate v = f(p) with respect to p, where

v:  1 x nv, a vector

p:  1 x np

The Jacobian matrix is J = f'(p) = df/dp, where

J:  nv x np

and where

J[i,j] = dv[i]/dp[j]

Second-order derivatives are not computed by deriv() when used with
vector-valued functions.
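The Jacobian definition above translates directly into a column-by-column
central-difference computation. Here is a Python sketch for illustration
(not deriv()'s actual implementation); the test function is made up:

```python
import math

def num_jacobian(f, p, d=1e-6):
    """J[i][j] = d v[i] / d p[j] by central differences; f maps a list to a list."""
    v = f(p)
    J = [[0.0] * len(p) for _ in range(len(v))]
    for j in range(len(p)):
        hi, lo = list(p), list(p)
        hi[j] += d
        lo[j] -= d
        fhi, flo = f(hi), f(lo)
        for i in range(len(v)):
            J[i][j] = (fhi[i] - flo[i]) / (2 * d)
    return J

# v = (p1*p2, sin(p1)); analytically J = [[p2, p1], [cos(p1), 0]]
g = lambda p: [p[0] * p[1], math.sin(p[0])]
J = num_jacobian(g, [1.0, 2.0])
```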

deriv() uses the following formula for computing the numerical derivative
of f() at p:

f'(p) ~ (f(p+d) - f(p-d)) / (2*d)

where we refer to d as the delta used for computing numerical
derivatives. To search for an optimal delta, we decompose d into two
parts.

d = h*scale

By default, h is a fixed value that depends on the parameter value.

h = (abs(p)+1e-3)*1e-3
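Taken together, the central-difference formula and the default h reproduce
the worked example from earlier. A Python sketch for illustration, with
scale fixed at 1 (deriv() itself would go on to search for a better scale):

```python
import math

def central_diff(f, p, scale=1.0):
    h = (abs(p) + 1e-3) * 1e-3   # the default h from the text
    d = h * scale                # scale = 1 assumed here for simplicity
    return (f(p + d) - f(p - d)) / (2 * d)

f = lambda x: math.exp(-x**2 + x - 3)
g0 = central_diff(f, 0.0)        # analytic answer: f'(0) = exp(-3)
```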

deriv() searches for a value of scale that will result in an optimal
numerical derivative, that is, one where d is as small as possible
subject to the constraint that f(p+d) - f(p-d) will be calculated to at
least half the accuracy of a double-precision number. This is
accomplished by searching for scale such that f(p+d) and f(p-d) fall
between v0 and v1.

For this problem, the elements of the gradient function are given by the
following formulas, and we see that deriv() computed values that are in
agreement with the analytical results (to the number of significant
digits being displayed).

In some statistical applications, you will find type v evaluators more
convenient to code than type d evaluators.

In statistical applications, one tends to think of a dataset of values
arranged in matrix X, the rows of which are observations. The function
h(p, X[i,.]) can be calculated for each row separately, and it is the sum
of those resulting values that forms the function f(p) from which we
would like to compute derivatives.

Type v evaluators are for such cases.

In a type d evaluator, you return scalar v=f(p).

In a type v evaluator, you return a column vector, v, such that
colsum(v)=f(p).

The code outline for type v evaluators is the same as that for type d
evaluators. All that differs is that v, which is a real scalar in the d
case, is a real colvector in the v case.
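The relationship colsum(v) = f(p) can be made concrete with a toy dataset.
This is a Python sketch for illustration; the per-observation function
here is made up:

```python
# a tiny "dataset" of observations
X = [0.5, 1.5, 2.5]

def v_eval(p):
    # type v style: one value per observation, here -(x - p)^2 / 2 per row
    return [-(x - p) ** 2 / 2 for x in X]

def d_eval(p):
    # type d style: the scalar f(p), which is colsum of the type v result
    return sum(v_eval(p))

total = d_eval(1.0)    # -(0.25 + 0.25 + 2.25) / 2 = -1.375
```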

User-defined arguments

The type v evaluators arise in statistical applications and, in such
applications, there are data; that is, just knowing p is not sufficient
to calculate v, g, and H. Actually, that same problem can also arise
when coding type d evaluators.

You can pass extra arguments to evaluators. The first line of all
evaluators, regardless of type, is

void evaluator(p, v)

If you code

deriv_init_argument(D,1,X)

the first line becomes

void evaluator(p, X, v)

If you code

deriv_init_argument(D, 1, X)
deriv_init_argument(D, 2, Y)

the first line becomes

void evaluator(p, X, Y, v)

and so on, up to nine extra arguments. That is, you can specify extra
arguments to be passed to your function.

You believe that the data are the result of a beta distribution process
with fixed parameters alpha and beta, and you wish to compute the
gradient vector and Hessian matrix associated with the log likelihood at
some values of those parameters alpha and beta (a and b in what follows).
The formula for the density of the beta distribution is

f(x; a, b) = x^(a-1) * (1-x)^(b-1) / B(a, b)

where B(a, b) is the beta function.

1. Rather than calling the returned value v, we called it lnf. You
can name the arguments as you please.

2. We arranged for an extra argument to be passed by coding
deriv_init_argument(D,1,x). The extra argument is the vector
x, which we listed previously for you. In our function, we
received the argument as x, but we could have used a different
name just as we used lnf rather than v.

3. We set the evaluator type to "v".
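The gist of the beta-distribution example can be sketched as follows. This
is Python for illustration, not the Mata evaluator itself; lnbetaden()
below is our own helper (Mata has a built-in of the same name), and the
gradient is taken by central differences:

```python
import math

def lnbetaden(x, a, b):
    """Log of the beta density at x with parameters a, b."""
    lnB = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - lnB

def loglik(xs, a, b):
    # type v idea: one log-density term per observation, summed into f(p)
    return sum(lnbetaden(x, a, b) for x in xs)

def gradient(xs, a, b, d=1e-5):
    # central differences in each of the two parameters
    return [(loglik(xs, a + d, b) - loglik(xs, a - d, b)) / (2 * d),
            (loglik(xs, a, b + d) - loglik(xs, a, b - d)) / (2 * d)]

xs = [0.5]
val = loglik(xs, 2.0, 2.0)      # density at 0.5 with a = b = 2 is 1.5, so ln 1.5
grad = gradient(xs, 2.0, 2.0)
```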

Type t evaluators

Type t evaluators are for when you need to compute the Jacobian matrix
from a vector-valued function.

Type t evaluators are different from type v evaluators in that the
resulting vector of values should not be summed. One example is when the
function f() performs a nonlinear transformation from the domain of p to
the domain of v.

deriv_init() is used to begin a derivative problem. Store the returned
result in a variable name of your choosing; we have used D in this
documentation. You pass D as the first argument to the other deriv*()
functions.

deriv_init() sets all deriv_init_*() values to their defaults. You may
use the query form of deriv_init_*() to determine an individual default,
or you can use deriv_query() to see them all.

The query form of deriv_init_*() can be used before or after calling
deriv().

deriv_init_evaluator() and deriv_init_evaluatortype()

void deriv_init_evaluator(D, pointer(function) scalar fptr)

void deriv_init_evaluatortype(D, evaluatortype)

pointer(function) scalar deriv_init_evaluator(D)

string scalar deriv_init_evaluatortype(D)

deriv_init_evaluator(D,fptr) specifies the function to be called to
evaluate f(p). Use of this function is required. If your function is
named myfcn(), you code deriv_init_evaluator(D,&myfcn()).

deriv_init_evaluatortype(D,evaluatortype) specifies the capabilities of
the function that has been set using deriv_init_evaluator().
Alternatives for evaluatortype are "d", "v", and "t". The default is "d"
if you do not invoke this function.

deriv_init_evaluator(D) returns a pointer to the function that has been
set.

deriv_init_evaluatortype(D) returns the evaluator type currently set.

deriv_init_argument() and deriv_init_narguments()

void deriv_init_argument(D, real scalar k, X)

void deriv_init_narguments(D, real scalar K)

pointer scalar deriv_init_argument(D, real scalar k)

real scalar deriv_init_narguments(D)

deriv_init_argument(D,k,X) sets the kth extra argument of the evaluator
function to be X. X can be anything, including a view matrix or even a
pointer to a function. No copy of X is made; it is a pointer to X that
is stored, so any changes you make to X between setting it and X being
used will be reflected in what is passed to the evaluator function.

deriv_init_narguments(D,K) sets the number of extra arguments to be
passed to the evaluator function. This function is useless and included
only for completeness. The number of extra arguments is automatically
set when you use deriv_init_argument().

deriv_init_argument(D,k) returns a pointer to the object that was
previously set.

deriv_init_narguments(D) returns the number of extra arguments that were
passed to the evaluator function.

deriv_init_weights()

void deriv_init_weights(D, real colvector weights)

pointer scalar deriv_init_weights(D)

deriv_init_weights(D,weights) sets the weights used with type v
evaluators to produce the function value. By default, deriv() with a
type v evaluator uses colsum(v) to compute the function value. With
weights, deriv() uses cross(weights,v). weights must be row conformable
with the column vector returned by the evaluator.

deriv_init_weights(D) returns a pointer to the weight vector that was
previously set.
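The effect of weights is simply to replace the column sum with a weighted
sum. A Python sketch for illustration:

```python
def weighted_value(weights, v):
    # cross(weights, v): the weighted sum that replaces colsum(v)
    return sum(w * vi for w, vi in zip(weights, v))

v = [1.0, 2.0, 3.0]
unweighted = weighted_value([1, 1, 1], v)   # same as colsum(v): 6.0
weighted = weighted_value([2, 0, 1], v)     # 2*1 + 0*2 + 1*3 = 5.0
```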

deriv_init_params()

void deriv_init_params(D, real rowvector params)

real rowvector deriv_init_params(D)

deriv_init_params(D,params) sets the parameter values at which the
derivatives will be computed. Use of this function is required.

deriv_init_params(D) returns the parameter values at which the
derivatives were computed.

Advanced init functions

The rest of the deriv_init_*() functions provide finer control of the
numerical derivative taker.

deriv_init_bounds(D,minmax) sets the minimum and maximum values used to
search for optimal scale values. The default is minmax = (1e-8, 1e-7).

deriv_init_search(D,"interpolate") causes deriv() to use linear and
quadratic interpolation to search for an optimal delta for computing the
numerical derivatives. This is the default search method.

deriv_init_search(D,"bracket") causes deriv() to use a bracketed
quadratic formula to search for an optimal delta for computing the
numerical derivatives.

deriv_init_search(D,"off") prevents deriv() from searching for an
optimal delta.

deriv_init_h(D) returns the user-specified h values.

deriv_init_scale(D) returns the user-specified starting scale values.

deriv_init_bounds(D) returns the user-specified search bounds.

deriv_init_search(D) returns the currently set search method.

deriv_init_verbose()

void deriv_init_verbose(D, verbose)

string scalar deriv_init_verbose(D)

deriv_init_verbose(D,verbose) sets whether error messages that arise
during the execution of deriv() or _deriv() are to be displayed. Setting
verbose to "on" means that they are displayed; "off" means that they are
not displayed. The default is "on". Setting verbose to "off" is of
interest only to users of _deriv().

The above assumes that your evaluator function is type d. If your
evaluator function type is v (that is, it returns a column vector of
values instead of a scalar value), you will also have coded

deriv_init_evaluatortype(D, "v")

and you may have coded other deriv_init_*() functions as well.

Once deriv() completes, you may use the deriv_result_*() functions. You
may also continue to use the deriv_init_*() functions to access initial
settings, and you may use them to change settings and recompute
derivatives (that is, invoke deriv() again) if you wish.

_deriv()

real scalar _deriv(D, todo)

_deriv(D, todo) performs the same actions as deriv(D, todo) except that,
rather than returning the requested derivatives, _deriv() returns a real
scalar and, rather than aborting if numerical issues arise, _deriv()
returns a nonzero value. _deriv() returns 0 if all went well. The
returned value is called an error code.

deriv() returns the requested result. It can work that way because the
numerical derivative calculation must have gone well. Had it not,
deriv() would have aborted execution.

_deriv() returns an error code. If it is 0, the numerical derivative
calculation went well, and you can obtain the gradient vector by using
deriv_result_gradient(). If things did not go well, you can use the
error code to diagnose what went wrong and take the appropriate action.

Thus _deriv(D) is an alternative to deriv(D). Both functions do the same
thing. The difference is what happens when there are numerical
difficulties.

deriv() and _deriv() work around most numerical difficulties. For
instance, the evaluator function you write is allowed to return v equal
to missing if it cannot calculate the f() at p+d. If that happens while
computing the derivative, deriv() and _deriv() will search for a better d
for taking the derivative. deriv(), however, cannot tolerate that
happening at p (the parameter values you set using deriv_init_params())
because the function value must exist at the point when you want deriv()
to compute the numerical derivative. deriv() issues an error message and
aborts, meaning that execution is stopped. There can be advantages in
that. The calling program need not include complicated code for such
instances, figuring that stopping is good enough because a human will
know to address the problem.

_deriv(), however, does not stop execution. Rather than aborting,
_deriv() returns a nonzero value to the caller, identifying what went
wrong. The only exception is that _deriv() will return a zero value to
the caller even when the evaluator function returns v equal to missing at
p, allowing programmers to handle this special case without having to
turn deriv_init_verbose() off.

Programmers implementing advanced systems will want to use _deriv()
instead of deriv(). Everybody else should use deriv().

deriv_result_values(D) returns the vector of values returned by the
evaluator. For type v evaluators, this is the column vector that sums to
the value of f() evaluated at p. For type t evaluators, this is the row
vector returned by the evaluator.

_deriv_result_values(D,v) uses swap() to interchange v with the vector of
values stored in D. This destroys the vector of values stored in D.

These functions should be called only with type v or type t evaluators.

deriv_result_gradient() and _deriv_result_gradient()

real rowvector deriv_result_gradient(D)

void _deriv_result_gradient(D, g)

deriv_result_gradient(D) returns the gradient vector evaluated at p.

_deriv_result_gradient(D,g) uses swap() to interchange g with the
gradient vector stored in D. This destroys the gradient vector stored in
D.

deriv_result_scores() and _deriv_result_scores()

real matrix deriv_result_scores(D)

void _deriv_result_scores(D, S)

deriv_result_scores(D) returns the matrix of the scores evaluated at p.
The matrix of scores can be summed over the columns to produce the
gradient vector.

_deriv_result_scores(D,S) uses swap() to interchange S with the scores
matrix stored in D. This destroys the scores matrix stored in D.

These functions should be called only with type v evaluators.

deriv_result_Jacobian() and _deriv_result_Jacobian()

real matrix deriv_result_Jacobian(D)

void _deriv_result_Jacobian(D, J)

deriv_result_Jacobian(D) returns the Jacobian matrix evaluated at p.

_deriv_result_Jacobian(D,J) uses swap() to interchange J with the
Jacobian matrix stored in D. This destroys the Jacobian matrix stored in
D.

These functions should be called only with type t evaluators.

deriv_result_Hessian() and _deriv_result_Hessian()

real matrix deriv_result_Hessian(D)

void _deriv_result_Hessian(D, H)

deriv_result_Hessian(D) returns the Hessian matrix evaluated at p.

_deriv_result_Hessian(D,H) uses swap() to interchange H with the Hessian
matrix stored in D. This destroys the Hessian matrix stored in D.

These functions should not be called with type t evaluators.

deriv_result_h(), deriv_result_scale(), and deriv_result_delta()

real rowvector deriv_result_h(D)

real rowvector deriv_result_scale(D)

real rowvector deriv_result_delta(D)

deriv_result_h(D) returns the vector of h values that was used to compute
the numerical derivatives.

deriv_result_scale(D) returns the vector of scale values that was used to
compute the numerical derivatives.

deriv_result_delta(D) returns the vector of delta values used to compute
the numerical derivatives.

17      459     Hessian calculations not allowed with type t evaluators
----------------------------------------------------------------------
Note: Error 4 can occur only when evaluating f() at the parameter
values. This error occurs only with deriv().

deriv_query()

void deriv_query(D)

deriv_query(D) displays a report on the current deriv_init_*() values and
some of the deriv_result_*() values. deriv_query(D) may be used before
or after deriv(), and it is useful when using deriv() interactively or
when debugging a program that calls deriv() or _deriv().

Conformability

All functions have 1 x 1 inputs and have 1 x 1 or void outputs, except
the following: