1 Purpose

nag_2d_spline_fit_scat (e02ddc) computes a bicubic spline approximation to a set of scattered data. The knots of the spline are located automatically, but a single argument must be specified to control the trade-off between closeness of fit and smoothness of fit.

3 Description

nag_2d_spline_fit_scat (e02ddc) determines a smooth bicubic spline approximation $s(x,y)$ to the set of data points $(x_r,y_r,f_r)$ with weights $w_r$, for $r=1,2,\ldots,m$.

The approximation domain is considered to be the rectangle $[x_{\min},x_{\max}]\times[y_{\min},y_{\max}]$, where $x_{\min}$ ($y_{\min}$) and $x_{\max}$ ($y_{\max}$) denote the lowest and highest data values of $x$ ($y$).

The spline is given in the B-spline representation

$$s(x,y)=\sum_{i=1}^{n_x-4}\sum_{j=1}^{n_y-4}c_{ij}M_i(x)N_j(y), \tag{1}$$

where $M_i(x)$ and $N_j(y)$ denote normalized cubic B-splines, the former defined on the knots $\lambda_i$ to $\lambda_{i+4}$ and the latter on the knots $\mu_j$ to $\mu_{j+4}$. For further details, see Hayes and Halliday (1974) for bicubic splines and de Boor (1972) for normalized B-splines.
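As a minimal sketch of what a normalized cubic B-spline on five consecutive knots looks like in code, the following evaluates $M_i(x)$ with the standard Cox-de Boor recursion. The helper names bspline_basis, cubic_bspline and basis_sum are hypothetical and are not part of the NAG Library; the half-open interval convention used here means the sketch returns zero exactly at the right-hand end of the knot range.

```c
#include <assert.h>

/* Degree-p B-spline on knots t[i..i+p+1] (0-based), Cox-de Boor recursion.
   Terms whose knot span has zero width are dropped (0/0 taken as 0). */
static double bspline_basis(const double *t, int i, int p, double x)
{
    if (p == 0)
        return (t[i] <= x && x < t[i + 1]) ? 1.0 : 0.0;

    double value = 0.0;
    double left_span = t[i + p] - t[i];
    double right_span = t[i + p + 1] - t[i + 1];
    if (left_span > 0.0)
        value += (x - t[i]) / left_span * bspline_basis(t, i, p - 1, x);
    if (right_span > 0.0)
        value += (t[i + p + 1] - x) / right_span * bspline_basis(t, i + 1, p - 1, x);
    return value;
}

/* M_i(x) for 1-based i, defined on lambda_i..lambda_{i+4},
   i.e. on lamda[i-1]..lamda[i+3] in 0-based storage. */
double cubic_bspline(const double *lamda, int nx, int i, double x)
{
    assert(nx >= 8 && i >= 1 && i <= nx - 4);
    return bspline_basis(lamda, i - 1, 3, x);
}

/* Sum of all nx-4 basis functions at x. With fourfold boundary knots
   this is 1 for x inside the knot range (right end excluded). */
double basis_sum(const double *lamda, int nx, double x)
{
    double sum = 0.0;
    for (int i = 1; i <= nx - 4; i++)
        sum += cubic_bspline(lamda, nx, i, x);
    return sum;
}
```

With no interior knots (eight knots, four at each end of [0,1]) the first basis function reduces to $(1-x)^3$, which gives a quick sanity check of the recursion.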

The total numbers $n_x$ and $n_y$ of these knots and their values $\lambda_1,\ldots,\lambda_{n_x}$ and $\mu_1,\ldots,\mu_{n_y}$ are chosen automatically by the function. The knots $\lambda_5,\ldots,\lambda_{n_x-4}$ and $\mu_5,\ldots,\mu_{n_y-4}$ are the interior knots; they divide the approximation domain $[x_{\min},x_{\max}]\times[y_{\min},y_{\max}]$ into $(n_x-7)\times(n_y-7)$ subpanels $[\lambda_i,\lambda_{i+1}]\times[\mu_j,\mu_{j+1}]$, for $i=4,5,\ldots,n_x-4$ and $j=4,5,\ldots,n_y-4$. Then, much as in the curve case (see nag_1d_spline_fit (e02bec)), the coefficients $c_{ij}$ are determined as the solution of the following constrained minimization problem:

minimize

$$\eta, \tag{2}$$

subject to the constraint

$$\theta=\sum_{r=1}^{m}\epsilon_r^2\le S, \tag{3}$$

where $\eta$ is a measure of the (lack of) smoothness of $s(x,y)$. Its value depends on the discontinuity jumps in $s(x,y)$ across the boundaries of the subpanels. It is zero only when there are no discontinuities and is positive otherwise, increasing with the size of the jumps (see Dierckx (1981b) for details). $\epsilon_r$ denotes the weighted residual $w_r(f_r-s(x_r,y_r))$, and $S$ is a non-negative number to be specified.
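The quantity θ in the constraint can be computed directly from the data once the spline has been evaluated at the data points. The following is a minimal sketch; the function name weighted_sse and the array s_at_data (assumed to hold the spline values at the data points) are hypothetical and not part of the NAG Library.

```c
#include <assert.h>
#include <stddef.h>

/* theta = sum over r of (w_r * (f_r - s(x_r, y_r)))^2.
   s_at_data[r] is assumed to hold the spline value at data point r. */
double weighted_sse(size_t m, const double *w, const double *f,
                    const double *s_at_data)
{
    assert(w != NULL && f != NULL && s_at_data != NULL);
    double theta = 0.0;
    for (size_t r = 0; r < m; r++) {
        double eps = w[r] * (f[r] - s_at_data[r]);  /* weighted residual */
        theta += eps * eps;
    }
    return theta;
}
```

Points with zero weight contribute nothing to the sum, which matches the way such points are ignored in the fit.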

By means of the argument S, ‘the smoothing factor’, you control the balance between smoothness and closeness of fit, as measured by the sum of squares of residuals in (3). If S is too large, the spline will be too smooth and signal will be lost (underfit); if S is too small, the spline will pick up too much noise (overfit). In the extreme cases the method would return an interpolating spline ($\theta=0$) if S were set to zero, and the least squares bicubic polynomial ($\eta=0$) if S were set very large. Experimenting with S values between these two extremes should result in a good compromise. (See Section 8.2 for advice on choice of S.) Note, however, that this function, unlike nag_1d_spline_fit (e02bec) and nag_2d_spline_fit_grid (e02dcc), does not allow S to be set exactly to zero.

The method employed is outlined in Section 8.5 and fully described in Dierckx (1981a) and Dierckx (1981b). It involves an adaptive strategy for locating the knots of the bicubic spline (depending on the function underlying the data and on the value of S), and an iterative method for solving the constrained minimization problem once the knots have been determined.

5 Arguments

The function will build up the knot set starting with no interior knots. No values need be assigned to spline→nx and spline→ny and memory will be internally allocated to spline→lamda, spline→mu and spline→c.

start=Nag_Warm (warm start)

The function will restart the knot-placing strategy using the knots found in a previous call of the function. In this case, all arguments except s must be unchanged from that previous call. This warm start can save much time in searching for a satisfactory value of S.

Constraint:
start=Nag_Cold or Nag_Warm.

2:
m – Integer Input

On entry: m, the number of data points.

The number of data points with nonzero weight (see weights) must be at least 16.

On entry: weights[r-1] must be set to $w_r$, the rth value in the set of weights, for $r=1,2,\ldots,m$. Zero weights are permitted and the corresponding points are ignored, except when determining $x_{\min}$, $x_{\max}$, $y_{\min}$ and $y_{\max}$ (see Section 8.4). For advice on the choice of weights, see the e02 Chapter Introduction.

Constraint:
the number of data points with nonzero weight must be at least 16.
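This constraint can be checked before the call. The following is a small sketch; count_nonzero_weights and enough_weighted_points are hypothetical helper names, not NAG Library functions.

```c
#include <assert.h>

/* Number of data points whose weight is nonzero. */
int count_nonzero_weights(int m, const double *weights)
{
    assert(m >= 0 && weights != 0);
    int count = 0;
    for (int r = 0; r < m; r++)
        if (weights[r] != 0.0)
            count++;
    return count;
}

/* Nonzero (true) when at least 16 points carry a nonzero weight. */
int enough_weighted_points(int m, const double *weights)
{
    return count_nonzero_weights(m, weights) >= 16;
}
```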

On entry: an upper bound for the number of knots nx and ny required in the x and y directions respectively. In most practical situations, nxest=nyest=5+m is sufficient. See also Section 8.3.

Constraint:
nxest≥8 and nyest≥8.

10:
fp – double * Output

On exit: the weighted sum of squared residuals, θ, of the computed spline approximation. fp should equal S within a relative tolerance of 0.001 unless spline→nx=spline→ny=8, when the spline has no interior knots and so is simply a bicubic polynomial. For knots to be inserted, S must be set to a value below the value of fp produced in this case.

11:
rank – Integer * Output

On exit: rank gives the rank of the system of equations used to compute the final spline (as determined by a suitable machine-dependent threshold). When rank = (spline→nx − 4) × (spline→ny − 4), the solution is unique; otherwise the system is rank-deficient and the minimum-norm solution is computed. The latter case may be caused by too small a value of S.
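The uniqueness test described for rank amounts to one comparison. This is a sketch only; solution_is_unique is a hypothetical helper, and nx and ny stand for the values returned in spline→nx and spline→ny.

```c
#include <assert.h>

/* Nonzero (true) when rank equals the number of B-spline coefficients,
   (nx - 4) * (ny - 4), i.e. the least squares system has full rank. */
int solution_is_unique(int rank, int nx, int ny)
{
    assert(nx >= 8 && ny >= 8);
    return rank == (nx - 4) * (ny - 4);
}
```

A false result suggests retrying with a larger value of S.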

12:
warmstartinf – double * Output

On exit: if the warm start option is used, its value must be left unchanged from the previous call.

13:
spline – Nag_2dSpline *

Pointer to structure of type Nag_2dSpline with the following members:

nx – Integer Input/Output

On entry: if the warm start option is used, the value of nx must be left unchanged from the previous call.

On exit: the total number of knots, nx, of the computed spline with respect to the x variable.

lamda – double * Input/Output

On entry: a pointer to which if start=Nag_Cold, memory of size nxest is internally allocated. If the warm start option is used, the values lamda[0],lamda[1],…,lamda[nx-1] must be left unchanged from the previous call.

On exit: lamda contains the complete set of knots $\lambda_i$ associated with the x variable, i.e., the interior knots lamda[4],lamda[5],…,lamda[nx-5] as well as the additional knots lamda[0]=lamda[1]=lamda[2]=lamda[3]=$x_{\min}$ and lamda[nx-4]=lamda[nx-3]=lamda[nx-2]=lamda[nx-1]=$x_{\max}$ needed for the B-spline representation (where $x_{\min}$ and $x_{\max}$ are as described in Section 3).

ny – Integer Input/Output

On entry: if the warm start option is used, the value of ny must be left unchanged from the previous call.

On exit: the total number of knots, ny, of the computed spline with respect to the y variable.

mu – double * Input/Output

On entry: a pointer to which if start=Nag_Cold, memory of size nyest is internally allocated. If the warm start option is used, the values mu[0],mu[1],…,mu[ny-1] must be left unchanged from the previous call.

On exit: mu contains the complete set of knots $\mu_i$ associated with the y variable, i.e., the interior knots mu[4],mu[5],…,mu[ny-5] as well as the additional knots mu[0]=mu[1]=mu[2]=mu[3]=$y_{\min}$ and mu[ny-4]=mu[ny-3]=mu[ny-2]=mu[ny-1]=$y_{\max}$ needed for the B-spline representation (where $y_{\min}$ and $y_{\max}$ are as described in Section 3).

c – double * Output

On exit: a pointer to which, if start=Nag_Cold, memory of size (nxest−4)×(nyest−4) is internally allocated. c[(ny−4)×(i−1)+j−1] is the coefficient $c_{ij}$ defined in Section 3.
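The storage rule for the coefficients is easy to get wrong by one, so a small index helper can be useful. The following is a sketch; coef_index is a hypothetical name, and i and j are the 1-based indices of $c_{ij}$.

```c
#include <assert.h>

/* 0-based position in c of the coefficient c_ij, for 1-based i and j.
   ny is the total number of knots in the y direction (spline->ny). */
int coef_index(int ny, int i, int j)
{
    assert(ny >= 8 && i >= 1 && j >= 1 && j <= ny - 4);
    return (ny - 4) * (i - 1) + (j - 1);
}
```

For example, with ny = 10 each value of i owns a run of six consecutive entries, so c_23 is stored at c[8].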

The iterative process has failed to converge. Possibly s is too small: s=value.

7 Accuracy

On successful exit, the approximation returned is such that its weighted sum of squared residuals fp is equal to the smoothing factor S, up to a specified relative tolerance of 0.001 – except that if nx=8 and ny=8, fp may be significantly less than S: in this case the computed spline is simply the least squares bicubic polynomial approximation of degree 3, i.e., a spline with no interior knots.
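The exit condition described here can be expressed as a simple check on the returned values. This is a sketch under the stated tolerance of 0.001; fit_is_consistent is a hypothetical helper, and nx and ny stand for spline→nx and spline→ny.

```c
#include <assert.h>

/* Nonzero (true) when fp agrees with s to a relative tolerance of 0.001,
   or when the fit is the bicubic polynomial (nx == ny == 8) and fp <= s. */
int fit_is_consistent(double fp, double s, int nx, int ny)
{
    assert(s > 0.0 && nx >= 8 && ny >= 8);
    double diff = fp - s;
    if (diff < 0.0)
        diff = -diff;
    if (diff <= 0.001 * s)
        return 1;
    return nx == 8 && ny == 8 && fp <= s;
}
```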

8 Further Comments

8.1 Timing

The time taken for a call of nag_2d_spline_fit_scat (e02ddc) depends on the complexity of the shape of the data, the value of the smoothing factor S, and the number of data points. If nag_2d_spline_fit_scat (e02ddc) is to be called for different values of S, much time can be saved by setting start=Nag_Warm after the first call.

It should be noted that choosing S very small considerably increases computation time.

8.2 Choice of S

If the weights have been correctly chosen (see the e02 Chapter Introduction), the standard deviation of $w_rf_r$ would be the same for all $r$, equal to $\sigma$, say. In this case, choosing the smoothing factor S in the range $\sigma^2(m\pm\sqrt{2m})$, as suggested by Reinsch (1967), is likely to give a good start in the search for a satisfactory value. Otherwise, experimenting with different values of S will be required from the start.

In that case, in view of computation time and memory requirements, it is recommended to start with a very large value for S and so determine the least squares bicubic polynomial; the value returned for fp, call it fp0, gives an upper bound for S. Then progressively decrease the value of S to obtain closer fits – say by a factor of 10 in the beginning, i.e., S=fp0/10, S=fp0/100, and so on, and more carefully as the approximation shows more details.
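The initial part of this search can be sketched as a list of trial values, each a factor of 10 below the previous one. The helper smoothing_schedule is hypothetical; the fitting calls themselves are omitted, and later steps would normally be chosen more carefully than a fixed factor.

```c
#include <assert.h>

/* Fills s_values[0..n-1] with fp0/10, fp0/100, ... and returns n.
   fp0 is the value of fp returned by a first call with very large S. */
int smoothing_schedule(double fp0, int n, double *s_values)
{
    assert(fp0 > 0.0 && n >= 0 && s_values != 0);
    double s = fp0;
    for (int k = 0; k < n; k++) {
        s /= 10.0;
        s_values[k] = s;
    }
    return n;
}
```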

To choose S very small is strongly discouraged. This considerably increases computation time and memory requirements. It may also cause rank-deficiency (as indicated by the argument rank) and endanger numerical stability.

The number of knots of the spline returned, and their location, generally depend on the value of S and on the behaviour of the function underlying the data. However, if nag_2d_spline_fit_scat (e02ddc) is called with start=Nag_Warm, the knots returned may also depend on the smoothing factors of the previous calls. Therefore if, after a number of trials with different values of S and start=Nag_Warm, a fit can finally be accepted as satisfactory, it may be worthwhile to call nag_2d_spline_fit_scat (e02ddc) once more with the selected value for S but now using start=Nag_Cold. Often, nag_2d_spline_fit_scat (e02ddc) then returns an approximation with the same quality of fit but with fewer knots, which is therefore better if data reduction is also important.

8.3 Choice of nxest and nyest

The number of knots may also depend on the upper bounds nxest and nyest. Indeed, if at a certain stage in nag_2d_spline_fit_scat (e02ddc) the number of knots in one direction (say $n_x$) has reached the value of its upper bound (nxest), then from that moment on all subsequent knots are added in the other ($y$) direction. This may indicate that the value of nxest is too small. On the other hand, it gives you the option of limiting the number of knots the function locates in any direction. For example, by setting nxest=8 (the lowest allowable value for nxest), you can indicate that you want an approximation which is a simple cubic polynomial in the variable $x$.

8.4 Restriction of the Approximation Domain

The fit obtained is not defined outside the rectangle $[\lambda_4,\lambda_{n_x-3}]\times[\mu_4,\mu_{n_y-3}]$. The reason for taking the extreme data values of $x$ and $y$ for these four knots is that, as is usual in data fitting, the fit cannot be expected to give satisfactory values outside the data region. If, nevertheless, you require values over a larger rectangle, this can be achieved by augmenting the data with two artificial data points $(a,c,0)$ and $(b,d,0)$ with zero weight, where $[a,b]\times[c,d]$ denotes the enlarged rectangle.
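The augmentation step can be sketched as follows. The helper append_domain_corners is hypothetical, and the caller is assumed to have allocated the data arrays with room for two extra entries.

```c
#include <assert.h>

/* Appends the zero-weight points (a, c, 0) and (b, d, 0) to the data.
   Each array must have room for m + 2 entries. Returns the new count,
   which is the value to pass as m in the fitting call. */
int append_domain_corners(int m, double *x, double *y, double *f,
                          double *weights,
                          double a, double b, double c, double d)
{
    assert(m >= 0 && a <= b && c <= d);
    x[m] = a;     y[m] = c;     f[m] = 0.0;     weights[m] = 0.0;
    x[m + 1] = b; y[m + 1] = d; f[m + 1] = 0.0; weights[m + 1] = 0.0;
    return m + 2;
}
```

Because the two added points have zero weight, they do not count towards the minimum of 16 points with nonzero weight.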

8.5 Outline of Method Used

First, suitable knot sets are built up in stages (starting with no interior knots in the case of a cold start but with the knot set found in a previous call if a warm start is chosen). At each stage, a bicubic spline is fitted to the data by least squares and $\theta$, the sum of squares of residuals, is computed. If $\theta>S$, a new knot is added to one knot set or the other so as to reduce $\theta$ at the next stage. The new knot is located in an interval where the fit is particularly poor. Sooner or later, we find that $\theta\le S$ and at that point the knot sets are accepted. The function then goes on to compute a spline which has these knot sets and which satisfies the full fitting criterion specified by (2) and (3). The theoretical solution has $\theta=S$. The function computes the spline by an iterative scheme which is ended when $\theta=S$ within a relative tolerance of 0.001. The main part of each iteration consists of a linear least squares computation of special form. The minimal least squares solution is computed wherever the linear system is found to be rank-deficient.

An exception occurs when the function finds at the start that, even with no interior knots ($n_x=n_y=8$), the least squares spline already has its sum of squares of residuals $\le S$. In this case, since this spline (which is simply a bicubic polynomial) also has an optimal value for the smoothness measure $\eta$, namely zero, it is returned at once as the (trivial) solution. It will usually mean that S has been chosen too large.

8.6 Evaluation of Computed Spline

The values of the computed spline at the points (tx[r−1], ty[r−1]), for $r=1,2,\ldots,n$, may be obtained in the array ff, of length at least n, by the following code:

e02dec(n, tx, ty, ff, &spline, &fail)

where spline is a structure of type Nag_2dSpline which is an output argument of nag_2d_spline_fit_scat (e02ddc).

To evaluate the computed spline on a kx by ky rectangular grid of points in the x-y plane, which is defined by the x coordinates stored in tx[q−1], for $q=1,2,\ldots,$kx, and the y coordinates stored in ty[r−1], for $r=1,2,\ldots,$ky, returning the results in the array fg which is of length at least kx×ky, the following call may be used:

e02dfc(kx, ky, tx, ty, fg, &spline, &fail)

where spline is a structure of type Nag_2dSpline which is an output argument of nag_2d_spline_fit_scat (e02ddc). The result of the spline evaluated at grid point $(q,r)$ is returned in element ky×(q−1)+r−1 of the array fg.
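The layout of fg can be captured in a small index helper. This is a sketch only; grid_index is a hypothetical name, and q and r are the 1-based grid indices used above.

```c
#include <assert.h>

/* 0-based position in fg of the spline value at grid point (q, r),
   for 1-based q in 1..kx and r in 1..ky. */
int grid_index(int kx, int ky, int q, int r)
{
    assert(q >= 1 && q <= kx && r >= 1 && r <= ky);
    return ky * (q - 1) + (r - 1);
}
```

The y index varies fastest, so all ky values for a fixed x coordinate are stored contiguously.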

9 Example

This example program reads in a value of m, followed by a set of m data points xr,yr,fr and their weights wr. It then calls nag_2d_spline_fit_scat (e02ddc) to compute a bicubic spline approximation for one specified value of S, and prints the values of the computed knots and B-spline coefficients. Finally it evaluates the spline at a small sample of points on a rectangular grid.