Example 116.2 Spline Model with Higher-Order Penalty

This example continues the analysis of the data set Measure to illustrate how you can use PROC TPSPLINE to fit a spline model with a higher-order penalty term. Spline models with high-order
penalty terms move low-order polynomial terms into the polynomial space. Hence, there is no penalty for these terms, and they
can vary without constraint.
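
To see why raising the penalty order frees the quadratic terms, it can help to write the penalty out. The following is a sketch of the standard thin-plate penalty for two predictors; the symbols $J_m$, $f$, and $m$ are notation assumed here for illustration rather than taken from this example:

```latex
% Standard thin-plate roughness penalty of order m in two dimensions (assumed notation)
\[
  J_m(f) \;=\; \int\!\!\int \sum_{i=0}^{m} \binom{m}{i}
  \left( \frac{\partial^{m} f}{\partial x_1^{\,i}\,\partial x_2^{\,m-i}} \right)^{2}
  \, dx_1 \, dx_2
\]
% Any polynomial of total degree less than m has J_m(f) = 0.
% With m = 3, every third-order partial derivative of a quadratic surface vanishes,
% so the terms 1, x_1, x_2, x_1^2, x_1 x_2, x_2^2 are unpenalized.
```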

As shown in the previous analyses, the final model for the data set Measure must include quadratic terms for both $x_1$ and $x_2$. This example fits the following model:

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + f(x_1, x_2) \]

The model includes quadratic terms for both variables, although it differs from the usual linear model. The nonparametric
term $f(x_1, x_2)$ explains the variation of the data that is unaccounted for by a simple quadratic surface.

To modify the order of the derivative in the penalty term, specify the M=
option. The following statements specify the option M=3 in order to include the quadratic terms in the polynomial space:
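
A minimal sketch of such statements is shown here. It assumes that the response variable in Measure is named y and that the predictors are named x1 and x2; adjust the names if your data set differs:

```sas
/* Sketch only: assumes Measure contains y, x1, and x2 */
proc tpspline data=Measure;
   /* M=3 penalizes third-order derivatives, so all terms up to  */
   /* second order in (x1, x2) enter the unpenalized polynomial  */
   /* space                                                      */
   model y = (x1 x2) / m=3;
run;
```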

The model contains six terms in the polynomial space (6 is the number of columns in $(\mathbf{1} \mid x_1 \mid x_1^2 \mid x_1 x_2 \mid x_2 \mid x_2^2)$). Compare Output 116.2.1 with Output 116.1.1: the $\log_{10}(n\lambda)$ value and the smoothing penalty differ significantly. In general, these terms are not directly comparable for different models.
The final estimate based on this model is close to the estimate from the model that uses the default, M=2.
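
The count of six terms can be checked with the usual dimension formula for the polynomial space of a thin-plate spline. This is a sketch under the standard theory; the symbols $d$ (number of smoothing variables) and $m$ (penalty order) are notation assumed here:

```latex
% Dimension of the space of polynomials of total degree < m in d variables
\[
  \dim = \binom{d + m - 1}{d}
\]
% d = 2, m = 2 (default):  \binom{3}{2} = 3   -> 1, x_1, x_2
% d = 2, m = 3:            \binom{4}{2} = 6   -> 1, x_1, x_2, x_1^2, x_1 x_2, x_2^2
```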

In the following statements, the REG procedure fits a quadratic surface model to the data set Measure:
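
A minimal sketch of such statements follows. It assumes that the response is named y, and it creates the derived variables x1sq, x2sq, and x1x2 in a DATA step in case they do not already exist in Measure:

```sas
/* Sketch only: assumes Measure contains y, x1, and x2 */
data Measure;
   set Measure;
   x1sq = x1*x1;   /* squared term for x1 */
   x2sq = x2*x2;   /* squared term for x2 */
   x1x2 = x1*x2;   /* cross-product term  */
run;

/* Ordinary least squares fit of the quadratic surface */
proc reg data=Measure;
   model y = x1 x1sq x2 x2sq x1x2;
run;
```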

The REG procedure produces slightly different results. To fit a similar model with PROC TPSPLINE, you can use a MODEL statement
that specifies the degrees of freedom with the DF= option. You can also use a large value for the LOGNLAMBDA0=
option to force a parametric model fit.
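
As a sketch of the second approach, the following statements fix the smoothing parameter at a large value instead of selecting it by GCV. The value 10 is an arbitrary illustrative choice, and the variable names y, x1, and x2 are assumptions:

```sas
/* Sketch only: LOGNLAMBDA0=10 is an arbitrary large value chosen */
/* for illustration; it makes the penalty dominate so that the    */
/* fit is close to the unpenalized quadratic surface              */
proc tpspline data=Measure;
   model y = (x1 x2) / m=3 lognlambda0=10;
run;
```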

Because there is one degree of freedom for each of the terms intercept, x1, x2, x1sq, x2sq, and x1x2, the DF=6 option is used as follows:
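
A minimal sketch of such statements follows. The variable names, the grid of values in the LOGNLAMBDA= list, and the plot request are assumptions chosen to match the discussion in this example:

```sas
/* Sketch only: assumes Measure contains y, x1, and x2 */
ods graphics on;

proc tpspline data=Measure plots(only)=criterion;
   /* DF=6 requests a fit with six degrees of freedom;          */
   /* LOGNLAMBDA= supplies a grid of log10(n*lambda) values     */
   /* (illustrative range) whose GCV values are also evaluated  */
   model y = (x1 x2) / m=3 df=6 lognlambda=(-4 to 1 by 0.2);
run;
```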

Output 116.2.4 shows the GCV values for the list of supplied $\log_{10}(n\lambda)$ values in addition to the fitted model with fixed degrees of freedom 6. The fitted model has a larger GCV value than all
other examined models.

Output 116.2.4: Criterion Plot

The final estimate is based on 6.000330 degrees of freedom because there are already 6 degrees of freedom in the polynomial
space and the search range for $\lambda$ is not large enough (in this case, setting DF=6 is equivalent to setting $\lambda = \infty$).
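
The limiting behavior can be sketched with the usual definitions for smoothing splines. The hat matrix $A(\lambda)$ and the polynomial design matrix $T$ are notation assumed here for illustration, with $T$ taken to have full column rank:

```latex
% Degrees of freedom and GCV in terms of the hat matrix A(lambda), where \hat{y} = A(\lambda) y
\[
  \mathrm{df}(\lambda) = \operatorname{tr} A(\lambda), \qquad
  \mathrm{GCV}(\lambda) =
  \frac{\tfrac{1}{n}\,\lVert (I - A(\lambda))\,y \rVert^{2}}
       {\bigl[\tfrac{1}{n}\operatorname{tr}\bigl(I - A(\lambda)\bigr)\bigr]^{2}}
\]
% As lambda grows, the penalized part of the fit is shrunk away and
% A(lambda) approaches the least squares projection onto the columns of T:
\[
  \lim_{\lambda \to \infty} A(\lambda) = T (T^{\top} T)^{-1} T^{\top}, \qquad
  \lim_{\lambda \to \infty} \mathrm{df}(\lambda) = \operatorname{rank}(T) = 6
\]
```

Because the degrees of freedom only approach 6 from above for finite $\lambda$, a search over a bounded range stops at a value slightly greater than 6.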

The standard deviation and RSS (Output 116.2.3) are close to the root MSE and the sum of squares for the error term from the linear regression model (Output 116.2.2), respectively.

For the M=3 model, the optimal $\log_{10}(n\lambda)$ is around –3.8, which produces a standard deviation estimate of 0.096765 (see Output 116.2.1) and a GCV value of 0.016051, while the model that specifies DF=6 results in a $\log_{10}(n\lambda)$ larger than 1 and a GCV value larger than 0.23074. Judged by GCV, the nonparametric model should provide better prediction,
but the linear regression model can be more easily interpreted.