PROC HPGENSELECT Statement

PROC HPGENSELECT <options>;

The PROC HPGENSELECT statement invokes the procedure. Table 52.1 summarizes the available options in the PROC HPGENSELECT statement by function. The options are then described fully in alphabetical
order.

You can specify the following options in the PROC HPGENSELECT statement.

ABSCONV=r
ABSTOL=r

specifies an absolute function convergence criterion. For minimization, termination requires $f(\psi^{(k)}) \le r$, where $\psi$ is the vector of parameters in the optimization and $f(\cdot)$ is the objective function. The default value of r is the negative square root of the largest double-precision value, which serves only as a protection against overflow.

ABSFCONV=r <n>
ABSFTOL=r <n>

specifies an absolute function difference convergence criterion. For all techniques except NMSIMP, termination requires a small change of the function value in successive iterations:

\[ |f(\psi^{(k-1)}) - f(\psi^{(k)})| \le r \]

Here, $\psi$ denotes the vector of parameters that participate in the optimization, and $f(\cdot)$ is the objective function. The same formula is used for the NMSIMP technique, but $\psi^{(k)}$ is defined as the vertex that has the lowest function value and $\psi^{(k-1)}$ is defined as the vertex that has the highest function value in the simplex. The default value is r = 0. The optional integer value n specifies the number of successive iterations for which the criterion must be satisfied before the process can be terminated.

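ABSGCONV=r <n>
ABSGTOL=r <n>

specifies an absolute gradient convergence criterion. Termination requires the maximum absolute gradient element to be small:

\[ \max_j |g_j(\psi^{(k)})| \le r \]

Here, $\psi$ denotes the vector of parameters that participate in the optimization, and $g_j(\cdot)$ is the gradient of the objective function with respect to the jth parameter. This criterion is not used by the NMSIMP technique. The default value is r = 1E–8. The optional integer value n specifies the number of successive iterations for which the criterion must be satisfied before the process can be terminated.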

ALPHA=number

specifies a global significance level for the construction of confidence intervals. The confidence level is 1 – number. The value of number must be between 0 and 1; the default is 0.05. You can override this global significance level by specifying the ALPHA= option in the MODEL statement or the ALPHA= option in the OUTPUT statement.
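As a minimal sketch of how the global and statement-level settings could be combined, the following statements request 90% intervals globally and override that level with 95% limits in the MODEL statement. The data set work.mydata and the variables y, x1, and x2 are hypothetical, and the sketch assumes that the CL and DISTRIBUTION= options are specified in the MODEL statement as described in that statement's documentation.

```sas
/* Hypothetical data set and variables, for illustration only */
proc hpgenselect data=work.mydata alpha=0.1;   /* global level: 90% intervals */
   model y = x1 x2 / distribution=poisson
                     cl alpha=0.05;            /* override: 95% limits for the estimates */
run;
```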

CORR

creates the "Parameter Estimates Correlation Matrix" table. The correlation matrix is computed by normalizing the covariance
matrix $\Sigma$. That is, if $\sigma_{ij}$ is an element of $\Sigma$, then the corresponding element of the correlation matrix is $\sigma_{ij} / (\sigma_i \sigma_j)$, where $\sigma_i = \sqrt{\sigma_{ii}}$.

COV

creates the "Parameter Estimates Covariance Matrix" table. The covariance matrix is computed as the inverse of the negative
of the matrix of second derivatives of the log-likelihood function with respect to the model parameters (the Hessian matrix).

DATA=SAS-data-set

names the input SAS data set for PROC HPGENSELECT to use. The default is the most recently created data set.

If the procedure executes in distributed mode, the input data are distributed to memory on the appliance nodes and analyzed
in parallel, unless the data are already distributed in the appliance database. In that case the procedure reads the data
alongside the distributed database. For information about the various execution modes, see the section Processing Modes in SAS/STAT 14.1 User's Guide: High-Performance Procedures; for information about the alongside-the-database model, see the section Alongside-the-Database Execution in SAS/STAT 14.1 User's Guide: High-Performance Procedures.

FCONV=r <n>
FTOL=r <n>

specifies a relative function difference convergence criterion. For all techniques except NMSIMP, termination requires a small
relative change of the function value in successive iterations:

\[ \frac{|f(\psi^{(k)}) - f(\psi^{(k-1)})|}{|f(\psi^{(k-1)})|} \le r \]

Here, $\psi$ denotes the vector of parameters that participate in the optimization, and $f(\cdot)$ is the objective function. The same formula is used for the NMSIMP technique, but $\psi^{(k)}$ is defined as the vertex that has the lowest function value, and $\psi^{(k-1)}$ is defined as the vertex that has the highest function value in the simplex.

The default value is r = $2\epsilon$, where $\epsilon$ is the machine precision. The optional integer value n specifies the number of successive iterations for which the criterion must be satisfied before the process can terminate.

FMTLIBXML=file-ref

specifies the file reference for the XML stream that contains the user-defined format definitions. User-defined formats are
handled differently in a distributed computing environment than they are in other SAS products. For information about how
to generate an XML stream for your formats, see the section Working with Formats in SAS/STAT 14.1 User's Guide: High-Performance Procedures.

GCONV=r <n>
GTOL=r <n>

specifies a relative gradient convergence criterion. For all techniques except CONGRA and NMSIMP, termination requires that
the normalized predicted function reduction be small:

\[ \frac{g(\psi^{(k)})' \, [H^{(k)}]^{-1} \, g(\psi^{(k)})}{|f(\psi^{(k)})|} \le r \]

Here, $\psi$ denotes the vector of parameters that participate in the optimization, $f(\cdot)$ is the objective function, and $g(\cdot)$ is the gradient. For the CONGRA technique (where a reliable Hessian estimate $H$ is not available), the following criterion is used:

\[ \frac{\|g(\psi^{(k)})\|_2^2 \; \|s(\psi^{(k)})\|_2}{\|g(\psi^{(k)}) - g(\psi^{(k-1)})\|_2 \; |f(\psi^{(k)})|} \le r \]

This criterion is not used by the NMSIMP technique. The default value is r=1E–8. The optional integer value n specifies the number of successive iterations for which the criterion must be satisfied before the process can terminate.
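The convergence criteria are ordinary PROC statement options, so several of them can be set in one invocation. The following sketch loosens the relative gradient criterion and sets an explicit relative function criterion. The tolerance values, the data set work.mydata, and the variables y, x1, and x2 are hypothetical choices for illustration, not recommended settings.

```sas
/* Hypothetical data set, variables, and tolerances */
proc hpgenselect data=work.mydata
                 gconv=1e-6      /* relative gradient criterion (default is 1E-8) */
                 fconv=1e-10;    /* relative function difference criterion        */
   model y = x1 x2 / distribution=poisson;
run;
```

Optimization stops when any one active criterion is met, so a looser GCONV= value can end the iterations earlier than the default would.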

INEST=SAS-data-set

names the SAS data set that contains starting values for the parameters. The data set must include the _TYPE_ variable, a character variable in which the value 'PARMS' indicates the observation that contains your starting values. The
data set also includes a numeric variable for each parameter for which you are specifying a starting value; the name of this
numeric variable is the parameter name. You can obtain parameter names by specifying the OUTEST option and by using the ODS OUTPUT statement to output the "Parameter Estimates" table into a data set; the parameter name
is contained in the ParmName variable in this data set. If you do not specify a starting value for a parameter, it is set to 0. PROC HPGENSELECT uses
only the first observation for which _TYPE_='PARMS', and it ignores BY variables. You can also specify single-parameter equality constraints by using a value of 'EQ' for
the variable _TYPE_ to indicate the observation that contains your equality constraints. Similarly, you can use _TYPE_ values of 'UB' and 'LB' to specify upper bounds and lower bounds on parameters.
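The layout described above can be sketched with a short DATA step. This is an illustration under stated assumptions: the data sets work.start and work.mydata and the variables y, x1, and x2 are hypothetical, and the parameter names Intercept and x1 are assumed to match the values of ParmName that the procedure reports for this model. Check them against your own "Parameter Estimates" table before relying on them.

```sas
/* Hypothetical starting values; each numeric variable is named after a parameter */
data work.start;
   length _TYPE_ $ 8;
   _TYPE_    = 'PARMS';   /* marks the observation that holds starting values */
   Intercept = 0.5;
   x1        = 0.1;
   /* x2 is not included, so its starting value defaults to 0 */
run;

proc hpgenselect data=work.mydata inest=work.start;
   model y = x1 x2 / distribution=poisson;
run;
```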

ITDETAILS

adds to the "Iteration History" table the current values of the parameter estimates and their gradients. These quantities
are reported only for parameters that participate in the optimization. This option is not available when you perform model
selection.

ITSELECT

generates the "Iteration History" table when you perform a model selection.

ITSUMMARY

generates the "Iteration History" table. This option is not available when you perform model selection.

LASSORHO=r

specifies the base regularization parameter for the LASSO model selection method. The regularization parameter for step i is $r^i$.

LASSOSTEPS=n

specifies the maximum number of steps for LASSO model selection.

LASSOTOL=r

specifies the convergence tolerance for the optimization algorithm that solves for the LASSO parameter estimates at each step
of LASSO model selection.
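The three LASSO options are typically set together with a SELECTION statement that requests the LASSO method. The following is a minimal sketch, assuming METHOD=LASSO is specified in the SELECTION statement. The option values, the data set work.mydata, and the variables y and x1 through x4 are hypothetical.

```sas
/* Hypothetical data set, variables, and option values */
proc hpgenselect data=work.mydata
                 lassorho=0.8     /* base regularization parameter */
                 lassosteps=20    /* at most 20 LASSO steps        */
                 lassotol=1e-6;   /* tolerance for each step's fit */
   model y = x1 x2 x3 x4 / distribution=poisson;
   selection method=lasso;
run;
```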

MAXFUNC=n
MAXFU=n

specifies the maximum number of function calls in the optimization process. The default values are as follows, depending on
the optimization technique:

TRUREG, NRRIDG, NEWRAP: n = 125

QUANEW, DBLDOG: n = 500

CONGRA: n = 1,000

NMSIMP: n = 3,000

The optimization can terminate only after completing a full iteration. Therefore, the number of function calls that are actually
performed can exceed n. You can choose the optimization technique by specifying the TECHNIQUE= option.

MAXITER=n
MAXIT=n

specifies the maximum number of iterations in the optimization process. The default values are as follows, depending on the
optimization technique:

TRUREG, NRRIDG, NEWRAP: n = 50

QUANEW, DBLDOG: n = 200

CONGRA: n = 400

NMSIMP: n = 1,000

These default values also apply when n is specified as a missing value. You can choose the optimization technique by specifying the TECHNIQUE= option.
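Because the defaults for MAXITER= and MAXFUNC= depend on the technique, it can help to state all three explicitly when you change the technique. The following sketch selects QUANEW and raises both limits above that technique's defaults of 200 iterations and 500 function calls. The limits chosen, the data set work.mydata, and the variables y, x1, and x2 are hypothetical.

```sas
/* Hypothetical data set, variables, and limits */
proc hpgenselect data=work.mydata
                 technique=quanew
                 maxiter=500      /* QUANEW default is 200 */
                 maxfunc=2000;    /* QUANEW default is 500 */
   model y = x1 x2 / distribution=poisson;
run;
```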

MAXTIME=r

specifies an upper limit of r seconds of CPU time for the optimization process. The default value is the largest double-precision floating-point value that
your computer can represent. The time specified by this option is checked only once, at the end of each iteration. Therefore, the actual
running time can be longer than r.

MINITER=n
MINIT=n

specifies the minimum number of iterations. The default value is 0. If you request more iterations than are actually needed
for convergence to a stationary point, the optimization algorithms might behave unpredictably. For example, the effect of rounding
errors can prevent the algorithm from continuing for the required number of iterations.

NAMELEN=number

specifies the length to which long effect names are shortened. The default and minimum value is 20.

NOCLPRINT<=number>

suppresses the display of the "Class Level Information" table if you do not specify number. If you specify number, the values of the classification variables are displayed for only those variables whose number of levels is less than number. Specifying a number helps to reduce the size of the "Class Level Information" table if some classification variables have a large number of levels.

NOPRINT

suppresses the generation of ODS output.

NORMALIZE=YES | NO

specifies whether to normalize the objective function during optimization by the reciprocal of the frequency count of observations
that are used in the analysis. This option affects the values that are reported in the "Iteration History" table. The results
that are reported in the "Fit Statistics" table are always displayed for the nonnormalized log-likelihood function. By default,
NORMALIZE=NO.

NOSTDERR

suppresses the computation of the covariance matrix and the standard errors of the regression coefficients. When the model
contains many variables (thousands), the inversion of the Hessian matrix to derive the covariance matrix and the standard
errors of the regression coefficients can be time-consuming.

OUTEST

adds a column for the ParmName variable to the "Parameter Estimates" table. This column is not displayed, but you can use
it to create a data set that you can specify in an INEST= option by first using the ODS OUTPUT statement to output the "Parameter
Estimates" table and then submitting the following statements: