npglpreg: Generalized Local Polynomial Regression

Description

npglpreg computes a generalized local polynomial kernel
regression estimate (Hall and Racine (forthcoming)) of a one (1)
dimensional dependent variable on an r-dimensional vector of
continuous and categorical
(factor/ordered) predictors.

initial.mesh.size.integer

argument passed to the NOMAD solver (see snomadr for
further details)

min.mesh.size.real

argument passed to the NOMAD solver (see snomadr for
further details)

min.mesh.size.integer

argument passed to the NOMAD solver (see snomadr for
further details)

min.poll.size.real

argument passed to the NOMAD solver (see snomadr for
further details)

min.poll.size.integer

argument passed to the NOMAD solver (see snomadr for
further details)

opts

list of optional arguments passed to the NOMAD solver
(see snomadr for further details)

nmulti

integer number of times to restart the process of finding extrema of
the cross-validation function from different (random) initial
points (default nmulti=5)

random.seed

when not missing and not equal to 0, this seed is used to generate
the initial points passed to snomadr, making the search
reproducible

degree.max

the maximum degree of the polynomial for
each of the continuous predictors (default degree.max=10)

degree.min

the minimum degree of the polynomial for
each of the continuous predictors (default degree.min=0)

bandwidth.max

the maximum bandwidth scale (i.e. number of
scaled standard deviations) for each of the continuous predictors
(default bandwidth.max=.Machine$double.xmax)

bandwidth.min

the minimum bandwidth scale for each of the
categorical predictors (default sqrt(.Machine$double.eps))

bandwidth.min.numeric

the minimum bandwidth scale (i.e. number
of scaled standard deviations) for each of the continuous predictors
(default bandwidth.min.numeric=1.0e-02)

bandwidth.switch

the bandwidth scale (i.e. number of scaled standard deviations) for
each of the continuous predictors (default
bandwidth.switch=1.0e+06) beyond which the local polynomial is
treated as global during cross-validation, at which point a global
categorical kernel weighted least squares fit is used for
computational efficiency

bandwidth.scale.categorical

the upper end for the rescaled
bandwidths for the categorical predictors (default
bandwidth.scale.categorical=1.0e+04) - the aim is to ‘even up’
the scale of the search parameters as much as possible, so when very
large scale factors are selected for the continuous predictors, a
larger value may improve search

restart.from.min

a logical value indicating whether to restart snomadr from
the optimal values found in its first invocation (this second pass is
typically quick, and restarting in this manner is sometimes
recommended in the optimization literature)

gradient.vec

a vector specifying the order of the partial (or cross-partial)
derivative(s) required and the variable(s) with respect to which the
derivative(s) are to be taken

gradient.categorical

a logical value indicating whether
discrete gradients (i.e. differences in the response from the base
value for each categorical predictor) are to be computed

cv.shrink

a logical value indicating whether to use ridging
(Seifert and Gasser (2000)) for ill-conditioned inversion during
cross-validation (default cv.shrink=TRUE) or to instead test
for ill-conditioned matrices and penalize heavily when this is the
case (much stronger condition imposed on cross-validation)

cv.maxPenalty

a penalty applied during cross-validation when a
delete-one estimate is not finite or the polynomial basis is not of
full column rank

cv.warning

a logical value indicating whether to issue a warning when an ill
conditioned basis is encountered during cross-validation (default
cv.warning=FALSE)

Bernstein

a logical value indicating whether to use raw
polynomials or Bernstein polynomials (the default) (note that a
Bernstein polynomial is also known as a Bézier curve, which is also a
B-spline with no interior knots)

mpi

a logical value (default mpi=FALSE) that, when
mpi=TRUE, calls the npRmpi package rather than the
np package (note: your code needs to mirror the examples in
the demo directory of the npRmpi package, you need to
broadcast the loading of the crs package, and you need a
.Rprofile file in your current directory)

This function is in beta status until further notice (eventually it
will be rolled into the np/npRmpi packages after the final status of
snomadr/NOMAD gets sorted out).

Optimizing the cross-validation function jointly for bandwidths
(vectors of continuous parameters) and polynomial degrees (vectors of
integer parameters) constitutes a mixed-integer optimization
problem. These problems are not only ‘hard’ from the numerical
optimization perspective, but are also computationally intensive
(contrast this with, say, local linear regression, which fixes the
polynomial degree vector at the global value degree=1 so that
only the continuous bandwidths need to be optimized). Because of this
we must be mindful of the
presence of local optima (the objective function is non-convex and
non-differentiable). Restarting search from different initial starting
points is recommended (see nmulti) and by default this is done
more than once. We encourage users to adopt ‘multistarting’ and
to investigate the impact of changing default search parameters such
as initial.mesh.size.real, initial.mesh.size.integer,
min.mesh.size.real, min.mesh.size.integer,
min.poll.size.real, and min.poll.size.integer. The
default values were chosen on the basis of extensive simulation
experiments so as to yield robust performance while avoiding excessive
computation; of course, no single setting can be globally optimal.
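To make the role of multistarting concrete, here is a minimal base-R sketch, not the npglpreg implementation: the data, the local-constant estimator, the Gaussian kernel, and the penalty value are all illustrative. It restarts a local search on a delete-one cross-validation objective from several random initial points and keeps the best optimum found, mirroring nmulti above (the large finite return value plays a role analogous to cv.maxPenalty):

```r
set.seed(42)
n <- 100
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.25)

## Delete-one cross-validation for a local-constant (Nadaraya-Watson)
## estimator: predict each y[i] from all points except i.
cv <- function(log.h) {
  h <- exp(log.h)                       ## search on the log scale
  K <- dnorm(outer(x, x, "-") / h)      ## Gaussian kernel weights
  diag(K) <- 0                          ## delete-one
  s <- rowSums(K)
  if (any(s == 0)) return(1e10)         ## cv.maxPenalty-style guard
  mean((y - as.vector(K %*% y) / s)^2)
}

## Restart the local search from several random initial points and
## keep the best optimum found (the role played by nmulti).
nmulti <- 5
starts <- runif(nmulti, log(0.01), log(1))
fits <- lapply(starts, function(s) optim(s, cv, method = "BFGS"))
best <- fits[[which.min(sapply(fits, `[[`, "value"))]]
exp(best$par)                           ## the cross-validated bandwidth
```

Because the objective is non-convex, individual restarts can land on different local minima; taking the minimum over restarts is what makes the final bandwidth robust to the choice of starting point.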

fitted.values

estimates of the regression function
(conditional mean) at the sample points or evaluation points

residuals

residuals computed at the sample points or
evaluation points

degree

integer/vector specifying the degree of the polynomial
for each dimension of the continuous x

gradient

the estimated gradient (vector) corresponding to the vector
gradient.vec

gradient.categorical.mat

the estimated gradient (matrix) for
the categorical predictors

gradient.vec

the supplied gradient.vec

bws

vector of bandwidths

bwtype

the supplied bwtype

call

a symbolic description of the model

r.squared

coefficient of determination (Doksum and Samarov (1995))

Note

Note that the use of raw polynomials (Bernstein=FALSE) for
approximation is appealing because they can be computed and
differentiated easily. However, they can be unstable (their inversion
can be ill conditioned), which can cause problems in some instances as
the order of the polynomial increases. This can hamper the search when
excessive reliance on ridging to overcome ill conditioned inversion
becomes computationally burdensome.

npglpreg tries to detect whether this is an issue or not when
Bernstein=FALSE for each numeric predictor and will
adjust the search range for snomadr and the degree fed
to npglpreg if appropriate.

However, if you suspect that this might be an issue for your specific
problem and you are using raw polynomials (Bernstein=FALSE),
you are encouraged to investigate this by limiting degree.max
to a value less than the default (say, 3). Alternatively,
you might consider re-scaling your numeric predictors to lie in
[0,1] using scale.
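As a hedged base-R illustration of the point above (the variable names, degree, and data here are ours, not part of the npglpreg API): compare the condition number of a raw polynomial basis on a wide-scale predictor with the same basis after rescaling to [0,1], and with a Bernstein basis of the same degree:

```r
set.seed(1)
x <- runif(100, min = 0, max = 100)   ## predictor on a wide scale
p <- 10                                ## polynomial degree

## Raw polynomial basis: 1, x, x^2, ..., x^p
X.raw <- outer(x, 0:p, "^")

## The same raw basis after rescaling x to [0,1]
z <- (x - min(x)) / (max(x) - min(x))
X.scaled <- outer(z, 0:p, "^")

## Bernstein basis of degree p on [0,1]
X.bern <- sapply(0:p, function(k) choose(p, k) * z^k * (1 - z)^(p - k))

## The raw basis is ill conditioned by many orders of magnitude;
## rescaling helps, and the Bernstein basis is typically better still.
kappa(X.raw)
kappa(X.scaled)
kappa(X.bern)
```

A very large value of kappa() for your predictor's raw basis is a sign that rescaling, lowering degree.max, or keeping the Bernstein default is advisable.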

For a given predictor x you can readily determine if this is an
issue by considering the following: Suppose x is given by