cobs: COnstrained B-Splines Nonparametric Regression Quantiles

Description

Computes constrained quantile curves using linear or quadratic splines. The median spline (\(L_1\) loss) is a robust (constrained) smoother.

Usage

cobs(x, y, constraint = c("none", "increase", "decrease",
                          "convex", "concave", "periodic"),
     w = rep(1,n),
     knots, nknots = if(lambda == 0) 6 else 20,
     method = "quantile", degree = 2, tau = 0.5,
     lambda = 0,
     ic = c("AIC", "SIC", "BIC", "aic", "sic", "bic"),
     knots.add = FALSE, repeat.delete.add = FALSE, pointwise = NULL,
     keep.data = TRUE, keep.x.ps = TRUE,
     print.warn = TRUE, print.mesg = TRUE, trace = print.mesg,
     rq.print.warn = FALSE,
     lambdaSet = exp(seq(log(lambda.lo), log(lambda.hi), length.out = lambda.length)),
     lambda.lo = f.lambda*1e-4, lambda.hi = f.lambda*1e3, lambda.length = 25,
     maxiter = 100,
     rq.tol = 1e-8, toler.kn = 1e-6, tol.0res = 1e-6, nk.start = 2)

Value

an object of class cobs, a list with components

call: the cobs(..) call used for creation.
tau, degree: same as input
constraint: as input (but no more abbreviated).
pointwise: as input.
coef: B-spline coefficients.
knots: the final set of knots used in the computation.
ifl: exit code := 1 + ierr and ierr is the error from rq.fit.sfnc (package quantreg); consequently, ifl = 1 means “success”; all other values point to algorithmic problems or failures.
icyc: length 2: number of cycles taken to achieve convergence for final lambda, and total number of cycles for all lambdas.
k: the effective dimensionality of the final fit.
k0: (usually the same)
x.ps: the pseudo design matrix \(X\) (as returned by qbsks2).
resid: vector of residuals from the fit.
fitted: vector of fitted values from the fit.
SSy: the sum of squares around centered y (e.g. for computation of \(R^2\).)
lambda: the penalty parameter used in the final fit.
pp.lambda: vector of all lambdas used for lambda search when lambda < 0 on input.
pp.sic: vector of Schwarz information criteria evaluated at pp.lambda; note that it is not quite sure how good these are for determining an optimal lambda.

Arguments

x

vector of covariate; missing values are omitted.

y

vector of response variable. It must have the same length as x.

constraint

character (string) specifying the kind of constraint; must be one of the values in the default list above; may be abbreviated. More flexible constraints can be specified via the pointwise argument (below).

w

vector of weights the same length as x (y) assigned to both x and y; default to all weights being one.

knots

vector of locations of the knot mesh; if missing, nknots number of knots will be created using the specified method and automatic knot selection will be carried out for regression B-spline (lambda=0); if not missing and length(knots)==nknots, the provided knot mesh will be used in the fit and no automatic knot selection will be performed; otherwise, automatic knots selection will be performed on the provided knots.

nknots

maximum number of knots; defaults to 6 for regression B-splines, 20 for smoothing B-splines.

method

character string specifying the method for generating nknots number of knots when knots is not provided; either "quantile" (equally spaced in percentile levels) or "uniform" (equally spaced knots); defaults to "quantile".

degree

degree of the splines; 1 for linear spline (equivalent to \(L_1\)-roughness) and 2 for quadratic spline (corresponding to \(L_{\infty}\) roughness); defaults to 2.

tau

desired quantile level; defaults to 0.5 (median).

lambda

penalty parameter \(\lambda\)
\(\lambda = 0\): no penalty (regression B-spline);
\(\lambda > 0\): smoothing B-spline with the given \(\lambda\);
\(\lambda < 0\): smoothing B-spline with \(\lambda\) chosen by a Schwarz-type information criterion.

ic

string indicating the information criterion used in knot deletion and addition for the regression B-spline method, i.e., when lambda == 0;
"AIC" (Akaike-type, equivalently "aic") or
"SIC" (Schwarz-type, equivalently "BIC", "sic" or "bic"). Defaults to "AIC".

Note that the default was "SIC" up to cobs version 1.1-6 (dec.2008).

knots.add

logical indicating if an additional step of stepwise knot addition should be performed for regression B-splines.

repeat.delete.add

logical indicating if an additional step of stepwise knot deletion should be performed for regression B-splines.

pointwise

an optional three-column matrix with each row specifies one of the following constraints:

( 1,xi,yi):: fitted value at xi will be \(\ge\) yi;

(-1,xi,yi):

fitted value at xi will be \(\le\) yi;

( 0,xi,yi):

fitted value at xi will be \(=\) yi;

( 2,xi,yi):

derivative of the fitted function at xi will be yi.

keep.data

logical indicating if the x and y input vectors after removing NAs should be kept in the result.

keep.x.ps

logical indicating if the pseudo design matrix \(\tilde{X}\) should be returned (as sparse matrix). That is needed for interval prediction, predict.cobs(*, interval=..).

print.warn

flag for printing of interactive warning messages; true by default; set to FALSE if performing simulation.

print.mesg

logical flag or integer for printing of intermediate messages; true by default. Probably needs to be set to FALSE in simulations.

trace

integer \(\ge 0\) indicating how much diagnostics the low-level code in drqssbc2 should print; defaults to print.mesg.

rq.print.warn

flag passed to the fitting function, i.e., either to qbsks2() or directly to drqssbc2().

lambdaSet

numeric vector of lambda values to use for grid search; in that case, defaults to a geometric sequence (a “grid in log scale”) from lambda.lo to lambda.hi of length lambda.length.

lambda.lo, lambda.hi

lower and upper bound of the grid search for lambda (when lambda < 0). The defaults are meant to keep everything scale-equivariant and are hence using the factor \(f = \sigma_x^d\), i.e., f.lambda <- sd(x)^degree. Note however that currently the underlying algorithms in package quantreg are not scale equivariant yet.

lambda.length

number of grid points in the grid search for optimal lambda.

maxiter

upper bound of the number of iterations; defaults to 100.

rq.tol

numeric convergence tolerance for the interior point algorithm called from rq.fit.sfnc() or rq.fit.sfn().

toler.kn

numeric tolerance for shifting the boundary knots outside; defaults to \(10^{-6}\).

tol.0res

tolerance for testing \(|r_i| = 0\), passed to qbsks2 and drqssbc2.

nk.start

number of starting knots used in automatic knot selection. Defaults to the minimum of 2 knots.

Details

cobs() computes the constraint quantile smoothing B-spline with penalty when lambda is not zero.
If lambda < 0, an optimal lambda will be chosen using Schwarz type information criterion.
If lambda > 0, the supplied lambda will be used.
If lambda = 0, cobs computes the constraint quantile regression B-spline with no penalty using the provided knots or those selected by Akaike or Schwarz information criterion.

References

Ng, P. and Maechler, M. (2007) A Fast and Efficient Implementation of Qualitatively Constrained Quantile Smoothing Splines, Statistical Modelling 7(4), 315-328.

Koenker, R. and Ng, P. (2005) Inequality Constrained Quantile Regression, Sankhya, The Indian Journal of Statistics 67, 418--440.

He, X. and Ng, P. (1999) COBS: Qualitatively Constrained Smoothing via Linear Programming; Computational Statistics 14, 315--337.

Koenker, R. and Ng, P. (1996) A Remark on Bartels and Conn's Linearly Constrained L1 Algorithm, ACM Transaction on Mathematical Software 22, 493--495.

Ng, P. (1996) An Algorithm for Quantile Smoothing Splines, Computational Statistics & Data Analysis 22, 99--118.

Bartels, R. and Conn A. (1980) Linearly Constrained Discrete \(L_1\) Problems, ACM Transaction on Mathematical Software 6, 594--608.

A postscript version of the paper that describes the details of COBS can be downloaded from https://www2.nau.edu/PinNg/cobs.html.

Examples

Run this code

x <- seq(-1,3,,150)
y <- (f.true <- pnorm(2*x)) + rnorm(150)/10
## specify pointwise constraints (boundary conditions)
con <- rbind(c( 1,min(x),0), # f(min(x)) >= 0
             c(-1,max(x),1), # f(max(x)) <= 1
             c(0,  0,   0.5))# f(0)      = 0.5

## obtain the median  REGRESSION  B-spline using automatically selected knots
Rbs <- cobs(x,y, constraint= "increase", pointwise = con)
Rbs
plot(Rbs, lwd = 2.5)
lines(spline(x, f.true), col = "gray40")
lines(predict(cobs(x,y)), col = "blue")
mtext("cobs(x,y)   # completely unconstrained", 3, col= "blue")

## compute the median  SMOOTHING  B-spline using automatically chosen lambda
Sbs <- cobs(x,y, constraint="increase", pointwise= con, lambda= -1)
summary(Sbs)
plot(Sbs) ## by default  includes  SIC ~ lambda

Sb1 <- cobs(x,y, constraint="increase", pointwise= con, lambda= -1,
            degree = 1)
summary(Sb1)
plot(Sb1)

plot(Sb1, which = 2) # only the  data + smooth
rug(Sb1$knots, col = 4, lwd = 1.6)# (too many knots)
xx <- seq(min(x) - .2, max(x)+ .2, len = 201)
pxx <- predict(Sb1, xx, interval = "both")
lines(pxx, col = 2)
mtext(" + pointwise and simultaneous 95% - confidence intervals")
matlines(pxx[,1], pxx[,-(1:2)], col= rep(c("green3","blue"), c(2,2)), lty=2)

Run the code above in your browser using DataLab