The wps routine fits an isotonic regression surface in two dimensions using warped-plane splines, without additivity assumptions. Linear covariate effects can also be incorporated in the model. The surface and the covariate effects are estimated simultaneously with a single cone projection. A penalized spline fit is also provided, which allows for more knots and greater flexibility.
wps(formula, family = gaussian(), data = NULL, weights = NULL, pnt = TRUE,
pen = 0, cpar = 1.2)
A formula object giving a symbolic description of the model to be fitted, of the form "response ~ predictor". The response is a vector of length \(n\). The user specifies the relationship between the mean value \(E(y)\) and the two predictors \(x_1\) and \(x_2\) with a symbolic shape routine, e.g., dd(x1, x2) for a surface that is decreasing in both predictors, as in the example below.
A parameter indicating the error distribution and link function to be used in the model. It can be a character string naming a family function or the result of a call to a family function. This is borrowed from the glm routine in the stats package. For now, only the option family = gaussian() is allowed in the wps routine.
An optional data frame, list or environment containing the variables in the model. The default is data = NULL.
An optional non-negative vector of "replicate weights" which has the same length as the response vector. If weights are not given, all weights are taken to equal 1. The default is weights = NULL.
Logical flag indicating whether penalized warped-plane splines are used. When pnt is TRUE, penalized warped-plane splines are used; otherwise, unpenalized warped-plane splines are used. The default is pnt = TRUE. Note that if pnt is TRUE and the user supplies a positive value for the argument pen, that user-defined penalty term is used; otherwise, a penalty term is determined by a pilot parametric fit.
The penalty term for penalized warped-plane splines. It must be a non-negative real number. The default is pen = 0.
A multiplier to estimate the model variance, which is defined as \(\sigma^2 = SSR / (n - d_0 - cpar * edf)\). SSR is the sum of squared residuals for the full model, \(d_0\) is the dimension of the linear space in the cone, and edf is the effective degrees of freedom. The default is cpar = 1.2. See Meyer, M. C. and M. Woodroofe (2000) for more details.
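For illustration, the variance estimate described above can be sketched directly in R. The quantities ssr, d0, and edf here are hypothetical placeholder values, not taken from any actual fit:

```r
# hypothetical quantities from a fitted wps model
n <- 30        # number of observations
ssr <- 142.7   # sum of squared residuals for the full model
d0 <- 3        # dimension of the linear space in the cone
edf <- 8.4     # effective degrees of freedom
cpar <- 1.2    # default multiplier

# model variance estimate: sigma^2 = SSR / (n - d0 - cpar * edf)
sigma2 <- ssr / (n - d0 - cpar * edf)
```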
Knots used for \(x_1\).
Knots used for \(x_2\).
The estimated constrained mean value.
The estimated unconstrained mean value.
The sum of squared residuals for the full model.
The sum of squared residuals for the linear part.
The constrained effective degrees of freedom.
The unconstrained effective degrees of freedom.
A matrix whose columns are the edges corresponding to the isotonically modelled predictors \(x_1\) and \(x_2\).
A matrix whose columns represent the parametrically modelled covariate. The user can choose to include a constant vector in it or not. It must have full column rank.
A matrix whose columns are \(x_1\) and \(x_2\).
The estimated constrained coefficients for the basis spanning the null space of the constraint set and edges corresponding to isotonically modelled predictors \(x_1\) and \(x_2\).
The estimated unconstrained coefficients for the basis spanning the null space of the constraint set and edges corresponding to \(x_1\) and \(x_2\).
The estimated coefficients for the parametrically modelled covariate.
The approximate p-values for the estimation of the vector \(\beta\). A t-distribution is used as the approximate distribution.
The standard errors for the estimation of the vector \(\beta\).
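The approximate p-values above can be reproduced from the coefficient estimates and their standard errors using the t approximation. A minimal sketch, with illustrative values in place of quantities that would come from the fit (bhat, se, and the approximate degrees of freedom df are all hypothetical here):

```r
bhat <- c(0.52, -1.37)   # hypothetical coefficient estimates for beta
se <- c(0.21, 0.45)      # hypothetical standard errors
df <- 24                 # hypothetical approximate degrees of freedom

# two-sided p-values from the t approximation
pvals <- 2 * pt(-abs(bhat / se), df)
```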
The generalized cross validation (GCV) value for the constrained fit.
The generalized cross validation (GCV) value for the unconstrained fit.
A vector storing the names of \(x_1\) and \(x_2\).
A vector storing the names of the parametrically modelled covariate.
A vector keeping track of the positions of the parametrically modelled covariate.
A vector storing the levels of each variable used as a factor.
A vector keeping track of the beginning position of the levels of each variable used as a factor.
A vector keeping track of the end position of the levels of each variable used as a factor.
The name of the response variable.
A vector of two logical values indicating the monotonicity of the isotonically-constrained surface with respect to \(x_1\) and \(x_2\).
The terms objects extracted by the generic function terms from a wps fit. See the official help page (http://stat.ethz.ch/R-manual/R-patched/library/stats/html/terms.html) of the terms function for more details.
A logical scalar showing whether a variable is a parametrically modelled covariate, which could be a linear term or a factor.
A logical scalar showing whether a variable is a factor.
The dimension of the linear space contained in the cone generated by all constraint conditions.
The penalty term used in the regression.
The cpar argument used in the call to wps.
The matched call.
We consider the regression model \(y_i = f(t_{1i}, t_{2i}) + z_i'\beta + \varepsilon_{i}, i = 1,\ldots,n\), where \(\beta\) is a \(p\) by \(1\) parameter vector, and the \(\varepsilon_i\)'s are mean-zero random errors. We know a priori that \(f\) is continuous and isotonic in both dimensions; that is, for fixed \(t_{1}\) and \(z\) values, if \(t_{2a} \leq t_{2b}\), then \(f(t_{1}, t_{2a}) \leq f(t_{1}, t_{2b})\), and similarly for fixed \(t_{2}\) and \(z\) values, \(f\) is non-decreasing in \(t_{1}\). For splines of degree two or higher, obtaining a finite set of linear inequality constraints that are necessary and sufficient for isotonicity in both dimensions does not seem to be feasible. However, if we use linear spline basis functions, then the necessary and sufficient constraints are straightforward and the fitted surface can be described as a continuous piecewise warped plane, called a "warped-plane spline" (WPS) surface. The surface and covariate effects are estimated simultaneously with a single cone projection (no back-fitting). See references cited in this section and the official manual (https://cran.r-project.org/package=coneproj) for the R package coneproj for more details.
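Over a single knot rectangle a warped plane has the bilinear form \(a_0 + b_1 t_1 + b_2 t_2 + b_{12} t_1 t_2\); because each partial slope is linear in the other coordinate, it suffices to check the slopes at the rectangle edges, which is what makes the isotonicity constraints finite and linear. A minimal sketch of this edge check (the coefficients here are illustrative, not taken from any fit):

```r
# illustrative bilinear coefficients on the knot rectangle [0,1] x [0,1]:
# f(t1, t2) = a0 + b1*t1 + b2*t2 + b12*t1*t2
a0 <- 1; b1 <- 0.5; b2 <- 0.3; b12 <- -0.2

# slope in t1 is b1 + b12*t2, linear in t2: checking the two t2 edges suffices
incr_in_t1 <- all(b1 + b12 * c(0, 1) >= 0)   # TRUE for these coefficients
# slope in t2 is b2 + b12*t1: check the two t1 edges
incr_in_t2 <- all(b2 + b12 * c(0, 1) >= 0)   # TRUE for these coefficients
```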
A penalized spline version is also provided. Over each knot rectangle, the regression surface is a warped plane, and the slopes can change abruptly from one rectangle to the next. To obtain smoother fits, and to side-step the problem of knot choices, we can use a large number of knots for both predictors and penalize these changes in slopes. The size of the penalty parameter will control the effective degrees of freedom of the fit. In practice, the penalty term can be chosen through generalized cross-validation, similar to the method in Meyer (2012).
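The penalty selection by generalized cross-validation can be sketched as a grid search. This is a sketch under the assumption that GCV takes its usual form \(n \cdot SSR / (n - edf)^2\); the SSR and edf values below are illustrative placeholders, not results from any actual fit:

```r
# hypothetical results over a grid of penalty values: for each penalty,
# the penalized fit would yield an SSR and an effective degrees of freedom
pens <- c(0.1, 0.5, 1, 2, 5)
ssrs <- c(150.2, 153.8, 158.1, 166.4, 180.9)  # illustrative SSR values
edfs <- c(12.3, 10.1, 8.6, 7.0, 5.2)          # illustrative edf values
n <- 30

# GCV(pen) = n * SSR / (n - edf)^2; choose the penalty minimizing it
gcv <- n * ssrs / (n - edfs)^2
best_pen <- pens[which.min(gcv)]
```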
Meyer, M. C. and M. Woodroofe (2000) On the degrees of freedom in shape-restricted regression. Annals of Statistics 28, 1083--1104.
Meyer, M. C. (2012) Constrained penalized splines. Canadian Journal of Statistics 40(1), 190--206.
Meyer, M. C. (2016) Estimation and inference for isotonic regression in two dimensions, using warped-plane splines.
library(MASS)
data(Rubber)
# regress loss on hard and tens under the shape-restriction: "doubly-decreasing"
# with a penalty term equal to 1
# use 13 knots for each predictor
ans <- wps(loss ~ dd(hard, tens, numknots = c(13, 13)), data = Rubber, pen = 1)
# make a 3D plot of the constrained surface
plotpersp(ans)
# make a 3D plot of the unconstrained surface
plotpersp(ans, surface = "U")