wps: Warped-Plane Spline-Based Isotonic Regression

Description

Isotonic regression surface in two dimensions using warped-plane splines is fitted by the wps routine without using additivity assumptions. Meanwhile, linear covariate effects can also be incorporated in the model. The surface and covariate effects are estimated simultaneously with a single cone projection. A penalized spline fit is also provided which allows for more knots and greater flexibility.

Usage

wps(formula, family = gaussian(), data = NULL, weights = NULL, pnt = TRUE, 
pen = 0, cpar = 1.2)

Arguments

formula

A formula object which gives a symbolic description of the model to be fitted. It has the form "response ~ predictor". The response is a vector of length \(n\). The user is supposed to indicate the relationship between the mean value \(E(y)\) and the two predictors \(x_1\) and \(x_2\) in the following way:

Assume that \(E(y)\) is the mean value and we model the relationship between \(E(y)\) and the two predictors \(x_1\) and \(x_2\) as:

dd(x1, x2): \(E(y)\) is decreasing in both \(x_1\) and \(x_2\). See dd for more details.
di(x1, x2): \(E(y)\) is decreasing in \(x_1\) and increasing in \(x_2\). See di for more details.
ii(x1, x2): \(E(y)\) is increasing in both \(x_1\) and \(x_2\). See ii for more details.

family

A parameter indicating the error distribution and link function to be used in the model. It can be a character string naming a family function or the result of a call to a family function. This is borrowed from the glm routine in the stats package. For now, only the option family = gaussian() is allowed in the wps routine.

data

An optional data frame, list or environment containing the variables in the model. The default is data = NULL.

weights

An optional non-negative vector of "replicate weights" which has the same length as the response vector. If weights are not given, all weights are taken to equal 1. The default is weights = NULL.

pnt

Logical flag indicating if or not penalized warped-plane splines are used. When pnt is TRUE, penalized warped-plane splines are used; otherwise, unpenalized warped-plane splines are used. The default is pnt = TRUE. Note that if pnt is TRUE and the user defines a positive value for the argument pen, then the user-defined penalty term will be used; otherwise, a penalty term will be determined by a pilot parametric fit.

pen

The penalty term for penalized warped-plane splines. It can only be a non-negative real number. The default is pen = \(0\).

cpar

A multiplier to estimate the model variance, which is defined as \(\sigma^2 = SSR / (n - d_0 - cpar * edf)\). SSR is the sum of squared residuals for the full model, \(d_0\) is the dimension of the linear space in the cone, and edf is the effective degrees of freedom. The default is cpar = 1.2. See Meyer, M. C. and M. Woodroofe (2000) for more details.

Value

Knots used for \(x_1\).

Knots used for \(x_2\).

muhat

The estimated constrained mean value.

muhatu

The estimated unconstrained mean value.

SSE1

The sum of squared residuals for the full model.

SSE0

The sum of squared residuals for the linear part.

edf

The constrained effective degrees of freedom.

edfu

The unconstrained effective degrees of freedom.

delta

A matrix whose columns are the edges corresponding to the isotonically modelled predictors \(x_1\) and \(x_2\).

zmat

A matrix whose columns represent the parametrically modelled covariate. The user can choose to include a constant vector in it or not. It must have full column rank.

xmat

A matrix whose columns are \(x_1\) and \(x_2\).

coefs

The estimated constrained coefficients for the basis spanning the null space of the constraint set and edges corresponding to isotonically modelled predictors \(x_1\) and \(x_2\).

coefsu

The estimated unconstrained coefficients for the basis spanning the null space of the constraint set and edges corresponding to \(x_1\) and \(x_2\).

zcoefs

The estimated coefficients for the parametrically modelled covariate.

%\item{sighat}{The estimated model variance.}

pvals.beta

The approximate p-values for the estimation of the vector \(\beta\). A t-distribution is used as the approximate distribution.

se.beta

The standard errors for the estimation of the vector \(\beta\).

%\item{covmat}{The estimated covariance matrix of \eqn{\hat{\beta}}.}

gcv

The generalized cross validation (GCV) value for the constrained fit.

gcvu

The generalized cross validation (GCV) value for the unconstrained fit.

xnms

A vector storing the names of \(x_1\) and \(x_2\).

znms

A vector storing the names of the parametrically modelled covariate.

zid

A vector keeping track of the positions of the parametrically modelled covariate.

vals

A vector storing the levels of each variable used as a factor.

zid1

A vector keeping track of the beginning position of the levels of each variable used as a factor.

zid2

A vector keeping track of the end position of the levels of each variable used as a factor.

ynm

The name of the response variable.

decrs

A vector of two logical values indicating the monotonicity of the isotonically-constrained surface with respect to \(x_1\) and \(x_2\).

tms

The terms objects extracted by the generic function terms from a wps fit. See the official help page (http://stat.ethz.ch/R-manual/R-patched/library/stats/html/terms.html) of the terms function for more details.

is_param

A logical scalar showing if or not a variable is a parametrically modelled covariate, which could be a linear term or a factor.

is_fac

A logical scalar showing if or not a variable is a factor.

The dimension of the linear space contained in the cone generated by all constraint conditions.

pen

The penalty term used in the regression.

cpar

The cpar argument in the wps formula

call

The matched call.

Details

We consider the regression model \(y_i = f(t_{1i}, t_{2i}) + z_i'\beta +\varepsilon_{i}, i = 1,\ldots,n\), where \(\beta\) is a \(p\) by \(1\) parameter vector, and the \(\varepsilon_i's\) are mean-zero random errors. We know a priori that \(f\) is continuous and isotonic in both dimensions; that is, for fixed \(t_{1}\) and \(z\) values, if \(t_{2a} \leq t_{2b}\), then \(f(t_{1}, t_{2a}) \leq f(t_{1}, t_{2a})\), and similarly for fixed \(t_{2}\) and \(z\) values, \(f\) is non-decreasing in \(t_{1}\). For splines of degree two or higher, obtaining a finite set of linear inequality constraints that are necessary and sufficient for isotonicity in both dimensions does not seem to be feasible. However, if we use linear spline basis functions, then the necessary and sufficient constraints are straight-forward and the fitted surface can be described as a continuous piece-wise warped plane, called a "warped-plane spline" (WPS) surface. The surface and covariate effects are estimated simultaneously with a single cone projection (no back-fitting). See references cited in this section and the official manual (https://cran.r-project.org/package=coneproj) for the R package coneproj for more details.

A penalized spline version is also provided. Over each knot rectangle, the regression surface is a warped plane, and the slopes can change abruptly from one rectangle to the next. To obtain smoother fits, and to side-step the problem of knot choices, we can use large number of knots for both predictors and penalize these changes in slopes. The size of the penalty parameter will control the effective degrees of freedom of the fit. In practice, the penalty term can be chosen through generalized cross-validation, similar to the method in Meyer (2012).

References

Meyer, M. C. and M. Woodroofe (2000) On the degrees of freedom in shape-restricted regression. Annals of Statistics 28, 1083--1104.

Meyer, M. C. (2012) Constrained penalized splines. Canadian Journal of Statistics 40(1), 190--206.

Meyer, M. C. (2016) Estimation and inference for isotonic regression in two dimensions, using warped-plane splines.

Examples

Run this code

  library(MASS)
  data(Rubber)

  # regress loss on hard and tens under the shape-restriction: "doubly-decreasing" 
  # with a penalty term equal to 1
  # use 13 knots for each predictor
  ans <- wps(loss ~ dd(hard, tens, numknots = c(13, 13)), data = Rubber, pen = 1)

  # make a 3D plot of the constrained surface
  plotpersp(ans)

  # make a 3D plot of the unconstrained surface
  plotpersp(ans, surface = "U")

Run the code above in your browser using DataLab