shape: Shape Selection

Description

Given a predictor vector \(\bold{x}\), e.g., years, and a matrix \(\bold{ymat}\) whose columns are response vectors, e.g., Landsat signals, the shape routine selects the shape that best fits each response vector according to the Bayes information criterion (BIC) or the cone information criterion (CIC).

Usage

shape(x, ymat, infocrit = "CIC", flat = TRUE, dec = TRUE, jp = TRUE, 
invee = TRUE, vee = TRUE, inc = TRUE, db = TRUE, nsim = 1e+3, 
edf0 = NULL, get.edf0 = FALSE, random = FALSE, msg = FALSE)

Value

shape: An \(N\) by \(1\) vector. The \(i\)th element is the best shape for the \(i\)th scatterplot.
ic: A \(k\) by \(N\) matrix where the \(i\)th column is the vector of "BIC" or "CIC" values used to choose the best shape for the \(i\)th scatterplot. \(k\) is the number of shapes allowed by the user.
thetab: An \(n\) by \(N\) matrix where the \(i\)th column is the vector of predicted values for the chosen shape for the \(i\)th scatterplot.
x: The argument x.
ymat: The argument ymat.
infocrit: The argument infocrit.
k: The number of knots used.
bs: A list of coefficient vectors, one per scatterplot. Each vector holds the coefficients of the regression basis functions for that scatterplot.
ijps: A list storing the position of the first jump for scatterplots whose best shape is one-jump or double-jump. It also stores the position of the knot from where \(\bold{f}\) starts increasing (decreasing) for scatterplots whose best shape is vee (inverted vee).
jjps: A list storing the position of the second jump for scatterplots whose best shape is double-jump.
m_is: A vector storing the centering values for the first ramp edge for scatterplots whose best shape is one-jump or double-jump.
m_js: A vector storing the centering values for the second ramp edge for scatterplots whose best shape is double-jump.
tm: Total CPU running time.
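
As a rough illustration of how ic relates to shape, the sketch below builds a small made-up ic matrix (toy numbers, not output from the routine) and picks, for each column, the row with the smallest criterion value, which is the selection rule stated in Details. The row labels and the object names toy_ic, best_row, and best_label are hypothetical; the actual shape codes returned in shape are not reproduced here.

```r
# Toy k-by-N matrix of information-criterion values (made-up numbers):
# k = 3 allowed shapes (rows), N = 4 scatterplots (columns).
# matrix() fills column by column, so each line below is one scatterplot.
toy_ic <- matrix(
  c(2.1, 1.4, 3.0,
    0.9, 1.2, 1.1,
    1.8, 2.5, 0.7,
    1.0, 0.6, 0.8),
  nrow = 3,
  dimnames = list(c("flat", "decreasing", "one-jump"), NULL)
)

# For each scatterplot (column), take the row with the minimum value.
best_row <- apply(toy_ic, 2, which.min)
best_label <- rownames(toy_ic)[best_row]
print(best_label)
```

With a real result object, the same column-wise comparison would apply to the ic component, assuming its rows follow the order of the shapes allowed by the user.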

Arguments

x: An \(n\) by \(1\) predictor vector, for example, years.
ymat: An \(n\) by \(N\) matrix whose columns are response vectors corresponding to x, for example, Landsat signals.
infocrit: The criterion used to select the best shape for a scatterplot. It can be either the Bayes information criterion ("BIC") or the cone information criterion ("CIC"). The default is infocrit = "CIC".
flat: A logical flag. If it is TRUE, there is a flat shape choice; otherwise, there is no such shape option.
dec: A logical flag. If it is TRUE, there is a decreasing shape choice; otherwise, there is no such shape option.
jp: A logical flag. If it is TRUE, there is a one-jump shape choice; otherwise, there is no such shape option.
invee: A logical flag. If it is TRUE, there is an inverted-vee shape choice; otherwise, there is no such shape option.
vee: A logical flag. If it is TRUE, there is a vee shape choice; otherwise, there is no such shape option.
inc: A logical flag. If it is TRUE, there is an increasing shape choice; otherwise, there is no such shape option.
db: A logical flag. If it is TRUE, there is a double-jump shape choice; otherwise, there is no such shape option. The routine is usually slower when the double-jump shape is allowed than when it is not.
nsim: Number of simulations used to get the edf0 vector. The default is nsim = 1e+3. See references in this section for more details about edf0.
edf0: The edf0 vector given by the user. When \(\bold{x}\) is an equally spaced vector whose number of elements is between \(20\) and \(40\), the user doesn't need to provide an edf0 vector; otherwise, the user has to either set get.edf0 to TRUE so that the shape routine simulates an edf0 vector, or simulate an edf0 vector with the getedf0 routine and pass it to the shape routine through this argument. The default is edf0 = NULL.
get.edf0: A logical flag. When \(\bold{x}\) is not an equally spaced vector whose number of elements is between \(20\) and \(40\), the user has to either set get.edf0 to TRUE so that the shape routine simulates an edf0 vector, or simulate an edf0 vector with the getedf0 routine and pass it to the shape routine through the edf0 argument. The default is get.edf0 = FALSE.
random: A parameter used by the maintainer to test if each shape option can be both included and excluded.
msg: A logical flag. If msg is TRUE, a warning message is printed when there is a non-convergence problem; otherwise, no warning message is printed. The default is msg = FALSE.
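
The condition described for edf0 and get.edf0 can be checked before calling the routine. The helper below, needs_edf0, is hypothetical (it is not part of the package) and encodes one reading of that condition: x is equally spaced and its length is between 20 and 40, with both bounds assumed to be inclusive.

```r
# Hypothetical helper (not part of the package): returns TRUE when x falls
# outside the case "equally spaced with 20 to 40 elements", i.e. when an
# edf0 vector has to be simulated or supplied.
# Assumption: the bounds 20 and 40 are treated as inclusive.
needs_edf0 <- function(x, tol = 1e-8) {
  n <- length(x)
  gaps <- diff(x)
  equally_spaced <- n >= 2 && all(abs(gaps - gaps[1]) < tol)
  in_range <- n >= 20 && n <= 40
  !(equally_spaced && in_range)
}

print(needs_edf0(1985:2010))           # 26 equally spaced years
print(needs_edf0(c(1985:2009, 2015)))  # 26 values, unequal spacing
print(needs_edf0(1:50))                # equally spaced, but 50 values
```

When such a check returns TRUE, the argument descriptions above give two options: call shape with get.edf0 = TRUE, or simulate an edf0 vector with getedf0 and pass it through the edf0 argument.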

Author

Mary C. Meyer and Xiyue Liao

Details

Given a scatterplot of \((x_i, y_i)\), \(i=1,\ldots,n\), where \(\bold{x}\) could be a vector of years and \(\bold{y}\) could be a vector of Landsat signals, constrained least-squares spline fits are obtained for the following shapes:

1. flat
2. decreasing
3. one-jump, i.e., decreasing, jump up, decreasing
4. inverted vee (increasing then decreasing)
5. vee (decreasing then increasing)
6. linear increasing
7. double-jump, i.e., decreasing, jump up, decreasing, jump up, decreasing.

The shape routine chooses one of the shapes allowed by the user by minimizing the Bayes information criterion (BIC) or the cone information criterion (CIC). It also returns the information criterion (IC) values for the shapes allowed by the user. The fits are constrained quadratic B-splines, and the number of knots depends on the number of observations. The cone projection algorithm used in this routine is implemented by the R package coneproj.
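
To make the selection step concrete, here is a toy analogue in base R. It does not use the constrained quadratic B-spline fits or the coneproj projection described above; it swaps in ordinary unconstrained lm fits for just two candidates (a flat line and a straight line) and compares them with R's built-in BIC function, whose formula may differ from the criterion computed inside the routine. The simulated data and all object names are made up for illustration.

```r
set.seed(1)

# Simulated scatterplot: a clearly decreasing signal over 26 years.
x <- 1985:2010
y <- 10 - 0.5 * (x - 1985) + rnorm(length(x), sd = 0.5)

# Two unconstrained candidate fits (stand-ins for the constrained splines).
fits <- list(
  flat   = lm(y ~ 1),
  linear = lm(y ~ x)
)

# Compute BIC for each candidate and keep the one with the smallest value.
bic <- sapply(fits, BIC)
best <- names(which.min(bic))
print(bic)
print(best)
```

The same pattern, one criterion value per candidate and the minimum wins, is what the ic and shape components of the returned object summarize for each scatterplot.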

See references cited in this section and the official manual (https://cran.r-project.org/package=coneproj) for the R package coneproj for more details.

References

Meyer, M. C. (2013a) Semi-parametric additive constrained regression. Journal of Nonparametric Statistics 25(3), 715.

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126--1139.

Liao, X. and M. C. Meyer (2014) coneproj: An R package for the primal or dual cone projections with routines for constrained regression. Journal of Statistical Software 61(12), 1--22.

Examples

	# load the package that provides shape(), plotshape(), and ymat
	# (assumed here to be ShapeSelectForest)
	library(ShapeSelectForest)

	# import the matrix of Landsat signals
	data("ymat")

	# define the predictor vector: the years 1985 to 2010
	# (26 equally spaced values, so no edf0 vector is needed)
	x <- 1985:2010

	if (FALSE) {
		# Example 1:
		# call the shape routine allowing a double-jump shape using "BIC"
		ans <- shape(x, ymat, infocrit = "BIC")
		plotshape(ans, ids = 1:6, both = TRUE, form = TRUE)
	}

	if (FALSE) {
		# Example 2:
		# call the shape routine not allowing a double-jump shape using "CIC"
		ans <- shape(x, ymat, infocrit = "CIC", db = FALSE)
		plotshape(ans, ids = 1:6, both = TRUE, form = TRUE)
	}
