Description

This function differs from the crisp function in that the tuning parameter, lambda, is automatically selected using K-fold cross-validation. More details are provided in Petersen, A., Simon, N., and Witten, D. (2016). Convex Regression with Interpretable Sharp Partitions. Journal of Machine Learning Research, 17(94): 1-31.

Usage

crispCV(y, X, q = NULL, lambda.min.ratio = 0.01, n.lambda = 50,
  lambda.seq = NULL, fold = NULL, n.fold = NULL, seed = NULL,
  within1SE = FALSE, rho = 0.1, e_abs = 10^-4, e_rel = 10^-3,
  varyrho = TRUE, double.run = FALSE)
Arguments

q: the desired granularity of the fit, which is returned as the mean matrix M.hat, a q by q matrix. M.hat is a mean matrix whose element M.hat[i,j] contains the mean for pairs of covariate values within a quantile range of the observed predictors X[,1] and X[,2]. For example, M.hat[1,2] represents the mean of the observations with the first covariate value less than the 1/q-quantile of X[,1], and the second covariate value between the 1/q- and 2/q-quantiles of X[,2]. If left NULL, then q = n is used when n < 100, and q = 100 is used when n >= 100. We recommend using q <= 100, as higher values take longer to fit and provide an unneeded amount of granularity.
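The quantile binning that defines the blocks of M.hat can be illustrated directly in R. The helper below is not part of the crisp package; it is only a sketch of how each covariate value of a small simulated X maps to one of the q quantile bins described above:

```r
# Illustration only: assign each value of a covariate to one of q
# quantile-based bins, mirroring how CRISP partitions X[,1] and X[,2].
# This helper is NOT part of the crisp package.
bin.index <- function(x, q) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = q + 1))
  # bin 1 holds values below the 1/q-quantile, bin 2 holds values
  # between the 1/q- and 2/q-quantiles, and so on
  cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}

set.seed(1)
X <- matrix(rnorm(30), ncol = 2)  # 15 observations, 2 covariates
q <- 3
cbind(row.bin = bin.index(X[, 1], q), col.bin = bin.index(X[, 2], q))
# An observation with row.bin = 1 and col.bin = 2 would be fit by M.hat[1,2].
```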
lambda.min.ratio: the smallest value for lambda.seq, expressed as a fraction of the maximum lambda value, which is the data-derived smallest value for which the fit is a constant value. The default is 0.01.

n.lambda: the number of lambda values to consider when lambda.seq is calculated automatically. The default is 50.

lambda.seq: a user-supplied sequence of lambda values to consider. Typical usage is to calculate lambda.seq using lambda.min.ratio and n.lambda, but providing lambda.seq overrides this. If provided, lambda.seq should be a decreasing sequence of values, since CRISP relies on warm starts for speed. Thus fitting the model for a whole sequence of lambda values is often faster than fitting for a single lambda value.

fold: user-supplied fold memberships for cross-validation; fold should be an n-vector with entries in 1, ..., K when doing K-fold cross-validation. The default is to choose fold using n.fold.

n.fold: the number of folds, K, to use for K-fold cross-validation. Providing fold overrides use of n.fold.

seed: an optional number passed to set.seed() at the beginning of the function. This is only relevant if fold is not specified by the user.

within1SE: logical indicating how lambda should be chosen. If within1SE=TRUE, lambda is chosen to be the value corresponding to the sparsest model with cross-validation error within one standard error of the minimum cross-validation error. If within1SE=FALSE, lambda is chosen to be the value corresponding to the minimum cross-validation error.

rho: the penalty parameter for the ADMM algorithm. The default is 0.1.

varyrho: should rho be varied from iteration to iteration? This is discussed in Appendix C.3 of the CRISP paper.

double.run: the algorithm yields an estimate of M.hat in which rows and columns that should be fused are close but not exactly equal. If double.run is TRUE, then the algorithm is run a second time to obtain M.hat with exact equality of the appropriate rows and columns. This issue is discussed further in Appendix C.4 of the CRISP paper.

Value

An object of class crispCV, which can be summarized using summary, plotted using plot, and used to predict outcome values for new covariates using predict. The object contains the following elements:
lambda.cv: Optimal lambda value chosen by K-fold cross-validation.

index.cv: The index of the model corresponding to the chosen tuning parameter, lambda.cv. That is, lambda.cv = crisp.out$lambda.seq[index.cv].

crisp.out: An object of class crisp returned by crisp.

mean.cv.error: An m-vector containing the mean cross-validation error, where m is the length of lambda.seq. Note that mean.cv.error[i] contains the cross-validation error for the tuning parameter crisp.out$lambda.seq[i].

se.cv.error: An m-vector containing the cross-validation standard error, where m is the length of lambda.seq. Note that se.cv.error[i] contains the standard error of the cross-validation error for the tuning parameter crisp.out$lambda.seq[i].
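The relationship among these elements can be checked by hand. The sketch below (not run; cv.out is a hypothetical object returned by crispCV) shows how the two selection rules described under within1SE correspond to indices into mean.cv.error; this is an illustration of the rule as documented, not the package's internal code:

```r
## Not run:
# within1SE = FALSE: index of the minimum cross-validation error
i.min <- which.min(cv.out$mean.cv.error)

# within1SE = TRUE: the sparsest model within one standard error of the
# minimum; since lambda.seq is decreasing, larger lambda (sparser fit)
# corresponds to a smaller index
threshold <- cv.out$mean.cv.error[i.min] + cv.out$se.cv.error[i.min]
i.1se <- min(which(cv.out$mean.cv.error <= threshold))

# in either case, the chosen value satisfies
# cv.out$lambda.cv == cv.out$crisp.out$lambda.seq[cv.out$index.cv]
## End(Not run)
```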
See Also

crisp, plot, summary, predict, plot.cvError
Examples

## Not run:
# see ?'crisp-package' for a full example of how to use this package

# generate data (using a very small 'n' for illustration purposes)
set.seed(1)
data <- sim.data(n = 15, scenario = 2)

# fit model and select lambda using 2-fold cross-validation
# note: use larger 'n.fold' (e.g., 10) in practice
crispCV.out <- crispCV(X = data$X, y = data$y, n.fold = 2)
## End(Not run)
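Continuing the example above, the returned object can be inspected with the generic functions mentioned under Value. This is a sketch (not run); in particular, the argument name new.X for passing new covariates to predict is an assumption here, so check the package's predict method for the exact signature:

```r
## Not run:
# summarize the cross-validation results and the selected model
summary(crispCV.out)

# plot the fit at the chosen lambda
plot(crispCV.out)

# predict outcome values for new covariate pairs
# (argument name 'new.X' assumed; see the predict method's help page)
X.new <- matrix(rnorm(10), ncol = 2)
predict(crispCV.out, new.X = X.new)
## End(Not run)
```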