enspls.ad: Ensemble Sparse Partial Least Squares for Model Applicability Domain Evaluation

Description

Model applicability domain evaluation with ensemble sparse partial least squares.

Usage

enspls.ad(x, y, xtest, ytest, maxcomp = 5L, cvfolds = 5L,
  alpha = seq(0.2, 0.8, 0.2), space = c("sample", "variable"),
  method = c("mc", "boot"), reptimes = 500L, ratio = 0.8,
  parallel = 1L)

Value

A list containing:

tr.error.mean - absolute mean prediction error for the training set
tr.error.median - absolute median prediction error for the training set
tr.error.sd - standard deviation of the prediction error for the training set
tr.error.matrix - raw prediction error matrix for the training set
te.error.mean - list of absolute mean prediction errors, one component per test set
te.error.median - list of absolute median prediction errors, one component per test set
te.error.sd - list of prediction error standard deviations, one component per test set
te.error.matrix - list of raw prediction error matrices, one component per test set
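
To make the relationship between the raw error matrix and the three summaries concrete, here is a minimal base R sketch. It is not the package's implementation. It assumes the matrix has one row per sample and one column per model, that "absolute mean" means the absolute value of the per-sample mean, and that NA marks a model in which a sample was not predicted. The object names (err, err_mean, err_median, err_sd) are hypothetical.

```r
# Hypothetical error matrix: 4 samples (rows) x 3 models (columns).
# NA is assumed to mark a model that produced no prediction for that sample.
err <- matrix(
  c( 0.5, -0.1,  0.3,
    -0.2,  0.4,   NA,
     1.0,  0.8,  1.2,
    -0.6, -0.4, -0.5),
  nrow = 4, byrow = TRUE
)

# Per-sample summaries across models, ignoring missing predictions.
err_mean   <- abs(apply(err, 1, mean,   na.rm = TRUE))  # absolute mean error
err_median <- abs(apply(err, 1, median, na.rm = TRUE))  # absolute median error
err_sd     <- apply(err, 1, sd, na.rm = TRUE)           # spread of the errors

print(data.frame(err_mean, err_median, err_sd))
```

Under these assumptions, a sample whose errors are large on average or vary widely across the ensemble is a candidate for lying outside the applicability domain.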

Arguments

x: Predictor matrix of the training set.
y: Response vector of the training set.
xtest: List, with the i-th component being the i-th test set's predictor matrix (see example code below).
ytest: List, with the i-th component being the i-th test set's response vector (see example code below).
maxcomp: Maximum number of components included within each model. Default is 5.
cvfolds: Number of cross-validation folds used in each model for automatic parameter selection. Default is 5.
alpha: Parameter (grid) controlling sparsity of the model. Default is seq(0.2, 0.8, 0.2).
space: Space in which to apply the resampling method. Can be the sample space ("sample") or the variable space ("variable").
method: Resampling method. "mc" (Monte-Carlo resampling) or "boot" (bootstrapping). Default is "mc".
reptimes: Number of models to build with Monte-Carlo resampling or bootstrapping. Default is 500.
ratio: Sampling ratio used when method = "mc". Default is 0.8.
parallel: Integer. Number of CPU cores to use. Default is 1 (not parallelized).
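
The difference between the two resampling methods in the sample space can be sketched in base R as follows. This is an illustrative assumption about how such index sets are typically drawn, not the package's internal code: Monte-Carlo resampling is taken to mean drawing a fraction ratio of the samples without replacement, and bootstrapping to mean drawing as many samples as the training set holds, with replacement. The helper resample_idx is hypothetical.

```r
# Hypothetical helper (not part of the package): draw `reptimes` index sets
# from n training samples using either Monte-Carlo or bootstrap resampling.
resample_idx <- function(n, method = c("mc", "boot"),
                         reptimes = 500L, ratio = 0.8) {
  method <- match.arg(method)
  lapply(seq_len(reptimes), function(i) {
    if (method == "mc") {
      # Monte-Carlo: a subset of size floor(n * ratio), no repeats
      sample(n, size = floor(n * ratio), replace = FALSE)
    } else {
      # Bootstrap: n draws with replacement, so repeats are expected
      sample(n, size = n, replace = TRUE)
    }
  })
}

set.seed(1)
idx_mc   <- resample_idx(300, method = "mc",   reptimes = 10, ratio = 0.8)
idx_boot <- resample_idx(300, method = "boot", reptimes = 10)

lengths(idx_mc)    # each Monte-Carlo subset holds 240 of the 300 samples
lengths(idx_boot)  # each bootstrap sample holds 300 draws
```

In this sketch ratio only affects the "mc" branch, which matches the argument description above.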

Author

Nan Xiao <https://nanx.me>

Examples

library("enpls")

data("logd1k")
# remove low variance variables
x <- logd1k$x[, -c(17, 52, 59)]
y <- logd1k$y

# training set
x.tr <- x[1:300, ]
y.tr <- y[1:300]

# two test sets
x.te <- list(
  "test.1" = x[301:400, ],
  "test.2" = x[401:500, ]
)
y.te <- list(
  "test.1" = y[301:400],
  "test.2" = y[401:500]
)

set.seed(42)
ad <- enspls.ad(
  x.tr, y.tr, x.te, y.te,
  maxcomp = 3, alpha = c(0.3, 0.6, 0.9),
  space = "variable", method = "mc",
  ratio = 0.8, reptimes = 10
)
print(ad)
plot(ad)
# the interactive plot requires an HTML viewer
if (FALSE) {
  plot(ad, type = "interactive")
}
