valid(X, Y, ncomp = 3,
      mode = c("regression", "invariant", "classic"),
      max.iter = 500, tol = 1e-06, criterion = c("rmsep", "q2"),
      method = c("pls", "spls"),
      keepX = if(method == "pls") NULL else c(rep(ncol(X), ncomp)),
      keepY = if(method == "pls") NULL else c(rep(ncol(Y), ncomp)),
      scaleY = TRUE,
      validation = c("loo", "Mfold"),
      M = if(validation == 'Mfold') 10 else nrow(X))
X, Y: NAs are allowed.
mode: one of "classic", "invariant" or "regression".
method: pls or spls methods.
keepX: if method = "spls", numeric vector of length ncomp, the number of variable weights to keep in the $X$-loadings. By default all variables are kept in the model.
keepY: if method = "spls", numeric vector of length ncomp, the number of variable weights to keep in the $Y$-loadings. By default all variables are kept in the model.
valid produces a list with the following components:
if criterion = "rmsep": Root Mean Square Error of Prediction for each Y variable.
if criterion = "q2": a matrix of RSS values of the $Y$-variables for models with $1, \ldots, \text{ncomp}$ components.
if criterion = "q2": prediction error sum of squares of the $Y$-variables. A matrix of PRESS values for models with $1, \ldots, \text{ncomp}$ components.
if criterion = "q2": vector of $Q^2$ values for the extracted components.
Missing values can be estimated using the nipals function. Otherwise, missing values are handled by casewise deletion in the pls or spls function.
If validation = "Mfold", M-fold cross-validation is performed. The number of folds is set by M.
If validation = "loo", leave-one-out cross-validation is performed.
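The two validation schemes differ only in how the samples are split into held-out groups. The sketch below illustrates this in base R; it is not the partitioning code used inside valid, and the names n, M, fold_id, folds and loo_folds are assumptions made for the illustration.

```r
# Illustrative only: one common way to build cross-validation folds.
# valid() may assign samples to folds differently.
set.seed(1)                      # make the random split reproducible
n <- 20                          # hypothetical number of samples
M <- 10                          # number of folds, as in validation = "Mfold"

# M-fold: shuffle fold labels 1..M so each fold gets about n / M samples
fold_id <- sample(rep(seq_len(M), length.out = n))
folds <- split(seq_len(n), fold_id)

# Leave-one-out is the special case M = n: every fold holds one sample
loo_folds <- split(seq_len(n), seq_len(n))

lengths(folds)                   # size of each held-out group
```

Each element of folds holds the row indices left out in one round; the model is fitted on the remaining rows and used to predict the held-out ones.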
The criterion "rmsep" assesses the predictive validity of the model (using leave-one-out or M-fold cross-validation). It gives the estimated prediction error of the PLS or sPLS model. The criterion "q2" helps in choosing the number of (s)PLS dimensions.
Note that only the classic, regression and invariant modes can be applied.
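As a rough illustration of what a cross-validated prediction error measures, the sketch below computes a leave-one-out RMSEP in base R. It assumes the usual definition, the square root of the mean squared prediction error over held-out samples, and it uses ordinary least squares (lm) on the built-in mtcars data as a stand-in model. It is not the PLS or sPLS fit performed by valid, and the variable names are chosen only for this example.

```r
# Illustrative only: leave-one-out RMSEP with lm() as a stand-in for pls/spls.
data(mtcars)
n <- nrow(mtcars)
pred <- numeric(n)

for (i in seq_len(n)) {
  # fit on all samples except i, then predict the held-out sample
  fit <- lm(mpg ~ wt + hp, data = mtcars[-i, ])
  pred[i] <- predict(fit, newdata = mtcars[i, , drop = FALSE])
}

press <- sum((mtcars$mpg - pred)^2)   # prediction error sum of squares
rmsep <- sqrt(press / n)              # assumed definition of RMSEP

# for comparison: residual sum of squares of the fit on all samples
rss <- sum(resid(lm(mpg ~ wt + hp, data = mtcars))^2)
```

For least squares the held-out error press is never smaller than the in-sample rss, which is why a cross-validated criterion gives a less optimistic picture than the fit residuals.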
What follows is the definition of these criteria:
Let $n$ be the number of individuals (experimental units).
The fraction of the variation of a variable $y_{k}$ that can be predicted
by a component, as estimated by cross-validation, is computed as:
$$Q_{kh}^2 = 1-\frac{PRESS_{kh}}{RSS_{k(h-1)}}$$
where
$$PRESS_{kh} = \sum_{i=1}^{n}(y_{ik} - \hat{y}_{(-i)k}^h)^2$$
is the PRediction Error Sum of Squares and
$$RSS_{kh} = \sum_{i=1}^{n}(y_{ik} - \hat{y}_{ik}^h)^2$$
is the Residual Sum of Squares for the variable $k$, ($k=1, \ldots ,q$)
and the PLS variate $h$, ($h=1, \ldots ,H$).
For $h=0$, $RSS_{k0} = n-1$.
The fraction of the total variation of $Y$ that can be predicted by a component,
as estimated by cross-validation, is computed as:
$$Q_h^2 = 1-\frac{\sum_{k=1}^{q}PRESS_{kh}}{\sum_{k=1}^{q}RSS_{k(h-1)}}$$
The cumulative $(Q_{cum}^2)_{kh}$ of a variable is computed as:
$$(Q_{cum}^2)_{kh} = 1-\prod_{j=1}^h\frac{PRESS_{kj}}{RSS_{k(j-1)}}$$
and the cumulative $(Q_{cum}^2)_h$ for the extracted components is computed as:
$$(Q_{cum}^2)_h = 1-\prod_{j=1}^h\frac{\sum_{k=1}^{q}PRESS_{kj}}{\sum_{k=1}^{q}RSS_{k(j-1)}}$$
See also predict.
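The $Q^2$ definitions above can be checked numerically. The sketch below applies them to small PRESS and RSS matrices whose values are made up for the illustration (they are not output of valid), with $q = 2$ response variables in rows and $H = 3$ components in columns. All object names are assumptions.

```r
# Illustrative only: hypothetical PRESS and RSS values, not real results.
n <- 20                                   # hypothetical number of individuals
press <- matrix(c(12, 9,  8,
                  15, 14, 13.5), nrow = 2, byrow = TRUE)
rss   <- matrix(c(10, 7,  6,
                  14, 12, 11),   nrow = 2, byrow = TRUE)

# RSS_{k(h-1)}: shift RSS one component to the right, with RSS_{k0} = n - 1
rss_prev <- cbind(n - 1, rss[, -ncol(rss), drop = FALSE])

ratio_kh <- press / rss_prev                       # per variable and component
ratio_h  <- colSums(press) / colSums(rss_prev)     # pooled over the q variables

q2_kh    <- 1 - ratio_kh                           # Q^2_{kh}
q2_h     <- 1 - ratio_h                            # Q^2_h
q2cum_kh <- 1 - t(apply(ratio_kh, 1, cumprod))     # (Q^2_cum)_{kh}
q2cum_h  <- 1 - cumprod(ratio_h)                   # (Q^2_cum)_h
```

A component whose $Q_h^2$ is close to zero or negative adds little predictive value, which is how the "q2" criterion helps in choosing the number of dimensions.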
data(linnerud)
X <- linnerud$exercise
Y <- linnerud$physiological
## computing the RMSEP with 10-fold CV with pls
error <- valid(X, Y, mode = "regression", ncomp = 3, method = "pls",
               validation = "Mfold", criterion = "rmsep")
error$rmsep