valid(X, Y, ncomp = 3,
mode = c("regression", "invariant", "classic"),
max.iter = 500, tol = 1e-06, criterion = c("rmsep", "q2"),
method = c("pls", "spls"),
keepX = if(method == "pls") NULL else c(rep(ncol(X), ncomp)),
keepY = if(method == "pls") NULL else c(rep(ncol(Y), ncomp)),
scaleY = TRUE,
validation = c("loo", "Mfold"),
M = if(validation == 'Mfold') 10 else nrow(X))

NAs are allowed.
NAs are allowed.
X.
"classic", "invariant" or "regression".
pls or spls methods.
If method="spls", a numeric vector of length ncomp giving the number of variable weights to keep in the $X$-loadings. By default all variables are kept in the model.
If method="spls", a numeric vector of length ncomp giving the number of variable weights to keep in the $Y$-loadings. By default all variables are kept in the model.
FALSE.

valid produces a list with the following components:
If criterion="rmsep", the Root Mean Square Error of Prediction for each Y variable.
If criterion="q2", a matrix of RSS values of the $Y$-variables for models with $1, \ldots, \texttt{ncomp}$ components.
If criterion="q2", the prediction error sum of squares of the $Y$-variables: a matrix of PRESS values for models with $1, \ldots, \texttt{ncomp}$ components.
If criterion="q2", a vector of $Q^2$ values for the extracted components.

nipals function. Otherwise, missing values are handled by casewise deletion in the pls or spls function.
If validation = "Mfold", M-fold cross-validation is performed; the number of folds is set by M.
If validation = "loo", leave-one-out cross-validation is performed.
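
The two validation schemes differ only in how the samples are split into folds. The sketch below illustrates this in base R; `make_folds` is a hypothetical helper written for this illustration and is not part of the package, whose internal splitting may differ.

```r
# Hypothetical helper (not from the package): assign n samples to M folds
make_folds <- function(n, M) {
  # Recycle the fold labels 1..M over the n samples, then shuffle them
  sample(rep(seq_len(M), length.out = n))
}

set.seed(1)
n <- 20                                # e.g. a data set with 20 individuals

folds_mfold <- make_folds(n, M = 10)   # M-fold: 10 folds of 2 samples each
folds_loo   <- make_folds(n, M = n)    # leave-one-out: M = n, one sample per fold

table(folds_mfold)
```

Setting M equal to the number of rows reproduces leave-one-out, which matches the default `M = nrow(X)` used when validation is not "Mfold".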
The criterion "rmsep" assesses the predictive validity of the model by leave-one-out or M-fold cross-validation: it returns the estimated prediction error of the PLS or sPLS model. The criterion "q2" helps choose the number of (s)PLS dimensions. Note that only the classic, regression and invariant modes can be applied.
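
As a rough illustration of how a cross-validated RMSEP is obtained, the sketch below runs 10-fold cross-validation on simulated data. It makes two assumptions that are not taken from the package: ordinary least squares (via `qr.solve`) stands in for the PLS/sPLS fit, and the RMSEP of each Y variable is taken as the square root of the mean squared out-of-fold prediction error.

```r
set.seed(42)
n <- 20
X <- matrix(rnorm(n * 3), nrow = n)                    # simulated predictors
Y <- cbind(y1 = X[, 1] + rnorm(n, sd = 0.5),           # simulated responses
           y2 = X[, 2] - X[, 3] + rnorm(n, sd = 0.5))

M <- 10
folds <- sample(rep(seq_len(M), length.out = n))       # random fold labels
pred <- matrix(NA_real_, nrow = n, ncol = ncol(Y), dimnames = dimnames(Y))

for (m in seq_len(M)) {
  test <- folds == m
  # Stand-in model (assumption): least squares with intercept, not (s)PLS
  A_train <- cbind(1, X[!test, , drop = FALSE])
  A_test  <- cbind(1, X[test, , drop = FALSE])
  coefs <- qr.solve(A_train, Y[!test, , drop = FALSE])
  pred[test, ] <- A_test %*% coefs                     # out-of-fold predictions
}

# One RMSEP value per Y variable
rmsep <- sqrt(colMeans((Y - pred)^2))
rmsep
```

Every sample is predicted exactly once by a model that never saw it, which is what makes the resulting error an estimate of predictive rather than fitted accuracy.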
What follows is the definition of these criteria:
Let $n$ be the number of individuals (experimental units).
The fraction of the variation of a variable $y_{k}$ that can be predicted
by a component, as estimated by cross-validation, is computed as:
$$Q_{kh}^2 = 1-\frac{PRESS_{kh}}{RSS_{k(h-1)}}$$
where
$$PRESS_{kh} = \sum_{i=1}^{n}(y_{ik} - \hat{y}_{(-i)k}^h)^2$$
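is the PRediction Error Sum of Squares, with $\hat{y}_{(-i)k}^h$ the prediction for individual $i$ from a model with $h$ components fitted without that individual, and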
$$RSS_{kh} = \sum_{i=1}^{n}(y_{ik} - \hat{y}_{ik}^h)^2$$
is the Residual Sum of Squares for the variable $k$, ($k=1, \ldots ,q$)
and the PLS variate $h$, ($h=1, \ldots ,H$).
For $h=0$, $RSS_{k0} = n-1$, which is the total sum of squares of a centred $Y$-variable scaled to unit variance.
The fraction of the total variation of $Y$ that can be predicted by a component,
as estimated by cross-validation, is computed as:
$$Q_h^2 = 1-\frac{\sum_{k=1}^{q}PRESS_{kh}}{\sum_{k=1}^{q}RSS_{k(h-1)}}$$
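
The arithmetic of the two formulas above can be checked on a small example. The PRESS and RSS values below are made-up toy numbers for $q = 2$ variables and $H = 2$ components, chosen only to show the indexing; they do not come from any real fit.

```r
n <- 20

# Toy values (assumed for illustration): rows are components, columns are Y variables
RSS <- rbind(h0 = c(n - 1, n - 1),   # RSS for h = 0 is n - 1
             h1 = c(10, 13),
             h2 = c(8, 7))
PRESS <- rbind(h1 = c(12, 15),
               h2 = c(11, 9))

H <- nrow(PRESS)
RSS_prev <- RSS[seq_len(H), , drop = FALSE]      # RSS_{k(h-1)} for h = 1..H

# Per-variable Q2: one value per component h and variable k
Q2_kh <- 1 - PRESS / RSS_prev

# Overall Q2: sum PRESS and RSS over the variables before taking the ratio
Q2_h <- 1 - rowSums(PRESS) / rowSums(RSS_prev)

Q2_kh
Q2_h
```

Note that PRESS for component $h$ is divided by RSS for component $h-1$, so a component whose cross-validated error exceeds the previous fitted residual gets a negative $Q^2$.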
The cumulative $(Q_{cum}^2)_{kh}$ of a variable is computed as:
$$(Q_{cum}^2)_{kh} = 1-\prod_{j=1}^h\frac{PRESS_{kj}}{RSS_{k(j-1)}}$$
and the cumulative $(Q_{cum}^2)_h$ for the extracted components is computed as:
$$(Q_{cum}^2)_h = 1-\prod_{j=1}^h\frac{\sum_{k=1}^{q}PRESS_{kj}}{\sum_{k=1}^{q}RSS_{k(j-1)}}$$
predict.
data(linnerud)
X <- linnerud$exercise
Y <- linnerud$physiological
## computing the RMSEP with 10-fold CV with pls
error <- valid(X, Y, mode = "regression", ncomp = 3, method = "pls",
validation = "Mfold", criterion = "rmsep")
error$rmsep