mixOmics (version 2.8-1)

valid: Compute validation criterion for PLS and Sparse PLS

Description

Function to estimate the root mean squared error of prediction (RMSEP) and the Q2 criterion for PLS (classic, regression and invariant modes) and sPLS (regression). Both M-fold and leave-one-out cross-validation are implemented.

Usage

valid(X, Y, ncomp = 3,
      mode = c("regression", "invariant", "classic"),
      max.iter = 500, tol = 1e-06, criterion = c("rmsep", "q2"),
      method = c("pls", "spls"),
      keepX = if(method == "pls") NULL else c(rep(ncol(X), ncomp)),
      keepY = if(method == "pls") NULL else c(rep(ncol(Y), ncomp)),
      scaleY = TRUE,
      validation = c("loo", "Mfold"),
      M = if(validation == 'Mfold') 10 else nrow(X))

Arguments

X
numeric matrix of predictors. NAs are allowed.
Y
numeric vector or matrix of responses (for multi-response models). NAs are allowed.
ncomp
the number of components to include in the model. Default is 3 (see Usage).
mode
character string. What type of algorithm to use, matching one of "classic", "invariant" or "regression".
max.iter
integer, the maximum number of iterations.
tol
a non-negative real, the tolerance used in the iterative algorithm.
criterion
character string. What type of validation criterion to use, see details.
method
character. pls or spls methods.
keepX
if method="spls", a numeric vector of length ncomp giving the number of variable weights to keep in the $X$-loadings. By default all variables are kept in the model.
keepY
if method="spls", a numeric vector of length ncomp giving the number of variable weights to keep in the $Y$-loadings. By default all variables are kept in the model.
scaleY
should the Y data be scaled? For a 'discriminant' version of (s)PLS, where the Y data are of discrete type, this should be set to FALSE.
validation
character. What kind of (internal) validation to use. See below.
M
the number of folds in the Mfold cross-validation.
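
The fold assignment behind validation = "Mfold" can be pictured with a few lines of base R. This is only a minimal sketch: make_folds is a hypothetical helper written for illustration, and the way valid assigns samples to folds internally may differ.

```r
# Hypothetical helper (not part of mixOmics): assign n samples to M folds.
make_folds <- function(n, M) {
  stopifnot(M >= 2, M <= n)
  # Repeat the labels 1..M until there are n of them, then shuffle them
  sample(rep(seq_len(M), length.out = n))
}

set.seed(1)
fold <- make_folds(n = 20, M = 10)
table(fold)   # every one of the 10 folds holds 2 samples

# Leave-one-out is the special case M = n: each fold holds a single sample
loo <- make_folds(n = 20, M = 20)
```

With M = nrow(X), as in the default for validation = "loo", every sample is predicted by a model fitted on all the others.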

Value

valid produces a list with the following components:

  • Y.hat: the predicted values using cross-validation
  • fold: indicates which fold each sample belongs to when using k-fold cross-validation
  • rmsep: if criterion="rmsep", the root mean squared error of prediction for each Y variable
  • RSS: if criterion="q2", a matrix of RSS values of the $Y$-variables for models with $1, \ldots, ncomp$ components
  • PRESS: if criterion="q2", the prediction error sum of squares of the $Y$-variables, as a matrix of PRESS values for models with $1, \ldots, ncomp$ components
  • q2: if criterion="q2", a vector of $Q^2$ values for the extracted components
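
To show how components such as Y.hat, fold and rmsep relate to each other, here is a minimal base R sketch on simulated data. It makes two assumptions: an ordinary least-squares fit (lm) stands in for the PLS model, and RMSEP is taken as the square root of the mean squared cross-validated prediction error. It is not the code used by valid.

```r
set.seed(42)
n <- 30
X <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
# Two simulated responses that depend linearly on X, plus noise
B <- matrix(c(1, 0.5, 0, -1, 0, 2), nrow = 3)
Y <- X %*% B + matrix(rnorm(n * 2, sd = 0.3), n, 2)
colnames(Y) <- c("y1", "y2")

M <- 5
fold <- sample(rep(seq_len(M), length.out = n))

# Cross-validated predictions: each row is predicted by a model
# that never saw that row during fitting
Y.hat <- matrix(NA_real_, n, ncol(Y), dimnames = dimnames(Y))
for (m in seq_len(M)) {
  test <- fold == m
  # Stand-in model: multi-response least squares, NOT a PLS fit
  fit <- lm(Y[!test, ] ~ X[!test, ])
  Y.hat[test, ] <- cbind(1, X[test, , drop = FALSE]) %*% coef(fit)
}

# One RMSEP value per Y variable
rmsep <- sqrt(colMeans((Y - Y.hat)^2))
rmsep
```

The loop structure is the point of the sketch: replacing the lm call with a PLS fit and its prediction step would give the same kind of Y.hat matrix.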

Details

Missing values can be estimated by reconstituting the data matrix with the nipals function. Otherwise, missing values are handled by casewise deletion in the pls or spls function.

If validation = "Mfold", M-fold cross-validation is performed; the number of folds is set by M. If validation = "loo", leave-one-out cross-validation is performed.

The validation criterion "rmsep" assesses the predictive validity of the model (using leave-one-out or M-fold cross-validation). It gives the estimated prediction error of the PLS or sPLS model. The criterion "q2" helps in choosing the number of (s)PLS dimensions. Note that only the classic, regression and invariant modes can be applied.

These criteria are defined as follows. Let $n$ be the number of individuals (experimental units). The fraction of the variation of a variable $y_{k}$ that can be predicted by a component, as estimated by cross-validation, is computed as:

$$Q_{kh}^2 = 1-\frac{PRESS_{kh}}{RSS_{k(h-1)}}$$

where

$$PRESS_{kh} = \sum_{i=1}^{n}(y_{ik} - \hat{y}_{(-i)k}^h)^2$$

is the PRediction Error Sum of Squares and

$$RSS_{kh} = \sum_{i=1}^{n}(y_{ik} - \hat{y}_{ik}^h)^2$$

is the Residual Sum of Squares for the variable $k$ ($k=1, \ldots ,q$) and the PLS variate $h$ ($h=1, \ldots ,H$). For $h=0$, $RSS_{kh} = n-1$.

The fraction of the total variation of $Y$ that can be predicted by a component, as estimated by cross-validation, is computed as:

$$Q_h^2 = 1-\frac{\sum_{k=1}^{q}PRESS_{kh}}{\sum_{k=1}^{q}RSS_{k(h-1)}}$$

The cumulative $(Q_{cum}^2)_{kh}$ of a variable is computed as:

$$(Q_{cum}^2)_{kh} = 1-\prod_{j=1}^h\frac{PRESS_{kj}}{RSS_{k(j-1)}}$$

and the cumulative $(Q_{cum}^2)_h$ for the extracted components is computed as:

$$(Q_{cum}^2)_h = 1-\prod_{j=1}^h\frac{\sum_{k=1}^{q}PRESS_{kj}}{\sum_{k=1}^{q}RSS_{k(j-1)}}$$
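
The Q2 formulas above can be checked numerically with a short base R sketch. The helper q2_from_press is hypothetical (it is not a mixOmics function), the matrix layout is an assumption chosen for the illustration, and the PRESS and RSS numbers are made up rather than taken from any fitted model.

```r
# Hypothetical helper (not a mixOmics function): Q2 criteria from PRESS and RSS.
# PRESS: H x q matrix, row h holds PRESS_{kh} for component h.
# RSS:   (H + 1) x q matrix, row 1 holds RSS_{k0}, row h + 1 holds RSS_{kh}.
q2_from_press <- function(PRESS, RSS) {
  H <- nrow(PRESS)
  stopifnot(nrow(RSS) == H + 1, ncol(RSS) == ncol(PRESS))
  RSS_prev  <- RSS[seq_len(H), , drop = FALSE]      # RSS_{k(h-1)}
  ratio_var <- PRESS / RSS_prev                     # per variable
  ratio_tot <- rowSums(PRESS) / rowSums(RSS_prev)   # summed over variables
  list(
    q2.var     = 1 - ratio_var,
    q2         = 1 - ratio_tot,
    # matrix() keeps the H x q shape even when H = 1
    q2.cum.var = 1 - matrix(apply(ratio_var, 2, cumprod), nrow = H),
    q2.cum     = 1 - cumprod(ratio_tot)
  )
}

# Made-up values for n = 11 samples, q = 2 responses, H = 2 components
n <- 11
RSS   <- rbind(c(n - 1, n - 1), c(4, 5), c(3, 4.5))
PRESS <- rbind(c(5, 6), c(3.6, 5.5))

res <- q2_from_press(PRESS, RSS)
res$q2       # first entry is 1 - 11/20 = 0.45
res$q2.cum
```

In this toy example the second component has a slightly negative $Q_h^2$, because its summed PRESS (9.1) exceeds the summed RSS of the previous component (9). That is the situation in which the criterion suggests an extra component does not improve prediction.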

References

Tenenhaus, M. (1998). La régression PLS: théorie et pratique. Paris: Editions Technip.

Lê Cao, K.-A., Rossouw, D., Robert-Granié, C. and Besse, P. (2008). A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology 7, article 35.

See Also

predict.

Examples

library(mixOmics)
data(linnerud)
X <- linnerud$exercise
Y <- linnerud$physiological

## computing the RMSEP with 10-fold CV with pls
error <- valid(X, Y, mode = "regression", ncomp = 3, method = "pls", 
               validation = "Mfold", criterion = "rmsep")
error$rmsep
