prm_cv: Cross-validation for robust PLS

Description

Cross-validation (CV) is carried out for robust PLS based on partial robust M-regression. Optionally, a plot showing the choice of the optimal number of components is generated. The function only works for univariate y-data.

Usage

prm_cv(X, y, a, fairct = 4, opt = "median", subset = NULL, segments = 10, 
segment.type = "random", trim = 0.2, sdfact = 2, plot.opt = TRUE)

Value

predicted: matrix with length(y) rows and a columns with predicted values
SEPall: vector of length a with SEP values for each number of components
SEPtrim: vector of length a with trimmed SEP values for each number of components
SEPj: matrix with segments rows and a columns with SEP values within the CV for each number of components
SEPtrimj: matrix with segments rows and a columns with trimmed SEP values within the CV for each number of components
optcomp: final optimal number of PLS components
SEPopt: trimmed SEP value for final optimal number of PLS components

Arguments

X: predictor matrix
y: response variable
a: number of PLS components
fairct: tuning constant, by default fairct=4
opt: if "l1m" the centering is done by the l1-median, otherwise (default "median") by the coordinate-wise median
subset: optional vector defining a subset of objects
segments: the number of segments to use or a list with segments (see mvrCv)
segment.type: the type of segments to use. Ignored if 'segments' is a list
trim: trimming proportion for the computation of the SEP (the default 0.2 corresponds to 20%)
sdfact: factor for the multiplication of the standard deviation for the determination of the optimal number of components, see mvr_dcv
plot.opt: if TRUE a plot will be generated that shows the selection of the optimal number of components for each step of the CV, see mvr_dcv
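
As an illustration of what segment.type = "random" means, the following base R sketch splits n objects into k random segments of nearly equal size. The helper random_segments is hypothetical and is not the routine that prm_cv or mvrCv uses internally; it only shows the idea of a random partition.

```r
# Hypothetical helper: randomly partition the indices 1..n into k segments.
# This is a sketch of the idea, not the package's own segment generator.
random_segments <- function(n, k) {
  idx <- sample(n)                            # random permutation of 1..n
  groups <- rep(seq_len(k), length.out = n)   # segment label for each position
  unname(split(idx, groups))                  # list of k index vectors
}

set.seed(1)
segs <- random_segments(n = 15, k = 4)
lengths(segs)   # segment sizes differ by at most one
```

Each object appears in exactly one segment, so every observation is predicted once per CV run.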

Author

Peter Filzmoser <P.Filzmoser@tuwien.ac.at>

Details

Robust PLS based on partial robust M-regression is implemented in the function prm. The optimal number of robust PLS components is chosen by the following criterion: within the CV scheme, the mean of the trimmed SEP values (SEPtrimave) and its standard error (SEPtrimse) are computed for each number of components. The minimum of the SEPtrimave values is located, and sdfact*SEPtrimse is added to it to form an upper bound. The optimal number of components is that of the most parsimonious model whose SEPtrimave lies below this bound.
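
The selection rule can be sketched in base R as follows. This is a minimal illustration, not the package source: the helper select_ncomp and the example matrix are hypothetical, and two details are assumptions, namely that the standard error is the standard deviation over segments divided by the square root of the number of segments, and that the standard error at the minimum is the one added to form the bound.

```r
# Hypothetical helper illustrating the selection rule described above.
# SEPtrimj: matrix with one row per CV segment and one column per number
# of components, holding trimmed SEP values (as in the SEPtrimj output).
select_ncomp <- function(SEPtrimj, sdfact = 2) {
  SEPtrimave <- colMeans(SEPtrimj)
  # Assumption: standard error = sd over segments / sqrt(number of segments)
  SEPtrimse <- apply(SEPtrimj, 2, sd) / sqrt(nrow(SEPtrimj))
  best <- which.min(SEPtrimave)
  # Assumption: the standard error at the minimum defines the bound
  bound <- SEPtrimave[best] + sdfact * SEPtrimse[best]
  # Most parsimonious model: smallest number of components below the bound
  min(which(SEPtrimave < bound))
}

# Made-up trimmed SEP values for 4 segments and 5 components
SEPtrimj <- rbind(
  c(5.0, 3.0, 2.6, 2.5, 2.7),
  c(5.2, 3.2, 2.8, 2.4, 2.6),
  c(4.8, 2.8, 2.4, 2.6, 2.8),
  c(5.0, 3.0, 2.6, 2.5, 2.7)
)
select_ncomp(SEPtrimj, sdfact = 2)
select_ncomp(SEPtrimj, sdfact = 3)
```

A larger sdfact widens the bound, so the rule tends to favor models with fewer components.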

References

K. Varmuza and P. Filzmoser: Introduction to Multivariate Statistical Analysis in Chemometrics. CRC Press, Boca Raton, FL, 2009.

Examples

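library(chemometrics)
data(cereal)
set.seed(123)  # the CV segments are random, so fix the seed for reproducibility
# robust PLS with up to 5 components and 4 CV segments, first response column
res <- prm_cv(cereal$X, cereal$Y[,1], a = 5, segments = 4, plot.opt = TRUE)
res$optcomp    # final optimal number of PLS components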