chemometrics (version 1.4.4)

prm_cv: Cross-validation for robust PLS

Description

Cross-validation (CV) is carried out with robust PLS based on partial robust M-regression. A plot with the choice for the optimal number of components is generated. This only works for univariate y-data.

Usage

prm_cv(X, y, a, fairct = 4, opt = "median", subset = NULL, segments = 10, 
segment.type = "random", trim = 0.2, sdfact = 2, plot.opt = TRUE)

Value

predicted

matrix with length(y) rows and a columns with predicted values

SEPall

vector of length a with SEP values for each number of components

SEPtrim

vector of length a with trimmed SEP values for each number of components

SEPj

matrix with segments rows and a columns with SEP values within the CV for each number of components

SEPtrimj

matrix with segments rows and a columns with trimmed SEP values within the CV for each number of components

optcomp

final optimal number of PLS components

SEPopt

trimmed SEP value for final optimal number of PLS components

Arguments

X

predictor matrix

y

response variable

a

number of PLS components

fairct

tuning constant, by default fairct=4

opt

if "l1m" the mean centering is done by the l1-median, otherwise by the coordinate-wise median

subset

optional vector defining a subset of objects

segments

the number of segments to use or a list with segments (see mvrCv)

segment.type

the type of segments to use. Ignored if 'segments' is a list

trim

trimming proportion for the computation of the SEP, given as a fraction (the default trim = 0.2 corresponds to 20%)

sdfact

factor for the multiplication of the standard deviation for the determination of the optimal number of components, see mvr_dcv

plot.opt

if TRUE a plot will be generated that shows the selection of the optimal number of components for each step of the CV, see mvr_dcv

Author

Peter Filzmoser <P.Filzmoser@tuwien.ac.at>

Details

The robust PLS method itself, based on partial robust M-regression, is implemented in prm. The optimal number of robust PLS components is chosen as follows: within the CV scheme, the mean of the trimmed SEPs (SEPtrimave) and its standard error (SEPtrimse) are computed for each number of components. The minimum of the SEPtrimave values is located, and sdfact*SEPtrimse is added to it to form an upper bound. The optimal number of components is that of the most parsimonious model, i.e. the one with the fewest components, whose SEPtrimave lies below this bound.
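
The selection rule described above can be sketched in base R. This is only an illustration, not the package's internal code: the matrix of per-segment trimmed SEP values is made up, and two details are assumptions, namely that the standard error is the column standard deviation divided by the square root of the number of segments, and that the standard error used for the bound is the one at the minimum.

```r
# Hypothetical per-segment trimmed SEP values (NOT real results):
# 4 CV segments (rows) x 5 numbers of components (columns)
SEPtrimj <- matrix(c(
  3.0, 2.0, 1.5, 1.4, 1.45,
  3.2, 2.2, 1.6, 1.5, 1.40,
  2.9, 2.1, 1.4, 1.3, 1.50,
  3.1, 1.9, 1.7, 1.6, 1.55
), nrow = 4, byrow = TRUE)

sdfact <- 2

# Mean of the trimmed SEPs over the segments, for each number of components
SEPtrimave <- colMeans(SEPtrimj)

# Standard error of that mean (assumption: sd / sqrt(number of segments))
SEPtrimse <- apply(SEPtrimj, 2, sd) / sqrt(nrow(SEPtrimj))

# Upper bound: minimum average plus sdfact times its standard error
best  <- which.min(SEPtrimave)
bound <- SEPtrimave[best] + sdfact * SEPtrimse[best]

# Most parsimonious model whose average trimmed SEP does not exceed the bound
optcomp <- min(which(SEPtrimave <= bound))
optcomp
```

With these made-up numbers the minimum lies at 4 components, but the model with 3 components already falls below the bound and is therefore selected. A larger sdfact widens the bound and favours smaller models.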

References

K. Varmuza and P. Filzmoser: Introduction to Multivariate Statistical Analysis in Chemometrics. CRC Press, Boca Raton, FL, 2009.

See Also

prm

Examples

data(cereal)
set.seed(123)
res <- prm_cv(cereal$X,cereal$Y[,1],a=5,segments=4,plot.opt=TRUE)
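
As a possible follow-up, the components listed under Value can be inspected on the returned object. The sketch below assumes the chemometrics package is installed and skips itself otherwise; plot.opt = FALSE is used only to suppress the plot.

```r
# Runs only if the chemometrics package is available
if (requireNamespace("chemometrics", quietly = TRUE)) {
  library(chemometrics)
  data(cereal, package = "chemometrics")
  set.seed(123)
  res <- prm_cv(cereal$X, cereal$Y[, 1], a = 5, segments = 4, plot.opt = FALSE)

  print(res$optcomp)         # final optimal number of PLS components
  print(res$SEPopt)          # trimmed SEP for that number of components
  print(dim(res$predicted))  # length(y) rows, a columns
  print(dim(res$SEPtrimj))   # segments rows, a columns
}
```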