Cross-validation (CV) is carried out with robust PLS based on partial robust M-regression. A plot showing the selection of the optimal number of components is generated. The function only works for univariate y-data.
prm_cv(X, y, a, fairct = 4, opt = "median", subset = NULL, segments = 10,
segment.type = "random", trim = 0.2, sdfact = 2, plot.opt = TRUE)
matrix with length(y) rows and a columns with predicted values
vector of length a with SEP values for each number of components
vector of length a with trimmed SEP values for each number of components
matrix with segments rows and a columns with SEP values within the CV for each number of components
matrix with segments rows and a columns with trimmed SEP values within the CV for each number of components
final optimal number of PLS components
trimmed SEP value for final optimal number of PLS components
X: predictor matrix
y: response variable
a: number of PLS components
fairct: tuning constant, by default fairct=4
opt: if "l1m" the centering is done by the l1-median, otherwise by the coordinate-wise median
subset: optional vector defining a subset of objects
segments: the number of segments to use or a list with segments (see mvrCv)
segment.type: the type of segments to use. Ignored if 'segments' is a list
trim: trimming percentage for the computation of the SEP
sdfact: factor for the multiplication of the standard deviation for the determination of the optimal number of components, see mvr_dcv
plot.opt: if TRUE a plot will be generated that shows the selection of the optimal number of components for each step of the CV, see mvr_dcv
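The segments argument accepts either a number or a list. The following base R sketch shows one way such a list could be built by hand. It assumes the list form follows the convention of mvrCv in the pls package, where each list element holds the row indices of one segment. The helper name make_segments is hypothetical and not part of any package.

```r
# Hypothetical helper (not part of chemometrics or pls): split the row
# indices 1..n at random into k segments of nearly equal size.
make_segments <- function(n, k) {
  idx <- sample(n)                        # random permutation of 1..n
  grp <- rep(seq_len(k), length.out = n)  # segment label for each position
  unname(split(idx, grp))                 # list of k integer index vectors
}

set.seed(123)                             # reproducible segments
segs <- make_segments(n = 15, k = 4)      # n and k chosen for illustration
lengths(segs)                             # segment sizes
```

Under the stated assumption, a list like segs could be passed as segments in place of a number, in which case segment.type is ignored.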
Peter Filzmoser <P.Filzmoser@tuwien.ac.at>
A function for robust PLS based on partial robust M-regression is available as prm. The optimal number of robust PLS components is chosen according to the following criterion: Within the CV scheme, the mean of the trimmed SEPs, SEPtrimave, is computed for each number of components, as well as their standard errors, SEPtrimse. Then one searches for the minimum of the SEPtrimave values and adds sdfact*SEPtrimse to it, which gives an upper bound. The optimal number of components is that of the most parsimonious model whose SEPtrimave lies below this bound.
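The selection criterion can be sketched in a few lines of base R. This is an illustration of the rule as described, not the package's own code. The helper name choose_ncomp is hypothetical, the standard error is assumed to be the column standard deviation divided by the square root of the number of segments, and the standard error at the minimum is assumed to define the bound. The input matrix contains made-up values.

```r
# Hypothetical helper illustrating the selection rule; not the package's code.
# SEPtrimj: matrix with one row per CV segment and one column per number
#           of components, holding trimmed SEP values.
choose_ncomp <- function(SEPtrimj, sdfact = 2) {
  SEPtrimave <- colMeans(SEPtrimj)
  # Assumption: standard error = column sd / sqrt(number of segments)
  SEPtrimse <- apply(SEPtrimj, 2, sd) / sqrt(nrow(SEPtrimj))
  imin <- which.min(SEPtrimave)
  # Assumption: the standard error at the minimum defines the bound
  bound <- SEPtrimave[imin] + sdfact * SEPtrimse[imin]
  # Most parsimonious model: smallest number of components not above the bound
  optcomp <- min(which(SEPtrimave <= bound))
  list(optcomp = optcomp, bound = bound,
       SEPtrimave = SEPtrimave, SEPtrimse = SEPtrimse)
}

# Made-up trimmed SEP values (4 segments, 5 components), for illustration only
toy <- matrix(c(5.0, 3.1, 3.0, 2.9, 3.2,
                5.2, 3.3, 2.9, 3.0, 3.1,
                4.8, 3.0, 3.1, 2.8, 3.3,
                5.1, 3.2, 3.0, 2.9, 3.2),
              nrow = 4, byrow = TRUE)

sel <- choose_ncomp(toy, sdfact = 2)
sel$optcomp
```

A larger sdfact widens the bound and therefore tends to favour models with fewer components.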
K. Varmuza and P. Filzmoser: Introduction to Multivariate Statistical Analysis in Chemometrics. CRC Press, Boca Raton, FL, 2009.
prm
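library(chemometrics)  # provides prm_cv and the cereal data
data(cereal)
set.seed(123)  # make the random CV segments reproducible
res <- prm_cv(cereal$X, cereal$Y[, 1], a = 5, segments = 4, plot.opt = TRUE)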