chemometrics (version 1.4.4)

prm_dcv: Repeated double-cross-validation for robust PLS

Description

Evaluates robust PLS regression, called PRM (partial robust M-estimation), by repeated double cross-validation (double-CV).

Usage

prm_dcv(X,Y,a=10,repl=10,segments0=4,segments=7,segment0.type="random",
  segment.type="random",sdfact=2,fairct=4,trim=0.2,opt="median",plot.opt=FALSE, ...)

Value

b

estimated regression coefficients

intercept

estimated regression intercept

resopt

array [nrow(Y) x ncol(Y) x repl] with residuals using optimum number of components

predopt

array [nrow(Y) x ncol(Y) x repl] with predicted Y using optimum number of components

optcomp

matrix [segments0 x repl] optimum number of components for each training set

residcomp

array [nrow(Y) x ncomp x repl] with residuals for each number of components

pred

array [nrow(Y) x ncol(Y) x ncomp x repl] with predicted Y for all numbers of components

SEPall

matrix [ncomp x repl] with SEP values

SEPtrim

matrix [ncomp x repl] with trimmed SEP values

SEPcomp

vector of length ncomp with trimmed SEP values; use the element afinal for the optimal trimmed SEP

afinal

final optimal number of components

SEPopt

trimmed SEP over all residuals using optimal number of components
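
The trimmed SEP values listed above are robust summaries of the prediction residuals. As a rough illustration only, and not necessarily the exact computation used inside prm_dcv, a trimmed standard deviation can be sketched in base R by discarding the trim fraction of residuals that lie farthest from the median. The helper name trimmed_sd and the residuals are made up for this sketch.

```r
# Hypothetical helper, not part of chemometrics: standard deviation of the
# residuals after dropping the `trim` fraction with the largest absolute
# deviation from the median.
trimmed_sd <- function(res, trim = 0.2) {
  dev  <- abs(res - median(res))
  keep <- order(dev)[seq_len(floor(length(res) * (1 - trim)))]
  sd(res[keep])
}

set.seed(1)
res <- c(rnorm(40), 15, -20)    # made-up residuals with two gross outliers
sd_all  <- sd(res)              # classical SD, inflated by the outliers
sd_trim20 <- trimmed_sd(res, trim = 0.2)  # computed on the central 80% only
```

Comparing sd_all with sd_trim20 shows why a trimmed measure is preferred when a few observations are predicted very badly.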

Arguments

X

predictor matrix

Y

response variable

a

number of PLS components

repl

number of replications of the double-CV

segments0

the number of segments to use for splitting into training and test data, or a list with segments (see mvrCv)

segments

the number of segments to use for selecting the optimal number of components, or a list with segments (see mvrCv)

segment0.type

the type of segments to use. Ignored if 'segments0' is a list

segment.type

the type of segments to use. Ignored if 'segments' is a list

sdfact

factor for the multiplication of the standard deviation for the determination of the optimal number of components, see mvr_dcv

fairct

tuning constant, by default fairct=4

trim

trimming proportion for the computation of the SEP (e.g. 0.2 trims 20%)

opt

if "l1m", the data are centered by the l1-median; if "median", by the coordinate-wise median

plot.opt

if TRUE a plot will be generated that shows the selection of the optimal number of components for each step of the CV

...

additional parameters

Author

Peter Filzmoser <P.Filzmoser@tuwien.ac.at>

Details

In this cross-validation (CV) scheme, the data are split into training and test sets. The optimal number of components is determined by an additional CV within each training set and then applied to the corresponding test set. The whole procedure is repeated repl times. The optimal number of components is the smallest number of components whose MSE does not exceed MSE+sdfact*sd(MSE), where MSE and sd(MSE) are taken at the number of components with minimum MSE.
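
The nested scheme described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code of prm_dcv: ordinary least squares on the first k columns stands in for a PRM model with k components, the data are simulated, sd(MSE) is taken across the inner CV segments, and the helper fit_predict is hypothetical. Only a single replication is shown.

```r
set.seed(42)
n <- 60; p <- 5
X <- matrix(rnorm(n * p), n, p)                 # simulated predictors
y <- X[, 1] - 2 * X[, 2] + rnorm(n, sd = 0.5)   # simulated response

segments0 <- 4; segments <- 7; sdfact <- 2; a <- p

# Stand-in model (assumption): least squares on the first k columns
fit_predict <- function(Xtr, ytr, Xte, k) {
  coefs <- lm.fit(cbind(1, Xtr[, 1:k, drop = FALSE]), ytr)$coefficients
  drop(cbind(1, Xte[, 1:k, drop = FALSE]) %*% coefs)
}

outer   <- sample(rep(seq_len(segments0), length.out = n))  # outer split
optcomp <- integer(segments0)
pred    <- numeric(n)

for (s in seq_len(segments0)) {
  test <- outer == s
  Xtr  <- X[!test, , drop = FALSE]
  ytr  <- y[!test]
  inner <- sample(rep(seq_len(segments), length.out = nrow(Xtr)))

  # Inner CV on the training set: MSE per segment (rows) and model size (columns)
  mse <- sapply(seq_len(a), function(k) {
    sapply(seq_len(segments), function(j) {
      val <- inner == j
      yhat <- fit_predict(Xtr[!val, , drop = FALSE], ytr[!val],
                          Xtr[val, , drop = FALSE], k)
      mean((ytr[val] - yhat)^2)
    })
  })
  m    <- colMeans(mse)
  sdm  <- apply(mse, 2, sd)
  best <- which.min(m)

  # Smallest model whose MSE is within MSE + sdfact * sd(MSE) of the minimum
  optcomp[s] <- min(which(m <= m[best] + sdfact * sdm[best]))

  # Apply the chosen model size to the held-out test set
  pred[test] <- fit_predict(Xtr, ytr, X[test, , drop = FALSE], optcomp[s])
}
```

Here optcomp holds one selected model size per outer training set, analogous to one column of the optcomp matrix returned by prm_dcv, and y - pred gives the test-set residuals for this replication.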

References

K. Varmuza and P. Filzmoser: Introduction to Multivariate Statistical Analysis in Chemometrics. CRC Press, Boca Raton, FL, 2009.

See Also

mvr

Examples

library(chemometrics)
data(NIR)
X <- NIR$xNIR[1:30,]      # first 30 observations - for illustration
y <- NIR$yGlcEtOH[1:30,1] # only variable Glucose
NIR.Glc <- data.frame(X=X, y=y)
# repeated double-CV with at most 3 components and 2 replications
res <- prm_dcv(X,y,a=3,repl=2)