Performs a careful evaluation by repeated double-CV for robust PLS, called PRM (partial robust M-estimation).
prm_dcv(X,Y,a=10,repl=10,segments0=4,segments=7,segment0.type="random",
segment.type="random",sdfact=2,fairct=4,trim=0.2,opt="median",plot.opt=FALSE, ...)
estimated regression coefficients
estimated regression intercept
array [nrow(Y) x ncol(Y) x repl] with residuals using optimum number of components
array [nrow(Y) x ncol(Y) x repl] with predicted Y using optimum number of components
matrix [segments0 x repl] optimum number of components for each training set
array [nrow(Y) x ncomp x repl] with residuals for each number of components
array [nrow(Y) x ncol(Y) x ncomp x repl] with predicted Y for all numbers of components
matrix [ncomp x repl] with SEP values
matrix [ncomp x repl] with trimmed SEP values
vector of length ncomp with trimmed SEP values; use the element afinal for the optimal trimmed SEP
final optimal number of components
trimmed SEP over all residuals using optimal number of components
X: predictor matrix
Y: response variable
a: number of PLS components
repl: number of replications for the double-CV
segments0: the number of segments to use for splitting into training and test data, or a list with segments (see mvrCv)
segments: the number of segments to use for selecting the optimal number of components, or a list with segments (see mvrCv)
segment0.type: the type of segments to use. Ignored if 'segments0' is a list
segment.type: the type of segments to use. Ignored if 'segments' is a list
sdfact: factor by which the standard deviation is multiplied when determining the optimal number of components, see mvr_dcv
fairct: tuning constant, by default fairct=4
trim: trimming percentage for the computation of the SEP
opt: if "l1m", centering is done by the l1-median; if "median", by the coordinate-wise median
plot.opt: if TRUE, a plot is generated that shows the selection of the optimal number of components for each step of the CV
...: additional parameters
Peter Filzmoser <P.Filzmoser@tuwien.ac.at>
In this cross-validation (CV) scheme, the optimal number of components is determined by an additional CV within the training set and then applied to the test set. The procedure is repeated repl times. The optimal number of components is the smallest number of components whose MSE does not exceed MSE+sdfact*sd(MSE), where MSE and sd(MSE) are taken at the number of components with the minimum MSE.
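The selection rule described above can be sketched in a few lines of R. This is only an illustration and not the package's code: `select_ncomp` is a hypothetical helper, the `mse` and `mse_sd` vectors are made-up numbers, and how the standard deviation is estimated inside `prm_dcv` is an assumption left open here.

```r
# Hypothetical helper (not part of the package): choose the smallest number of
# components whose CV error is within sdfact standard deviations of the minimum.
select_ncomp <- function(mse, mse_sd, sdfact = 2) {
  best <- which.min(mse)                           # component count with minimal MSE
  threshold <- mse[best] + sdfact * mse_sd[best]   # MSE + sdfact*sd(MSE) at the minimum
  min(which(mse <= threshold))                     # most parsimonious model below threshold
}

# Made-up CV errors and standard deviations for 1 to 5 components
mse    <- c(4.0, 2.5, 2.1, 2.0, 2.2)
mse_sd <- c(0.5, 0.3, 0.2, 0.2, 0.3)

select_ncomp(mse, mse_sd, sdfact = 2)
select_ncomp(mse, mse_sd, sdfact = 0)
```

With sdfact=0 the rule reduces to picking the minimum of the error curve; larger values of sdfact favor models with fewer components.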
K. Varmuza and P. Filzmoser: Introduction to Multivariate Statistical Analysis in Chemometrics. CRC Press, Boca Raton, FL, 2009.
mvr
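library(chemometrics) # provides prm_dcv and the NIR data
data(NIR)
X <- NIR$xNIR[1:30,] # first 30 observations - for illustration
y <- NIR$yGlcEtOH[1:30,1] # only variable Glucose
NIR.Glc <- data.frame(X=X, y=y)
res <- prm_dcv(X,y,a=3,repl=2) # 3 components, 2 replications of the double-CV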
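The trimmed SEP values in the result depend on the trim argument. As a rough intuition for what trimming does, the following base-R sketch drops the trim fraction of residuals with the largest absolute values before taking the standard deviation. `trimmed_sep` is a hypothetical helper and the residuals are made up; the computation inside the package may differ, for example in how residuals are centered or whether a consistency correction is applied.

```r
# Hypothetical helper (simplified, not the package's implementation):
# standard deviation of the residuals after discarding the largest ones.
trimmed_sep <- function(resid, trim = 0.2) {
  n_drop <- floor(length(resid) * trim)                # how many residuals to discard
  ord <- order(abs(resid))                             # smallest absolute residuals first
  kept <- resid[ord][seq_len(length(resid) - n_drop)]  # keep all but the n_drop largest
  sd(kept)
}

# Made-up residuals with one gross outlier
resid <- c(0.1, -0.2, 0.15, -0.1, 5.0)

sd(resid)                        # classical estimate, inflated by the outlier
trimmed_sep(resid, trim = 0.2)   # outlier is discarded before computing sd
```

The trimmed value is much less sensitive to a few large residuals, which is why it is a natural companion to a robust regression method.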