One of five filter methods can be chosen for repeated shaving of
a certain percentage of the worst performing variables. The performance of the
reduced models is stored and can be viewed through the print and plot
methods.
shaving(
y,
X,
ncomp = 10,
method = c("SR", "VIP", "sMC", "LW", "RC"),
prop = 0.2,
min.left = 2,
comp.type = c("CV", "max"),
validation = c("CV", 1),
fixed = integer(0),
newy = NULL,
newX = NULL,
segments = 10,
plsType = "plsr",
Y.add = NULL,
...
)

# S3 method for shaved
plot(x, y, what = c("error", "spectra"), index = "min", log = "x", ...)
# S3 method for shaved
print(x, ...)
Returns a list object of class shaved containing the method type,
the error, the number of components, and the number of variables per reduced model. It
also contains a list of all reduced variable sets plus the original data.
y           vector of response values (numeric or factor).
X           numeric predictor matrix.
ncomp       integer number of components (default = 10).
method      filter method, i.e. SR, VIP, sMC, LW or RC, given as character.
prop        proportion of variables to be removed in each iteration (numeric).
min.left    minimum number of remaining variables.
comp.type   use the number of components chosen by cross-validation, "CV",
            or a fixed number, "max".
validation  type of validation for plsr. The default is "CV". If more
            than one set of CV segments is wanted, use a vector of length two, e.g. c("CV",5).
fixed       vector of indices for compulsory/fixed variables that should always be included in the modelling.
newy        validation response for RMSEP/error computations.
newX        validation predictors for RMSEP/error computations.
segments    see mvr for documentation of segment choices.
plsType     type of PLS model, "plsr" or "cppls".
Y.add       additional response for CPPLS, see plsType.
...         additional arguments for plsr or cvsegments.
x           object of class shaved for plotting or printing.
what        plot type. Default = "error". Alternative = "spectra".
index       which iteration to plot. Default = "min", corresponding to minimum RMSEP.
log         logarithmic x (default) or y scale.
Kristian Hovde Liland
Variables are first sorted with respect to some importance measure, usually one of the filter measures described above. Second, a threshold is used to eliminate a subset of the least informative variables. A model is then fitted to the remaining variables and its performance is measured. The procedure is repeated until maximum model performance is achieved.
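The sort, eliminate, refit cycle described above can be illustrated with a minimal base R sketch. It is not the plsVarSel implementation. Several stand-ins are assumptions made only for illustration: the absolute correlation with the response replaces the PLS filter measure, an ordinary least squares fit replaces PLS, a single hold-out split replaces cross-validation, and the rounding of the number of removed variables is a guess. The names shave_sketch and rmse_holdout are hypothetical. The sketch shaves down to min.left variables and picks the best reduced model afterwards.

```r
# Illustrative shaving loop in base R (NOT the plsVarSel implementation).
set.seed(1)
n <- 120; p <- 20
X <- matrix(rnorm(n * p), n, p)                      # simulated predictors
y <- X[, 1] - 2 * X[, 2] + 0.5 * X[, 3] + rnorm(n, sd = 0.5)
train <- 1:80; test <- 81:120                        # hold-out split (stand-in for CV)

# Hold-out RMSE of a least squares model on the chosen variables
rmse_holdout <- function(vars) {
  fit  <- lm.fit(cbind(1, X[train, vars, drop = FALSE]), y[train])
  pred <- cbind(1, X[test, vars, drop = FALSE]) %*% fit$coefficients
  sqrt(mean((y[test] - pred)^2))
}

shave_sketch <- function(prop = 0.2, min.left = 2) {
  vars <- seq_len(p)
  sets <- list(vars)                                 # variable set per iteration
  err  <- rmse_holdout(vars)                         # error of the full model
  while (length(vars) > min.left) {
    # 1. importance of each remaining variable (assumed stand-in filter)
    imp <- abs(cor(X[train, vars, drop = FALSE], y[train]))[, 1]
    # 2. remove a proportion of the least important ones (assumed rounding),
    #    never going below min.left
    n_drop <- max(1, floor(prop * length(vars)))
    n_drop <- min(n_drop, length(vars) - min.left)
    keep   <- order(imp, decreasing = TRUE)[seq_len(length(vars) - n_drop)]
    vars   <- sort(vars[keep])
    # 3. refit on the reduced set and record performance
    sets[[length(sets) + 1]] <- vars
    err <- c(err, rmse_holdout(vars))
  }
  list(variables = sets, error = err, nvar = lengths(sets))
}

res  <- shave_sketch()
best <- which.min(res$error)                         # iteration with lowest error
res$nvar                                             # variables left per iteration
res$variables[[best]]                                # selected variable indices
```

In this sketch, prop and min.left play the same roles as the arguments of the same names in shaving: the first controls how aggressively each iteration shaves, the second stops the loop.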
VIP (SR/sMC/LW/RC), filterPLSR, shaving,
stpls, truncation,
bve_pls, ga_pls, ipw_pls, mcuve_pls,
rep_pls, spa_pls,
lda_from_pls, lda_from_pls_cv, setDA.
data(mayonnaise, package = "pls")
sh <- shaving(mayonnaise$design[,1], pls::msc(mayonnaise$NIR), type = "interleaved")
pars <- par(mfrow = c(2,1), mar = c(4,4,1,1))
plot(sh)
plot(sh, what = "spectra")
par(pars)
print(sh)