shaving: Repeated shaving of variables

Description

One of five filter methods can be chosen for repeated shaving of a certain percentage of the worst performing variables. Performance of the reduced models is stored and viewable through print and plot methods.

Usage

shaving(
  y,
  X,
  ncomp = 10,
  method = c("SR", "VIP", "sMC", "LW", "RC"),
  prop = 0.2,
  min.left = 2,
  comp.type = c("CV", "max"),
  validation = c("CV", 1),
  fixed = integer(0),
  newy = NULL,
  newX = NULL,
  segments = 10,
  plsType = "plsr",
  Y.add = NULL,
  ...
)
# S3 method for shaved
plot(x, y, what = c("error", "spectra"), index = "min", log = "x", ...)
# S3 method for shaved
print(x, ...)

Value

Returns a list object of class shaved containing the method type, the error, number of components, and number of variables per reduced model. It also contains a list of all reduced variable sets plus the original data.

Arguments

y: vector of response values (numeric or factor).
X: numeric predictor matrix.
ncomp: integer number of components (default = 10).
method: filter method, i.e. SR, VIP, sMC, LW or RC given as character.
prop: proportion of variables to be removed in each iteration (numeric).
min.left: minimum number of remaining variables.
comp.type: use number of components chosen by cross-validation, "CV", or fixed, "max".
validation: type of validation for plsr. The default is "CV". If more than one set of CV segments is wanted, use a vector of length two, e.g. c("CV",5).
fixed: vector of indices for compulsory/fixed variables that should always be included in the modelling.
newy: validation response for RMSEP/error computations.
newX: validation predictors for RMSEP/error computations.
segments: see mvr for documentation of segment choices.
plsType: Type of PLS model, "plsr" or "cppls".
Y.add: Additional response for CPPLS, see plsType.
...: additional arguments for plsr or cvsegments.
x: object of class shaved for plotting or printing.
what: plot type. Default = "error". Alternative = "spectra".
index: which iteration to plot. Default = "min"; corresponding to minimum RMSEP.
log: logarithmic x (default) or y scale.

Author

Kristian Hovde Liland

Details

Variables are first sorted with respect to some importance measure, usually one of the filter measures listed above (SR, VIP, sMC, LW or RC). Secondly, a threshold is used to eliminate a subset of the least informative variables. A model is then fitted to the remaining variables and its performance is measured. The procedure is repeated until maximum model performance is achieved. In shaving, prop sets the proportion of variables removed in each iteration and min.left sets the minimum number of variables retained.
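
The loop described above can be sketched in base R. This is an illustration of the general idea only, not the package's implementation: the function name shave_sketch is hypothetical, absolute correlation with the response stands in for the PLS-based filter measures, and an ordinary least-squares fit evaluated on a separate validation set stands in for the PLS model.

```r
# Illustrative sketch only (hypothetical helper, not part of plsVarSel).
# Assumptions: |correlation| replaces the PLS filter measures, and a
# least-squares fit replaces the PLS model.
shave_sketch <- function(y, X, newy, newX, prop = 0.2, min.left = 2) {
  keep <- seq_len(ncol(X))   # indices of variables still in the model
  sets <- list()             # variable set of each reduced model
  err  <- numeric(0)         # validation RMSE of each reduced model
  repeat {
    # Fit a model to the remaining variables and measure its performance
    Xk   <- X[, keep, drop = FALSE]
    beta <- qr.solve(cbind(1, Xk), y)   # least-squares coefficients
    pred <- cbind(1, newX[, keep, drop = FALSE]) %*% beta
    sets[[length(sets) + 1]] <- keep
    err <- c(err, sqrt(mean((newy - pred)^2)))

    # Stop when another round would leave fewer than min.left variables
    n.drop <- max(1, floor(prop * length(keep)))
    if (length(keep) - n.drop < min.left) break

    # Sort by importance and shave off the least informative variables
    imp  <- abs(cor(Xk, y))[, 1]
    keep <- keep[order(imp, decreasing = TRUE)][seq_len(length(keep) - n.drop)]
  }
  list(variables = sets, error = err, nvar = lengths(sets))
}

# Simulated data: only the first two predictors carry signal
set.seed(1)
n <- 120; p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - 2 * X[, 2] + rnorm(n, sd = 0.5)
train <- 1:80
res <- shave_sketch(y[train], X[train, ], newy = y[-train], newX = X[-train, ])
res$nvar                                  # variables left in each model
res$variables[[which.min(res$error)]]     # set with the lowest validation error
```

Keeping every intermediate variable set and its error, rather than only the final one, mirrors the way the shaved object lets the best-performing iteration be picked out afterwards.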

Examples


data(mayonnaise, package = "pls")
sh <- shaving(mayonnaise$design[,1], pls::msc(mayonnaise$NIR), type = "interleaved")
pars <- par(mfrow = c(2,1), mar = c(4,4,1,1))
plot(sh)
plot(sh, what = "spectra")
par(pars)
print(sh)
