s.SPLS: Sparse Partial Least Squares Regression [C, R]

Description

Train an SPLS model using spls::spls (Regression) and spls::splsda (Classification)

Usage

s.SPLS(x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL,
  y.name = NULL, k = 2, eta = 0.5, kappa = 0.5, select = "pls2",
  fit = "simpls", scale.x = TRUE, scale.y = TRUE, maxstep = 100,
  classifier = c("lda", "logistic"),
  grid.resample.rtset = rtset.resample("kfold", 5),
  grid.search.type = c("exhaustive", "randomized"),
  grid.randomized.p = 0.1, metric = NULL, maximize = NULL,
  print.plot = TRUE, plot.fitted = NULL, plot.predicted = NULL,
  plot.theme = getOption("rt.fit.theme", "lightgrid"), question = NULL,
  rtclass = NULL, verbose = TRUE, trace = 0, grid.verbose = TRUE,
  outdir = NULL, save.mod = ifelse(!is.null(outdir), TRUE, FALSE),
  n.cores = rtCores, ...)

Arguments

x

Numeric vector or matrix / data frame of features, i.e. independent variables

y

Numeric vector of outcome, i.e. dependent variable

x.test

Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x

y.test

Numeric vector of testing set outcome

x.name

Character: Name for feature set

y.name

Character: Name for outcome

k

[gS] Integer: Number of components to estimate. Default = 2

eta

[gS] Float [0, 1): Thresholding parameter. Default = .5

kappa

[gS] Float [0, .5]: Only relevant for multivariate responses: controls effect of concavity of objective function. Default = .5

select

[gS] String: "pls2", "simpls". PLS algorithm for variable selection. Default = "pls2"

fit

[gS] String: "kernelpls", "widekernelpls", "simpls", "oscorespls". Algorithm for model fitting. Default = "simpls"

scale.x

Logical: if TRUE, scale features by dividing each column by its sample standard deviation

scale.y

Logical: if TRUE, scale outcomes by dividing each column by its sample standard deviation
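The scaling described for scale.x and scale.y can be sketched in base R. This is an illustration only, not the code used inside spls; the helper name scale_by_sd is hypothetical.

```r
# Hypothetical helper: divide each column by its sample standard deviation.
# Illustration only; not the internal implementation used by spls.
scale_by_sd <- function(m) {
  m <- as.matrix(m)
  sds <- apply(m, 2, sd)          # sample SD (n - 1 denominator) per column
  sweep(m, 2, sds, FUN = "/")     # divide each column by its SD
}

set.seed(1)
x <- cbind(a = rnorm(50, sd = 3), b = rnorm(50, sd = 0.2))
x_scaled <- scale_by_sd(x)
apply(x_scaled, 2, sd)            # each column now has unit standard deviation
```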

maxstep

[gS] Integer: Maximum number of iterations when fitting direction vectors. Default = 100

classifier

String: Classifier used by spls::splsda: "lda" or "logistic". Default = "lda"

grid.resample.rtset

List: Output of rtset.resample defining gridSearchLearn parameters. Default = rtset.resample("kfold", 5)

grid.search.type

String: Type of grid search to perform: "exhaustive" or "randomized". Default = "exhaustive"

grid.randomized.p

Float (0, 1): If grid.search.type = "randomized", randomly run this proportion of combinations. Default = .1

metric

String: Metric to minimize, or maximize if maximize = TRUE during grid search. Default = NULL, which results in "Balanced Accuracy" for Classification, "MSE" for Regression, and "Coherence" for Survival Analysis.

maximize

Logical: If TRUE, metric will be maximized if grid search is run. Default = FALSE
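The interplay of grid.search.type, grid.randomized.p, metric and maximize can be sketched in base R. This is a rough illustration under stated assumptions, not gridSearchLearn's implementation: the scores are simulated placeholders, and rounding the proportion to a row count is an assumption made here.

```r
# Illustrative only: how a grid over [gS] arguments could be enumerated.
# Not gridSearchLearn's implementation; scores below are simulated
# placeholders, not real cross-validated errors.
set.seed(2024)
grid <- expand.grid(k = c(1, 2, 3), eta = c(0.3, 0.5, 0.7))   # 9 combinations

# "exhaustive" evaluates every row; "randomized" evaluates a proportion p.
# Rounding to a whole number of rows is an assumption of this sketch.
grid.randomized.p <- 0.5
n_run <- max(1, round(grid.randomized.p * nrow(grid)))
randomized <- grid[sample(nrow(grid), n_run), , drop = FALSE]

# Placeholder metric values, one per evaluated combination
score <- runif(nrow(randomized))

# maximize = FALSE picks the smallest value (e.g. MSE);
# maximize = TRUE picks the largest (e.g. Balanced Accuracy)
maximize <- FALSE
best <- if (maximize) which.max(score) else which.min(score)
randomized[best, ]
```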

print.plot

Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted

plot.fitted

Logical: if TRUE, plot True (y) vs Fitted

plot.predicted

Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test

plot.theme

String: "zero", "dark", "box", "darkbox"

question

String: the question you are attempting to answer with this model, in plain language.

verbose

Logical: If TRUE, print summary to screen.

grid.verbose

Logical: Passed to gridSearchLearn

outdir

Path to output directory. If defined, will save Predicted vs. True plot, if available, as well as full model output, if save.mod is TRUE

save.mod

Logical: If TRUE, save all output as RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE, and no outdir is defined, outdir defaults to paste0("./s.", mod.name)
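The documented defaults for outdir and save.mod can be restated as a small base R sketch. The helper resolve_output is hypothetical and is not rtemis code; mod.name = "SPLS" is an assumed value used only for illustration.

```r
# Hypothetical helper mirroring the documented defaults; not rtemis code.
# mod.name = "SPLS" is an assumed value for illustration.
resolve_output <- function(outdir = NULL,
                           save.mod = !is.null(outdir),
                           mod.name = "SPLS") {
  if (save.mod && is.null(outdir)) {
    outdir <- paste0("./s.", mod.name)   # documented fallback directory
  }
  list(outdir = outdir, save.mod = save.mod)
}

resolve_output()                          # nothing saved, no directory
resolve_output(outdir = "./results")      # save.mod becomes TRUE
resolve_output(save.mod = TRUE)           # outdir falls back to "./s.SPLS"
```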

n.cores

Integer: Number of cores to be used by gridSearchLearn, if applicable

...

Additional parameters to be passed to spls::spls or spls::splsda

Value

Object of class rtemis

Details

[gS] denotes an argument that can be passed as a vector of values, which will trigger a grid search using gridSearchLearn.
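As a minimal sketch of the [gS] convention, the call below passes vectors for k and eta, which per the description above should trigger a grid search. The simulated data and the chosen candidate values are arbitrary assumptions, and the call is guarded so it is skipped when rtemis, spls, or s.SPLS is not available in the installed version.

```r
# Simulated data; the feature count and coefficients are arbitrary.
set.seed(1)
x <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
y <- as.numeric(x[, 1:3] %*% c(0.6, -0.4, 0.3) + rnorm(100))

# Guarded so the sketch is skipped when rtemis or spls is unavailable.
if (requireNamespace("rtemis", quietly = TRUE) &&
    requireNamespace("spls", quietly = TRUE) &&
    exists("s.SPLS", envir = asNamespace("rtemis"))) {
  library(rtemis)
  # Vectors for k and eta define a 2 x 3 grid of candidate settings
  mod <- s.SPLS(x, y, k = c(2, 3), eta = c(0.3, 0.5, 0.7),
                print.plot = FALSE)
}
```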

Examples

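# NOT RUN {
library(rtemis)
# Simulate a single feature and a linearly related outcome
x <- rnorm(100)
y <- .6 * x + 12 + rnorm(100)
# Train an SPLS regression model with default settings
mod <- s.SPLS(x, y)
# }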
