rtemis (version 0.79)

rtset: rtemis default-setting functions

Description

These functions output lists of default settings for different rtemis functions. This removes the need to pass named lists of arguments and provides autocompletion, making it easier to set up functions without having to refer to the manual.
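For example, instead of hand-assembling a named list of resampling arguments, the settings list can be built with a single call (a sketch, assuming the rtemis package is installed; the list fields follow the argument names):

```r
# Sketch: build a resampling settings list instead of a hand-written named list.
# Assumes rtemis is installed and attached.
library(rtemis)

res.settings <- rtset.resample(resampler = "kfold", n.resamples = 5)

# The result is a plain named list that downstream rtemis functions accept,
# e.g. as the grid.resample.rtset argument of rtset.GBM
str(res.settings)
```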

Usage

rtset.resample(resampler = "kfold", n.resamples = 10,
  stratify.var = NULL, train.p = 0.75, strat.n.bins = 4,
  target.length = NULL, seed = NULL, verbose = TRUE)

rtset.grid.resample(resampler = "strat.boot", n.resamples = 10,
  stratify.var = NULL, train.p = 0.75, strat.n.bins = 4,
  target.length = NULL, verbose = TRUE)

rtset.bag.resample(resampler = "strat.sub", n.resamples = 10,
  stratify.var = NULL, train.p = 0.75, strat.n.bins = 4,
  target.length = NULL, verbose = TRUE)

rtset.meta.resample(resampler = "strat.sub", n.resamples = 4,
  stratify.var = NULL, train.p = 0.75, strat.n.bins = 4,
  target.length = NULL, verbose = TRUE)

rtset.cv.resample(resampler = "kfold", n.resamples = 10,
  stratify.var = NULL, train.p = 0.75, strat.n.bins = 4,
  target.length = NULL, verbose = TRUE)

rtset.cluster(type = "fork", hosts = NULL, n.cores = rtCores, ...)

rtset.color(n = 101, colors = NULL, space = "rgb", lo = "#01256E",
  lomid = NULL, mid = "white", midhi = NULL, hi = "#95001A",
  colorbar = FALSE, cb.mar = c(1, 1, 1, 1), ...)

rtset.preprocess(completeCases = FALSE, impute = FALSE,
  impute.type = "missForest", impute.niter = 10, impute.ntree = 500,
  impute.discrete = getMode, impute.numeric = mean,
  removeCases.thres = NULL, removeFeatures.thres = NULL,
  integer2factor = FALSE, nonzeroFactors = FALSE, scale = FALSE,
  center = FALSE, removeConstant = TRUE, oneHot = FALSE)

rtset.decompose(decom = "ICA", k = 2, ...)

rtset.ADDT(max.depth = 2, learning.rate = 1, lin.type = "glmnet",
  alpha = 0, lambda = 0.1, minobsinnode = 2, minobsinnode.lin = 20,
  ...)

rtset.GBM(interaction.depth = 2, shrinkage = 0.001, max.trees = 5000,
  min.trees = 100, bag.fraction = 0.9, n.minobsinnode = 5,
  grid.resample.rtset = rtset.resample("kfold", 5), ipw = TRUE,
  upsample = FALSE, upsample.seed = NULL, ...)

rtset.RANGER(n.trees = 1000, min.node.size = 1, mtry = NULL,
  grid.resample.rtset = rtset.resample("kfold", 5), ipw = TRUE,
  upsample = FALSE, upsample.seed = NULL, ...)

rtset.DN(hidden = 1, activation = NULL, learning.rate = 0.8,
  momentum = 0.5, learningrate_scale = 1, output = NULL,
  numepochs = 100, batchsize = NULL, hidden_dropout = 0,
  visible_dropout = 0, ...)

rtset.MXN(n.hidden.nodes = NULL, output = NULL, activation = "relu",
  ctx = mxnet::mx.cpu(), optimizer = "sgd",
  initializer = mxnet::mx.init.Xavier(), batch.size = NULL,
  momentum = 0.9, max.epochs = 2000, min.epochs = 25,
  early.stop = "train", early.stop.n.steps = NULL,
  early.stop.relativeVariance.threshold = NULL, learning.rate = NULL,
  dropout = 0, dropout.before = 1, dropout.after = 0,
  eval.metric = NULL, arg.params = NULL, mx.seed = NULL)

rtset.lincoef(method = c("glmnet", "cv.glmnet", "lm.ridge",
  "allSubsets", "forwardStepwise", "backwardStepwise", "glm", "sgd",
  "solve"), alpha = 0, lambda = 0.01, lambda.seq = NULL,
  cv.glmnet.nfolds = 5,
  which.cv.glmnet.lambda = c("lambda.min", "lambda.1se"), nbest = 1,
  nvmax = 8, sgd.model = "glm",
  sgd.model.control = list(lambda1 = 0, lambda2 = 0),
  sgd.control = list(method = "ai-sgd"))

rtset.MARS(hidden = 1, activation = NULL, learning.rate = 0.8,
  momentum = 0.5, learningrate_scale = 1, output = NULL,
  numepochs = 100, batchsize = NULL, hidden_dropout = 0,
  visible_dropout = 0, ...)

Arguments

resampler

String: Type of resampling to perform: "bootstrap", "kfold", "strat.boot", "strat.sub". Default = "strat.boot" for length(y) < 200, otherwise "strat.sub"

n.resamples

Integer: Number of training/testing sets required

stratify.var

Numeric vector (optional): Variable used for stratification. Defaults to y

train.p

Float (0, 1): Fraction of cases to assign to training set for resampler = "strat.sub"

strat.n.bins

Integer: Number of groups to use for stratification for resampler = "strat.sub" / "strat.boot"
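To make the role of strat.n.bins concrete, the base-R sketch below (not rtemis internals) cuts a continuous outcome into quantile bins; stratified resampling then samples within each bin:

```r
# Base-R illustration of what strat.n.bins controls: a continuous outcome
# is cut into quantile bins, and resampling is stratified within each bin.
set.seed(2020)
y <- rnorm(100)
strat.n.bins <- 4

# Quantile-based bin edges give equal-sized strata for distinct values
breaks <- quantile(y, probs = seq(0, 1, length.out = strat.n.bins + 1))
bins <- cut(y, breaks = breaks, include.lowest = TRUE)

table(bins)  # 25 cases per stratum
```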

target.length

Integer: Number of cases for training set for resampler = "strat.boot". Default = length(y)

seed

Integer: (Optional) Set seed for random number generator, in order to make output reproducible. See ?base::set.seed

verbose

Logical: If TRUE, print messages to screen

type

String: "fork", "psock"

hosts

Vector of strings: For type = "psock": Host names on which to run (macOS, Linux, Windows)

n.cores

Integer: Number of cores to use on localhost for type = "fork" (macOS, Linux only)

...

rtset.cluster: Additional arguments to be passed to parallel::makePSOCKcluster

n

Integer: How many distinct colors you want. If not odd, converted to n + 1. Default = 101

colors

String: Acts as a shortcut to defining lo, mid, etc. for a number of presets: "french", "penn", "grnblkred"

space

String: Which colorspace to use. Options: "rgb", "Lab". Default = "rgb". Recommendation: if mid is "white" or "black" (the default), use "rgb"; otherwise "Lab"

lo

Color for low end

lomid

Color for low-mid

mid

Color for middle of the range or "mean", which will result in colorOp(c(lo, hi), "mean"). If mid = NA, then only lo and hi are used to create the color gradient.

midhi

Color for middle-high

hi

Color for high end

colorbar

Logical: Create a vertical colorbar

cb.mar

Vector, length 4: Colorbar margins. Default: c(1, 1, 1, 1)

decom

String: Name of decomposer to use. Default = "ICA"

k

Integer: Number of dimensions to project to. Default = 2

max.depth

Integer: Max depth of additive tree

learning.rate

Float: learning rate

alpha

Float: alpha for method = glmnet or cv.glmnet. Default = 0

lambda

Float: lambda parameter for MASS::lm.ridge. Default = .01

minobsinnode

Integer: Minimum N observations needed in node, before considering splitting

interaction.depth

[gS] Integer: Interaction depth

shrinkage

[gS] Float: Shrinkage (learning rate)

bag.fraction

[gS] Float (0, 1): Fraction of cases to use to train each tree. Helps avoid overfitting. Default = 0.9

n.minobsinnode

[gS] Integer: Minimum number of observations allowed in a node

grid.resample.rtset

List: Output of rtset.resample defining gridSearchLearn parameters. Default = rtset.resample("kfold", 5)

ipw

Logical: If TRUE, apply inverse probability weighting (for Classification only). Note: If weights are provided, ipw is not used. Default = TRUE

upsample

Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Caution: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness
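The caution above can be seen with a base-R sketch (not rtemis code): when the majority class is more than twice the size of the minority class, upsampling to the majority size cannot avoid sampling with replacement, so duplicated cases are guaranteed:

```r
# Base-R illustration of the upsampling caveat: 10 unique minority cases
# cannot fill 90 slots without replacement, so duplicates are unavoidable.
majority <- rep("a", 90)
minority <- rep("b", 10)

set.seed(21)
upsampled <- sample(seq_along(minority), size = length(majority),
                    replace = TRUE)

length(upsampled)             # 90 resampled minority indices
anyDuplicated(upsampled) > 0  # TRUE, by the pigeonhole principle
```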

upsample.seed

Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)

n.trees

Integer: Initial number of trees to fit

min.node.size

[gS] Integer: Minimum node size

mtry

[gS] Integer: Number of features sampled randomly at each split. Defaults to the square root of the number of features for classification, and a third of the number of features for regression.
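These conventional defaults are easy to compute by hand; the floor() rounding below is an assumption for illustration, not a statement about ranger's exact rounding:

```r
# Conventional mtry defaults for a dataset with 100 features
n.features <- 100

mtry.classification <- floor(sqrt(n.features))     # square root of p
mtry.regression     <- max(floor(n.features / 3), 1)  # a third of p

mtry.classification  # 10
mtry.regression      # 33
```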

activation

String vector: Activation types to use: 'relu', 'sigmoid', 'softrelu', 'tanh'. If length < n of hidden layers, elements are recycled. See mxnet::mx.symbol.Activation

output

String: "Logistic" for binary classification, "Softmax" for classification of 2 or more classes, "Linear" for Regression. Defaults to "Logistic" for binary outcome, "Softmax" for 3+ classes, "LinearReg" for regression.

n.hidden.nodes

Integer vector: Length must be equal to the number of hidden layers you wish to create

ctx

MXNET context: mxnet::mx.cpu() to use CPU(s). Define N of cores using the n.cores argument. mxnet::mx.gpu() to use GPU. For multiple GPUs, provide a list, e.g. ctx = list(mxnet::mx.gpu(0), mxnet::mx.gpu(1)) to use two GPUs.

max.epochs

Integer: Maximum number of iterations for training.

dropout

Float (0, 1): Probability of dropping nodes

dropout.before

Integer: Index of hidden layer before which dropout should be applied

dropout.after

Integer: Index of hidden layer after which dropout should be applied

eval.metric

String: Metric used for evaluation during training. Default: "rmse"

method

String: Method to use: "glm": uses stats::lm.wfit; "glmnet": uses glmnet::glmnet; "cv.glmnet": uses glmnet::cv.glmnet; "lm.ridge": uses MASS::lm.ridge; "allSubsets": uses leaps::regsubsets with method = "exhaustive"; "forwardStepwise": uses leaps::regsubsets with method = "forward"; "backwardStepwise": uses leaps::regsubsets with method = "backward"; "sgd": uses sgd::sgd; "solve": uses base::solve
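A typical use is to bundle ridge-type settings for the linear steps of other learners, e.g. via rtset.ADDT (a sketch, assuming rtemis is installed; the stored list fields follow the argument names):

```r
# Sketch: ridge-type linear-coefficient settings (alpha = 0 selects the
# ridge penalty in glmnet). Assumes rtemis is installed and attached.
library(rtemis)

lin.settings <- rtset.lincoef(method = "glmnet", alpha = 0, lambda = 0.1)
str(lin.settings)
```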

lambda.seq

Float, vector: lambda sequence for glmnet and cv.glmnet. Default = NULL

cv.glmnet.nfolds

Integer: Number of folds for cv.glmnet

which.cv.glmnet.lambda

String: Which lambda to pick from cv.glmnet: "lambda.min": lambda that gives minimum cross-validated error; "lambda.1se": largest lambda whose error is within one standard error of the minimum

nbest

Integer: For method = "allSubsets", number of subsets of each size to record. Default = 1

nvmax

Integer: For method = "allSubsets", maximum size of subsets to examine.

sgd.model

String: Model to use for method = "sgd". Default = "glm"

sgd.model.control

List: model.control list to pass to sgd::sgd

sgd.control

List: sgd.control list to pass to sgd::sgd

Value

List with parameters