Rborist (version 0.3-7)

rfTrain: Rapid Decision Tree Training

Description

Accelerated training using the Random Forest (trademarked name) algorithm. Tuned for multicore and GPU hardware. Bindable with most numerical front-end languages in addition to R.

Usage

# S3 method for default
rfTrain(preFormat,
        sampler,
        y,
        autoCompress = 0.25,
        ctgCensus = "votes",
        classWeight = NULL,
        maxLeaf = 0,
        minInfo = 0.01,
        minNode = if (is.factor(y)) 2 else 3,
        nLevel = 0,
        nThread = 0,
        predFixed = 0,
        predProb = 0.0,
        predWeight = NULL,
        regMono = NULL,
        splitQuant = NULL,
        thinLeaves = FALSE,
        treeBlock = 1,
        verbose = FALSE,
        ...)

Value

An object of class arbTrain, containing:

  • version: the version of the Rborist package used to train.

  • samplerHash: hash value of the Sampler object used to train. Recorded for consistency of subsequent commands.

  • predInfo: a vector of forest-wide Gini (classification) or weighted variance (regression), by predictor.

  • predMap: a vector of integers mapping internal to front-end predictor indices.

  • forest: an object of class Forest containing:

    • nTree: the number of trees trained.

    • node: an object of class Node consisting of:

      • treeNode: forest-wide vector of packed node representations.

      • extent: per-tree node counts.

      • scores: numeric vector of scores, for all terminals and nonterminals.

      • factor: an object of class Factor consisting of:

        • facSplit: forest-wide vector of packed factor bits.

        • extent: per-tree extent of factor bits.

        • observed: forest-wide vector of observed factor bits.

    • Leaf: an object of class Leaf containing:

      • extent: forest-wide vector of leaf populations, i.e., counts of unique samples.

      • index: forest-wide vector of sample indices.

  • diag: diagnostics accumulated over the training task.
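Several components above pair a forest-wide vector with a per-tree extent. The following is a minimal sketch of how such counts could be turned into per-tree index ranges. It uses a hand-built stand-in list, not a real arbTrain object, and it assumes that trees are stored back to back in tree order, which this page does not confirm. The helper treeSlice is hypothetical.

```r
# Hypothetical stand-in for the node component: three trees with 3, 5 and 1 nodes.
node <- list(
  extent = c(3, 5, 1),
  scores = c(0.2, 0.1, 0.3,  1.5, 1.1, 1.9, 1.0, 1.2,  0.7)
)

# Assumption: trees are laid out consecutively, so cumulative sums of the
# per-tree counts give the last index of each tree.
end   <- cumsum(node$extent)
start <- end - node$extent + 1

# Returns the slice of a forest-wide vector that belongs to tree i.
treeSlice <- function(v, i) v[start[i]:end[i]]

secondTree <- treeSlice(node$scores, 2)
print(secondTree)
```

The same start and end offsets would apply to any other forest-wide vector that shares the extent, under the same layout assumption.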

Arguments

y

the response (outcome) vector, either numerical or categorical.

preFormat

Compressed, presorted representation of the predictor values. Row count must conform with y.

sampler

Compressed representation of the sampled response.

autoCompress

plurality above which to compress predictor values.

ctgCensus

report categorical validation by vote or by probability.

classWeight

proportional weighting of classification categories.
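As an illustration of proportional class weighting, the sketch below derives weights from a class table in base R. It assumes that balancing means weighting each category inversely to its frequency; the package's own computation may differ. The unbalanced subset of iris is constructed here purely for demonstration.

```r
data(iris)

# Build a deliberately unbalanced sample: 50 setosa, 25 versicolor, 10 virginica.
unbalanced <- iris[c(1:50, 51:75, 101:110), ]
counts <- table(unbalanced$Species)

# Assumption: balanced weights are inversely proportional to class frequency,
# then normalized to sum to one.
w <- 1 / as.numeric(counts)
w <- w / sum(w)
names(w) <- names(counts)
print(w)
```

A vector of this shape, with one entry per category, matches the form of an explicit classWeight such as c(2.0, 1.0, 1.0).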

maxLeaf

maximum number of leaves in a tree. Zero denotes no limit.

minInfo

information ratio with parent below which node does not split.
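To make the idea of an information threshold concrete, here is a rough sketch for a regression response. It assumes that information is measured by the reduction in sum of squared deviations and that the ratio is taken against the parent's value; the package's internal definition is not given on this page and may differ. Both sse and worthSplitting are hypothetical helpers.

```r
# Sum of squared deviations, used here as a stand-in impurity measure (assumption).
sse <- function(v) sum((v - mean(v))^2)

# TRUE when the gain of a candidate split, relative to the parent's impurity,
# reaches the minInfo threshold.
worthSplitting <- function(y, left, minInfo = 0.01) {
  parent <- sse(y)
  if (parent == 0) return(FALSE)          # a pure node has nothing to gain
  gain <- parent - (sse(y[left]) + sse(y[!left]))
  gain / parent >= minInfo
}

y <- c(1.0, 1.1, 0.9, 5.0, 5.2, 4.8)

# Separates the low values from the high ones: a large relative gain.
good <- worthSplitting(y, left = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE))

# Both sides keep the parent's mean, so the gain is essentially zero.
weak <- worthSplitting(y, left = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
                       minInfo = 0.02)
```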

minNode

minimum number of distinct row references to split a node.

nLevel

maximum number of tree levels to train, including terminals (leaves). Zero denotes no limit.

nThread

suggests an OpenMP-style thread count. Zero denotes the default processor setting.

predFixed

number of trial predictors for a split (mtry).

predProb

probability of selecting individual predictor as trial splitter.
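The difference between predFixed and predProb can be sketched in base R. The code below only mimics the two selection schemes on predictor indices and makes no claim about how the package draws candidates internally. The predictor count of 6 and the seed are arbitrary choices for the illustration.

```r
set.seed(42)
nPred <- 6

# predFixed-style: a fixed number of distinct candidates for a split.
fixedCand <- sample(nPred, size = 2)

# predProb-style: each predictor is an independent Bernoulli(0.3) trial,
# so the number of candidates varies from split to split (and may be zero).
probCand <- which(runif(nPred) < 0.3)

# Averaged over many simulated splits, the candidate count approaches
# nPred * 0.3.
meanCount <- mean(replicate(10000, sum(runif(nPred) < 0.3)))
```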

predWeight

relative weighting of individual predictors as trial splitters.
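Relative weighting can be illustrated with weighted draws from base R's sample(). This is a sketch of the general idea only, assuming six predictors with the first three weighted twice as heavily as the rest; it does not reproduce the package's sampling code.

```r
set.seed(1)
w <- c(2.0, 2.0, 2.0, 1.0, 1.0, 1.0)
nPred <- length(w)

# Draw one candidate per simulated split; sample() normalizes the weights.
draws <- sample(nPred, size = 10000, replace = TRUE, prob = w)

# Count how often each predictor index was drawn.
freq <- tabulate(draws, nbins = nPred)
print(freq / sum(freq))
```

The weights are relative, so c(2, 2, 2, 1, 1, 1) and c(0.2, 0.2, 0.2, 0.1, 0.1, 0.1) describe the same preference in this sketch.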

regMono

signed probability constraint for monotonic regression.
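A signed probability vector can be read as one entry per predictor, with the sign giving the direction of the constraint. The sketch below decodes such a vector in base R. Interpreting the magnitude as the probability with which the constraint is applied is an assumption drawn from the argument's wording, and the predictor names X1 to X6 are illustrative.

```r
regMono <- c(1.0, 0, 0, 0, -1.0, 0)

# sign() yields -1, 0 or 1; shifting by 2 indexes the three labels.
constraint <- data.frame(
  predictor   = paste0("X", seq_along(regMono)),
  direction   = c("decreasing", "none", "increasing")[sign(regMono) + 2],
  probability = abs(regMono)   # assumed meaning of the magnitude
)
print(constraint)
```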

splitQuant

(sub)quantile at which to place cut point for numerical splits.
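One way to picture a split quantile is as a position within the gap between the two adjacent observed values that a numerical split separates. The sketch below assumes linear interpolation across that gap; the helper cutPoint and the interpolation rule are illustrative assumptions, not the package's documented behavior.

```r
# Assumption: the cut lies between the largest value on the left (lo) and the
# smallest value on the right (hi), interpolated linearly by the quantile q.
cutPoint <- function(lo, hi, q = 0.5) lo + q * (hi - lo)

lowCut  <- cutPoint(2, 4, q = 0.0)   # lower end of the gap
midCut  <- cutPoint(2, 4, q = 0.5)   # midpoint
highCut <- cutPoint(2, 4, q = 1.0)   # upper end of the gap
```

Under this reading, a per-predictor vector such as rep(0.5, ncol(x)) would place every cut at the midpoint.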

thinLeaves

bypasses creation of leaf state in order to reduce memory footprint.

treeBlock

maximum number of trees to train during a single level (e.g., coprocessor computing).

verbose

indicates whether to output progress of training.

...

Not currently used.

Author

Mark Seligman at Suiji.

See Also

Rborist

Examples

if (FALSE) {
  # Regression example:
  nRow <- 5000
  x <- data.frame(replicate(6, rnorm(nRow)))
  y <- with(x, X1^2 + sin(X2) + X3 * X4) # courtesy of S. Welling.

  # Classification example:
  data(iris)

  # Generic invocation:
  rt <- rfTrain(y)


  # Causes 300 trees to be trained:
  rt <- rfTrain(y, nTree = 300)


  # Causes validation census to report class probabilities:
  rt <- rfTrain(iris[-5], iris[5], ctgCensus="prob")


  # Applies table-weighting to classification categories:
  rt <- rfTrain(iris[-5], iris[5], classWeight = "balance")


  # Weights first category twice as heavily as remaining two:
  rt <- rfTrain(iris[-5], iris[5], classWeight = c(2.0, 1.0, 1.0))


  # Does not split nodes when doing so yields less than a 2% gain in
  # information over the parent node:
  rt <- rfTrain(preFormat, sampler, y, minInfo = 0.02)


  # Does not split nodes representing fewer than 10 unique samples:
  rt <- rfTrain(preFormat, sampler, y, minNode = 10)


  # Trains a maximum of 20 levels:
  rt <- rfTrain(preFormat, sampler, y, nLevel = 20)


  # Trains, but does not perform subsequent validation:
  rt <- rfTrain(preFormat, sampler, y, noValidate = TRUE)


  # Chooses 500 rows (with replacement) to root each tree.
  rt <- rfTrain(preFormat, sampler, y, nSamp = 500)


  # Chooses 2 predictors as splitting candidates at each node (or
  # fewer, when choices exhausted):
  rt <- rfTrain(preFormat, sampler, y, predFixed = 2)


  # Causes each predictor to be selected as a splitting candidate with
  # distribution Bernoulli(0.3):
  rt <- rfTrain(preFormat, sampler, y, predProb = 0.3)


  # Causes first three predictors to be selected as splitting candidates
  # twice as often as the other three:
  rt <- rfTrain(preFormat, sampler, y, predWeight = c(2.0, 2.0, 2.0, 1.0, 1.0, 1.0))


  # Constrains modelled response to be increasing with respect to X1
  # and decreasing with respect to X5.
  rt <- rfTrain(preFormat, sampler, y, regMono = c(1.0, 0, 0, 0, -1.0, 0))


  # Suppresses creation of detailed leaf information needed for
  # quantile prediction and external tools.
  rt <- rfTrain(preFormat, sampler, y, thinLeaves = TRUE)

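  # Sets the split quantile to 0.0 for the first predictor, 1.0 for the
  # second and 0.5 for the remainder (R vectors are indexed from 1):
  spq <- rep(0.5, ncol(x))
  spq[1] <- 0.0
  spq[2] <- 1.0
  rt <- rfTrain(preFormat, sampler, y, splitQuant = spq)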
  }
