predict.arbTrain: predict method for arbTrain result

Description

Prediction and test using Rborist.

Usage

# S3 method for arbTrain
predict(object, newdata, sampler, yTest=NULL,
keyedFrame = FALSE, quantVec=NULL, quantiles = !is.null(quantVec),
ctgCensus = "votes", indexing = FALSE, trapUnobserved = FALSE,
bagging = FALSE, nThread = 0, verbose = FALSE, ...)

Value

an object of one of two classes:

SummaryReg summarizing regression, consisting of:
- prediction an object of class PredictReg consisting of:
  - yPred the estimated numerical response.
  - qPred quantiles of prediction, if requested.
  - qEst quantile of the estimate, if quantiles requested.
  - indices final index of prediction, if requested.
- validation if validation requested, an object of class ValidReg consisting of:
  - mse the mean-squared error of the estimate.
  - rsq the r-squared statistic of the estimate.
  - mae the mean absolute error of the estimate.
- importance if permutation importance requested, an object of class importanceReg, containing multiple instances of:
  - names the predictor names.
  - mse the per-predictor mean-squared error, under permutation.
SummaryCtg summarizing classification, consisting of:
- prediction an object of class PredictCtg consisting of:
  - yPred estimated categorical response.
  - census factor-valued matrix of the estimate, by category, if requested.
  - prob matrix of estimate probabilities, by category, if requested.
  - indices final index of prediction, if requested.
- validation if validation requested, an object of class ValidCtg consisting of:
  - confusion the confusion matrix.
  - misprediction the misprediction rate.
  - oobError the out-of-bag error.
- importance if permutation importance requested, an object of class importanceCtg, consisting of:
  - mispred the misprediction rate, by predictor.
  - oobErr the out-of-bag error, by predictor.
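
The validation statistics listed above are standard summaries. As a minimal base-R sketch, not the package's internal code, they could be computed from a test response and a prediction vector as follows. The helper names validReg and validCtg are hypothetical, and the r-squared definition used here (one minus the ratio of residual to total sum of squares) is an assumption about the variant reported.

```r
# Hypothetical helper: regression summaries from test and predicted responses.
validReg <- function(yTest, yPred) {
  resid <- yTest - yPred
  list(mse = mean(resid^2),                                       # mean-squared error
       rsq = 1 - sum(resid^2) / sum((yTest - mean(yTest))^2),     # assumed definition
       mae = mean(abs(resid)))                                    # mean absolute error
}

# Hypothetical helper: classification summaries from test and predicted factors.
validCtg <- function(yTest, yPred) {
  lv <- levels(yTest)
  confusion <- table(actual = yTest, predicted = factor(yPred, levels = lv))
  list(confusion = confusion,
       misprediction = 1 - sum(diag(confusion)) / sum(confusion))
}

reg <- validReg(c(1, 2, 3, 4), c(1.5, 2, 2.5, 4))
ctg <- validCtg(factor(c("a", "a", "b", "b")), factor(c("a", "b", "b", "b")))
```

Here reg holds the three regression statistics and ctg holds a 2 x 2 confusion matrix together with the fraction of mispredicted observations.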

Arguments

object: an object of class arbTrain, created from a previous invocation of the command rfArb, Rborist or rfTrain to train.
newdata: a design frame or matrix containing new data, with the same signature of predictors as in the training command.
sampler: an object of class Sampler used in the training command.
yTest: a response vector against which to test the new predictions.
keyedFrame: whether the columns of newdata may appear in arbitrary order or as a superset of the predictors used to train.
quantVec: a vector of quantiles to predict.
quantiles: whether to predict quantiles.
ctgCensus: whether/how to summarize per-category predictions. "votes" specifies the number of trees predicting a given class. "prob" specifies a normalized, probabilistic summary. "probSample" specifies sample-weighted probabilities, similar to quantile histogramming.
indexing: whether to record the final node index, typically terminal, of tree traversal.
trapUnobserved: whether to report the score of a nonterminal node upon encountering values not observed during training, such as missing data.
bagging: whether prediction is restricted to out-of-bag samples.
nThread: suggests an OpenMP-style thread count. Zero denotes the default processor setting.
verbose: whether to output progress of prediction.
...: not currently used.
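
The keyedFrame argument can be pictured as matching columns by name rather than by position. The following base-R sketch illustrates that idea only; it is not the package's implementation, and alignByName is a hypothetical helper introduced for illustration.

```r
# Hypothetical helper: select and order the columns of newdata to match the
# predictor names seen in training, ignoring any extra columns.
alignByName <- function(newdata, trainNames) {
  missingCols <- setdiff(trainNames, names(newdata))
  if (length(missingCols) > 0)
    stop("newdata lacks predictors: ", paste(missingCols, collapse = ", "))
  newdata[, trainNames, drop = FALSE]
}

trainNames <- names(iris)[1:4]
shuffled <- iris[c(5, 2, 4, 3, 1)]   # reordered columns, plus one extra (Species)
aligned <- alignByName(shuffled, trainNames)
```

After alignment, the columns of aligned appear in training order, so position-based prediction code would see the same layout as during training.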

Author

Mark Seligman at Suiji.

Examples

if (FALSE) {
  # Regression example:
  nRow <- 5000
  x <- data.frame(replicate(6, rnorm(nRow)))
  y <- with(x, X1^2 + sin(X2) + X3 * X4) # courtesy of S. Welling.

  pf <- preformat(x)
  sp <- presample(y)
  rb <- arbTrain(pf, sp, y)


  # Performs separate prediction on new data:
  xx <- data.frame(replicate(6, rnorm(nRow)))
  pred <- predict(rb, xx)
  yPred <- pred$yPred

  rb <- Rborist(x,y)

  # Performs separate prediction on new data:
  xx <- data.frame(replicate(6, rnorm(nRow)))
  pred <- predict(rb, xx)
  yPred <- pred$yPred

  # As above, but also records final indices of each tree walk:
  #
  pred <- predict(rb, xx, indexing=TRUE)
  print(pred$indices[c(1:2), ])


  # As above, but predicts over newdata with unobserved values.
  # In the case of numerical data, only missing values are considered
  # unobserved.  Missing values are encoded as NaN, which is
  # incomparable, so every test evaluates to false.  Prediction
  # therefore takes the false branch when encountering missing
  # values:
  #
  xxMissing <- xx
  # Rows 15, 32, 87 and 101 of the sixth predictor are set to missing:
  xxMissing[c(15, 32, 87, 101), 6] <- NA
  pred <- predict(rb, xxMissing)
  

  # As above, but returns a nonterminal score upon encountering
  # unobserved values. Neither the true nor the false branch from the
  # testing node is taken.  Instead, the score returned is derived
  # from all leaf nodes (terminals) reached by the testing
  # (nonterminal) node.
  #
  pred <- predict(rb, xxMissing, trapUnobserved = TRUE)


  # Performs separate prediction, using original response as test
  # vector:
  pred <- predict(rb, xx, y)
  mse <- pred$mse
  rsq <- pred$rsq


  # Performs separate prediction with (default) quantiles:
  pred <- predict(rb, xx, quantiles = TRUE)
  qPred <- pred$qPred


  # Performs separate prediction with deciles:
  pred <- predict(rb, xx, quantVec = seq(0.1, 1.0, by = 0.10))
  qPred <- pred$qPred


  # Classification examples:
  data(iris)
  rb <- Rborist(iris[-5], iris[5])


  # Generic prediction using training set.
  # Census as (default) votes:
  pred <- predict(rb, iris[-5])
  yPred <- pred$yPred
  census <- pred$census

  # Using the keyedFrame option allows the columns of
  # newdata to appear in arbitrary order, so long as the
  # columns present during training appear as a subset:
  #
  pred <- predict(rb, iris[c(2, 4, 3, 1)], keyedFrame=TRUE)


  # As above, but with the census reporting class probabilities:
  pred <- predict(rb, iris[-5], ctgCensus="prob")
  prob <- pred$prob


  # As above, but with training response as test vector:
  pred <- predict(rb, iris[-5], iris[5], ctgCensus = "prob")
  prob <- pred$prob
  conf <- pred$confusion
  misPred <- pred$misPred

  # As above, but predicts nonterminal when encountering categories
  # not observed during training.  That is, prediction returns a score
  # derived from all terminal nodes (leaves) reached from the
  # (nonterminal) testing node.
  #
  # In this case, "unobserved" refers to categories not present in
  # the subpartition over which a splitting is performed.  As training
  # partitions the data into smaller and smaller regions, a given
  # category becomes less likely to appear in a region.
  #
  # More generally, unobserved data can include missing predictors as
  # well as categories appearing in newdata which were not
  # present during training.
  #
  pred <- predict(rb, iris[-5], trapUnobserved = TRUE)
}
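
The classification examples above request the census either as votes or as probabilities. As a rough base-R illustration of how the two summaries relate, the sketch below row-normalizes a toy vote matrix. The data are made up for illustration, and this simple normalization is an assumption: the package may weight or normalize differently, particularly under "probSample".

```r
# Toy census: rows are observations, columns are categories, and each entry
# counts the trees voting for that category (five trees in total).
votes <- matrix(c(4, 1, 0,
                  1, 1, 3),
                nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("setosa", "versicolor", "virginica")))

# Row-normalize so that each observation's entries sum to one.
prob <- votes / rowSums(votes)

# Predicted category: the column receiving the most votes.
yPred <- colnames(votes)[max.col(votes, ties.method = "first")]
```

Under this reading, a "votes" census and a "prob" census carry the same ranking of categories for each observation and differ only in scale.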
