varimp: Variable Importance

Description

Standard and conditional variable importance for `cforest', following the permutation principle of the `mean decrease in accuracy' importance in `randomForest'.

Usage

# S3 method for constparty
varimp(object, nperm = 1L, 
       risk = c("loglik", "misclassification"), conditions = NULL,
       mincriterion = 0, ...)
# S3 method for cforest
varimp(object, nperm = 1L, 
       OOB = TRUE, risk = c("loglik", "misclassification"),
       conditional = FALSE, threshold = .2, applyfun = NULL, 
       cores = NULL, ...)

Arguments

object

an object as returned by cforest.

mincriterion

the value of the test statistic or 1 - p-value that must be exceeded in order to include a split in the computation of the importance. The default mincriterion = 0 guarantees that all splits are included.

conditional

a logical determining whether unconditional or conditional computation of the importance is performed.

threshold

the value of the test statistic or 1 - p-value of the association between the variable of interest and a covariate that must be exceeded inorder to include the covariate in the conditioning scheme for the variable of interest (only relevant if conditional = TRUE).

nperm

the number of permutations performed.

OOB

a logical determining whether the importance is computed from the out-of-bag sample or the learning sample (not suggested).

risk

a character determining the risk to be evaluated.

conditions

a list of conditions.

applyfun

an optional lapply-style function with arguments function(X, FUN, …). It is used for computing the variable importances for each tree. The default is to use the basic lapply function unless the cores argument is specified (see below). Extra care is needed to ensure correct seeds are used in the parallel runs (RNGkind("L'Ecuyer-CMRG") for example).

cores

numeric. If set to an integer the applyfun is set to mclapply with the desired number of cores.

…

additional arguments, not used.

Value

A vector of `mean decrease in accuracy' importance scores.

Details

NEEDS UPDATE

Function varimp can be used to compute variable importance measures similar to those computed by importance. Besides the standard version, a conditional version is available, that adjusts for correlations between predictor variables.

If conditional = TRUE, the importance of each variable is computed by permuting within a grid defined by the covariates that are associated (with 1 - p-value greater than threshold) to the variable of interest. The resulting variable importance score is conditional in the sense of beta coefficients in regression models, but represents the effect of a variable in both main effects and interactions. See Strobl et al. (2008) for details.

Note, however, that all random forest results are subject to random variation. Thus, before interpreting the importance ranking, check whether the same ranking is achieved with a different random seed -- or otherwise increase the number of trees ntree in ctree_control.

Note that in the presence of missings in the predictor variables the procedure described in Hapfelmeier et al. (2012) is performed.

References

Leo Breiman (2001). Random Forests. Machine Learning, 45(1), 5--32.

Alexander Hapfelmeier, Torsten Hothorn, Kurt Ulm, and Carolin Strobl (2012). A New Variable Importance Measure for Random Forests with Missing Data. Statistics and Computing, http://dx.doi.org/10.1007/s11222-012-9349-1

Torsten Hothorn, Kurt Hornik, and Achim Zeileis (2006b). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15 (3), 651-674. Preprint available from http://statmath.wu-wien.ac.at/~zeileis/papers/Hothorn+Hornik+Zeileis-2006.pdf

Carolin Strobl, Anne-Laure Boulesteix, Thomas Kneib, Thomas Augustin, and Achim Zeileis (2008). Conditional Variable Importance for Random Forests. BMC Bioinformatics, 9, 307. http://www.biomedcentral.com/1471-2105/9/307

Examples

Run this code

# NOT RUN {
    
   set.seed(290875)
   data("readingSkills", package = "party")
   readingSkills.cf <- cforest(score ~ ., data = readingSkills, 
                               mtry = 2, ntree = 50)

   # standard importance
   varimp(readingSkills.cf)

   # conditional importance, may take a while...
   varimp(readingSkills.cf, conditional = TRUE)

# }

Run the code above in your browser using DataLab