
PredPsych (version 0.4)

DTModel: Generic Decision Tree Function

Description

A simple function to create decision trees.

Usage

DTModel(Data, classCol, selectedCols, tree, cvType, nTrainFolds,
  ntrainTestFolds, modelTrainFolds, foldSep, cvFraction,
  extendedResults = FALSE, SetSeed = TRUE, silent = FALSE,
  NewData = NULL, ...)

Arguments

Data

(dataframe) a data frame with regressors and response

classCol

(numeric or string) which column should be used as the response column

selectedCols

(optional) (numeric or string) which columns should be treated as data (features + response) (defaults to all columns)

tree

which decision tree model to implement; one of the following values:

  • CART = Classification And Regression Tree;

  • CARTNACV = cross-validated CART tree, removing missing values;

  • CARTCV = cross-validated CART tree, keeping missing values;

  • CF = conditional inference framework tree;

  • RF = Random Forest (an ensemble of trees);

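The practical difference between CARTNACV and CARTCV is how incomplete rows are treated. The following is a minimal sketch using rpart directly, not DTModel itself; it assumes that "removing missing values" means dropping incomplete rows (na.omit) and that keeping them means relying on rpart's default na.rpart handling. The iris data and the injected missing values are purely illustrative.

```r
# Sketch (assumed CARTNACV vs CARTCV behaviour): drop incomplete rows vs keep them
library(rpart)

set.seed(42)
irisNA <- iris
# Illustrative only: blank out 20 randomly chosen Sepal.Width values
irisNA$Sepal.Width[sample(nrow(irisNA), 20)] <- NA

# Remove incomplete rows before fitting
fitDrop <- rpart(Species ~ ., data = irisNA, method = "class", na.action = na.omit)
# Keep incomplete rows: rpart's default na.rpart only drops rows whose
# response, or all of whose predictors, are missing
fitKeep <- rpart(Species ~ ., data = irisNA, method = "class")

# Number of observations reaching the root node of each tree
cat("rows used with na.omit :", fitDrop$frame$n[1], "\n")
cat("rows used with na.rpart:", fitKeep$frame$n[1], "\n")
```

Dropping the 20 incomplete rows leaves 130 observations for fitting, while the default handling keeps all 150.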
cvType

(optional) (string) which cross-validation scheme to follow (used only with CARTCV or CARTNACV); one of the following values:

  • folds = (default) k-fold cross-validation

  • LOSO = Leave-one-subject-out cross-validation

  • holdout = holdout cross-validation: only a portion of the data (cvFraction) is used for training

  • LOTO = leave-one-trial-out cross-validation
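For the default cvType = "folds", the k-fold idea can be sketched with rpart on the built-in iris data. This is not PredPsych's implementation: k = 5 and the single-level split (no further division of the training set as nTrainFolds would do) are simplifying assumptions.

```r
# Minimal k-fold cross-validation sketch with rpart (not DTModel's internal code)
library(rpart)

set.seed(1)                         # reproducible fold assignment
k <- 5
# Randomly assign each row to one of k equally sized folds
folds <- sample(rep(seq_len(k), length.out = nrow(iris)))

foldAcc <- numeric(k)
for (i in seq_len(k)) {
  train <- iris[folds != i, ]       # k - 1 folds for training
  test  <- iris[folds == i, ]       # the remaining fold for testing
  fit   <- rpart(Species ~ ., data = train, method = "class")
  pred  <- predict(fit, newdata = test, type = "class")
  foldAcc[i] <- mean(pred == test$Species)
}
print(round(foldAcc, 3))
cat("Mean cross-validated accuracy:", mean(foldAcc), "\n")
```

Each row is used for testing exactly once, so the mean of the fold accuracies estimates out-of-sample accuracy.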

nTrainFolds

(optional) (k-fold cross-validation only) number of folds into which the training dataset is further divided

ntrainTestFolds

(optional) (k-fold cross-validation only) number of folds for the training and testing datasets

modelTrainFolds

(optional) (k-fold cross-validation only) specific folds from the first train/test split (ntrainTestFolds) to use for training

foldSep

(numeric) (leave-one-subject-out cross-validation only) column number used to separate the folds; mandatory for leave-one-subject-out cross-validation.
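Leave-one-subject-out cross-validation trains on all subjects but one and tests on the one left out, repeating this for every subject. The sketch below uses rpart directly; the subject column is a hypothetical identifier added to iris so that it plays the role of the column passed via foldSep, and it is excluded from the features.

```r
# Minimal leave-one-subject-out (LOSO) sketch with rpart (not DTModel's internal code)
library(rpart)

df <- iris
# Hypothetical subject IDs: 10 "subjects", 15 rows each (5 per species)
df$subject <- rep(1:10, length.out = nrow(df))
features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

subjAcc <- sapply(sort(unique(df$subject)), function(s) {
  train <- df[df$subject != s, c(features, "Species")]  # all other subjects
  test  <- df[df$subject == s, c(features, "Species")]  # the left-out subject
  fit   <- rpart(Species ~ ., data = train, method = "class")
  mean(predict(fit, newdata = test, type = "class") == test$Species)
})
print(round(subjAcc, 3))
cat("Mean LOSO accuracy:", mean(subjAcc), "\n")
```

Splitting by subject rather than by row keeps all of one subject's data out of training, which avoids overly optimistic accuracy when rows from the same subject are correlated.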

cvFraction

(optional) (numeric) fraction of the data to keep for training
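Holdout cross-validation with cvFraction can be sketched as a single random train/test split. This is a minimal example using rpart on iris, not DTModel's own code; cvFraction = 0.8 matches the default reported in the example output below, and accTest simply borrows the name used in the Value section.

```r
# Minimal holdout cross-validation sketch with rpart (not DTModel's internal code)
library(rpart)

set.seed(123)
cvFraction <- 0.8                                    # fraction kept for training
trainIdx <- sample(nrow(iris), size = floor(cvFraction * nrow(iris)))
train <- iris[trainIdx, ]
test  <- iris[-trainIdx, ]                           # remaining 20% for testing

fit     <- rpart(Species ~ ., data = train, method = "class")
accTest <- mean(predict(fit, newdata = test, type = "class") == test$Species)
cat("Test/train ratio:", nrow(test) / nrow(train), "\n")
cat("Test holdout accuracy:", round(accTest, 2), "\n")
```

With 150 rows this gives 120 training and 30 test rows, a test/train ratio of 30/120 = 0.25.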

extendedResults

(optional) (logical) return extended results, including the model and other metrics

SetSeed

(optional) (logical) whether to seed the random number generator so that results are consistent across runs

silent

(optional) (logical) whether to print messages or not

NewData

(optional) (dataframe) new data frame of features for which class membership is requested
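Requesting class membership for new feature rows amounts to predicting from a fitted tree. The sketch below shows this with rpart's own predict method; it does not claim to reproduce how DTModel processes NewData, and the two new observations are made-up feature values.

```r
# Sketch: predicting class membership for new observations with an rpart tree
library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class")

# Hypothetical new observations: features only, no response column
newObs <- data.frame(Sepal.Length = c(5.0, 6.5),
                     Sepal.Width  = c(3.4, 3.0),
                     Petal.Length = c(1.5, 5.5),
                     Petal.Width  = c(0.2, 2.0))

predClass <- predict(fit, newdata = newObs, type = "class")  # one label per row
predProb  <- predict(fit, newdata = newObs, type = "prob")   # class probabilities
print(predClass)
print(predProb)
```

type = "class" returns the most likely label, while type = "prob" returns the class proportions of the leaf each new row falls into.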

...

(optional) additional arguments for the function

Value

Depending on tree, either the model result (Results) or the test accuracy (accTest). If extendedResults = TRUE, the output also contains the test accuracy of discrimination (accTest), the confusion matrices (ConfMatrix), the fitted model (fit), and the overall cross-validated confusion matrix results (ConfusionMatrixResults).
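An overall cross-validated confusion matrix can be formed from per-fold matrices. The sketch below is an assumption about how such a summary may be built (summing fold-level counts) and uses rpart on iris rather than DTModel; the names ConfMatrix and accTest only mirror the Value section.

```r
# Sketch: per-fold confusion matrices pooled into one overall matrix (assumed approach)
library(rpart)

set.seed(1)
k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(iris)))

ConfMatrix <- vector("list", k)                      # one matrix per fold
for (i in seq_len(k)) {
  fit  <- rpart(Species ~ ., data = iris[folds != i, ], method = "class")
  test <- iris[folds == i, ]
  pred <- predict(fit, newdata = test, type = "class")
  # Both factors carry all three levels, so every table is 3 x 3
  ConfMatrix[[i]] <- table(Predicted = pred, Actual = test$Species)
}

overall <- Reduce(`+`, ConfMatrix)                   # sum counts across folds
accTest <- sum(diag(overall)) / sum(overall)         # correct / total
print(overall)
print(accTest)
```

Because every row is tested exactly once, the pooled matrix counts all 150 observations, and its diagonal gives the overall number of correct predictions.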

Details

The function implements decision tree models (DT models). DT models belong to the broader family of tree-based methods, which generate a recursive binary tree (Hastie et al., 2009). As input, DT models can handle both continuous and categorical variables as well as missing data. From the input data, DT models build a set of logical "if ... then" rules that permit accurate prediction of the input cases.
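The "if ... then" rules of a fitted tree can be read off directly. This minimal sketch fits rpart to iris and uses rpart's path.rpart() to list the chain of split conditions leading to each leaf, together with the class that leaf predicts; it illustrates the idea rather than anything DTModel prints.

```r
# Sketch: extracting "IF ... THEN" rules from an rpart classification tree
library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class")

# Leaf nodes are the rows of fit$frame whose split variable is "<leaf>"
isLeaf  <- fit$frame$var == "<leaf>"
leaves  <- as.numeric(rownames(fit$frame)[isLeaf])
# Predicted class label for each leaf (yval is an index into the class levels)
classes <- attr(fit, "ylevels")[fit$frame$yval[isLeaf]]

# path.rpart() returns the split conditions leading to each requested node
paths <- path.rpart(fit, nodes = leaves, print.it = FALSE)
for (i in seq_along(paths)) {
  conds <- paste(paths[[i]][-1], collapse = " AND ")  # drop the leading "root"
  cat("IF", conds, "THEN", classes[i], "\n")
}
```

Each printed line is one rule; a new case is classified by the single rule whose conditions it satisfies.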

The function rpart handles missing data by using surrogate splits instead of removing incomplete cases entirely (Therneau & Atkinson, 1997). This can be useful when the data contain many missing values.
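Surrogate splits let rows with a missing value in the primary split variable still be routed down the tree using a correlated variable. The sketch below fits rpart to iris after blanking out some Petal.Length values (illustrative only); usesurrogate = 2 and maxsurrogate = 5 are rpart's defaults, written out here for visibility.

```r
# Sketch: rpart keeps incomplete rows and routes them with surrogate splits
library(rpart)

set.seed(42)
irisNA <- iris
# Illustrative only: 30 randomly chosen Petal.Length values set to NA
irisNA$Petal.Length[sample(nrow(irisNA), 30)] <- NA

fit <- rpart(Species ~ ., data = irisNA, method = "class",
             control = rpart.control(usesurrogate = 2, maxsurrogate = 5))

# predict.rpart passes NAs through and uses surrogates to place those rows
pred <- predict(fit, newdata = irisNA, type = "class")

summary(fit)   # lists the primary and surrogate splits at each node
```

All 150 rows are used for fitting and each receives a prediction, including the 30 with a missing Petal.Length.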

Unlike regression methods such as GLMs, decision trees are more flexible and can model nonlinear interactions without those interactions having to be specified in advance.
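A small simulation makes the contrast concrete. In the hypothetical data below, class "A" occurs only when x1 > 0 and |x2| < 0.5, a nonlinear interaction. An additive logistic GLM (stats::glm) without interaction terms cannot represent this region, whereas an rpart tree recovers it with a few axis-aligned splits. The data-generating rule is an assumption chosen for illustration.

```r
# Sketch: additive logistic GLM vs classification tree on an interaction pattern
library(rpart)

set.seed(7)
# Hypothetical data: class "A" only when x1 > 0 AND |x2| < 0.5
makeData <- function(n) {
  x1 <- runif(n, -1, 1)
  x2 <- runif(n, -1, 1)
  y  <- factor(ifelse(x1 > 0 & abs(x2) < 0.5, "A", "B"), levels = c("A", "B"))
  data.frame(x1, x2, y)
}
train <- makeData(400)
test  <- makeData(400)

# Logistic GLM, linear in x1 and x2, no interaction term
glmFit  <- glm(y ~ x1 + x2, data = train, family = binomial)
# For a binomial glm, predicted probabilities refer to the second level ("B")
glmPred <- ifelse(predict(glmFit, newdata = test, type = "response") > 0.5, "B", "A")
glmAcc  <- mean(glmPred == test$y)

treeFit  <- rpart(y ~ x1 + x2, data = train, method = "class")
treePred <- predict(treeFit, newdata = test, type = "class")
treeAcc  <- mean(treePred == test$y)

cat("GLM test accuracy :", round(glmAcc, 3), "\n")
cat("Tree test accuracy:", round(treeAcc, 3), "\n")
```

The GLM could match the tree only if suitable nonlinear and interaction terms were added by hand; the tree discovers the relevant splits from the data.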

References

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer Series in Statistics (2nd ed., Vol. 1). New York, NY: Springer New York.

Therneau, T., Atkinson, B., & Ripley, B. (2015). rpart: Recursive Partitioning and Regression Trees. R package version 4.1-10. https://CRAN.R-project.org/package=rpart

Therneau, T. M., & Atkinson, E. J. (1997). An introduction to recursive partitioning using the RPART routines (Vol. 61, p. 452). Mayo Foundation: Technical report.

Examples

# NOT RUN {
# generate a cross-validated CART model using holdout cross-validation
# (cvFraction defaults to 0.8, i.e. 80% of the data are used for training)
model <- DTModel(Data = KinData, classCol = 1,
                 selectedCols = c(1, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112),
                 tree = "CARTCV", cvType = "holdout")
# Output:
# Performing Decision Tree Analysis 
#
# [1] "Generating crossvalidated Tree With Missing Values"
#
# Performing holdout Cross-validation
# 
# cvFraction was not specified,
#  Using default value of 0.8 (cvFraction = 0.8)" 
# Proportion of Test/Train Data was :  0.2470588 
# 
# [1] "Test holdout Accuracy is  0.62"
# holdout CART Analysis: 
# cvFraction : 0.8 
# Test Accuracy 0.62
# *Legend:
# cvFraction = Fraction of data to keep for training data 
# Test Accuracy = Accuracy from the Testing dataset

# -- CART model --

# Alternative uses:
# k-fold cross-validation, removing missing values
model <- DTModel(Data = KinData, classCol = 1,
                 selectedCols = c(1, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112),
                 tree = "CARTNACV", cvType = "folds")

# holdout cross-validation, keeping missing values
model <- DTModel(Data = KinData, classCol = 1,
                 selectedCols = c(1, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112),
                 tree = "CARTCV", cvType = "holdout")

# k-fold cross-validation, keeping missing values
model <- DTModel(Data = KinData, classCol = 1,
                 selectedCols = c(1, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112),
                 tree = "CARTCV", cvType = "folds")

# }
