
TDMR (version 2.2)

tdmClassifyLoop: Core classification double loop returning a TDMclassifier object.

Description

tdmClassifyLoop contains a double loop (over opts$NRUN runs and over CV folds) and calls tdmClassify. It is called by all classification R-functions main_*. If tset is NULL, it splits the data in dset into training and validation data according to opts$TST.kind. It returns an object of class TDMclassifier.

Usage

tdmClassifyLoop(dset, response.variables, input.variables, opts, tset = NULL)

Arguments

dset

the data frame containing training and validation data.

response.variables

name of column which carries the target variable - or - vector of names specifying multiple target columns (these columns are not used during prediction, only for evaluation)

input.variables

vector with names of input columns

opts

a list from which we need here the following entries

NRUN

number of runs (outer loop)

TST.SEED

If NULL, a new random number seed is obtained with tdmRandomSeed. If set to any value, the random number seed is set to this value, which gives reproducible random numbers and thus a reproducible selection of training and test sets. Only relevant if TST.kind is "cv" or "rand". See also MOD.SEED in tdmClassify.

TST.kind

how to create cvi, handed over to tdmModCreateCVindex. If TST.kind="col", then cvi is taken from dset[,opts$TST.col].

GD.RESTART

[TRUE] If TRUE, restart the graphic devices; if FALSE, do not.

GD.DEVICE

["non"|"win"|"pdf"|"png"]
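The interplay of TST.SEED and TST.kind can be sketched in base R. This is a minimal illustration, not the code of tdmModCreateCVindex: the helper make_cvi and its arguments vali_frac and nfold are hypothetical, and the encoding of the index vector (0 = training row, 1..k = validation fold) is an assumption made for this example. For TST.kind="col" the index would instead be read from a column of dset, as described above.

```r
# Hypothetical helper, not a TDMR function: build a fold-index vector for n rows.
# Assumed encoding: 0 = training row, 1..k = validation fold number.
make_cvi <- function(n, kind = c("rand", "cv"), vali_frac = 0.2, nfold = 3,
                     seed = NULL) {
  kind <- match.arg(kind)
  if (!is.null(seed)) set.seed(seed)          # fixed seed -> reproducible split
  if (kind == "rand") {
    cvi <- integer(n)                         # all rows start as training (0)
    cvi[sample.int(n, round(vali_frac * n))] <- 1L
  } else {
    cvi <- sample(rep_len(seq_len(nfold), n)) # shuffled fold labels 1..nfold
  }
  cvi
}

cvi_a <- make_cvi(nrow(iris), "rand", seed = 5)
cvi_b <- make_cvi(nrow(iris), "rand", seed = 5)
identical(cvi_a, cvi_b)                       # TRUE: same seed, same split

table(make_cvi(nrow(iris), "cv", nfold = 3, seed = 5))  # 50 rows per fold
```

Calling the helper with seed = NULL leaves the random number generator untouched, so repeated calls then give different splits.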

tset

[NULL] If not NULL, this is the test data set. If NULL, we are in tuning and the validation data set is built from dset according to the procedure prescribed in opts$TST.*.
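A possible way to prepare dset and tset is sketched below in base R. The 20% hold-out size and the seed are assumptions chosen for illustration, and the call to tdmClassifyLoop is shown only as a comment because it needs TDMR and an opts list such as the one built in the Examples section.

```r
data(iris)
set.seed(5)                                      # reproducible hold-out
test_rows <- sample.int(nrow(iris), size = 30)   # assumed 20% hold-out

dset <- iris[-test_rows, ]   # training and validation data
tset <- iris[test_rows, ]    # separate test data

response.variables <- "Species"
input.variables <- setdiff(names(iris), response.variables)

# With TDMR loaded and opts set up as in the Examples section, the call would be:
# result <- tdmClassifyLoop(dset, response.variables, input.variables, opts, tset = tset)
# result$predictTest would then hold the predictions on tset.
```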

Value

result, an object of class TDMclassifier. This is a list of results containing:

lastRes

last run, last fold: result from tdmClassify

C_train

classification error on training set

G_train

gain on training set

R_train

relative gain on training set (percentage of max. gain on this set)

*_vali

--- similar, with vali set instead of training set ---

*_vali2

--- similar, with vali2 set instead of training set ---

Err

a data frame with as many rows as opts$NRUN and 9 columns corresponding to the nine variables described above

predictions

last run: data frame with dimensions [nrow(dset),length(response.variables)]. In case of CV, all CV predictions (for each record in dset), in other cases mixed validation / train set predictions.

predictTest

predictions on the test set tset (NULL if tset is NULL)

predProbList

a list, predProbList[[i]] has the prediction probabilities of the ith run. See info on predProb in tdmClassify.

Each performance measure C_*, G_*, R_* is a vector of length opts$NRUN. To be specific, C_train[i] is the classification error on the training set from the i-th run. This error is mean(res$allEVAL$cerr.trn), i.e. the mean of the classification errors from all response variables when res is the return value of tdmClassify. In the case of cross validation, for each performance measure an additional averaging over all folds is done.
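The averaging described above can be illustrated with a small base R sketch. The error values are made up for this example and are not TDMR output; the object cerr_trn and the response names y1 and y2 are hypothetical. Each matrix stands for one run, with one row per CV fold and one column per response variable.

```r
# Made-up error values for illustration only (not TDMR output):
# one matrix per run, rows = CV folds, columns = response variables.
cerr_trn <- list(
  run1 = matrix(c(0.02, 0.04, 0.03,    # response "y1", folds 1..3
                  0.05, 0.03, 0.04),   # response "y2", folds 1..3
                ncol = 2, dimnames = list(NULL, c("y1", "y2"))),
  run2 = matrix(c(0.01, 0.03, 0.02,
                  0.03, 0.05, 0.04),
                ncol = 2, dimnames = list(NULL, c("y1", "y2")))
)

# Per fold: mean over the response variables; per run: mean over the folds.
C_train <- vapply(cerr_trn, function(m) mean(rowMeans(m)), numeric(1))
C_train   # one entry per run, so its length equals the number of runs
```

Here C_train has two entries, 0.035 for the first run and 0.03 for the second, mirroring a vector of length opts$NRUN = 2.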

See Also

print.TDMclassifier, tdmClassify, tdmRegress, tdmRegressLoop

Examples

# NOT RUN {
#*# --------- demo/demo00-0classif.r ---------
#*# This demo shows a simple data mining process (phase 1 of TDMR) for classification on
#*# dataset iris.
#*# The data mining process in tdmClassifyLoop calls randomForest as the prediction model.
#*# It is called opts$NRUN=2 times with different random train-validation set splits.
#*# Therefore data frame result$Err has two rows.
#*#
opts=tdmOptsDefaultsSet()                       # set all defaults for data mining process
opts$TST.SEED <- opts$MOD.SEED <- 5             # reproducible results
#opts$VERBOSE <- opts$SRF.verbose <- 0          # no printed output
gdObj <- tdmGraAndLogInitialize(opts);          # init graphics and log file

data(iris)
response.variables="Species"                    # names, not data (!)
input.variables=setdiff(names(iris),"Species")

result = tdmClassifyLoop(iris,response.variables,input.variables,opts)

print(result$Err)
# }
