Depending on the value of member TST.kind in list opts, the returned index cvi is one of the following:
TST.kind="cv": a random cross validation index P([111...222...333...]) - or -
TST.kind="rand": a random index P([00...11...-1-1...]) with training (0), validation (1) and disregard (-1) cases - or -
TST.kind="col": the index is taken from column dset[,opts$TST.COL], which contains the division into training (0), validation (1) and disregard (<0) records.
Here P(.) denotes a random permutation of the sequence. The disregard set is optional, i.e. cvi may contain only 0 and 1, if desired.
Special case TST.kind="cv" and TST.NFOLD=1: make *every* record a training record, i.e. index [000...].
In case TST.kind="rand" and stratified=TRUE, a stratified sample is drawn: the strata in the training set reflect the relative frequency of each level of the **1st** response variable, and each stratum is ensured to be at least of size 1.
In summary, TST.kind="cv" means cross validation (TST.NFOLD models are built with TST.NFOLD different training-validation data sets), while TST.kind="rand" or "col" means that one model is built with a random ("rand") or user-defined ("col") training-validation split.
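The "cv" and "rand" index layouts described above can be illustrated in base R. This is only a minimal sketch under stated assumptions, not the package's implementation: the helper name sketch_cvindex and its arguments n, nfold, valiFrac and trnFrac are hypothetical, and the rounding of the set sizes is an assumption.

```r
# Hypothetical helper (NOT part of the package): builds an index vector of
# length n following the "cv" and "rand" layouts described above.
sketch_cvindex <- function(n, kind = c("cv", "rand"), nfold = 3,
                           valiFrac = 0.2, trnFrac = 1 - valiFrac) {
  kind <- match.arg(kind)
  if (kind == "cv") {
    # Special case: a single fold makes every record a training record
    if (nfold == 1) return(rep(0, n))
    # Fold labels [111...222...333...] ...
    labels <- sort(rep(seq_len(nfold), length.out = n))
  } else {
    # Assumption: set sizes are obtained by rounding the fractions
    n_vali <- round(valiFrac * n)
    n_trn  <- min(round(trnFrac * n), n - n_vali)
    # Labels [00...11...-1-1...]; the disregard part may be empty
    labels <- c(rep(0, n_trn), rep(1, n_vali), rep(-1, n - n_trn - n_vali))
  }
  # ... followed by the random permutation P(.)
  labels[sample.int(length(labels))]
}

set.seed(1)
cvi_cv   <- sketch_cvindex(10, "cv", nfold = 3)
cvi_rand <- sketch_cvindex(10, "rand", valiFrac = 0.2, trnFrac = 0.5)
```

With trnFrac + valiFrac < 1, the remaining records receive the value -1 and form the optional disregard set.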
tdmModCreateCVindex(dset, response.variables, opts, stratified = FALSE)
dset: the data frame for which cvi is needed
response.variables: issue a warning if length(response.variables)>1. Use the first response variable for determining strata size.
opts: a list from which we need here the following entries:
TST.kind: ["cv"|"rand"|"col"]
TST.NFOLD: number of CV folds (only relevant in case TST.kind=="cv")
TST.COL: column of dset containing the (0/1/<0) index (only relevant in case TST.kind=="col") or NULL if no such column exists
TST.valiFrac: fraction of records to set aside for validation (only relevant in case TST.kind=="rand")
TST.trnFrac: [1-opts$TST.valiFrac] fraction of records to use for training (only relevant in case TST.kind=="rand")
stratified: [FALSE] do stratified sampling for TST.kind="rand" with at least one training record for each level of the response variable (classification)
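A stratified draw with at least one training record per level could look roughly as follows. This is a hedged sketch only: the helper sketch_stratified_index is hypothetical, it omits the disregard set for brevity, and the per-level rounding is an assumption rather than the package's documented rule.

```r
# Hypothetical helper (NOT part of the package): stratified 0/1 index for a
# classification response y. Each level contributes about trnFrac of its
# records to the training set, but never fewer than one.
sketch_stratified_index <- function(y, trnFrac = 0.8) {
  y <- as.factor(y)
  cvi <- rep(1, length(y))              # start with all records in validation
  for (lev in levels(y)) {
    idx <- which(y == lev)              # records belonging to this stratum
    if (length(idx) == 0) next          # skip unused factor levels
    n_trn <- max(1, round(trnFrac * length(idx)))
    trn <- idx[sample.int(length(idx), n_trn)]
    cvi[trn] <- 0                       # mark the drawn records as training
  }
  cvi
}

set.seed(1)
y <- c(rep("a", 10), rep("b", 5), "c")
cvi_strat <- sketch_stratified_index(y, trnFrac = 0.8)
```

Note that a level with a single record, such as "c" here, ends up entirely in the training set, so it has no validation record.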
cvi: training-validation-set index vector (0: training, >0: validation); all records with cvi<0, e.g. from column TST.COL, are disregarded
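How the returned index might be used to split a data frame is sketched below. The helper sketch_split is hypothetical, and the fold convention for "cv" (records with cvi equal to the fold number are validated, all other non-negative records are trained on) is an assumption, not something stated on this page.

```r
# Hypothetical helper (NOT part of the package): split dset according to cvi.
# fold = NULL  : "rand"/"col" style index (0 = training, >0 = validation)
# fold = k     : "cv" style index, fold k is the validation set (assumption)
sketch_split <- function(dset, cvi, fold = NULL) {
  stopifnot(length(cvi) == nrow(dset))
  keep <- cvi >= 0                      # records with cvi < 0 are disregarded
  if (is.null(fold)) {
    list(train = dset[keep & cvi == 0, , drop = FALSE],
         vali  = dset[keep & cvi > 0, , drop = FALSE])
  } else {
    list(train = dset[keep & cvi != fold, , drop = FALSE],
         vali  = dset[keep & cvi == fold, , drop = FALSE])
  }
}

dset <- data.frame(x = 1:6)
s_rand <- sketch_split(dset, cvi = c(0, 1, -1, 0, 1, 0))
s_cv   <- sketch_split(dset, cvi = c(1, 2, 3, 1, 2, 3), fold = 2)
```

For cross validation, such a split would be repeated once per fold number 1,...,TST.NFOLD.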