unbiasedRun: Perform unbiased runs with best-solution parameters.

Description

Read the best solution of a parameter-tuning run from envT$bst and execute the function tdm$mainFunc (usually a classification or regression machine learning task) with these best parameters, to check whether the result quality is reproducible on independent test data or with independently trained models.

Usage

unbiasedRun(
  confFile,
  envT,
  dataObj = NULL,
  umode = "RSUB",
  withParams = FALSE,
  tdm = NULL
)

Arguments

confFile

the configuration name, e.g. "appAcid_02.conf"

envT

an environment from which the following objects are used:

bst: data frame containing best results (merged over repeats)
res: data frame containing all results
theTuner: ["spot"] a string naming the tuning method
spotConfig: [NULL] a list with SPOT settings. If NULL, try to read spotConfig from confFile.
finals: [NULL] a one-row data frame to which new columns with final results are added

dataObj

[NULL] the pre-fetched data, containing a training-set part and a test-set part. If NULL, it is set to tdmReadAndSplit(opts,tdm). Passing dataObj==NULL is now deprecated.

umode

Deprecated as an argument to unbiasedRun; use instead the division provided in dataObj = tdmReadAndSplit(opts,tdm), which makes use of tdm$umode. Kept for downward compatibility only (if dataObj==NULL): one of [ "RSUB" (default) | "CV" | "TST" | "SP_T" ], specifying how to divide the data into training and test data for the unbiased runs:

"RSUB": random subsampling into a fraction (1-tdm$TST.testFrac) of training data and a fraction tdm$TST.testFrac of test data
"CV": cross validation (CV) with tdm$nfold folds
"TST": all data in opts$filename (or dsetTrnVa(dataObj)) are used for training, and all data in opts$filetest (or dsetTest(dataObj)) are used for testing
"SP_T": 'split_test': prior to tuning, the data set was split by random subsampling into a fraction tdm$TST.testFrac of test data and a fraction (1-tdm$TST.testFrac) of training-vali data, tagged via column "tdmSplit". Tuning was done on the training-vali data. Now column "tdmSplit" is used to select the test data for the unbiased evaluation. Training during the unbiased evaluation is done on a fraction tdm$TST.trnFrac of the training-vali data.
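The three random divisions above can be illustrated with a minimal base-R sketch. This is not TDMR code: it assumes the data set is a plain data frame `d`, uses local variables `testFrac`, `nfold` and `trnFrac` in the roles of tdm$TST.testFrac, tdm$nfold and tdm$TST.trnFrac (the value 0.5 for `trnFrac` is only illustrative), and assumes that "tdmSplit" marks test rows with 1 and training-vali rows with 0.

```r
# Illustrative sketch only; the names and the tdmSplit encoding are assumptions.
set.seed(42)
d <- data.frame(x = rnorm(100), y = sample(c("A", "B"), 100, replace = TRUE))
testFrac <- 0.2   # role of tdm$TST.testFrac
nfold    <- 10    # role of tdm$nfold
trnFrac  <- 0.5   # role of tdm$TST.trnFrac (illustrative value)
n <- nrow(d)

# "RSUB": a random fraction testFrac of the rows becomes test data
testIdx <- sample(n, size = round(testFrac * n))
dTest  <- d[testIdx, ]
dTrain <- d[-testIdx, ]

# "CV": assign every row to one of nfold folds of (almost) equal size;
# in run k, rows with fold == k are test data, all others are training data
fold <- sample(rep(seq_len(nfold), length.out = n))

# "SP_T": tag the split once (before tuning) in column "tdmSplit",
# then select test rows by that tag and train on a fraction of the rest
d$tdmSplit <- 0
d$tdmSplit[sample(n, size = round(testFrac * n))] <- 1
dTrnVa   <- d[d$tdmSplit == 0, ]
dTestSPT <- d[d$tdmSplit == 1, ]
dTrnSub  <- dTrnVa[sample(nrow(dTrnVa), size = round(trnFrac * nrow(dTrnVa))), ]
```

The key difference is when the split happens: "RSUB" and "CV" draw a new division for the unbiased runs, whereas "SP_T" reuses a split fixed before tuning, so the test rows were never seen by the tuner.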

withParams

[FALSE] if TRUE, add columns with the best parameters to data frame finals (should be FALSE if different runs have different parameters)
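The effect of withParams=TRUE on finals can be pictured with a small hypothetical base-R example. The column names (CONFIG, NRUN, Y_test) and parameter names (MTRY, NODESIZE) are made up for illustration and are not taken from TDMR.

```r
# Hypothetical illustration of withParams; all column names are invented.
bst    <- data.frame(Y = 0.12, MTRY = 3, NODESIZE = 5)   # best tuning result
finals <- data.frame(CONFIG = "sonar_04.conf", NRUN = 5, Y_test = 0.15)

withParams <- TRUE
paramNames <- c("MTRY", "NODESIZE")
if (withParams) {
  # append one column per best parameter to the one-row data frame
  finals <- cbind(finals, bst[1, paramNames, drop = FALSE])
}
print(finals)
```

Because finals has a single row, it can hold only one parameter set; this is why withParams should stay FALSE when different runs use different parameters.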

tdm

a list with TDM settings, from which the following elements are used:

mainFunc: the function to be called for unbiased evaluations
mainFile: change to the directory of mainFile before starting mainFunc
nrun: [5] how often to call the unbiased evaluation
nfold: [10] how many folds in CV (only relevant for umode="CV")
TST.testFrac: [0.2] test set fraction (only relevant for umode="RSUB" or "SP_T")

The defaults in '[...]' are set by tdmDefaultsFill, if they are not defined on input.
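A minimal sketch of how missing defaults are filled and how the unbiased evaluation is repeated tdm$nrun times is shown below. Both functions are hypothetical: fillDefaults() covers only the three defaults listed above and is not the real tdmDefaultsFill, and mainFunc() is a placeholder for tdm$mainFunc that returns a simulated test error.

```r
# Hypothetical stand-in for tdmDefaultsFill: set only elements missing on input
fillDefaults <- function(tdm) {
  defaults <- list(nrun = 5, nfold = 10, TST.testFrac = 0.2)
  for (nm in names(defaults)) {
    if (is.null(tdm[[nm]])) tdm[[nm]] <- defaults[[nm]]
  }
  tdm
}

tdm <- fillDefaults(list(nrun = 3))   # nrun given on input, the others defaulted

# Placeholder for tdm$mainFunc: returns a simulated test error for one run;
# testFrac is passed only to show where the tdm settings would be used
mainFunc <- function(run, testFrac) {
  set.seed(run)
  0.15 + runif(1, -0.02, 0.02)
}

# call the unbiased evaluation tdm$nrun times and summarize the results
testErr <- vapply(seq_len(tdm$nrun), mainFunc, numeric(1),
                  testFrac = tdm$TST.testFrac)
cat("mean test error:", mean(testErr), " sd:", sd(testErr), "\n")
```

A small spread of the test error over the runs suggests that the tuned result quality is reproducible.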

Value

envT, the augmented environment, with the following items updated:

finals

the final results

tdm

the updated list with TDM settings

results

the results of the last unbiased training run

Examples


# NOT RUN {
   ## Load the best results obtained in a prior tuning for the configuration "sonar_04.conf"
   ## with tuning method "spot". The result envT from a prior run of tdmBigLoop with this .conf
   ## is read from demo02sonar/demoSonar.RData.
   ## Run task main_sonar again with these best parameters, using the default settings from
   ## tdmDefaultsFill: umode="RSUB", tdm$nrun=5 and tdm$TST.testFrac=0.2.
   path <- file.path(find.package("TDMR"), "demo02sonar")
   envT <- tdmEnvTLoad("demoSonar.RData", path)    # loads envT
   source(file.path(path, "main_sonar.r"))
   envT$tdm$optsVerbosity <- 1
   envT$sCList[[1]]$opts$path <- path       # overwrite a possibly older stored path
   envT$spotConfig <- envT$sCList[[1]]
   dataObj <- tdmReadTaskData(envT, envT$tdm)
   envT <- unbiasedRun("sonar_04.conf", envT, dataObj, tdm = envT$tdm)
   print(envT$finals)
# }
