tdmRegress: Core regression function of TDMR.

Description

tdmRegress is called by tdmRegressLoop and returns an object of class tdmRegre. It trains a model on training set d_train and evaluates it on test set d_test. If this function is used for tuning, the test set d_test plays the role of a validation set.

Usage

tdmRegress(
  d_train,
  d_test,
  d_preproc,
  response.variables,
  input.variables,
  opts,
  tsetStr = c("Validation", "validation", ".vali")
)

Arguments

d_train

training set

d_test

test set, same columns as training set

d_preproc

data used for preprocessing. May be NULL if no preprocessing is done (opts$PRE.SFA=="none" and opts$PRE.PCA=="none"). If preprocessing is done, d_preproc is usually all non-validation data.

response.variables

name of column which carries the target variable - or - vector of names specifying multiple target columns (these columns are not used during prediction, only for evaluation)

input.variables

vector with names of input columns

opts

additional parameters [defaults in brackets]

SRF.*: several parameters for sorted_rf_importance (see tdmModelingUtils.r)
RF.*: several parameters for RF (Random Forest, defaults are set, if omitted)
SVM.*: several parameters for SVM (Support Vector Machines, defaults are set, if omitted)
filename
data.title
MOD.method: ["RF"] the main training method ["RF"|"SVM"|"LM"]: use [Random forest| SVM| linear model] for the main model
MOD.SEED: =NULL: use the system time as RNG seed (so repeated RF trainings differ); =any value: set the random number seed to this value (+i) to get reproducible random numbers. This way the model training part (RF, NNET, ...) always gets a fixed seed (see also TST.SEED in tdmRegressLoop).
OUTTRAFO: [NULL] string, apply a transformation to the output variable
fct.postproc: [NULL] name of a user-defined function for postprocessing of the predicted output
gr.log: =FALSE (def): make scatter plot as-is, =TRUE: transform output x with log(x+1) (x should be nonnegative)
GD.DEVICE: if !="non", then make a pairs-plot of the 5 most important variables and make a true-false bar plot
VERBOSE: [2] =2: most printed output, =1: less, =0: no output

tsetStr

[c("Validation", "validation",".vali")]
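The MOD.SEED behaviour described under opts can be illustrated with a minimal sketch in base R. It assumes that the seed for run i is simply MOD.SEED + i and that a NULL seed is replaced by one derived from the system time. The helper name set_model_seed is hypothetical and not part of TDMR, and TDMR's actual seeding code may differ.

```r
# Hypothetical helper (not part of TDMR): seed the RNG before training run i.
set_model_seed <- function(mod_seed, i) {
  if (is.null(mod_seed)) {
    # No seed given: derive one from the system time, so repeated runs differ.
    set.seed(as.integer(Sys.time()))
  } else {
    # Fixed seed plus run index: reproducible, yet distinct for each run i.
    set.seed(mod_seed + i)
  }
}

set_model_seed(42, 1); a  <- runif(3)   # run 1
set_model_seed(42, 1); b  <- runif(3)   # run 1 repeated: same numbers as a
set_model_seed(42, 2); c2 <- runif(3)   # run 2: a different random stream
```

With a fixed seed, repeating the same run reproduces the same random numbers, while different run indices still produce different random streams.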

Value

res, an object of class tdmRegre. This is a list containing

d_train

training set + predicted target column(s)

d_test

test set + predicted target output

allRMAE

data frame with columns = (rmae.train, rmae.test, theil.train, theil.test, ...) and rows = response variables. Here Theil's U is based on RMAE (relative mean absolute error).

allRMSE

data frame with columns = (rmse.train, rmse.test, theil.train, theil.test, ...) and rows = response variables. Here Theil's U is based on RMSE (root mean square error).

lastModel

the last model built (e.g. the last Random Forest in the case of MOD.method=="RF")

opts

parameter list from input, some default values might have been added

The item lastModel is specific to the *last* model (the one built for the last response variable in the last run and last fold).
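The error measures reported in allRMAE and allRMSE can be illustrated with a minimal sketch. RMSE is the standard root mean square error. The normalisation used for RMAE (mean absolute error divided by the mean absolute target value) and the naive predictor used for Theil's U (the constant mean of the targets) are assumptions made for illustration and may differ from what TDMR computes. The function names rmse, rmae and theil_u are hypothetical and not part of TDMR.

```r
# Illustrative helpers; names and normalisations are assumptions, not TDMR code.
rmse <- function(y, pred) sqrt(mean((pred - y)^2))
rmae <- function(y, pred) mean(abs(pred - y)) / mean(abs(y))  # assumed normalisation

# Theil's U: error of the model relative to the error of a naive predictor.
# U < 1 means the model beats the naive predictor.
theil_u <- function(y, pred, naive, err = rmse) err(y, pred) / err(y, naive)

y     <- c(1, 2, 3, 4)            # true target values
pred  <- c(1, 2, 3, 6)            # model predictions
naive <- rep(mean(y), length(y))  # naive predictor: constant mean (assumption)

rmse_val <- rmse(y, pred)
rmae_val <- rmae(y, pred)
u_rmse   <- theil_u(y, pred, naive, rmse)  # Theil's U based on RMSE
u_rmae   <- theil_u(y, pred, naive, rmae)  # Theil's U based on RMAE
```

In this toy data both variants of Theil's U are below 1, so the model outperforms the assumed naive predictor under either error measure.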

Examples

# NOT RUN {
#*# This example shows a simple data mining process (phase 1 of TDMR) for regression on
#*# dataset iris.
#*# The data mining process in tdmRegress calls randomForest as the prediction model.
#*# It is called for 2 response variables. Therefore, the data frames allRMAE and allRMSE
#*# have 2 rows.
#*#
opts=tdmOptsDefaultsSet()                       # set all defaults for data mining process
gdObj <- tdmGraAndLogInitialize(opts);          # init graphics and log file

data(iris)
response.variables=c("Petal.Length","Petal.Width")                # names, not data (!)
input.variables=setdiff(names(iris),response.variables)
opts$rgain.type="rmae"
opts$NRUN=1

idx_train = sample(nrow(iris))[1:110]
d_train=iris[idx_train,]
d_vali=iris[-idx_train,]
res <- tdmRegress(d_train,d_vali,NULL,response.variables,input.variables,opts)

print(res$allRMAE)
print(res$allRMSE)

# }
