TDMR (version 2.2)

tdmModSortedRFimport: Sort the input variables decreasingly by their RF-importance.

Description

Builds a Random Forest (RF) with importance=TRUE. This RF is usually kept small (50 trees by default) to speed up computation, and na.roughfix is used to replace missing values. The function then decides which input variables to keep and returns them in SRF$input.variables.
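The following is a minimal sketch of the idea described above, not TDMR's actual implementation. It assumes the randomForest package is installed, uses the built-in iris data as a stand-in for d_train, and assumes that variables are ranked by permutation importance (type = 1); the importance measure TDMR really uses is not stated here.

```r
# Sketch only: needs the randomForest package; this is not TDMR's code.
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)
  d_train <- iris                       # stand-in training set (assumption)
  response.variable <- "Species"
  input.variables <- setdiff(names(d_train), response.variable)

  # Rough imputation of missing values (iris has none, so nothing changes)
  x <- randomForest::na.roughfix(d_train[, input.variables])
  y <- d_train[[response.variable]]

  # Small forest (50 trees) with importance measurement switched on
  rf <- randomForest::randomForest(x = x, y = y, ntree = 50, importance = TRUE)

  # Assumption: rank by mean decrease in accuracy (type = 1)
  imp <- randomForest::importance(rf, type = 1)[, 1]
  s_imp1 <- sort(imp, decreasing = TRUE)   # importances, most important first
  s_input <- names(s_imp1)                 # variable names in the same order
  print(s_imp1)
}
```

The names s_input and s_imp1 mirror the elements of the returned list SRF only for readability.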

Usage

tdmModSortedRFimport(d_train, response.variable, input.variables, opts)

Arguments

d_train

training set

response.variable

the target column from d_train to use for the RF-model

input.variables

the input columns from d_train to use for the RF-model

opts

options, here we use the elements [defaults in brackets]:

  • SRF.kind:
      ="xperc": keep a certain importance percentage, starting from the most important variable
      ="ndrop": drop a certain number of least important variables
      ="nkeep": keep a certain number of most important variables
      ="none": do not call tdmModSortedRFimport at all (see tdmRegress.r and tdmClassify.r)

  • SRF.ndrop: [0] how many variables to drop (if SRF.kind=="ndrop")

  • SRF.XPerc: [0.95] if >=0, keep that importance percentage, starting with the most important variables (if SRF.kind=="xperc")

  • SRF.calc: [TRUE] =TRUE: calculate importance and save it to SRF.file; =FALSE: load it from SRF.file (SRF.file = Output/<filename>.SRF.<response.variable>.Rdata)

  • SRF.ntree: [50] number of RF trees

  • SRF.verbose: [2]

  • SRF.maxS: [40] how many variables to show in plot

  • SRF.minlsi: [1] a lower bound for the length of SRF$input.variables

  • RF.sampsize: sampsize for RF; set it via tdmModAdjustSampsize(opts$SRF.samp,...) before calling this function

  • GD.DEVICE: if !="non", then make a bar plot on current graphic device

  • CLS.CLASSWT: class weight vector to use in random forest training
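To illustrate how the SRF.kind settings could translate into a selection rule, here is a sketch in base R. The helper select_by_importance and all of its argument names are hypothetical and not part of TDMR; in particular, the nkeep count, the choice to keep the variable that crosses the xperc threshold, and the 0-100 scale of perc are assumptions. It also assumes non-negative importances.

```r
# Hypothetical helper, not part of TDMR: pick variables from their importances.
select_by_importance <- function(s_imp, kind = "xperc", xperc = 0.95,
                                 ndrop = 0, nkeep = length(s_imp), minlsi = 1) {
  s_imp <- sort(s_imp, decreasing = TRUE)
  n <- length(s_imp)
  n_keep <- switch(kind,
    # smallest prefix whose cumulative importance share reaches xperc
    xperc = which(cumsum(s_imp) / sum(s_imp) >= xperc)[1],
    ndrop = n - ndrop,
    nkeep = nkeep,
    stop("unknown kind: ", kind))
  if (is.na(n_keep)) n_keep <- n        # threshold never reached: keep all
  # respect the lower bound minlsi and never exceed the number of variables
  n_keep <- min(n, max(minlsi, n_keep))
  kept <- names(s_imp)[seq_len(n_keep)]
  dropped <- setdiff(names(s_imp), kept)
  list(input.variables = kept,
       s_dropped = dropped,
       lsd = length(dropped),
       # share of total importance in the dropped variables (0-100 scale assumed)
       perc = 100 * sum(s_imp[dropped]) / sum(s_imp))
}

# Made-up importances for five variables
imp <- c(a = 50, b = 30, c = 12, d = 6, e = 2)
res <- select_by_importance(imp, kind = "xperc", xperc = 0.95)
print(res$input.variables)
print(res$s_dropped)
```

With kind = "ndrop" and ndrop = 2, the same helper would drop the two least important variables; minlsi guarantees that at least that many variables survive regardless of the other settings.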

Value

SRF, a list with the following elements:

input.variables

the vector of input variables which remain after importance processing. These are sorted by decreasing importance.

s_input

all input.variables sorted by decreasing importance

s_imp1

the importance for s_input

s_dropped

vector with the names of the dropped variables

lsd

length of s_dropped

perc

the percentage of total importance which is in the dropped variables

opts

some defaults might have been added