dset[,colname]
This function reduces the number of levels of a factor variable that has too many levels. It counts the cases in each level and sorts the levels by decreasing count. It then binds the least frequent levels together into a new level "OTHER", following one of two rules: either the remaining untouched levels cover more than the fraction opts$PRE.Xpgroup of all cases, or the total number of new levels is opts$PRE.MaxLevel. Of these two choices for "OTHER", the function takes the one that binds more levels into "OTHER".
tdmPreGroupLevels(dset, colname, opts)
dset: data frame
colname: name of column to be re-grouped
opts: list, here we need
  PRE.Xpgroup [0.99]
  PRE.MaxLevel [32] (32 is the maximum number of levels allowed for randomForest)
dset, a data frame with dset[,colname] re-grouped
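The grouping rule described above can be sketched in base R. This is a minimal illustration, not the package's actual implementation. The helper name groupLevelsSketch is hypothetical. The sketch assumes that PRE.Xpgroup is a fraction between 0 and 1, that "OTHER" counts towards PRE.MaxLevel, and that the PRE.MaxLevel rule binds nothing when the column already has at most PRE.MaxLevel levels.

```r
# Hypothetical helper (assumption, not the TDMR source): bind rare levels into "OTHER".
groupLevelsSketch <- function(dset, colname,
                              opts = list(PRE.Xpgroup = 0.99, PRE.MaxLevel = 32)) {
  x <- as.character(dset[, colname])

  # Count the cases per level and sort the levels by decreasing count.
  counts <- sort(table(x), decreasing = TRUE)
  cumShare <- cumsum(as.numeric(counts)) / sum(counts)

  # Choice A: keep the fewest top levels whose share exceeds PRE.Xpgroup.
  nKeepA <- which(cumShare > opts$PRE.Xpgroup)[1]
  if (is.na(nKeepA)) nKeepA <- length(counts)

  # Choice B: keep PRE.MaxLevel - 1 levels, so that "OTHER" is level number
  # PRE.MaxLevel (assumption: nothing is bound if the limit is already met).
  nKeepB <- if (length(counts) > opts$PRE.MaxLevel) opts$PRE.MaxLevel - 1 else length(counts)

  # Take the choice that binds more levels into "OTHER", i.e. keeps fewer levels.
  nKeep <- min(nKeepA, nKeepB)
  keep <- names(counts)[seq_len(nKeep)]

  # Missing values are left untouched; all other non-kept levels become "OTHER".
  x[!is.na(x) & !(x %in% keep)] <- "OTHER"
  dset[, colname] <- factor(x)
  dset
}

# Usage: five levels with 50, 30, 15, 3 and 2 cases.
d <- data.frame(f = rep(c("a", "b", "c", "d", "e"), times = c(50, 30, 15, 3, 2)))
res <- groupLevelsSketch(d, "f", list(PRE.Xpgroup = 0.9, PRE.MaxLevel = 32))
nlevels(res$f)  # 4: "a", "b", "c" plus "OTHER" (which binds "d" and "e")
```

With PRE.Xpgroup = 0.9 the three most frequent levels already cover 95% of the cases, so the coverage rule decides here. Lowering PRE.MaxLevel to 3 in the same call would make the level limit the stricter rule and leave only "a", "b" and "OTHER".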