TDMR (version 2.2)

tdmPreGroupLevels: Group the levels of the factor variable in dset[,colname].

Description

This function reduces the number of levels of a factor variable that has too many levels. It counts the cases in each level and sorts the levels by decreasing count. It then considers two ways of merging the least frequent levels into a new level "OTHER": (1) merge them such that the remaining untouched levels together hold more than the fraction opts$PRE.Xpgroup of all cases, or (2) merge them such that the total number of levels after grouping is opts$PRE.MaxLevel. Of these two choices, the function applies the one that merges more levels into "OTHER".
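
The grouping rule described above can be illustrated with a short base-R sketch. This is not the TDMR source code: the helper name group_levels_sketch and its arguments xpgroup and max_level are hypothetical stand-ins for opts$PRE.Xpgroup and opts$PRE.MaxLevel, and the handling of edge cases (ties, fewer levels than the maximum) is an assumption.

```r
# Hypothetical helper, NOT the TDMR implementation: a sketch of the rule
# described above, operating on a single vector instead of dset[,colname].
group_levels_sketch <- function(x, xpgroup = 0.99, max_level = 32) {
  x <- as.character(x)
  counts <- sort(table(x), decreasing = TRUE)  # cases per level, most frequent first
  n_lev <- length(counts)
  cum_frac <- cumsum(as.vector(counts)) / sum(counts)

  # Choice 1: keep the fewest top levels that hold more than xpgroup of all cases
  keep_1 <- which(cum_frac > xpgroup)[1]
  if (is.na(keep_1)) keep_1 <- n_lev           # threshold never exceeded: keep all

  # Choice 2: keep max_level - 1 levels, so that max_level remain including "OTHER"
  # (assumption: nothing is merged if there are at most max_level levels)
  keep_2 <- if (n_lev > max_level) max_level - 1 else n_lev

  # Take the choice that merges more levels, i.e. keeps fewer of them
  keep <- min(keep_1, keep_2)
  kept <- names(counts)[seq_len(keep)]
  x[!is.na(x) & !(x %in% kept)] <- "OTHER"
  factor(x)
}

# Toy data: five levels with 50, 30, 15, 3 and 2 cases
x <- c(rep("a", 50), rep("b", 30), rep("c", 15), rep("d", 3), rep("e", 2))

# Choice 1 decides here: a, b, c hold 95% (> 90%), so d and e are merged
table(group_levels_sketch(x, xpgroup = 0.9, max_level = 32))

# Choice 2 decides here: only 3 levels are allowed, so c, d and e are merged
table(group_levels_sketch(x, xpgroup = 0.9, max_level = 3))
```

In the first call only the coverage threshold is active, in the second the level limit is the stricter of the two and therefore wins.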

Usage

tdmPreGroupLevels(dset, colname, opts)

Arguments

dset

data frame

colname

name of column to be re-grouped

opts

list of options, of which this function needs

  • PRE.Xpgroup [0.99]

  • PRE.MaxLevel [32] (32 is the maximum number of levels allowed for randomForest)

Value

dset, a data frame with dset[,colname] re-grouped