Hmisc (version 5.2-2)

combine.levels

Description

Combine Infrequent Levels of a Categorical Variable

Usage

combine.levels(
  x,
  minlev = 0.05,
  m,
  ord = is.ordered(x),
  plevels = FALSE,
  sep = ","
)

Value

a factor variable, or if `ord=TRUE` an ordered factor variable

Arguments

x

a factor, `ordered` factor, or numeric or character variable that will be turned into a `factor`

minlev

the minimum proportion of observations in a cell before that cell is combined with one or more cells. If more than one cell has fewer than `minlev*n` observations, where `n` is the number of observations, all such cells are combined into a new cell labeled `"OTHER"`. Otherwise, the lowest frequency cell is combined with the next lowest frequency cell, and the new level name is the combination of the two old level names. When `ord=TRUE`, combinations happen only for consecutive levels.
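The threshold rule above can be sketched in base R. This is only an illustration of the documented rule for the unordered case, not the Hmisc source; the value of `minlev` and the names `f`, `n`, `small`, and `pooled` are assumptions chosen for the example.

```r
# Base-R sketch of the documented threshold rule (not the Hmisc implementation)
x <- c(rep("A", 1), rep("B", 3), rep("C", 4), rep("D", 1), rep("E", 1))
minlev <- 0.25                    # hypothetical threshold for illustration

f <- table(x)                     # counts per level (NAs are excluded)
n <- sum(f)                       # number of non-missing observations
small <- names(f)[as.vector(f) < minlev * n]   # levels below minlev * n

# When more than one level is small, pool them under a single label
pooled <- if (length(small) > 1) {
  factor(ifelse(x %in% small, "OTHER", x))
} else {
  factor(x)
}
table(pooled)
```

Here `minlev * n` is 2.5, so the three singleton levels fall below the threshold and are pooled together.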

m

an alternative to `minlev`: the minimum number of observations in a cell before it will be combined with others

ord

set to `TRUE` to treat `x` as if it were an ordered factor, which allows only consecutive levels to be combined

plevels

by default `combine.levels` pools low-frequency levels into a category named `OTHER` when `x` is not ordered and `ord=FALSE`. To instead name this category the concatenation of all the pooled level names, separated by `sep` (a comma by default), set `plevels=TRUE`.

sep

the separator for concatenating levels when `plevels=TRUE`
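As a rough illustration of how a concatenated level name is formed, the base-R call below joins a hypothetical set of pooled level names with a separator. The vector `small` and the alternative separator are assumptions made for this sketch; the exact ordering used by `combine.levels` is not shown here.

```r
# Hypothetical set of pooled level names, joined the way `sep` is described
small <- c("A", "D", "E")
default_name <- paste(small, collapse = ",")     # "A,D,E"
custom_name  <- paste(small, collapse = " | ")   # "A | D | E"
default_name
custom_name
```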

Author

Frank Harrell

Details

After turning `x` into a `factor` if it is not one already, `combine.levels` combines levels of `x` whose frequency falls below a specified relative frequency `minlev` or absolute count `m`. When `x` is not treated as ordered, all of the low-frequency levels are combined into `"OTHER"`, unless `plevels=TRUE`. When `ord=TRUE` or `x` is an ordered factor, only consecutive levels are combined. New levels are constructed by concatenating the old level names with `sep` as a separator. This is useful when comparing ordinal regression with polytomous (multinomial) regression and there are too many categories for polytomous regression. `combine.levels` is also useful when assumptions of ordinal models are being checked empirically by computing exceedance probabilities for various cutoffs of the dependent variable.
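To make the last point concrete, the following base-R sketch computes empirical exceedance probabilities, the proportion of observations at or above each cutoff, within groups of a covariate. The data, the grouping variable `g`, and the helper `prop_at_or_above` are all hypothetical; with sparse levels such proportions rest on very few observations, which is where pooling levels first may help.

```r
# Hypothetical ordinal response and two-group covariate
y <- factor(c("low", "low", "mid", "mid", "high",
              "high", "top", "low", "mid", "top"),
            levels = c("low", "mid", "high", "top"), ordered = TRUE)
g <- rep(c("a", "b"), each = 5)

# Proportion of observations with y >= cutoff, separately by group
prop_at_or_above <- function(k) sapply(split(y >= k, g), mean)

cutoffs <- levels(y)[-1]          # every level except the lowest
exceed <- sapply(cutoffs, prop_at_or_above)
exceed                            # rows: groups, columns: cutoffs
```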

Examples

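library(Hmisc)

# Level counts: A=1, B=3, C=4, D=1, E=1
x <- c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D',1), rep('E',1))
# Pool levels with too few observations (minimum count m=3) into "OTHER"
combine.levels(x, m=3)
# Name the pooled level by concatenating the pooled level names
combine.levels(x, m=3, plevels=TRUE)
# Treat x as ordered, so only consecutive levels are combined
combine.levels(x, ord=TRUE, m=3)
# Same data with one additional observation at a new level F
x <- c(rep('A', 1), rep('B', 3), rep('C', 4), rep('D',1), rep('E',1),
       rep('F',1))
combine.levels(x, ord=TRUE, m=3)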
