Learn R Programming

lares (version 5.2.13)

categ_reducer: Reduce categorical values

Description

This function lets the user reduce categorical values in a vector. It is tidyverse friendly for use on pipelines

Usage

categ_reducer(
  df,
  var,
  nmin = 0,
  pmin = 0,
  pcummax = 100,
  top = NA,
  pvalue_max = 1,
  cor_var = "tag",
  limit = 20,
  other_label = "other",
  ...
)

Value

data.frame df on which var has been transformed

Arguments

df

Categorical Vector

var

Variable. Which variable do you wish to reduce?

nmin

Integer. Number of minimum times a value is repeated

pmin

Numerical. Percentage of minimum times a value is repeated

pcummax

Numerical. Top cumulative percentage of most repeated values

top

Integer. Keep the n most frequently repeated values

pvalue_max

Numeric (0-1]. Max pvalue categories

cor_var

Character. If pvalue_max < 1, you must define which column name will be compared with (numerical or binary).

limit

Integer. Limit one hot encoding to the n most frequent values of each column. Set to NA to ignore argument.

other_label

Character. With which text do you wish to replace the filtered values with?

...

Additional parameters.

See Also

Other Data Wrangling: balance_data(), cleanText(), date_cuts(), date_feats(), file_name(), formatHTML(), holidays(), impute(), left(), normalize(), num_abbr(), ohe_commas(), ohse(), quants(), removenacols(), replaceall(), replacefactor(), textFeats(), textTokenizer(), vector2text(), year_month(), zerovar()

Examples

Run this code
data(dft) # Titanic dataset
categ_reducer(dft, Embarked, top = 2) %>% freqs(Embarked)
categ_reducer(dft, Ticket, nmin = 7, other_label = "Other Ticket") %>% freqs(Ticket)
categ_reducer(dft, Ticket, pvalue_max = 0.05, cor_var = "Survived") %>% freqs(Ticket)

Run the code above in your browser using DataLab