Balances a given data.frame by resampling its rows so that the two values of a binary feature appear at a given rate relative to each other.
balance_data(df, var, rate = 1, target = "auto", seed = 0, quiet = FALSE)
data.frame. A reduced, resampled data.frame in which the values of the chosen variable appear at the requested rate.
df: Vector or Dataframe. Contains different variables in each column, separated by a specific character
var: Variable. Which variable should be used to resample the dataset?
rate: Numeric. How many X for every Y do we need? Default: 1. If there are more than 2 unique values, rate represents the percentage of rows to keep
target: Character. If binary, which value should be reduced? If kept as "auto", the most frequent value will be reduced.
seed: Numeric. Seed used to replicate the sampling and obtain the same values
quiet: Boolean. Keep quiet? If not, messages will be printed
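To make the role of rate, target and seed in the binary case more concrete, here is a minimal base R sketch of one plausible reading of the descriptions above. It is not the package's implementation: downsample_binary() and the toy data are hypothetical names introduced only for illustration, and the sketch assumes that the reduced value is sampled without replacement down to roughly rate rows for every row of the other value, with no missing values in the column.

```r
# Hypothetical helper, not part of any package: downsample one value of a
# binary column so it has about `rate` rows per row of the other value.
downsample_binary <- function(df, var, rate = 1, target = "auto", seed = 0) {
  counts <- table(df[[var]])
  stopifnot(length(counts) == 2)  # this sketch covers the binary case only

  # Assumption: "auto" picks the most frequent value as the one to reduce.
  if (identical(target, "auto")) target <- names(counts)[which.max(counts)]

  is_target <- as.character(df[[var]]) == target
  keep_all  <- df[!is_target, , drop = FALSE]  # rows left untouched
  reducible <- df[is_target, , drop = FALSE]   # rows to sample from

  # Never request more rows than exist (sampling without replacement).
  n_keep <- min(nrow(reducible), round(rate * nrow(keep_all)))

  set.seed(seed)  # note: this changes the global RNG state
  sampled <- reducible[sample(nrow(reducible), n_keep), , drop = FALSE]
  rbind(keep_all, sampled)
}

# Toy data: 20 TRUE rows and 80 FALSE rows.
toy <- data.frame(id = 1:100, y = rep(c(TRUE, FALSE), times = c(20, 80)))
balanced <- downsample_binary(toy, "y", rate = 1)
table(balanced$y)  # 20 FALSE and 20 TRUE rows
```

With rate = 0.5 the same call would keep 10 FALSE rows for the 20 TRUE rows, and passing target = "TRUE" would reduce the TRUE rows instead of the most frequent value.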
Other Data Wrangling: categ_reducer(), cleanText(), date_cuts(), date_feats(), file_name(), formatHTML(), holidays(), impute(), left(), normalize(), num_abbr(), ohe_commas(), ohse(), quants(), removenacols(), replaceall(), replacefactor(), textFeats(), textTokenizer(), vector2text(), year_month(), zerovar()
data(dft) # Titanic dataset
df <- balance_data(dft, Survived, rate = 0.5)
df <- balance_data(dft, .data$Survived, rate = 0.1, target = "TRUE")