
nbc4va (version 1.2)

internalSubAsRest: Substitute values in a dataframe proportionally to all other values

Description

Substitute a target value proportionally to the distribution of the rest of the values in a column, given the following conditions:

  • If a column contains only the target value, the column is removed (when removal is TRUE)

  • If a column has fewer target values than unique values among the rest, each target value is randomly sampled, with replacement, from the rest of the column values
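
As a small worked illustration of the proportional rule described above, the base R sketch below substitutes the value 99 in a toy column. It does not call nbc4va; the toy data and the order in which replacement values are written back are assumptions made for illustration only.

```r
# Toy column: 99 is the target value to substitute.
y <- c(1, 1, 1, 2, 99, 99, 99, 99)
x <- 99

rest <- y[y != x]              # values that are not the target: 1, 1, 1, 2
xn <- sum(y == x)              # number of target values: 4
p <- prop.table(table(rest))   # proportions of rest: 1 -> 0.75, 2 -> 0.25
xnp <- xn * p                  # counts to allocate: 1 -> 3, 2 -> 1

# Build the replacement values and write them over the target positions.
# (The placement order here is an assumption, not taken from the package.)
replacement <- rep(as.numeric(names(xnp)), times = as.integer(xnp))
y[y == x] <- replacement
print(y)
```

Here the four target values become three 1s and one 2, matching the 3:1 ratio of the rest of the column.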

Usage

internalSubAsRest(
  dataset,
  x,
  cols = 1:ncol(dataset),
  ignore = c(NA, NaN),
  removal = FALSE
)

Arguments

dataset

A dataframe with value(s) of x in it.

x

The target value in the dataframe to replace with the rest of the values in each column.

cols

A numeric vector of column indices to consider for substitution.

ignore

A vector of values, among the rest of the values, to exclude from substitution.

removal

Set to TRUE to remove column(s) that consist only of x values.

Value

out A dataframe or list depending on removal:

  • if (removal is FALSE) return the dataset with values of x substituted by the rest of the values per column

  • if (removal is TRUE) return a list with the following:

    • $removed (vector of numeric): the indices of the removed column(s), that is, the column(s) that consist only of x values

    • $dataset (dataframe): the dataset with values of x substituted by the rest of the values per column
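
The two return shapes can be handled as in the sketch below. It assumes nbc4va is installed and reuses the nbc4vaDataRaw data and the target value 99 from the Examples section; the variable names (result, removedCols, clean, cleanOnly) are illustrative only.

```r
# Guarded so the snippet does nothing when nbc4va is not installed.
if (requireNamespace("nbc4va", quietly = TRUE)) {
  data(nbc4vaDataRaw, package = "nbc4va")

  # removal = TRUE: per the Value section, a list with $removed and $dataset.
  result <- nbc4va::internalSubAsRest(nbc4vaDataRaw, 99, removal = TRUE)
  removedCols <- result$removed   # indices of columns that held only 99
  clean <- result$dataset         # dataset with 99 substituted

  # removal = FALSE (default): the substituted dataframe is returned directly.
  cleanOnly <- nbc4va::internalSubAsRest(nbc4vaDataRaw, 99)
}
```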

Details

Pseudocode of algorithm:

  SET dataset = table of values with columns and rows
  SET x = target value for substitution

  IF x in dataset:
    FOR EACH column y in a dataset:
      SET xv = all x values in y
      SET rest = all values not equal to x in y
      IF xv == values in y:
        REMOVE y in dataset
      IF number of unique values of rest == 1:
        MODIFY xv = rest
      IF number of xv values < number of unique values of rest:
        SET xn = number of xv values
        MODIFY xv = random sample of rest with size xn
      ELSE:
        SET xn = number of xv values
        SET p = proportions of rest
        SET xnp = xn * p
        IF xnp has decimals:
          MODIFY xnp = round xnp such that sum(xnp) == xn via largest remainder method
        MODIFY xv = rest values with distribution of xnp
  RETURN dataset
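
The pseudocode above can be sketched in base R for a single column. This is a minimal illustration, not the package's implementation: roundFixedSum and subAsRestVector are hypothetical names, NA handling is simplified, and a column made up only of x is returned unchanged rather than removed.

```r
# Hypothetical helper: round non-negative values so that they sum to a fixed
# integer total, using the largest remainder method.
roundFixedSum <- function(v, total = round(sum(v))) {
  out <- floor(v)
  short <- as.integer(round(total - sum(out)))  # units still to hand out
  if (short > 0) {
    # Give one extra unit to the entries with the largest fractional parts.
    idx <- order(v - out, decreasing = TRUE)[seq_len(short)]
    out[idx] <- out[idx] + 1
  }
  out
}

# Hypothetical single-column version of the pseudocode.
subAsRestVector <- function(y, x) {
  isX <- !is.na(y) & y == x          # positions holding the target value
  rest <- y[!isX & !is.na(y)]        # remaining non-missing values
  xn <- sum(isX)
  # Nothing to substitute, or the column holds only x (left to the caller).
  if (xn == 0 || length(rest) == 0) return(y)

  vals <- unique(rest)
  if (length(vals) == 1) {
    # Only one other value exists: every x becomes that value.
    y[isX] <- vals
  } else if (xn < length(vals)) {
    # Too few target values to distribute: sample from rest with replacement.
    y[isX] <- sample(rest, xn, replace = TRUE)
  } else {
    # Distribute xn replacements according to the proportions of rest.
    counts <- tabulate(match(rest, vals), nbins = length(vals))
    p <- counts / length(rest)
    xnp <- roundFixedSum(xn * p, total = xn)
    y[isX] <- rep(vals, times = xnp)
  }
  y
}

# Usage: five 99s are split between 1 and 2 in roughly a 2:1 ratio.
example <- subAsRestVector(c(1, 1, 2, 99, 99, 99, 99, 99), 99)
print(example)
```

In the usage line, xn * p is about (3.33, 1.67); flooring gives (3, 1), and the one remaining unit goes to the entry with the larger remainder, so three 99s become 1 and two become 2.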

See Also

Other data functions: internalRoundFixedSum()

Examples

# NOT RUN {
library(nbc4va)
data(nbc4vaDataRaw)
unclean <- nbc4vaDataRaw
clean <- nbc4va::internalSubAsRest(unclean, 99)

# }
