
nbc4va (version 1.2)

internalSubAsRest: Substitute values in a dataframe proportionally to all other values


Substitute a target value proportionally to the distribution of the rest of the values in a column, given the following conditions:

  • If a column contains only the target value, the column is removed

  • If a column has fewer target values than unique remaining values, so that they cannot be distributed proportionally, each target value is instead randomly sampled, with replacement, from the rest of the column values
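To make "proportionally to the distribution of the rest of the values" concrete, here is a small base R illustration on a toy column. The column `y` and the target value 99 are made-up inputs, and the snippet only works out how many replacements each remaining value would receive; it does not call the package.

```r
# Toy column (hypothetical data): 99 is the target value, the rest are 0s and 1s.
y <- c(1, 1, 1, 0, 99, 99, 99, 99)
x <- 99

rest <- y[y != x]              # values that are not the target: 1, 1, 1, 0
p <- prop.table(table(rest))   # share of each remaining value: "0" = 0.25, "1" = 0.75
xn <- sum(y == x)              # number of target values to replace: 4
alloc <- xn * p                # replacements per value: one 0 and three 1s
```

Here the four target values split cleanly into one 0 and three 1s. When the products are not whole numbers they have to be rounded so that they still sum to the number of target values.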


Usage

  internalSubAsRest(
    dataset,
    x,
    cols = 1:ncol(dataset),
    ignore = c(NA, NaN),
    removal = FALSE
  )



Arguments

dataset: A dataframe with value(s) of x in it.

x: A target value in the dataframe to replace with the rest of the values per column.

cols: A numeric vector of the columns to consider for substitution.

ignore: A vector of values, among the rest of the values, to ignore for substitution.

removal: Set to TRUE to remove column(s) that consist only of x values.


Value

out: A dataframe or list, depending on removal:

  • if (removal is FALSE) return the dataset with values of x substituted by the rest of the values per column

  • if (removal is TRUE) return a list with the following:

    • $removed (vector of numeric): the indices of the removed column(s), i.e. those consisting only of x values

    • $dataset (dataframe): the dataset with values of x substituted by the rest of the values per column


Pseudocode of algorithm:

  SET dataset = table of values with columns and rows
  SET x = target value for substitution

  IF x in dataset:
    FOR EACH column y in a dataset:
      SET xv = all x values in y
      SET rest = all values not equal to x in y
      IF xv == values in y:
        REMOVE y in dataset
      IF number of unique values of rest == 1:
        MODIFY xv = rest
      IF number of xv values < number of unique values of rest:
        SET xn = number of xv values
        MODIFY xv = random sample of rest with size xn
      ELSE:
        SET xn = number of xv values
        SET p = proportions of rest
        SET xnp = xn * p
        IF xnp has decimals:
          MODIFY xnp = round xnp such that sum(xnp) == xn via largest remainder method
        MODIFY xv = rest values with distribution of xnp
  RETURN dataset
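The per-column part of the pseudocode can be sketched in base R roughly as follows. This is an illustrative reading of the pseudocode, not the package's source: the function name `sub_as_rest_column` is hypothetical, column removal is left to the caller, and the order in which replacement values are written back is an assumption, since the pseudocode does not specify it. The largest remainder step uses integer arithmetic to avoid floating point surprises.

```r
# Hypothetical helper (not part of nbc4va): substitute target value `x` in one
# column `y`, following the pseudocode above.
sub_as_rest_column <- function(y, x, ignore = c(NA, NaN)) {
  is_x <- !is.na(y) & y == x             # positions holding the target value
  rest <- y[!is_x & !(y %in% ignore)]    # remaining values, minus ignored ones
  xn <- sum(is_x)                        # number of target values

  # Nothing to substitute, or the column holds only target values
  # (assumption: the caller decides whether to remove such a column).
  if (xn == 0 || length(rest) == 0) return(y)

  vals <- sort(unique(rest))
  if (xn < length(vals)) {
    # Too few target values to spread proportionally: sample with replacement.
    y[is_x] <- rest[sample.int(length(rest), xn, replace = TRUE)]
    return(y)
  }

  # Proportional allocation, xnp = xn * p, with p = counts / n.
  counts <- as.vector(table(factor(rest, levels = vals)))
  n <- length(rest)
  quota <- (xn * counts) %/% n           # integer part of xn * p
  rem <- (xn * counts) %% n              # remainder, proportional to the fraction

  # Largest remainder method: hand leftover units to the largest remainders.
  short <- xn - sum(quota)
  if (short > 0) {
    top <- order(rem, decreasing = TRUE)[seq_len(short)]
    quota[top] <- quota[top] + 1
  }

  # Assumption: replacements are written in sorted order, not shuffled.
  y[is_x] <- rep(vals, times = quota)
  y
}

# Usage on a toy column: four 99s become one 0 and three 1s.
sub_as_rest_column(c(1, 1, 1, 0, 99, 99, 99, 99), 99)
```

Applying such a helper to each column in `cols`, and dropping columns that hold only the target value when `removal` is TRUE, would mirror the outer loop of the pseudocode.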

See Also

Other data functions: internalRoundFixedSum()


Examples
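library(nbc4va)  # makes the nbc4vaDataRaw example dataset available
unclean <- nbc4vaDataRaw
# Substitute every 99 with the rest of the values in its column
clean <- nbc4va::internalSubAsRest(unclean, 99)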
