
missMethods (version 0.4.0)

delete_MAR_rank: Create MAR values using a ranking mechanism

Description

Create missing at random (MAR) values using a ranking mechanism in a data frame or a matrix

Usage

delete_MAR_rank(
  ds,
  p,
  cols_mis,
  cols_ctrl,
  n_mis_stochastic = FALSE,
  ties.method = "average",
  miss_cols,
  ctrl_cols
)

Value

An object of the same class as ds with missing values.

Arguments

ds

A data frame or matrix in which missing values will be created.

p

A numeric vector of length one or of the same length as cols_mis; the probability that a value is missing.

cols_mis

A vector of column names or indices of columns in which missing values will be created.

cols_ctrl

A vector of column names or indices of columns that control the creation of missing values in cols_mis. Must be of the same length as cols_mis.

n_mis_stochastic

Logical; should the number of missing values be stochastic? If n_mis_stochastic = TRUE, the number of missing values in column cols_mis[i] is a random variable with expected value nrow(ds) * p[i]. If n_mis_stochastic = FALSE, the number of missing values is deterministic. Normally, the number of missing values in column cols_mis[i] is round(nrow(ds) * p[i]). Possible deviations from this value, if any exist, are documented in Details.
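As a small illustration of the two counting rules just described, the sketch below computes the target count for n_mis_stochastic = FALSE and the expected count for n_mis_stochastic = TRUE. The helper names are hypothetical and not part of missMethods; the deviations mentioned in Details are ignored. Note that R's round() rounds halves to even, so 10 rows with p = 0.25 give 2 missing values, not 3.

```r
# Hypothetical helpers (not part of missMethods), ignoring the deviations
# described in Details.

# n_mis_stochastic = FALSE: fixed number of missing values
n_mis_deterministic <- function(n_rows, p) round(n_rows * p)

# n_mis_stochastic = TRUE: expected number of missing values
n_mis_expected <- function(n_rows, p) n_rows * p

n_mis_deterministic(20, 0.25)  # 5
n_mis_expected(10, 0.25)       # 2.5
n_mis_deterministic(10, 0.25)  # 2 (round() rounds half to even)
```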

ties.method

How ties in cols_ctrl are handled. Passed to rank.

miss_cols

Deprecated, use cols_mis instead.

ctrl_cols

Deprecated, use cols_ctrl instead.

Details

This function creates missing at random (MAR) values in the columns specified by the argument cols_mis. The probability of missing values is controlled by p. If p is a single number, the overall probability for a value to be missing is p in all columns of cols_mis. (Internally, p is replicated to a vector of the same length as cols_mis, so all p[i] in the following sections equal the given single number p.) Otherwise, p must be of the same length as cols_mis; in this case, the overall probability for a value to be missing is p[i] in column cols_mis[i]. The positions of the missing values in cols_mis[i] are controlled by cols_ctrl[i]. The following procedure is applied to each pair of cols_ctrl[i] and cols_mis[i] to determine the positions of missing values:
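The replication rule for p can be sketched as follows. expand_p() is a hypothetical helper written for illustration, not the package's internal code; it only mirrors the behaviour described above (a single p is recycled, otherwise the lengths must match).

```r
# Hypothetical helper (not the missMethods source): turn p into one
# probability per column in cols_mis.
expand_p <- function(p, cols_mis) {
  if (length(p) == 1L) {
    # A single number is replicated, so every p[i] equals p
    return(rep(p, length(cols_mis)))
  }
  if (length(p) != length(cols_mis)) {
    stop("p must have length 1 or the same length as cols_mis")
  }
  p
}

expand_p(0.2, c("X", "Y"))          # 0.2 0.2
expand_p(c(0.1, 0.3), c("X", "Y"))  # 0.1 0.3
```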

First, the probability for a value to be missing is calculated. The probability for a missing value in a row of cols_mis[i] is proportional to the rank of the value in cols_ctrl[i] in the same row. If n_mis_stochastic = FALSE, these probabilities are passed to the prob argument of sample. If n_mis_stochastic = TRUE, they are scaled to sum to nrow(ds) * p[i]. Then a uniformly distributed random number is generated for each probability; if this number is less than the probability, the value in cols_mis[i] is set to NA.
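The procedure for a single pair of columns can be sketched in base R as below. rank_mar_rows() is a hypothetical function that follows the description above; it is not the missMethods implementation and skips the handling of too-high values of p. It returns the row indices whose value in cols_mis[i] would be set to NA.

```r
# Hypothetical sketch of one cols_ctrl[i] / cols_mis[i] pair,
# not the missMethods source code.
rank_mar_rows <- function(ctrl, p, n_mis_stochastic = FALSE,
                          ties.method = "average") {
  n <- length(ctrl)
  # Weights proportional to the ranks of the control column
  ranks <- rank(ctrl, ties.method = ties.method)
  if (!n_mis_stochastic) {
    # Fixed count: draw round(n * p) distinct rows, weighted by rank
    sample.int(n, size = round(n * p), prob = ranks)
  } else {
    # Scale the weights so that they sum to n * p ...
    probs <- ranks / sum(ranks) * n * p
    # ... and select a row whenever a U(0, 1) draw is below its probability
    which(runif(n) < probs)
  }
}

set.seed(1)
ds <- data.frame(X = 1:20, Y = 101:120)
rows_na <- rank_mar_rows(ds$Y, p = 0.2)
ds$X[rows_na] <- NA
sum(is.na(ds$X))  # 4
```

In the deterministic branch the count is exactly round(n * p) because sample.int() draws without replacement; in the stochastic branch only the expected count equals n * p.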

The ranks are calculated via rank. The argument ties.method is directly passed to this function. Possible choices for ties.method are documented in rank.
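Because the missingness probabilities are proportional to the ranks, ties.method changes the relative weights of tied rows in cols_ctrl. The base R calls below show this for a small control vector with one tie.

```r
ctrl <- c(10, 20, 20, 30)

rank(ctrl, ties.method = "average")  # 1.0 2.5 2.5 4.0 (tied rows share 2.5)
rank(ctrl, ties.method = "min")      # 1 2 2 4
rank(ctrl, ties.method = "first")    # 1 2 3 4 (ties broken by position)
```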

For high values of p, it is mathematically impossible to obtain probabilities proportional to the ranks. In this case, a warning is given. This warning can be silenced by setting the option missMethods.warn.too.high.p to FALSE.
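To see why high values of p are a problem, assume the values in cols_ctrl are distinct, so the ranks are 1, ..., n and sum to n(n + 1)/2. Scaling them to sum to n * p makes the largest probability 2np/(n + 1), which exceeds 1 once p > (n + 1)/(2n), i.e. slightly above 0.5. The hypothetical helper max_rank_prob() below (not part of missMethods) computes this largest scaled probability.

```r
# Hypothetical check, assuming distinct values in cols_ctrl (ranks 1..n).
# Returns the largest probability after scaling the ranks to sum to n * p.
max_rank_prob <- function(n, p) {
  ranks <- seq_len(n)
  max(ranks / sum(ranks) * n * p)
}

max_rank_prob(20, 0.5)  # 20/21, about 0.952: still a valid probability
max_rank_prob(20, 0.6)  # 24/21, about 1.143: not a valid probability
```

For n = 20 the threshold is 21/40 = 0.525. To silence the package's warning, call options(missMethods.warn.too.high.p = FALSE) before delete_MAR_rank().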

References

Santos, M. S., Pereira, R. C., Costa, A. F., Soares, J. P., Santos, J., & Abreu, P. H. (2019). Generating Synthetic Missing Data: A Review by Missing Mechanism. IEEE Access, 7, 11651-11667

See Also

rank, delete_MNAR_rank

Other functions to create MAR: delete_MAR_1_to_x(), delete_MAR_censoring(), delete_MAR_one_group()

Examples

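library(missMethods)
ds <- data.frame(X = 1:20, Y = 101:120)
# Delete round(20 * 0.2) = 4 values in X; rows with larger Y are more likely to lose X
delete_MAR_rank(ds, 0.2, "X", "Y")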
