mdgc (version 0.1.2)

mdgc_impute: Impute Missing Values

Description

Imputes missing values given a covariance matrix and a mean vector, using a quasi-random number method similar to the one used by mdgc_log_ml.
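To give a feel for what "quasi-random numbers" means here, the following base R sketch builds a van der Corput sequence and uses it to approximate a normal probability. The choice of sequence and the helper name van_der_corput are assumptions made for illustration; this page does not say which sequence mdgc uses internally.

```r
# Illustrative low-discrepancy sequence; not taken from the mdgc source.
van_der_corput <- function(n, base = 2L) {
  vapply(seq_len(n), function(i) {
    x <- 0
    f <- 1 / base
    # reflect the base-`base` digits of i around the radix point
    while (i > 0L) {
      x <- x + f * (i %% base)
      i <- i %/% base
      f <- f / base
    }
    x
  }, numeric(1))
}

u <- van_der_corput(1024L)

# approximate P(Z < 0.5) for a standard normal Z by transforming the points
qmc_est <- mean(qnorm(u) < 0.5)
abs(qmc_est - pnorm(0.5)) # approximation error
```

The points fill the unit interval more evenly than pseudo-random draws, which is why such sequences tend to need fewer samples for a given accuracy.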

Usage

mdgc_impute(
  object,
  vcov,
  mea,
  rel_eps = 0.001,
  maxit = 10000L,
  abs_eps = -1,
  n_threads = 1L,
  do_reorder = TRUE,
  minvls = 1000L,
  use_aprx = FALSE
)

Arguments

object

returned object from get_mdgc.

vcov

covariance matrix to condition on in the imputation.

mea

vector with non-zero mean entries to condition on.

rel_eps

relative convergence threshold for each term in the approximation.

maxit

maximum number of samples.

abs_eps

absolute convergence threshold for each term in the approximation.

n_threads

number of threads to use.

do_reorder

logical for whether to use a heuristic variable reordering. TRUE is likely the best option.

minvls

minimum number of samples.

use_aprx

logical for whether to use an approximation of pnorm and qnorm. This may yield a noticeable reduction in the computation time.
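The sampling arguments can be read as a stopping rule: draw at least minvls samples, stop once the estimated error falls below the relative or absolute threshold, and never exceed maxit samples. The toy function below sketches that reading with plain Monte Carlo in base R. The function approx_prob and its batch argument are hypothetical, and treating a negative abs_eps as "disabled" is an assumption suggested by the default of -1, not something this page states.

```r
# Toy stopping rule; NOT the package's implementation.
approx_prob <- function(upper, rel_eps = 0.001, abs_eps = -1,
                        minvls = 1000L, maxit = 10000L, batch = 500L) {
  draws <- numeric(0)
  repeat {
    # indicator draws for the event Z < upper
    draws <- c(draws, rnorm(batch) < upper)
    n   <- length(draws)
    est <- mean(draws)
    se  <- sd(draws) / sqrt(n)
    # a negative abs_eps can never be met, so only rel_eps applies then
    converged <- n >= minvls && (se < abs_eps || se < rel_eps * abs(est))
    if (converged || n >= maxit) break
  }
  list(estimate = est, std_error = se, n_samples = n)
}

set.seed(1)
res <- approx_prob(0.5, rel_eps = 0.05)
res$estimate   # close to pnorm(0.5)
res$n_samples  # between minvls and maxit
```

A looser rel_eps stops earlier; a tighter one runs until maxit is hit.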

Value

A list of lists, one per observation. Each inner list holds the imputed value for each continuous variable and, for each ordinal, binary, and multinomial variable, a vector with the probability of each level.
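A nested list like this is often easier to work with as a data frame. The sketch below shows one possible conversion on a hand-built stand-in for the return value: scalars are kept and each probability vector is replaced by its most likely level. The object imputed_mock, its numbers, and the helper to_row are made up for illustration, and the assumption that the probability vectors are named by level should be checked against real output.

```r
# Hand-built stand-in for the return value; the numbers are invented.
imputed_mock <- list(
  list(Sepal.Length = 5.1,
       Species = c(setosa = 0.90, versicolor = 0.07, virginica = 0.03)),
  list(Sepal.Length = 6.3,
       Species = c(setosa = 0.05, versicolor = 0.35, virginica = 0.60))
)

# Hypothetical helper: keep scalars, pick the most probable level otherwise.
to_row <- function(obs)
  lapply(obs, function(x) if (length(x) > 1L) names(x)[which.max(x)] else x)

rows <- lapply(imputed_mock, function(obs)
  as.data.frame(to_row(obs), stringsAsFactors = FALSE))
completed <- do.call(rbind, rows)
completed
```

Picking the most probable level discards the uncertainty in the probabilities, so keep the original list if that information matters downstream.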

Examples

# NOT RUN {
# randomly mask data
set.seed(11)
masked_data <- iris
masked_data[matrix(runif(prod(dim(iris))) < .10, NROW(iris))] <- NA

# use the functions in the package
library(mdgc)
obj <- get_mdgc(masked_data)
ptr <- get_mdgc_log_ml(obj)
start_vals <- mdgc_start_value(obj)

fit <- mdgc_fit(ptr, start_vals, obj$means, rel_eps = 1e-2, maxpts = 10000L,
                minvls = 1000L, use_aprx = TRUE, batch_size = 100L, lr = .001,
                maxit = 100L, n_threads = 2L)

# impute using the estimated values
imputed <- mdgc_impute(obj, fit$result$vcov, fit$result$mea, minvls = 1000L,
                       maxit = 10000L, n_threads = 2L, use_aprx = TRUE)
imputed[1:5] # first 5 observations
head(masked_data, 5) # observed
head(iris       , 5) # truth
# }
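Because the example masks values whose truth is known, the imputations can be scored. As a point of comparison, the base R sketch below repeats the same masking and computes the root mean squared error of a naive column-mean imputation for the numeric variables. The baseline is an illustration and not part of mdgc; the same error measure could be applied to the continuous values returned by mdgc_impute.

```r
# same masking as in the example above
set.seed(11)
masked_data <- iris
masked_data[matrix(runif(prod(dim(iris))) < .10, NROW(iris))] <- NA

num_vars <- names(iris)[vapply(iris, is.numeric, logical(1))]

# naive baseline: replace each missing entry with the observed column mean
baseline <- masked_data
for (nm in num_vars)
  baseline[[nm]][is.na(masked_data[[nm]])] <- mean(masked_data[[nm]], na.rm = TRUE)

# RMSE on the masked entries only, per numeric variable
rmse <- vapply(num_vars, function(nm) {
  idx <- is.na(masked_data[[nm]])
  sqrt(mean((baseline[[nm]][idx] - iris[[nm]][idx])^2))
}, numeric(1))
rmse
```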