sim_apply: Apply a function to simulated parameter values

Description

sim_apply() applies a function that produces quantities of interest to each set of simulated coefficients produced by sim(); these calculated quantities form the posterior sampling distribution for the quantities of interest. Capabilities are available for parallelization.

Usage

sim_apply(sim, FUN, verbose = TRUE, cl = NULL, ...)

Value

A clarify_est object, which is a matrix with a column for each estimated quantity and a row for each simulation. The original estimates (FUN applied to the original coefficients or model fit object) are stored in the attribute "original". The "sim_hash" attribute contains the simulation hash produced by sim().

Arguments

sim: a clarify_sim object; the output of a call to sim() or misim().
FUN: a function to be applied to each set of simulated coefficients. See Details.
verbose: logical; whether to display a text progress bar indicating progress and estimated time remaining for the procedure. Default is TRUE.
cl: a cluster object created by parallel::makeCluster(), or an integer to indicate the number of child-processes (integer values are ignored on Windows) for parallel evaluations. See pbapply::pblapply() for details. If NULL, no parallelization will take place.
...: optional arguments passed to FUN.

Details

sim_apply() applies a function, FUN, to each set of simulated coefficients, similar to apply(). This function should return a numeric vector containing one or more estimated quantities. This should be a named vector to more easily keep track of the meaning of each estimated quantity. Care should be taken to ensure that the returned vector is the same length each time FUN is called. NAs are allowed in the output but should be avoided if possible.

The arguments to FUN can be specified in a few ways. If FUN has an argument called coefs, a simulated set of coefficients will be passed to this argument, and FUN should compute and return a quantity based on the coefficients (e.g., the difference between two coefficients if one wants to test whether two coefficients are equal). If FUN has an argument called fit, a model fit object of the same type as the one originally supplied to sim() (e.g., an lm or glm object) will be passed to this argument, where the coefficients of the fit object have been replaced by the simulated coefficients generated by sim(), and FUN should compute and return a quantity based on the model fit (e.g., a computation based on the output of predict()). If neither coefs nor fit are the names of arguments to FUN, the model fit object with replaced coefficients will be supplied to the first argument of FUN.

When custom coefficients are supplied to sim(), i.e., when the coefs argument to sim() is not left at its default value, FUN must accept a coefs argument and a warning will be thrown if it accepts a fit argument. This is because sim_apply() does not know how to reconstruct the original fit object with the new coefficients inserted. The quantities computed by sim_apply() must therefore be computed directly from the coefficients.

If FUN is not supplied at all, the simulated values of the coefficients will be returned in the output with a warning. Set FUN to NULL or verbose to FALSE to suppress this warning.

`sim_apply()` with multiply imputed data

When using misim() and sim_apply() with multiply imputed data, the coefficients are supplied to the model fit corresponding to the imputation identifier associated with each set of coefficients, which means if FUN uses a dataset extracted from a model (e.g., using insight::get_data()), it will do so from the model fit in the corresponding imputation.

The original estimates (see Value below) are computed as the mean of the estimates across the imputations using the original coefficients averaged across imputations. That is, first, the coefficients estimated in the models in the imputed datasets are combined to form a single set of pooled coefficients; then, for each imputation, the quantities of interest are computed using the pooled coefficients; finally, the mean of the resulting estimates across the imputations are taken as the "original" estimates. Note this procedure is only valid for quantities with symmetric sampling distributions, which excludes quantities like risk ratios and odds ratios, but includes log risk ratios and log odds ratios. The desired quantities can be transformed from their log versions using transform().

Examples

Run this code


data("lalonde", package = "MatchIt")
fit <- lm(re78 ~ treat + age + race + nodegree + re74,
          data = lalonde)
coef(fit)

set.seed(123)
s <- sim(fit, n = 500)

# Function to compare predicted values for two units
# using `fit` argument
sim_fun <- function(fit) {
  pred1 <- unname(predict(fit, newdata = lalonde[1,]))
  pred2 <- unname(predict(fit, newdata = lalonde[2,]))
  c(pred1 = pred1, pred2 = pred2)
}

est <- sim_apply(s, sim_fun, verbose = FALSE)

# Add difference between predicted values as
# additional quantity
est <- transform(est, `diff 1-2` = pred1 - pred2)

# Examine estimates and confidence intervals
summary(est)

# Function to compare coefficients using `coefs`
# argument
sim_fun <- function(coefs) {
  setNames(coefs["racewhite"] - coefs["racehispan"],
           "wh - his")
}

est <- sim_apply(s, sim_fun, verbose = FALSE)

# Examine estimates and confidence intervals
summary(est)

# Another way to do the above:
est <- sim_apply(s, FUN = NULL)
est <- transform(est,
                 `wh - his` = `racewhite` - `racehispan`)

summary(est, parm = "wh - his")

Run the code above in your browser using DataLab