- Y
the outcome.
- X
the covariates. If type = "average_value", then the exposure
variable should be part of X, with its name provided in exposure_name.
- f1
the fitted values from a flexible estimation technique
regressing Y on X. A vector of the same length as Y; if sample-splitting
is desired, then the value of f1 at each position should be the result
of predicting from a model trained without that observation.
- f2
the fitted values from a flexible estimation technique
regressing either (a) f1 or (b) Y on X withholding the columns in
indx. A vector of the same length as Y; if sample-splitting
is desired, then the value of f2 at each position should be the result
of predicting from a model trained without that observation.
- indx
the indices of the covariate(s) to calculate variable
importance for; defaults to 1.
- type
the type of importance to compute; defaults to
r_squared, but other supported options are auc,
accuracy, deviance, anova, and average_value.
- run_regression
if outcome Y and covariates X are passed and
run_regression is TRUE, then Super Learner will be used
to estimate the full and reduced regressions; otherwise, variable importance
will be computed using the supplied fitted values f1 and f2.
- SL.library
a character vector of learners to pass to
SuperLearner when run_regression = TRUE. Defaults to SL.glmnet, SL.xgboost,
and SL.mean.
- alpha
the significance level used to compute the confidence interval.
Defaults to 0.05, corresponding to a 95% confidence interval.
- delta
the value of the \(\delta\)-null (i.e., testing if
importance < \(\delta\)); defaults to 0.
- scale
should CIs be computed on the original ("identity") scale or on
another scale? (options are "log" and "logit")
- na.rm
should we remove NAs in the outcome and fitted values
in computation? (defaults to FALSE)
- sample_splitting
should we use sample-splitting to estimate the full and
reduced predictiveness? Defaults to TRUE, since inferences made using
sample_splitting = FALSE will be invalid for variables with truly zero
importance.
- sample_splitting_folds
the folds used for sample-splitting;
these identify the observations that should be used to evaluate
predictiveness based on the full and reduced sets of covariates, respectively.
Only used if run_regression = FALSE.
- final_point_estimate
if sample splitting is used, should the final point estimates
be based on only the sample-split folds used for inference ("split", the default),
or should they instead be based on the full dataset ("full") or the average
across the point estimates from each sample split ("average")? All three
options result in valid point estimates; sample-splitting is only required for valid inference.
- stratified
if run_regression = TRUE, should the generated
folds be stratified based on the outcome? Stratifying helps to ensure class balance
across cross-validation folds.
- C
the indicator of coarsening (1 denotes observed, 0 denotes
unobserved).
- Z
either (i) NULL (the default, in which case the argument
C above must be all ones), or (ii) a character vector
specifying the variable(s) among Y and X that are thought to play a
role in the coarsening mechanism. To specify the outcome, use "Y"; to
specify covariates, use a character string giving the column number of the desired
covariate in X (e.g., "1").
- ipc_scale
on what scale should the inverse probability weight correction be applied (if any)?
Defaults to "identity"; other options are "log" and "logit".
- ipc_weights
weights for the computed influence curve (i.e.,
inverse probability weights for coarsened-at-random settings).
Assumed to be already inverted (i.e., ipc_weights = 1 / [estimated
probabilities of being observed]).
- ipc_est_type
the type of procedure used for coarsened-at-random
settings; options are "ipw" (for inverse probability weighting) or
"aipw" (for augmented inverse probability weighting).
Only used if C is not all equal to 1.
- scale_est
should the point estimate be scaled to be greater than or equal to 0?
Defaults to TRUE.
- nuisance_estimators_full
(only used if type = "average_value")
a list of nuisance function estimators on the
observed data (may be within a specified fold, for cross-fitted estimates).
Specifically: an estimator of the optimal treatment rule; an estimator of the
propensity score under the estimated optimal treatment rule; and an estimator
of the outcome regression when treatment is assigned according to the estimated optimal rule.
- nuisance_estimators_reduced
(only used if type = "average_value")
a list of nuisance function estimators on the
observed data (may be within a specified fold, for cross-fitted estimates).
Specifically: an estimator of the optimal treatment rule; an estimator of the
propensity score under the estimated optimal treatment rule; and an estimator
of the outcome regression when treatment is assigned according to the estimated optimal rule.
- exposure_name
(only used if type = "average_value") the name of
the exposure of interest; binary, with 1 indicating presence of the exposure and
0 indicating absence of the exposure.
- bootstrap
should bootstrap-based standard error estimates be computed?
Defaults to FALSE (and currently may only be used if
sample_splitting = FALSE).
- b
the number of bootstrap replicates (only used if bootstrap = TRUE
and sample_splitting = FALSE); defaults to 1000.
- boot_interval_type
the type of bootstrap interval (one of "norm",
"basic", "stud", "perc", or "bca", as in
boot.ci from the boot package), if requested. Defaults to "perc".
- clustered
should the bootstrap resamples be performed on clusters
rather than individual observations? Defaults to FALSE.
- cluster_id
vector of the same length as Y giving the cluster IDs
used for the clustered bootstrap, if clustered is TRUE.
- ...
other arguments to the estimation tool; see "See also".
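When sample-splitting is desired, f1 and f2 must be out-of-sample fitted values. A minimal sketch of one way to build them, assuming K-fold cross-fitting with base R's lm() standing in for a flexible estimation technique such as Super Learner; the simulated data, the fold count K, and the helper r_squared() are hypothetical. The final line is a naive plug-in difference in R-squared shown for illustration only, not the estimator computed here (with sample splitting, full and reduced predictiveness are evaluated on different observations).

```r
set.seed(4747)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- 1 + 0.5 * X$x1 + rnorm(n)  # x2 has no effect on Y in this simulation
indx <- 2                       # hypothetical: importance of the second column

# Randomly assign each observation to one of K folds
K <- 5
folds <- sample(rep(seq_len(K), length.out = n))

dat_full <- data.frame(Y = Y, X)
dat_reduced <- data.frame(Y = Y, X[, -indx, drop = FALSE])

f1 <- numeric(n)
f2 <- numeric(n)
for (k in seq_len(K)) {
  train <- folds != k
  test <- folds == k
  # Full regression of Y on all of X, trained without fold k
  fit_full <- lm(Y ~ ., data = dat_full[train, ])
  f1[test] <- predict(fit_full, newdata = dat_full[test, ])
  # Reduced regression of Y on X without the columns in indx (option (b))
  fit_reduced <- lm(Y ~ ., data = dat_reduced[train, ])
  f2[test] <- predict(fit_reduced, newdata = dat_reduced[test, ])
}

# Hypothetical helper: R-squared-type predictiveness, 1 - MSE / variance of Y
r_squared <- function(fitted, y) {
  1 - mean((y - fitted)^2) / mean((y - mean(y))^2)
}

# Naive plug-in difference, for illustration only
naive_importance <- r_squared(f1, Y) - r_squared(f2, Y)
```

Each entry of f1 and f2 comes from a model that never saw that observation, which is the property the f1 and f2 descriptions ask for.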
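The scale argument changes where the Wald interval is formed. As a rough sketch, assuming a delta-method interval that is formed on the logit scale and then back-transformed (the helpers logit_ci() and identity_ci() and the input values are hypothetical, and this is not necessarily the exact internal computation), the logit scale keeps both endpoints inside (0, 1) when an estimate lies near a boundary, while the identity-scale interval can cross 0:

```r
# Delta-method Wald interval on the logit scale, back-transformed to (0, 1).
# est: point estimate strictly between 0 and 1; se: its standard error.
logit_ci <- function(est, se, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  eta <- qlogis(est)                   # estimate on the logit scale
  se_eta <- se / (est * (1 - est))     # d/dp logit(p) = 1 / (p * (1 - p))
  plogis(eta + c(-1, 1) * z * se_eta)  # back-transform both endpoints
}

# Wald interval on the original ("identity") scale, for comparison
identity_ci <- function(est, se, alpha = 0.05) {
  est + c(-1, 1) * qnorm(1 - alpha / 2) * se
}

# Hypothetical estimate and standard error
ci_identity <- identity_ci(0.05, 0.05)  # lower endpoint falls below 0
ci_logit <- logit_ci(0.05, 0.05)        # both endpoints stay inside (0, 1)
```

With alpha = 0.05, qnorm(1 - alpha / 2) is about 1.96, matching the 95% interval described for alpha.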
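With clustered = TRUE, resampling is done on the clusters identified by cluster_id rather than on individual observations. The following is a sketch of a clustered bootstrap with a percentile ("perc") interval in base R; cluster_boot_ci() and the simulated data are hypothetical, and the statistic is a simple mean rather than a variable importance estimate.

```r
# Hypothetical helper: percentile interval from a cluster-level bootstrap
cluster_boot_ci <- function(stat, data, cluster_id, b = 1000, alpha = 0.05) {
  ids <- unique(cluster_id)
  reps <- replicate(b, {
    # Resample whole clusters with replacement
    drawn <- ids[sample.int(length(ids), replace = TRUE)]
    # Keep every row of each drawn cluster (a cluster drawn twice appears twice)
    rows <- unlist(lapply(drawn, function(id) which(cluster_id == id)))
    stat(data[rows, , drop = FALSE])
  })
  # Percentile interval from the bootstrap distribution
  quantile(reps, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(1)
cluster_id <- rep(1:20, each = 5)  # 20 clusters of 5 observations
d <- data.frame(y = rnorm(100) + rep(rnorm(20), each = 5))
ci <- cluster_boot_ci(function(df) mean(df$y), d, cluster_id, b = 500)
```

Resampling whole clusters preserves the within-cluster correlation in each replicate, which resampling individual rows would break.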
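ipc_weights are expected to be already inverted. One common way to obtain such weights, sketched here under the assumption that the probability of being observed is modeled by logistic regression on a single fully observed variable (the simulated data, obs_model, and the coefficients are hypothetical), is to estimate P(C = 1 | Z) and take its reciprocal:

```r
set.seed(2)
n <- 300
y <- rnorm(n)
# Hypothetical coarsening: larger y makes an observation more likely to be observed
C <- rbinom(n, 1, plogis(1 + 0.5 * y))

# Estimate P(C = 1 | y) with logistic regression
obs_model <- glm(C ~ y, family = binomial())
pi_hat <- predict(obs_model, type = "response")

# Already inverted: 1 / estimated probability of being observed
ipc_weights <- 1 / pi_hat

# Illustration only: a weighted mean of y among observed units (C = 1)
ipw_mean <- sum(C * ipc_weights * y) / sum(C * ipc_weights)
```

Because every estimated probability lies strictly between 0 and 1, every weight is greater than 1; observations that were unlikely to be observed receive the largest weights.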