- Y
the outcome.
- X
the covariates. If type = "average_value", then the exposure
variable should be part of X, with its name provided in exposure_name.
- f1
the fitted values from a flexible estimation technique
regressing Y on X. A vector of the same length as Y; if sample-splitting
is desired, then the value of f1 at each position should be the result
of predicting from a model trained without that observation.
- f2
the fitted values from a flexible estimation technique
regressing either (a) f1 or (b) Y on X withholding the columns in indx.
A vector of the same length as Y; if sample-splitting
is desired, then the value of f2 at each position should be the result
of predicting from a model trained without that observation.
- indx
the indices of the covariate(s) to calculate variable
importance for; defaults to 1.
- type
the type of importance to compute; defaults to r_squared, but other
supported options are auc, accuracy, deviance, and anova.
- run_regression
if outcome Y and covariates X are passed to vimp_accuracy, and
run_regression is TRUE, then Super Learner will be used; otherwise,
variable importance will be computed using the supplied fitted values
(f1 and f2).
- SL.library
a character vector of learners to pass to SuperLearner, if f1 and f2
are Y and X, respectively. Defaults to SL.glmnet, SL.xgboost, and SL.mean.
- alpha
the level to compute the confidence interval at.
Defaults to 0.05, corresponding to a 95% confidence interval.
- delta
the value of the \(\delta\)-null (i.e., testing if
importance < \(\delta\)); defaults to 0.
- scale
the scale on which CIs are computed: the original scale ("identity")
or a transformed scale ("log" or "logit").
- na.rm
should we remove NAs in the outcome and fitted values
in computation? (defaults to FALSE)
- sample_splitting
should we use sample-splitting to estimate the full and
reduced predictiveness? Defaults to TRUE, since inferences made using
sample_splitting = FALSE will be invalid for variables with truly zero
importance.
- sample_splitting_folds
the folds used for sample-splitting;
these identify the observations that should be used to evaluate
predictiveness based on the full and reduced sets of covariates, respectively.
Only used if run_regression = FALSE.
- final_point_estimate
if sample-splitting is used, should the final point estimates
be based on only the sample-split folds used for inference ("split", the default),
or should they instead be based on the full dataset ("full") or the average
across the point estimates from each sample split ("average")? All three
options result in valid point estimates; sample-splitting is only required for valid inference.
- stratified
if run_regression = TRUE, should the generated
folds be stratified based on the outcome? (Stratifying helps to ensure
class balance across cross-validation folds.)
- C
the indicator of coarsening (1 denotes observed, 0 denotes
unobserved).
- Z
either (i) NULL (the default, in which case the argument
C above must be all ones), or (ii) a character vector
specifying the variable(s) among Y and X that are thought to play a
role in the coarsening mechanism. To specify the outcome, use "Y"; to
specify covariates, use a character number corresponding to the desired
position in X (e.g., "1").
- ipc_scale
the scale on which the inverse probability weight correction is applied (if any).
Defaults to "identity"; other options are "log" and "logit".
- ipc_weights
weights for the computed influence curve (i.e.,
inverse probability weights for coarsened-at-random settings).
Assumed to be already inverted (i.e., ipc_weights = 1 / [estimated
probability weights]).
- ipc_est_type
the type of procedure used for coarsened-at-random
settings; options are "ipw" (for inverse probability weighting) or
"aipw" (for augmented inverse probability weighting).
Only used if C is not all equal to 1.
- scale_est
should the point estimate be scaled to be greater than or equal to 0?
Defaults to TRUE.
- nuisance_estimators_full
(only used if type = "average_value")
a list of nuisance function estimators on the
observed data (may be within a specified fold, for cross-fitted estimates).
Specifically: an estimator of the optimal treatment rule; an estimator of the
propensity score under the estimated optimal treatment rule; and an estimator
of the outcome regression when treatment is assigned according to the estimated optimal rule.
- nuisance_estimators_reduced
(only used if type = "average_value")
a list of nuisance function estimators on the
observed data (may be within a specified fold, for cross-fitted estimates).
Specifically: an estimator of the optimal treatment rule; an estimator of the
propensity score under the estimated optimal treatment rule; and an estimator
of the outcome regression when treatment is assigned according to the estimated optimal rule.
- exposure_name
(only used if type = "average_value") the name of
the exposure of interest; binary, with 1 indicating presence of the exposure and
0 indicating absence of the exposure.
- bootstrap
should bootstrap-based standard error estimates be computed?
Defaults to FALSE (and currently may only be used if
sample_splitting = FALSE).
- b
the number of bootstrap replicates (only used if bootstrap = TRUE
and sample_splitting = FALSE); defaults to 1000.
- boot_interval_type
the type of bootstrap interval (one of "norm", "basic", "stud", "perc",
or "bca", as in boot::boot.ci) if requested. Defaults to "perc".
- clustered
should the bootstrap resamples be performed on clusters
rather than individual observations? Defaults to FALSE.
- cluster_id
vector of the same length as Y giving the cluster IDs
used for the clustered bootstrap, if clustered is TRUE.
- ...
other arguments to the estimation tool, see "See also".
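
The requirement on f1 and f2 (each value predicted from a model trained without that observation) can be illustrated with a minimal base-R sketch. The simulated data, the choice of five folds, the use of glm in place of a flexible learner, and all object names below are assumptions made for illustration; the sketch follows option (b) for f2, regressing Y on X without the columns in indx.

```r
# Illustrative sketch only: out-of-fold fitted values for the full (f1) and
# reduced (f2) regressions. glm stands in for a flexible estimation technique.
set.seed(4747)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- rbinom(n, size = 1, prob = plogis(1.5 * X$x1 - 0.5 * X$x2))

indx <- 1                                   # covariate of interest
V <- 5                                      # number of folds (arbitrary choice)
folds <- sample(rep(seq_len(V), length.out = n))
X_reduced <- X[, -indx, drop = FALSE]       # X withholding the columns in indx

f1 <- numeric(n)
f2 <- numeric(n)
for (v in seq_len(V)) {
  train <- folds != v
  test <- !train
  # Fit both regressions on the training folds only
  fit_full <- glm(y ~ ., family = binomial(),
                  data = data.frame(y = Y[train], X[train, , drop = FALSE]))
  fit_reduced <- glm(y ~ ., family = binomial(),
                     data = data.frame(y = Y[train],
                                       X_reduced[train, , drop = FALSE]))
  # Predict for the held-out fold, so no observation informs its own prediction
  f1[test] <- predict(fit_full, newdata = X[test, , drop = FALSE],
                      type = "response")
  f2[test] <- predict(fit_reduced, newdata = X_reduced[test, , drop = FALSE],
                      type = "response")
}
```

Vectors built this way have the same length as Y and are the kind of input that f1 and f2 describe when run_regression = FALSE; any fold assignments needed for inference would still have to be supplied through sample_splitting_folds.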