Perform a hypothesis test against the null hypothesis of zero importance as follows: (i) for a user-specified level \(\alpha\), compute a \((1 - \alpha)\times 100\)% confidence interval around the predictiveness of both the full and the reduced regression functions (these must be estimated on independent splits of the data); (ii) if the two intervals do not overlap, reject the null hypothesis.
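The two-step procedure above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helpers wald_ci and intervals_disjoint are hypothetical names, the intervals are assumed to be Wald-type, and the point estimates and standard errors are made-up numbers.

```r
# Hypothetical helper: Wald-type interval from a point estimate and its standard error.
wald_ci <- function(est, se, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  c(lower = est - z * se, upper = est + z * se)
}

# Hypothetical helper: TRUE when the two intervals share no point.
intervals_disjoint <- function(ci_a, ci_b) {
  unname(ci_a["lower"] > ci_b["upper"] || ci_a["upper"] < ci_b["lower"])
}

# Made-up predictiveness estimates (e.g., R-squared) from two independent splits.
ci_full <- wald_ci(est = 0.60, se = 0.03)
ci_reduced <- wald_ci(est = 0.40, se = 0.03)

# Step (ii): reject the null of zero importance if the intervals do not overlap.
reject <- intervals_disjoint(ci_full, ci_reduced)
```

With these illustrative numbers the intervals are well separated, so the sketch would reject; estimates that are closer together relative to their standard errors would not.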
vimp_hypothesis_test(
  full,
  reduced,
  y,
  folds,
  delta = 0,
  weights = rep(1, length(y)),
  type = "r_squared",
  alpha = 0.05,
  cv = FALSE,
  scale = "identity",
  na.rm = FALSE
)
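As a rough illustration of how inputs for the sample-split case (cv = FALSE) might be prepared, the sketch below simulates data, assigns each observation to split 1 or 2, and fits the full and reduced regressions on separate splits. The simulated data, the use of lm, and all object names are assumptions made for illustration; any regression estimator could be substituted.

```r
set.seed(1)
n <- 200
# Simulated data: x1 matters strongly, x2 weakly (illustrative only).
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + 0.5 * dat$x2 + rnorm(n)
y <- dat$y

# folds: 1 marks rows used for the full regression, 2 rows used for the reduced one.
folds <- sample(rep(1:2, length.out = n))

# Full regression: outcome on all covariates, fit on the first split.
full_fit <- lm(y ~ x1 + x2, data = dat[folds == 1, ])
full <- fitted(full_fit)

# Reduced regression: outcome on the covariates other than x1, fit on the second split.
reduced_fit <- lm(y ~ x2, data = dat[folds == 2, ])
reduced <- fitted(reduced_fit)
```

Objects of this shape correspond to the full, reduced, y, and folds arguments in the usage above; whether the function expects exactly these lengths should be checked against the argument descriptions.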
full: either (i) fitted values from a regression of the outcome on the full set of covariates, fit on a first independent split of the data (if cv = FALSE), or (ii) a list of predicted values from a cross-validated procedure (if cv = TRUE).
reduced: fitted values from a regression either (1) of the outcome on the reduced set of covariates, or (2) of the predicted values from the full regression on the reduced set of covariates; supplied as either (i) a single set of predictions (if cv = FALSE), fit on a split of the data independent of the one used for full, or (ii) a list of predicted values from a cross-validated procedure (if cv = TRUE).
y: the outcome.
folds: the folds used for splitting. If cv = FALSE, assumed to be a vector with 1 for the full regression and 2 for the reduced regression (if V = 2). If cv = TRUE, assumed to be a list whose first element is the outer folds (for hypothesis testing) and whose second element is a list with the inner cross-validation folds.
delta: the value of the \(\delta\)-null (i.e., testing if importance < \(\delta\)); defaults to 0.
weights: weights for the computed influence curve (e.g., inverse probability weights for coarsened-at-random settings).
type: the parameter being estimated (defaults to r_squared, for difference in R-squared-based variable importance).
alpha: the desired type I error rate (defaults to 0.05).
cv: whether V-fold cross-validation was used to estimate the predictiveness (TRUE) or the sample was split in two (FALSE); defaults to FALSE.
scale: the scale on which to compute the confidence interval ("identity" for the identity scale; "logit" to compute on the logit scale and back-transform).
na.rm: logical; should NAs be removed in computation? (defaults to FALSE)
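To make the "logit" option of scale concrete, the following sketch shows one common way to build an interval on the logit scale and back-transform it, using a delta-method standard error. This is an assumption about the general technique, not a description of the package internals; logit_ci is a hypothetical name and the inputs are made-up numbers.

```r
# Hypothetical helper: interval computed on the logit scale, then back-transformed.
logit_ci <- function(est, se, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  # Delta method: d/dp logit(p) = 1 / (p * (1 - p)).
  se_logit <- se / (est * (1 - est))
  bounds <- qlogis(est) + c(-1, 1) * z * se_logit
  # plogis maps the bounds back into (0, 1).
  plogis(bounds)
}

# Made-up estimate close to the boundary of [0, 1].
ci_logit <- logit_ci(est = 0.95, se = 0.04)

# Identity-scale interval for comparison; its upper bound can exceed 1.
ci_identity <- 0.95 + c(-1, 1) * qnorm(0.975) * 0.04
```

The practical appeal of the logit scale is that the back-transformed interval always stays inside (0, 1), which matters for bounded measures such as R-squared when the estimate is near 0 or 1.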
Returns TRUE if the null hypothesis is rejected (i.e., if the confidence intervals do not overlap); otherwise, FALSE.
See the paper by Williamson, Gilbert, Simon, and Carone for more details on the mathematics behind this function and the definition of the parameter of interest.