
Selection of influential variables or model components with error control.
## a method to compute stability selection paths for fitted mboostLSS models
# S3 method for mboostLSS
stabsel(x, cutoff, q, PFER, mstop = NULL,
        folds = subsample(model.weights(x), B = B),
        B = ifelse(sampling.type == "MB", 100, 50),
        assumption = c("unimodal", "r-concave", "none"),
        sampling.type = c("SS", "MB"),
        papply = mclapply, verbose = TRUE, FWER, eval = TRUE, ...)
## a method to get the selected parameters
# S3 method for stabsel_mboostLSS
selected(object, parameter = NULL, ...)
An object of class stabsel with a special print method. The object has the following elements:
phat: selection probabilities.
selected: elements with maximal selection probability greater than cutoff.
max: maximum of selection probabilities.
cutoff: cutoff used.
q: average number of selected variables used.
PFER: per-family error rate.
sampling.type: the sampling type used for stability selection.
assumption: the assumptions made on the selection probabilities.
call: the call.
x: a fitted model of class "mboostLSS" or "nc_mboostLSS".
cutoff: cutoff between 0.5 and 1. Preferably a value between 0.6 and 0.9 should be used.
q: number of (unique) selected variables (or groups of variables depending on the model) that are selected on each subsample.
PFER: upper bound for the per-family error rate. This specifies the number of falsely selected base-learners that is tolerated. See details.
mstop: mstop value to use. If no value is supplied, the mstop value of the fitted model is used.
folds: a weight matrix with number of rows equal to the number of observations, see cvrisk and subsample. Usually one should not change the default here as subsampling with a fraction of
assumption: defines the type of assumptions on the distributions of the selection probabilities and simultaneous selection probabilities. Only applicable for sampling.type = "SS". For sampling.type = "MB" we always use "none".
sampling.type: use the sampling scheme of Shah & Samworth (2013), i.e., with complementary pairs (sampling.type = "SS"), or the original sampling scheme of Meinshausen & Buehlmann (2010) (sampling.type = "MB").
B: number of subsampling replicates. By default, we use 50 complementary pairs for the error bounds of Shah & Samworth (2013) and 100 for the error bound derived in Meinshausen & Buehlmann (2010). As we use
papply: (parallel) apply function, defaults to mclapply. Alternatively, parLapply can be used. In the latter case, usually more setup is needed (see example of cvrisk for some details).
verbose: logical (default: TRUE) that determines whether warnings should be issued.
FWER: deprecated. Only for compatibility with older versions, use PFER instead.
eval: logical. Determines whether stability selection is evaluated (eval = TRUE; default) or if only the parameter combination is returned.
object: an object of class "stabsel_mboostLSS".
parameter: select one or multiple effects.
...: additional arguments to parallel apply methods such as mclapply and to cvrisk.
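The difference between the two sampling schemes can be illustrated with a small base R sketch. It is not the code used internally by stabs or gamboostLSS; the helper name complementary_pairs and the toy sizes n = 10 and B = 3 are assumptions made purely for illustration. Each of the B random halves is paired with its complement, so B complementary pairs yield 2 * B weight vectors.

```r
# Hypothetical helper for illustration only (not part of stabs or gamboostLSS):
# build an n x (2 * B) matrix of 0/1 subsampling weights made of
# B random halves followed by their B complements.
complementary_pairs <- function(n, B) {
  first <- matrix(0L, nrow = n, ncol = B)
  for (b in seq_len(B)) {
    # mark a random half of the observations in column b
    first[sample.int(n, floor(n / 2)), b] <- 1L
  }
  # the complement of each half contains exactly the remaining observations
  cbind(first, 1L - first)
}

set.seed(1)
folds <- complementary_pairs(n = 10, B = 3)
dim(folds)      # 10 rows, 2 * B = 6 columns
colSums(folds)  # every subsample contains half of the observations
```

With sampling.type = "MB", only independent random halves would be drawn, without adding their complements.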
Stability selection should preferably be used with non-cyclic gamboostLSS
models, as proposed by Thomas et al. (2018). In this publication, the combination
of package gamboostLSS with stability selection was developed and
investigated in depth.
For details on stability selection see stabsel in package stabs and Hofner et al. (2015).
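The arguments cutoff, q and PFER are tied together by an error bound. As a rough guide, the following base R sketch evaluates the bound of Meinshausen & Buehlmann (2010), PFER <= q^2 / ((2 * cutoff - 1) * p), where p is the number of candidate base-learners. The helper name mb_pfer_bound and the value p = 100 are assumptions for illustration; the default sampling.type = "SS" relies on the different bounds of Shah & Samworth (2013), so the numbers below are not what stabsel reports by default.

```r
# Hypothetical helper for illustration only (not exported by stabs or gamboostLSS):
# upper bound on the per-family error rate from Meinshausen & Buehlmann (2010).
mb_pfer_bound <- function(q, cutoff, p) {
  stopifnot(cutoff > 0.5, cutoff <= 1, q >= 1, p >= q)
  q^2 / ((2 * cutoff - 1) * p)
}

# p = 100 is an assumed number of candidate base-learners
mb_pfer_bound(q = 5, cutoff = 0.75, p = 100)  # 0.5
# a stricter cutoff tightens the bound for the same q
mb_pfer_bound(q = 5, cutoff = 0.90, p = 100)
```

The sketch shows why a larger q or a cutoff close to 0.5 weakens the error control: the bound grows quadratically in q and diverges as cutoff approaches 0.5.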
B. Hofner, L. Boccuto and M. Goeker (2015), Controlling false discoveries in high-dimensional situations: Boosting with stability selection. BMC Bioinformatics, 16:144.
N. Meinshausen and P. Buehlmann (2010), Stability selection. Journal of the Royal Statistical Society, Series B, 72, 417--473.
R.D. Shah and R.J. Samworth (2013), Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society, Series B, 75, 55--80.
J. Thomas, A. Mayr, B. Bischl, M. Schmid, A. Smith and B. Hofner (2018), Gradient boosting for distributional regression - faster tuning and improved variable selection via noncyclical updates. Statistics and Computing, 28, 673--687. doi:10.1007/s11222-017-9754-6 (Preliminary version: https://arxiv.org/abs/1611.10171).
stabsel and stabsel_parameters
### Load the package
library(gamboostLSS)
### Data generating process:
set.seed(1907)
x1 <- rnorm(500)
x2 <- rnorm(500)
x3 <- rnorm(500)
x4 <- rnorm(500)
x5 <- rnorm(500)
x6 <- rnorm(500)
mu <- exp(1.5 + 1 * x1 + 0.5 * x2 - 0.5 * x3 - 1 * x4)
sigma <- exp(-0.4 * x3 - 0.2 * x4 + 0.2 * x5 + 0.4 * x6)
y <- numeric(500)
for (i in 1:500)
    y[i] <- rnbinom(1, size = sigma[i], mu = mu[i])
dat <- data.frame(x1, x2, x3, x4, x5, x6, y)
### linear model with y ~ . for both components: 400 boosting iterations
model <- glmboostLSS(y ~ ., families = NBinomialLSS(), data = dat,
                     control = boost_control(mstop = 400),
                     center = TRUE, method = "noncyclic")
### Do not test the following code per default on CRAN as it takes some time to run:
#run stability selection
(s <- stabsel(model, q = 5, PFER = 1))
#get selected effects
selected(s)
#visualize selection frequencies
plot(s)
### END (don't test automatically)