spikeslab (version 1.1.6)

cv.spikeslab: K-fold Cross-Validation for Spike and Slab Regression

Description

Computes the K-fold cross-validated mean squared prediction error for the generalized elastic net (gnet) from spike and slab regression. Also returns a stability index for each variable.

Usage

cv.spikeslab(x = NULL, y = NULL, K = 10,
    plot.it = TRUE, n.iter1 = 500, n.iter2 = 500, mse = TRUE,
    bigp.smalln = FALSE, bigp.smalln.factor = 1, screen = (bigp.smalln),
    r.effects = NULL, max.var = 500, center = TRUE, intercept = TRUE,
    fast = TRUE, beta.blocks = 5, verbose = TRUE, save.all = TRUE,
    ntree = 300, seed = NULL, ...)

Arguments

x

x-predictor matrix.

y

y-response values.

K

Number of folds.

plot.it

If TRUE, plots the mean prediction error and its standard error.

n.iter1

Number of burn-in Gibbs sampled values (i.e., discarded values).

n.iter2

Number of Gibbs sampled values, following burn-in.

mse

If TRUE, an external estimate of the overall variance is calculated (see ntree).

bigp.smalln

Set to TRUE when p >> n, that is, when the number of predictors p far exceeds the sample size n.

bigp.smalln.factor

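Multiplier for the filtering step: the top n times this value variables are kept, where n is the sample size (used when p >> n). For example, with n = 100 and a factor of 2, the top 200 variables are kept.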

screen

If TRUE, variables are pre-filtered before the model is fit. Defaults to the value of bigp.smalln.

r.effects

List used for grouping variables (see the Details section of spikeslab).

max.var

Maximum number of variables allowed in the final model.

center

If TRUE, variables are centered by their means. Default is TRUE and should only be adjusted in extreme examples.

intercept

If TRUE, an intercept is included in the model, otherwise no intercept is included. Default is TRUE.

fast

If TRUE, use blocked Gibbs sampling to accelerate the algorithm.

beta.blocks

Number of blocks used to update beta (applies only when fast is TRUE).

verbose

If TRUE, verbose output is sent to the terminal.

save.all

If TRUE, the spikeslab object for each fold is saved and returned (see cv.spikeslab.obj under Value).

ntree

Number of trees used by random forests (applies only when mse is TRUE).

seed

Seed for random number generator. Must be a negative integer.

...

Further arguments passed to or from other methods.
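
To make the role of K concrete, the following base R sketch shows one common way to partition n observations into K folds of nearly equal size. The helper make_folds is hypothetical and is not part of spikeslab; the package's own splitting scheme (returned in cv.fold) may differ.

```r
# Hypothetical helper (not part of spikeslab): assign n observations to K folds.
make_folds <- function(n, K = 10) {
  # Shuffle the fold labels 1..K so that fold sizes differ by at most one.
  fold_id <- sample(rep(seq_len(K), length.out = n))
  # Return a list whose k-th element holds the held-out row indices of fold k.
  split(seq_len(n), fold_id)
}

# set.seed() controls base R's RNG here; it is unrelated to the
# 'seed' argument of cv.spikeslab.
set.seed(1)
folds <- make_folds(n = 203, K = 10)
lengths(folds)  # fold sizes, each 20 or 21
```

Each observation is held out exactly once, so every fold serves as a test set while the remaining K - 1 folds are used for fitting.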

Value

Invisibly returns a list with components:

spikeslab.obj

Spike and slab object from the full data.

cv.spikeslab.obj

List containing spike and slab objects from each fold. Can be NULL.

cv.fold

List containing the cv splits.

cv

Mean squared error of the gnet for each fold.

cv.path

Matrix of mean squared errors for the gnet solution path. Rows correspond to model sizes and columns to folds.

stability

Matrix containing the stability of each variable, defined as the percentage of the K folds in which the variable is selected. Also includes the bma and gnet coefficient values and their averages over the cv folds.

bma

bma coefficients from the full data in terms of the standardized x.

bma.scale

bma coefficients from the full data, scaled in terms of the original x.

gnet

cv-optimized gnet in terms of the standardized x.

gnet.scale

cv-optimized gnet in terms of the original x.

gnet.model

List of models selected by gnet over the K folds.

gnet.path

gnet path from the full data, scaled in terms of the original x.

gnet.obj

gnet object from fitting the full data (a lars-type object).

gnet.obj.vars

Variables (in order) used to calculate the gnet object.

verbose

Verbose details (used for printing).
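
As an illustration of how cv.path and stability can be read, the base R sketch below summarizes a toy error matrix (rows are model sizes, columns are folds) by its mean and standard error, and computes a stability percentage from a toy list of per-fold selections. All numbers and variable names are made up, and the list of selections is only a stand-in; the actual layout of gnet.model and the package's internal computations may differ.

```r
# Toy stand-in for cv.path: rows are model sizes, columns are folds.
# These numbers are invented for illustration only.
cv_path <- matrix(c(4.0, 4.4, 3.6,
                    2.0, 2.2, 1.8,
                    2.5, 2.9, 2.1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("size", 1:3), paste0("fold", 1:3)))
K <- ncol(cv_path)

cv_mean <- rowMeans(cv_path)                # mean error per model size
cv_se   <- apply(cv_path, 1, sd) / sqrt(K)  # standard error across folds
best    <- names(which.min(cv_mean))        # model size with the smallest mean error

# Toy stand-in for per-fold selections (hypothetical variable names).
selected <- list(c("x1", "x2"), c("x1"), c("x1", "x3"))
vars <- c("x1", "x2", "x3", "x4")

# Stability: percentage of folds in which each variable was selected.
stability <- sapply(vars, function(v) {
  100 * mean(sapply(selected, function(s) v %in% s))
})
```

A variable selected in every fold has stability 100, while one that is never selected has stability 0.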

References

Ishwaran H. and Rao J.S. (2005a). Spike and slab variable selection: frequentist and Bayesian strategies. Ann. Statist., 33:730-773.

Ishwaran H. and Rao J.S. (2010). Generalized ridge regression: geometry and computational solutions when p is larger than n.

Ishwaran H. and Rao J.S. (2011). Mixing generalized ridge regressions.

See Also

sparsePC.spikeslab, plot.spikeslab, predict.spikeslab, print.spikeslab.

Examples

# NOT RUN {
#------------------------------------------------------------
# Example 1: 10-fold cross-validation using parallel processing
#------------------------------------------------------------

data(ozoneI, package = "spikeslab")
y <- ozoneI[,  1]
x <- ozoneI[, -1]
cv.obj <- cv.spikeslab(x = x, y = y, parallel = 4)
plot(cv.obj, plot.type = "cv")
plot(cv.obj, plot.type = "path")

#------------------------------------------------------------
# Example 2: 10-fold cross-validation using parallel processing
# (high dimensional diabetes data)
#------------------------------------------------------------

# add 2000 noise variables
data(diabetesI, package = "spikeslab")
diabetes.noise <- cbind(diabetesI,
      noise = matrix(rnorm(nrow(diabetesI) * 2000), nrow(diabetesI)))
x <- diabetes.noise[, -1]
y <- diabetes.noise[, 1]

cv.obj <- cv.spikeslab(x = x, y = y, bigp.smalln=TRUE, parallel = 4)
plot(cv.obj)
# }
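
Because the result is returned invisibly, it has to be assigned before its components can be inspected. The sketch below is a minimal, hedged example of doing so with the ozoneI data; it uses only arguments listed under Usage and component names listed under Value, and the choices K = 5 and seed = -1234 are arbitrary. It runs only if spikeslab is installed.

```r
# Run only when the spikeslab package is available.
if (requireNamespace("spikeslab", quietly = TRUE)) {
  library(spikeslab)

  data(ozoneI, package = "spikeslab")
  y <- ozoneI[,  1]
  x <- ozoneI[, -1]

  # Assign the invisible return value; suppress plotting and console output.
  # The seed must be a negative integer (see Arguments).
  cv.obj <- cv.spikeslab(x = x, y = y, K = 5,
                         plot.it = FALSE, verbose = FALSE, seed = -1234)

  # Components documented under Value.
  print(cv.obj$cv)          # gnet mean squared error for each fold
  print(cv.obj$stability)   # stability and coefficient summaries per variable
  print(cv.obj$gnet.scale)  # cv-optimized gnet on the scale of the original x
}
```

Setting plot.it = FALSE and verbose = FALSE keeps the run quiet, which is convenient in scripts; the plots can still be produced afterwards with plot(cv.obj, plot.type = "cv").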
