Perform k-fold cross-validation for penalized regression models over a grid of values for the regularization parameter lambda.
cv.biglasso(
X,
y,
row.idx = 1:nrow(X),
family = c("gaussian", "binomial", "cox", "mgaussian"),
eval.metric = c("default", "MAPE", "auc", "class"),
ncores = parallel::detectCores(),
...,
nfolds = 5,
seed,
cv.ind,
trace = FALSE,
grouped = TRUE
)
An object with S3 class "cv.biglasso", which inherits from class "cv.ncvreg". The object contains the following components (adopted from cv.ncvreg):
cve: The error for each value of lambda, averaged across the cross-validation folds.
cvse: The estimated standard error associated with each value of cve.
lambda: The sequence of regularization parameter values along which the cross-validation error was calculated.
fit: The fitted biglasso object for the whole data.
min: The index of lambda corresponding to lambda.min.
lambda.min: The value of lambda with the minimum cross-validation error.
lambda.1se: The largest value of lambda for which the cross-validation error is at most one standard error larger than the minimum cross-validation error.
null.dev: The deviance for the intercept-only model.
pe: If family="binomial", the cross-validation prediction error for each value of lambda.
cv.ind: The fold to which each observation belongs, as in the cv.ind argument.
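The minimum-error and one-standard-error choices of lambda described above can be illustrated with a small base R sketch. The numbers are made up, and the code assumes the conventional one-standard-error rule (threshold taken at the minimizing lambda); it is not taken from the package source.

```r
# Hypothetical CV results on a decreasing lambda grid (made-up numbers)
lambda <- c(1.0, 0.5, 0.25, 0.125, 0.0625)
cve    <- c(2.0, 1.2, 1.0, 1.05, 1.3)    # mean CV error per lambda
cvse   <- c(0.3, 0.25, 0.25, 0.25, 0.3)  # standard error per lambda

min.idx    <- which.min(cve)             # index of the smallest CV error
lambda.min <- lambda[min.idx]            # lambda with the minimum CV error

# Largest lambda whose CV error is within one SE of the minimum
threshold  <- cve[min.idx] + cvse[min.idx]
lambda.1se <- max(lambda[cve <= threshold])
```

Choosing the larger lambda trades a small increase in estimated error for a sparser model.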
X: The design matrix, without an intercept, as in biglasso.
y: The response vector, as in biglasso.
row.idx: The integer vector of row indices of X that are used for fitting the model, as in biglasso.
family: Either "gaussian", "binomial", "cox" or "mgaussian", depending on the response. "cox" and "mgaussian" are not supported yet.
eval.metric: The evaluation metric for the cross-validated error and for choosing the optimal lambda. "default" is MSE (mean squared error) for linear regression and binomial deviance for logistic regression. "MAPE", for linear regression only, is the mean absolute percentage error. "auc", for binary classification, is the area under the receiver operating characteristic (ROC) curve. "class", for binary classification, gives the misclassification error.
ncores: The number of cores to use for parallel execution of the cross-validation folds, run on a cluster created by the parallel package. (This is also supplied to the ncores argument in biglasso, which is the number of OpenMP threads, but only for the first call of biglasso that is run on the entire data. The individual calls of biglasso for the CV folds are run without the ncores argument.)
...: Additional arguments to biglasso.
nfolds: The number of cross-validation folds. Default is 5.
seed: The seed of the random number generator, used to obtain reproducible results.
cv.ind: Which fold each observation belongs to. By default, the observations are randomly assigned by cv.biglasso.
trace: If set to TRUE, cv.biglasso will inform the user of its progress by announcing the beginning of each CV fold. Default is FALSE.
grouped: Whether to calculate the CV standard error (cvse) over CV folds (TRUE) or over all cross-validated predictions (FALSE). Ignored when eval.metric is 'auc'.
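As a rough illustration of the evaluation metrics named above, the following base R sketch computes them on made-up predictions. The exact scaling used inside the package (for example, whether MAPE is reported as a fraction or a percentage, or how the deviance is normalized) is not stated here, so these formulas are assumptions, as is the 0.5 classification cutoff.

```r
# Gaussian response: hypothetical observed values and predictions
y    <- c(2, 4, 5, 10)
yhat <- c(1, 5, 5, 8)
mse  <- mean((y - yhat)^2)           # mean squared error
mape <- mean(abs((y - yhat) / y))    # mean absolute percentage error (as a fraction)

# Binary response: hypothetical labels and predicted probabilities
yb <- c(0, 1, 1, 0)
p  <- c(0.2, 0.7, 0.4, 0.6)
class.err <- mean((p > 0.5) != (yb == 1))                    # misclassification rate
bin.dev   <- -2 * sum(yb * log(p) + (1 - yb) * log(1 - p))   # binomial deviance
```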
Yaohui Zeng and Patrick Breheny
Maintainer: Yaohui Zeng <yaohui.zeng@gmail.com>
The function calls biglasso nfolds times, each time leaving out 1/nfolds of the data. The cross-validation error is based on the residual sum of squares when family="gaussian" and the binomial deviance when family="binomial".
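The leave-one-fold-out procedure can be sketched in base R as follows. This is only an illustration of the idea: lm() stands in for the penalized fit, the data are simulated, and the variable names mirror the arguments of cv.biglasso purely for readability.

```r
set.seed(1234)                     # plays the role of the seed argument
n <- 100
nfolds <- 5
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n)

# Randomly assign each observation to one of nfolds folds
cv.ind <- sample(rep(seq_len(nfolds), length.out = n))

sq.err <- numeric(n)
for (k in seq_len(nfolds)) {
  test <- cv.ind == k
  fit  <- lm(y ~ x, data = dat[!test, ])        # fit with fold k left out
  pred <- predict(fit, newdata = dat[test, ])   # predict the held-out fold
  sq.err[test] <- (dat$y[test] - pred)^2
}
cve <- mean(sq.err)                # cross-validated mean squared error
```

Passing a precomputed fold vector instead of a seed gives the same reproducibility and also lets different models be compared on identical folds.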
The S3 class cv.biglasso inherits from class cv.ncvreg, so S3 functions such as summary and plot can be applied directly to a cv.biglasso object.
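The effect of S3 inheritance can be seen in a toy example. The generic describe and the classes toy.child and toy.parent are hypothetical names invented for this sketch; they only mimic the relationship between a child class and the parent class it inherits from.

```r
# Toy generic with a method defined only for the parent class
describe <- function(obj, ...) UseMethod("describe")
describe.toy.parent <- function(obj, ...) {
  paste("parent method, lambda.min =", obj$lambda.min)
}

# An object whose class vector lists the child first, then the parent
obj <- structure(list(lambda.min = 0.25),
                 class = c("toy.child", "toy.parent"))

# No describe.toy.child exists, so dispatch falls through to the parent method
out <- describe(obj)
```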
biglasso, plot.cv.biglasso, summary.cv.biglasso, setupX
if (FALSE) {
## cv.biglasso
library(biglasso)
data(colon)
X <- colon$X
y <- colon$y
X.bm <- as.big.matrix(X)  # convert the design matrix to a big.matrix
## logistic regression with the default 5 cross-validation folds
cvfit <- cv.biglasso(X.bm, y, family = 'binomial', seed = 1234, ncores = 2)
par(mfrow = c(2, 2))
plot(cvfit, type = 'all')
summary(cvfit)
}