
ipflasso (version 1.1)

cvr2.ipflasso: Cross-validated integrative lasso with cross-validated penalty factors

Description

Runs cvr.glmnet with different penalty factors for predictors from different blocks, and selects the penalty factors by cross-validation from the candidate list pflist.

Usage

cvr2.ipflasso(X, Y, family, type.measure, standardize=TRUE, 
              alpha=1, blocks, pflist, nfolds, ncv, 
              nzeromax = +Inf, plot=FALSE)

Arguments

X

an (n x p) matrix of predictors with observations in rows and predictors in columns

Y

n-vector giving the value of the response (either continuous, numeric-binary 0/1, or Surv object)

family

should be "gaussian" for continuous Y, "binomial" for binary Y, "cox" for Y of type Surv

type.measure

The accuracy/error measure computed in cross-validation. If not specified, type.measure is "class" (classification error) if family="binomial", "mse" (mean squared error) if family="gaussian" and partial likelihood if family="cox". If family="binomial", one may specify type.measure="auc" (area under the ROC curve).

standardize

whether the predictors should be standardized or not. Default is TRUE.

alpha

the elastic net mixing parameter: alpha=1 yields the L1 penalty (lasso), alpha=0 yields the L2 penalty. Default is alpha=1 (lasso).

blocks

a list of length M of the format list(block1=...,block2=...), where the dots should be replaced by the indices of the predictors included in each block. The blocks should form a partition of 1:p.
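The partition requirement can be checked before calling the function. The following is a minimal sketch in base R; the helper is_partition is hypothetical and not part of ipflasso.

```r
# Hypothetical helper (not part of ipflasso): checks that the blocks
# cover 1:p exactly once, i.e. that they form a partition of 1:p.
is_partition <- function(blocks, p) {
  idx <- unlist(blocks, use.names = FALSE)
  length(idx) == p &&            # no index missing or repeated overall
    anyDuplicated(idx) == 0 &&   # no predictor assigned to two blocks
    setequal(idx, seq_len(p))    # every index lies in 1:p
}

p <- 200
blocks <- list(block1 = 1:50, block2 = 51:200)
ok <- is_partition(blocks, p)

# Overlapping blocks (index 50 appears twice) are rejected.
bad <- is_partition(list(block1 = 1:50, block2 = 50:200), p)
```

Here ok is TRUE and bad is FALSE, so a malformed blocks argument can be caught before any cross-validation is run.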

pflist

a list of candidate penalty factors (see the argument pf of the function cvr.ipflasso) of the format pflist=list(c(1,1),c(1,2),c(2,1),...). Each candidate is a vector with one penalty factor per block.
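For more than a handful of candidates, pflist can be generated rather than typed out. The sketch below assumes two blocks and an arbitrary grid of values (1, 2, 4) chosen only for illustration. It keeps only candidates whose smallest factor is 1, on the assumption that only the ratio between the factors matters once lambda is tuned, so that a pair such as c(2,2) would duplicate c(1,1).

```r
# Candidate values for each block's penalty factor (illustrative choice).
values <- c(1, 2, 4)

# All combinations for two blocks; the column names are for readability only.
grid <- expand.grid(block1 = values, block2 = values)

# Keep combinations whose smallest factor is 1 (assumption: proportional
# candidates are equivalent up to a rescaling of lambda).
grid <- grid[pmin(grid$block1, grid$block2) == 1, ]

# Convert each row to a plain numeric vector, as in list(c(1,1), c(1,2), ...).
pflist <- lapply(seq_len(nrow(grid)),
                 function(i) unlist(grid[i, ], use.names = FALSE))
```

The resulting list can be passed directly as the pflist argument.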

nfolds

the number of folds of the CV procedure.

ncv

the number of repetitions of CV. Not to be confused with nfolds. For example, if 5-fold CV is repeated 50 times (i.e. 50 random partitions into 5 folds are considered in turn and the results are averaged), nfolds equals 5 and ncv equals 50.
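The distinction between nfolds and ncv can be illustrated with fold assignments alone. This is a sketch of the idea in base R, not necessarily how the package draws its folds internally: each repetition is an independent random partition of the n observations into nfolds folds.

```r
set.seed(1)   # for a reproducible illustration
n      <- 50  # number of observations
nfolds <- 5   # folds per CV run
ncv    <- 50  # number of repeated CV runs

# One column per repetition; entry [i, r] is the fold of observation i
# in repetition r. rep(..., length.out = n) gives balanced fold sizes.
foldids <- replicate(ncv, sample(rep(seq_len(nfolds), length.out = n)))

# Fold sizes in the first repetition: five folds of 10 observations each.
fold_sizes <- as.integer(table(foldids[, 1]))
```

With these settings foldids has 50 rows (observations) and 50 columns (repetitions), and every column splits the data into 5 folds of 10.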

nzeromax

the maximal number of predictors allowed in the final model. Default is +Inf, i.e. the best model is selected based on CV without restriction.

plot

If plot=TRUE, the function outputs plots of CV errors and number of included predictors for each block.

Value

A list with the following components:

coeff

the matrix of coefficients obtained with the best combination of penalty factors, with covariates corresponding to rows and lambda values corresponding to columns. The first row contains the intercept of the model.

ind.bestlambda

the index of the best lambda as selected by CV for the best combination of penalty factors.

bestlambda

the best lambda as selected by CV for the best combination of penalty factors.

ind.bestpf

the index of the best combination of penalty factors selected by CV from the list of candidates pflist.

cvm

the CV error for each candidate lambda value, averaged over the ncv runs of cv.glmnet.

a

a list of length length(pflist) containing the outputs of the function cvr.ipflasso for all candidate penalty factors from pflist.

family

See arguments.
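The components above are typically combined to read off the selected model. The sketch below uses a hand-built mock list with the documented component names instead of a real fit, so it runs without the package; the helper best_coefficients and all values in mock_fit are hypothetical, and coeff is assumed to behave like an ordinary matrix with row names.

```r
# Hypothetical helper (not part of ipflasso): coefficients at the best lambda.
best_coefficients <- function(fit) {
  fit$coeff[, fit$ind.bestlambda]
}

# Mock result with the documented structure: rows are the intercept and two
# covariates, columns are three lambda values. All values are made up.
mock_fit <- list(
  coeff = matrix(c(0.5, 0.0,  0.0,
                   0.4, 1.2,  0.0,
                   0.3, 1.5, -0.2),
                 nrow = 3,
                 dimnames = list(c("(Intercept)", "x1", "x2"), NULL)),
  ind.bestlambda = 2,
  bestlambda     = 0.1,
  ind.bestpf     = 3
)
pflist <- list(c(1, 1), c(1, 2), c(2, 1))

beta     <- best_coefficients(mock_fit)
selected <- names(beta)[-1][beta[-1] != 0]   # drop the intercept row
best_pf  <- pflist[[mock_fit$ind.bestpf]]    # winning penalty factors
```

For the mock values, selected contains only "x1" and best_pf is c(2, 1). With a real fit, the same indexing would apply to the object returned by cvr2.ipflasso, provided coeff carries row names.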

References

Boulesteix AL, De Bin R, Jiang X, Fuchs M, 2017. IPF-lasso: integrative L1-penalized regression with penalty factors for prediction based on multi-omics data. Comput Math Methods Med 2017:7691937.

Examples

# load the ipflasso package
library(ipflasso)

# generate dummy data: 50 observations, 200 predictors, binary response
X <- matrix(rnorm(50 * 200), 50, 200)
Y <- rbinom(50, 1, 0.5)

# two blocks (predictors 1-50 and 51-200), three candidate penalty-factor
# pairs, and 10 repetitions of 5-fold CV
cvr2.ipflasso(X = X, Y = Y, family = "binomial", type.measure = "class",
              standardize = FALSE,
              blocks = list(block1 = 1:50, block2 = 51:200),
              pflist = list(c(1, 1), c(1, 2), c(2, 1)), nfolds = 5, ncv = 10)