cvr.adaptive.ipflasso: Cross-validated integrative lasso with adaptive penalty factors

Description

Runs cvr.ipflasso with data-based penalty factors that differ between the blocks of predictors.

Usage

cvr.adaptive.ipflasso(X, Y, family, type.measure, standardize = TRUE,
                      alpha, type.step1, blocks, nfolds, ncv)

Arguments

X

an (n x p) matrix of predictors with observations in rows and predictors in columns.

Y

an n-vector giving the value of the response (either continuous, numeric-binary 0/1, or a Surv object).

family

should be "gaussian" for continuous Y, "binomial" for binary Y, "cox" for Y of type Surv.

type.measure

the accuracy/error measure computed in cross-validation. If not specified, type.measure is "class" (classification error) if family="binomial", "mse" (mean squared error) if family="gaussian", and the partial likelihood if family="cox". If family="binomial", type.measure="auc" (area under the ROC curve) may be specified instead.

standardize

whether the predictors should be standardized or not. Default is TRUE.

alpha

the elastic net mixing parameter for step 1: alpha=1 yields the L1 penalty (Lasso), alpha=0 yields the L2 penalty (Ridge).

type.step1

whether the models of step 1 should be run on the whole data set X (type.step1="comb") or separately for each block (type.step1="sep").

blocks

a list of length M of the format list(block1=..., block2=..., ...), where the dots should be replaced by the indices of the predictors included in each block. The blocks should form a partition of 1:p.

nfolds

the number of folds of the CV procedure.

ncv

the number of repetitions of the CV. Not to be confused with nfolds. For example, if one repeats 5-fold CV 50 times (i.e. considers 50 random partitions into 5 folds in turn and averages the results), nfolds equals 5 and ncv equals 50.
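To make the distinction between nfolds and ncv concrete, the following base R sketch draws ncv random partitions of n observations into nfolds folds. The helper make_cv_partitions is hypothetical and not part of ipflasso; the way the package assigns folds internally may differ.

```r
# Hypothetical helper (not part of ipflasso): draw `ncv` independent random
# partitions of the observations 1:n into `nfolds` folds.
make_cv_partitions <- function(n, nfolds, ncv) {
  # Each column is one repetition; entry [i, r] is the fold of observation i
  # in repetition r. Fold sizes differ by at most one.
  sapply(seq_len(ncv), function(r) sample(rep(seq_len(nfolds), length.out = n)))
}

set.seed(42)
folds <- make_cv_partitions(n = 50, nfolds = 5, ncv = 50)
dim(folds)          # 50 observations x 50 repetitions
table(folds[, 1])   # 10 observations per fold in the first repetition
```

With nfolds = 5 and ncv = 50 this matches the example in the ncv description: 50 repetitions of 5-fold CV whose results would then be averaged.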

Value

A list with the following elements:

coeff

the matrix of coefficients with predictors corresponding to rows and lambda values corresponding to columns. The first row contains the intercept of the models (for all families other than "cox"). In the special case of separate step 1 Lasso models and all coefficient means equal to zero, the intercept is the average of the separate model intercepts per block.

ind.bestlambda

the index of the best lambda according to CV.

lambda

the lambda sequence. In the special case of separate step 1 Lasso models and all coefficient means equal to zero, it is the lambda sequence with the highest lambda value among the lambda sequences of all blocks.

cvm

the CV estimate of the measure specified by type.measure for each candidate lambda value. In the special case of separate step 1 Lasso models and all coefficient means equal to zero, cvm is the average of the separate model cvms per block.

nzero

the number of non-zero coefficients in the selected model.

In the special case of separate step 1 Lasso models and all coefficient means equal to zero, nzero is the sum of the numbers of non-zero coefficients of the separate models per block.

family

see arguments.

means.step1

the arithmetic means of the absolute model coefficients per block, returned by the first step of the function.

exc

the exclusion vector containing the indices of the block(s) to be excluded from X.
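The special-case rules above (separate step 1 Lasso models, all coefficient means equal to zero) can be illustrated with a small base R sketch. The helper combine_separate_fits is hypothetical, not an ipflasso function, and the per-block results are made-up numbers; the sketch only mirrors the documented rules (intercepts and cvm averaged across blocks, nzero summed, lambda taken from the block whose sequence reaches the highest value) and assumes all blocks have cvm vectors of equal length.

```r
# Hypothetical helper (not part of ipflasso): combine per-block results
# following the rules documented for the all-means-zero special case.
# Assumes every block's cvm vector has the same length.
combine_separate_fits <- function(fits) {
  # lambda: the sequence whose largest value is the highest among all blocks
  max_lambda <- sapply(fits, function(f) max(f$lambda))
  lambda <- fits[[which.max(max_lambda)]]$lambda
  list(
    intercept = mean(sapply(fits, `[[`, "intercept")),  # average of intercepts
    cvm       = rowMeans(sapply(fits, `[[`, "cvm")),    # average cvm per lambda
    nzero     = sum(sapply(fits, `[[`, "nzero")),       # sum of non-zero counts
    lambda    = lambda
  )
}

# Made-up per-block results, for illustration only
fits <- list(
  block1 = list(intercept = 0.2, cvm = c(0.50, 0.48, 0.47), nzero = 0,
                lambda = c(0.30, 0.20, 0.10)),
  block2 = list(intercept = 0.4, cvm = c(0.52, 0.50, 0.49), nzero = 0,
                lambda = c(0.25, 0.15, 0.05))
)
res <- combine_separate_fits(fits)
res$intercept  # 0.3
res$cvm        # 0.51 0.49 0.48
res$lambda     # 0.30 0.20 0.10 (block1 reaches the highest lambda)
```

Since every block mean is zero in this special case, each block's selected model has no non-zero coefficients, so the summed nzero is 0 here.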

Details

The penalty factors are the inverse arithmetic means of the absolute model coefficients per block, computed in a first step of the function. The user can choose to determine these coefficients by running a Lasso model (alpha=1) or a Ridge model (alpha=0), either on the whole data set (type.step1="comb") or separately for each block (type.step1="sep"). If type.step1 is omitted, it is set to "sep" for Lasso and to "comb" for Ridge. If a Lasso model in step 1 yields a coefficient mean of zero for any block, that block is excluded from the input data set X and step 2 is run with the remaining blocks. If all coefficient means are zero, step 2 is not performed.
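The step 1 logic described above can be sketched in base R, assuming a vector of step 1 coefficients is already available. The sketch computes the mean absolute coefficient per block, marks zero-mean blocks for exclusion (which in practice only happens with Lasso in step 1), and inverts the remaining means to obtain the penalty factors. The helper adaptive_pf and the coefficient values are hypothetical; this is not the ipflasso source code.

```r
# Hypothetical sketch (not ipflasso source code): derive block-wise penalty
# factors from step 1 coefficients as described in Details.
adaptive_pf <- function(beta, blocks) {
  # arithmetic mean of the absolute coefficients in each block
  means <- sapply(blocks, function(idx) mean(abs(beta[idx])))
  exc <- which(means == 0)                # blocks to exclude (zero mean)
  keep <- setdiff(seq_along(blocks), exc)
  list(
    means.step1 = means,
    exc = exc,
    # inverse means for the remaining blocks; NULL if every block is excluded,
    # in which case step 2 would not be performed
    pf = if (length(keep) > 0) 1 / means[keep] else NULL
  )
}

blocks <- list(block1 = 1:3, block2 = 4:6, block3 = 7:8)
beta <- c(0.4, 0, -0.2,   0, 0, 0,   1.0, -0.5)  # made-up step 1 coefficients
res <- adaptive_pf(beta, blocks)
res$means.step1  # block1 0.2, block2 0, block3 0.75
res$exc          # block2 is excluded
res$pf           # block1 5, block3 1.333...
```

A block with small step 1 coefficients thus receives a large penalty factor, so it is penalized more heavily in step 2.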

References

Schulze, Gerhard (2017): Clinical Outcome Prediction Based on Multi-Omics Data: Extension of IPF-LASSO. Master's thesis, Ludwig-Maximilians-Universitaet Muenchen (Department of Statistics: Technical Reports). https://doi.org/10.5282/ubm/epub.59092

Examples


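# NOT RUN {
# load the ipflasso package
library(ipflasso)

# generate dummy data: 50 observations, 200 predictors in two blocks
set.seed(1)
X <- matrix(rnorm(50 * 200), 50, 200)
Y <- rbinom(50, 1, 0.5)

# adaptive IPF-Lasso with a Lasso model (alpha = 1) in step 1
fit <- cvr.adaptive.ipflasso(X = X, Y = Y, family = "binomial", type.measure = "class",
                             standardize = FALSE, alpha = 1,
                             blocks = list(block1 = 1:50, block2 = 51:200),
                             nfolds = 5, ncv = 10)
fit$means.step1     # mean absolute step 1 coefficients per block
fit$ind.bestlambda  # index of the best lambda according to CV
# }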
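The blocks argument must form a partition of 1:p. A quick base R check, using the blocks from the example above, could look as follows; is_partition is a hypothetical helper, not an ipflasso function.

```r
# Hypothetical check (not part of ipflasso): do the blocks partition 1:p?
is_partition <- function(blocks, p) {
  idx <- unlist(blocks, use.names = FALSE)
  # every predictor index appears exactly once and nothing lies outside 1:p
  length(idx) == p && setequal(idx, seq_len(p))
}

is_partition(list(block1 = 1:50, block2 = 51:200), p = 200)  # TRUE
is_partition(list(block1 = 1:60, block2 = 51:200), p = 200)  # FALSE (overlap)
is_partition(list(block1 = 1:50, block2 = 52:200), p = 200)  # FALSE (51 missing)
```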
