compCL: Fit regularization path for log-contrast model of compositional data with lasso penalty.

Description

Fits a regression with compositional predictors via the penalized log-contrast model proposed by Lin et al. (2014) <doi:10.1093/biomet/asu031>. The model is estimated by minimizing a linearly constrained lasso criterion, and the regularization path is computed over a grid of values of the tuning parameter lambda.

Usage

compCL(y, Z, Zc = NULL, intercept = TRUE,
       lam = NULL, nlam = 100, lambda.factor = ifelse(n < p, 0.05, 0.001),
       pf = rep(1, times = p), dfmax = p, pfmax = min(dfmax * 1.5, p),
       u = 1, mu_ratio = 1.01, tol = 1e-10,
       inner_maxiter = 1e+4, inner_eps = 1e-6,
       outer_maxiter = 1e+08, outer_eps = 1e-8)

Arguments

y

a response vector with length n.

Z

an $n \times p$ design matrix of compositional data or categorical data. If Z is categorical data, i.e., the row sums of Z differ from 1, the program automatically transforms Z into compositional data by dividing each row by its sum. Z must NOT contain any zero entries.

Zc

an $n \times p_c$ design matrix of control variables (not penalized). Default is NULL.

intercept

Boolean, specifying whether to include an intercept. Default is TRUE.

lam

a user-supplied lambda sequence. If lam is provided as a scalar and nlam $> 1$, a lam sequence is created starting from lam. To run a single value of lam, set nlam $= 1$. The program sorts a user-defined lambda sequence in decreasing order.

nlam

the length of the lam sequence. Default is 100. No effect if lam is provided as a sequence of more than one value.

lambda.factor

the factor for getting the minimal lambda in the lam sequence, where min(lam) = lambda.factor * max(lam). max(lam) is the smallest value of lam for which all penalized coefficients become zero. If $n \ge p$, the default is 0.001. If $n < p$, the default is 0.05.

pf

penalty factor, a vector of length p. Zero implies no shrinkage. Default value for each entry is 1.

dfmax

limit the maximum number of groups in the model. Useful for handling very large $p$, if a partial path is desired. Default is $p$.

pfmax

limit the maximum number of groups ever to be nonzero. For example, once a group enters the model along the path, it is counted only once, no matter how many times it re-enters the model. Default is min(dfmax*1.5, p).

u

the initial value of the penalty parameter of the augmented Lagrange method adopted in the outer loop. Default value is 1.

mu_ratio

the increasing ratio, with value at least 1, for u. Default value is 1.01. Initial values for the scaled Lagrange multipliers are set to 0. If mu_ratio < 1, the program automatically sets u to 0 and outer_maxiter to 1, indicating that no linear constraint is included.

tol

tolerance for the estimated coefficients to be considered as non-zero, i.e., if $|\beta_j| <$ tol, $\beta_j$ is set to 0. Default value is 1e-10.

inner_maxiter, inner_eps

inner_maxiter is the maximum number of loops allowed in the coordinate descent, and inner_eps is the corresponding convergence tolerance.

outer_maxiter, outer_eps

outer_maxiter is the maximum number of loops allowed in the augmented Lagrange method, and outer_eps is the corresponding convergence tolerance.
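The behaviour described for Z, lam/nlam/lambda.factor and tol can be illustrated with a minimal base-R sketch. The helper names normalize_rows, lambda_grid and threshold_coef are hypothetical and are not part of the package, and the log-spaced grid is an assumption: the text above only fixes the two endpoints of the lam sequence, not the spacing.

```r
# Hypothetical helpers mirroring the argument descriptions; not package code.

# Z: rescale each row so that it sums to 1 (zeros are not allowed)
normalize_rows <- function(Z) {
  stopifnot(all(Z > 0))
  Z / rowSums(Z)   # element [i, j] is divided by the sum of row i
}

# lam / nlam / lambda.factor: decreasing grid from lam_max down to
# lambda.factor * lam_max; the log spacing is an assumption
lambda_grid <- function(lam_max, nlam = 100, lambda.factor = 0.001) {
  exp(seq(log(lam_max), log(lam_max * lambda.factor), length.out = nlam))
}

# tol: coefficients smaller than tol in absolute value are set to exactly 0
threshold_coef <- function(beta, tol = 1e-10) {
  beta[abs(beta) < tol] <- 0
  beta
}

counts <- matrix(c( 2,  3,  5,
                   10, 30, 60), nrow = 2, byrow = TRUE)
Z <- normalize_rows(counts)
rowSums(Z)                        # every row now sums to 1

lam <- lambda_grid(lam_max = 1, nlam = 5, lambda.factor = 0.001)
range(lam)                        # smallest and largest value of the grid

threshold_coef(c(0.5, 1e-12, -3e-11, -0.2))
```

The grid endpoints satisfy min(lam) = lambda.factor * max(lam), as stated for lambda.factor.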

Value

An object with S3 class "compCL", which is a list containing:

beta

a matrix of coefficients with $p+p_c+1$ rows. If intercept=FALSE, then the last row of beta is set to 0's.

lam

the sequence of lam values used.

df

the number of non-zero $\beta_j$'s in the estimated coefficients for Z at each value of lam.

npass

total iterations.

error

error messages. If 0, no error occurs.

call

the call that produces this object.

dim

dimension of the coefficient matrix beta.
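As a rough illustration of how beta and the non-zero counts relate, the sketch below builds a made-up coefficient matrix and counts the non-zero compositional coefficients per lam value. The numbers are invented, and the row order (the p compositional rows first, then the $p_c$ control rows, then the intercept) is an assumption; the text above only states that the intercept occupies the last row.

```r
p   <- 3   # number of compositional predictors
p_c <- 1   # number of control variables

# Toy coefficient matrix with p + p_c + 1 rows and one column per lam value.
# All values are invented for illustration.
beta <- rbind(
  c(0.0,  0.4,  0.7),   # compositional predictor 1
  c(0.0, -0.4, -0.9),   # compositional predictor 2
  c(0.0,  0.0,  0.2),   # compositional predictor 3
  c(0.3,  0.3,  0.3),   # control variable (not penalized)
  c(1.0,  1.1,  1.2)    # intercept (last row)
)

beta_Z <- beta[seq_len(p), , drop = FALSE]   # rows belonging to Z (assumed order)
df <- colSums(beta_Z != 0)                   # non-zero count per lam value
df

colSums(beta_Z)   # zero-sum constraint: each column sums to (numerically) zero
```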

Details

The log-contrast regression model with compositional predictors is expressed as $$y = Z\beta + e, \quad \text{s.t.} \quad \sum_{j=1}^{p}\beta_j = 0,$$ where $Z$ is the n-by-p design matrix of log-transformed compositional data, $\beta$ is the p-vector of regression coefficients, and $e$ is an n-vector of random errors. If zeros exist in the original compositional data, the user should pre-process them before fitting.
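One consequence of the zero-sum constraint is that the linear predictor does not depend on how each row is scaled, so raw positive measurements and their row-normalized compositions give the same fit. The following base-R sketch checks this numerically on simulated data; the variable names and the simulated values are illustrative only.

```r
set.seed(1)
X    <- matrix(runif(5 * 4, min = 1, max = 10), nrow = 5)  # positive raw data
comp <- X / rowSums(X)                                     # row-wise compositions

beta <- c(1, -0.8, 0.6, -0.8)            # coefficients that sum to zero
fit_raw  <- drop(log(X)    %*% beta)
fit_comp <- drop(log(comp) %*% beta)
same_fit <- isTRUE(all.equal(fit_raw, fit_comp))
same_fit                                 # identical linear predictors

beta_bad <- c(1, -0.8, 0.6, 0)           # coefficients that sum to 0.8
same_fit_bad <- isTRUE(all.equal(drop(log(X)    %*% beta_bad),
                                 drop(log(comp) %*% beta_bad)))
same_fit_bad                             # the row scaling now changes the fit
```

The check works because log(comp) equals log(X) minus the log of each row sum, and that row-wise shift is multiplied by the sum of the coefficients, which is zero under the constraint.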

To enable variable selection, the model is estimated via the linearly constrained lasso $$\mathop{\mathrm{argmin}}_{\beta}\left(\frac{1}{2n}\|y-Z\beta\|_2^2 + \lambda\|\beta\|_1\right), \quad \text{s.t.} \quad \sum_{j=1}^{p}\beta_j = 0.$$
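The arguments u, mu_ratio, outer_maxiter and inner_maxiter suggest an augmented Lagrange scheme with a coordinate-descent inner loop. A generic scaled form of such a scheme, written for the constraint $\sum_{j=1}^{p}\beta_j = 0$, is sketched below. The symbol $\alpha$ for the scaled multiplier and the exact update rules are assumptions based on the standard method, and the package's implementation may differ in detail.

```latex
% Scaled augmented Lagrangian (alpha: scaled multiplier, u: penalty parameter)
L_u(\beta, \alpha) = \frac{1}{2n}\|y - Z\beta\|_2^2 + \lambda\|\beta\|_1
  + \frac{u}{2}\Bigl(\sum_{j=1}^{p}\beta_j + \alpha\Bigr)^2 - \frac{u}{2}\alpha^2

% Outer iteration k -> k+1, starting from \alpha^{(0)} = 0
\beta^{(k+1)}  = \arg\min_{\beta} L_{u^{(k)}}\bigl(\beta, \alpha^{(k)}\bigr)
                 % inner loop: coordinate descent
\alpha^{(k+1)} = \alpha^{(k)} + \sum_{j=1}^{p}\beta_j^{(k+1)}
u^{(k+1)}      = \text{mu\_ratio} \cdot u^{(k)}
```

With u set to 0 the penalty term vanishes and the problem reduces to an ordinary lasso, which matches the described behaviour for mu_ratio < 1.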

References

Lin, W., Shi, P., Feng, R. and Li, H. (2014) Variable selection in regression with compositional covariates. Biometrika 101(4), 785-797. https://academic.oup.com/biomet/article/101/4/785/1775476

Examples


# NOT RUN {
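library(Compack)  # assumption: compCL() and comp_Model() are provided by the Compack package
p = 30
n = 50
# true coefficients: the eight leading entries sum to zero, the rest are zero
beta = c(1, -0.8, 0.6, 0, 0, -1.5, -0.5, 1.2)
beta = c(beta, rep(0, times = p - length(beta)))
# simulate a training set from the log-contrast model
Comp_data = comp_Model(n = n, p = p, beta = beta, intercept = FALSE)
# fit the constrained lasso path over the default lam grid
m1 <- compCL(y = Comp_data$y, Z = Comp_data$X.comp,
             Zc = Comp_data$Zc, intercept = Comp_data$intercept)
print(m1)
plot(m1)
# estimated coefficients along the path (overwrites the true beta defined above)
beta = coef(m1)
# simulate a test set with the same true coefficients and predict on it
Test_data = comp_Model(n = 30, p = p, beta = Comp_data$beta, intercept = FALSE)
predmat = predict(m1, Znew = Test_data$X.comp, Zcnew = Test_data$Zc)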

# }
