Fits a regression model with compositional predictors via the penalized log-contrast model proposed by Lin et al. (2014) <doi:10.1093/biomet/asu031>.
The model is estimated by minimizing a linearly constrained lasso criterion. The regularization path is
computed over a grid of values of the tuning parameter lambda.
compCL(y, Z, Zc = NULL, intercept = TRUE,
lam = NULL, nlam = 100, lambda.factor = ifelse(n < p, 0.05, 0.001),
pf = rep(1, times = p), dfmax = p, pfmax = min(dfmax * 1.5, p),
u = 1, mu_ratio = 1.01, tol = 1e-10,
inner_maxiter = 1e+4, inner_eps = 1e-6,
outer_maxiter = 1e+08, outer_eps = 1e-8)
a response vector with length n.
an \(n \times p\) design matrix of compositional data or categorical data.
If Z is categorical data, i.e., the row sums of Z differ from 1, the program automatically transforms Z into compositional data by dividing each row by its sum.
Z must NOT contain any zero entries.
an \(n \times p_c\) design matrix of control variables (not penalized). Default is NULL.
Boolean, specifying whether to include an intercept.
Default is TRUE.
a user-supplied lambda sequence.
If lam is provided as a scalar and nlam \(>1\), a lam sequence is created starting from lam. To fit a single value of lam, set nlam \(=1\).
The program sorts a user-defined lambda sequence in decreasing order.
the length of the lam sequence. Default is 100. No effect if lam is provided.
the factor used to obtain the minimal lambda in the lam sequence, where min(lam) = lambda.factor * max(lam).
max(lam) is the smallest value of lam for which all penalized coefficients become zero.
If \(n \ge p\), the default is 0.001. If \(n < p\), the default is 0.05.
penalty factor, a vector of length p. Zero implies no shrinkage. Default value for each entry is 1.
limit the maximum number of groups in the model. Useful for handling very large \(p\), if a partial path is desired. Default is \(p\).
limit the maximum number of groups ever to be nonzero. For example, once a group enters the model along the path,
it is counted only once, no matter how many times it re-enters the model along the path.
Default is min(dfmax*1.5, p).
the initial value of the penalty parameter of the augmented Lagrange method adopted in the outer loop. Default value is 1.
the increasing ratio, with value at least 1, for u. Default value is 1.01.
Initial values for the scaled Lagrange multipliers are set to 0.
If mu_ratio < 1, the program automatically sets u to 0 and outer_maxiter to 1, indicating
that no linear constraint is included.
tolerance for an estimated coefficient to be considered non-zero, i.e., if \(|\beta_j| <\) tol, \(\beta_j\) is set to 0.
Default value is 1e-10.
inner_maxiter is the maximum number of loops allowed in the coordinate descent,
and inner_eps is the corresponding convergence tolerance.
outer_maxiter is the maximum number of loops allowed in the augmented Lagrange method,
and outer_eps is the corresponding convergence tolerance.
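As a rough illustration of how lam, nlam and lambda.factor interact, the following base-R sketch builds a decreasing grid whose endpoints satisfy min(lam) = lambda.factor * max(lam). The value of lam_max is made up, and the log-scale spacing is an assumption (a common convention for lasso paths); the documentation above only fixes the two endpoints and the decreasing order.

```r
# Hypothetical inputs: in practice max(lam) is derived from the data.
lam_max       <- 2.0    # assumed value of max(lam)
lambda_factor <- 0.05   # plays the role of lambda.factor (default when n < p)
nlam          <- 5      # plays the role of nlam

# Grid from max(lam) down to lambda.factor * max(lam).
# Log-scale spacing is an assumption, not something the documentation states.
lam_grid <- exp(seq(log(lam_max), log(lambda_factor * lam_max),
                    length.out = nlam))

# A user-supplied sequence is sorted in decreasing order.
user_lam <- sort(c(0.1, 1, 0.5), decreasing = TRUE)
```

Here lam_grid starts at lam_max and ends at lambda_factor * lam_max, and user_lam mimics the sorting applied to a user-defined sequence.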
An object with S3 class "compCL" is a list containing:
a matrix of coefficients with \(p+p_c+1\) rows.
If intercept=FALSE, then the last row of beta is set to 0.
the sequence of lam values used.
the number of non-zero \(\beta_j\)'s in the estimated coefficients for Z at each value of lam.
total iterations.
error messages. If 0, no error occurs.
the call that produces this object.
dimension of the coefficient matrix beta.
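To make the layout of the coefficient matrix concrete, here is a small base-R sketch on a mock matrix with made-up numbers; it does not call the package. The last row is the intercept, as stated above. That the first p rows belong to Z and the next p_c rows to Zc is an assumption about row order.

```r
p <- 3; p_c <- 1

# Mock coefficient matrix (made-up numbers), one column per lam value.
beta_mat <- cbind(c(0.5, -0.5,  0.0, 0.2, 1.0),
                  c(0.8, -0.3, -0.5, 0.1, 0.9))
stopifnot(nrow(beta_mat) == p + p_c + 1)

# Assumed row order: compositional predictors, then controls, then intercept.
beta_Z    <- beta_mat[seq_len(p), , drop = FALSE]
beta_Zc   <- beta_mat[p + seq_len(p_c), , drop = FALSE]
intercept <- beta_mat[p + p_c + 1, ]

# Number of non-zero compositional coefficients at each lam value.
df_mock <- colSums(beta_Z != 0)

# The compositional coefficients sum to zero in every column.
zero_sum <- colSums(beta_Z)
```

In this mock, df_mock counts the non-zero entries among the first p rows only, mirroring how the non-zero count is described for the coefficients of Z.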
The log-contrast regression model with compositional predictors is expressed as $$y = Z\beta + e, \quad \text{s.t. } \sum_{j=1}^{p}\beta_j = 0,$$ where \(Z\) is the n-by-p design matrix of log-transformed compositional data, \(\beta\) is the p-vector of regression coefficients, and \(e\) is an n-vector of random errors. If the original compositional data contain zeros, the user should pre-process these zeros before fitting.
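One possible way to prepare raw data for this model is sketched below in base R. The count matrix is made up, and replacing zeros with a pseudo-count of 0.5 is only one common convention, assumed here for illustration; the documentation does not prescribe a particular zero-handling method.

```r
# Made-up count data with one zero entry (rows = samples, columns = components).
counts <- matrix(c(10, 0, 5,
                    3, 4, 3), nrow = 2, byrow = TRUE)

# Assumption: replace zeros with a small pseudo-count before normalizing.
counts_nz <- counts
counts_nz[counts_nz == 0] <- 0.5

# Divide each row by its sum so that every row adds up to 1.
X_comp <- counts_nz / rowSums(counts_nz)

# Log-transformed compositions, corresponding to Z in the model.
Z_log <- log(X_comp)
```

After this step every row of X_comp sums to 1 and Z_log contains only finite values, which is what the log transform requires.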
To enable variable selection, we conduct model estimation via the linearly constrained lasso $$\arg\min_{\beta}\left(\frac{1}{2n}\|y-Z\beta\|_2^2 + \lambda\|\beta\|_1\right), \quad \text{s.t. } \sum_{j=1}^{p}\beta_j = 0.$$
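The zero-sum constraint is what makes the fit insensitive to the scale of each sample: multiplying a row of the composition by a constant shifts every log value in that row by the same amount, and that shift cancels when the coefficients sum to zero. The base-R sketch below checks this numerically on made-up data and also evaluates the penalized criterion for a given coefficient vector; lasso_objective is a hypothetical helper written for this illustration, not a package function.

```r
set.seed(1)
n <- 4; p <- 3

# Made-up positive data, normalized so each row sums to 1.
X <- matrix(runif(n * p, 0.1, 1), n, p)
X <- X / rowSums(X)

beta      <- c(1, -0.4, -0.6)   # coefficients that sum to zero
row_scale <- c(2, 10, 0.5, 7)   # arbitrary rescaling of each sample

fit_original <- log(X) %*% beta
fit_rescaled <- log(X * row_scale) %*% beta   # row i multiplied by row_scale[i]

# Largest difference between the two fits; zero up to rounding error.
max_diff <- max(abs(fit_original - fit_rescaled))

# Hypothetical helper: value of the penalized criterion for a given beta.
lasso_objective <- function(y, Z, beta, lambda) {
  sum((y - Z %*% beta)^2) / (2 * nrow(Z)) + lambda * sum(abs(beta))
}
```

The helper only evaluates the criterion; it does not enforce the constraint or perform any optimization.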
Lin, W., Shi, P., Feng, R. and Li, H. (2014) Variable selection in regression with compositional covariates, https://academic.oup.com/biomet/article/101/4/785/1775476. Biometrika 101 785-797
coef, predict, print and plot methods for "compCL" objects, and cv.compCL and GIC.compCL.
# NOT RUN {
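library(Compack)

# Simulate training data: n = 50 samples and p = 30 compositional predictors.
# The true coefficient vector has six non-zero entries and sums to zero.
p <- 30
n <- 50
beta <- c(1, -0.8, 0.6, 0, 0, -1.5, -0.5, 1.2)
beta <- c(beta, rep(0, times = p - length(beta)))
Comp_data <- comp_Model(n = n, p = p, beta = beta, intercept = FALSE)

# Fit the constrained lasso path, then print and plot it.
m1 <- compCL(y = Comp_data$y, Z = Comp_data$X.comp,
             Zc = Comp_data$Zc, intercept = Comp_data$intercept)
print(m1)
plot(m1)
beta_hat <- coef(m1)  # renamed so the true beta is not overwritten

# Predict on independently simulated test data.
Test_data <- comp_Model(n = 30, p = p, beta = Comp_data$beta, intercept = FALSE)
predmat <- predict(m1, Znew = Test_data$X.comp, Zcnew = Test_data$Zc)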
# }