Fit the penalized log-contrast regression with functional compositional predictors
proposed by Zhe et al. (2020) <arXiv:1808.02403>. The model estimation is
conducted by minimizing a linearly constrained group lasso criterion. The regularization
paths are computed for the group lasso penalty at grid values of the regularization
parameter lam
and the degree of freedom of the basis function K
.
FuncompCGL(y, X, Zc = NULL, intercept = TRUE, ref = NULL,
k, degree = 3, basis_fun = c("bs", "OBasis", "fourier"),
insert = c("FALSE", "X", "basis"), method = c("trapezoidal", "step"),
interval = c("Original", "Standard"), Trange,
T.name = "TIME", ID.name = "Subject_ID",
W = rep(1,times = p - length(ref)),
dfmax = p - length(ref), pfmax = min(dfmax * 1.5, p - length(ref)),
lam = NULL, nlam = 100, lambda.factor = ifelse(n < p1, 0.05, 0.001),
tol = 1e-8, mu_ratio = 1.01,
outer_maxiter = 1e+6, outer_eps = 1e-8,
inner_maxiter = 1e+4, inner_eps = 1e-8)
response vector with length n.
data frame or matrix.
If nrow(X)
> \(n\),
X
should be a data frame or matrix of the functional compositional
predictors with \(p\) columns for the values of the composition components,
a column indicating subject ID and a column of observation times.
Order of Subject ID should be the SAME as that of y
.
Zero entry is not allowed.
If nrow(X)[1]
=\(n\),
X
is considered as after taken integration, a
n*(k*p - length(ref))
matrix.
a \(n\times p_{c}\) design matrix of unpenalized variables. Default is NULL.
Boolean, specifying whether to include an intercept. Default is TRUE.
reference level (baseline), either an integer between \([1,p]\) or NULL
.
Default value is NULL
.
If ref
is set to be an integer between [1,p]
, the group lasso penalized log-contrast model (with log-ratios) is fitted
with the ref
-th component chosed as baseline.
If ref
is set to be NULL
, the linearly constrained group lasso penalized log-contrast model is fitted.
an integer, degrees of freedom of the basis function.
degrees of freedom of the basis function. Default value is 3.
method to generate basis:
"bs"
(Default) B-splines. See fucntion bs
.
"OBasis"
Orthoganal B-splines. See function OBasis
and package orthogonalsplinebasis.
"fourier"
Fourier basis. See fucntion create.fourier.basis
and package fda.
a character string sepcifying method to perform functional interpolation.
"FALSE"
(Default) no interpolation.
"X"
linear interpolation of functional compositional
data along the time grid.
"basis"
the functional compositional data is interplolated
as a step function along the time grid.
If insert
= "X"
or "basis"
, interplolation is conducted
on sseq
, where sseq
is the sorted sequence of all the observed time points.
a character string sepcifying method used to approximate integral.
"trapezoidal"
(Default) Sum up areas under the trapezoids.
"step"
Sum up area under the rectangles.
a character string sepcifying the domain of the integral.
"Original"
(Default) On the original time scale,
interval
= range(Time)
.
"Standard"
Time points are mapped onto \([0,1]\),
interval
= c(0,1)
.
range of time points
a character string specifying names of the time variable and the Subject
ID variable in X
.
This is only needed when X is a data frame or matrix of the functional
compositional predictors.
Default are "TIME"
and "Subject_ID"
.
a vector of length p (the total number of groups),
or a matrix with dimension \(p_{1} \times p_{1}\), where
p1=(p - length(ref)) * k
,
or character specifying the function used to calculate weight matrix for each group.
a vector of penalization weights for the groups of coefficients. A zero weight implies no shrinkage.
a diagonal matrix with positive diagonal elements.
if character string of function name or an object of type function
to compute the weights.
limit the maximum number of groups in the model. Useful for handling very large \(p\), if a partial path is desired. Default is \(p\).
limit the maximum number of groups ever to be nonzero. For example once a group enters the model along the path,
no matter how many times it re-enters the model through the path, it will be counted only once.
Default is min(dfmax*1.5, p)
.
a user supplied lambda sequence.
If lam
is provided as a scaler and nlam
\(>1\), lam
sequence is created starting from
lam
. To run a single value of lam
, set nlam
\(=1\).
The program will sort user-defined lambda
sequence in decreasing order.
the length of the lam
sequence. Default is 100. No effect if lam
is
provided.
the factor for getting the minimal lambda in lam
sequence,
where min(lam)
= lambda.factor
* max(lam)
.
max(lam)
is the smallest value of lam
for which all penalized group are 0's.
If \(n >= p1\), the default is 0.001
. If \(n < p1\), the default is 0.05
.
tolerance for coefficient to be considered as non-zero. Once the convergence criterion is satisfied, for each element \(\beta_j\) in coefficient vector \(\beta\), \(\beta_j = 0\) if \(\beta_j < tol\).
the increasing ratio of the penalty parameter u
. Default value is 1.01.
Inital values for scaled Lagrange multipliers are set as 0's.
If mu_ratio
< 1, the program automatically set
the initial penalty parameter u
as 0
and outer_maxiter
as 1, indicating
that there is no linear constraint.
outer_maxiter
is the maximum number of loops allowed for the augmented Lanrange method;
and outer_eps
is the corresponding convergence tolerance.
inner_maxiter
is the maximum number of loops allowed for blockwise-GMD;
and inner_eps
is the corresponding convergence tolerance.
An object with S3 class "FuncompCGL"
, which is a list containing:
the integral matrix for the functional compositional predictors with dimension \(n \times (pk)\).
the sequence of lam
values.
the number of non-zero groups in the estimated coefficients for
the functional compositional predictors at each value of lam
.
a matrix of coefficients with length(lam)
columns and
\(p_{1}+p_{c}+1\) rows, where p_1=p*k
.
The first \(p_{1}\) rows are the estimated values for
the coefficients for the functional compositional preditors, and the
last row is for the intercept. If intercept = FALSE
, the last row is 0's.
dimension of the coefficient matrix.
sequence of the time points.
the call that produces this object.
The functional log-contrast regression model
for compositional predictors is defined as
$$
y = 1_n\beta_0 + Z_c\beta_c + \int_T Z(t)\beta(t)dt + e, s.t. (1_p)^T \beta(t)=0 \forall t \in T,
$$
where \(\beta_0\) is the intercept,
\(\beta_c\) is the regression coefficient vector with length \(p_c\) corresponding to the control variables,
\(\beta(t)\) is the functional regression coefficient vector with length \(p\) as a funtion of \(t\)
and \(e\) is the random error vector with zero mean with length \(n\).
Moreover, Z(t)
is the log-transformed functional compostional data.
If zero(s) exists in the original functional compositional data, user should pre-process these zero(s).
For example, if count data provided, user could replace 0's with 0.5.
After adopting a truncated basis expansion approach to re-express \(\beta(t)\) $$\beta(t) = B \Phi(t),$$ where \(B\) is a p-by-k unkown but fixed coefficient matrix, and \(\Phi(t)\) consists of basis with degree of freedom \(k\). We could write functional log-contrast regression model as $$ y = 1_n\beta_0 + Z_c\beta_c + Z\beta + e, s.t. \sum_{j=1}^{p}\beta_j=0_k, $$ where \(Z\) is a n-by-pk matrix corresponding to the integral, \(\beta=vec(B^T)\) is a pk-vector with every each k-subvector corresponding to the coefficient vector for the \(j\)-th compositional component.
To enable variable selection, FuncompCGL
model is estimated via linearly constrained group lasso,
$$
argmin_{\beta_0, \beta_c, \beta}(\frac{1}{2n}\|y - 1_n\beta_0 - Z_c\beta_c - Z\beta\|_2^2
+ \lambda \sum_{j=1}^{p} \|\beta_j\|_2), s.t. \sum_{j=1}^{p} \beta_j = 0_k.
$$
Sun, Z., Xu, W., Cong, X., Li G. and Chen K. (2020) Log-contrast regression with functional compositional predictors: linking preterm infant's gut microbiome trajectories to neurobehavioral outcome, https://arxiv.org/abs/1808.02403 Annals of Applied Statistics.
Yang, Y. and Zou, H. (2015) A fast unified algorithm for computing group-lasso penalized learning problems, https://link.springer.com/article/10.1007/s11222-014-9498-5 Statistics and Computing 25(6) 1129-1141.
Aitchison, J. and Bacon-Shone, J. (1984) Log-contrast models for experiments with mixtures, Biometrika 71 323-330.
cv.FuncompCGL
and GIC.FuncompCGL
, and
predict
, coef
,
plot
and print
methods for "FuncompCGL"
object.
# NOT RUN {
df_beta = 5
p = 30
beta_C_true = matrix(0, nrow = p, ncol = df_beta)
beta_C_true[1, ] <- c(-0.5, -0.5, -0.5 , -1, -1)
beta_C_true[2, ] <- c(0.8, 0.8, 0.7, 0.6, 0.6)
beta_C_true[3, ] <- c(-0.8, -0.8 , 0.4 , 1 , 1)
beta_C_true[4, ] <- c(0.5, 0.5, -0.6 ,-0.6, -0.6)
Data <- Fcomp_Model(n = 50, p = p, m = 0, intercept = TRUE,
SNR = 4, sigma = 3, rho_X = 0, rho_T = 0.6, df_beta = df_beta,
n_T = 20, obs_spar = 1, theta.add = FALSE,
beta_C = as.vector(t(beta_C_true)))
m1 <- FuncompCGL(y = Data$data$y, X = Data$data$Comp, Zc = Data$data$Zc,
intercept = Data$data$intercept, k = df_beta, tol = 1e-10)
print(m1)
plot(m1)
beta <- coef(m1)
arg_list <- as.list(Data$call)[-1]
arg_list$n <- 30
TEST <- do.call(Fcomp_Model, arg_list)
y_hat <- predict(m1, Znew = TEST$data$Comp, Zcnew = TEST$data$Zc)
plot(y_hat[, floor(length(m1$lam)/2)], TEST$data$y,
ylab = "Observed Response", xlab = "Predicted Response")
beta <- coef(m1, s = m1$lam[20])
beta_C <- matrix(beta[1:(p*df_beta)], nrow = p, byrow = TRUE)
colSums(beta_C)
Non.zero <- (1:p)[apply(beta_C, 1, function(x) max(abs(x)) > 0)]
Non.zero
# }
Run the code above in your browser using DataLab