Compack (version 0.1.0)

GIC.FuncompCGL: Compute information criteria for the FuncompCGL model.

Description

Tune the grid values of the penalty parameter lam and the degrees of freedom of the basis function k in the FuncompCGL model by GIC, BIC, or AIC. This function calculates the GIC, BIC, or AIC curve and returns the optimal values of lam and k.

Usage

GIC.FuncompCGL(y, X, Zc = NULL, lam = NULL, nlam = 100, k = 4:10, ref = NULL,
              intercept = TRUE, W = rep(1,times = p - length(ref)),
              type = c("GIC", "BIC", "AIC"),
              mu_ratio = 1.01, outer_maxiter = 1e+6, ...)

Arguments

y

response vector with length n.

X

data frame or matrix.

  • If nrow(X) > \(n\), X should be a data frame or matrix of the functional compositional predictors with \(p\) columns for the values of the composition components, a column indicating subject ID and a column of observation times. Order of Subject ID should be the SAME as that of y. Zero entry is not allowed.

  • If nrow(X) = \(n\), X is treated as the design matrix obtained after integration, an n*(k*p - length(ref)) matrix.
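To make the long format in the first bullet concrete, the sketch below builds a toy data frame in base R. The column names (`Subject_ID`, `TIME`, `Comp1`, ...) and the sizes are assumptions chosen for illustration; they are not taken from this page.

```r
set.seed(1)
n   <- 3   # number of subjects (length of y)
n_T <- 4   # observation times per subject
p   <- 5   # number of composition components

# Strictly positive raw values, normalised so every row sums to 1
raw  <- matrix(rexp(n * n_T * p), ncol = p)
comp <- raw / rowSums(raw)
colnames(comp) <- paste0("Comp", seq_len(p))

# Hypothetical layout: one ID column, one time column, p component columns
X_long <- data.frame(
  Subject_ID = rep(seq_len(n), each = n_T),
  TIME       = rep(seq(0, 1, length.out = n_T), times = n),
  comp
)

nrow(X_long) > n   # long format: more rows than subjects
all(comp > 0)      # zero entries are not allowed
```

Subjects appear in the order 1, 2, 3, which is the order the response vector y would need to follow.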

Zc

a \(n\times p_{c}\) design matrix of unpenalized variables. Default is NULL.

lam

a user-supplied lambda sequence. If lam is provided as a scalar and nlam\(>1\), a lam sequence is created starting from lam. To run a single value of lam, set nlam\(=1\). The program sorts a user-defined lambda sequence in decreasing order.

nlam

the length of the lam sequence. Default is 100. No effect if lam is provided.

k

an integer vector specifying the degrees of freedom of the basis function.

ref

reference level (baseline), either an integer in \([1,p]\) or NULL. Default value is NULL.

  • If ref is set to an integer in [1,p], the group lasso penalized log-contrast model (with log-ratios) is fitted with the ref-th component chosen as baseline.

  • If ref is set to be NULL, the linearly constrained group lasso penalized log-contrast model is fitted.
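As a rough illustration of what choosing a baseline means, the sketch below forms log-ratios against the ref-th component for a toy composition observed at a single time point. This is only the standard log-ratio transform in base R under assumed toy values; it is not the package's internal code, which works with functional predictors.

```r
# Toy composition: 2 observations, 3 components, each row sums to 1
X <- rbind(c(0.2, 0.3, 0.5),
           c(0.1, 0.6, 0.3))
ref <- 3  # hypothetical choice of baseline component

# Log-ratio of every non-reference component against the baseline;
# the length-2 vector X[, ref] is recycled down columns, so row i is
# divided by X[i, ref]
log_ratio <- log(X[, -ref, drop = FALSE] / X[, ref])

dim(log_ratio)  # n rows, p - 1 columns
```

With a baseline, one component is dropped from the design, which is why the default length of W in Usage is p - length(ref).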

intercept

Boolean, specifying whether to include an intercept. Default is TRUE.

W

penalization weights, given as a vector of length p (the total number of groups), a matrix with dimension \(p_{1} \times p_{1}\), where p1=(p - length(ref)) * k, or a character string or function used to calculate the weight matrix for each group.

  • a vector of penalization weights for the groups of coefficients. A zero weight implies no shrinkage.

  • a diagonal matrix with positive diagonal elements.

  • a character string naming a function, or an object of type function, used to compute the weights.
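A minimal sketch of the vector form, assuming p = 30 components and no baseline: it reproduces the default shown in Usage and then sets one weight to zero so that group is not shrunk. The values are illustrative only.

```r
p   <- 30
ref <- NULL  # no baseline, so length(ref) is 0 and all p groups remain

# Same expression as the default for W in Usage
W <- rep(1, times = p - length(ref))

# A zero weight implies no shrinkage for that group
W[1] <- 0

length(W)
```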

type

a character string specifying which criterion to use. The choices include "GIC" (default), "BIC", and "AIC".

mu_ratio

the ratio by which the penalty parameter u is increased. Default value is 1.01. Initial values for the scaled Lagrange multipliers are set to 0. If mu_ratio < 1, the program automatically sets the initial penalty parameter u to 0 and outer_maxiter to 1, indicating that there is no linear constraint.

outer_maxiter

maximum number of loops allowed for the augmented Lagrangian method.

...

other arguments that can be passed to FuncompCGL.

Value

An object of S3 class "GIC.FuncompCGL" is returned, which is a list containing:

FuncompCGL.fit

a list of length length(k), with fitted FuncompCGL objects of different degrees of freedom of the basis function.

lam

the sequence of the penalty parameter lam.

GIC

a length(k) by length(lam) matrix of GIC values.

lam.min

the optimal values of the degrees of freedom k and the penalty parameter lam.

MSE

a length(k) by length(lam) matrix of mean squared errors.
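The relationship between the criterion matrix and the optimal pair can be sketched in base R as a grid search over a matrix whose rows index k and whose columns index lam. The numbers below are made up, and this is not necessarily how the package computes or stores lam.min internally.

```r
k_grid   <- 4:6
lam_grid <- c(0.5, 0.1, 0.01)  # decreasing order

# Hypothetical criterion values: rows index k, columns index lam
GIC <- matrix(c(2.1, 1.7, 1.9,
                2.0, 1.4, 1.8,
                2.3, 1.6, 2.0),
              nrow = length(k_grid), byrow = TRUE,
              dimnames = list(k_grid, lam_grid))

# Row/column position of the smallest value (first one if there are ties)
idx <- which(GIC == min(GIC), arr.ind = TRUE)[1, ]

best_k   <- k_grid[idx["row"]]
best_lam <- lam_grid[idx["col"]]
c(k = best_k, lam = best_lam)
```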

Details

The FuncompCGL model estimation is conducted through minimizing the linearly constrained group lasso criterion $$ \frac{1}{2n}\|y - 1_n\beta_0 - Z_c\beta_c - Z\beta\|_2^2 + \lambda \sum_{j=1}^{p} \|\beta_j\|_2, \quad \text{s.t. } \sum_{j=1}^{p} \beta_j = 0_k.$$

The tuning parameters can be selected by the generalized information criterion (GIC), $$ GIC(\lambda,k) = \log{(\hat{\sigma}^2(\lambda,k))} + (s(\lambda, k) - 1)k \log{(\max(p*k+p_c+1, n))} \log{(\log{n})}/n ,$$ where \(\hat{\sigma}^2(\lambda,k) = \|y - 1_n\hat{\beta_0}(\lambda, k) - Z_c\hat{\beta_c}(\lambda, k) - Z\hat{\beta}(\lambda, k) \|_{2}^{2}/n\) with \(\hat{\beta_0}(\lambda, k)\), \(\hat{\beta_c}(\lambda, k)\) and \(\hat{\beta}(\lambda, k)\) being the regularized estimators of the regression coefficients, and \(s(\lambda, k)\) is the number of nonzero coefficient groups in \(\hat{\beta}(\lambda, k)\).
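The GIC formula above can be transcribed directly into base R. The helper name `gic_value` and its arguments are hypothetical and introduced only for this sketch; it takes the residual vector and the number of nonzero groups as given rather than fitting a model.

```r
# Hypothetical helper: evaluates the GIC expression for one (lam, k) pair
#   resid : residual vector y - fitted values, length n
#   s     : number of nonzero coefficient groups
#   k     : degrees of freedom of the basis
#   p     : number of composition components
#   p_c   : number of unpenalized covariates in Zc
gic_value <- function(resid, s, k, p, p_c) {
  n <- length(resid)
  sigma2_hat <- sum(resid^2) / n
  log(sigma2_hat) +
    (s - 1) * k * log(max(p * k + p_c + 1, n)) * log(log(n)) / n
}

# Toy residuals with mean square exactly 1, so the log term is 0
resid <- rep(c(1, -1), times = 25)
val <- gic_value(resid, s = 4, k = 5, p = 30, p_c = 0)
val
```

With these toy residuals the first term vanishes, so the value is driven entirely by the penalty, which grows with the number of selected groups s and with k.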

References

Sun, Z., Xu, W., Cong, X., Li, G. and Chen, K. (2020) Log-contrast regression with functional compositional predictors: linking preterm infant's gut microbiome trajectories to neurobehavioral outcome, Annals of Applied Statistics. https://arxiv.org/abs/1808.02403

Fan, Y. and Tang, C. Y. (2013) Tuning parameter selection in high dimensional penalized likelihood, Journal of the Royal Statistical Society. Series B 75 531-552. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssb.12001

See Also

FuncompCGL and cv.FuncompCGL, and predict, coef and plot methods for "GIC.FuncompCGL" object.

Examples

# NOT RUN {
df_beta = 5
p = 30
beta_C_true = matrix(0, nrow = p, ncol = df_beta)
beta_C_true[1, ] <- c(-0.5, -0.5, -0.5 , -1, -1)
beta_C_true[2, ] <- c(0.8, 0.8,  0.7,  0.6,  0.6)
beta_C_true[3, ] <- c(-0.8, -0.8 , 0.4 , 1 , 1)
beta_C_true[4, ] <- c(0.5, 0.5, -0.6  ,-0.6, -0.6)
n_train = 50
n_test = 30
k_list <- c(4,5)
Data <- Fcomp_Model(n = n_train, p = p, m = 0, intercept = TRUE,
                    SNR = 4, sigma = 3, rho_X = 0.2, rho_T = 0.5,
                    df_beta = df_beta, n_T = 20, obs_spar = 1, theta.add = FALSE,
                    beta_C = as.vector(t(beta_C_true)))
arg_list <- as.list(Data$call)[-1]
arg_list$n <- n_test
Test <- do.call(Fcomp_Model, arg_list)

## GIC_cgl: Constrained group lasso
GIC_cgl <- GIC.FuncompCGL(y = Data$data$y, X = Data$data$Comp,
                          Zc = Data$data$Zc, intercept = Data$data$intercept,
                          k = k_list)
coef(GIC_cgl)
plot(GIC_cgl)
y_hat <- predict(GIC_cgl, Znew = Test$data$Comp, Zcnew = Test$data$Zc)
plot(Test$data$y, y_hat, xlab = "Observed response", ylab = "Predicted response")

## GIC_naive: ignoring the zero-sum constraints
## set mu_ratio = 0 to fit without linear constraints,
## so there is no outer loop for the augmented Lagrangian multipliers
GIC_naive <- GIC.FuncompCGL(y = Data$data$y, X = Data$data$Comp,
                            Zc = Data$data$Zc, intercept = Data$data$intercept,
                            k = k_list, mu_ratio = 0)
coef(GIC_naive)
plot(GIC_naive)
y_hat <- predict(GIC_naive, Znew = Test$data$Comp, Zcnew = Test$data$Zc)
plot(Test$data$y, y_hat, xlab = "Observed response", ylab = "Predicted response")

## GIC_base: randomly select a component as reference
## mu_ratio is set to 0 automatically once ref is set to an integer
ref <- sample(1:p, 1)
GIC_base <- GIC.FuncompCGL(y = Data$data$y, X = Data$data$Comp,
                            Zc = Data$data$Zc, intercept = Data$data$intercept,
                            k = k_list, ref = ref)
coef(GIC_base)
plot(GIC_base)
y_hat <- predict(GIC_base, Znew = Test$data$Comp, Zcnew = Test$data$Zc)
plot(Test$data$y, y_hat, xlab = "Observed response", ylab = "Predicted response")
# }