‘cglasso
’ function is used to fit an l1-penalized censored Gaussian graphical model.
cglasso(X, lo, up, weights, pendiag = FALSE, nrho = 50L, rho.min.ratio,
rho, maxR2, maxit_em = 1.0e+3, thr_em = 1.0e-4, maxit_bcd = 1.0e+4,
thr_bcd = 1.0e-4, trace = 0L)
optional argument. If the argument ‘X
’ is a matrix then ‘lo
’ is used to create an object with class ‘datacggm
’.
optional argument. If the argument ‘X
’ is a matrix then ‘up
’ is used to create an object with class ‘datacggm
’.
an optional symmetric matrix of non-negative weights. This matrix can be used to specify the unpenalized partial correlation coefficients (‘weights[i, j] = 0
’) or the structural zeros in the precision matrix (‘weights[i, j] = +Inf
’). See below for an example. By default, cglasso model is fitted without weights.
flag used to specify if the diagonal elements of the concentration matrix are penalized (‘pendiag = TRUE
’) or unpenalized (‘pendiag = FALSE
’).
the integer specifying the number of tuning parameters used to fit the cglasso model. Default is ‘nrho = 50
’.
the smallest value for the tuning parameter \(\rho\), as a fraction of the smallest tuning parameter for which all the estimated partial correlation coefficients are zero. The default depends on the sample size ‘\(n\)’ relative to the number of variables ‘\(p\)’. If ‘\(p < n\)’, the default is ‘1.0E-4’ otherwise the value ‘1.0E-2’ is used as default. A very small value of ‘rho.min.ratio
’ will lead to a saturated fitted model in the ‘\(p < n\)’ case.
optional argument. A user supplied rho
sequence. WARNING: avoid supplying a single value for the tuning parameter; supply instead a decreasing sequence of \(\rho\)-values.
a value belonging to the interval \([0, 1]\) specifying the largest value of the pseudo R-squared measure (see Section Details). The regularization path is stopped when \(R^2\) exceeds ‘maxR2
’. Default depends on the sample size ‘\(n\)’ relative to the number of variables ‘\(p\)’. If ‘\(p < n\)’, the default is ‘1’ otherwise the value ‘0.9’ is used as default.
maximum number of iterations of the EM algorithm. Default is 1.0E+3
.
threshold for the convergence of the EM algorithm. Default value is 1.0E-4
.
maximum number of iterations of the glasso algorithm. Default is 1.0E+4
.
threshold for the convergence of the glasso algorithm. Default is 1.0E-4
.
integer for printing out information as iterations proceed: trace = 0
no information is printed out on video; trace = 1
basic information is printed out on video; trace = 2
detailed information is printed out on video.
cglasso
returns an object with S3 class “cglasso
”, i.e., a list containing the
following components:
the call that produced this object.
the object with S3 class ‘datacggm
’ used to fit the cglasso model.
the weights used to fit the cglasso model.
flag used to specify if the diagonal elements of the concentration matrix are penalized.
the \(p\)-dimensional vector reporting the estimates of the marginal expected values under the assumption that the precision matrix is diagonal.
the \(p\)-dimensional vector reporting the estimates of the marginal variances under the assumption that the precision matrix is diagonal.
the number of fitted cglasso model.
the scale factor used to compute the smallest value of the tuning parameter.
the \(p\)-dimensional vector reporting the values of the tuning parameter used to fit the cglasso model.
the threshold value used to stop the regularization path.
the maximum number of iterations of the EM algorithm.
the threshold for the convergence of the EM algorithm.
the maximum number of iterations of the glasso algorithm.
the threshold for the convergence of the glasso algorithm.
an array of dimension \(n\times p\times\texttt{nrho}\). Xipt[, , k]
is the matrix where the censored values are replaced with the conditional expected values computed in the E-step of the algorithm describted in section Details.
an array of dimension \(p\times p\times\texttt{nrho}\). S[, , k]
is the matrix \(S^\rho\) used to fit the glasso model in the M-step of the algorithm describted in section Details.
a matrix of dimension \(p\times\texttt{nrho}\). The \(k\)th column is the estimate of the expected values of the cglasso model fitted using rho[k]
.
an array of dimension \(p\times p\times\texttt{nrho}\). Sgm[, , k]
is the estimate of the covariance matrix of the cglasso model fitted using rho[k]
.
an array of dimension \(p\times p\times\texttt{nrho}\). Tht[, , k]
is the estimate of the precision matrix of the cglasso model fitted using rho[k]
.
an array of dimension \(p\times p\times\texttt{nrho}\). Adj[, , k]
is the adjacency matrix associated to Tht[, , k]
, i.e. Adj[i, j, k]
\(= 1\) iff Tht[i, j, k]
\(\neq 0\) and 0
otherwise.
the \(nrho\)-dimensional vector reporting the number of non-zero partial correlation coefficients.
the \(nrho\)-dimensional vector reporting the values of the measure \(R^2\) described in section Details.
the \(nrho\)-dimensional vector reporting the number of connected components (for internal purposes only).
the \((p\times\texttt{nrho})\)-dimensional matrix encoding the connected components (for internal purposes only).
the \((p\times\texttt{nrho})\)-dimensional matrix reporting the number of vertices per connected component (for internal purposes only).
the \((\texttt{nrho}\times 2)\)-dimensional matrix reporting the number of iterations.
a description of the error that has occurred.
the name of the Fortran subroutine where the error has occurred (for internal debug only).
the integer used for printing out information.
The censored graphical lasso (cglasso) estimator (Augugliaro and other, 2018) is an extension of the classical graphical lasso (glasso) estimator (Yuan and other, 2007) developed to fit a sparse censored Gaussian graphical model (see Section 2 in Augugliaro and other (2018) for a formal definition).
cglasso
function fits the model using the following EM-like algorithm:
Step | Description |
1. | Let \(\{\hat{\mu}^\rho_{ini};\hat{\Theta}^\rho_{ini}\}\) be initial estimates; |
2. | E-step |
use the moments of the truncated normal distribution to compute the current estimates of the | |
marginal means, denoted by \(\bar{x}^\rho\), and to complete the empirical covariance matrix \(S^\rho\); | |
3. | M-step |
let \(\hat{\mu}^\rho = \bar{x}^\rho\); | |
compute \(\hat{\Theta}^\rho\) using \(S^\rho\) and the glasso algorithm (Friedman and other, 2008); |
In order to reduce the computational burdern of the algorithm, in Step 2. the matrix \(S^\rho\) is approximated using the method proposed in Guo and others (2015).
In order to avoid the overfitting of the model, we use the following pseudo R-squared measure: $$R^2 = 1 - \frac{\|S^\rho - \hat{\Sigma}^\rho\|_F}{\|S^{\rho_{\max}} - \hat{\Sigma}^{\rho_{\max}}\|_F},$$ where \(\|\cdot\|_F\) denotes the Frobenius norm and \(\rho_{\max}\) denotes the smallest value of the tuning parameter for which all the estimated partial correlation coefficients are zero. By straightforward algebra, it is easy to show that the proposed pseudo R-squared belongs to the closed interval \([0, 1]\): \(R^2 = 0\) when the tuning parameter is equal to \(\rho_{\max}\) and \(R^2 = 1\) when \(\rho = 0\). The regularization path is stopped when \(R^2\) exceeds the threshold specify by ‘maxR2
’.
Augugliaro, L., Abbruzzo, A. and Vinciotti, V. (2018). \(\ell_1\)-Penalized gaussian graphical model. Biostatistics (to appear).
Friedman, J.H., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432--441.
Guo, J., Levina, E., Michailidis, G. and Zhu, J. (2015). Graphical models for ordinal data. Journal of Computational and Graphical Statistics 24, 183--204.
Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94, 19--35.
datacggm
, glasso
, to_graph
, mle
and the method functions summary
, coef
, plot
, aic
, bic
, ebic
.
# NOT RUN {
library("cglasso")
set.seed(123)
p <- 5L
n <- 100L
mu <- rep(0L, p)
Tht <- diag(p)
diag(Tht[-1L, -p]) <- diag(Tht[-p, -1L]) <- 0.3
Sgm <- solve(Tht)
X <- rdatacggm(n = n, mu = mu, Sigma = Sgm, probr = 0.05)
out <- cglasso(X = X)
out
# in this example we use the argument 'weights' to specify
# the unpenalized partial correlation coefficients and the
# structural zeros in the precision matrix
w <- rep(1L, p * p)
dim(w) <- c(p, p)
# specifing the unpenalized partial correlation coefficients
diag(w) <- diag(w[-1L, -p]) <- diag(w[-p, -1L]) <- 0L
# specifing the structural zeros
w[1L, 4L:5L] <- w[4L:5L, 1L] <- +Inf
w[2L, 5L] <- w[5L, 2L] <- +Inf
w
out <- cglasso(X = X, weights = w)
# checking structural zeros
out$Tht[, , out$nrho][w == +Inf]
# checking stationarity conditions of the MLE estimators
# (the unpenalized partial correlation coefficients)
(out$Sgm[, , out$nrho] - out$S[, , out$nrho])[w == 0]
# }
Run the code above in your browser using DataLab