glasso: Lasso Estimator for Gaussian Graphical Models

Description

‘glasso’ fits the l1-penalized Gaussian graphical model.

Usage

glasso(X, weights, pendiag = FALSE, nrho = 50L, rho.min.ratio, rho, maxR2,
       maxit = 1.0e+4, thr = 1.0e-04, trace = 0L)

Arguments

the $(n\times p)$-dimensional matrix used to compute the covariance matrix.

weights

an optional symmetric matrix of non-negative weights. This matrix can be used to specify the unpenalized partial correlation coefficients (‘weights[i, j] = 0’) or the structural zeros in the precision matrix (‘weights[i, j] = +Inf’). See below for an example. By default, glasso model is fitted without weights.

pendiag

flag used to specify if the diagonal elements of the concentration matrix are penalized (‘pendiag = TRUE’) or unpenalized (‘pendiag = FALSE’).

nrho

the integer specifying the number of tuning parameters used to fit the glasso model. Default is ‘nrho = 50L’.

rho.min.ratio

the smallest value for the tuning parameter $\rho$, as a fraction of the smallest tuning parameter for which all the estimated partial correlation coefficients are zero. The default depends on the sample size ‘$n$’ relative to the number of variables ‘$p$’. If ‘$p < n$’, the default is ‘1.0E-4’ otherwise the value ‘1.0E-2’ is used as default. A very small value of ‘rho.min.ratio’ will lead to a saturated fitted model in the ‘$p < n$’ case.

rho

optional argument. A user supplied rho sequence. WARNING: avoid supplying a single value for the tuning parameter; supply instead a decreasing sequence of $\rho$-values.

maxR2

a value belonging to the interval $[0, 1]$ specifying the largest value of the pseudo R-squared measure (see Section Details). The regularization path is stopped when $R^2$ exceeds ‘maxR2’. The default depends on the sample size ‘$n$’ relative to the number of variables ‘$p$’. If ‘$p < n$’, the default is ‘1’ otherwise the value ‘0.9’ is used as default.

maxit

maximum number of iterations of the glasso algorithm. Default is 1.0E+4.

thr

threshold for the convergence of the glasso algorithm. Default is 1.0E-4.

trace

integer for printing out information as iterations proceed: trace = 0 no information is printed out on video; trace = 1 basic information is printed out on video; trace = 2 detailed information is printed out on video.

Value

‘glasso’ returns an object with S3 class “glasso”, i.e. a list containing the following components:

call

the call that produced this object.

the matrix used to compute the covariance matrix.

the covariance matrix used to fit the glasso model.

weights

the used weights.

pendiag

the flag specifying if the diagonal elements of the precisione matrix are penalized.

nrho

the number of fitted glasso model.

rho.min.ratio

the scale factor used to compute the smallest value of the tuning parameter.

rho

the $p$-dimensional vector reporting the values of the tuning parameter used to fit the glasso model.

maxR2

the threshold value used for the pseudo R-squared measure.

maxit

the maximum number of iterations of the glasso algorithm.

thr

the threshold for the convergence of the glasso algorithm.

Sgm

an array of dimension $(p\times p\times\texttt{nrho})$. Sgm[, , k] is the estimate of the covariance matrix of the glasso model fitted using rho[k].

Tht

an array of dimension $(p\times p\times\texttt{nrho})$. Tht[, , k] is the estimate of the precision matrix of the glasso model fitted using rho[k].

Adj

an array of dimension $(p\times p\times\texttt{nrho})$. Adj[, , k] is the adjacency matrix associated to Tht[, , k], i.e. Adj[i, j, k] $= 1$ iff Tht[i, j, k] $\neq 0$ and 0 otherwise.

the $nrho$-dimensional vector reporting the number of non-zero partial correlation coefficients.

the $nrho$-dimensional vector reporting the values of the measure $R^2$ described in the section Details.

ncomp

the $nrho$-dimensional vector reporting the number of connected components (for internal purposes only).

the $(p\times\texttt{nrho})$-dimensional matrix encoding the connected components (for internal purposes only).

the $(p\times\texttt{nrho})$-dimensional matrix reporting the number of vertices per connected component (for internal purposes only).

nit

the $p$-dimensional vector reporting the number of iterations.

conv

a description of the error that has occurred.

trace

the integer used for printing out information.

Details

For a fixed value of the tuning parameter, glasso solves the following maximization problem: $$\max_{\Theta}\log det\Theta - tr\{S\Theta\}-\rho \sum_{h,k}w_{hk}|\theta_{hk}|,$$ where $w_{hk}$ is the non-negative weight for $\theta_{hk}$. The previous maximization problem is solved effeciently combining the block-coordinate descent algorithm (Friedman and others, 2008) with the screening rule proposed in Witten and others (2011).

In order to avoid the overfitting of the model, we use the following pseudo R-squared measure: $$R^2 = 1 - \frac{\|S - \hat{\Sigma}^\rho\|_F}{\|S - \hat{\Sigma}^{\rho_{\max}}\|_F},$$ where $\|\cdot\|_F$ denotes the Frobenius norm and $\rho_{\max}$ denotes the smallest value of the tuning parameter for which all the estimated partial correlation coefficients are zero. By straightforward algebra, it is easy to show that the proposed pseudo R-squared belongs to the closed interval $[0, 1]$: $R^2 = 0$ when the tuning parameter is equal to $\rho_{\max}$ and $R^2 = 1$ when $\rho = 0$. The regularization path is stopped when $R^2$ exceeds ‘maxR2’.

References

Friedman, J.H., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432--441.

Witten, D.M., Friedman, J.H, and Simon, N. (2011). New insights and faster computations for the graphical lasso. Journal of Computational and Graphical Statistics 20, 892--900.

Examples

Run this code

# NOT RUN {
library("cglasso")
set.seed(123)

p <- 5L
n <- 100L
mu <- rep(0L, p)
Tht <- diag(p)
diag(Tht[-1L, -p]) <- diag(Tht[-p, -1L]) <- 0.3
Sgm <- solve(Tht)
X <- MASS::mvrnorm(n = n, mu = mu, Sigma = Sgm)
out <- glasso(X)
out

# in this example we use the argument 'weights' to specify
# the unpenalized partial correlation coefficients and the
# structural zeros in the precision matrix

w <- rep(1, p * p)
dim(w) <- c(p, p)

# specifing the unpenalized partial correlation coefficients
diag(w) <- diag(w[-1L, -p]) <- diag(w[-p, -1L]) <- 0

# specifing the structural zero
w[1L, 4L:5L] <- w[4L:5L, 1L] <- +Inf
w[2L, 5L] <- w[5L, 2L] <- +Inf
w

out <- glasso(X = X, weights = w)

# checking structural zeros
out$Tht[, , out$nrho][w == +Inf]

# checking stationarity conditions of the MLE estimators
# (the unpenalized partial correlation coefficients)
(out$Sgm[, , out$nrho] - out$S)[w == 0L]
# }

Run the code above in your browser using DataLab