The main function for high-dimensional undirected graph estimation. Four graph estimation methods are available for data analysis: (1) Meinshausen-Buhlmann graph estimation (mb), (2) the graphical lasso (glasso), (3) correlation thresholding graph estimation (ct), and (4) tuning-insensitive graph estimation (tiger).
huge(
x,
lambda = NULL,
nlambda = NULL,
lambda.min.ratio = NULL,
method = "mb",
scr = NULL,
scr.num = NULL,
cov.output = FALSE,
sym = "or",
verbose = TRUE
)
x: Either (1) an n by d data matrix or (2) a d by d sample covariance matrix, where n is the sample size and d is the dimension. The program automatically identifies which kind of input it received by checking whether the matrix is symmetric.
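The symmetry check can be illustrated with base R's isSymmetric(). This is a minimal sketch, assuming the check behaves like isSymmetric(); the helper guess_input_type() is hypothetical and not part of huge.

```r
# Hypothetical helper, not exported by huge: classify an input matrix the way
# the documentation describes (symmetric => covariance, otherwise => data).
guess_input_type <- function(x) {
  if (isSymmetric(x)) "covariance" else "data"
}

set.seed(1)
X <- matrix(rnorm(50 * 12), nrow = 50, ncol = 12)  # n = 50 samples, d = 12 variables

guess_input_type(X)       # "data": a 50 x 12 matrix is not even square
guess_input_type(cor(X))  # "covariance": cor() returns a symmetric d x d matrix
```

A rule of this kind would misclassify a square data matrix (n = d) that happens to be exactly symmetric, which essentially never occurs with real data.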
lambda: A sequence of decreasing positive numbers that controls the regularization when method = "mb", "glasso" or "tiger", or the thresholding when method = "ct". Typical usage is to leave the input lambda = NULL and have the program compute its own lambda sequence based on nlambda and lambda.min.ratio. Users can also specify a sequence to override this. When method = "mb", "glasso" or "tiger", use with care: it is better to supply a decreasing sequence of values than a single (small) value.
nlambda: The number of regularization/thresholding parameters. The default value is 30 for method = "ct" and 10 for method = "mb", "glasso" or "tiger".
lambda.min.ratio: If method = "mb", "glasso" or "tiger", it is the smallest value for lambda, as a fraction of the upper bound (MAX) of the regularization/thresholding parameter, i.e. the value that makes all estimates equal to 0. The program can automatically generate lambda as a sequence of length nlambda running from MAX to lambda.min.ratio*MAX on a log scale. If method = "ct", it is the largest sparsity level for estimated graphs. The program can automatically generate lambda as a sequence of length nlambda that makes the sparsity level of the graph path increase evenly from 0 to lambda.min.ratio. The default value is 0.1 when method = "mb", "glasso" or "tiger", and 0.05 when method = "ct".
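For method = "mb", "glasso" or "tiger", the log-scale sequence described above can be sketched in base R. The sketch assumes, for illustration only, that MAX is the largest absolute off-diagonal entry of the sample correlation matrix; make_lambda_path() is a hypothetical helper, not huge's internal code.

```r
# Hypothetical helper (not huge's internal code): build a decreasing lambda
# sequence of length nlambda from MAX down to lambda.min.ratio * MAX, evenly
# spaced on a log scale.
make_lambda_path <- function(x, nlambda = 10, lambda.min.ratio = 0.1) {
  S <- cor(x)
  # Assumption: MAX is the largest absolute off-diagonal correlation.
  lambda.max <- max(abs(S[upper.tri(S)]))
  lambda.min <- lambda.min.ratio * lambda.max
  exp(seq(log(lambda.max), log(lambda.min), length.out = nlambda))
}

set.seed(1)
x <- matrix(rnorm(50 * 12), nrow = 50, ncol = 12)
lam <- make_lambda_path(x)
round(lam / lam[1], 3)  # ratios fall geometrically from 1 to 0.1
```

Spacing on the log scale places more values near the sparse end of the path, where each small change in lambda adds or removes relatively few edges.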
method: Graph estimation methods with 4 options: "mb", "ct", "glasso" and "tiger". The default value is "mb".
scr: If scr = TRUE, the lossy screening rule is applied to preselect the neighborhood before the graph estimation. The default value is FALSE. NOT applicable when method = "ct" or "tiger".
scr.num: The neighborhood size after the lossy screening rule (the number of remaining neighbors per node). The default value is n-1. An alternative value is n/log(n). ONLY applicable when scr = TRUE and method = "mb".
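One way to picture the screening step is to keep, for each variable, only its scr.num most correlated candidates before the per-node estimation. This is a minimal sketch under that assumption; the exact rule huge applies may differ, and screen_neighbors() is a hypothetical helper. Its output loosely mirrors the scr.num by k layout of ind.mat, with one column per variable.

```r
# Hypothetical helper (not huge's screening code): for each variable, keep the
# indices of the scr.num other variables with the largest absolute correlation.
screen_neighbors <- function(x, scr.num) {
  S <- abs(cor(x))
  diag(S) <- -Inf  # never keep a variable as its own neighbor
  # Column j lists the scr.num retained candidates for variable j
  apply(S, 2, function(col) order(col, decreasing = TRUE)[seq_len(scr.num)])
}

set.seed(1)
n <- 50
x <- matrix(rnorm(n * 30), nrow = n, ncol = 30)  # d = 30 variables
scr.num <- floor(n / log(n))  # the alternative size n/log(n), here 12
nbr <- screen_neighbors(x, scr.num)
dim(nbr)  # 12 30
```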
cov.output: If cov.output = TRUE, the output will include a path of estimated covariance matrices. ONLY applicable when method = "glasso". Since the estimated covariance matrices are generally not sparse, please use it with care, as it may take a lot of memory in high-dimensional settings. The default value is FALSE.
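To gauge the memory warning, note that a dense d by d matrix of doubles takes 8 d^2 bytes, so a path of nlambda such matrices grows quadratically in d. The helper below is only a back-of-the-envelope estimate that ignores R's per-object overhead and the other outputs.

```r
# Rough storage needed by nlambda dense d x d double-precision matrices
# (8 bytes per entry), ignoring R object headers.
dense_path_bytes <- function(d, nlambda) 8 * d^2 * nlambda

dense_path_bytes(d = 5000, nlambda = 10) / 1e9  # 2 (GB)
```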
sym: Symmetrize the output graphs. If sym = "and", the edge between node i and node j is selected ONLY when node i and node j are both selected as neighbors of each other. If sym = "or", the edge is selected when either node i or node j is selected as a neighbor of the other. The default value is "or". ONLY applicable when method = "mb" or "tiger".
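Given a 0/1 matrix B in which B[i, j] = 1 means node j was selected as a neighbor of node i, the two rules amount to an elementwise AND or OR of B with its transpose. A minimal sketch; symmetrize() is a hypothetical helper, not a huge function.

```r
# Hypothetical helper: symmetrize a neighborhood-selection matrix.
# B[i, j] == 1 means node j was selected as a neighbor of node i.
symmetrize <- function(B, sym = c("or", "and")) {
  sym <- match.arg(sym)
  A <- if (sym == "and") (B & t(B)) else (B | t(B))
  A * 1  # convert the logical matrix back to 0/1
}

B <- matrix(c(0, 1, 0,
              0, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)
symmetrize(B, "and")  # only the mutual pair 2-3 survives
symmetrize(B, "or")   # edges 1-2 and 2-3
```

The "and" rule yields sparser, more conservative graphs; the "or" rule keeps any edge supported by at least one of the two neighborhood fits.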
verbose: If verbose = FALSE, tracing information printing is disabled. The default value is TRUE.
An object with S3 class "huge" is returned:
data: The n by d data matrix or d by d sample covariance matrix from the input.
cov.input: An indicator of whether the input was a sample covariance matrix.
ind.mat: The scr.num by k matrix in which each column corresponds to a variable in ind.group and contains the indices of the remaining neighbors after the GSS. ONLY applicable when scr = TRUE and approx = FALSE.
lambda: The sequence of regularization parameters used in mb, glasso or tiger, or of thresholding parameters used in ct.
sym: The sym from the input. ONLY applicable when method = "mb" or "tiger".
scr: The scr from the input. ONLY applicable when method = "mb" or "glasso".
path: A list of k by k adjacency matrices of estimated graphs as a graph path corresponding to lambda.
sparsity: The sparsity levels of the graph path.
icov: A list of d by d precision matrices as an alternative graph path (numerical path) corresponding to lambda. ONLY applicable when method = "glasso" or "tiger".
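The link between the numerical path and the graph path, and one way to read a sparsity level, can be sketched as follows. The sketch assumes that an edge i-j is present exactly when the (i, j) entry of the precision matrix is nonzero, and that the sparsity level is the fraction of the d(d-1)/2 possible edges that are present; both helpers are hypothetical.

```r
# Hypothetical helpers, not huge's internal code.
# Adjacency matrix: edge i-j whenever the precision entry (i, j) is nonzero.
precision_to_adjacency <- function(Z, tol = 0) {
  A <- (abs(Z) > tol) * 1
  diag(A) <- 0  # no self-loops
  A
}

# Sparsity level: fraction of the d(d-1)/2 possible edges that are present.
sparsity_level <- function(A) {
  d <- ncol(A)
  sum(A[upper.tri(A)]) / (d * (d - 1) / 2)
}

Z <- matrix(c( 2, -1,  0,  0,
              -1,  2, -1,  0,
               0, -1,  2,  0,
               0,  0,  0,  1), nrow = 4, byrow = TRUE)
A <- precision_to_adjacency(Z)
sparsity_level(A)  # 2 edges out of 6 possible: 0.3333333
```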
cov: A list of d by d estimated covariance matrices corresponding to lambda. ONLY applicable when cov.output = TRUE and method = "glasso".
method: The method used in the graph estimation stage.
df: If method = "mb" or "tiger", it is a k by nlambda matrix. Each row contains the number of nonzero coefficients along the lasso solution path. If method = "glasso", it is an nlambda-dimensional vector containing the number of nonzero coefficients along the graph path icov.
loglik: An nlambda-dimensional vector containing the likelihood scores along the graph path (icov). ONLY applicable when method = "glasso". For an estimated inverse covariance Z, the program only calculates log(det(Z)) - trace(SZ), where S is the empirical covariance matrix. For the likelihood of n observations, multiply by n/2.
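The score above can be computed directly from an estimated precision matrix Z and the empirical covariance S. A minimal sketch: loglik_score() is a hypothetical helper, and base R's determinant() supplies a numerically stable log-determinant (Z is assumed positive definite, so the log of the absolute determinant equals log(det(Z))).

```r
# Hypothetical helper: the score log(det(Z)) - trace(S Z) described above.
# Assumes Z is positive definite, so log|det(Z)| = log(det(Z)).
loglik_score <- function(Z, S) {
  logdet <- as.numeric(determinant(Z, logarithm = TRUE)$modulus)
  logdet - sum(diag(S %*% Z))
}

set.seed(1)
n <- 100
x <- matrix(rnorm(n * 5), nrow = n, ncol = 5)
S <- cov(x)
Z <- solve(S)            # unpenalized inverse, for illustration only
score <- loglik_score(Z, S)
score * n / 2            # scaled to the likelihood for n observations
```

With the unpenalized inverse, trace(SZ) equals d, so the score reduces to -log(det(S)) - d; penalized estimates along the path trade some of this fit for sparsity.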
The graph structure is estimated by Meinshausen-Buhlmann graph estimation or the graphical lasso, and both methods can be further accelerated by the lossy screening rule, which preselects the neighborhood of each variable by correlation thresholding. We target high-dimensional data analysis, where usually d >> n, and the computation is memory-optimized by using sparse matrix output. We also provide a highly computationally efficient approach, correlation thresholding graph estimation.
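Correlation thresholding can be sketched in a few lines of base R: compute the absolute sample correlations and keep the pairs above a threshold, choosing the thresholds so that the sparsity level rises evenly from 0 to lambda.min.ratio, as described for method = "ct". This is an illustration under those assumptions, not huge's implementation; ct_path() is a hypothetical helper.

```r
# Hypothetical helper: a correlation-thresholding graph path.
ct_path <- function(x, nlambda = 30, lambda.min.ratio = 0.05) {
  R <- abs(cor(x))
  r <- R[upper.tri(R)]  # one entry per candidate edge
  sparsity <- seq(0, lambda.min.ratio, length.out = nlambda)
  # Threshold at the (1 - sparsity) quantile, so roughly that fraction of
  # candidate edges exceeds it; sparsity 0 gives max(r), hence an empty graph.
  thresholds <- quantile(r, probs = 1 - sparsity, names = FALSE)
  lapply(thresholds, function(t) {
    A <- (R > t) * 1
    diag(A) <- 0
    A
  })
}

set.seed(1)
x <- matrix(rnorm(50 * 40), nrow = 50, ncol = 40)
path <- ct_path(x)
sapply(path, function(A) sum(A) / 2)  # edge counts grow along the path
```

Only one correlation matrix and a sort-based quantile are needed, which is why this approach is far cheaper than solving a penalized problem at every point of the path.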
See also: huge.generator, huge.select, huge.plot, huge.roc, and huge-package.
library(huge)

# generate data
L <- huge.generator(n = 50, d = 12, graph = "hub", g = 4)

# graph path estimation using mb (the default method)
out1 <- huge(L$data)
out1
plot(out1)                # not aligned
plot(out1, align = TRUE)  # aligned
huge.plot(out1$path[[3]])

# graph path estimation using the sample correlation matrix as the input
out1.cov <- huge(cor(L$data), method = "glasso")
out1.cov
plot(out1.cov)                # not aligned
plot(out1.cov, align = TRUE)  # aligned
huge.plot(out1.cov$path[[3]])

# graph path estimation using ct
out2 <- huge(L$data, method = "ct")
out2
plot(out2)

# graph path estimation using glasso
out3 <- huge(L$data, method = "glasso")
out3
plot(out3)

# graph path estimation using tiger
out4 <- huge(L$data, method = "tiger")
out4
plot(out4)