sbmspcov: Bayesian Sparse Covariance Estimation using Sure Screening

Description

Provides a Bayesian sparse and positive definite estimate of a covariance matrix via screened beta-mixture prior.

Usage

sbmspcov(X, Sigma, cutoff = list(), prior = list(), nsample = list())

Value

Sigma: a nmc $\times$ p(p+1)/2 matrix including lower triangular elements of covariance matrix.
p: dimension of covariance matrix.
Phi: a nmc $\times$ p(p+1)/2 matrix including shrinkage parameters corresponding to lower triangular elements of covariance matrix.
INDzero: a list including indices of off-diagonal elements screened by sure screening.
cutoff: the cutoff value specified by FNR-approach.

Arguments

X: a n $\times$ p data matrix with column mean zero.
Sigma: an initial guess for Sigma.
cutoff: a list giving the information for the threshold. The list includes the following parameters (with default values in parentheses): method ('FNR') giving the method for determining the threshold value (method='FNR' uses the false negative rate (FNR)-based approach, method='corr' chooses the threshold value by sample correlations), rho a lower bound of meaningfully large correlations whose the defaults values are 0.25 and 0.2 for method = 'FNR' and method = 'corr', respectively. Note. If method='corr', rho is used as the threshold value. FNR (0.05) giving the prespecified FNR level when method = 'FNR'. nsimdata (1000) giving the number of simulated datasets for calculating Jeffreys’ default Bayes factors when method = 'FNR'.
prior: a list giving the prior information. The list includes the following parameters (with default values in parentheses): a (1/2) and b (1/2) giving the shape parameters for beta distribution, lambda (1) giving the hyperparameter for the diagonal elements, tau1sq (log(p)/(p^2*n)) giving the hyperparameter for the shrinkage prior of covariance.
nsample: a list giving the MCMC parameters. The list includes the following integers (with default values in parentheses): burnin (1000) giving the number of MCMC samples in transition period, nmc (1000) giving the number of MCMC samples for analysis.

Author

Kyoungjae Lee and Seongil Jo

Details

Lee, Jo, Lee, and Lee (2023+) proposed the screened beta-mixture shrinkage prior for estimating a sparse and positive definite covariance matrix. The screened beta-mixture shrinkage prior for $\Sigma = (\sigma_{jk})$ is defined as $$ \pi(\Sigma) = \frac{\pi^{u}(\Sigma)I(\Sigma \in C_p)}{\pi^{u}(\Sigma \in C_p)}, ~ C_p = \{\mbox{ all } p \times p \mbox{ positive definite matrices }\}, $$ where $\pi^{u}(\cdot)$ is the unconstrained prior given by $$ \pi^{u}(\sigma_{jk} \mid \psi_{jk}) = N\left(\sigma_{jk} \mid 0, \frac{\psi_{jk}}{1 - \psi_{jk}}\tau_1^2\right), ~ \psi_{jk} = 1 - 1/(1 + \phi_{jk})$$ $$ \pi^{u}(\psi_{jk}) = Beta(\psi_{jk} \mid a, b) ~ \mbox{for } (j, k) \in S_r(\hat{R})$$ $$ \pi^{u}(\sigma_{jj}) = Exp(\sigma_{jj} \mid \lambda), $$ where $S_r(\hat{R}) = \{(j,k) : 1 < j < k \le p, |\hat{\rho}_{jk}| > r\}$, $\hat{\rho}_{jk}$ are sample correlations, and $r$ is a threshold given by user.

For more details, see Lee, Jo, Lee and Lee (2022+).

References

Lee, K., Jo, S., Lee, K., and Lee, J. (2023+), "Scalable and optimal Bayesian inference for sparse covariance matrices via screened beta-mixture prior", arXiv:2206.12773.

Examples

Run this code


set.seed(1)
n <- 20
p <- 5

# generate a sparse covariance matrix:
True.Sigma <- matrix(0, nrow = p, ncol = p)
diag(True.Sigma) <- 1
Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8)
nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1)
zeroIND = (1:(p*(p-1)/2))[-nonzeroIND]
Values[zeroIND] <- 0
True.Sigma[lower.tri(True.Sigma)] <- Values
True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)]
if(min(eigen(True.Sigma)$values) <= 0){
  delta <- -min(eigen(True.Sigma)$values) + 1.0e-5
  True.Sigma <- True.Sigma + delta*diag(p)
}

# generate a data
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma)

# compute sparse, positive covariance estimator:
fout <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X))))
post.est.m <- bspcov::estimate(fout)
sqrt(mean((post.est.m - True.Sigma)^2))
sqrt(mean((cov(X) - True.Sigma)^2))

Run the code above in your browser using DataLab