Let \(S_{+}^q\) be the set of \(q\) by \(q\) symmetric and positive definite matrices and let \(y_i\in R^q\) be the measurements of the \(q\) responses for the \(i\)th subject (\(i=1,\ldots, n\)). The model assumes that \(y_i\) is a realization of the \(q\)-variate random vector $$Y_i = \mu + \beta'x_i + \varepsilon_i, \ \ \ \ i=1,\ldots, n$$ where \(\mu\in R^q\) is an unknown intercept vector; \(\beta\in R^{p\times q}\) is an unknown regression coefficient matrix; \(x_i \in R^p\) is the known vector of values for \(i\)th subjects's predictors, and \(\varepsilon_1,\ldots, \varepsilon_n\) are \(n\) independent copies of a \(q\)-variate Normal random vector with mean 0 and unknown inverse covariance matrix \(\Omega \in S_{+}^q\).
This function computes penalized likelihood estimates of the unknown parameters \(\mu\), \(\beta\), and \(\Omega\). Let \(\bar y=n^{-1} \sum_{i=1}^n y_i\) and \(\bar{x} = n^{-1}\sum_{i=1}^n x_i\). These estimates are $$ (\hat{\beta}, \hat\Omega) = \arg\min_{(B, Q)\in R^{p\times q}\times S_{+}^q} \left\{g(B, Q) +\lambda_1 \left(\sum_{j\neq k} |Q_{jk}| + 1(p\geq n) \sum_{j=1}^q |Q_{jj}| \right) + 2\lambda_{2}\sum_{j=1}^p\sum_{k=1}^q |B_{jk}|\right\}$$ and \(\hat\mu=\bar y - \hat\beta'\bar x\), where $$ g(B, Q) = {\rm tr}\{n^{-1}(Y-XB)'(Y-XB) Q\}-\log|Q|,$$ \(Y\in R^{n\times q}\) has \(i\)th row \((y_{i}-\bar y)'\), and \(X\in R^{n\times p}\) has \(i\)th row \((x_{i}-\bar{x})'\).
mrce(X,Y, lam1=NULL, lam2=NULL, lam1.vec=NULL, lam2.vec=NULL,
method=c("single", "cv", "fixed.omega"),
cov.tol=1e-4, cov.maxit=1e3, omega=NULL,
maxit.out=1e3, maxit.in=1e3, tol.out=1e-8,
tol.in=1e-8, kfold=5, silent=TRUE, eps=1e-5,
standardize=FALSE, permute=FALSE)
An \(n\) by \(p\) matrix of the values for the prediction variables.
The \(i\)th row of X
is \(x_i\) defined above (\(i=1,\ldots, n\)).
Do not include a column of ones.
An \(n\) by \(q\) matrix of the observed responses.
The \(i\)th row of Y
is \(y_i\) defined above (\(i=1,\ldots, n\)).
A single value for \(\lambda_1\) defined above. This
argument is only used if method="single"
A single value for \(\lambda_2\) defined above
(or a \(p\) by \(q\) matrix with \((j,k)\)th entry \(\lambda_{2jk}\)
in which case the penalty \(2\lambda_{2}\sum_{j=1}^p\sum_{k=1}^q |B_{jk}|\) becomes
\(2\sum_{j=1}^p\sum_{k=1}^q \lambda_{2jk}|B_{jk}|\)). This
argument is not used if method="cv"
.
A vector of candidate values for \(\lambda_1\) from which the cross validation procedure
searches: only used when method="cv"
and must be specified by the user when method="cv"
. Please arrange in decreasing order.
A vector of candidate values for \(\lambda_2\) from which the cross validation procedure
searches: only used when method="cv"
and must be specified by the user when method="cv"
. Please arrange in decreasing order.
There are three options:
method="single"
computes the MRCE estimate of the regression coefficient matrix
with penalty tuning parameters lam1
and lam2
;
method="cv"
performs kfold
cross
validation using candidate tuning parameters in lam1.vec
and lam2.vec
;
method="fixed.omega"
computes the regression coefficient matrix estimate for which \(Q\) (defined above)
is fixed at omega
.
Convergence tolerance for the glasso algorithm that minimizes the objective function (defined above) with \(B\) fixed.
The maximum number of iterations allowed for the glasso algorithm that minimizes the objective function (defined above) with \(B\) fixed.
A user-supplied fixed value of \(Q\). Only used when
method="fixed.omega"
in which case the minimizer of the objective function (defined above) with \(Q\)
fixed at omega
is returned.
The maximum number of iterations allowed for the outer loop of the exact MRCE algorithm.
The maximum number of iterations allowed for the algorithm that minimizes the objective function, defined above, with \(\Omega\) fixed.
Convergence tolerance for outer loop of the exact MRCE algorithm.
Convergence tolerance for the algorithm that minimizes the objective function, defined above, with \(\Omega\) fixed.
The number of folds to use when method="cv"
.
Logical: when silent=FALSE
this function displays progress updates to the screen.
The algorithm will terminate if the minimum diagonal entry of the current iterate's residual
sample covariance is less than eps
. This may need adjustment depending on the scales of the variables.
Logical: should the columns of X
be standardized so each has unit length and zero average. The parameter estimates are returned on the original unstandarized scale.
The default is FALSE
.
Logical: when method="cv"
, should the subject indices be permutted? The default is FALSE
.
A list containing
This is \(\hat\beta \in R^{p\times q}\) defined above. If method="cv"
,
then best.lam1
and best.lam2
defined below are used for \(\lambda_1\) and \(\lambda_2\).
This is the intercept estimate \(\hat\mu \in R^q\) defined above.
If method="cv"
,
then best.lam1
and best.lam2
defined below are used for \(\lambda_1\) and \(\lambda_2\).
This is \(\hat\Omega \in S_{+}^q\) defined above. If method="cv"
,
then best.lam1
and best.lam2
defined below are used for \(\lambda_1\) and \(\lambda_2\).
This is \(\bar x \in R^p\) defined above.
This is \(\bar y \in R^q\) defined above.
The selected value for \(\lambda_1\) by cross validation. Will be NULL
unless method="cv"
.
The selected value for \(\lambda_2\) by cross validation. Will be NULL
unless method="cv"
.
Cross validation error matrix with length(lam1.vec)
rows and
length(lam2.vec)
columns. Will be NULL
unless method="cv"
.
Please see Rothman, Levina, and Zhu (2010)
for more information on the algorithm and model.
This version of the software uses the glasso algorithm (Friedman et al., 2008) through the R package glasso
.
If the algorithm is running slowly, track its progress with silent=FALSE
.
In some cases, choosing cov.tol=0.1
and tol.out=1e-10
allows the algorithm to make
faster progress. If one uses a matrix for lam2
, consider setting tol.in=1e-12
.
When \(p \geq n\), the diagonal of the optimization variable corresponding to the inverse covariance matrix of the error is penalized. Without diagonal penalization, if there exists a \(\bar B\) such that the \(q\)th column of \(Y\) is equal to the \(q\)th column of \(X\bar B\), then a global minimizer of the objective function (defined above) does not exist.
The algorithm that minimizes the objective function, defined above,
with \(Q\) fixed uses a similar update strategy and termination
criterion to those used by Friedman et al. (2010) in the corresponding R package glmnet
.
Rothman, A. J., Levina, E., and Zhu, J. (2010) Sparse multivariate regression with covariance estimation. Journal of Computational and Graphical Statistics. 19: 947--962.
Jerome Friedman, Trevor Hastie, Robert Tibshirani (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432-441.
Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22.
# NOT RUN {
set.seed(48105)
n=50
p=10
q=5
Omega.inv=diag(q)
for(i in 1:q) for(j in 1:q)
Omega.inv[i,j]=0.7^abs(i-j)
out=eigen(Omega.inv, symmetric=TRUE)
Omega.inv.sqrt=tcrossprod(out$vec*rep(out$val^(0.5), each=q),out$vec)
Omega=tcrossprod(out$vec*rep(out$val^(-1), each=q),out$vec)
X=matrix(rnorm(n*p), nrow=n, ncol=p)
E=matrix(rnorm(n*q), nrow=n, ncol=q)%*%Omega.inv.sqrt
Beta=matrix(rbinom(p*q, size=1, prob=0.1)*runif(p*q, min=1, max=2), nrow=p, ncol=q)
mu=1:q
Y=rep(1,n)%*%t(mu) + X%*%Beta + E
lam1.vec=rev(10^seq(from=-2, to=0, by=0.5))
lam2.vec=rev(10^seq(from=-2, to=0, by=0.5))
cvfit=mrce(Y=Y, X=X, lam1.vec=lam1.vec, lam2.vec=lam2.vec, method="cv")
cvfit
fit=mrce(Y=Y, X=X, lam1=10^(-1.5), lam2=10^(-0.5), method="single")
fit
lam2.mat=1000*(fit$Bhat==0)
refit=mrce(Y=Y, X=X, lam2=lam2.mat, method="fixed.omega", omega=fit$omega, tol.in=1e-12)
refit
# }
Run the code above in your browser using DataLab