
mdpeer (version 1.0.1)

riPEER: Graph-constrained regression with penalty term being a linear combination of graph-based and ridge penalty terms

Description

Graph-constrained regression with penalty term being a linear combination of graph-based and ridge penalty terms.

See Details for model description and optimization problem formulation.

Usage

riPEER(Q, y, Z, X = NULL, optim.metod = "rootSolve",
  rootSolve.x0 = c(1e-05, 1e-05), rootSolve.Q0.x0 = 1e-05, sbplx.x0 = c(1,
  1), sbplx.lambda.lo = c(10^(-5), 10^(-5)), sbplx.lambda.up = c(1e+06,
  1e+06), compute.boot.CI = FALSE, boot.R = 1000, boot.conf = 0.95,
  boot.set.seed = TRUE, boot.parallel = "multicore", boot.ncpus = 4,
  verbose = TRUE)

Arguments

Q
graph-originated penalty matrix \((p \times p)\); typically: a graph Laplacian matrix
y
response values matrix \((n \times 1)\)
Z
design matrix \((n \times p)\) modeled as random effects variables (to be penalized in regression modeling); assumed to be already standardized
X
design matrix \((n \times k)\) modeled as fixed effects variables (not to be penalized in regression modeling); if it does not contain a column of 1s, such a column will be added and treated as the intercept in the model
optim.metod
optimization method used to optimize \(\lambda = (\lambda_Q, \lambda_R)\)
  • "rootSolve" (default) - optimizes by finding roots of non-linear equations by the Newton-Raphson method; from rootSolve package
  • "sbplx" - optimizes with the use of Subplex Algorithm: 'Subplex is a variant of Nelder-Mead that uses Nelder-Mead on a sequence of subspaces'; from nloptr package
rootSolve.x0
vector containing initial guesses for \(\lambda = (\lambda_Q, \lambda_R)\) used in "rootSolve" algorithm
rootSolve.Q0.x0
vector containing initial guess for \(\lambda_R\) used in "rootSolve" algorithm
sbplx.x0
vector containing initial guesses for \(\lambda = (\lambda_Q, \lambda_R)\) used in "sbplx" algorithm
sbplx.lambda.lo
vector containing minimum values of \(\lambda = (\lambda_Q, \lambda_R)\) for the grid search in the "sbplx" algorithm
sbplx.lambda.up
vector containing maximum values of \(\lambda = (\lambda_Q, \lambda_R)\) for the grid search in the "sbplx" algorithm
compute.boot.CI
logical whether or not to compute bootstrap confidence intervals for the \(b\) regression coefficient estimates
boot.R
number of bootstrap replications used in bootstrap confidence intervals computation
boot.conf
confidence level assumed in bootstrap confidence intervals computation
boot.set.seed
logical whether or not to set a seed in bootstrap confidence intervals computation
boot.parallel
value of parallel argument in boot function in bootstrap confidence intervals computation
boot.ncpus
value of ncpus argument in boot function in bootstrap confidence intervals computation
verbose
logical whether or not to use verbose mode (print out function execution messages)

Value

b.est
vector of \(b\) coefficient estimates
beta.est
vector of \(\beta\) coefficient estimates
lambda.Q
\(\lambda_Q\) regularization parameter value
lambda.R
\(\lambda_R\) regularization parameter value
lambda.2
value of the ratio \(\lambda_R / \lambda_Q\)
boot.CI
data frame with two columns, lower and upper, containing, respectively, values of lower and upper bootstrap confidence intervals for \(b\) regression coefficient estimates
obj.fn.val
optimization problem objective function value

Details

Estimates coefficients of a linear model of the form: $$y = X\beta + Zb + \varepsilon$$ where:
  • \(y\) - response,
  • \(X\) - design matrix of variables whose coefficients are not penalized,
  • \(Z\) - design matrix of variables whose coefficients are penalized,
  • \(\beta\) - regression coefficients, not penalized in the estimation process,
  • \(b\) - regression coefficients, penalized in the estimation process, for which a prior graph of similarity / graph of connections is possibly available.

The method uses a penalty term that is a linear combination of graph-based and ridge penalty terms: $$\beta_{est}, b_{est} = \arg\min_{\beta,b} \left\{ (y - X\beta - Zb)^T(y - X\beta - Zb) + \lambda_Q b^TQb + \lambda_R b^Tb \right\}$$ where:

  • \(Q\) - a graph-originated penalty matrix; typically: a graph Laplacian matrix,
  • \(\lambda_Q\) - regularization parameter for a graph-based penalty term
  • \(\lambda_R\) - regularization parameter for ridge penalty term

The two regularization parameters, \(\lambda_Q\) and \(\lambda_R\), are estimated as maximum likelihood (ML) estimators from an equivalent Linear Mixed Model formulation of the optimization problem (see: References).
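
For fixed values of \(\lambda_Q\) and \(\lambda_R\), the problem above is a penalized least squares problem with a closed-form solution. The sketch below is illustrative only: riPEER estimates both \(\lambda\)'s internally, the lambda values used here are arbitrary assumptions, Q, y, Z, X are assumed to be defined as in the Examples section, and no intercept column is added to X (riPEER adds one automatically).

# Illustrative sketch only: closed-form solution of the objective above
# for FIXED lambda.Q and lambda.R (riPEER estimates these internally);
# Q, y, Z, X are assumed to be defined as in the Examples section
lambda.Q <- 1
lambda.R <- 0.1
W <- cbind(X, Z)                       # combined design matrix [X | Z]
pen <- matrix(0, ncol(W), ncol(W))     # penalty acts on b only, not on beta
idx.b <- (ncol(X) + 1):ncol(W)
pen[idx.b, idx.b] <- lambda.Q * Q + lambda.R * diag(ncol(Z))
est <- solve(t(W) %*% W + pen, t(W) %*% y)
beta.hat <- est[1:ncol(X)]             # unpenalized coefficient estimates
b.hat <- est[idx.b]                    # penalized coefficient estimates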

  • The graph-originated penalty term allows imposing similarity between coefficients based on the supplied graph information.
  • The ridge-originated penalty term facilitates parameter estimation: it reduces computational issues arising from singularity of the graph-originated penalty matrix and yields plausible results when the graph information is not informative (see the illustration below).
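
The singularity issue can be seen directly: a graph Laplacian always has a zero eigenvalue, hence is rank-deficient, while the combined penalty matrix \(\lambda_Q Q + \lambda_R I\) has full rank for \(\lambda_R > 0\). A quick illustration, assuming Q is the normalized Laplacian built in the Examples section:

# Q is assumed to be the normalized graph Laplacian from the Examples section
qr(Q)$rank < nrow(Q)                               # TRUE: Q alone is rank-deficient
lambda.R <- 1e-3
qr(Q + lambda.R * diag(nrow(Q)))$rank == nrow(Q)   # TRUE: full rank with the ridge term added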

Computation of bootstrap confidence intervals for the \(b\) regression coefficient estimates is available (disabled by default).

References

Karas, M., Brzyski, D., Dzemidzic, M., Goñi, J., Kareken, D.A., Randolph, T.W., Harezlak, J. (2017). Brain connectivity-informed regularization methods for regression. doi: https://doi.org/10.1101/117945

Examples

library(mdpeer)
set.seed(1234)
n <- 200
p1 <- 10
p2 <- 90
p <- p1 + p2
# Define graph adjacency matrix
A <- matrix(rep(0, p*p), nrow = p, ncol = p)
A[1:p1, 1:p1] <- 1
A[(p1+1):p, (p1+1):p] <- 1
L <- Adj2Lap(A)
# Define Q penalty matrix as (normalized) graph Laplacian matrix
Q <- L2L.normalized(L)
# Define Z, X design matrices and outcome y
Z <- matrix(rnorm(n*p), nrow = n, ncol = p)
b.true <- c(rep(1, p1), rep(0, p2))
X <- matrix(rnorm(n*3), nrow = n, ncol = 3)
beta.true <- runif(3)
intercept <- 0
eta <- intercept + Z %*% b.true + X %*% beta.true
R2 <- 0.5
sd.eps <- sqrt(var(eta) * (1 - R2) / R2)
error <- rnorm(n, sd = sd.eps)
y <- eta + error

## Not run: ------------------------------------
# library(ggplot2)
# riPEER.out <- riPEER(Q, y, Z, X)
# plt.df <- data.frame(x = 1:p, y = riPEER.out$b.est)
# ggplot(plt.df, aes(x = x, y = y, group = 1)) + geom_line() + labs(title = "b estimates")
## ---------------------------------------------

## Not run: ------------------------------------
# # riPEER with 0.95 bootstrap confidence intervals computation
# riPEER.out <- riPEER(Q, y, Z, X, compute.boot.CI = TRUE, boot.R = 500)
# plt.df <- data.frame(x = 1:p, 
#                      y = riPEER.out$b.est, 
#                      lo = riPEER.out$boot.CI[,1], 
#                      up =  riPEER.out$boot.CI[,2])
# ggplot(plt.df, aes(x = x, y = y, group = 1)) + geom_line() +  
#   geom_ribbon(aes(ymin=lo, ymax=up), alpha = 0.3)
## ---------------------------------------------
