Bayesian linear regression (BLR) models:
- unify mapping of genetic variants, estimation of genetic parameters (e.g. heritability), and prediction of disease risk
- handle different genetic architectures (few large effects, many small effects)
- scale to large data (e.g. using sparse LD matrices)
In the Bayesian multiple regression model, the posterior density of the model parameters depends on the likelihood of the data given the parameters and on a prior probability for the model parameters.
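In symbols, this is Bayes' rule. The following is a sketch under assumed notation that is not taken from this page: y is the phenotype vector, W the genotype matrix, b the vector of marker effects, and θ collects the variance components and other hyperparameters. Covariates are left out for brevity.

```latex
p(b, \theta \mid y) \;\propto\; p(y \mid b, \theta)\; p(b \mid \theta)\; p(\theta),
\qquad
y \mid b, \theta \;\sim\; N\!\left(W b,\; \sigma_e^2 I\right)
```

The methods listed further down differ only in the choice of the prior p(b | θ) for the marker effects.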
The prior density of marker effects defines whether the model induces variable selection and shrinkage, or shrinkage only. The choice of prior also defines the extent and type of shrinkage induced. Ideally, the prior for the marker effects should reflect the genetic architecture of the trait, which can vary substantially across traits.
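The difference between shrinkage only and selection plus shrinkage can be illustrated for a single marker. The sketch below is not part of the package: the helper names `shrink_gaussian` and `shrink_laplace_map` are hypothetical, and the variances and penalty are arbitrary illustrative values. Under a Gaussian prior the posterior mean scales the least-squares estimate by a constant factor. Under a Laplace prior the posterior mode is soft-thresholded, so small estimates become exactly zero. This applies to the mode only; the posterior mean under a Laplace prior shrinks but does not give exact zeros.

```r
# Hypothetical helpers for one marker w with sum of squares ww = w'w,
# residual variance ve, and least-squares estimate b_ols = w'y / w'w.

# Gaussian prior b ~ N(0, vb): posterior mean is a proportional shrinkage.
shrink_gaussian <- function(b_ols, ww, ve, vb) {
  b_ols * ww / (ww + ve / vb)
}

# Laplace prior p(b) proportional to exp(-lambda * |b|):
# posterior mode is the soft-thresholded least-squares estimate.
shrink_laplace_map <- function(b_ols, ww, ve, lambda) {
  threshold <- lambda * ve / ww
  sign(b_ols) * pmax(abs(b_ols) - threshold, 0)
}

b_ols <- c(-0.50, -0.05, 0.02, 0.30)  # illustrative least-squares estimates
ww <- 1000                            # e.g. a standardized marker, n = 1000
ve <- 1

gauss <- shrink_gaussian(b_ols, ww, ve, vb = 0.01)
lap   <- shrink_laplace_map(b_ols, ww, ve, lambda = 100)

print(rbind(b_ols, gauss, lap))
```

With these values the Gaussian prior shrinks every estimate by the same factor, while the Laplace mode removes the two small estimates entirely and shrinks the two large ones by a fixed amount.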
The following prior distributions are provided:
Bayes N: Assigning a Gaussian prior to marker effects implies that the posterior means are the BLUP estimates (the same as ridge regression).
Bayes L: Assigning a double-exponential (Laplace) prior to marker effects gives the density used in the Bayesian LASSO.
Bayes A: Similar to ridge regression, but with a t-distribution prior (rather than Gaussian) for the marker effects; the variance comes from an inverse-chi-square distribution instead of being fixed. Estimation is via Gibbs sampling.
Bayes C: Uses a “rounded spike” (a low-variance Gaussian) at the origin, so that many small effects can contribute to the polygenic component; this reduces the dimensionality of the model and makes Gibbs sampling feasible.
Bayes R: Hierarchical Bayesian mixture model with 4 Gaussian components, with variances scaled by 0, 0.0001, 0.001, and 0.01.
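The equivalence stated for Bayes N can be checked numerically. The following is a minimal sketch in base R, not the package's implementation. It assumes the marker variance `vb` and the residual variance `ve` are known (the values are arbitrary), so that the posterior mean has a closed form. It computes the ridge solution with penalty `lambda = ve / vb` and the BLUP of the marker effects from the equivalent mixed-model form, then compares the two.

```r
set.seed(1)
n <- 200   # individuals
m <- 50    # markers

W <- scale(matrix(rnorm(n * m), nrow = n))  # centered and scaled genotypes
b_true <- rnorm(m, sd = 0.1)
y <- as.vector(W %*% b_true + rnorm(n))
y <- y - mean(y)                            # center the phenotype

vb <- 0.01           # assumed marker variance (illustrative)
ve <- 1              # assumed residual variance (illustrative)
lambda <- ve / vb    # implied ridge penalty

# Ridge regression: (W'W + lambda * I)^-1 W'y
b_ridge <- solve(crossprod(W) + diag(lambda, m), crossprod(W, y))

# BLUP: vb * W' V^-1 y, with V = vb * WW' + ve * I
V <- vb * tcrossprod(W) + diag(ve, n)
b_blup <- vb * crossprod(W, solve(V, y))

# Largest absolute difference between the two solutions
max(abs(b_ridge - b_blup))
```

The two forms differ only in which matrix is inverted (m x m for ridge, n x n for BLUP), which is why the choice between them is usually made on the basis of whether there are more markers or more individuals.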
gbayes(
y = NULL,
X = NULL,
W = NULL,
stat = NULL,
covs = NULL,
trait = NULL,
fit = NULL,
Glist = NULL,
chr = NULL,
rsids = NULL,
b = NULL,
bm = NULL,
seb = NULL,
LD = NULL,
n = NULL,
formatLD = "dense",
vg = NULL,
vb = NULL,
ve = NULL,
ssg_prior = NULL,
ssb_prior = NULL,
sse_prior = NULL,
lambda = NULL,
scaleY = TRUE,
h2 = NULL,
pi = 0.001,
updateB = TRUE,
updateG = TRUE,
updateE = TRUE,
updatePi = TRUE,
adjustE = TRUE,
models = NULL,
nug = 4,
nub = 4,
nue = 4,
verbose = FALSE,
msize = 100,
mask = NULL,
GRMlist = NULL,
ve_prior = NULL,
vg_prior = NULL,
tol = 0.001,
nit = 100,
nburn = 0,
nit_local = NULL,
nit_global = NULL,
method = "mixed",
algorithm = "mcmc"
)
Returns a list structure including:
- vector or matrix (m x t) of posterior means for marker effects
- vector or matrix (m x t) of posterior means for marker inclusion probabilities
- scalar or vector (t) of posterior means for marker variances
- scalar or vector (t) of posterior means for genomic variances
- scalar or vector (t) of posterior means for residual variances
- matrix (t x t) of posterior means for marker correlations
- matrix (t x t) of posterior means for genomic correlations
- matrix (t x t) of posterior means for residual correlations
- vector (1 x nmodels) of posterior probabilities for models
- vector (1 x t) of posterior means for model probability
- a list of current parameters (same information as the items listed above), used to restart the analysis
- matrix (m x t) of marker information and effects used for genomic risk scoring
The arguments, in the order of the usage above, are:
y: a vector or matrix of phenotypes
X: a matrix of covariates
W: a matrix of centered and scaled genotypes
stat: a data frame with marker summary statistics
covs: a list of summary statistics (output from the internal cvs function)
trait: an integer used to select traits in the covs object
fit: a list of results from gbayes
Glist: a list of information about the genotype matrix stored on disk
chr: the chromosome for which to fit BLR models
rsids: a character vector of rsids
b: a vector or matrix of marginal marker effects
bm: a vector or matrix of adjusted marker effects for the BLR model
seb: a vector or matrix of standard errors of the marginal effects
LD: a list with sparse LD matrices
n: a scalar or vector with the number of observations for each trait
formatLD: a character string specifying the LD format (formatLD="dense" is the default)
vg: a scalar or matrix of genetic (co)variances
vb: a scalar or matrix of marker (co)variances
ve: a scalar or matrix of residual (co)variances
ssg_prior: a scalar or matrix of prior genetic (co)variances
ssb_prior: a scalar or matrix of prior marker (co)variances
sse_prior: a scalar or matrix of prior residual (co)variances
lambda: a vector or matrix of lambda values
scaleY: a logical; if TRUE, y is centered and scaled
h2: the trait heritability
pi: the proportion of markers in each marker variance class (e.g. pi=c(0.999,0.001), used if method="ssvs")
updateB: a logical for updating marker (co)variances
updateG: a logical for updating genetic (co)variances
updateE: a logical for updating residual (co)variances
updatePi: a logical for updating pi
adjustE: a logical for adjusting the residual variance
models: a list structure with the models evaluated in bayesC
nug: a scalar or vector of prior degrees of freedom for the prior genetic (co)variances
nub: a scalar or vector of prior degrees of freedom for the marker (co)variances
nue: a scalar or vector of prior degrees of freedom for the prior residual (co)variances
verbose: a logical; if TRUE, more details are printed during iteration
msize: the number of markers used in the computation of sparse LD
mask: a vector or matrix of TRUE/FALSE values specifying whether a marker should be ignored
GRMlist: a list providing information about the GRM matrix stored in binary files on disk
ve_prior: a scalar or matrix of prior residual (co)variances
vg_prior: a scalar or matrix of prior genetic (co)variances
tol: the tolerance, i.e. the convergence criterion used in gbayes
nit: the number of iterations
nburn: the number of burn-in iterations
nit_local: the number of local iterations
nit_global: the number of global iterations
method: specifies the method used (e.g. method="bayesN", "bayesA", "bayesL", "bayesC", "bayesR"; the default in the usage above is "mixed")
algorithm: specifies the algorithm (the default in the usage above is "mcmc")
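As a usage sketch, the call below sets a few of these arguments explicitly. It uses only argument names that appear in the usage section; the values chosen (`nit = 500`, `nburn = 100`) are illustrative assumptions, not recommendations. The sketch assumes that `gbayes` is provided by the qgg package and skips the call when that package is not installed. The names of the returned list elements are not assumed; `str()` is used to inspect them.

```r
set.seed(42)

# Simulated data: 1000 individuals, 100 markers, 10 causal markers
W <- matrix(rnorm(100000), nrow = 1000)
causal <- sample(seq_len(ncol(W)), 10)
y <- rowSums(W[, causal]) + rnorm(nrow(W))

fit <- NULL
if (requireNamespace("qgg", quietly = TRUE)) {
  # Assumption: gbayes is exported by qgg with the signature shown above
  fit <- qgg::gbayes(y = y, W = W, method = "bayesC",
                     nit = 500, nburn = 100, verbose = FALSE)
  str(fit, max.level = 1)  # inspect the top level of the returned list
}
```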
Peter Sørensen
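# Simulate data and test functions
W <- matrix(rnorm(100000), nrow = 1000)  # 1000 individuals x 100 markers
set1 <- sample(1:ncol(W), 5)             # two sets of 5 causal markers
set2 <- sample(1:ncol(W), 5)
sets <- list(set1, set2)
g <- rowSums(W[, c(set1, set2)])         # genetic values (unit marker effects)
e <- rnorm(nrow(W), mean = 0, sd = 1)    # residuals
y <- g + e                               # phenotypes

# Fit BLR models with different priors for the marker effects
fitM <- gbayes(y = y, W = W, method = "bayesN")
fitA <- gbayes(y = y, W = W, method = "bayesA")
fitL <- gbayes(y = y, W = W, method = "bayesL")
fitC <- gbayes(y = y, W = W, method = "bayesC")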
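For reference, the heritability implied by this simulation can be computed directly from the simulated genetic values. This is a sketch in base R that repeats the simulation so it runs on its own; the fixed seed is an addition for reproducibility. Because the 10 causal markers have unit effects and the residual variance is 1, the value should be roughly 10/11, and somewhat higher if `set1` and `set2` happen to share a marker.

```r
set.seed(123)  # added for reproducibility; not in the example above

W <- matrix(rnorm(100000), nrow = 1000)
set1 <- sample(1:ncol(W), 5)
set2 <- sample(1:ncol(W), 5)
g <- rowSums(W[, c(set1, set2)])       # simulated genetic values
e <- rnorm(nrow(W), mean = 0, sd = 1)  # simulated residuals
y <- g + e

# Proportion of phenotypic variance explained by the genetic values
h2_sim <- var(g) / var(y)
h2_sim
```

This gives a reference value against which heritability estimates from the fitted models can be compared.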