EMMA: Efficient Mixed Model Association Algorithm

Description

This function is used internally in the function mmer when one or more variance component other than the error needs to be estimated through the use of the efficient mixed model association (EMMA) algorithm. When multiple random effects are fitted the function internally create weight for each random effect first and then does all calculations.

Usage

EMMA(y, X=NULL, ZETA=NULL, REML=TRUE, silent=FALSE, che=TRUE, forced=NULL, EIGEND=FALSE)

Arguments

a numeric vector for the response variable

an incidence matrix for fixed effects.

ZETA

an incidence matrix for random effects. This can be for one or more random effects. This NEEDS TO BE PROVIDED AS A LIST STRUCTURE. For example Z=list(list(Z=Z1, K=K1), list(Z=Z2, K=K2), list(Z=Z3, K=K3)) makes a 2 level list for 3 random effects. The general idea is that each random effect with or without its variance-covariance structure is a list, i.e. list(Z=Z1, K=K1) where Z is the incidence matrix and K the var-cov matrix. When moving to more than one random effect we need to make several lists that need to be inside another list. What we call a 2-level list, i.e. list(Z=Z1, K=K1) and list(Z=Z2, K=K2) would need to be put in the form; list(list(Z=Z1, K=K1),list(Z=Z1, K=K1)), which as can be seen, is a list of lists (2-level list).

REML

a TRUE/FALSE value indicating if restricted maximum likelihood should be used instead of ML. The default is TRUE.

silent

a TRUE/FALSE value indicating if the function should draw the progress bar or iterations performed while working or should not be displayed.

che

a TRUE/FALSE value indicating if list structure provided by the user is correct to fix it. The default is TRUE but is turned off to FALSE within the mmer function which would imply a double check.

forced

a vector of numeric values for variance components including error if the user wants to force the values of the variance components. On the meantime only works for forcing all of them and not a subset of them. The default is NULL, meaning that variance components will be estimated by REML/ML.

EIGEND

a TRUE/FALSE value indicating if the function should perform the eigen decomposition of the K matrix for square problems to accelerate inversion and estimate parameters faster. Is 2 to 3 times faster than regular EMMA but only works for square problems.

Value

If all parameters are correctly indicated the program will return a list with the following information:

$Vu: a scalar value for the variance component estimated
$Ve: a scalar value for the error variance estimated
$V.inv: a matrix with the inverse of the phenotypic variance V = ZGZ+R, V^-1
$u.hat: a vector with BLUPs for random effects
$Var.u.hat: a vector with variances for BLUPs
$PEV.u.hat: a vector with predicted error variance for BLUPs
$beta.hat: a vector for BLUEs of fixed effects
$Var.beta.hat: a vector with variances for BLUEs
$X: incidence matrix for fixed effects, if not passed is assumed to only include the intercept
$Z: incidence matrix for random effects, if not passed is assumed to be a diagonal matrix
$K: the var-cov matrix for the random effect fitted in Z
$ll: the log-likelihood value for obtained when optimizing the likelihood function when using ML or REML

Details

This algorithm is based on Kang et al. (2008), it is based on REML using the ridge regression parameter lambda as the ration of Var(e)/Var(u). This handles models of the form:

y = Xb + Zu + e

b ~ N[b.hat, 0] zero variance because is a fixed term

u ~ N[0, K*sigma(u)] where: K*sigma(u) = G

e ~ N[0, I*sigma(e)] where: I*sigma(e) = R

y ~ N[Xb, var(Zu+e)] where;

var(y) = var(Zu+e) = ZGZ+R = V which is the phenotypic variance

The function allows the user to specify the incidence matrices for the random effect "u" as Z and its variance-covariance matrix as K.

The likelihood function optimized in this algorithm is:

logL = (n - p) * log(sum(eta^2/ lambda + delta)) + sum(log(lambda + delta))

where: (n-p) refers to the degrees of freedom lambda are the eigenvalues mentioned by Kang et al.(2008) delta is the REML estimator of the ridge parameter

The algorithm can be summarized in the next steps:

1) provide initial value for the ridge parameter

2) estimate S = I - X(X'X)-X'

3) obtain the phenotypic variance V = ZKZ' + delta.prov*I

4) perform an eigen decomposition of SVS

5) create "lambda"" as the eigenvalues of SVS and "U"" as the eigenvectors

6) estimate eta=U'y

7) optimize the likelihood shown above providing "eta", "lambdas" and optimize with respect to "delta" which is the ridge parameter and contains Ve/Vu

References

Kang et al. 2008. Efficient control of population structure in model organism association mapping. Genetics 178:1709-1723.

Covarrubias-Pazaran G (2016) Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6): doi:10.1371/journal.pone.0156744

Examples

Run this code

# NOT RUN {
####=========================================####
#### breeding values with 1 variance component
#### using EMMA algorithm
####=========================================####

####=========================================####
#### random population of 200 lines with 1000 markers
####=========================================####
M <- matrix(rep(0,200*1000),200,1000)
for (i in 1:200) {
  M[i,] <- ifelse(runif(1000)<0.5,-1,1)
}
####=========================================####
#### Simulate random phenotypes
####=========================================####
u <- rnorm(1000)
g <- as.vector(crossprod(t(M),u))
h2 <- 0.5  #heritability
y <- g + rnorm(200,mean=0,sd=sqrt((1-h2)/h2*var(g)))
A <- A.mat(M)
ETA <- list(add=list(K=A))
ans <- EMMA(y=y, ZETA=ETA)
# }

Run the code above in your browser using DataLab