fitNullMM
fits a mixed model with random effects specified by their covariance structures; this allows for the inclusion of a polygenic random effect using a kinship matrix or genetic relationship matrix (GRM). The output of fitNullMM
can be used to estimate genetic heritability and can be passed to assocTestMM
for the purpose of genetic association testing.fitNullMM(scanData, outcome, covars = NULL, covMatList, scan.include = NULL, family = gaussian, group.var = NULL, start = NULL, AIREML.tol = 1e-6, maxIter = 100, dropZeros = TRUE, verbose = TRUE)
ScanAnnotationDataFrame
from the package GWASTools
or class data.frame
containing the outcome and covariate data for the samples to be used for the analysis. scanData
must have a column scanID
containing unique IDs for all samples.scanData
.scanData
; an intercept term is automatically included. If NULL (default) the only fixed effect covariate is the intercept term.scanData
. If only one random effect is being used, a single matrix (not in a list) can be used. See 'Details' for more information.scanData
are included.family
for further options, and see 'Details' for more information.family
= gaussian. A character string specifying the name of a categorical variable in scanData
that is used to fit heterogeneous residual error variances. If NULL (default), then a standard LMM with constant residual variance for all samples is fit. See 'Details' for more information.covMatList
. When family
is gaussian, there are additional residual variance components; one residual variance component when group.var
is NULL, and as many residual variance components as there are unique values of group.var
when it is specified.varComp
. This can be used for hypothesis tests regarding the variance components.covars
.family
is gaussian, this will typically be 1 since the residual variance component is estimated separately.family
is gaussian, this is just the original outcome vector. When family
is not gaussian, this is the PQL linearization of the outcome vector. This is used by assocTestMM
for genetic association testing. See 'Details' for more information.assocTestMM
for genetic association testing.varComp
specifying whether the corresponding variance component estimate was set to 0 by the function due to convergence to the boundary in the AIREML procedure.group.var
).covMatList
is used to specify the covariance structures of the random effects terms in the model. For example, to include a polygenic random effect, one matrix in covMatList
could be a kinship matrix or a genetic relationship matrix (GRM). As another example, to include household membership as a random effect, one matrix in covMatList
should be a 0/1 matrix with a 1 in the [i,j] and [j,i] entries if individuals i and j are in the same household and 0 otherwise; the diagonals of such a matrix should all be 1.
When family
is not gaussian, the penalized quasi-likelihood (PQL) approximation to the generlized linear mixed model (GLMM) is fit following the procedure of GMMAT (Chen et al.).
For some outcomes, there may be evidence that different groups of observations have different residual variances, and the standard LMM assumption of homoscedasticity is violated. When group.var
is specified, separate (heterogeneous) residual variance components are fit for each unique value of group.var
.
Let m be the number of matrices in covMatList
and let g be the number of categories in the variable specified by group.var
. The length of the start
vector must be (m + 1) when family
is gaussian and group.var
is NULL; (m + g) when family
is gaussian and group.var
is specified; or m when family
is not gaussian.
A Newton-Raphson iterative procedure with Average Information REML (AIREML) is used to estimate the variance components of the random effects. When the Euclidean distance between the new and previous variance component estimates is less than AIREML.tol
, the algorithm declares convergence of the estimates. Sometimes a variance component may approach the boundary of the parameter space at 0; step-halving is used to prevent any component from becomming negative. However, when a variance component gets near the 0 boundary, the algorithm can sometimes get "stuck", preventing the other variance components from converging; if dropZeros
is TRUE, then variance components that converge to a value less than AIREML.tol
will be dropped from the model and the estimation procedure will continue with the remaining variance components.
varCompCI
for estimating confidence intervals for the variance components and the proportion of variability (heritability) they explain, assocTestMM
for running mixed model genetic association tests using the output from fitNullMM
.
GWASTools
for a description of the package containing the ScanAnnotationDataFrame
class.
# file path to GDS file
gdsfile <- system.file("extdata", "HapMap_ASW_MXL_geno.gds", package="GENESIS")
# read in GDS data
HapMap_geno <- GdsGenotypeReader(filename = gdsfile)
# create a GenotypeData class object
HapMap_genoData <- GenotypeData(HapMap_geno)
# load saved matrix of KING-robust estimates
data("HapMap_ASW_MXL_KINGmat")
# run PC-AiR
mypcair <- pcair(genoData = HapMap_genoData, kinMat = HapMap_ASW_MXL_KINGmat,
divMat = HapMap_ASW_MXL_KINGmat)
# run PC-Relate
mypcrel <- pcrelate(genoData = HapMap_genoData, pcMat = mypcair$vectors[,1],
training.set = mypcair$unrels)
close(HapMap_genoData)
# generate a phenotype
set.seed(4)
pheno <- 0.2*mypcair$vectors[,1] + rnorm(mypcair$nsamp, mean = 0, sd = 1)
# make ScanAnnotationDataFrame
scanAnnot <- ScanAnnotationDataFrame(data.frame(scanID = mypcrel$sample.id,
pc1 = mypcair$vectors[,1], pheno = pheno))
# make covMatList
covMatList <- list("Kin" = pcrelateMakeGRM(mypcrel))
# fit the null mixed model
nullmod <- fitNullMM(scanData = scanAnnot, outcome = "pheno", covars = "pc1", covMatList = covMatList)
Run the code above in your browser using DataLab