Computes a positive semi-definite symmetric genomic relationship matrix G=XX',
with options for centering and scaling the columns of X beforehand.
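As a rough illustration of what G=XX' means after centering and scaling, the following base-R sketch builds such a matrix for a small toy genotype matrix held entirely in memory. It is not the package's implementation: the toy data, the use of `scale()` and the division by the number of columns are assumptions made for the example.

```r
# Toy genotype matrix: 10 individuals (rows) x 20 markers (columns), coded 0/1/2.
set.seed(1)
X <- matrix(rbinom(10 * 20, size = 2, prob = 0.3), nrow = 10)

# Drop constant columns so that scaling never divides by zero.
X <- X[, apply(X, 2, var) > 0, drop = FALSE]

# Center each column to mean 0 and scale it to unit variance.
Z <- scale(X, center = TRUE, scale = TRUE)

# G = ZZ', divided by the number of markers (an assumed scaling convention).
G <- tcrossprod(Z) / ncol(Z)

# G has one row and one column per individual.
dim(G)
```

The result is symmetric by construction, and its eigenvalues are non-negative up to rounding error, which is what "positive semi-definite" refers to.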
getG_symDMatrix(X, center = TRUE, scale = TRUE, impute = TRUE, scaleG = TRUE,
minVar = 1e-05, blockSize = 5000L,
folderOut = paste0("symDMatrix_", randomString()), vmode = "double",
i = seq_len(nrow(X)), j = seq_len(ncol(X)), chunkSize = 5000L,
nCores = getOption("mc.cores", 2L), verbose = FALSE)
A symDMatrix object.
A matrix-like object, typically the genotypes of a BGData object.
Either a logical value or a numeric vector of length equal to the
number of columns of X. If FALSE, no centering is done.
Defaults to TRUE.
Either a logical value or a numeric vector of length equal to the
number of columns of X. If FALSE, no scaling is done.
Defaults to TRUE.
Indicates whether missing values should be imputed. Defaults to TRUE.
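The text above does not say how missing values are imputed. One common strategy, assumed here purely for illustration, is to center each column using the mean of its observed values and then set the remaining missing entries to zero, which is equivalent to imputing the column mean before centering.

```r
# Toy matrix with missing genotypes: 4 individuals x 3 markers.
X <- matrix(c(0, 1, 2, NA,
              1, NA, 0, 2,
              2, 2, 1, 0), nrow = 4)

# Column means computed from the observed values only.
mu <- colMeans(X, na.rm = TRUE)

# Center the columns, then replace missing entries by 0 (the centered mean).
Z <- sweep(X, 2, mu)
Z[is.na(Z)] <- 0

# Without this step, every entry of XX' touching a missing value would be NA.
G <- tcrossprod(Z)
anyNA(G)
```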
Logical (TRUE/FALSE) indicating whether XX' should be scaled.
Columns with variance lower than this value will not be used in the
computation (only if scale is not FALSE).
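The reason for a variance threshold becomes visible when scaling: a column with (near) zero variance cannot be standardized. The base-R sketch below illustrates the idea on toy data. The variable names and the way the filter is applied are assumptions made for the example, not the package's code.

```r
# Toy matrix: 4 informative columns plus one constant column.
set.seed(2)
X <- cbind(matrix(rnorm(5 * 4), nrow = 5), rep(1, 5))

minVar <- 1e-05

# Keep only columns whose variance is at least minVar.
v <- apply(X, 2, var)
keep <- v >= minVar

# Scaling all columns would yield NaN for the constant one;
# scaling only the retained columns does not.
Z <- scale(X[, keep, drop = FALSE])
G <- tcrossprod(Z) / sum(keep)
```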
The number of rows and columns of each block. If NULL, a single
block of the same length as i will be created. Defaults to 5000.
The path to the folder where to save the symDMatrix object.
Defaults to a random string prefixed with "symDMatrix_".
vmode of ff objects.
Indicates which rows of X should be used. Can be integer,
boolean, or character. By default, all rows are used.
Indicates which columns of X should be used. Can be integer,
boolean, or character. By default, all columns are used.
The number of columns of X that are brought into physical memory
for processing per core. If NULL, all columns of X are
used. Defaults to 5000.
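Processing X in column chunks works because XX' is a sum over columns, so the contribution of each chunk can be computed separately and accumulated. The following base-R sketch shows this on a small in-memory matrix. It runs on a single core and is only an illustration of the idea, not the package's implementation.

```r
# Toy matrix: 8 individuals x 25 markers.
set.seed(3)
X <- matrix(rnorm(8 * 25), nrow = 8)

chunkSize <- 10L

# Split the column indices into chunks of at most chunkSize columns.
chunks <- split(seq_len(ncol(X)), ceiling(seq_len(ncol(X)) / chunkSize))

# Accumulate the contribution of each chunk to XX'.
G <- matrix(0, nrow(X), nrow(X))
for (cols in chunks) {
  Z <- scale(X[, cols, drop = FALSE])  # only this chunk is processed at a time
  G <- G + tcrossprod(Z)
}
G <- G / ncol(X)
```

Because centering and scaling act on each column independently, the accumulated result matches the one obtained from the full matrix in a single step.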
The number of cores (passed to mclapply). Defaults to
getOption("mc.cores", 2L), that is, the value of the mc.cores option
if it is set and 2 otherwise.
Whether progress updates will be posted. Defaults to FALSE.
Even very large genomic relationship matrices are supported by partitioning
X into blocks and calling getG on these blocks. This function
performs the block computations sequentially, which may be slow. In an HPC
environment, performance can be improved by manually distributing these
operations to different nodes.
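The block partitioning described above can be sketched in base R as follows. The example splits the rows into blocks, computes each block of G on its own, and exploits symmetry by computing only the blocks on or above the diagonal. Everything is kept in an ordinary in-memory matrix here, and the variable names are hypothetical, so this is an illustration of the principle and not of the package's storage or code.

```r
# Toy matrix: 7 individuals x 30 markers, centered and scaled up front.
set.seed(4)
X <- matrix(rnorm(7 * 30), nrow = 7)
Z <- scale(X)

blockSize <- 3L

# Split the row indices into blocks of at most blockSize rows.
blocks <- split(seq_len(nrow(Z)), ceiling(seq_len(nrow(Z)) / blockSize))
nBlocks <- length(blocks)

G <- matrix(NA_real_, nrow(Z), nrow(Z))
for (r in seq_len(nBlocks)) {
  for (s in r:nBlocks) {
    # Block (r, s) only needs the rows of blocks r and s.
    block <- tcrossprod(Z[blocks[[r]], , drop = FALSE],
                        Z[blocks[[s]], , drop = FALSE]) / ncol(Z)
    G[blocks[[r]], blocks[[s]]] <- block
    # Fill the mirrored block from symmetry instead of recomputing it.
    if (s > r) G[blocks[[s]], blocks[[r]]] <- t(block)
  }
}
```

Each iteration of the inner loop is independent of the others, which is why the block computations could in principle be distributed across nodes.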
See multi-level-parallelism for more information on multi-level
parallelism, symDMatrix-class and BGData-class for more information on the
symDMatrix and BGData classes, and getG to learn more about the underlying
method.