BGData (version 2.4.1)

getG_symDMatrix: Computes a Very Large Genomic Relationship Matrix

Description

Computes a positive semi-definite symmetric genomic relationship matrix G = XX', with options for centering and scaling the columns of X beforehand.
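As a rough illustration of what G = XX' means for a small in-memory matrix, the following base R sketch centers and scales a toy genotype matrix and forms the cross-product. It does not call getG_symDMatrix; the toy matrix X and the intermediate name W are made up for this example, and the final division by the number of markers (scaleG) is left out.

```r
# Toy genotype matrix: 4 individuals (rows) x 5 markers (columns), coded 0/1/2.
X <- matrix(c(0, 1, 2, 1,
              2, 2, 0, 1,
              1, 0, 1, 2,
              0, 2, 1, 0,
              1, 1, 2, 0), nrow = 4, ncol = 5)

# Center and scale each column, as requested by center = TRUE and scale = TRUE.
W <- scale(X, center = TRUE, scale = TRUE)

# G = WW'; tcrossprod(W) is equivalent to W %*% t(W).
G <- tcrossprod(W)

# G is symmetric, and its eigenvalues are non-negative up to rounding error.
isSymmetric(G)
min(eigen(G, symmetric = TRUE, only.values = TRUE)$values) > -1e-8
```

The result has one row and one column per individual, so its size depends on nrow(X) and not on the number of markers.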

Usage

getG_symDMatrix(X, center = TRUE, scale = TRUE, impute = TRUE, scaleG = TRUE,
  minVar = 1e-05, blockSize = 5000L,
  folderOut = paste0("symDMatrix_", randomString()), vmode = "double",
  i = seq_len(nrow(X)), j = seq_len(ncol(X)), chunkSize = 5000L,
  nCores = getOption("mc.cores", 2L), verbose = FALSE)

Value

A symDMatrix object.

Arguments

X

A matrix-like object, typically the genotypes of a BGData object.

center

Either a logical value or a numeric vector of length equal to the number of columns of X. If FALSE, no centering is done. Defaults to TRUE.

scale

Either a logical value or a numeric vector of length equal to the number of columns of X. If FALSE, no scaling is done. Defaults to TRUE.

impute

Indicates whether missing values should be imputed. Defaults to TRUE.

scaleG

Whether XX' should be scaled. Defaults to TRUE.

minVar

Columns with variance lower than this value are excluded from the computation (only applies if scale is not FALSE). Defaults to 1e-05.

blockSize

The number of rows and columns of each block. If NULL, a single block of the same length as i will be created. Defaults to 5000.

folderOut

The path to the folder in which the symDMatrix object is saved. Defaults to a random string prefixed with "symDMatrix_".

vmode

The vmode of the ff objects. Defaults to "double".

i

Indicates which rows of X should be used. Can be integer, logical, or character. By default, all rows are used.

j

Indicates which columns of X should be used. Can be integer, logical, or character. By default, all columns are used.

chunkSize

The number of columns of X that are brought into physical memory for processing per core. If NULL, all columns of X are used. Defaults to 5000.

nCores

The number of cores (passed to mclapply). Defaults to the value of the mc.cores option, or 2 if that option is not set.

verbose

Whether progress updates will be posted. Defaults to FALSE.
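To make the interplay of minVar, impute, and scaleG more concrete, here is a hedged base R sketch on a toy matrix. It rests on two assumptions that are not stated above: that imputation replaces a missing genotype with the column mean, and that scaling G means dividing XX' by the number of markers used. The names v, keep, and W are illustrative only.

```r
# Toy genotypes with one missing value and one constant (zero-variance) marker.
X <- matrix(c(0, 1, 2, 1,
              2, NA, 0, 1,
              1, 1, 1, 1,
              0, 2, 1, 0), nrow = 4, ncol = 4)
minVar <- 1e-05

# Drop markers whose variance is lower than minVar (here: column 3).
v <- apply(X, 2, var, na.rm = TRUE)
keep <- which(v >= minVar)

# Center and scale the remaining columns; scale() ignores NAs when it
# computes the column means and standard deviations.
W <- scale(X[, keep, drop = FALSE], center = TRUE, scale = TRUE)

# Assumption: a missing genotype is imputed with the column mean,
# which is 0 once the column has been centered.
W[is.na(W)] <- 0

# Assumption: scaling G divides XX' by the number of markers used.
G <- tcrossprod(W) / length(keep)
```

Filtering on variance before scaling avoids dividing by a standard deviation that is zero or close to zero.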

Details

Even very large genomic relationship matrices are supported by partitioning X into blocks and calling getG on these blocks. This function performs the block computations sequentially, which may be slow. In an HPC environment, performance can be improved by manually distributing these operations to different nodes.
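The block partitioning described above can be sketched in base R as follows. This is only an illustration of the idea, not the package's implementation: tcrossprod stands in for the per-block call to getG, the result is assembled in an ordinary in-memory matrix instead of a symDMatrix, and W is a random stand-in for an already centered and scaled genotype matrix.

```r
set.seed(1)
n <- 7   # number of individuals (rows)
p <- 20  # number of markers (columns)
W <- scale(matrix(rnorm(n * p), nrow = n, ncol = p))

# Split the row indices into consecutive blocks of at most blockSize rows.
blockSize <- 3
blocks <- split(seq_len(n), ceiling(seq_len(n) / blockSize))

G <- matrix(NA_real_, nrow = n, ncol = n)
for (r in seq_along(blocks)) {
  for (s in r:length(blocks)) {
    rows_r <- blocks[[r]]
    rows_s <- blocks[[s]]
    # Block (r, s) of G only needs the rows of W that belong to blocks r and s.
    G_rs <- tcrossprod(W[rows_r, , drop = FALSE], W[rows_s, , drop = FALSE])
    G[rows_r, rows_s] <- G_rs
    G[rows_s, rows_r] <- t(G_rs)  # fill the lower triangle by symmetry
  }
}

# The block-wise result matches the cross-product computed in one step.
all.equal(G, tcrossprod(W))
```

Because each (r, s) block depends only on two subsets of rows, the iterations of the double loop are independent of one another, which is what makes it possible to distribute them across nodes.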

See Also

multi-level-parallelism for more information on multi-level parallelism. symDMatrix-class for more information on the symDMatrix class. BGData-class for more information on the BGData class. getG to learn more about the underlying method.