bed_randomSVD: Randomized partial SVD

Description

Partial SVD (or PCA) of a genotype matrix stored as a PLINK (.bed) file.

Usage

bed_randomSVD(
  obj.bed,
  fun.scaling = bed_scaleBinom,
  ind.row = rows_along(obj.bed),
  ind.col = cols_along(obj.bed),
  k = 10,
  tol = 1e-04,
  verbose = FALSE,
  ncores = 1
)

Value

A named list (of S3 class "big_SVD") containing

d, the singular values,
u, the left singular vectors,
v, the right singular vectors,
niter, the number of iterations of the algorithm,
nops, the number of matrix-vector multiplications used,
center, the centering vector,
scale, the scaling vector.

Note that to obtain the Principal Components, you must use predict on the result. See examples.

Arguments

obj.bed: Object of type bed, which is the mapping of some bed file. Use obj.bed <- bed(bedfile) to get this object.
fun.scaling: A function with parameters X, ind.row and ind.col that returns a data.frame with $center and $scale for the columns corresponding to ind.col. These are used to scale each element as follows: $$\frac{X_{i,j} - center_j}{scale_j}.$$ The default shown in Usage is bed_scaleBinom. You can also provide your own center and scale by using as_scaling_fun().
ind.row: An optional vector of the row indices (individuals) that are used. If not specified, all rows are used. Don't use negative indices.
ind.col: An optional vector of the column indices (SNPs) that are used. If not specified, all columns are used. Don't use negative indices.
k: Number of singular vectors/values to compute. Default is 10. This algorithm should be used to compute only a few singular vectors/values.
tol: Precision parameter of svds. Default is 1e-4.
verbose: Should some progress be printed? Default is FALSE.
ncores: Number of cores used. Default is 1 (no parallelism). You may use nb_cores().
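
As a minimal sketch of the fun.scaling argument, the code below passes precomputed centers and scales through as_scaling_fun(). It assumes that as_scaling_fun() is available from bigstatsr (called here with an explicit namespace) and that bed_scaleBinom() can be called directly on the bed object with its default arguments. The names sc and svd2 are illustrative only.

```r
library(bigsnpr)

bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")
obj.bed <- bed(bedfile)

# Compute the centers and scales once (assumption: default arguments
# cover all rows and columns of the bed file).
sc <- bed_scaleBinom(obj.bed)

# Wrap the precomputed vectors in a scaling function and pass it on.
svd2 <- bed_randomSVD(
  obj.bed,
  fun.scaling = bigstatsr::as_scaling_fun(sc$center, sc$scale),
  k = 3
)

# The centering and scaling vectors are stored in the result.
str(svd2$center)
str(svd2$scale)
```

Any other pair of numeric vectors with one entry per SNP could be supplied the same way, provided no scale is zero.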

Examples


bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")
obj.bed <- bed(bedfile)

str(bed_randomSVD(obj.bed))
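
The Value section notes that predict must be used to obtain the principal components. A minimal sketch of that step follows; it assumes that calling predict() on the "big_SVD" result without further arguments returns the scores of the individuals used in the decomposition. The names svd and pcs, and the choice k = 5, are illustrative only.

```r
library(bigsnpr)

bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")
obj.bed <- bed(bedfile)

# Keep only a few components, as recommended for this algorithm.
svd <- bed_randomSVD(obj.bed, k = 5)

# PC scores: one row per individual, one column per component.
pcs <- predict(svd)
dim(pcs)

# Under the assumption above, the scores should match the left singular
# vectors with each column multiplied by its singular value.
all.equal(pcs, sweep(svd$u, 2, svd$d, "*"))
```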
