snp_autoSVD: Truncated SVD while limiting LD

Description

Fast truncated SVD with initial pruning, followed by iterative removal of long-range LD regions.

Usage

snp_autoSVD(
  G,
  infos.chr,
  infos.pos = NULL,
  ind.row = rows_along(G),
  ind.col = cols_along(G),
  fun.scaling = snp_scaleBinom(),
  thr.r2 = 0.2,
  size = 100/thr.r2,
  k = 10,
  roll.size = 50,
  int.min.size = 20,
  alpha.tukey = 0.05,
  min.mac = 10,
  max.iter = 5,
  is.size.in.bp = NULL,
  ncores = 1,
  verbose = TRUE
)
bed_autoSVD(
  obj.bed,
  ind.row = rows_along(obj.bed),
  ind.col = cols_along(obj.bed),
  fun.scaling = bed_scaleBinom,
  thr.r2 = 0.2,
  size = 100/thr.r2,
  k = 10,
  roll.size = 50,
  int.min.size = 20,
  alpha.tukey = 0.05,
  min.mac = 10,
  max.iter = 5,
  ncores = 1,
  verbose = TRUE
)

Value

A named list (an S3 class "big_SVD") of

d, the singular values,
u, the left singular vectors,
v, the right singular vectors,
niter, the number of iterations of the algorithm,
nops, the number of matrix-vector multiplications used,
center, the centering vector,
scale, the scaling vector.

Note that to obtain the Principal Components, you must use predict on the result. See examples.
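
As a minimal sketch of the note above, the following computes the SVD on the example data shipped with the package and then calls predict on the result. The name PCs is only illustrative, and the comparison with u and d reflects my understanding of what predict returns when no new data is passed, so treat it as an assumption rather than a guarantee.

```r
library(bigsnpr)

ex <- snp_attachExtdata()
obj.svd <- snp_autoSVD(G = ex$genotypes,
                       infos.chr = ex$map$chromosome,
                       infos.pos = ex$map$physical.pos,
                       verbose = FALSE)

# PC scores: one row per individual used, one column per component.
PCs <- predict(obj.svd)
dim(PCs)

# Assumption: without new data, predict() scales each column of u by
# the corresponding singular value in d.
all.equal(PCs, sweep(obj.svd$u, 2, obj.svd$d, "*"))
```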

Arguments

G: A FBM.code256 (typically <bigSNP>$genotypes).
You shouldn't have missing values. Also, remember to do quality control, e.g. some algorithms in this package won't work if you use SNPs with 0 MAF.
infos.chr: Vector of integers specifying each SNP's chromosome.
Typically <bigSNP>$map$chromosome.
infos.pos: Vector of integers specifying the physical position on a chromosome (in base pairs) of each SNP.
Typically <bigSNP>$map$physical.pos.
ind.row: An optional vector of the row indices (individuals) that are used. If not specified, all rows are used.
Don't use negative indices.
ind.col: An optional vector of the column indices (SNPs) that are used. If not specified, all columns are used.
Don't use negative indices.
fun.scaling: A function with parameters X (or obj.bed), ind.row and ind.col that returns a data.frame with $center and $scale for the columns corresponding to ind.col, used to scale each of their elements as follows: $$\frac{X_{i,j} - center_j}{scale_j}.$$ Default uses binomial scaling. You can also provide your own center and scale by using as_scaling_fun().
thr.r2: Threshold over the squared correlation between two SNPs. Default is 0.2. Use NA if you want to skip the clumping step.
size: For one SNP, window size around this SNP to compute correlations. Default is 100 / thr.r2 for clumping (0.2 -> 500; 0.1 -> 1000; 0.5 -> 200). If infos.pos is not provided (NULL, the default), this is a window in number of SNPs; otherwise it is a window in kb (physical distance). I recommend that you provide the positions if available.
k: Number of singular vectors/values to compute. Default is 10. This algorithm should be used to compute a few singular vectors/values.
roll.size: Radius of rolling windows to smooth log-p-values. Default is 50.
int.min.size: Minimum number of consecutive outlier SNPs in order to be reported as long-range LD region. Default is 20.
alpha.tukey: The type-I error rate in outlier detection (that is further corrected for multiple testing). Default is 0.05.
min.mac: Minimum minor allele count (MAC) for variants to be included. Default is 10.
max.iter: Maximum number of iterations of outlier detection. Default is 5.
is.size.in.bp: Deprecated.
ncores: Number of cores used. Default doesn't use parallelism. You may use nb_cores().
verbose: Output some information on the iterations? Default is TRUE.
obj.bed: Object of type bed, which is the mapping of some bed file. Use obj.bed <- bed(bedfile) to get this object.
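
To make the scaling formula in fun.scaling concrete, here is a small base-R sketch on a toy matrix. It assumes that binomial scaling means centering by 2p and scaling by sqrt(2p(1-p)), where p is the allele frequency estimated from the data; the matrix X and the names p, center, scale and X.scaled are made up for illustration and are not part of the package.

```r
# Toy genotype matrix: 4 individuals (rows) x 3 SNPs (columns), coded 0/1/2.
X <- matrix(c(0, 1, 2, 1,
              2, 2, 1, 0,
              0, 0, 1, 1), nrow = 4)

p      <- colMeans(X) / 2          # estimated allele frequency of each SNP
center <- 2 * p                    # mean of a Binomial(2, p) variable
scale  <- sqrt(2 * p * (1 - p))    # its standard deviation

# Apply (X[i, j] - center[j]) / scale[j] to every element.
X.scaled <- sweep(sweep(X, 2, center, "-"), 2, scale, "/")
round(colMeans(X.scaled), 10)      # columns are centered
```

Vectors such as center and scale are the kind of input that as_scaling_fun() is meant to receive when you want to supply your own values.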

Details

If you don't have any information about SNPs, you can try using

infos.chr = rep(1, ncol(G)),
size = ncol(G) (if SNPs are not sorted),
roll.size = 0 (if SNPs are not sorted).
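
The settings above can be sketched as a single call. This uses the example data from the package only for convenience and pretends that nothing is known about its SNPs; G and obj.svd are illustrative names.

```r
library(bigsnpr)

ex <- snp_attachExtdata()
G <- ex$genotypes

# Treat all SNPs as one unsorted chromosome: the window spans every SNP
# and the smoothing of log-p-values is disabled.
obj.svd <- snp_autoSVD(G,
                       infos.chr = rep(1, ncol(G)),
                       size = ncol(G),
                       roll.size = 0,
                       verbose = FALSE)

str(obj.svd, max.level = 1)
```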

Examples

ex <- snp_attachExtdata()

obj.svd <- snp_autoSVD(G = ex$genotypes,
                       infos.chr = ex$map$chromosome,
                       infos.pos = ex$map$physical.pos)

str(obj.svd)
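
The bed_autoSVD() variant can be sketched in the same way. This assumes that the file extdata/example.bed is installed with the package; bedfile and obj.svd2 are illustrative names.

```r
library(bigsnpr)

# Assumption: this example bed file ships with the package.
bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")
obj.bed <- bed(bedfile)

# No chromosome or position vectors are passed here; the usage above
# shows that bed_autoSVD() takes only the bed object and tuning options.
obj.svd2 <- bed_autoSVD(obj.bed, verbose = FALSE)

str(obj.svd2, max.level = 1)
```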
