
⚠️ There's a newer version (1.12.18) of this package.

bigsnpr

{bigsnpr} is an R package for the analysis of massive SNP arrays, primarily designed for human genetics. It extends the features of package {bigstatsr} for the analysis of genotype data.


Note that most of the algorithms in this package do not handle missing values. To impute missing values of genotyped variants, you can use snp_fastImpute(), which takes a few hours for a 15K x 300K chip, or snp_fastImputeSimple(), which takes only a few minutes.
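The simplest imputation strategies replace each missing call by a per-SNP summary. Below is a minimal base-R sketch of mean imputation on an ordinary in-memory matrix of 0/1/2 calls, assuming rows are samples and columns are SNPs as in bigSNP objects. The helper impute_mean() is hypothetical and is not how snp_fastImputeSimple() is implemented; rounding the imputed value to 2 decimals is an assumption chosen to mirror how dosages are stored.

```r
# Hypothetical helper (not part of bigsnpr): mean imputation, SNP by SNP.
# G: numeric matrix of genotype calls (0/1/2) with NAs; rows = samples, cols = SNPs.
impute_mean <- function(G, digits = 2) {
  for (j in seq_len(ncol(G))) {
    miss <- is.na(G[, j])
    if (any(miss)) {
      # Replace missing calls by the mean of the observed calls for this SNP,
      # rounded like stored dosages (an assumption of this sketch)
      G[miss, j] <- round(mean(G[!miss, j]), digits)
    }
  }
  G
}

set.seed(1)
G <- matrix(sample(0:2, 20, replace = TRUE), nrow = 5)  # 5 samples x 4 SNPs
G[c(2, 7, 14)] <- NA                                    # a few missing calls
G_imp <- impute_mean(G)
```

Observed calls are left untouched; a SNP with no observed call at all would get NaN, so a real pipeline would need to filter such SNPs first.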

New! Package {bigsnpr} now provides functions that work directly on bed files containing a few missing values. See the new paper "Efficient toolkit implementing..".

Installation

In R, run

# install.packages("remotes")
remotes::install_github("privefl/bigsnpr")

or for the CRAN version

install.packages("bigsnpr")

Input formats

This package reads bed/bim/fam files (PLINK's preferred format) using functions snp_readBed() and snp_readBed2(). Before converting data into this package's own format, you can run quality control and conversion with PLINK, which can be called directly from R using snp_plinkQC() and snp_plinkKINGQC().
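To make the bed format concrete, here is a minimal base-R sketch of decoding a PLINK 1 .bed file, following PLINK's documented layout: three magic bytes (0x6c 0x1b 0x01), then SNP-major blocks of ceiling(n/4) bytes, with four 2-bit genotypes per byte, lowest-order bits first. The functions decode_bed_byte() and read_bed_sketch() are hypothetical; they read everything into memory and count copies of the first .bim allele. This is not snp_readBed(), which writes to file-backed storage and also handles the accompanying .bim and .fam files.

```r
# Hypothetical sketch (not bigsnpr code): decode a PLINK 1 .bed file in memory.
# 2-bit codes per the PLINK docs: 00 = hom. first allele, 01 = missing,
# 10 = heterozygous, 11 = hom. second allele. We count copies of the first allele.
decode_bed_byte <- function(byte) {
  # Extract the four 2-bit codes, lowest-order bits first
  codes <- bitwAnd(bitwShiftR(as.integer(byte), c(0L, 2L, 4L, 6L)), 3L)
  lookup <- c(2L, NA, 1L, 0L)  # indexed by code + 1
  lookup[codes + 1L]
}

read_bed_sketch <- function(bedfile, n_samples, n_snps) {
  con <- file(bedfile, "rb")
  on.exit(close(con))
  magic <- readBin(con, "raw", 3)
  if (!identical(magic, as.raw(c(0x6c, 0x1b, 0x01))))
    stop("Not a SNP-major PLINK .bed file")
  bytes_per_snp <- ceiling(n_samples / 4)
  G <- matrix(NA_integer_, nrow = n_samples, ncol = n_snps)  # rows = samples
  for (j in seq_len(n_snps)) {
    bytes <- readBin(con, "raw", bytes_per_snp)
    calls <- unlist(lapply(bytes, decode_bed_byte))
    G[, j] <- calls[seq_len(n_samples)]  # drop padding in the last byte
  }
  G
}

# Tiny example: 3 samples x 2 SNPs (one byte per SNP)
bedfile <- tempfile(fileext = ".bed")
writeBin(as.raw(c(0x6c, 0x1b, 0x01, 0xd8, 0xff)), bedfile)
G <- read_bed_sketch(bedfile, n_samples = 3, n_snps = 2)
```

Here byte 0xd8 decodes to 2, 1, NA (its fourth slot is padding) and byte 0xff to three calls homozygous for the second allele (0).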

This package can also read UK Biobank BGEN files using function snp_readBGEN(). This function takes around 40 minutes to read 1M variants for 400K individuals using 15 cores.

This package uses a class called bigSNP for representing SNP data. A bigSNP object is a list with these elements:

  • genotypes: An FBM.code256. Rows are samples and columns are SNPs. This stores genotype calls or dosages (rounded to 2 decimal places).
  • fam: A data.frame with some information on the individuals.
  • map: A data.frame with some information on the SNPs.
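A bigSNP object can be mimicked with plain R objects to show how its three elements fit together. The sketch below is a mock, not a real bigSNP: genotypes is an ordinary in-memory raw matrix rather than an FBM.code256, and the 256-entry decoding table code256 is a hypothetical encoding in the spirit of FBM.code256 (bytes 0-2 for hard calls, byte 3 for missing, later bytes for dosages in steps of 0.01), not bigsnpr's actual code table. The fam and map column names are illustrative assumptions.

```r
# Hypothetical 256-entry table (not bigsnpr's): byte value b decodes to code256[b + 1].
# 0-2 = hard calls, 3 = missing, 4-204 = dosages 0.00-2.00 by 0.01, rest unused.
code256 <- c(0, 1, 2, NA, seq(0, 2, by = 0.01), rep(NA, 51))
stopifnot(length(code256) == 256)

# Mock "bigSNP": 3 samples (rows) x 2 SNPs (columns); column names are assumptions
mock <- list(
  genotypes = matrix(as.raw(c(0, 1, 2,               # SNP 1: hard calls
                              3, 4 + 150, 4 + 37)),  # SNP 2: missing, 1.50, 0.37
                     nrow = 3),
  fam = data.frame(sample.ID = c("ind1", "ind2", "ind3"), sex = c(1, 2, 1)),
  map = data.frame(chromosome = c(1, 1), marker.ID = c("snp1", "snp2"),
                   physical.pos = c(1000, 2000),
                   allele1 = c("A", "C"), allele2 = c("G", "T"))
)

# Decode a raw genotype matrix into numbers, keeping its dimensions
decode <- function(x) {
  out <- code256[as.integer(x) + 1L]
  dim(out) <- dim(x)
  out
}

G <- decode(mock$genotypes)
# One row of fam per individual (row of G), one row of map per SNP (column of G)
stopifnot(nrow(mock$fam) == nrow(G), nrow(mock$map) == ncol(G))
```

Storing one byte per genotype and decoding through a lookup table is what lets calls and 2-decimal dosages share the same compact storage in this sketch.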


Polygenic scores

Polygenic scores are one of the main focuses of this package. Three main methods are currently available:

  • Penalized regressions with individual-level data (see paper and tutorial).

  • Clumping and Thresholding (C+T) and Stacked C+T (SCT) with summary statistics and individual-level data (see paper and tutorial).

  • LDpred2 with summary statistics (see preprint and tutorial).
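The idea behind C+T can be illustrated with a toy base-R sketch: greedily keep the most significant SNP, drop SNPs in LD with it (squared correlation above a threshold), then keep only clumped SNPs passing a p-value threshold and sum their effects weighted by genotype counts. The function ct_score() and its default thresholds are hypothetical; it computes LD over all SNPs at once instead of within windows, works on an ordinary matrix, and is not how bigsnpr implements clumping or C+T.

```r
# Hypothetical toy C+T (not bigsnpr's implementation): greedy LD clumping on
# all SNPs at once (no windows), then a p-value threshold, then a score.
# G: samples x SNPs matrix of 0/1/2; beta, pval: GWAS summary statistics.
ct_score <- function(G, beta, pval, r2_thr = 0.2, p_thr = 5e-8) {
  r2 <- cor(G)^2                      # squared correlation between SNPs
  keep <- removed <- logical(ncol(G))
  for (j in order(pval)) {            # most significant SNP first
    if (removed[j]) next
    keep[j] <- TRUE                   # index SNP of its clump
    removed[which(r2[j, ] > r2_thr)] <- TRUE  # drop SNPs in LD with it (and itself)
  }
  sel <- which(keep & pval < p_thr)   # thresholding step
  drop(G[, sel, drop = FALSE] %*% beta[sel])  # polygenic score per sample
}

G <- cbind(c(0, 1, 2, 0, 1, 2),
           c(0, 1, 2, 0, 1, 2),       # perfect LD with SNP 1
           c(2, 0, 1, 1, 0, 2))       # uncorrelated with SNP 1
beta <- c(0.5, 0.4, 0.2)
pval <- c(1e-10, 1e-9, 1e-3)
ct_score(G, beta, pval, p_thr = 1e-5)  # only SNP 1 is used
ct_score(G, beta, pval, p_thr = 0.01)  # SNPs 1 and 3 are used
```

SNP 2 never contributes because clumping removes it in favour of the more significant SNP 1; SNP 3 survives clumping but only enters the score once the p-value threshold is loose enough. Stacked C+T, as described in the package, goes further by combining many such scores computed over a grid of thresholds.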

Possible upcoming features

You can request a feature by opening an issue.

Bug report

Please open an issue if you find a bug, ideally with a reproducible example (see "How to make a great R reproducible example?").

If you want help using {bigstatsr}, please open an issue on {bigstatsr}'s repo or post on Stack Overflow with the tag bigstatsr.

I will always redirect you to GitHub issues if you email me, so that others can benefit from our discussion.


Monthly Downloads: 1,725
Version: 1.6.1
License: GPL-3
Maintainer: Florian Privé
Last Published: January 11th, 2021

Functions in bigsnpr (1.6.1)

  • bed_cprodVec: Cross-product with a vector
  • snp_ldpred2_inf: LDpred2
  • bed_projectPCA: Projecting PCA
  • LD.wiki34: Long-range LD regions
  • bed_counts: Counts
  • bed-methods: Methods for the bed class
  • SCT: Stacked C+T (SCT)
  • bed_prodVec: Product with a vector
  • bed-class: Class bed
  • bed_scaleBinom: Binomial(2, p) scaling
  • bigSNP-class: Class bigSNP
  • download_1000G: Download 1000G
  • coef_to_liab: Liability scale
  • snp_asGeneticPos: Interpolate to genetic positions
  • snp_attach: Attach a "bigSNP" from backing files
  • bed_randomSVD: Randomized partial SVD
  • reexports: Objects exported from other packages
  • same_ref: Determine reference divergence
  • bed_projectSelfPCA: Projecting PCA
  • bed_MAF: Allele frequencies
  • download_beagle: Download Beagle 4.1
  • seq_log: Sequence, evenly spaced on a logarithmic scale
  • download_plink: Download PLINK
  • bed_tcrossprodSelf: Tcrossprod
  • snp_gc: Genomic Control
  • bed_clumping: LD clumping
  • snp_beagleImpute: Imputation
  • snp_MAX3: MAX3 statistic
  • snp_match: Match alleles
  • CODE_012: Code genotype calls (3) and missing values
  • snp_cor: Correlation matrix
  • snp_PRS: PRS
  • snp_modifyBuild: Modify genome build
  • snp_plinkRmSamples: Remove samples
  • snp_attachExtdata: Attach a "bigSNP" for examples and tests
  • bigsnpr-package: bigsnpr: Analysis of Massive SNP Arrays
  • snp_fastImputeSimple: Fast imputation
  • snp_qq: Q-Q plot
  • snp_readBGI: Read variant info from one BGI file
  • snp_readBGEN: Read BGEN files into a "bigSNP"
  • snp_fst: Fixation index (Fst)
  • snp_ldsc: LD score regression
  • snp_manhattan: Manhattan plot
  • snp_simuPheno: Simulate phenotypes
  • snp_MAF: MAF
  • snp_fastImpute: Fast imputation
  • snp_scaleBinom: Binomial(n, p) scaling
  • snp_fake: Fake a "bigSNP"
  • snp_autoSVD: Truncated SVD while limiting LD
  • snp_plinkKINGQC: Relationship-based pruning
  • snp_thr_correct: Thresholding and correction
  • snp_getSampleInfos: Get sample information
  • snp_pcadapt: Outlier detection
  • snp_plinkIBDQC: Identity-by-descent
  • snp_save: Save modifications
  • snp_readBed: Read PLINK files into a "bigSNP"
  • snp_subset: Subset a bigSNP
  • snp_plinkQC: Quality Control
  • sub_bed: Replace extension '.bed'
  • snp_split: Split-parApply-Combine
  • snp_writeBed: Write PLINK files from a "bigSNP"
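Several entries above name standard statistics. As a hedged base-R sketch, assuming the textbook definitions rather than bigsnpr's code: the allele frequency is p = mean dosage / 2 and MAF = min(p, 1 - p); Binomial(2, p) scaling centres each SNP by 2p and divides by sqrt(2p(1 - p)); genomic control estimates lambda as the median 1-df chi-squared statistic divided by its expected median qchisq(0.5, 1), about 0.455; and a log-spaced sequence exponentiates an evenly spaced sequence of logs. All helper names below are hypothetical.

```r
# Hypothetical helpers illustrating textbook definitions (not bigsnpr code).
# G: samples x SNPs matrix of 0/1/2 calls without missing values.
allele_freq <- function(G) colMeans(G) / 2
maf <- function(G) {
  p <- allele_freq(G)
  pmin(p, 1 - p)
}

# Binomial(2, p) scaling: centre by 2p, then divide by sqrt(2p(1 - p))
scale_binom <- function(G) {
  p <- allele_freq(G)
  sweep(sweep(G, 2, 2 * p), 2, sqrt(2 * p * (1 - p)), "/")
}

# Genomic control inflation factor from 1-df test p-values
gc_lambda <- function(pval) {
  chi2 <- qchisq(pval, df = 1, lower.tail = FALSE)
  median(chi2) / qchisq(0.5, df = 1)
}

# Sequence evenly spaced on a logarithmic scale
seq_log_sketch <- function(from, to, length.out) {
  exp(seq(log(from), log(to), length.out = length.out))
}

G <- matrix(c(0, 1, 2, 2,
              0, 0, 1, 1), ncol = 2)
maf(G)                          # 0.375 0.250
Z <- scale_binom(G)             # each column now has mean 0
gc_lambda(pchisq(c(0.1, qchisq(0.5, 1), 5), df = 1, lower.tail = FALSE))
seq_log_sketch(1, 1000, 4)      # approximately 1, 10, 100, 1000
```

In the genomic control call, the median test statistic equals its expected median under the null, so lambda is (numerically) 1, meaning no inflation.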