
Quality Control based on Identity-by-descent (IBD) computed by PLINK 1.9 using its method-of-moments.
snp_plinkIBDQC(
plink.path,
bedfile.in,
bedfile.out = NULL,
pi.hat = 0.08,
ncores = 1,
pruning.args = c(100, 0.2),
do.blind.QC = TRUE,
extra.options = "",
verbose = TRUE
)
The path of the new bedfile. If no sample is filter, no new bed/bim/fam files are created and then the path of the input bedfile is returned.
Path to the executable of PLINK 1.9.
Path to the input bedfile.
Path to the output bedfile. Default is created by
appending "_norel"
to prefix.in
(bedfile.in
without extension).
PI_HAT value threshold for individuals (first by pairs)
to be excluded. Default is 0.08
.
Number of cores used. Default doesn't use parallelism.
You may use bigstatsr::nb_cores()
.
A vector of 2 pruning parameters, respectively
the window size (in variant count) and the pairwise $r^2$ threshold
(the step size is fixed to 1). Default is c(100, 0.2)
.
Whether to do QC with pi.hat
without visual inspection.
Default is TRUE
. If FALSE
, return the data.frame
of the corresponding
".genome" file without doing QC. One could use
ggplot2::qplot(Z0, Z1, data = mydf, col = RT)
for visual inspection.
Other options to be passed to PLINK as a string (for the IBD part). More options can be found at https://www.cog-genomics.org/plink/1.9/ibd.
Whether to show PLINK log? Default is TRUE
.
Chang, Christopher C, Carson C Chow, Laurent CAM Tellier, Shashaank Vattikuti, Shaun M Purcell, and James J Lee. 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4 (1): 7. tools:::Rd_expr_doi("10.1186/s13742-015-0047-8").
download_plink snp_plinkQC snp_plinkKINGQC
if (FALSE) {
bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")
plink <- download_plink()
bedfile <- snp_plinkIBDQC(plink, bedfile,
bedfile.out = tempfile(fileext = ".bed"),
ncores = 2)
df_rel <- snp_plinkIBDQC(plink, bedfile, do.blind.QC = FALSE, ncores = 2)
str(df_rel)
library(ggplot2)
qplot(Z0, Z1, data = df_rel, col = RT)
qplot(y = PI_HAT, data = df_rel) +
geom_hline(yintercept = 0.2, color = "blue", linetype = 2)
snp_plinkRmSamples(plink, bedfile,
bedfile.out = tempfile(fileext = ".bed"),
df.or.files = subset(df_rel, PI_HAT > 0.2))
}
Run the code above in your browser using DataLab