perMarkerQC checks the markers in the PLINK dataset for their missingness
rates across samples, their deviation from Hardy-Weinberg equilibrium (HWE)
and their minor allele frequencies (MAF). By default, it assumes that the IDs
of individuals that failed perIndividualQC have been written
to qcdir/name.fail.IDs and removes these individuals when computing
missingness rates, HWE p-values and MAF. If the qcdir/name.fail.IDs file does
not exist, a message is written to stdout and the analyses continue for
all samples in the name.fam/name.bed/name.bim dataset.
It depicts i) SNP missingness rates (stratified by minor allele
frequency) as histograms, ii) p-values of the HWE exact test (stratified by
all and low p-values) as histograms and iii) the minor allele frequency
distribution as a histogram.
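The fallback behaviour described above can be checked before calling perMarkerQC. The following is a minimal sketch in base R, not the package's internal code: the helper has_fail_ids is hypothetical, the IDs written to the file are made up, and only the file location qcdir/name.fail.IDs is taken from the description.

```r
# Hypothetical helper (not part of plinkQC): report whether the file with the
# IDs of individuals that failed perIndividualQC exists in qcdir.
has_fail_ids <- function(qcdir, name) {
  fail_file <- file.path(qcdir, paste0(name, ".fail.IDs"))
  found <- file.exists(fail_file)
  if (!found) {
    # message() is used here for illustration only
    message("File ", fail_file, " not found; all samples would be used.")
  }
  found
}

# A fresh temporary directory stands in for qcdir
qcdir <- tempfile("qcdir")
dir.create(qcdir)
before <- has_fail_ids(qcdir, "data")

# Two made-up FID/IID pairs, written only to demonstrate the check
writeLines(c("FAM1 IND1", "FAM2 IND2"), file.path(qcdir, "data.fail.IDs"))
after <- has_fail_ids(qcdir, "data")
```

Running such a check first makes it explicit whether marker QC will be computed on all samples or only on those that passed the per-individual QC.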
perMarkerQC(indir, qcdir = indir, name,
  do.check_snp_missingness = TRUE, lmissTh = 0.01,
  do.check_hwe = TRUE, hweTh = 1e-05, do.check_maf = TRUE,
  macTh = 20, mafTh = NULL, interactive = FALSE, verbose = TRUE,
  path2plink = NULL, showPlinkOutput = TRUE)

indir: [character] /path/to/directory containing the basic PLINK data files name.bim, name.bed and name.fam.
qcdir: [character] /path/to/directory where results will be written to.
If perIndividualQC was conducted, this directory should be the
same as the qcdir specified in perIndividualQC, i.e. it contains
name.fail.IDs with the IIDs of individuals that failed QC. The user needs
write permission for qcdir. By default, qcdir=indir.
name: [character] Prefix of PLINK files, i.e. name.bed, name.bim, name.fam.
do.check_snp_missingness: [logical] If TRUE, run check_snp_missingness.
lmissTh: [double] Threshold for acceptable variant missing rate across samples.
do.check_hwe: [logical] If TRUE, run check_hwe.
hweTh: [double] Significance threshold for deviation from HWE.
do.check_maf: [logical] If TRUE, run check_maf.
macTh: [double] Threshold for minor allele count cut-off. If both mafTh and macTh are specified, macTh is used (macTh = mafTh*2*NrSamples).
mafTh: [double] Threshold for minor allele frequency cut-off.
interactive: [logical] Should plots be shown interactively? When choosing this option, make sure you have X-forwarding or a graphical interface available for interactive plotting. Alternatively, set interactive=FALSE and save the returned plot object (p_markerQC) via ggplot2::ggsave(plot=p_markerQC, other_arguments) or pdf(outfile); print(p_markerQC); dev.off().
verbose: [logical] If TRUE, progress info is printed to standard out.
path2plink: [character] Absolute path to the PLINK executable
(https://www.cog-genomics.org/plink/1.9/), i.e.
plink should be accessible as path2plink -h. The full name of the executable
should be specified: for Windows, this means path/plink.exe, for Unix
platforms this is path/plink. If not provided, it is assumed that the PATH
set-up works and PLINK will be found by exec_wait('plink').
showPlinkOutput: [logical] If TRUE, plink log and error messages are printed to standard out.
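The relation between the two allele thresholds, macTh = mafTh*2*NrSamples, can be made concrete with a short calculation. This is an illustrative sketch only: the helper maf_to_mac and the sample size of 1000 are assumptions, not part of plinkQC.

```r
# Hypothetical helper: convert a MAF threshold into the equivalent MAC
# threshold, following macTh = mafTh * 2 * NrSamples (two alleles per sample).
maf_to_mac <- function(mafTh, nr_samples) {
  mafTh * 2 * nr_samples
}

# Assumed example values: 1000 samples and a MAF threshold of 1%
mac_equivalent <- maf_to_mac(mafTh = 0.01, nr_samples = 1000)
print(mac_equivalent)
```

Under these assumed values, the result corresponds to the default macTh of 20 in the usage above; for other sample sizes the same MAF threshold maps to a different count.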
Named [list] with i) fail_list, a named [list] with 1.
SNP_missingness, containing the SNP IDs [vector] failing the missingness
threshold lmissTh, 2. hwe, containing the SNP IDs [vector] failing the HWE
exact test threshold hweTh and 3. maf, containing the SNP IDs [vector] failing
the MAF threshold mafTh/MAC threshold macTh, and ii) p_markerQC, a ggplot2
object containing a sub-paneled plot with the QC plots of
check_snp_missingness, check_hwe and
check_maf, which can be shown by print(p_markerQC).
List entries contain NULL if that specific check was not chosen.
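To illustrate how the returned list might be used downstream, the sketch below builds a mock object with the structure described above and collects the failing SNP IDs. The SNP IDs are made up, and the plot entry is replaced by a NULL placeholder so that the example runs without PLINK or ggplot2.

```r
# Mock object mirroring the documented structure; the SNP IDs are made up.
fail_markers <- list(
  fail_list = list(
    SNP_missingness = c("rs1", "rs2"),
    hwe = c("rs2", "rs3"),
    maf = NULL  # NULL as if the MAF check had not been chosen
  ),
  p_markerQC = NULL  # placeholder for the ggplot2 object
)

# Number of failing SNPs per check (0 for checks that were not run)
n_failed <- lengths(fail_markers$fail_list)

# Unique SNP IDs failing at least one check; unlist() drops NULL entries
all_failed <- unique(unlist(fail_markers$fail_list, use.names = FALSE))
print(n_failed)
print(all_failed)
```

Taking the union of the per-check vectors is one simple way to obtain a single exclusion list; whether markers failing any check or only specific checks should be removed depends on the analysis.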
perMarkerQC wraps around the marker QC functions
check_snp_missingness, check_hwe and
check_maf. For details on the parameters and outputs, see
the documentation of these functions.
# NOT RUN {
indir <- system.file("extdata", package="plinkQC")
qcdir <- tempdir()
name <- "data"
path2plink <- '/path/to/plink'

# The following code is not run on package build, as the path2plink on the
# user system is not known.
# All quality control checks
fail_markers <- perMarkerQC(indir=indir, qcdir=qcdir, name=name,
                            interactive=FALSE, verbose=TRUE,
                            path2plink=path2plink)
# }