perMarkerQC checks the markers in the PLINK dataset for their missingness
rates across samples, their deviation from Hardy-Weinberg equilibrium (HWE)
and their minor allele frequencies (MAF). By default, it assumes that the IDs
of individuals that failed perIndividualQC have been written to
qcdir/name.fail.IDs and removes these individuals when computing
missingness rates, HWE p-values and MAF. If the qcdir/name.fail.IDs file does
not exist, a message is written to stdout, but the analyses continue for
all samples in the name.fam/name.bed/name.bim dataset.
perMarkerQC depicts i) SNP missingness rates (stratified by minor allele
frequency) as histograms, ii) p-values of the HWE exact test (shown for all
p-values and for low p-values only) as histograms and iii) the minor allele
frequency distribution as a histogram.
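Whether failed individuals are removed depends on qcdir/name.fail.IDs being present. The following is a minimal sketch of checking for that file before calling perMarkerQC. Note that has_fail_ids is a hypothetical helper and not part of plinkQC, and the two-column FID/IID layout written here is an assumption made only so that the example has a file to find.

```r
# Hypothetical helper (not part of plinkQC): report whether the file of
# failed individuals is present in qcdir.
has_fail_ids <- function(qcdir, name) {
  file.exists(file.path(qcdir, paste0(name, ".fail.IDs")))
}

# Fresh scratch directory so the example does not touch real data.
qcdir <- tempfile("qc_sketch")
dir.create(qcdir)
name <- "data"

# No fail ID file has been written yet.
before <- has_fail_ids(qcdir, name)

# Assumed layout for illustration only: FID and IID columns, no header.
write.table(data.frame(FID = c("fam1", "fam2"), IID = c("ind1", "ind2")),
            file = file.path(qcdir, paste0(name, ".fail.IDs")),
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Now the file exists.
after <- has_fail_ids(qcdir, name)
```

If the check returns FALSE, perMarkerQC would, as described above, print a message and analyse all samples.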
perMarkerQC(indir, qcdir = indir, name,
do.check_snp_missingness = TRUE, lmissTh = 0.01,
do.check_hwe = TRUE, hweTh = 1e-05, do.check_maf = TRUE,
macTh = 20, mafTh = NULL, interactive = FALSE, verbose = TRUE,
path2plink = NULL, showPlinkOutput = TRUE)
indir
[character] /path/to/directory containing the basic PLINK data files name.bim, name.bed and name.fam.
qcdir
[character] /path/to/directory where results will be written to. If perIndividualQC was conducted, this directory should be the same as the qcdir specified in perIndividualQC, i.e. it contains name.fail.IDs with the IIDs of individuals that failed QC. The user needs write permission for qcdir. By default, qcdir=indir.
name
[character] Prefix of PLINK files, i.e. name.bed, name.bim, name.fam.
do.check_snp_missingness
[logical] If TRUE, run check_snp_missingness.
lmissTh
[double] Threshold for acceptable variant missing rate across samples.
do.check_hwe
[logical] If TRUE, run check_hwe.
hweTh
[double] Significance threshold for deviation from HWE.
do.check_maf
[logical] If TRUE, run check_maf.
macTh
[double] Threshold for the minor allele count cut-off. If both mafTh and macTh are specified, macTh is used (macTh = mafTh*2*NrSamples).
mafTh
[double] Threshold for the minor allele frequency cut-off.
interactive
[logical] Should plots be shown interactively? When choosing this option, make sure you have X-forwarding or a graphical interface available for interactive plotting. Alternatively, set interactive=FALSE and save the returned plot object (p_markerQC) via ggplot2::ggsave(plot=p_markerQC, other_arguments) or pdf(outfile); print(p_markerQC); dev.off().
verbose
[logical] If TRUE, progress info is printed to standard out.
path2plink
[character] Absolute path to the PLINK executable (https://www.cog-genomics.org/plink/1.9/), i.e. plink should be accessible as path2plink -h. The full name of the executable should be specified: on Windows this means path/plink.exe, on Unix platforms path/plink. If not provided, it is assumed that the PATH set-up works and PLINK will be found by exec_wait('plink').
showPlinkOutput
[logical] If TRUE, PLINK log and error messages are printed to standard out.
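The relation macTh = mafTh*2*NrSamples given in the argument descriptions can be worked through in a few lines. The sample count and MAF threshold below are made-up values for illustration, not defaults or results from any dataset.

```r
# Illustrative values only (not taken from any dataset).
n_samples <- 1000   # NrSamples in the formula above
mafTh <- 0.01       # minor allele frequency threshold

# Each diploid sample contributes two alleles, so the equivalent
# minor allele count threshold is mafTh * 2 * n_samples.
macTh <- mafTh * 2 * n_samples

# Conversely, the MAF threshold that corresponds to the default
# macTh = 20 for the same number of samples.
maf_from_default_mac <- 20 / (2 * n_samples)
```

With these assumed numbers, a MAF threshold of 0.01 and the default macTh of 20 describe the same cut-off; for other sample sizes the two differ.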
Named [list] with i) fail_list, a named [list] with 1. SNP_missingness,
containing SNP IDs [vector] failing the missingness threshold lmissTh,
2. hwe, containing SNP IDs [vector] failing the HWE exact test threshold
hweTh and 3. maf, containing SNP IDs [vector] failing the MAF threshold
mafTh/MAC threshold macTh, and ii) p_markerQC, a ggplot2 object containing
a sub-paneled plot with the QC plots of check_snp_missingness, check_hwe
and check_maf, which can be shown by print(p_markerQC).
List entries contain NULL if that specific check was not chosen.
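As a rough sketch of working with the returned list, the snippet below builds a mock object with the structure described above and extracts the failing SNP IDs. The SNP IDs are invented, and the mock stands in for a real perMarkerQC result, which requires PLINK to produce.

```r
# Mock object mirroring the described return structure; all IDs are invented.
fail_markers <- list(
  fail_list = list(
    SNP_missingness = c("rs1", "rs2"),
    hwe = c("rs2", "rs3"),
    maf = NULL            # stands for a check that was not chosen
  ),
  p_markerQC = NULL       # a ggplot2 object in a real run
)

# Number of failing SNPs per check (NULL entries have length 0).
n_failed <- lengths(fail_markers$fail_list)

# Unique SNP IDs failing at least one check.
all_failed <- unique(unlist(fail_markers$fail_list, use.names = FALSE))
```

Here all_failed collects each SNP once even if it fails several checks, which is convenient when writing a single exclusion list.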
perMarkerQC wraps around the marker QC functions check_snp_missingness,
check_hwe and check_maf. For details on the parameters and outputs, see
the documentation of these functions.
# NOT RUN {
indir <- system.file("extdata", package="plinkQC")
qcdir <- tempdir()
name <- "data"
path2plink <- '/path/to/plink'
# the following code is not run on package build, as the path2plink on the
# user system is not known.
# All quality control checks
# }
# NOT RUN {
fail_markers <- perMarkerQC(indir=indir, qcdir=qcdir, name=name,
                            interactive=FALSE, verbose=TRUE,
                            path2plink=path2plink)
# }
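The example above uses a placeholder for path2plink. One possible way to look for a PLINK executable on the PATH before running it is sketched below with base R's Sys.which. This is an illustration only and not how perMarkerQC itself locates PLINK.

```r
# Look up 'plink' on the PATH; Sys.which returns "" when nothing is found.
plink_on_path <- unname(Sys.which("plink"))

if (nzchar(plink_on_path)) {
  # Use the executable that was found.
  path2plink <- plink_on_path
} else {
  # Nothing found: the full path has to be supplied by hand.
  path2plink <- NULL
  message("plink not found on PATH; set path2plink to the full executable path")
}
```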