Runs and evaluates results from plink --hardy. It calculates the observed and
expected heterozygote frequencies for all variants in the individuals that
passed the perIndividualQC
and computes the deviation of the
frequencies from Hardy-Weinberg equilibrium (HWE) by HWE exact test. The
p-values of the HWE exact test are displayed as histograms (stratified by
all and low p-values), where the hweTh is used to depict the quality control
cut-off for SNPs.
check_hwe(indir, name, qcdir = indir, hweTh = 1e-05,
interactive = FALSE, path2plink = NULL, verbose = FALSE,
showPlinkOutput = TRUE)
[character] /path/to/directory containing the basic PLINK data files name.bim, name.bed, name.fam files.
[character] Prefix of PLINK files, i.e. name.bed, name.bim, name.fam.
[character] /path/to/directory where results will be written to.
If perIndividualQC
was conducted, this directory should be the
same as qcdir specified in perIndividualQC
, i.e. it contains
name.fail.IDs with IIDs of individuals that failed QC. User needs writing
permission to qcdir. Per default, qcdir=indir.
[double] Significance threshold for deviation from HWE.
[logical] Should plots be shown interactively? When choosing this option, make sure you have X-forwarding/graphical interface available for interactive plotting. Alternatively, set interactive=FALSE and save the returned plot object (p_hwe) via ggplot2::ggsave(p=p_hwe, other_arguments) or pdf(outfile) print(p_hwe) dev.off().
[character] Absolute path to PLINK executable
(https://www.cog-genomics.org/plink/1.9/) i.e.
plink should be accesible as path2plink -h. The full name of the executable
should be specified: for windows OS, this means path/plink.exe, for unix
platforms this is path/plink. If not provided, assumed that PATH set-up works
and PLINK will be found by exec_wait
('plink').
[logical] If TRUE, progress info is printed to standard out and specifically, if TRUE, plink log will be displayed.
[logical] If TRUE, plink log and error messages are printed to standard out.
Named list with i) fail_hwe containing a [data.frame] with CHR (Chromosome code), SNP (Variant identifier), TEST (Type of test: one of 'ALL', 'AFF', 'UNAFF', 'ALL(QT)', 'ALL(NP)'), A1 (Allele 1; usually minor), A2 (Allele 2; usually major), GENO ('/'-separated genotype counts: A1 hom, het, A2 hom), O(HET) (Observed heterozygote frequency E(HET) (Expected heterozygote frequency), P (Hardy-Weinberg equilibrium exact test p-value) for all SNPs that failed the hweTh and ii) p_hwe, a ggplot2-object 'containing' the HWE p-value distribution histogram which can be shown by (print(p_hwe)).
check_hwe
uses plink --remove name.fail.IDs --hardy to
calculate the observed and expected heterozygote frequencies per SNP in the
individuals that passed the perIndividualQC
. It does so
without generating a new dataset but simply removes the IDs when calculating
the statistics.
For details on the output data.frame fail_hwe, check the original description on the PLINK output format page: https://www.cog-genomics.org/plink/1.9/formats#hwe.
# NOT RUN {
indir <- system.file("extdata", package="plinkQC")
qcdir <- tempdir()
name <- "data"
path2plink <- '/path/to/plink'
# the following code is not run on package build, as the path2plink on the
# user system is not known.
# }
# NOT RUN {
fail_hwe <- check_hwe(indir=indir, qcdir=qcdir, name=name, interactive=FALSE,
verbose=TRUE, path2plink=path2plink)
# }
Run the code above in your browser using DataLab