
plinkQC (version 0.2.2)

check_maf: Identification of SNPs with low minor allele frequency

Description

Runs and evaluates results from plink --freq. It calculates the minor allele frequencies for all variants in the individuals that passed the perIndividualQC. The minor allele frequency distribution is plotted as a histogram.

Usage

check_maf(indir, name, qcdir = indir, macTh = 20, mafTh = NULL,
  verbose = FALSE, interactive = FALSE, path2plink = NULL,
  showPlinkOutput = TRUE)

Arguments

indir

[character] /path/to/directory containing the basic PLINK data files name.bim, name.bed and name.fam.

name

[character] Prefix of PLINK files, i.e. name.bed, name.bim, name.fam.

qcdir

[character] /path/to/directory where results will be written. If perIndividualQC was conducted, this directory should be the same as the qcdir specified in perIndividualQC, i.e. it contains name.fail.IDs with the IIDs of individuals that failed QC. The user needs write permission for qcdir. By default, qcdir=indir.

macTh

[double] Threshold for the minor allele count cut-off. If both mafTh and macTh are specified, macTh is used (macTh = mafTh*2*NrSamples).

mafTh

[double] Threshold for minor allele frequency cut-off.

verbose

[logical] If TRUE, progress info is printed to standard out; in particular, the PLINK log is displayed.

interactive

[logical] Should plots be shown interactively? When choosing this option, make sure you have X-forwarding or a graphical interface available for interactive plotting. Alternatively, set interactive=FALSE and save the returned plot object (p_maf) via ggplot2::ggsave(plot=p_maf, other_arguments) or pdf(outfile); print(p_maf); dev.off().

path2plink

[character] Absolute path to the PLINK executable (https://www.cog-genomics.org/plink/1.9/), i.e. plink should be accessible as path2plink -h. The full name of the executable should be specified: for Windows, this means path/plink.exe; for Unix platforms, this is path/plink. If not provided, it is assumed that PATH is set up so that PLINK will be found by exec_wait('plink').

showPlinkOutput

[logical] If TRUE, plink log and error messages are printed to standard out.
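The macTh description relates the two thresholds through macTh = mafTh*2*NrSamples, where 2*NrSamples is the number of allele copies carried by the diploid samples that passed QC. The sketch below illustrates that conversion and the documented precedence of macTh over mafTh. The helper names mac_to_maf, maf_to_mac and effective_maf_threshold are hypothetical and not part of plinkQC; the sketch also assumes no missing genotypes, so every sample contributes exactly two allele copies.

```r
# Hypothetical helpers (not part of plinkQC) illustrating
# macTh = mafTh * 2 * NrSamples, assuming diploid variants and no
# missing genotypes (each sample contributes two allele copies).
mac_to_maf <- function(macTh, n_samples) {
  stopifnot(n_samples > 0)
  macTh / (2 * n_samples)
}

maf_to_mac <- function(mafTh, n_samples) {
  mafTh * 2 * n_samples
}

# Mirrors the documented precedence: when macTh is given it is used and
# converted to the equivalent MAF threshold; otherwise mafTh is used.
effective_maf_threshold <- function(n_samples, macTh = 20, mafTh = NULL) {
  if (!is.null(macTh)) {
    return(mac_to_maf(macTh, n_samples))
  }
  if (is.null(mafTh)) stop("Either macTh or mafTh must be specified")
  mafTh
}

# With 1000 samples passing QC, the default macTh = 20 corresponds to
# a MAF threshold of 20 / 2000.
effective_maf_threshold(n_samples = 1000)
# [1] 0.01
```

Because the conversion depends on the number of samples, a fixed macTh becomes a stricter MAF cut-off in small cohorts and a more lenient one in large cohorts.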

Value

Named list with i) fail_maf, a [data.frame] with the columns CHR (chromosome code), SNP (variant identifier), A1 (allele 1; usually minor), A2 (allele 2; usually major), MAF (allele 1 frequency) and NCHROBS (number of allele observations) for all SNPs that failed the mafTh/macTh threshold, and ii) p_maf, a ggplot2 object containing the MAF distribution histogram, which can be shown with print(p_maf).

Details

check_maf uses plink --remove name.fail.IDs --freq to calculate the minor allele frequencies for all variants in the individuals that passed the perIndividualQC. It does so without generating a new dataset; the individuals listed in name.fail.IDs are simply excluded when calculating the statistics.
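A minimal sketch of the PLINK 1.9 call described above, built as an argument vector in R. The helper build_freq_args is hypothetical and not part of plinkQC, and the --out location is an assumption; plinkQC may name its intermediate files differently. --bfile, --remove, --freq and --out are standard PLINK 1.9 flags.

```r
# Hypothetical helper (not part of plinkQC): builds a PLINK 1.9 argument
# vector equivalent to "plink --remove name.fail.IDs --freq".
# The --out prefix is an assumption made for illustration.
build_freq_args <- function(indir, qcdir, name) {
  c("--bfile",  file.path(indir, name),                      # name.bed/.bim/.fam
    "--remove", file.path(qcdir, paste0(name, ".fail.IDs")),  # individuals that failed QC
    "--freq",                                                 # writes <out>.frq
    "--out",    file.path(qcdir, name))                       # output prefix (assumed)
}

args <- build_freq_args("/data/raw", "/data/qc", "data")
print(args)

# Running PLINK requires a working executable, so it is only shown here:
# system2("/path/to/plink", args)
```

Building the arguments as a character vector, rather than pasting one command string, avoids quoting problems when paths contain spaces.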

For details on the output data.frame fail_maf, check the original description on the PLINK output format page: https://www.cog-genomics.org/plink/1.9/formats#frq.
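The .frq file written by plink --freq is a whitespace-padded table with a header line and the columns CHR, SNP, A1, A2, MAF and NCHROBS. The sketch below reads such a table and flags variants below a MAF threshold, producing a data.frame shaped like fail_maf. The file content is toy data, not real output, and the strict "<" comparison is an assumption about how the threshold is applied.

```r
# Toy .frq content (illustrative values, not real data).
frq_text <- "
 CHR      SNP   A1   A2      MAF  NCHROBS
   1   rs0001    A    G   0.2500      400
   1   rs0002    C    T   0.0050      400
   2   rs0003    G    A   0.0000      398
"

# Fixed column classes stop allele codes such as "T" from being parsed
# as logical TRUE. For a real file use:
# read.table("/path/to/qcdir/name.frq", header = TRUE, colClasses = ...)
frq <- read.table(text = frq_text, header = TRUE,
                  colClasses = c("integer", "character", "character",
                                 "character", "numeric", "integer"))

# Minor allele count recovered from frequency and allele observations.
frq$MAC <- round(frq$MAF * frq$NCHROBS)

# Flag variants below the MAF threshold (strict '<' is an assumption).
mafTh <- 0.01
fail_maf <- frq[frq$MAF < mafTh,
                c("CHR", "SNP", "A1", "A2", "MAF", "NCHROBS")]
fail_maf$SNP
# [1] "rs0002" "rs0003"
```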

Examples

indir <- system.file("extdata", package="plinkQC")
qcdir <- tempdir()
name <- "data"
path2plink <- '/path/to/plink'
# the following code is not run on package build, as the path2plink on the
# user system is not known.
fail_maf <- check_maf(indir=indir, qcdir=qcdir, name=name, macTh=15,
    interactive=FALSE, verbose=TRUE, path2plink=path2plink)
