qgg (version 1.1.1)

gfilter: Quality control of marker summary statistics

Description

Quality control is a critical step when working with summary statistics, particularly those obtained from external sources. Processing and quality control of GWAS summary statistics includes:

- map marker ids (rsids/cpra (chr, pos, ref, alt)) to LD reference panel data
- check effect allele (flip EA, EAF, Effect)
- check effect allele frequency
- thresholds for MAF and HWE
- exclude INDELS, CG/AT and MHC region
- remove duplicated marker ids
- check which build version
- check for concordance between marker effect and LD data
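
The effect allele check in the list above can be sketched in base R. This is a minimal illustration under stated assumptions, not the qgg implementation: the function `align_alleles` and the toy data frames `stat` and `ref` are hypothetical, and strand flips are not handled. A marker whose alleles are swapped relative to the reference has its alleles exchanged, its frequency replaced by 1 - EAF and the sign of its effect reversed; a marker whose alleles do not match the reference at all is dropped.

```r
# Hypothetical summary statistics in the external format (toy values).
stat <- data.frame(
  marker = c("rs1", "rs2", "rs3"),
  effect_allele = c("A", "G", "T"),
  non_effect_allele = c("G", "A", "C"),
  effect_allele_freq = c(0.20, 0.70, 0.40),
  effect = c(0.10, -0.05, 0.02),
  stringsAsFactors = FALSE
)

# Hypothetical reference panel alleles for the same markers.
ref <- data.frame(
  marker = c("rs1", "rs2", "rs3"),
  a1 = c("A", "A", "G"),
  a2 = c("G", "G", "A"),
  stringsAsFactors = FALSE
)

# Align effect alleles to the reference a1 allele (illustrative helper).
align_alleles <- function(stat, ref) {
  m <- match(stat$marker, ref$marker)
  same <- stat$effect_allele == ref$a1[m] & stat$non_effect_allele == ref$a2[m]
  flip <- stat$effect_allele == ref$a2[m] & stat$non_effect_allele == ref$a1[m]
  # Markers absent from the reference give NA; treat them as non-matching.
  same[is.na(same)] <- FALSE
  flip[is.na(flip)] <- FALSE
  # Flip EA, EAF and Effect together so they stay consistent.
  ea <- stat$effect_allele
  stat$effect_allele[flip] <- stat$non_effect_allele[flip]
  stat$non_effect_allele[flip] <- ea[flip]
  stat$effect_allele_freq[flip] <- 1 - stat$effect_allele_freq[flip]
  stat$effect[flip] <- -stat$effect[flip]
  # Keep only markers whose alleles agree with the reference.
  stat[same | flip, , drop = FALSE]
}

aligned <- align_alleles(stat, ref)
```

In this toy input rs1 already matches the reference, rs2 is flipped, and rs3 is removed because its alleles (T/C) do not match the reference alleles (G/A).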

External summary statistics format: marker, chr, pos, effect_allele, non_effect_allele, effect_allele_freq, effect, effect_se, stat, p, n

Internal summary statistics format: rsids, chr, pos, a1, a2, af, b, seb, stat, p, n
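
Assuming the two formats correspond column by column in the order listed (an assumption read off the two lines above, not something the documentation states explicitly), converting an external table to the internal layout amounts to a rename. The names `ext2int` and `to_internal` below are hypothetical helpers for illustration only and are not part of qgg.

```r
# Assumed one-to-one mapping from external to internal column names,
# taken from the order in which the two formats are listed.
ext2int <- c(
  marker = "rsids", chr = "chr", pos = "pos",
  effect_allele = "a1", non_effect_allele = "a2",
  effect_allele_freq = "af", effect = "b", effect_se = "seb",
  stat = "stat", p = "p", n = "n"
)

# Reorder and rename columns; stop early if a required column is absent.
to_internal <- function(stat) {
  missing_cols <- setdiff(names(ext2int), names(stat))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  out <- stat[, names(ext2int), drop = FALSE]
  names(out) <- unname(ext2int)
  out
}

# Toy one-row table in the external format (values are made up).
ext <- data.frame(
  marker = "rs1", chr = 1, pos = 12345,
  effect_allele = "A", non_effect_allele = "G",
  effect_allele_freq = 0.2, effect = 0.1, effect_se = 0.02,
  stat = 5, p = 5.7e-7, n = 10000,
  stringsAsFactors = FALSE
)
int <- to_internal(ext)
```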

Usage

gfilter(
  Glist = NULL,
  excludeMAF = 0.01,
  excludeMISS = 0.05,
  excludeINFO = NULL,
  excludeCGAT = TRUE,
  excludeINDEL = TRUE,
  excludeDUPS = TRUE,
  excludeHWE = 1e-12,
  excludeMHC = FALSE,
  assembly = "GRCh37"
)

Arguments

Glist

list of information about genotype matrix stored on disk

excludeMAF

exclude marker if minor allele frequency (MAF) is below threshold (0.01 is default)

excludeMISS

exclude marker if missingness (MISS) is above threshold (0.05 is default)

excludeINFO

exclude marker if info score (INFO) is below threshold (NULL is the default shown in Usage)

excludeCGAT

exclude marker if alleles are ambiguous (CG or AT)

excludeINDEL

exclude marker if it is an insertion/deletion (INDEL)

excludeDUPS

exclude marker id if duplicated

excludeHWE

exclude marker if p-value for Hardy-Weinberg Equilibrium (HWE) test is below threshold (1e-12 is default)

excludeMHC

exclude marker if located in MHC region

assembly

character string naming the genome assembly ("GRCh37" is default)
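
To make the meaning of the exclusion arguments concrete, the following base R sketch applies the same kinds of filters to a small in-memory marker table. It is only an illustration under several assumptions: `toy_filter`, `hwe_p` and the table `m` are hypothetical and unrelated to how qgg reads a Glist; the HWE p-value uses a 1-df chi-square test, which is one common choice and may differ from the test qgg uses; duplicated ids are removed entirely rather than keeping the first copy; and the MHC boundaries are commonly cited GRCh37 coordinates, not necessarily those used by the package.

```r
# Chi-square HWE test from genotype counts (1 df); returns NaN for
# monomorphic markers because an expected count is zero.
hwe_p <- function(n_aa, n_ab, n_bb) {
  n <- n_aa + n_ab + n_bb
  p <- (2 * n_aa + n_ab) / (2 * n)
  expected <- cbind(n * p^2, 2 * n * p * (1 - p), n * (1 - p)^2)
  chisq <- rowSums((cbind(n_aa, n_ab, n_bb) - expected)^2 / expected)
  pchisq(chisq, df = 1, lower.tail = FALSE)
}

# Toy per-marker table; all column names and values are hypothetical.
m <- data.frame(
  rsids = c("rs1", "rs2", "rs2", "rs3", "rs4", "rs5", "rs6", "rs7", "rs8"),
  chr   = c(1, 1, 1, 2, 6, 3, 4, 5, 7),
  pos   = c(1e6, 2e6, 2e6, 5e6, 30e6, 7e6, 8e6, 9e6, 1e6),
  a1    = c("A", "A", "A", "C", "A", "AT", "A", "A", "A"),
  a2    = c("G", "G", "G", "G", "G", "A", "C", "G", "G"),
  miss  = c(0, 0.01, 0.01, 0.02, 0, 0.01, 0, 0.20, 0),
  info  = c(0.99, 0.95, 0.95, 0.90, 0.70, 0.90, 0.90, 0.90, 0.90),
  n_aa  = c(25, 25, 25, 25, 25, 25, 0, 25, 50),
  n_ab  = c(50, 50, 50, 50, 50, 50, 1, 50, 0),
  n_bb  = c(25, 25, 25, 25, 25, 25, 99, 25, 50),
  stringsAsFactors = FALSE
)

# Returns the ids of markers passing every active filter.
toy_filter <- function(m, excludeMAF = 0.01, excludeMISS = 0.05,
                       excludeINFO = NULL, excludeCGAT = TRUE,
                       excludeINDEL = TRUE, excludeDUPS = TRUE,
                       excludeHWE = 1e-12, excludeMHC = FALSE,
                       mhc = c(chr = 6, start = 28477797, end = 33448354)) {
  p <- (2 * m$n_aa + m$n_ab) / (2 * (m$n_aa + m$n_ab + m$n_bb))
  keep <- pmin(p, 1 - p) >= excludeMAF & m$miss <= excludeMISS
  keep <- keep & hwe_p(m$n_aa, m$n_ab, m$n_bb) >= excludeHWE
  if (!is.null(excludeINFO)) keep <- keep & m$info >= excludeINFO
  bases <- c("A", "C", "G", "T")
  if (excludeINDEL) keep <- keep & m$a1 %in% bases & m$a2 %in% bases
  if (excludeCGAT) {
    keep <- keep & !(paste0(m$a1, m$a2) %in% c("CG", "GC", "AT", "TA"))
  }
  if (excludeDUPS) keep <- keep & !(m$rsids %in% m$rsids[duplicated(m$rsids)])
  if (excludeMHC) {
    in_mhc <- m$chr == mhc[["chr"]] &
      m$pos >= mhc[["start"]] & m$pos <= mhc[["end"]]
    keep <- keep & !in_mhc
  }
  m$rsids[keep]
}

pass_default <- toy_filter(m)
pass_no_mhc  <- toy_filter(m, excludeMHC = TRUE)
pass_info    <- toy_filter(m, excludeINFO = 0.8)
```

With the default thresholds only rs1 and rs4 remain: rs2 is duplicated, rs3 is a C/G marker, rs5 is an INDEL, rs6 has MAF 0.005, rs7 has 20% missingness and rs8 has no heterozygotes and fails the HWE threshold. Setting `excludeMHC = TRUE` or `excludeINFO = 0.8` additionally removes rs4, which sits on chromosome 6 at 30 Mb and has an info score of 0.70.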

Author

Peter Soerensen