Quality control is a fundamental step in the analysis of GWAS summary statistics. qcStat handles several QC tasks: mapping marker ids, checking the effect allele and its frequency, determining the genome build version, and excluding markers based on multiple criteria.
qcStat(
Glist = NULL,
stat = NULL,
excludeMAF = 0.01,
excludeMAFDIFF = 0.05,
excludeINFO = 0.8,
excludeCGAT = TRUE,
excludeINDEL = TRUE,
excludeDUPS = TRUE,
excludeMHC = FALSE,
excludeMISS = 0.05,
excludeHWE = 1e-12
)
Value: A data frame with processed, quality-controlled summary statistics.
Glist: List containing information about the genotype matrix stored on disk.
stat: Data frame of marker summary statistics. It should follow either the "internal" or the "external" format.
excludeMAF: Numeric. Exclusion threshold for minor allele frequency (MAF). Default is 0.01.
excludeMAFDIFF: Numeric. Threshold for excluding markers based on the allele frequency difference between the summary statistics and the LD reference panel. Default is 0.05.
excludeINFO: Numeric. Exclusion threshold for the imputation info score. Default is 0.8.
excludeCGAT: Logical. Exclude strand-ambiguous markers (C/G or A/T allele pairs). Default is TRUE.
excludeINDEL: Logical. Exclude insertion/deletion (INDEL) markers. Default is TRUE.
excludeDUPS: Logical. Exclude markers with duplicated ids. Default is TRUE.
excludeMHC: Logical. Exclude markers located in the MHC region. Default is FALSE.
excludeMISS: Numeric. Exclusion threshold for sample missingness. Default is 0.05.
excludeHWE: Numeric. Exclusion threshold for the Hardy-Weinberg equilibrium (HWE) test p-value. Default is 1e-12.
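To make the marker-level thresholds concrete, here is a minimal base-R sketch of how filters like excludeMAF, excludeMAFDIFF, excludeINFO, excludeCGAT, excludeINDEL and excludeDUPS could be applied to a toy data frame. It is not qgg's implementation: the helper `filter_markers`, the `info` column and the `ref_af` vector (reference-panel frequencies of the effect allele, in the same row order) are hypothetical, and details such as whether a boundary value is kept or whether the first copy of a duplicated id survives may differ in qcStat.

```r
# Hypothetical helper (not part of qgg): applies marker-level filters
# analogous to qcStat's excludeMAF, excludeMAFDIFF, excludeINFO,
# excludeCGAT, excludeINDEL and excludeDUPS arguments.
# Assumes `stat` has columns marker, ea, nea, eaf, info and that
# `ref_af` holds reference-panel frequencies of `ea`, in the same row order.
filter_markers <- function(stat, ref_af,
                           excludeMAF = 0.01,
                           excludeMAFDIFF = 0.05,
                           excludeINFO = 0.8,
                           excludeCGAT = TRUE,
                           excludeINDEL = TRUE,
                           excludeDUPS = TRUE) {
  # Minor allele frequency derived from the effect allele frequency
  maf <- pmin(stat$eaf, 1 - stat$eaf)
  keep <- maf >= excludeMAF
  # Frequency disagreement with the reference panel
  keep <- keep & abs(stat$eaf - ref_af) <= excludeMAFDIFF
  # Poorly imputed markers
  keep <- keep & stat$info >= excludeINFO
  if (excludeCGAT) {
    # Strand-ambiguous pairs read the same on both strands
    pair <- paste0(stat$ea, stat$nea)
    keep <- keep & !(pair %in% c("CG", "GC", "AT", "TA"))
  }
  if (excludeINDEL) {
    # Keep only single-base A/C/G/T alleles
    bases <- c("A", "C", "G", "T")
    keep <- keep & stat$ea %in% bases & stat$nea %in% bases
  }
  if (excludeDUPS) {
    # Drop every copy of an id that occurs more than once
    keep <- keep & !(stat$marker %in% stat$marker[duplicated(stat$marker)])
  }
  # which() treats NA (e.g. a missing info score) as "exclude"
  stat[which(keep), , drop = FALSE]
}

stat <- data.frame(
  marker = c("rs1", "rs2", "rs3", "rs4", "rs5", "rs5", "rs6", "rs7", "rs8"),
  ea     = c("A",   "C",   "A",   "AT",  "G",   "G",   "T",   "A",   "G"),
  nea    = c("G",   "G",   "C",   "A",   "T",   "T",   "C",   "C",   "A"),
  eaf    = c(0.30,  0.40,  0.005, 0.20,  0.25,  0.25,  0.50,  0.10,  0.30),
  info   = c(0.95,  0.99,  0.90,  0.97,  0.92,  0.92,  0.85,  0.50,  0.93),
  stringsAsFactors = FALSE
)
ref_af <- c(0.31, 0.41, 0.006, 0.21, 0.26, 0.26, 0.52, 0.11, 0.40)
qc <- filter_markers(stat, ref_af)
qc$marker  # "rs1" "rs6"
```

In the toy data, rs2 is a C/G pair, rs3 fails the MAF threshold, rs4 is an INDEL, rs5 is duplicated, rs7 has a low info score and rs8 differs from the reference frequency by 0.10, so only rs1 and rs6 remain.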
Author: Peter Soerensen
Performs quality control on GWAS summary statistics, which includes:
- Mapping marker ids to the LD reference panel data.
- Checking the effect allele, its frequency, and the genome build version.
- Excluding markers based on criteria such as MAF, allele frequency difference, info score, strand ambiguity, INDELs, duplicated ids, MHC region, missingness, and HWE.
The function works with summary statistics in both the "internal" and the "external" format. When the format is "external", marker ids are mapped using chr-pos-ref-alt information. The function also aligns the effect allele with the LD reference panel and flips the sign of effect sizes where necessary. When allele frequencies are not provided, it uses the frequencies from the genotype data.
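The mapping and allele-alignment steps can be sketched in base R as follows. This is an illustration under stated assumptions, not qgg's code: the helper `align_to_reference` and the reference columns `rsids`, `chr`, `pos`, `a1`, `a2` (with `a1` taken to be the panel's effect allele) are hypothetical. Markers are matched on a chr:pos:allele key that ignores allele order; where the summary statistics' effect allele is the panel's other allele, the sketch negates `b`, replaces `eaf` with `1 - eaf` and swaps `ea`/`nea`. Strand-flipped alleles are not matched and are therefore dropped.

```r
# Hypothetical helper (not part of qgg): map markers by chr:pos:alleles and
# align the effect allele to the reference panel allele `a1`.
# Only b and eaf are flipped here; a z-statistic column would also change sign.
align_to_reference <- function(stat, ref) {
  # Allele-order-insensitive keys; pmin/pmax compare character vectors
  key_stat <- paste(stat$chr, stat$pos,
                    pmin(stat$ea, stat$nea), pmax(stat$ea, stat$nea), sep = ":")
  key_ref  <- paste(ref$chr, ref$pos,
                    pmin(ref$a1, ref$a2), pmax(ref$a1, ref$a2), sep = ":")
  idx <- match(key_stat, key_ref)
  # Drop markers absent from the reference panel
  stat <- stat[!is.na(idx), , drop = FALSE]
  idx  <- idx[!is.na(idx)]
  stat$rsids <- ref$rsids[idx]
  # Effect allele equals the panel's other allele: flip
  flip <- stat$ea != ref$a1[idx]
  stat$b[flip]   <- -stat$b[flip]
  stat$eaf[flip] <- 1 - stat$eaf[flip]
  tmp <- stat$ea[flip]
  stat$ea[flip]  <- stat$nea[flip]
  stat$nea[flip] <- tmp
  stat
}

ref <- data.frame(rsids = c("rs10", "rs11", "rs12"), chr = c(1, 1, 2),
                  pos = c(100, 200, 300), a1 = c("A", "C", "T"),
                  a2 = c("G", "T", "C"), stringsAsFactors = FALSE)
stat <- data.frame(chr = c(1, 1, 2, 3), pos = c(100, 200, 300, 400),
                   ea = c("A", "T", "T", "G"), nea = c("G", "C", "C", "A"),
                   b = c(0.10, 0.20, -0.05, 0.30), eaf = c(0.3, 0.6, 0.2, 0.5),
                   stringsAsFactors = FALSE)
aligned <- align_to_reference(stat, ref)
aligned$rsids  # "rs10" "rs11" "rs12"
```

Here the second marker is reported for allele T while the panel counts C, so its effect becomes -0.20 and its frequency 0.4; the marker on chromosome 3 has no reference match and is removed.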
Required headers for external summary statistics: marker, chr, pos, ea, nea, eaf, b, seb, stat, p, n
Required headers for internal summary statistics: rsids, chr, pos, ea, nea, eaf, b, seb, stat, p, n
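A simple way to tell the two formats apart is to check which set of required headers is present. The base-R sketch below is a hypothetical helper (`detect_format`), not part of qgg. Because the description above says reference-panel frequencies are used when allele frequencies are missing, the sketch treats `eaf` as optional; it also checks the internal format first, which is an arbitrary choice for data frames that happen to contain both `rsids` and `marker`.

```r
# Hypothetical helper (not part of qgg): infer the summary statistics format
# from the column names listed as required headers.
required_external <- c("marker", "chr", "pos", "ea", "nea", "eaf",
                       "b", "seb", "stat", "p", "n")
required_internal <- c("rsids", "chr", "pos", "ea", "nea", "eaf",
                       "b", "seb", "stat", "p", "n")

detect_format <- function(stat, optional = "eaf") {
  cols <- colnames(stat)
  # eaf may be absent; frequencies can then come from the genotype data
  if (all(setdiff(required_internal, optional) %in% cols)) return("internal")
  if (all(setdiff(required_external, optional) %in% cols)) return("external")
  missing <- setdiff(setdiff(required_external, optional), cols)
  stop("missing required columns: ", paste(missing, collapse = ", "))
}

s <- data.frame(marker = "1:100:A:G", chr = 1, pos = 100, ea = "A", nea = "G",
                eaf = 0.3, b = 0.1, seb = 0.02, stat = 5, p = 1e-6, n = 1000,
                stringsAsFactors = FALSE)
detect_format(s)  # "external"
```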