GWASTools (version 1.18.0)

batchTest: Batch Effects of Genotyping

Description

batchChisqTest calculates chi-square values for batches from 2-by-2 tables of SNPs, comparing each batch with the other batches. batchFisherTest runs Fisher's exact test on the same tables and reports p-values and odds ratios.

Usage

batchChisqTest(genoData, batchVar, snp.include = NULL, chrom.include = 1:22, sex.include = c("M", "F"), scan.exclude = NULL, return.by.snp = FALSE, correct = TRUE, verbose = TRUE)
batchFisherTest(genoData, batchVar, snp.include = NULL, chrom.include = 1:22, sex.include = c("M", "F"), scan.exclude = NULL, return.by.snp = FALSE, conf.int = FALSE, verbose = TRUE)

Arguments

genoData
A GenotypeData object.
batchVar
A character string indicating which annotation variable should be used as the batch.
snp.include
A vector containing the IDs of SNPs to include.
chrom.include
Integer vector with codes for chromosomes to include. Ignored if snp.include is not NULL. Default is 1:22 (autosomes). Use 23, 24, 25, 26, 27 for X, XY, Y, M, Unmapped respectively.
sex.include
Character vector with sex to include. Default is c("M", "F"). If sex chromosomes are present in chrom.include, only one sex is allowed.
scan.exclude
A vector containing the IDs of scans to be excluded.
return.by.snp
Logical value to indicate whether snp-by-batch matrices should be returned.
conf.int
Logical value to indicate if a confidence interval should be computed.
correct
Logical value to specify whether to apply the Yates continuity correction.
verbose
Logical value specifying whether to show progress information.

Value

batchChisqTest returns a list with the following elements:
mean.chisq
a vector of mean chi-squared values for each batch.
lambda
a vector of genomic inflation factors, computed as median(chisq) / 0.456 for each batch.
chisq
a matrix of chi-squared values with SNPs as rows and batches as columns. Only returned if return.by.snp=TRUE.
batchFisherTest returns a list with the following elements:
mean.or
a vector of mean odds-ratio values for each batch. mean.or is computed as 1/mean(pmin(or, 1/or)) since the odds ratio is >1 when the batch has a higher allele frequency than the other batches and <1 for the reverse.
lambda
a vector of genomic inflation factors, computed as median(-2*log(pval)) / 1.39 for each batch.
Each of the following is a matrix with SNPs as rows and batches as columns, and is only returned if return.by.snp=TRUE:
pval
P value
oddsratio
Odds ratio
confint.low
Low value of the confidence interval for the odds ratio. Only returned if conf.int=TRUE.
confint.high
High value of the confidence interval for the odds ratio. Only returned if conf.int=TRUE.
batchChisqTest and batchFisherTest both also return the following if return.by.snp=TRUE:
allele.counts
matrix with total number of A and B alleles over all batches.
min.exp.freq
matrix of minimum expected allele frequency with SNPs as rows and batches as columns.
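
The summary statistics described above can be reproduced by hand. The following is a minimal sketch in base R, not the GWASTools implementation: the statistics are simulated under the null, the odds ratios in `or` are made-up values, and all variable names are hypothetical. The constants 0.456 and 1.39 are the ones quoted above; they correspond approximately to the medians of chi-square distributions with 1 and 2 degrees of freedom.

```r
set.seed(1)

# Simulated per-SNP results for a single batch (hypothetical data)
chisq <- rchisq(1000, df = 1)   # stand-in for one column of the chisq matrix
pval  <- runif(1000)            # stand-in for one column of the pval matrix

# Inflation factor from chi-square statistics; qchisq(0.5, df = 1) is about 0.455
lambda.chisq <- median(chisq, na.rm = TRUE) / 0.456

# Inflation factor from Fisher p-values; qchisq(0.5, df = 2) is about 1.386
lambda.fisher <- median(-2 * log(pval), na.rm = TRUE) / 1.39

# Mean odds ratio: fold every odds ratio onto (0, 1], average, then invert,
# so that values above and below 1 do not cancel each other out
or <- c(1.2, 0.8, 2.0, 0.5)     # made-up odds ratios
mean.or <- 1 / mean(pmin(or, 1 / or))

lambda.chisq
lambda.fisher
mean.or
```

With null data both inflation factors should be close to 1; values well above 1 for a batch suggest that its allele frequencies differ systematically from the other batches.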

Details

Because sample processing and genotype calling can introduce batch effects, batch is an important experimental design factor. For each SNP, batchChisqTest calculates a chi-square statistic and batchFisherTest performs Fisher's exact test, in both cases comparing each batch with all other batches. For each SNP and each batch, the batch effect is evaluated with a 2-by-2 table: the number of A alleles and B alleles in the batch versus the number of A alleles and B alleles in the other batches. Monomorphic SNPs are set to NA for all batches.

The default behavior is to combine allele frequencies from males and females and return results for autosomes only. If results for sex chromosomes (X or Y) are desired, use chrom.include with values 23 and/or 25 and set sex.include to either "M" or "F".

If there are only two batches, the calculation is only performed once and the values for each batch will be identical.
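
To make the per-SNP comparison concrete, here is a small sketch using base R's chisq.test and fisher.test on a single 2-by-2 table. The allele counts are invented for illustration, and the code only mirrors the table layout described above; it is not how GWASTools computes its results internally.

```r
# Hypothetical allele counts for one SNP (illustrative numbers only)
# rows: the batch of interest vs. all other batches; columns: A and B alleles
tab <- matrix(c( 60,  40,
                150, 150),
              nrow = 2, byrow = TRUE,
              dimnames = list(batch  = c("this", "others"),
                              allele = c("A", "B")))

# Chi-square test with the Yates continuity correction (compare correct = TRUE)
chi <- chisq.test(tab, correct = TRUE)
chi$statistic

# Fisher's exact test: p-value and odds-ratio estimate, no confidence interval
fis <- fisher.test(tab, conf.int = FALSE)
fis$p.value
fis$estimate

# Smallest expected cell count in the table under independence
min(chi$expected)
```

Here the batch carries the A allele more often than the other batches (60% versus 50%), so the estimated odds ratio is greater than 1.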

See Also

GenotypeData, chisq.test, fisher.test

Examples

library(GWASTools)
library(GWASdata)
file <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
gds <- GdsGenotypeReader(file)
data(illuminaScanADF)
genoData <-  GenotypeData(gds, scanAnnot=illuminaScanADF)

# autosomes only, sexes combined (default)
res.chisq <- batchChisqTest(genoData, batchVar="plate")
res.chisq$mean.chisq
res.chisq$lambda

# X chromosome for females
res.chisq <- batchChisqTest(genoData, batchVar="status",
  chrom.include=23, sex.include="F", return.by.snp=TRUE)
head(res.chisq$chisq)

# Fisher exact test of "status" on X chromosome for females
res.fisher <- batchFisherTest(genoData, batchVar="status",
  chrom.include=23, sex.include="F", return.by.snp=TRUE)
qqPlot(res.fisher$pval)

close(genoData)
