compareSNPs: Summarise genetic data by groups.

Description

This function provides an extensive range of summary measures for your SNP data, allowing in-depth quality control of your genotyping results and exploration of your data before analysis. Summary measures include allele and genotype frequencies and counts, missingness rate, the Hardy-Weinberg equilibrium p-value and more, computed for the whole data set or stratified by another variable, such as case-control status. It can also test for differences in missingness between groups.

Usage

compareSNPs(formula, data, subset, na.action = NULL, sep = "", verbose = FALSE, ...)

Value

An object of class 'compareSNPs', which is a data.frame (when no grouping variable is specified on the left of the '~' in the 'formula' argument) or a list of data.frames otherwise. Each data.frame contains the following fields:

- Ntotal: Total number of samples for which genotyping was attempted

- Ntyped: Number of genotypes called

- Typed.p: Percentage genotyped

- Miss.t: Number of missing genotypes

- Miss.p: Proportion of missing genotypes

- Minor: Minor Allele

- MAF: Minor allele frequency

- A1: Allele 1

- A2: Allele 2

- A1.ct: Count Allele 1

- A2.ct: Count Allele 2

- A1.p: Frequency of Allele 1

- A2.p: Frequency of Allele 2

- Hom1: Allele 1 Homozygote

- Het: Heterozygote

- Hom2: Allele 2 Homozygote

- Hom1.ct: Allele 1 Homozygote count

- Het.ct: Heterozygote Count

- Hom2.ct: Allele 2 Homozygote count

- Hom1.p: Frequency of Allele 1 Homozygote

- Het.p: Heterozygote frequency

- Hom2.p: Frequency of Allele 2 Homozygote

- HWE.p: Hardy-Weinberg equilibrium p-value

Additionally, when the analysis is stratified by groups, the last component of the list is a data.frame containing the p-values of the missingness comparison among groups.

'print' returns a 'nice' format table for each group with the main results for each SNP (Ntotal, Ntyped, Minor, MAF, A1, A2, HWE.p), and the missingness test when group is considered.
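As a rough illustration of what several of the fields above measure, the following base R sketch derives typed and missing counts, allele counts, the minor allele and its frequency, and a group-wise missingness p-value from a single toy SNP. The data, the variable names and the use of Fisher's exact test are assumptions made for illustration; the source does not state which test compareSNPs applies internally.

```r
# Toy data (hypothetical): one SNP coded with "/" between alleles; NA = not called.
geno  <- c("A/A", "A/T", "T/T", "A/A", NA, "A/T", "A/A", NA, "A/A", "A/T")
group <- factor(rep(c("case", "control"), each = 5))

# Sample-level counts (compare with Ntotal, Ntyped and the missingness fields).
n_total   <- length(geno)
n_typed   <- sum(!is.na(geno))
n_missing <- n_total - n_typed
miss_prop <- n_missing / n_total

# Allele counts: split each called genotype into its two alleles and tabulate.
alleles   <- unlist(strsplit(geno[!is.na(geno)], "/", fixed = TRUE))
allele_ct <- table(alleles)

# Minor allele and its frequency (compare with Minor and MAF).
minor <- names(allele_ct)[which.min(allele_ct)]
maf   <- min(allele_ct) / sum(allele_ct)

# Missingness by group, compared here with Fisher's exact test.
# This choice of test is an assumption, not necessarily what the package uses.
miss_tab <- table(group, missing = is.na(geno))
miss_p   <- fisher.test(miss_tab)$p.value

print(allele_ct)
print(c(MAF = maf, missing.prop = miss_prop, missing.p = miss_p))
```

In this toy example each group has one uncalled genotype out of five, so the comparison gives no evidence of differential missingness.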

Arguments

formula: an object of class "formula" (or one that can be coerced to that class). The right side of ~ must list the terms additively. These terms must refer to variables in 'data' of class character or factor whose values are the genotypes, written with their alleles (e.g. A/A, A/T and T/T). The left side of ~ must contain the name of the grouping variable or can be left blank (in this case, summary data are provided for the whole sample, and no missingness test is performed).
data: an optional data frame, list or environment (or object coercible by 'as.data.frame' to a data frame) containing the variables in the model. If they are not found in 'data', the variables are taken from 'environment(formula)'.
subset: an optional vector specifying a subset of individuals to be used in the computation process (applied to all genetic variables).
na.action: a function which indicates what should happen when the data contain NAs. The default is NULL, which is equivalent to na.pass, meaning no action. The value na.exclude can be useful to remove all individuals with an NA in any variable.
sep: character string indicating the separator between alleles (e.g. when using the A/A, A/T and T/T genotype coding, 'sep' should be set to '/'). The default value is '', indicating that genotypes are coded as AA, AT and TT.
verbose: logical; if TRUE, prints the results from the HWChisq function. Default value is FALSE.
...: currently ignored.
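To make the role of 'sep' concrete, here is a minimal base R sketch of how a genotype string can be split into its two alleles under each coding. The helper split_alleles is hypothetical and is not part of compareGroups; it only mirrors the behaviour described for the 'sep' argument.

```r
# Hypothetical helper (not part of compareGroups): split genotype strings into
# a two-column matrix of alleles, using 'sep' as the allele separator.
split_alleles <- function(genotypes, sep = "") {
  parts <- strsplit(as.character(genotypes), split = sep, fixed = TRUE)
  # Anything that does not yield exactly two alleles (e.g. NA) becomes NA, NA.
  pairs <- vapply(
    parts,
    function(p) if (length(p) == 2) p else c(NA_character_, NA_character_),
    character(2)
  )
  t(pairs)
}

# Genotypes coded without a separator (matches the default sep = "").
compact <- split_alleles(c("AT", "AA", NA))

# Genotypes coded with a slash (matches sep = "/").
slashed <- split_alleles(c("A/T", "T/T"), sep = "/")

print(compact)
print(slashed)
```

If the genotype columns in 'data' look like the second case, passing sep = "/" to compareSNPs is what the argument description above calls for.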

Author

Gavin Lucas (gavin.lucas<at>cleargenetics.com)

Isaac Subirana (isubirana<at>imim.es)

Examples

library(compareGroups)

# load example data
data(SNPs)

# visualize first rows
head(SNPs)

# select casco and all SNPs
myDat <- SNPs[,c(2,6:40)]

# QC of the selected SNPs by groups of cases and controls
res <- compareSNPs(casco ~ . - casco, myDat)
res

# QC of the selected SNPs in the whole data set
res <- compareSNPs(~ . - casco, myDat)
res