SeqVarTools (version 1.10.0)

alternateAlleleDetection: alternateAlleleDetection

Description

Calculate rates of detecting alternate alleles given a "gold standard" dataset.

Usage

alternateAlleleDetection(gdsobj, gdsobj2, match.samples.on=c("subject.id", "subject.id"), verbose=TRUE)

Arguments

gdsobj
A SeqVarData object with VCF data.
gdsobj2
A SeqVarData object with VCF data to be used as the "gold standard".
match.samples.on
A length-2 character vector indicating the column to be used for matching in each dataset's sampleData annotation.
verbose
A logical indicating whether to print progress messages.
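
As an illustration of how match.samples.on pairs samples, the following base R sketch matches two sample annotation tables on their subject.id columns. The data frames reuse the values from the Examples section. Using merge here is an assumption made for illustration and is not necessarily how the package performs the match internally.

```r
# Hypothetical sample annotation for each dataset (same values as in the Examples).
sample1 <- data.frame(subject.id=c("a", "b", "c"), sample.id=c("A", "B", "C"),
                      stringsAsFactors=FALSE)
sample2 <- data.frame(subject.id=c("b", "c", "d"), sample.id=c("B", "C", "D"),
                      stringsAsFactors=FALSE)

# First element names the column in dataset 1, second the column in dataset 2.
match.samples.on <- c("subject.id", "subject.id")

# Keep only subjects present in both datasets
# (illustrative only; not the package internals).
pairs <- merge(sample1, sample2,
               by.x=match.samples.on[1], by.y=match.samples.on[2],
               suffixes=c(".1", ".2"))
print(pairs)
```

Only subjects "b" and "c" appear in both tables, so only those two sample pairs would be compared.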

Value

A data frame with the following columns:
variant.id.1
variant id from the first dataset
variant.id.2
matched variant id from the second dataset
n.samples
the number of samples with non-missing data for this variant
true.pos
the number of alleles that are true positives for this variant
true.neg
the number of alleles that are true negatives for this variant
false.pos
the number of alleles that are false positives for this variant
false.neg
the number of alleles that are false negatives for this variant
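
The per-variant counts can be combined into summary rates. The sketch below uses a mock data frame with the documented columns. The counts are invented for illustration and are not output from alternateAlleleDetection, and the choice of summary statistics (sensitivity, specificity, precision) is an assumption about what a user might want, not something the function returns.

```r
# Mock result with the documented columns; the counts are made up for illustration.
res <- data.frame(variant.id.1=1:3, variant.id.2=c(11L, 12L, 13L),
                  n.samples=c(10L, 10L, 8L),
                  true.pos=c(4, 0, 6), true.neg=c(15, 18, 9),
                  false.pos=c(1, 2, 0), false.neg=c(0, 0, 1))

# Each sample pair contributes two alleles, so in this mock data the four
# counts sum to 2 * n.samples for every variant.
stopifnot(with(res, all(true.pos + true.neg + false.pos + false.neg == 2 * n.samples)))

# Pool the allele counts over variants, then compute overall rates.
tp <- sum(res$true.pos);  tn <- sum(res$true.neg)
fp <- sum(res$false.pos); fn <- sum(res$false.neg)

sensitivity <- tp / (tp + fn)  # alternate alleles in the gold standard that were detected
specificity <- tn / (tn + fp)  # reference alleles in the gold standard called as reference
precision   <- tp / (tp + fp)  # detected alternate alleles confirmed by the gold standard

print(c(sensitivity=sensitivity, specificity=specificity, precision=precision))
```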

Details

Calculates the accuracy of detecting alternate alleles in one dataset (gdsobj) given a "gold standard" dataset (gdsobj2).

Samples are matched using the match.samples.on argument. The first element of match.samples.on indicates the column to be used as the subject identifier for the first dataset, and the second element is the column to be used for the second dataset.

Variants are matched on position and alleles using bi-allelic SNVs only. Genotype dosages are recoded to count the same allele if the reference allele in one dataset is the alternate allele in the other dataset. If a variant in one dataset matches multiple variants in the second dataset, only the first match is used. If the genotype for a variant is missing in either dataset for a given sample pair, that sample pair is ignored for that variant.

To exclude certain variants or samples from the calculation, use seqSetFilter to set appropriate filters on each gds object.

The test is positive if an alternate allele has been detected. Results are returned on an allele level. TP, TN, FP, and FN are calculated as follows:

                          genoData2
                 aa          Ra          RR
genoData1  aa    2TP         1TP + 1FP   2FP
           Ra    1TP + 1FN   1TN + 1TP   1TN + 1FP
           RR    2FN         1FN + 1TN   2TN

where "R" indicates a reference allele and "a" indicates an alternate allele.
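
The allele-level counting can be written compactly in terms of alternate allele dosages (aa = 2, Ra = 1, RR = 0). The base R sketch below reproduces the nine cells of the table, reading genoData1 as the dataset being tested and genoData2 as the gold standard. The helper names countAlleles and flipDosage are hypothetical and are not part of SeqVarTools; this is one way to express the table, not the package's implementation.

```r
# Allele-level counts from alternate allele dosages (0, 1, or 2 copies of "a").
# dos1: dataset being tested; dos2: gold standard.
# countAlleles is a hypothetical helper, not a SeqVarTools function.
countAlleles <- function(dos1, dos2) {
    data.frame(true.pos  = pmin(dos1, dos2),         # alt alleles seen in both
               true.neg  = 2 - pmax(dos1, dos2),     # ref alleles seen in both
               false.pos = pmax(dos1 - dos2, 0),     # alt called, gold says ref
               false.neg = pmax(dos2 - dos1, 0))     # ref called, gold says alt
}

# All nine genotype combinations, with aa = 2, Ra = 1, RR = 0.
grid <- expand.grid(dos1=c(2, 1, 0), dos2=c(2, 1, 0))
counts <- cbind(grid, countAlleles(grid$dos1, grid$dos2))
print(counts)

# If the reference allele in one dataset is the alternate allele in the other,
# flip the dosage so that both datasets count the same allele.
# flipDosage is also a hypothetical helper.
flipDosage <- function(dos) 2 - dos
```

Every row of counts sums to 2 across the four count columns, since each genotype comparison involves two alleles.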

See Also

SeqVarGDSClass

Examples

## Not run: 
# gds1 <- seqOpen(gdsfile.1) # dataset to test, e.g. sequencing
# sample1 <- data.frame(subject.id=c("a", "b", "c"), sample.id=c("A", "B", "C"), stringsAsFactors=FALSE)
# seqData1 <- SeqVarData(gds1, sampleData=sample1)
# 
# gds2 <- seqOpen(gdsfile.2) # gold standard dataset, e.g. array genotyping
# sample2 <- data.frame(subject.id=c("b", "c", "d"), sample.id=c("B", "C", "D"), stringsAsFactors=FALSE)
# seqData2 <- SeqVarData(gds2, sampleData=sample2)
# 
# res <- alternateAlleleDetection(seqData1, seqData2)
# ## End(Not run)