
SeqVarTools (version 1.10.0)

getGenotype: Get genotype data

Description

Get a matrix of genotype values from a GDS object, either as VCF-style character strings or as allele dosages.

Usage

getGenotype(gdsobj, use.names=TRUE)
getGenotypeAlleles(gdsobj, use.names=TRUE, sort=FALSE)
refDosage(gdsobj, use.names=TRUE)
altDosage(gdsobj, use.names=TRUE)
alleleDosage(gdsobj, n=0, use.names=TRUE)   # n is an integer or integer vector
alleleDosage(gdsobj, n, use.names=TRUE)     # n is a list

Arguments

gdsobj
A SeqVarGDSClass object with VCF data.
use.names
A logical indicating whether to assign sample and variant IDs as dimnames of the resulting matrix.
sort
Logical for whether to sort alleles lexicographically ("G/T" instead of "T/G").
n
An integer, an integer vector, or a list indicating which allele(s) to compute the dosage of. n=0 is the reference allele, n=1 is the first alternate allele, and so on.

Value

getGenotype and getGenotypeAlleles return a character matrix with dimensions [sample,variant] containing diploid genotypes.

getGenotype returns alleles as "0", "1", "2", etc., indicating the reference and alternate alleles.

getGenotypeAlleles returns alleles as "A", "C", "G", "T". sort=TRUE sorts the alleles of each genotype lexicographically, which may be useful for comparing genotypes with data generated using a different reference sequence.

refDosage returns an integer matrix with the dosage of the reference allele: 2 for two copies of the reference allele ("0/0"), 1 for one copy, and 0 for two alternate alleles.

altDosage returns an integer matrix with the dosage of any alternate allele: 2 for two alternate alleles ("1/1", "1/2", etc.), 1 for one alternate allele, and 0 for no alternate allele (homozygous reference).

alleleDosage with an integer argument returns an integer matrix with the dosage of the specified allele only: 2 for two copies of the allele ("0/0" if n=0, "1/1" if n=1, etc.), 1 for one copy, and 0 for no copies.

alleleDosage with a list argument returns a list of sample x allele matrices with the dosage of each specified allele for each variant.
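
The dosage definitions above amount to counting allele codes in each genotype string. The following is a minimal base-R sketch of that counting, not the SeqVarTools implementation; count_alleles, ref_dosage, alt_dosage and allele_dosage are hypothetical helpers, and the input is assumed to be a sample x variant character matrix like the one getGenotype returns.

```r
# Minimal sketch, NOT the SeqVarTools implementation: count alleles in a
# sample x variant matrix of VCF-style genotype strings ("0/1", "1|2", NA).
count_alleles <- function(geno, match_fn) {
  alleles <- strsplit(as.vector(geno), "[/|]")
  counts <- vapply(alleles, function(a) {
    if (anyNA(a)) return(NA_integer_)       # missing genotype stays NA
    sum(match_fn(as.integer(a[1:2])))       # diploid: first two alleles only
  }, integer(1))
  matrix(counts, nrow = nrow(geno), dimnames = dimnames(geno))
}

# Hypothetical counterparts of refDosage, altDosage and alleleDosage(n)
ref_dosage    <- function(geno)    count_alleles(geno, function(a) a == 0L)
alt_dosage    <- function(geno)    count_alleles(geno, function(a) a > 0L)
allele_dosage <- function(geno, n) count_alleles(geno, function(a) a == n)

geno <- matrix(c("0/0", "0|1", "1/2", NA), nrow = 2,
               dimnames = list(c("s1", "s2"), c("v1", "v2")))
ref_dosage(geno)
alt_dosage(geno)
allele_dosage(geno, 2)
```

For non-missing diploid genotypes the reference and alternate dosages add up to 2, and a missing genotype stays NA in every result.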

Details

In getGenotype, genotypes are coded as in the VCF file, where "0/0" is homozygous reference, "0/1" is heterozygous for the first alternate allele, "0/2" is heterozygous for the second alternate allele, etc.
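
The relationship between the numeric codes of getGenotype and the allele strings of getGenotypeAlleles can be sketched for a single genotype as follows. code_to_alleles is a hypothetical helper, and it assumes the variant's alleles are supplied in VCF order (reference first, then alternates).

```r
# Hypothetical helper, not part of SeqVarTools: translate one VCF-style
# genotype code into allele strings, given the variant's alleles in VCF
# order (reference first, then alternates).
code_to_alleles <- function(geno, alleles) {
  if (is.na(geno)) return(NA_character_)
  sep <- if (grepl("|", geno, fixed = TRUE)) "|" else "/"   # keep phasing
  idx <- as.integer(strsplit(geno, "[/|]")[[1]][1:2])
  # Allele code 0 is the reference, so add 1 for R's 1-based indexing
  paste(alleles[idx + 1L], collapse = sep)
}

code_to_alleles("0/0", c("A", "G"))
code_to_alleles("0/1", c("A", "G"))
code_to_alleles("1|2", c("A", "G", "T"))
```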

Separators are "/" for unphased and "|" for phased. If sort=TRUE, all returned genotypes will be unphased. Missing genotypes are coded as NA. Only diploid genotypes (the first two alleles at a given site) are returned.
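
The effect of sort=TRUE described above can be illustrated on single genotype strings. sort_genotype is a hypothetical helper; using C-locale (radix) ordering is an assumption of this sketch, not something this page specifies.

```r
# Hypothetical helper illustrating sort=TRUE: order the two alleles of a
# genotype lexicographically and always join them with the unphased "/".
sort_genotype <- function(geno) {
  if (is.na(geno)) return(NA_character_)   # missing stays NA
  a <- strsplit(geno, "[/|]")[[1]][1:2]    # diploid: first two alleles
  paste(sort(a, method = "radix"), collapse = "/")
}

sort_genotype("T/G")
sort_genotype("T|G")
sort_genotype("G/T")
```

After sorting, "T/G", "T|G" and "G/T" all compare equal, which is why sorted genotypes lose their phasing.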

If the argument n to alleleDosage is a single integer, the same allele is counted for all variants. If n is a vector whose length equals the number of variants in the current filter, a different allele is counted for each variant. If n is a list, more than one allele can be counted for each variant. For example, if n[[1]]=c(1,3), genotypes "0/1" and "0/3" will each have a dosage of 1 and genotype "1/3" will have a dosage of 2.
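
The three forms of n can be sketched with one hypothetical helper, dosage_by_n, operating on a sample x variant matrix of genotype strings. It follows the counting rule in the paragraph above (copies of every allele in the set are summed) and returns a single matrix for simplicity; it does not reproduce the return value that the Value section describes for the list form.

```r
# Minimal sketch of the counting rule for n, NOT the SeqVarTools code.
# geno: sample x variant matrix of VCF-style genotype strings.
# n: one integer (same allele for every variant), a vector with one allele
#    per variant, or a list with a set of alleles per variant.
dosage_by_n <- function(geno, n) {
  nvar <- ncol(geno)
  if (!is.list(n)) {
    if (length(n) == 1L) n <- rep(n, nvar)   # single integer: recycle
    n <- as.list(n)
  }
  stopifnot(length(n) == nvar)
  out <- matrix(NA_integer_, nrow(geno), nvar, dimnames = dimnames(geno))
  for (j in seq_len(nvar)) {
    for (i in seq_len(nrow(geno))) {
      g <- geno[i, j]
      if (is.na(g)) next                      # missing genotype stays NA
      a <- as.integer(strsplit(g, "[/|]")[[1]][1:2])
      out[i, j] <- sum(a %in% n[[j]])         # copies of any allele in the set
    }
  }
  out
}

geno <- matrix(c("0/1", "0/3", "1/3", "0/0", "0|1", NA), nrow = 3,
               dimnames = list(c("s1", "s2", "s3"), c("v1", "v2")))
dosage_by_n(geno, 0)                  # reference allele for both variants
dosage_by_n(geno, c(1, 0))            # allele 1 for v1, allele 0 for v2
dosage_by_n(geno, list(c(1, 3), 0))   # alleles 1 and 3 summed for v1
```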

See Also

SeqVarGDSClass, applyMethod, seqGetData, seqSetFilter, alleleFrequency

Examples

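library(SeqArray)     # seqOpen(), seqGetData(), seqSetFilter(), seqClose()
library(SeqVarTools)  # getGenotype() and the dosage functions

# Open the example GDS file shipped with SeqArray
gds <- seqOpen(seqExampleFileName("gds"))

# Restrict to the first 5 variants and the first 10 samples
variant.id <- seqGetData(gds, "variant.id")
sample.id <- seqGetData(gds, "sample.id")
seqSetFilter(gds, variant.id=variant.id[1:5],
             sample.id=sample.id[1:10])

# Genotypes as allele codes ("0/1") and as alleles ("A/G")
getGenotype(gds)
getGenotypeAlleles(gds)

# Reference and alternate allele dosages (0, 1, or 2 copies)
refDosage(gds)
altDosage(gds)

# One allele for all variants, one allele per variant (length 5),
# or a set of alleles per variant (list of length 5)
alleleDosage(gds, n=0)
alleleDosage(gds, n=1)
alleleDosage(gds, n=c(0,1,0,1,0))
alleleDosage(gds, n=list(0,c(0,1),0,c(0,1),1))
seqClose(gds)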
