SeqArray (version 1.12.5)

seqSummary: Summarize a SeqArray GDS File

Description

Gets a summary of a SeqArray GDS file.

Usage

seqSummary(gdsfile, varname=NULL, check=c("default", "none", "full"), verbose=TRUE)

Arguments

gdsfile
a SeqVarGDSClass object, or a file name
varname
if NULL, check the whole GDS file; or a character string specifying a variable name, in which case a description of that variable is returned. See Details.
check
should be one of "default", "none" or "full"; see Details
verbose
if TRUE, display information

Value

If varname=NULL, the function returns a list:
filename
the file name
version
the version of SeqArray format
reference
genome reference, a character vector (0-length for undefined)
ploidy
the number of sets of chromosomes
num.sample
the total number of samples
num.variant
the total number of variants
allele
allele information, see seqSummary(gdsfile, "allele")
annot_qual
the total number of "annotation/qual" values if check="none"; otherwise a summary object including min, max, median and mean
filter
filter information, see seqSummary(gdsfile, "annotation/filter")
info
a data.frame of INFO fields: ID, Number, Type, Description, Source and Version
format
a data.frame of FORMAT fields: ID, Number, Type and Description
sample.annot
a data.frame of sample annotation with ID, Type and Description
--- seqSummary(gdsfile, "genotype", check="none", verbose=FALSE) returns a list with components:
dim
an integer vector: ploidy, # of samples, # of variants
seldim
an integer vector: ploidy, # of selected samples, # of selected variants
--- seqSummary(gdsfile, "allele") returns a data.frame with ID and descriptions (check="none"), or a list with components:
value
a data.frame with ID and Description
table
cross tabulation for the number of alleles per site
--- seqSummary(gdsfile, "$alt") returns a data.frame with ID and Description for describing the alternative alleles.
--- seqSummary(gdsfile, "annotation/filter") or seqSummary(gdsfile, "$filter") returns a data.frame with ID and description (check="none"), or a list with components:
value
a data.frame with ID and Description
table
cross tabulation for the variable 'filter'
--- seqSummary(gdsfile, "annotation/info") or seqSummary(gdsfile, "$info") returns a data.frame describing the variables in the folder "annotation/info" with ID, Number, Type, Description, Source and Version.
--- seqSummary(gdsfile, "annotation/format") returns a data.frame describing the variables in the folder "annotation/format" with ID, Number, Type and Description.
--- seqSummary(gdsfile, "sample.annotation") returns a data.frame describing sample annotation with ID, Type and Description.
--- seqSummary(gdsfile, "$reference") returns the genome reference if it is defined (a 0-length character vector if undefined).
--- seqSummary(gdsfile, "$contig") returns the contig information, a data.frame including ID.
--- seqSummary(gdsfile, "$format") returns a data.frame describing VCF FORMAT header with ID, Number, Type and Description. The first row is used for genotypes.
--- seqSummary(gdsfile, "$digest") returns a data.frame with the full names of GDS variables, digest codes and validation (FALSE/TRUE).
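
The return values listed above can be read like any other R list. The following is a minimal sketch, assuming the SeqArray package is installed and using its bundled example file; the variable names s and g are illustrative only.

```r
library(SeqArray)

# example GDS file shipped with SeqArray
gds.fn <- seqExampleFileName("gds")

# whole-file summary (varname=NULL), returned as a list
s <- seqSummary(gds.fn, verbose=FALSE)

# components named in the Value section
s$num.sample
s$num.variant
s$ploidy

# per-variable summary: genotype dimensions
g <- seqSummary(gds.fn, "genotype", check="none", verbose=FALSE)
g$dim      # ploidy, # of samples, # of variants
g$seldim   # ploidy, # of selected samples, # of selected variants
```

Because a file name is passed here rather than an open object with a filter set, no sample or variant selection is in effect when the genotype summary is computed.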

Details

If check="default", the function performs regular checks, such as variable dimensions. If check="full", it performs additional checks, e.g., that sample IDs are unique, that variant IDs are unique, and that genotypic data are in a valid range.
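
As a rough illustration of the three checking levels, the sketch below runs the whole-file summary once per level on the bundled example file. It assumes the SeqArray package is installed; the names s.none, s.default and s.full are made up for this example.

```r
library(SeqArray)

gds.fn <- seqExampleFileName("gds")

# no checking
s.none <- seqSummary(gds.fn, check="none", verbose=FALSE)

# regular checking (e.g., variable dimensions)
s.default <- seqSummary(gds.fn, check="default", verbose=FALSE)

# additional checking (unique IDs, genotype range)
s.full <- seqSummary(gds.fn, check="full", verbose=FALSE)

# compare the basic counts across levels
s.none$num.sample == s.full$num.sample
s.none$num.variant == s.full$num.variant
```

On large files, check="full" reads more data, so check="none" may be preferable when only the counts are needed.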

See Also

seqGetData, seqApply

Examples

# the GDS file
(gds.fn <- seqExampleFileName("gds"))

seqSummary(gds.fn)

ans <- seqSummary(gds.fn, check="full")
ans

seqSummary(gds.fn, "genotype")
seqSummary(gds.fn, "allele")
seqSummary(gds.fn, "annotation/filter")
seqSummary(gds.fn, "annotation/info")
seqSummary(gds.fn, "annotation/format")
seqSummary(gds.fn, "sample.annotation")

seqSummary(gds.fn, "$reference")
seqSummary(gds.fn, "$contig")
seqSummary(gds.fn, "$filter")
seqSummary(gds.fn, "$alt")
seqSummary(gds.fn, "$info")
seqSummary(gds.fn, "$format")
seqSummary(gds.fn, "$digest")


# open a GDS file
f <- seqOpen(gds.fn)

# get 'sample.id'
samp.id <- seqGetData(f, "sample.id")
# get 'variant.id'
variant.id <- seqGetData(f, "variant.id")

# set sample and variant filters
seqSetFilter(f, sample.id=samp.id[c(2,4,6,8,10)])
set.seed(100)
seqSetFilter(f, variant.id=sample(variant.id, 10))

seqSummary(f, "genotype")

# close a GDS file
seqClose(f)