GDS -- Genomic Data Structures, used for storing genetic array-oriented
data, and the file format defined in the gdsfmt package.
VCF -- The Variant Call Format (VCF), a generic format for storing DNA
polymorphism data such as SNPs, insertions, deletions and structural
variants, together with rich annotations.
If vcf.fn contains more than one file name, seqVCF2GDS will merge all the
VCF files together, provided that they contain the same samples. This is
useful when the data are split across multiple VCF files, for example by
chromosome.
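A minimal sketch of merging, assuming SeqArray is installed. It uses the example VCF shipped with the package (seqExampleFileName("vcf")) and splits its records into two halves to imitate per-chromosome files. The two temporary file names are illustrative. Both halves keep the full header, so they share the same samples.

```r
library(SeqArray)

# Example VCF file shipped with SeqArray
vcf.fn <- seqExampleFileName("vcf")

# Split the records into two halves to imitate VCF files divided by region;
# each half keeps the full header, so both files have the same samples
lines <- readLines(gzfile(vcf.fn))
is.hdr <- startsWith(lines, "#")
hdr <- lines[is.hdr]
recs <- lines[!is.hdr]
half <- seq_len(length(recs) %/% 2)
part.fn <- c(tempfile(fileext=".vcf"), tempfile(fileext=".vcf"))
writeLines(c(hdr, recs[half]), part.fn[1])
writeLines(c(hdr, recs[-half]), part.fn[2])

# Passing several file names in vcf.fn merges them into one GDS file
gds.fn <- tempfile(fileext=".gds")
seqVCF2GDS(part.fn, gds.fn, verbose=FALSE)

# Count the variants in the merged file
f <- seqOpen(gds.fn)
n.merged <- length(seqGetData(f, "variant.id"))
seqClose(f)
cat("records in VCF:", length(recs), " variants in GDS:", n.merged, "\n")
```

The merged GDS file should contain every record from both input files.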
Real numbers in the VCF file(s) are stored in 32-bit floating-point format
by default. Set
storage.option=seqStorageOption(float.mode="float64")
to switch to 64-bit floating-point format, or set
storage.option=seqStorageOption(float.mode="packedreal16:scale=0.0001")
to store them as packed real numbers.
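The following sketch converts the same example VCF twice, once with the default float mode and once with float.mode="float64", and prints the two file sizes. It assumes SeqArray is installed. Only the "float64" mode from the text is exercised here. The temporary output names are illustrative.

```r
library(SeqArray)

vcf.fn <- seqExampleFileName("vcf")

# Default: real numbers stored as 32-bit floats
fn32 <- tempfile(fileext=".gds")
seqVCF2GDS(vcf.fn, fn32, verbose=FALSE)

# Real numbers stored as 64-bit floats
fn64 <- tempfile(fileext=".gds")
seqVCF2GDS(vcf.fn, fn64, verbose=FALSE,
    storage.option=seqStorageOption(float.mode="float64"))

# Compare the resulting file sizes (in bytes)
print(file.info(c(float32=fn32, float64=fn64))$size)
```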
By default, the compression method is "ZIP_RA" (the zlib algorithm at its
default compression level, with independent data blocks). To maximize the
compression ratio, set storage.option="ZIP_RA.max" or
storage.option=seqStorageOption("ZIP_RA.max").
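This sketch shows that "ZIP_RA.max" can be passed either as a plain string or through seqStorageOption(), and compares both against "ZIP_RA". It assumes SeqArray is installed. "ZIP_RA" is passed explicitly rather than relied on as the default, since the default may differ between package versions.

```r
library(SeqArray)

vcf.fn <- seqExampleFileName("vcf")

# Convert the same VCF with a given storage option; return the GDS file size
convert <- function(opt) {
    fn <- tempfile(fileext=".gds")
    seqVCF2GDS(vcf.fn, fn, storage.option=opt, verbose=FALSE)
    file.info(fn)$size
}

sizes <- c(
    zip      = convert("ZIP_RA"),
    zip.max1 = convert("ZIP_RA.max"),                    # string form
    zip.max2 = convert(seqStorageOption("ZIP_RA.max"))   # seqStorageOption form
)
print(sizes)
```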
LZ4 (http://cyan4973.github.io/lz4/) is available via
storage.option="LZ4_RA" or storage.option=seqStorageOption("LZ4_RA").
LZMA (xz, http://tukaani.org/xz/) is another option, via
storage.option="LZMA_RA" or storage.option=seqStorageOption("LZMA_RA");
it generally achieves a higher compression ratio than zlib.
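To see the trade-off on real data, the sketch below converts the example VCF with each of the compression methods named above and reports the resulting file sizes. It assumes SeqArray is installed. The example file is small, so the relative sizes are only indicative of what a large data set would show.

```r
library(SeqArray)

vcf.fn <- seqExampleFileName("vcf")

# Compression methods described above
methods <- c("ZIP_RA", "ZIP_RA.max", "LZ4_RA", "LZMA_RA")

# GDS file size (bytes) obtained with each method
sizes <- sapply(methods, function(m) {
    fn <- tempfile(fileext=".gds")
    seqVCF2GDS(vcf.fn, fn, storage.option=m, verbose=FALSE)
    file.info(fn)$size
})
print(sizes)
```

LZ4 usually favours speed over ratio, while LZMA favours ratio over speed, so the best choice depends on whether the GDS file is mainly written once and read often or rebuilt frequently.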