SeqArray (version 1.12.5)

seqMerge: Merge Multiple SeqArray GDS Files

Description

Merges multiple SeqArray GDS files.

Usage

seqMerge(gds.fn, out.fn, storage.option="ZIP_RA", info.var=NULL, fmt.var=NULL, samp.var=NULL, optimize=TRUE, digest=TRUE, verbose=TRUE)

Arguments

gds.fn
a character vector of the GDS file names to merge
out.fn
the output file name
storage.option
the storage and compression options, by default seqStorageOption("ZIP_RA"); use "LZMA_RA" for the LZMA compression algorithm, which gives a higher compression ratio
info.var
a character vector of variable name(s) in the INFO field; NULL for all variables, or character() to exclude all INFO variables
fmt.var
a character vector of variable name(s) in the FORMAT field; NULL for all variables, or character() to exclude all FORMAT variables
samp.var
a character vector of variable name(s) in 'sample.annotation'; or NULL for all variables
optimize
if TRUE, optimize the access efficiency by calling cleanup.gds
digest
a logical value (TRUE/FALSE) or a character ("md5", "sha1", "sha256", "sha384" or "sha512"); add hash codes to the GDS file if TRUE or a digest algorithm is specified
verbose
if TRUE, show information
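
As an illustration of the arguments above, the following is a minimal sketch that writes a slimmed-down copy of the bundled example file with all INFO and FORMAT variables excluded. It assumes that SeqArray is installed, that a single input file is accepted (as stated in Details), and that the digest package needed for hash codes is available; the output name example_slim.gds is a hypothetical placeholder.

```r
library(SeqArray)

# Example GDS file shipped with SeqArray
src.fn <- seqExampleFileName("gds")

# Hypothetical output name, placed in the session's temporary directory
slim.fn <- file.path(tempdir(), "example_slim.gds")

slim.path <- seqMerge(
    src.fn, slim.fn,
    info.var = character(),  # exclude all INFO variables
    fmt.var  = character(),  # exclude all FORMAT variables
    samp.var = NULL,         # keep all 'sample.annotation' variables
    digest   = "md5",        # add MD5 hash codes to the output
    verbose  = FALSE
)

# Inspect the resulting file
seqSummary(slim.path)
```

The file is written under tempdir(), which R removes when the session ends, so no explicit cleanup is shown.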

Value

Returns the absolute path of the output GDS file.

Details

The function merges multiple SeqArray GDS files. Users can specify the compression method and level for the new GDS file. If gds.fn contains one file, users can change the storage type to create a new file.
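
The single-file case described above could look like the following sketch, which rewrites the bundled example file with the "LZMA_RA" storage option and reports both file sizes. It assumes SeqArray is installed; the output name example_lzma.gds is a hypothetical placeholder, and no particular size reduction is implied.

```r
library(SeqArray)

# Example GDS file shipped with SeqArray
src.fn <- seqExampleFileName("gds")

# Hypothetical output name, placed in the session's temporary directory
lzma.fn <- file.path(tempdir(), "example_lzma.gds")

# With one input file nothing is combined; the data are rewritten
# using the requested storage option
out.path <- seqMerge(src.fn, lzma.fn, storage.option="LZMA_RA", verbose=FALSE)

# On-disk sizes in bytes of the original and the rewritten file
sizes <- c(original=file.size(src.fn), lzma=file.size(out.path))
print(sizes)
```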

See Also

seqVCF2GDS, seqExport

Examples

library(SeqArray)

# the VCF file
vcf.fn <- seqExampleFileName("vcf")

# the number of variants
total.count <- seqVCF_Header(vcf.fn, getnum=TRUE)$num.variant

split.cnt <- 5
start <- integer(split.cnt)
count <- integer(split.cnt)

s <- (total.count+1) / split.cnt
st <- 1L
for (i in 1:split.cnt)
{
    z <- round(s * i)
    start[i] <- st
    count[i] <- z - st
    st <- z
}

fn <- paste0("tmp", 1:split.cnt, ".gds")

# convert to 5 gds files
for (i in 1:split.cnt)
    seqVCF2GDS(vcf.fn, fn[i], start=start[i], count=count[i])

# merge
seqMerge(fn, "tmp.gds")
seqSummary("tmp.gds")


####

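# the example GDS file, copied so that the copy can be modified
gds.fn <- seqExampleFileName("gds")
file.copy(gds.fn, "test.gds", overwrite=TRUE)

# modify 'sample.id' in the copy so that the two files hold different samples
f <- openfn.gds("test.gds", FALSE)
sid <- read.gdsn(index.gdsn(f, "sample.id"))
add.gdsn(f, "sample.id", paste("S", seq_along(sid)), replace=TRUE)
closefn.gds(f)

# merge the original file and the modified copy
seqMerge(c(gds.fn, "test.gds"), "output.gds")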



# delete the temporary files
unlink(c("tmp.gds", "test.gds", "output.gds"), force=TRUE)
unlink(fn, force=TRUE)
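
The start/count partition used in the first example can be checked on its own, without any GDS files. The sketch below is a vectorized restatement of that loop in base R; the variant count of 1000 is a hypothetical value chosen only for illustration, not the size of the example VCF file.

```r
# Hypothetical number of variants, for illustration only
total.count <- 1000L
split.cnt <- 5L

# Same arithmetic as the loop in the example, written without a loop
s <- (total.count + 1) / split.cnt
bounds <- round(s * seq_len(split.cnt))  # one past the last variant of each chunk
start <- c(1L, head(bounds, -1))         # first variant of each chunk
count <- bounds - start                  # number of variants in each chunk

# The chunks cover every variant exactly once and are contiguous
stopifnot(sum(count) == total.count)
stopifnot(all(start[-1] == start[-split.cnt] + count[-split.cnt]))

print(data.frame(start=start, count=count))
```

Because each chunk starts where the previous one ends, the per-chunk files passed to seqMerge contain disjoint sets of variants.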
