gdsfmt: R Interface to CoreArray Genomic Data Structure (GDS) files

GNU Lesser General Public License, LGPL-3

Features

This package provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files. GDS files are portable across platforms and use a hierarchical structure to store multiple scalable, array-oriented data sets together with their metadata. The format is suited to large-scale datasets, especially data much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers of less than 8 bits, since a single genetic/genomic variant, such as a single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. A GDS file can also be read in parallel by multiple R processes via the parallel package.
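
The sub-byte storage and compressed random access described above can be sketched as follows. This is a minimal illustration, not taken from the package documentation: the temporary file, the node name "geno", and the simulated values are assumptions, and the "ZIP_RA" compression option is assumed to be available in the installed gdsfmt version.

```r
library(gdsfmt)

# Work in a temporary file (assumed name) so nothing is left behind
fn <- tempfile(fileext=".gds")
f <- createfn.gds(fn)

# Simulated genotype-like values in 0..3, which need only 2 bits each
set.seed(1)
geno <- sample(0:3, 10000, replace=TRUE)

# storage="bit2" packs four values into one byte; compress="ZIP_RA" requests
# zlib compression with random access, and closezip=TRUE finishes the
# compressed stream so the node can be read immediately
n <- add.gdsn(f, "geno", val=geno, storage="bit2",
    compress="ZIP_RA", closezip=TRUE)

# Random access: read 5 values starting at element 5001
part <- read.gdsn(n, start=5001, count=5)

closefn.gds(f)
unlink(fn)
```

Here part should hold the same values as geno[5001:5005], read without loading the whole node into memory.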

Bioconductor:

Release Version: v1.8.3

http://www.bioconductor.org/packages/release/bioc/html/gdsfmt.html

Help Documents

Development Version: v1.9.3

http://www.bioconductor.org/packages/devel/bioc/html/gdsfmt.html

Help Documents

Package Vignettes

http://corearray.sourceforge.net/tutorials/gdsfmt/

http://www.bioconductor.org/packages/devel/bioc/vignettes/gdsfmt/inst/doc/gdsfmt_vignette.html

Citation

Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS (2012). A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. Bioinformatics. DOI: 10.1093/bioinformatics/bts606.

Package Maintainer

Dr. Xiuwen Zheng (zhengx@u.washington.edu)

URL

http://github.com/zhengxwen/gdsfmt

http://www.bioconductor.org/packages/gdsfmt

Installation

  • Bioconductor repository:
source("http://bioconductor.org/biocLite.R")
biocLite("gdsfmt")
  • Development version from Github:
library("devtools")
install_github("zhengxwen/gdsfmt")

The install_github() approach requires that you build from source, i.e. make and compilers must be installed on your system -- see the R FAQ for your operating system; you may also need to install dependencies manually.

Copyright Notice

  • CoreArray C++ library, LGPL-3 License, 2007-2016, Xiuwen Zheng
  • zlib, zlib License, 1995-2016, Jean-loup Gailly and Mark Adler
  • LZ4, BSD 2-clause License, 2011-2016, Yann Collet
  • liblzma, public domain, 2005-2016, Lasse Collin and other xz contributors
  • README

GDS Command-line Tools

In the R environment,

install.packages("getopt", repos="http://cran.r-project.org")
install.packages("optparse", repos="http://cran.r-project.org")
install.packages("crayon", repos="http://cran.r-project.org")

source("http://bioconductor.org/biocLite.R")
biocLite("gdsfmt")

viewgds

viewgds is a command-line script written in R (viewgds.R) for viewing the contents of a GDS file. The R packages gdsfmt, getopt and optparse must be installed before running viewgds; the crayon package is optional.

Usage: viewgds [options] file

Installation from the command line:

echo '#!' `which Rscript` '--vanilla' > viewgds
curl -L https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/viewgds.R >> viewgds
chmod +x viewgds

## Or
echo '#!' `which Rscript` '--vanilla' > viewgds
wget -qO- --no-check-certificate https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/viewgds.R >> viewgds
chmod +x viewgds
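
A similar overview can be obtained from within an R session. The following is a minimal sketch, not the viewgds implementation; the temporary file and the node name "int" are stand-ins for an existing GDS file.

```r
library(gdsfmt)

# Build a small file to inspect (stand-in for any existing GDS file)
fn <- tempfile(fileext=".gds")
f <- createfn.gds(fn)
add.gdsn(f, "int", val=1:10)
closefn.gds(f)

# Open read-only (the default) and print the node tree
f <- openfn.gds(fn)
print(f)

# Names of the top-level nodes as a character vector
node_names <- ls.gdsn(f)

closefn.gds(f)
unlink(fn)
```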

diffgds

diffgds is a command-line script written in R (diffgds.R) for comparing two GDS files. The R packages gdsfmt, getopt and optparse must be installed before running diffgds.

Usage: diffgds [options] file1 file2

Installation from the command line:

echo '#!' `which Rscript` '--vanilla' > diffgds
curl -L https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/diffgds.R >> diffgds
chmod +x diffgds

## Or
echo '#!' `which Rscript` '--vanilla' > diffgds
wget -qO- --no-check-certificate https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/diffgds.R >> diffgds
chmod +x diffgds
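
The idea of comparing two GDS files can also be sketched directly in R. This is an illustration only and does not reproduce what diffgds does: the helpers make_gds() and diff_nodes() are hypothetical, and only top-level nodes present in both files are compared.

```r
library(gdsfmt)

# Hypothetical helper: write a small GDS file with two nodes
make_gds <- function(x) {
    fn <- tempfile(fileext=".gds")
    f <- createfn.gds(fn)
    add.gdsn(f, "int", val=x)
    add.gdsn(f, "double", val=c(0.5, 1.5))
    closefn.gds(f)
    fn
}

# Hypothetical helper: names of shared top-level nodes whose data differ
diff_nodes <- function(fn1, fn2) {
    f1 <- openfn.gds(fn1); on.exit(closefn.gds(f1), add=TRUE)
    f2 <- openfn.gds(fn2); on.exit(closefn.gds(f2), add=TRUE)
    shared <- intersect(ls.gdsn(f1), ls.gdsn(f2))
    same <- vapply(shared, function(nm) {
        identical(read.gdsn(index.gdsn(f1, nm)),
            read.gdsn(index.gdsn(f2, nm)))
    }, logical(1))
    shared[!same]
}

a <- make_gds(1:5)
b <- make_gds(c(1:4, 9L))   # differs from 'a' in the last element of "int"
changed <- diff_nodes(a, b)
unlink(c(a, b))
```

In this sketch changed contains only "int", since the "double" node is identical in both files.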

Examples

library(gdsfmt)

# create a GDS file
f <- createfn.gds("test.gds")

add.gdsn(f, "int", val=1:10000)
add.gdsn(f, "double", val=seq(1, 1000, 0.4))
add.gdsn(f, "character", val=c("int", "double", "logical", "factor"))
add.gdsn(f, "logical", val=rep(c(TRUE, FALSE, NA), 50))
add.gdsn(f, "factor", val=as.factor(c(NA, "AA", "CC")))
add.gdsn(f, "bit2", val=sample(0:3, 1000, replace=TRUE), storage="bit2")

# list and data.frame
add.gdsn(f, "list", val=list(X=1:10, Y=seq(1, 10, 0.25)))
add.gdsn(f, "data.frame", val=data.frame(X=1:19, Y=seq(1, 10, 0.5)))

folder <- addfolder.gdsn(f, "folder")
add.gdsn(folder, "int", val=1:1000)
add.gdsn(folder, "double", val=seq(1, 100, 0.4))

# show the contents
f

# close the GDS file
closefn.gds(f)

Printing f before closing displays the file structure:

File: test.gds (1.1K)
+    [  ]
|--+ int   { Int32 10000, 39.1K }
|--+ double   { Float64 2498, 19.5K }
|--+ character   { Str8 4, 26B }
|--+ logical   { Int32,logical 150, 600B } *
|--+ factor   { Int32,factor 3, 12B } *
|--+ bit2   { Bit2 1000, 250B }
|--+ list   [ list ] *
|  |--+ X   { Int32 10, 40B }
|  \--+ Y   { Float64 37, 296B }
|--+ data.frame   [ data.frame ] *
|  |--+ X   { Int32 19, 76B }
|  \--+ Y   { Float64 19, 152B }
\--+ folder   [  ]
   |--+ int   { Int32 1000, 3.9K }
   \--+ double   { Float64 248, 1.9K }
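
Reading data back follows the same pattern. The sketch below recreates two of the nodes from the example in a temporary file (an assumption, to keep it self-contained), reopens the file read-only, locates nodes by path, and reads a slice rather than the whole array.

```r
library(gdsfmt)

# Recreate part of the example in a temporary file
fn <- tempfile(fileext=".gds")
f <- createfn.gds(fn)
add.gdsn(f, "int", val=1:10000)
folder <- addfolder.gdsn(f, "folder")
add.gdsn(folder, "double", val=seq(1, 100, 0.4))
closefn.gds(f)

# Reopen read-only and navigate to nodes by path
f <- openfn.gds(fn)
n_int <- index.gdsn(f, "int")
n_dbl <- index.gdsn(f, "folder/double")

# Read a slice: 5 elements starting at position 101 (values 101..105)
slice <- read.gdsn(n_int, start=101, count=5)

# The node description includes the dimension of the stored array
len_dbl <- objdesp.gdsn(n_dbl)$dim

closefn.gds(f)
unlink(fn)
```

The start and count arguments keep memory use proportional to the slice, which matters when a node is much larger than available RAM.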

Last Published: February 15th, 2017

Functions in gdsfmt (1.8.3)

  • compression.gdsn: Modify compression mode
  • cnt.gdsn: Return the number of child nodes
  • sync.gds: Synchronize a GDS file
  • system.gds: Get the parameters in the GDS system
  • getfile.gdsn: Output a file from a stream container
  • getfolder.gdsn: Get the folder
  • readex.gdsn: Read data field of a GDS node with a selection
  • readmode.gdsn: Switch to read mode in the compression settings
  • cleanup.gds: Clean up fragments
  • cache.gdsn: Caching variable data
  • delete.attr.gdsn: Delete attribute(s)
  • delete.gdsn: Delete a GDS node
  • moveto.gdsn: Relocate a GDS node
  • name.gdsn: Return the variable name of a node
  • print.gds.class: Show the information of class "gds.class" and "gdsn.class"
  • permdim.gdsn: Array Transposition
  • closefn.gds: Close a GDS file
  • clusterApply.gdsn: Apply functions over matrix margins in parallel
  • lasterr.gds: Return the last error message
  • ls.gdsn: Return the names of child nodes
  • apply.gdsn: Apply functions over margins
  • diagnosis.gds: Diagnose the GDS file
  • assign.gdsn: Assign/append data to a GDS node
  • addfolder.gdsn: Add a folder to the GDS node
  • append.gdsn: Append data to a specified variable
  • copyto.gdsn: Copy GDS nodes
  • rename.gdsn: Rename a GDS node
  • createfn.gds: Create a GDS file
  • setdim.gdsn: Set the dimension of data field
  • add.gdsn: Add a new GDS node
  • addfile.gdsn: Add a GDS node with a file
  • gds.class: the class of GDS file
  • gdsfmt-package: R Interface to CoreArray Genomic Data Structure (GDS) files
  • index.gdsn: Return the specified node
  • is.element.gdsn: whether the elements are in a set
  • read.gdsn: Read data field of a GDS node
  • put.attr.gdsn: Add an attribute into a GDS node
  • showfile.gds: Enumerate opened GDS files
  • summarize.gdsn: GDS object Summaries
  • digest.gdsn: create hash function digests
  • gdsn.class: the class of variable node in the GDS file
  • get.attr.gdsn: Get attributes
  • objdesp.gdsn: Variable description
  • openfn.gds: Open a GDS file
  • write.gdsn: Write data to a GDS node
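
Several of the functions listed above are typically used together. The following is a minimal sketch under stated assumptions: the temporary file, the node name "mat", and the attribute name "units" are made up for illustration, and appended data are assumed to extend the array along its last dimension (here, as a new column).

```r
library(gdsfmt)

fn <- tempfile(fileext=".gds")
f <- createfn.gds(fn)

# A 4 x 3 integer matrix stored as a node
n <- add.gdsn(f, "mat", val=matrix(1:12, nrow=4))

# Attach an attribute to the node and read all attributes back as a list
put.attr.gdsn(n, "units", "count")
attrs <- get.attr.gdsn(n)

# Append four more values; the matrix gains a fourth column
append.gdsn(n, 13:16)
dims <- objdesp.gdsn(n)$dim

# Apply a function over columns (margin=2); the default result is a list
col_sums <- unlist(apply.gdsn(n, margin=2, FUN=sum))

# Read a selection with one logical vector per dimension:
# rows 1 and 3, all four columns
sel <- readex.gdsn(n, list(c(TRUE, FALSE, TRUE, FALSE), rep(TRUE, 4)))

closefn.gds(f)
unlink(fn)
```

apply.gdsn() processes one margin at a time, so the full array does not need to fit in memory; clusterApply.gdsn() offers the same pattern across multiple R processes.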