The functions import and export load and save objects from and to particular file formats. The rtracklayer package implements support for a number of annotation and sequence formats.
export(object, con, format, ...)
import(con, format, text, ...)
If con is an RTLFile derivative, the data is loaded from or saved to the underlying resource. If con is missing, the function returns the output as a character vector rather than writing to a connection.
If format is missing and con is a filename, the format is derived from the file extension. The format argument is unnecessary when con is a derivative of RTLFile.
If con is missing, text can be a character vector directly providing the string data to import.
If con is missing, export returns a character vector containing the string output. Otherwise, nothing is returned.
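The argument rules above can be illustrated with a short round trip. This is a minimal sketch, assuming rtracklayer is installed; the object names (gr, bed_lines, bed_path) and the toy ranges are hypothetical and chosen only for illustration.

```r
library(rtracklayer)  # attaches GenomicRanges, which provides GRanges()

# Hypothetical toy ranges, not taken from any real annotation.
gr <- GRanges("chr1", IRanges(start = c(1, 101), width = 50))

# con is missing, so export() returns the BED text as a character vector.
bed_lines <- export(gr, format = "bed")

# con is missing, so import() reads the string data passed via text.
gr_from_text <- import(text = bed_lines, format = "bed")

# con is an RTLFile derivative (BEDFile), so no format argument is needed,
# even though the temporary file name has no ".bed" extension.
bed_path <- tempfile()
export(gr, BEDFile(bed_path))
gr_from_file <- import(BEDFile(bed_path))
```

Wrapping the path in a format-specific file class is useful when a file name does not carry a recognizable extension.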
Each supported file format is represented as a subclass of RTLFile. Below, we list the major supported formats, with some advice for when a particular file format is appropriate:
UCSC supports GFF1, but it needs to be encapsulated in the UCSC metaformat, i.e. export.ucsc(subformat = "gff1"). The BED format is typically preferred over GFF for interaction with UCSC. GFF files can be indexed with the tabix utility for fast range-based queries via rtracklayer and Rsamtools.
For large data, BigWig is preferred.
For WIG data with non-uniform interval widths, consider bedGraph. For large data, consider BigWig.
BigWig is a binary version of both bedGraph and WIG (which are now somewhat obsolete). A BigWig file contains a spatial index for fast range-based queries and also embeds summary statistics of the scores at several zoom levels. Thus, it is ideal for visualizing, and computing in parallel on, genome-scale vectors such as the coverage from a high-throughput sequencing experiment.
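The range-based querying described for BigWig can be sketched as follows. This is an illustrative example, assuming rtracklayer is installed and BigWig support is available on the platform; the scores, sequence length, and object names are hypothetical.

```r
library(rtracklayer)

# Hypothetical toy scores; BigWig export needs a numeric "score" column,
# non-overlapping ranges, and known sequence lengths.
cov_gr <- GRanges("chr1", IRanges(start = c(1, 11), width = 10),
                  score = c(1.5, 3),
                  seqlengths = c(chr1 = 100))

bw_path <- tempfile(fileext = ".bw")
export(cov_gr, bw_path)  # format derived from the ".bw" extension

# Range-based query: only records overlapping chr1:1-10 are read back.
query <- GRanges("chr1", IRanges(1, 10))
hit <- import(bw_path, which = query)
```

Restricting the import with which avoids loading the whole file, which is the main practical benefit of the embedded index.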
In summary, for the typical use case of combining gene models with experimental data, GFF is preferred for gene models and BigWig is preferred for quantitative score vectors. Note that the Rsamtools package provides support for the BAM file format (for representing read alignments), among others. Based on this, the rtracklayer package provides an export method for writing GAlignments and GappedReads objects as BAM. For variants, consider VCF, supported by the VariantAnnotation package.
There is also support for reading and writing biological sequences, including the UCSC TwoBit format for compactly storing a genome sequence along with a mask. The files are binary, so they are efficiently queried for particular ranges. A similar format is FA, supported by Rsamtools.
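Querying a TwoBit file for a particular range might look like the sketch below. It assumes that rtracklayer and Biostrings are installed; the ten-base "genome" and the object names are hypothetical.

```r
library(rtracklayer)
library(Biostrings)  # provides DNAStringSet()

# Hypothetical toy "genome" with a single short sequence.
genome <- DNAStringSet(c(chr1 = "ACGTACGTAA"))

# Wrap the path in TwoBitFile so the format is explicit.
twobit <- TwoBitFile(tempfile(fileext = ".2bit"))
export(genome, twobit)

# Query only positions 3 to 6 of chr1 rather than loading the whole file.
sub_seq <- import(twobit, which = GRanges("chr1", IRanges(3, 6)))
```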
library(rtracklayer)

track <- import(system.file("tests", "v1.gff", package = "rtracklayer"))
## Not run:
# export(track, "my.gff", version = "3")
## equivalently,
# export(track, "my.gff3")
## or
# con <- file("my.gff3")
# export(track, con, "gff3")
# close(con)
## End(Not run)
## or as a string
export(track, format = "gff3")