A UCSCData object is automatically exported in this format if the
targeted format is known to be compatible. The BED and WIG import
methods check for a track line and delegate to these functions if one
is found. Thus, calling this API directly is only necessary when
importing embedded GFF (rare), or when one wants to create the track
line during the export process.
"import"(con, format, text, subformat = "auto", drop = FALSE, genome = NA, ...)
import.ucsc(con, ...)
"export"(object, con, format, ...)
"export"(object, con, format, ...)
"export"(object, con, format, append = FALSE, index = FALSE, ...)
"export"(object, con, format, subformat = "auto", append = FALSE, index = FALSE, ...)
export.ucsc(object, con, ...)
con: A path, URL, connection or UCSCFile object. For the functions
ending in .ucsc, the file format is indicated by the function name.
For the base export and import functions, "ucsc" must be passed as
the format argument.
object: The object to export, a GRanges or something coercible to a
GRanges. To export multiple tracks, pass a GenomicRangesList, or
something coercible to one.
text: If con is missing, a character vector to use as the input.
subformat: If an RTLFile subclass other than UCSCFile is passed as con
to import.ucsc or export.ucsc, the subformat is assumed to be the
corresponding format of con. Otherwise it defaults to "auto". The
following describes the logic of the "auto" mode. For import, the
subformat is taken from the type field in the track line. If there is
none, the file extension is consulted. For export, if object is a
UCSCData, the subformat is taken from the type in its track line, if
present. Otherwise, the subformat is chosen based on whether object
contains a score column. If there is a score, the target is either
BEDGraph or WIG, depending on the structure of the ranges. Otherwise,
BED is the target.
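
The "auto" rules above can be sketched in base R. This is only an illustration of the documented decision order, not rtracklayer code: the helper names guess_import_subformat() and guess_export_subformat() are hypothetical, and the choice between BEDGraph and WIG is left open because the text only says it depends on the structure of the ranges.

```r
# Hypothetical helpers mirroring the documented "auto" rules; not rtracklayer API.

# Import: prefer the track line's type field, else fall back to the file extension.
guess_import_subformat <- function(track_type = NA_character_, path = "") {
  if (!is.na(track_type)) {
    return(track_type)
  }
  tools::file_ext(path)
}

# Export: prefer the type in an existing track line; otherwise decide by score.
guess_export_subformat <- function(track_type = NA_character_, has_score = FALSE) {
  if (!is.na(track_type)) {
    return(track_type)
  }
  if (has_score) {
    # Assumption: both candidates are returned, since the real choice between
    # them depends on the structure of the ranges.
    return(c("bedGraph", "wig"))
  }
  "bed"
}

guess_import_subformat(NA, "peaks.bed")
guess_export_subformat(has_score = TRUE)
guess_export_subformat()
```
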
genome: The identifier of a genome, or NA if unknown. Typically, this
is a UCSC identifier like "hg19". An attempt will be made to derive
the seqinfo on the return value using either an installed BSgenome
package or UCSC, if network access is available. This defaults to the
db BED track line parameter, if any.
drop: If TRUE, and there is only one track in the file, return the
track object directly, rather than embedding it in a list.
append: If TRUE, and con points to a file path, the data is appended
to the file. If con is a connection, the data is always appended.
index: If TRUE, automatically compress and index the output file with
bgzf and tabix. Note that tabix indexing will sort the data by
chromosome and start. Tabix supports only a single track in a file.
A GenomicRangesList is returned unless drop is TRUE and there is only
a single track in the file. In that case, the first and only object
is extracted from the list and returned. The structure of that object
depends on the format of the data. The GenomicRangesList contains
UCSCData objects.
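
As a rough illustration of the return value, the following sketch writes a small track with export.ucsc and reads it back with and without drop. It assumes the Bioconductor package rtracklayer is installed (the block is skipped otherwise). The feature coordinates are made up, and the subformat is passed explicitly on import rather than relying on the "auto" mode.

```r
# Assumes the Bioconductor package rtracklayer is installed; skipped otherwise.
if (requireNamespace("rtracklayer", quietly = TRUE)) {
  library(rtracklayer)  # also attaches GenomicRanges and IRanges

  # Two made-up features without a score column, so export should target BED.
  gr <- GRanges("chr1", IRanges(start = c(1, 101), width = 50))

  path <- tempfile(fileext = ".bed")
  export.ucsc(gr, path)  # a track line is generated during export

  # Default: a list-like object with one element per track in the file.
  tracks <- import.ucsc(path, subformat = "bed")

  # drop = TRUE: the single track itself rather than a list of length one.
  single <- import.ucsc(path, subformat = "bed", drop = TRUE)

  print(length(tracks))
  print(class(single))
}
```
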
The UCSCFile class extends RTLFile and is a formal representation of
a resource in the UCSC format. To cast a path, URL or connection to a
UCSCFile, pass it to the UCSCFile constructor.

A track line contains key=value pairs encoding metadata, most of it
related to visualization. The standard fields in a track depend on
the type of track being annotated. See TrackLine and its derivatives
for how these lines are represented in R. The class UCSCData is an
extension of GRanges with a formal slot for a TrackLine.
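
To make the key=value structure concrete, here is a minimal base R sketch that splits a track line into its fields. The example line and the helper parse_track_line() are hypothetical and used only for illustration. rtracklayer itself represents track lines with the TrackLine classes rather than a plain list.

```r
# Hypothetical helper for illustration; not part of rtracklayer.
parse_track_line <- function(line) {
  # Match key=value pairs whose value is either double-quoted or a bare token.
  pattern <- "([[:alnum:]_]+)=(\"[^\"]*\"|[^[:space:]]+)"
  pairs <- regmatches(line, gregexpr(pattern, line))[[1]]
  keys <- sub("=.*$", "", pairs)
  values <- sub("^[^=]+=", "", pairs)
  values <- gsub("^\"|\"$", "", values)  # strip surrounding quotes
  setNames(as.list(values), keys)
}

# A made-up track line with a quoted name, a type, and a db field.
line <- "track name=\"R Track\" type=bedGraph db=hg19"
str(parse_track_line(line))
```
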
Each GRanges in the returned GenomicRangesList has the track line
stored in its metadata, under the trackLine key. For each track
object to be exported, if the object is not a UCSCData, and there is
no trackLine element in the metadata, then a new track line needs to
be generated. This happens through the coercion of object to
UCSCData. The track line is initialized to have the appropriate type
parameter for the subformat, and the required name parameter is taken
from the name of the track in the input list (if any). Otherwise, the
default is simply "R Track". The db parameter (specific to BED track
lines) is taken as genome(object) if not NA. Additional arguments
passed to the export routines override parameters in the provided
track line.
If the subformat is either WIG or BEDGraph, and the features are stranded, a separate track will be output in the file for each strand. Neither format encodes the strand, and both disallow overlapping features (which might otherwise occur upon destranding).
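
The reason for splitting by strand can be seen in a small base R sketch. A plain data frame with made-up coordinates stands in for a stranded, scored GRanges. This is not how rtracklayer implements the split, only an illustration of why per-strand tracks avoid overlaps.

```r
# Illustrative only: a data frame stands in for a stranded, scored GRanges.
features <- data.frame(
  chrom = c("chr1", "chr1", "chr1"),
  start = c(1L, 1L, 101L),
  end = c(50L, 50L, 150L),
  strand = c("+", "-", "+"),
  score = c(2.5, 1.0, 4.0)
)

# One group per strand; each group would be written as its own track,
# without the strand column, since WIG and BEDGraph cannot store it.
by_strand <- split(features[, c("chrom", "start", "end", "score")],
                   features$strand)

# The "+" and "-" features at chr1:1-50 overlap each other, but they land in
# different groups, so no single track contains overlapping features.
names(by_strand)
vapply(by_strand, nrow, integer(1))
```
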