UCSCFile-class: UCSCFile objects

Description

These functions support the import and export of tracks embedded within the UCSC track line metaformat, whereby multiple tracks may be concatenated within a single file, along with metadata mostly oriented towards visualization. Any UCSCData object is automatically exported in this format if the targeted format is known to be compatible. The BED and WIG import methods check for a track line and delegate to these functions if one is found. Thus, calling this API directly is only necessary when importing embedded GFF (rare), or when one wants to create the track line during the export process.

Usage

"import"(con, format, text, subformat = "auto", drop = FALSE, genome = NA, ...)
import.ucsc(con, ...)
"export"(object, con, format, ...)
"export"(object, con, format, ...)
"export"(object, con, format, append = FALSE, index = FALSE, ...)
"export"(object, con, format, subformat = "auto", append = FALSE, index = FALSE, ...)
export.ucsc(object, con, ...)

Arguments

con

A path, URL, connection or UCSCFile object. For the functions ending in .ucsc, the file format is indicated by the function name. For the base export and import functions, “ucsc” must be passed as the format argument.

object

The object to export. It should be a GRanges or something coercible to a GRanges. To export multiple tracks, pass a GenomicRangesList, or something coercible to one.

format

If not missing, should be “ucsc”.

text

If con is missing, a character vector to use as the input.

subformat

The file format to use for the actual features, between the track lines. Must be a text-based format that is compatible with track lines (most are). If an RTLFile subclass other than UCSCFile is passed as con to import.ucsc or export.ucsc, the subformat is assumed to be the corresponding format of con. Otherwise it defaults to “auto”. The “auto” mode works as follows. For import, the subformat is taken from the type field in the track line. If there is none, the file extension is consulted. For export, if object is a UCSCData, the subformat is taken from the type in its track line, if present. Otherwise, the subformat is chosen based on whether object contains a “score” column. If there is a score, the target is either BEDGraph or WIG, depending on the structure of the ranges. If there is no score, BED is the target.

genome

The identifier of a genome, or NA if unknown. Typically, this is a UCSC identifier like “hg19”. An attempt will be made to derive the seqinfo on the return value using either an installed BSgenome package or UCSC, if network access is available. This defaults to the db BED track line parameter, if any.

drop

If TRUE, and there is only one track in the file, return the track object directly, rather than embedding it in a list.

append

If TRUE, and con points to a file path, the data is appended to the file. If con is a connection, the data is always appended.

index

If TRUE, automatically compress and index the output file with bgzf and tabix. Note that tabix indexing will sort the data by chromosome and start. Tabix supports only a single track per file.

...

Either track line parameters, or arguments to pass down to the import or export routine for the subformat.
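The “auto” rules given for the subformat argument above can be pictured as a small decision function. The following is a minimal sketch in base R, not the package's implementation: the helper names auto_import_subformat and auto_export_subformat are hypothetical, and the fits_wig flag is an assumed stand-in for the check on the structure of the ranges, which this page does not spell out.

```r
# Hypothetical helpers mirroring the "auto" rules described for `subformat`.
# They are illustrative only and are not part of rtracklayer.

# Import: prefer the type field of the track line, then the file extension.
auto_import_subformat <- function(track_line_type = NULL, path = "") {
  if (!is.null(track_line_type) && nzchar(track_line_type)) {
    return(track_line_type)
  }
  ext <- tools::file_ext(path)
  if (nzchar(ext)) ext else NA_character_
}

# Export: prefer the type in the track line; otherwise look at the score.
# `fits_wig` is an assumed placeholder for the range-structure check.
auto_export_subformat <- function(track_line_type = NULL, has_score,
                                  fits_wig = FALSE) {
  if (!is.null(track_line_type) && nzchar(track_line_type)) {
    return(track_line_type)
  }
  if (!has_score) {
    return("BED")
  }
  if (fits_wig) "WIG" else "BEDGraph"
}

auto_import_subformat(NULL, "peaks.bed")
auto_export_subformat(NULL, has_score = TRUE)
auto_export_subformat(NULL, has_score = FALSE)
```

In practice, passing subformat explicitly avoids relying on these rules at all.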

Value

A GenomicRangesList of UCSCData objects, unless drop is TRUE and there is only a single track in the file. In that case, the first and only object is extracted from the list and returned. The structure of that object depends on the format of the data.

UCSCFile objects

The UCSCFile class extends RTLFile and is a formal representation of a resource in the UCSC format. To cast a path, URL or connection to a UCSCFile, pass it to the UCSCFile constructor.

Details

The UCSC track line metaformat permits the storage of multiple tracks in a single file by separating them with a so-called “track line”, a line beginning with the word “track” and containing various key=value pairs encoding metadata, mostly related to visualization. The standard fields in a track line depend on the type of track being annotated. See TrackLine and its derivatives for how these lines are represented in R. The class UCSCData is an extension of GRanges with a formal slot for a TrackLine. Each GRanges in the returned GenomicRangesList has the track line stored in its metadata, under the trackLine key.
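To make the layout concrete, the sketch below splits a small in-memory file into tracks and parses each track line into its key=value fields using base R only. It is an illustration of the text format, not of how the package parses it: the helper parse_track_line and the sample lines are hypothetical, and the regular expression assumes that values containing spaces are wrapped in double quotes.

```r
# Hypothetical helper: split one track line into a named list of fields.
parse_track_line <- function(line) {
  stopifnot(grepl("^track([[:space:]]|$)", line))
  # key=value pairs; a value may be double-quoted so it can contain spaces.
  pattern <- "[[:alnum:]_]+=(\"[^\"]*\"|[^[:space:]]+)"
  pairs <- regmatches(line, gregexpr(pattern, line))[[1]]
  keys <- sub("=.*$", "", pairs)
  values <- sub("^[^=]+=", "", pairs)
  values <- gsub("^\"|\"$", "", values)  # strip surrounding quotes
  fields <- as.list(values)
  names(fields) <- keys
  fields
}

# Made-up sample content: two tracks concatenated in one file.
lines <- c(
  "track type=bedGraph name=\"first\" visibility=full",
  "chr1\t0\t100\t1.5",
  "chr1\t100\t200\t2.5",
  "track name=\"second\"",
  "chr2\t10\t20"
)

# Each track line starts a new block; cumsum() labels the blocks.
is_track_line <- grepl("^track([[:space:]]|$)", lines)
tracks <- lapply(split(lines, cumsum(is_track_line)), function(block) {
  list(line = parse_track_line(block[1]), features = block[-1])
})

fields <- parse_track_line("track type=bedGraph name=\"R Track\" visibility=full")
```

Here tracks holds one element per track, each pairing the parsed track line with the raw feature lines that follow it.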

For each track object to be exported, if the object is not a UCSCData, and there is no trackLine element in the metadata, then a new track line needs to be generated. This happens through the coercion of object to UCSCData. The track line is initialized to have the appropriate type parameter for the subformat, and the required name parameter is taken from the name of the track in the input list (if any). Otherwise, the default is simply “R Track”. The db parameter (specific to BED track lines) is taken as genome(object) if not NA. Additional arguments passed to the export routines override parameters in the provided track line.

If the subformat is either WIG or BEDGraph, and the features are stranded, a separate track will be output in the file for each strand. Neither of those formats encodes the strand, and both disallow overlapping features (which might occur upon destranding).
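The effect of splitting by strand can be sketched with a plain data frame in base R. This is a rough illustration under stated assumptions rather than the package's export code: the helper stranded_bedgraph_lines is hypothetical, the coordinates are taken to be 0-based and half-open already, and appending the strand to the track name is a naming choice made here for readability.

```r
# Toy stranded, scored features. The "-" feature overlaps the first "+"
# feature, so a single destranded bedGraph track would be invalid.
features <- data.frame(
  chrom = c("chr1", "chr1", "chr1"),
  start = c(0L, 50L, 200L),
  end = c(100L, 150L, 300L),
  strand = c("+", "-", "+"),
  score = c(1.5, 2.0, 0.5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: emit one bedGraph track per strand.
stranded_bedgraph_lines <- function(df, name = "R Track") {
  by_strand <- split(df, df$strand)
  blocks <- lapply(names(by_strand), function(s) {
    part <- by_strand[[s]]
    # Assumed naming scheme: the strand is appended to the track name.
    track_line <- sprintf("track type=bedGraph name=\"%s %s\"", name, s)
    rows <- paste(part$chrom, part$start, part$end, part$score, sep = "\t")
    c(track_line, rows)
  })
  unlist(blocks)
}

out <- stranded_bedgraph_lines(features)
writeLines(out)
```

Each resulting block contains only non-overlapping features from one strand, which is what allows the strandless formats to represent the data.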

References

http://genome.ucsc.edu/goldenPath/help/customTrack.html