DataTrack-class: DataTrack class and methods

Description

A class to store numeric data values along genomic coordinates. Multiple samples as well as sample groupings are supported, with the restriction of equal genomic coordinates for a single observation across samples.

Usage

DataTrack(range=NULL, start=NULL, end=NULL, width=NULL, data, chromosome, strand, genome,
name="DataTrack", importFunction, stream=FALSE, ...)

Arguments

range

An optional meta argument to handle the different input types. If the range argument is missing, all the relevant information to create the object has to be provided as individual function arguments (see below).

The different input options for range are:

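A GRanges object: the coordinates, chromosome and genome information are taken from the object, and the numeric data values are extracted from its numeric metadata columns (non-numeric columns are dropped). Alternatively, the object may hold only the coordinates, with the data supplied through the separate data argument.

An IRanges object: only the coordinates are taken from the object. The data, and typically the chromosome and genome, have to be supplied as separate arguments.

A data.frame object: a table holding both the coordinates and the numeric data columns, with the chromosome given in a column named chromosome (see the 'Examples' section).

A character scalar: the path to an input data file, which is read by one of the default import functions or by the function supplied as importFunction.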

start, end, width

Integer vectors giving the start and end coordinates of the individual track items, or their width. Two of the three need to be specified, and they have to be either of equal length or of length one, in which case the single value is recycled. The usual R recycling rules for vectors do not apply beyond that, and any other combination of lengths causes the function to throw an error.
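As a minimal sketch of the two-out-of-three rule, the following builds the same five ranges once from start plus a recycled width and once from start plus end. The coordinates and track names are made up for illustration.

```r
library(Gviz)

## Five items, one sample: start plus a single width that is recycled
dat <- runif(5)
tr1 <- DataTrack(start=c(1, 11, 21, 31, 41), width=10, data=dat,
                 chromosome="chr1", genome="mm9", name="start + width")

## The same ranges expressed as start plus end
tr2 <- DataTrack(start=c(1, 11, 21, 31, 41), end=c(10, 20, 30, 40, 50),
                 data=dat, chromosome="chr1", genome="mm9", name="start + end")

## Both tracks should describe identical ranges
all(start(tr1) == start(tr2))
all(end(tr1) == end(tr2))
```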

data

A numeric matrix of data points with the number of columns equal to the number of coordinates in range, or a numeric vector of appropriate length that will be coerced into such a one-row matrix. Each individual row is supposed to contain data for a given sample, where the coordinates for each single observation are constant across samples. Depending on the plotting type of the data (see 'Details' and 'Display Parameters' sections), sample grouping or data aggregation may be available. Alternatively, this can be a character vector of column names that point into the element metadata of the range object for subsetting. Naturally, this is only supported when the range argument is of class GRanges.
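The column-name form of data can be sketched as follows. The metadata column names ctrl, treat and other are hypothetical and only serve to show that a subset of the numeric columns can be selected as samples.

```r
library(Gviz)
library(GenomicRanges)

## A GRanges object with three numeric metadata columns (hypothetical names)
gr <- GRanges(seqnames="chr1",
              ranges=IRanges(start=seq(1, 91, by=10), width=10),
              ctrl=runif(10), treat=runif(10), other=runif(10))

## Use only the "ctrl" and "treat" columns as the data part of the track
tr <- DataTrack(range=gr, data=c("ctrl", "treat"), genome="mm9",
                name="selected columns")

## One row per selected sample, one column per range
dim(values(tr))
```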

strand

Character vector, the strand information for the individual track items. Currently this has to be unique for the whole track and doesn't really have any visible consequences, but we might decide to make DataTracks strand-specific at a later stage.

chromosome

The chromosome on which the track's genomic ranges are defined. A valid UCSC chromosome identifier if options(ucscChromosomeNames=TRUE). Please note that in this case only syntactic checking takes place, i.e., the argument value needs to be an integer, a numeric character or a character of the form chrx, where x may be any possible string. The user has to make sure that the respective chromosome is indeed defined for the track's genome. If not provided here, the constructor will try to construct the chromosome information based on the available inputs, and as a last resort will fall back to the value chrNA. Please note that by definition all objects in the Gviz package can only have a single active chromosome at a time (although internally the information for more than one chromosome may be present), and the user has to call the chromosome<- replacement method in order to change to a different active chromosome.

genome

The genome on which the track's ranges are defined. Usually this is a valid UCSC genome identifier, but this is not formally checked at this point. If not provided here, the constructor will try to extract this information from the provided input, and will eventually fall back to the default value of NA.

name

Character scalar of the track's name used in the title panel when plotting.

importFunction

A user-defined function to be used to import the data from a file. This only applies when the range argument is a character string with the path to the input data file. The function needs to accept an argument file containing the file path and has to return a proper GRanges object with the data part attached as numeric metadata columns. Essentially the process is equivalent to constructing a DataTrack directly from a GRanges object, in that non-numeric columns will be dropped and further subsetting can be achieved by means of the data argument. A set of default import functions is already implemented in the package for a number of different file types, and one of these defaults will be picked automatically based on the extension of the input file name. If the extension cannot be mapped to any of the existing import functions, an error is raised asking for a user-defined import function. Currently the following file types can be imported with the default functions: wig, bigWig/bw, bedGraph and bam.

Some file types support indexing by genomic coordinates (e.g., bigWig and bam), and it makes sense to only load the part of the file that is needed for plotting. To this end, the Gviz package defines the derived ReferenceDataTrack class, which supports streaming data from the file system. The user typically does not have to deal with this distinction but may rely on the constructor function to make the right choice as long as the default import functions are used. However, once a user-defined import function has been provided and if this function adds support for indexed files, you will have to make the constructor aware of this fact by setting the stream argument to TRUE. Please note that in this case the import function needs to accept a second mandatory argument selection which is a GRanges object containing the dimensions of the plotted genomic range. As before, the function has to return an appropriate GRanges object.

stream

A logical flag indicating that the user-provided import function can deal with indexed files and knows how to process the additional selection argument when accessing the data on disk. This causes the constructor to return a ReferenceDataTrack object which will grab the necessary data on the fly during each plotting operation.
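A user-defined streaming import function could look roughly like the sketch below. It assumes that the rtracklayer package is used to read an indexed bigWig file; the function name myImport and the file name signal.bw are placeholders, so the constructor and plotting calls are left commented out.

```r
library(Gviz)
library(rtracklayer)

## Hypothetical import function for an indexed bigWig file.
## 'file' is the path handed over by the constructor, 'selection' is the
## GRanges object describing the currently plotted genomic region.
myImport <- function(file, selection) {
    ## Read only the part of the file that overlaps the selection;
    ## the result is a GRanges object with a numeric 'score' column.
    rtracklayer::import(file, format="bigWig", which=selection)
}

## "signal.bw" is a placeholder path, not a file shipped with any package.
# dtRef <- DataTrack(range="signal.bw", importFunction=myImport, stream=TRUE,
#                    genome="mm9", name="streamed data")
# plotTracks(dtRef, chromosome="chr1", from=1, to=100000)
```

Because stream=TRUE is set, the constructor would return a ReferenceDataTrack, and the import function would be called again with a new selection for each plotting operation.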

...

Additional items which will all be interpreted as further display parameters.

Value

The return value of the constructor function is a new object of class DataTrack or ReferenceDataTrack.

Objects from the class

Objects can be created using the constructor function DataTrack.

Extends

Class "NumericTrack", directly.

Class "RangeTrack", by class "NumericTrack", distance 2.

Class "GdObject", by class "NumericTrack", distance 3.

Details

Depending on the setting of the type display parameter, the data can be plotted in various different forms as well as combinations thereof. Supported plotting types are:

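p: simple dot plot.
l: lines plot.
b: combination of dot and lines plot.
a: lines plot of the sample-group average (i.e., mean) values.
s: stair steps (horizontal first).
S: stair steps (vertical first).
g: add grid lines.
r: add a linear regression line.
h: histogram lines.
smooth: add a loess curve.
histogram: histogram (bar width equal to range width).
mountain: 'mountain-type' plot relative to a baseline.
polygon: 'polygon-type' plot relative to a baseline.
boxplot: box and whisker plot.
gradient: false color image of the summarized values.
heatmap: false color image of the individual values.
horizon: horizon plot indicating magnitude and direction of change relative to a baseline.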

For some of the above plotting types the groups display parameter can be used to indicate sample sub-groupings. Its value is supposed to be a factor with one element per sample. In most cases, the groups are shown in different plotting colors and data aggregation operations are done in a stratified fashion.

The window display parameter can be used to aggregate the data prior to plotting. Its value is taken as the number of equal-sized windows along the genomic coordinates of the track for which to compute average values. The special value auto can be used to automatically determine a reasonable number of windows which can be particularly useful when plotting very large genomic regions with many data points.

The aggregation parameter can be set to define the aggregation function to be used when averaging in windows or across collapsed items. It takes the form of either a function which should condense a numeric vector into a single number, or one of the predefined options as character scalars "mean", "median" or "sum" for mean, median or summation, respectively. Defaults to computing mean values for each sample. Note that the predefined options can be much faster because they are optimized to work on large numeric tables.
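The interplay of the type, groups, window and aggregation display parameters can be sketched as follows. The data are random, and the group labels control and treated are made up for illustration.

```r
library(Gviz)

set.seed(1)
## Four samples (rows) measured at 100 ranges (columns)
dat <- matrix(runif(400), nrow=4)
dtTrack <- DataTrack(start=seq(1, 991, by=10), width=10, data=dat,
                     chromosome="chr1", genome="mm9", name="random data")

## Two groups of two samples each (hypothetical labels)
grp <- factor(c("control", "control", "treated", "treated"))

## Group averages as lines plus individual points, colored by group
plotTracks(dtTrack, type=c("a", "p"), groups=grp)

## Aggregate into 20 equal-sized windows using the predefined median
plotTracks(dtTrack, type="l", window=20, aggregation="median")

## Let the package choose the number of windows and use a custom function
plotTracks(dtTrack, type="l", window="auto",
           aggregation=function(x) max(x, na.rm=TRUE))
```

The custom function in the last call is expected to be slower than the predefined options on large data sets, as noted above.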

Examples

library(Gviz)

## Object construction:

## An empty object
DataTrack()

## from individual arguments
dat <- matrix(runif(400), nrow=4)
dtTrack <- DataTrack(start=seq(1, 1000, length.out=100), width=10, data=dat,
                     chromosome=1, genome="mm9", name="random data")

## from GRanges
library(GenomicRanges)
gr <- GRanges(seqnames="chr1",
              ranges=IRanges(seq(1, 1000, length.out=100), width=10))
values(gr) <- t(dat)
dtTrack <- DataTrack(range=gr, genome="mm9", name="random data")

## from IRanges
dtTrack <- DataTrack(range=ranges(gr), data=dat, genome="mm9",
                     name="random data", chromosome=1)

## from a data.frame
df <- as.data.frame(gr)
colnames(df)[1] <- "chromosome"
dtTrack <- DataTrack(range=df, genome="mm9", name="random data")

## For some annoying reason the postscript device does not know about
## the sans font
if(!interactive()) {
    font <- ps.options()$family
    displayPars(dtTrack) <- list(fontfamily=font, fontfamily.title=font)
}

## Plotting
plotTracks(dtTrack)

## Track names
names(dtTrack)
names(dtTrack) <- "foo"
plotTracks(dtTrack)

## Subsetting and splitting
subTrack <- subset(dtTrack, from=100, to=300)
length(subTrack)
subTrack[1:2,]
subTrack[,1:2]
split(dtTrack, rep(1:2, each=50))

## Accessors
start(dtTrack)
end(dtTrack)
width(dtTrack)
position(dtTrack)
width(subTrack) <- width(subTrack)-5

strand(dtTrack)
strand(subTrack) <- "-"

chromosome(dtTrack)
chromosome(subTrack) <- "chrX"

genome(dtTrack)
genome(subTrack) <- "mm9"

range(dtTrack)
ranges(dtTrack)

## Data
values(dtTrack)
score(dtTrack)

## coercion
as(dtTrack, "data.frame")
