
seqinfo(x)
seqinfo(x, new2old=NULL, force=FALSE) <- value
seqnames(x)
seqnames(x) <- value
seqlevels(x)
seqlevels(x, force=FALSE) <- value
sortSeqlevels(x, X.is.sexchrom=NA)
seqlevelsInUse(x)
seqlevels0(x)
seqlengths(x)
seqlengths(x) <- value
isCircular(x)
isCircular(x) <- value
genome(x)
genome(x) <- value
new2old
argument allows the user to rename, drop, add and/or
reorder the "sequence levels" in x
. new2old
can be NULL
or an integer vector with one element
per row in Seqinfo object value
(i.e. new2old
and
value
must have the same length) describing how the "new" sequence
levels should be mapped to the "old" sequence levels, that is, how the
rows in value
should be mapped to the rows in seqinfo(x)
.
The values in new2old
must be >= 1 and <= length(seqinfo(x)).
NA
s are allowed and indicate sequence levels that are being added.
Old sequence levels that are not represented in new2old
will be
dropped, but this will fail if those levels are in use (e.g. if x
is a GRanges object with ranges defined on those
sequence levels) unless force=TRUE
is used (see below).
If new2old=NULL
, then sequence levels can only be added to the
existing ones, that is, value
must have at least as many rows
as seqinfo(x)
(i.e. length(values) >= length(seqinfo(x))
)
and also seqlevels(values)[seq_len(length(seqlevels(x)))]
must be
identical to seqlevels(x)
.
x
where those levels are used (hence
typically reducing the length of x
). Note that if x
is a list-like object (e.g.
GRangesList,
GAlignmentPairs, or
GAlignmentsList), then any list element in
x
where at least one of the sequence levels to drop is used is
fully dropped. In other words, the seqlevels
setter always
keeps or drops full list elements and never tries to change their
content. This guarantees that the geometry of the list elements is
preserved, which is a desirable property when they represent compound
features (e.g. exons grouped by transcript or paired-end reads).
See below for an example.
seqinfo
setter. Either a named or unnamed character vector for the seqlevels
setter.
A vector containing the sequence information to store for the other setters.
NA
, sortSeqlevels
does its best to "guess".
seqinfo
, seqlevelsInUse
,
and seqlevels0
) work on a Seqinfo object.
seqinfo
getter should
return a Seqinfo object.
seqlevels
, seqlengths
, isCircular
,
and genome
getters and setters are provided.
By default, seqlevels(x)
does seqlevels(seqinfo(x))
,
seqlengths(x)
does seqlengths(seqinfo(x))
,
isCircular(x)
does isCircular(seqinfo(x))
,
and genome(x)
does genome(seqinfo(x))
.
So any class with a seqinfo
getter will have all the above
getters work out-of-the-box. If, in addition, the class defines
a seqinfo
setter, then all the corresponding setters will
also work out-of-the-box. Examples of containers that have a seqinfo
getter and setter:
the GRanges and GRangesList
classes in the GenomicRanges package;
the SummarizedExperiment class in the
SummarizedExperiment package;
the GAlignments,
GAlignmentPairs,
and GAlignmentsList classes in the
GenomicAlignments package;
the TxDb class in the
GenomicFeatures package;
the BSgenome class in the
BSgenome package; etc...
The GenomicRanges package defines seqinfo
and
seqinfo<-
methods for these low-level data types:
List
, RangesList
and RangedData
. Those
objects do not have the means to formally store sequence
information. Thus, the wrappers simply store the Seqinfo
object within metadata(x)
. Initially, the metadata
is empty, so there is some effort to generate a reasonable
default Seqinfo
. The names of any List
are
taken as the seqnames
, and the universe
of
RangesList
or RangedData
is taken as the
genome
.
seqlevels
getter and setter.
rankSeqlevels
, on which sortSeqlevels
is
based.
## ---------------------------------------------------------------------
## A. MODIFY THE SEQLEVELS OF AN OBJECT
## ---------------------------------------------------------------------
## Overlap and matching operations between objects require matching
## seqlevels. Often the seqlevels in one must be modified to match
## the other. The seqlevels() function can rename, drop, add and reorder
## seqlevels of an object. Examples below are shown on TxDb
## and GRanges but the approach is the same for all objects that have
## a 'Seqinfo' class.
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
seqlevels(txdb)
## Rename:
seqlevels(txdb) <- sub("chr", "", seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb) <- paste0("CH", seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb)[seqlevels(txdb) == "CHM"] <- "M"
seqlevels(txdb)
gr <- GRanges(rep(c("chr2", "chr3", "chrM"), 2), IRanges(1:6, 10))
## Add:
seqlevels(gr) <- c("chr1", seqlevels(gr), "chr4")
seqlevels(gr)
seqlevelsInUse(gr)
## Reorder:
seqlevels(gr) <- rev(seqlevels(gr))
seqlevels(gr)
## Drop all unused seqlevels:
seqlevels(gr) <- seqlevelsInUse(gr)
## Drop some seqlevels in use:
seqlevels(gr, force=TRUE) <- setdiff(seqlevels(gr), "chr3")
gr
## Rename/Add/Reorder:
seqlevels(gr) <- c("chr1", chr2="chr2", chrM="Mitochondrion")
seqlevels(gr)
## ---------------------------------------------------------------------
## B. DROP SEQLEVELS OF A LIST-LIKE OBJECT
## ---------------------------------------------------------------------
grl0 <- GRangesList(GRanges("chr2", IRanges(3:2, 5)),
GRanges("chr5", IRanges(11, 18)),
GRanges(c("chr4", "chr2"), IRanges(7:6, 15)))
grl0
grl1 <- grl0
seqlevels(grl1, force=TRUE) <- c("chr2", "chr5")
grl1 # grl0[[3]] was fully dropped!
## If what is desired is to drop the first range in grl0[[3]] only, or,
## more generally speaking, to drop the ranges within each list element
## that are located on one of the seqlevels to drop, then do:
grl2 <- grl0[seqnames(grl0) %in% c("chr2", "chr5")]
grl2
## Note that the above subsetting doesn't drop any seqlevel:
seqlevels(grl2)
## To drop them (no need to use 'force=TRUE' anymore):
seqlevels(grl2) <- c("chr2", "chr5")
seqlevels(grl2)
## ---------------------------------------------------------------------
## C. SORT SEQLEVELS IN "NATURAL" ORDER
## ---------------------------------------------------------------------
sortSeqlevels(c("11", "Y", "1", "10", "9", "M", "2"))
seqlevels <- c("chrXI", "chrY", "chrI", "chrX", "chrIX", "chrM", "chrII")
sortSeqlevels(seqlevels)
sortSeqlevels(seqlevels, X.is.sexchrom=TRUE)
sortSeqlevels(seqlevels, X.is.sexchrom=FALSE)
seqlevels <- c("chr2RHet", "chr4", "chrUextra", "chrYHet",
"chrM", "chrXHet", "chr2LHet", "chrU",
"chr3L", "chr3R", "chr2R", "chrX")
sortSeqlevels(seqlevels)
gr <- GRanges()
seqlevels(gr) <- seqlevels
sortSeqlevels(gr)
## ---------------------------------------------------------------------
## D. SUBSET OBJECTS BY SEQLEVELS
## ---------------------------------------------------------------------
tx <- transcripts(txdb)
seqlevels(tx)
## Drop 'M', keep all others.
seqlevels(tx, force=TRUE) <- seqlevels(tx)[seqlevels(tx) != "M"]
seqlevels(tx)
## Drop all except 'ch3L' and 'ch3R'.
seqlevels(tx, force=TRUE) <- c("ch3L", "ch3R")
seqlevels(tx)
## ---------------------------------------------------------------------
## E. RESTORE ORIGINAL SEQLEVELS OF A TxDb OBJECT
## ---------------------------------------------------------------------
## Applicable to TxDb objects only.
## Not run:
# seqlevels(txdb) <- seqlevels0(txdb)
# seqlevels(txdb)
# ## End(Not run)
## ---------------------------------------------------------------------
## F. FINDING METHODS
## ---------------------------------------------------------------------
showMethods("seqinfo")
showMethods("seqinfo<-")
showMethods("seqnames")
showMethods("seqnames<-")
showMethods("seqlevels")
showMethods("seqlevels<-")
if (interactive()) {
library(GenomicRanges)
?`GRanges-class`
}
Run the code above in your browser using DataLab