transcripts(x, columns=c("TXID", "TXNAME"), filter=NULL)
exons(x, columns="EXONID", filter=NULL)
cds(x, columns="CDSID", filter=NULL)
genes(x, columns="GENEID", filter=NULL)
transcriptsBy(x, by, columns, use.names=FALSE, outerMcols=FALSE)
exonsBy(x, by, columns, use.names=FALSE, outerMcols=FALSE)
cdsBy(x, by, columns, use.names=FALSE, outerMcols=FALSE)
getTxDbIfAvailable(x, ...)
asBED(x)
asGFF(x)
disjointExons(x, aggregateGenes=FALSE, includeTranscripts=TRUE, ...)
microRNAs(x)
tRNAs(x)
promoters(x, upstream=2000, downstream=200, ...)
distance(x, y, ignore.strand=FALSE, ..., id, type=c("gene", "tx", "exon", "cds"))
extractTranscriptSeqs(x, transcripts, strand="+")
extractUpstreamSeqs(x, genes, width=1000, exclude.seqlevels=NULL)
intronsByTranscript(x, use.names=FALSE)
fiveUTRsByTranscript(x, use.names=FALSE)
threeUTRsByTranscript(x, use.names=FALSE)
isActiveSeq(x)
by
One of "gene", "exon", "cds" or "tx". Determines the grouping.
columns
The metadata columns to return. All possible values are listed by the columns method.
filter
Either NULL or a named list of vectors to be used to restrict the output. Valid names for this list are: "gene_id", "tx_id", "tx_name", "tx_chrom", "tx_strand", "exon_id", "exon_name", "exon_chrom", "exon_strand", "cds_id", "cds_name", "cds_chrom", "cds_strand" and "exon_rank".
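As a rough illustration of the filter argument, the base R sketch below builds a named list of vectors and checks its names against the valid names listed above. The helper is_valid_filter and the chromosome names "chr1" and "chr2" are assumptions made for this example; they are not part of the package.

```r
# Valid filter names, copied from the list in the argument description.
valid_filter_names <- c(
  "gene_id",
  "tx_id", "tx_name", "tx_chrom", "tx_strand",
  "exon_id", "exon_name", "exon_chrom", "exon_strand",
  "cds_id", "cds_name", "cds_chrom", "cds_strand",
  "exon_rank"
)

# Hypothetical helper (not part of the package): TRUE if 'filter' is NULL
# or a named list whose names are all valid filter names.
is_valid_filter <- function(filter) {
  is.null(filter) ||
    (is.list(filter) &&
       !is.null(names(filter)) &&
       all(names(filter) %in% valid_filter_names))
}

# A filter restricting the output to plus-strand transcripts on two
# chromosomes (the chromosome names are assumed, UCSC-style).
flt <- list(tx_chrom = c("chr1", "chr2"), tx_strand = "+")
is_valid_filter(flt)

# With the Homo.sapiens package installed, the filter would be passed as:
# library(Homo.sapiens)
# res <- transcripts(Homo.sapiens, filter = flt)
```

Each element of the list restricts one field, and a vector with several values allows any of those values for that field.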
use.names
By default (i.e. if use.names is FALSE), the names of the returned GRangesList object (aka the group names) are the internal ids of the features used for grouping (aka the grouping features), which are guaranteed to be unique. If use.names is TRUE, then the names of the grouping features are used instead of their internal ids. For example, when grouping by transcript (by="tx"), the default group names are the transcript internal ids ("tx_id"). But if use.names=TRUE, the group names are the transcript names ("tx_name").
Note that, unlike the feature ids, the feature names are not guaranteed to be unique or even defined (they could all be NA). A warning is issued when this happens. See ?id2name for more information about feature internal ids and feature external names and how to map the former to the latter.
Finally, use.names=TRUE cannot be used when grouping by gene (by="gene"). This is because, unlike for the other features, the gene ids are external ids (e.g. Entrez Gene or Ensembl ids), so the db doesn't have a "gene_name" column for storing alternate gene names.
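The difference between grouping by internal id and grouping by name can be sketched in base R with a toy table. The transcript ids, transcript names and exon labels below are made up for illustration; plain lists stand in for the GRangesList the real functions return.

```r
# Toy transcript table: internal ids are unique, names are not
# (one duplicate and one missing name). All values are hypothetical.
tx <- data.frame(
  tx_id   = 1:4,
  tx_name = c("txA", "txB", "txB", NA),
  stringsAsFactors = FALSE
)

# Toy exon groups, one list element per transcript.
exons_by_tx <- list(c("e1", "e2"), "e3", c("e3", "e4"), "e5")

# use.names=FALSE analogue: group names are the internal ids.
by_id <- setNames(exons_by_tx, tx$tx_id)

# use.names=TRUE analogue: group names are the transcript names.
by_name <- setNames(exons_by_tx, tx$tx_name)

# Internal ids give unique group names; names may be duplicated or NA.
id_names_unique     <- anyDuplicated(names(by_id)) == 0
name_has_duplicates <- anyDuplicated(names(by_name)) > 0
name_has_na         <- anyNA(names(by_name))
```

Duplicated or missing group names make lookups such as by_name[["txB"]] ambiguous, which is why internal ids are the safer default for programmatic access.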
upstream
For promoters: An integer(1) value indicating the number of bases upstream from the transcription start site. For additional details see ?`promoters,GRanges-method`.
downstream
For promoters: An integer(1) value indicating the number of bases downstream from the transcription start site. For additional details see ?`promoters,GRanges-method`.
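The coordinate arithmetic behind upstream and downstream can be sketched in base R. This follows the definition documented for the GRanges method (the TSS is the start of a range on the "+" or "*" strand and the end of a range on the "-" strand); the function promoter_range and the coordinates are hypothetical and only illustrate the arithmetic, without trimming to chromosome bounds.

```r
# Hypothetical helper: compute promoter coordinates from transcript
# coordinates, following the documented GRanges promoters() definition.
promoter_range <- function(start, end, strand,
                           upstream = 2000, downstream = 200) {
  plus <- strand != "-"            # "+" and "*" are handled alike
  tss  <- ifelse(plus, start, end) # transcription start site
  data.frame(
    start  = ifelse(plus, tss - upstream, tss - downstream + 1),
    end    = ifelse(plus, tss + downstream - 1, tss + upstream),
    strand = strand,
    stringsAsFactors = FALSE
  )
}

# Two made-up transcripts, one per strand, with the default settings.
p <- promoter_range(
  start  = c(10000, 50000),
  end    = c(12000, 53000),
  strand = c("+", "-")
)

# Every promoter spans upstream + downstream bases.
p$end - p$start + 1
```

With the defaults, each promoter is 2200 bases wide: 2000 bases before the TSS plus 200 bases starting at the TSS itself.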
aggregateGenes
For disjointExons: A logical. When FALSE (default), exon fragments that overlap multiple genes are dropped. When TRUE, all fragments are kept and the gene_id metadata column includes all gene ids that overlap the exon fragment.
includeTranscripts
For disjointExons: A logical. When TRUE (default), a tx_name metadata column is included that lists all transcript names that overlap the exon fragment.
id
A character vector the same length as x. The id values must be identifiers in the MultiDb object. type indicates what type of identifier id is.
type
A character(1) describing the id. Must be one of "gene", "tx", "exon" or "cds".
ignore.strand
A logical indicating if the strand of the ranges should be ignored. When TRUE, strand is set to '+'.
outerMcols
A logical indicating if the 'outer' mcols (metadata columns) should be populated for some range-based accessors which return a GRangesList object. By default this is FALSE, but if TRUE then the outer list object will also have its metadata columns (mcols) populated, as well as the mcols for the 'inner' GRanges objects.
transcripts
For extractTranscriptSeqs, where x is a BSgenome object. Internally, transcripts is turned into a GRangesList object with exonsBy(transcripts, by="tx", use.names=TRUE).
strand
Only supported when x is a DNAString object. Can be an atomic vector, a factor, or an Rle object, in which case it indicates the strand of each transcript (i.e. all the exons in a transcript are considered to be on the same strand). More precisely: it's turned into a factor (or factor-Rle) that has the "standard strand levels" (this is done by calling the strand function on it). Then it's recycled to the length of the RangesList object transcripts if needed. In the resulting object, the i-th element is interpreted as the strand of all the exons in the i-th transcript.
strand can also be a list-like object, in which case it indicates the strand of each exon individually. Thus it must have the same shape as the RangesList object transcripts (i.e. the same length, and strand[[i]] must have the same length as transcripts[[i]] for all i).
strand can only contain "+" and/or "-" values. "*" is not allowed.
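The two accepted shapes of strand can be sketched in base R. Here a plain list stands in for the RangesList object transcripts, and expand_strand is a hypothetical helper that mimics the rules above (recycling a per-transcript vector, checking the shape of a per-exon list, rejecting "*"); it is not the package's implementation.

```r
# Toy stand-in for 'transcripts': one element per transcript, holding
# that transcript's exon widths (made-up values).
transcripts <- list(tx1 = c(100, 50), tx2 = 80, tx3 = c(30, 40, 20))

# Hypothetical helper: return one strand value per exon, per transcript.
expand_strand <- function(strand, transcripts) {
  check_values <- function(s) {
    if (!all(as.character(s) %in% c("+", "-")))
      stop("'strand' can only contain \"+\" and/or \"-\" values")
  }
  if (is.list(strand)) {
    # List-like: one strand per exon, so the shape must match.
    same_shape <- length(strand) == length(transcripts) &&
      all(lengths(strand) == lengths(transcripts))
    if (!same_shape)
      stop("list-like 'strand' must have the same shape as 'transcripts'")
    check_values(unlist(strand))
    return(strand)
  }
  # Atomic vector or factor: one strand per transcript, recycled to the
  # number of transcripts, then applied to every exon of that transcript.
  check_values(strand)
  per_tx <- rep_len(as.character(strand), length(transcripts))
  per_exon <- lapply(seq_along(transcripts), function(i) {
    rep(per_tx[i], length(transcripts[[i]]))
  })
  names(per_exon) <- names(transcripts)
  per_exon
}

all_plus <- expand_strand("+", transcripts)             # recycled to 3 transcripts
mixed    <- expand_strand(c("+", "-", "+"), transcripts)
per_exon <- expand_strand(list(c("+", "+"), "-", c("+", "+", "+")), transcripts)
```

Passing "*" or a list of the wrong shape raises an error in this sketch, mirroring the constraints stated above.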
genes
For extractUpstreamSeqs. The genes function is called on the MultiDb object internally.
MultiDb object.
transcripts method and related methods.
transcriptsBy method and related methods.
## Extract all transcripts from Homo.sapiens with some extra metadata
library(Homo.sapiens)
cols <- c("TXNAME", "SYMBOL")
res <- transcripts(Homo.sapiens, columns=cols)
## Extract all transcripts from Homo.sapiens, grouped by gene and
## with extra metadata
res <- transcriptsBy(Homo.sapiens, by="gene", columns=cols)
## List possible values for the columns argument:
columns(Homo.sapiens)
## Get the TxDb from a MultiDb object (if it's available)
getTxDbIfAvailable(Homo.sapiens)
## Other functions listed above should work in a way similar to their TxDb
## counterparts. So for example:
promoters(Homo.sapiens)
## Should give the same value as:
promoters(getTxDbIfAvailable(Homo.sapiens))