makeTxDb is a low-level constructor for making a TxDb object from user-supplied transcript annotations. See ?makeTxDbFromUCSC and ?makeTxDbFromBiomart for higher-level functions that feed data from the UCSC or BioMart sources to makeTxDb.
makeTxDb(transcripts, splicings, genes=NULL, chrominfo=NULL, metadata=NULL, reassign.ids=FALSE)
metadata
: A 2-column data frame whose columns must be named "name" and "value", both of type character.
reassign.ids
: TRUE or FALSE. Controls how the internal ids are assigned to each type of feature (transcripts, exons, and cds). If reassign.ids is FALSE and the ids are supplied, they are used as the internal ids; otherwise the internal ids are assigned in a way that is compatible with the order defined by sorting the features first by chromosome, then by strand, then by start, and finally by end.
The transcripts (required), splicings (required), and genes (optional) arguments must be data frames that describe a set of transcripts and the genomic features related to them (currently exons, cds, and genes). The chrominfo (optional) argument must be a data frame containing chromosome information, such as the length of each chromosome.
transcripts
must have 1 row per transcript and the following
columns:
tx_id
: Transcript ID. Integer vector. No NAs. No duplicates.
tx_name
: [optional] Transcript name. Character vector (or factor). NAs and/or duplicates are ok.
tx_type
: [optional] Transcript type (e.g. mRNA, ncRNA, snoRNA, etc.). Character vector (or factor). NAs and/or duplicates are ok.
tx_chrom
: Transcript chromosome. Character vector (or factor) with no NAs.
tx_strand
: Transcript strand. Character vector (or factor) with no NAs, where each element is either "+" or "-".
tx_start, tx_end
: Transcript start and end. Integer vectors with no NAs.
Other columns, if any, are ignored (with a warning).
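The column rules for transcripts can be checked in base R before the data frame ever reaches makeTxDb. The sketch below uses a hypothetical helper, check_transcripts(), which is not part of the package; it only mirrors the constraints listed above, and it accepts integer-valued numeric ids and coordinates (as the package's own example uses doubles such as c(1, 2001, 2001)).

```r
# Hypothetical helper, NOT part of the package: mirrors the documented
# column rules for the `transcripts` argument of makeTxDb().
check_transcripts <- function(transcripts) {
  required <- c("tx_id", "tx_chrom", "tx_strand", "tx_start", "tx_end")
  missing_cols <- setdiff(required, names(transcripts))
  if (length(missing_cols) > 0L)
    stop("missing required column(s): ", paste(missing_cols, collapse = ", "))

  # tx_id: integer-valued, no NAs, no duplicates
  id <- transcripts$tx_id
  if (!is.numeric(id) || anyNA(id) || anyDuplicated(id) > 0L ||
      any(id != round(id)))
    stop("'tx_id' must be integer-valued, with no NAs and no duplicates")

  # tx_chrom: no NAs
  if (anyNA(transcripts$tx_chrom))
    stop("'tx_chrom' must not contain NAs")

  # tx_strand: only "+" or "-" (factors are converted to character first)
  strand <- as.character(transcripts$tx_strand)
  if (anyNA(strand) || !all(strand %in% c("+", "-")))
    stop("'tx_strand' must contain only \"+\" or \"-\"")

  # tx_start / tx_end: no NAs
  if (anyNA(transcripts$tx_start) || anyNA(transcripts$tx_end))
    stop("'tx_start' and 'tx_end' must not contain NAs")

  # Unrecognized columns are ignored by makeTxDb() with a warning
  extra <- setdiff(names(transcripts), c(required, "tx_name", "tx_type"))
  if (length(extra) > 0L)
    warning("column(s) that will be ignored: ", paste(extra, collapse = ", "))
  invisible(TRUE)
}

transcripts <- data.frame(
  tx_id = 1:3,
  tx_chrom = "chr1",
  tx_strand = c("-", "+", "+"),
  tx_start = c(1, 2001, 2001),
  tx_end = c(999, 2199, 2199))
check_transcripts(transcripts)  # passes silently
```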
splicings
must have N rows per transcript, where N is the number of exons in the transcript. Each row describes an exon plus, optionally, the cds contained in this exon. Its columns must be:
tx_id
: Foreign key that links each row in the splicings data frame to a unique row in the transcripts data frame. Note that more than 1 row in splicings can be linked to the same row in transcripts (many-to-one relationship). Same type as transcripts$tx_id (integer vector). No NAs. All the values in this column must be present in transcripts$tx_id.
exon_rank
: The rank of the exon in the transcript. Integer vector with no NAs. (tx_id, exon_rank) pairs must be unique.
exon_id
: [optional] Exon ID.
Integer vector with no NAs.
exon_name
: [optional] Exon name. Character vector (or factor).
NAs and/or duplicates are ok.
exon_chrom
: [optional] Exon chromosome. Character vector (or factor) with no NAs. If missing, transcripts$tx_chrom is used. If present, exon_strand must also be present.
exon_strand
: [optional] Exon strand. Character vector (or factor) with no NAs. If missing, transcripts$tx_strand is used and exon_chrom must also be missing.
exon_start, exon_end
: Exon start and end. Integer vectors with no NAs.
cds_id
: [optional] cds ID. Integer vector. If present, cds_start and cds_end must also be present. NAs are allowed and must match NAs in cds_start and cds_end.
cds_name
: [optional] cds name. Character vector (or factor). If present, cds_start and cds_end must also be present. NAs and/or duplicates are ok. Must be NA if the corresponding cds_start and cds_end are NAs.
cds_start, cds_end
: [optional] cds start and end. Integer vectors. If one of the 2 columns is missing, then all cds_* columns must be missing. NAs are allowed and must occur at the same positions in cds_start and cds_end.
Other columns, if any, are ignored (with a warning).
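Likewise, the splicings rules can be expressed as a small pre-flight check. check_splicings() is again a hypothetical helper, not a package function; it covers the tx_id foreign key, unique (tx_id, exon_rank) pairs, the exon_chrom/exon_strand dependency, and the requirement that NAs line up across the cds_* columns.

```r
# Hypothetical helper, NOT part of the package: mirrors the documented
# rules for the `splicings` argument of makeTxDb().
check_splicings <- function(splicings, transcripts) {
  cols <- names(splicings)
  for (col in c("tx_id", "exon_rank", "exon_start", "exon_end"))
    if (!col %in% cols) stop("missing required column: ", col)

  # Foreign key: no NAs, and every value must exist in transcripts$tx_id
  if (anyNA(splicings$tx_id) || !all(splicings$tx_id %in% transcripts$tx_id))
    stop("every splicings$tx_id must be present in transcripts$tx_id")

  # (tx_id, exon_rank) pairs must be unique
  if (anyNA(splicings$exon_rank) ||
      anyDuplicated(splicings[c("tx_id", "exon_rank")]) > 0L)
    stop("(tx_id, exon_rank) pairs must be unique, with no NA ranks")

  # exon_chrom may only be supplied together with exon_strand
  if ("exon_chrom" %in% cols && !"exon_strand" %in% cols)
    stop("'exon_chrom' requires 'exon_strand'")

  cds_cols <- intersect(c("cds_id", "cds_name", "cds_start", "cds_end"), cols)
  if (length(cds_cols) > 0L) {
    # Any cds_* column requires both cds_start and cds_end
    if (!all(c("cds_start", "cds_end") %in% cds_cols))
      stop("cds_* columns require both 'cds_start' and 'cds_end'")
    no_cds <- is.na(splicings$cds_start)
    if (!identical(no_cds, is.na(splicings$cds_end)))
      stop("NAs in 'cds_start' and 'cds_end' must be at the same rows")
    if ("cds_id" %in% cds_cols && !identical(no_cds, is.na(splicings$cds_id)))
      stop("NAs in 'cds_id' must match NAs in 'cds_start'/'cds_end'")
    if ("cds_name" %in% cds_cols && !all(is.na(splicings$cds_name[no_cds])))
      stop("'cds_name' must be NA on rows without a cds")
  }
  invisible(TRUE)
}

transcripts <- data.frame(tx_id = 1:3)  # only tx_id matters for this check
splicings <- data.frame(
  tx_id = c(1L, 2L, 2L, 2L, 3L, 3L),
  exon_rank = c(1, 1, 2, 3, 1, 2),
  exon_start = c(1, 2001, 2101, 2131, 2001, 2131),
  exon_end = c(999, 2085, 2144, 2199, 2085, 2199),
  cds_start = c(1, 2022, 2101, 2131, NA, NA),
  cds_end = c(999, 2085, 2144, 2193, NA, NA))
check_splicings(splicings, transcripts)  # passes silently
```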
genes
must have N rows per transcript, where N is the number of genes linked to the transcript (N will be 1 most of the time). Its columns must be:
tx_id
: [optional] genes must have either a tx_id or a tx_name column, but not both. Like splicings$tx_id, this is a foreign key that links each row in the genes data frame to a unique row in the transcripts data frame.
tx_name
: [optional] Can be used as an alternative to the genes$tx_id foreign key.
gene_id
: Gene ID. Character vector (or factor). No NAs.
Other columns, if any, are ignored (with a warning).
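A minimal genes data frame for three transcripts might look as follows. The gene IDs are made up for illustration, and check_genes() is a hypothetical helper (not a package function) that enforces the "tx_id or tx_name, but not both" rule and the gene_id requirements described above.

```r
# Hypothetical helper, NOT part of the package: checks the `genes` rules.
check_genes <- function(genes, transcripts) {
  has_id <- "tx_id" %in% names(genes)
  has_name <- "tx_name" %in% names(genes)
  # Exactly one foreign-key column: tx_id or tx_name
  if (has_id == has_name)
    stop("'genes' must have either a 'tx_id' or a 'tx_name' column, but not both")
  key <- if (has_id) "tx_id" else "tx_name"
  # Each key value must match a row of `transcripts`
  if (!all(genes[[key]] %in% transcripts[[key]]))
    stop("every genes$", key, " must match a value in transcripts$", key)
  if (!"gene_id" %in% names(genes) || anyNA(genes$gene_id))
    stop("'gene_id' is required and must not contain NAs")
  invisible(key)  # which foreign key was used
}

transcripts <- data.frame(tx_id = 1:3)
# Made-up gene IDs: transcript 1 belongs to geneA; 2 and 3 belong to geneB
genes <- data.frame(tx_id = 1:3, gene_id = c("geneA", "geneB", "geneB"))
check_genes(genes, transcripts)
```

Because genes is linked through a foreign key, a transcript shared by two genes would simply appear on two rows.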
chrominfo
must have 1 row per chromosome and the following
columns:
chrom
: Chromosome name.
Character vector (or factor) with no NAs and no duplicates.
length
: Chromosome length.
Integer vector with either all NAs or no NAs.
is_circular
: [optional] Chromosome circularity flag.
Logical vector. NAs are ok.
Other columns, if any, are ignored (with a warning).
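For completeness, here is a toy chrominfo data frame together with a metadata data frame of the 2-column shape required by the metadata argument. The chromosome length and the metadata names and values are illustrative assumptions, and check_chrominfo() and check_metadata() are hypothetical helpers, not package functions.

```r
# Hypothetical helpers, NOT part of the package.
check_chrominfo <- function(chrominfo) {
  if (!all(c("chrom", "length") %in% names(chrominfo)))
    stop("'chrominfo' needs 'chrom' and 'length' columns")
  chrom <- as.character(chrominfo$chrom)
  if (anyNA(chrom) || anyDuplicated(chrom) > 0L)
    stop("'chrom' must have no NAs and no duplicates")
  # length: either all NAs or no NAs at all
  len <- chrominfo$length
  if (anyNA(len) && !all(is.na(len)))
    stop("'length' must be either all NAs or free of NAs")
  if ("is_circular" %in% names(chrominfo) && !is.logical(chrominfo$is_circular))
    stop("'is_circular' must be a logical vector")
  invisible(TRUE)
}

check_metadata <- function(metadata) {
  # Exactly two columns, named "name" and "value", both character
  if (ncol(metadata) != 2L || !setequal(names(metadata), c("name", "value")) ||
      !is.character(metadata$name) || !is.character(metadata$value))
    stop("'metadata' must have character columns \"name\" and \"value\"")
  invisible(TRUE)
}

# Toy values; a real chrominfo would hold the assembly's chromosome lengths
chrominfo <- data.frame(chrom = "chr1", length = 5000L, is_circular = FALSE)
metadata <- data.frame(
  name = c("Data source", "Organism"),
  value = c("hand-made example", "Homo sapiens"),
  stringsAsFactors = FALSE)  # keep character columns on older R versions
check_chrominfo(chrominfo)
check_metadata(metadata)
```

Following the usage line, these would be passed as makeTxDb(transcripts, splicings, chrominfo=chrominfo, metadata=metadata).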
makeTxDbFromUCSC, makeTxDbFromBiomart, makeTxDbFromGRanges, and makeTxDbFromGFF, for convenient ways to make a TxDb object from UCSC or BioMart online resources, from a GRanges object, or from a GFF or GTF file.
saveDb and loadDb in the AnnotationDbi package for saving and loading a TxDb object as an SQLite file.
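```r
library(GenomicFeatures)  # loads makeTxDb()

# Three transcripts on chr1: tx 1 on the minus strand; tx 2 and tx 3 on
# the plus strand, spanning the same interval.
transcripts <- data.frame(
    tx_id=1:3,
    tx_chrom="chr1",
    tx_strand=c("-", "+", "+"),
    tx_start=c(1, 2001, 2001),
    tx_end=c(999, 2199, 2199))

# One row per exon: tx 1 has 1 exon, tx 2 has 3 exons, tx 3 has 2 exons.
# tx 3 has no CDS, so its cds_start/cds_end are NA.
splicings <- data.frame(
    tx_id=c(1L, 2L, 2L, 2L, 3L, 3L),
    exon_rank=c(1, 1, 2, 3, 1, 2),
    exon_start=c(1, 2001, 2101, 2131, 2001, 2131),
    exon_end=c(999, 2085, 2144, 2199, 2085, 2199),
    cds_start=c(1, 2022, 2101, 2131, NA, NA),
    cds_end=c(999, 2085, 2144, 2193, NA, NA))

txdb <- makeTxDb(transcripts, splicings)
txdb
```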
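The ordering that reassign.ids refers to (chromosome, then strand, then start, then end) can be illustrated with base R's order() on the same three transcripts. This is a sketch of the rule, not the package's internal code: ranking strand as "+" before "-" via a factor is an assumption, and ties are left in input order because order() is stable.

```r
# Sketch of the documented ordering rule for internal ids (chromosome,
# then strand, then start, then end). Base R only, NOT the package's code.
transcripts <- data.frame(
  tx_id = 1:3,
  tx_chrom = "chr1",
  tx_strand = c("-", "+", "+"),
  tx_start = c(1, 2001, 2001),
  tx_end = c(999, 2199, 2199))

# Assumption: strand is ranked "+" < "-" < "*" (the level order used by
# GenomicRanges); a factor makes order() use that ranking, not the locale.
strand <- factor(transcripts$tx_strand, levels = c("+", "-", "*"))
o <- order(transcripts$tx_chrom, strand,
           transcripts$tx_start, transcripts$tx_end)

# The rank of each transcript in that order is one id assignment that is
# compatible with the rule; exact ties (tx 2 and tx 3) keep input order.
internal_id <- integer(nrow(transcripts))
internal_id[o] <- seq_along(o)
data.frame(tx_id = transcripts$tx_id, internal_id)
```

Under these assumptions, transcripts 2 and 3 (plus strand) rank ahead of transcript 1 (minus strand), giving internal ids 3, 1, 2 for tx_id 1, 2, 3. With reassign.ids=FALSE the supplied tx_id values would be kept instead.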