refGenome (version 1.7.7)

ensemblGenome-class: Class "ensemblGenome"

Description

The ensemblGenome class represents Ensembl genomic annotation data.

Objects from the Class

Objects can be created by calls of the form ensemblGenome(dbfile), where 'dbfile' is the name of an SQLite database file.

Slots

basedir:

Object of class "character". Directory where the SQLite database is written.

ev:

Object of class "environment". Environment that contains the data structures. Optionally, it holds the gtf and attr data.frames.
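
Because the ev slot is an environment, tables stored in it are modified in place and are shared between copies of the object. The following is a minimal base R sketch of that idea, not the package's implementation; the class toyGenome and the helpers setToyGtf and getToyGtf are hypothetical names used only for illustration.

```r
library(methods)

# Hypothetical toy class that loosely mirrors the 'basedir' and 'ev' slots.
setClass("toyGenome",
         representation(basedir = "character", ev = "environment"))

# Constructor: every new object gets its own fresh environment.
toyGenome <- function(basedir = ".") {
  new("toyGenome", basedir = basedir, ev = new.env())
}

# Environments are modified in place, so no replacement function is needed.
setToyGtf <- function(object, value) {
  assign("gtf", value, envir = object@ev)
  invisible(object)
}

# Return the stored table, or NULL when no table exists
# (an assumption modelled on the documented getGeneTable behaviour).
getToyGtf <- function(object) {
  if (exists("gtf", envir = object@ev, inherits = FALSE)) {
    get("gtf", envir = object@ev, inherits = FALSE)
  } else {
    NULL
  }
}

g <- toyGenome()
h <- g                               # copies the S4 object, not the environment
setToyGtf(g, data.frame(id = 1:3))
nrow(getToyGtf(h))                   # h sees the table written through g
```

One consequence of this design is that assigning an object to a second name does not duplicate the annotation tables, which keeps memory use low for large genomes.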

Methods

show

signature(object = "refGenome"): Creates a sensible printout.

getGtf

signature(object = "refGenome"): Returns content of gtf table.

setGtf

signature(object = "refGenome"): Writes content of gtf table.

getAttr

signature(object = "refGenome"): Returns content of attribute table.

getGeneTable

signature(object = "refGenome"): Returns content of genes table when table exists. Otherwise NULL is returned.

setAttr

signature(object = "refGenome"): Writes content of attribute table.

read.gtf

signature(object, filename="transcripts.gtf", sep = "\t", useBasedir=TRUE, comment.char = "#", progress=100000L, ...): Imports content of gtf file. This is the basic mechanism for data import. It works the same way for ucscGenome and for ensemblGenome.

extractPaGenes

signature(object="ensemblGenome"): Extracts all annotations on the primary assembly and returns them as a data.frame. Serves as a shortcut for extracting a table directly from gtf files.

extractFeature

signature(object="ensemblGenome"): Extracts annotated positions whose feature type matches the given 'feature' argument. Returns an 'ensemblGenome' object.

extractByGeneName

signature(object="ensemblGenome", geneNames="character"): Returns an ensemblGenome object containing the table subsets for the given gene names. When none of the geneNames matches, the function returns NULL.

extractTranscript

signature(object="ensemblGenome", transcripts="character"): Returns an ensemblGenome object containing the table subsets for the given transcripts.

getGenePositions

signature(object="ucscGenome", force="logical"): Extracts a table with position data for whole genes (smallest exon start position and largest exon end position). A copy of the table is placed inside the internal environment. On subsequent calls, only a copy of the stored table is returned unless force=TRUE is given. With force=TRUE, new gene positions are calculated regardless of existing tables.

getGeneTable

signature(object="ucscGenome"): Returns data.frame containing gene-specific data.

tableTranscript.name

signature(object="ensemblGenome"): Returns a table object that tabulates the 'transcript_name' column of the gtf table.

tableTranscript.id

signature(object="ensemblGenome"): Returns a table object that tabulates the 'transcript_id' column of the gtf table.

writeDB

signature(object = "refGenome"): Copies the content of the gtf, attr and xref tables to the database.
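
The caching behaviour described for getGenePositions (compute once, store in the internal environment, recompute only with force=TRUE) can be sketched in base R roughly as follows. This is an illustration under assumptions and not the package's code: the function genePositions, the environment cache, and the column names of the toy exon table are all hypothetical.

```r
# Toy exon table; the column names are assumptions, not the package's schema.
exons <- data.frame(
  gene_name = c("A", "A", "B", "B", "B"),
  start     = c(100L, 300L, 50L, 500L, 200L),
  end       = c(200L, 450L, 120L, 650L, 260L),
  stringsAsFactors = FALSE
)

cache <- new.env()   # stands in for the object's internal environment

genePositions <- function(gtf, env, force = FALSE) {
  # Return the stored table unless a recalculation is forced.
  if (!force && exists("genes", envir = env, inherits = FALSE)) {
    return(get("genes", envir = env, inherits = FALSE))
  }
  # Smallest exon start and largest exon end per gene.
  starts <- tapply(gtf$start, gtf$gene_name, min)
  ends   <- tapply(gtf$end,   gtf$gene_name, max)
  res <- data.frame(
    gene_name = names(starts),
    start     = as.integer(starts),
    end       = as.integer(ends[names(starts)]),
    stringsAsFactors = FALSE
  )
  assign("genes", res, envir = env)   # keep a copy for later calls
  res
}

gp <- genePositions(exons, cache)

# Changing the input has no effect on the result until force = TRUE is given.
exons2 <- exons
exons2$end[2] <- 900L
gp_cached <- genePositions(exons2, cache)
gp_forced <- genePositions(exons2, cache, force = TRUE)
```

Here gp_cached still reflects the first calculation, while gp_forced picks up the modified exon end.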

References

http://www.ensembl.org/info/data/ftp/index.html
http://mblab.wustl.edu/GTF22.html#fields

Examples

# NOT RUN {
##-------------------------------------##
## Create an instance from scratch
## Real data:
## ftp://ftp.ensembl.org/pub/release-80/gtf/homo_sapiens/Homo_sapiens.GRCh38.80.gtf.gz
##-------------------------------------##
ens <- ensemblGenome()
basedir(ens) <- system.file("extdata",package="refGenome")
ens_gtf <- "hs.ensembl.62.small.gtf"
read.gtf(ens,ens_gtf)
# Load a previously saved genome:
ensfile <- system.file("extdata", "hs.ensembl.62.small.RData", package="refGenome")
ens <- loadGenome(ensfile)

##-------------------------------------##
## Saving and loading
## Save as R-image (fast loading)
##-------------------------------------##
# }
# NOT RUN {
basedir(ens) <- getwd()
saveGenome(ens, "hs.ensembl.62.small.RData", useBasedir=FALSE)
enr <- loadGenome("hs.ensembl.62.small.RData")
# }
# NOT RUN {
## Save as SQLite database
##-------------------------------------##
## Commented out because RSQLite
## seems to produce memory leaks
##-------------------------------------##
# }
# NOT RUN {
writeDB(ens, filename="ens62.db3", useBasedir=FALSE)
edb <- loadGenomeDb(filename="ens62.db3")
# }
# NOT RUN {
##-------------------------------------##
## Extract data for Primary Assembly seqids
##-------------------------------------##
enpa <- extractSeqids(ens,ensPrimAssembly())
# Tabulate all features in the 'gtf' table
tableFeatures(enpa)
# Extract exon features for the primary assembly
enpafeat <- extractFeature(enpa, "exon")
# Shortcut. Returns a data.frame
engen <- extractPaGenes(ens)

##-------------------------------------##
## Extract data for individual Genes
##-------------------------------------##
ddx <- extractByGeneName(ens, "DDX11L1")
ddx
tableTranscript.id(ddx)
tableTranscript.name(ddx)
fam <- extractTranscript(ens, "ENST00000417324")
fam
# Extract range limits of entire Genes
gp <- getGenePositions(ens)
gp
# }
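
For readers without the package at hand, the import step performed by read.gtf in the examples above, together with tabulating transcript ids and subsetting by gene name, can be approximated in base R. The sketch below is only an illustration: the three GTF records are made-up toy data, and getAttrValue and extractByName are hypothetical helpers rather than refGenome functions.

```r
# Three toy GTF records plus a comment line (made-up data for illustration).
gtf_text <- c(
  "# header comment, skipped via comment.char",
  "1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id \"G1\"; transcript_id \"T1\"; gene_name \"geneA\";",
  "1\ttoy\texon\t300\t450\t.\t+\t.\tgene_id \"G1\"; transcript_id \"T1\"; gene_name \"geneA\";",
  "2\ttoy\texon\t50\t120\t.\t-\t.\tgene_id \"G2\"; transcript_id \"T2\"; gene_name \"geneB\";"
)

# The nine tab-separated GTF columns; quote = "" keeps the attribute quotes.
gtf <- read.table(
  text = gtf_text, sep = "\t", comment.char = "#", quote = "",
  col.names = c("seqid", "source", "feature", "start", "end",
                "score", "strand", "frame", "attributes"),
  colClasses = c("character", "character", "character", "integer",
                 "integer", "character", "character", "character",
                 "character")
)

# Hypothetical helper: pull the value of one key out of the attribute column.
getAttrValue <- function(attributes, key) {
  pattern <- paste0(".*\\b", key, " \"([^\"]*)\".*")
  hit <- grepl(pattern, attributes, perl = TRUE)
  out <- rep(NA_character_, length(attributes))
  out[hit] <- sub(pattern, "\\1", attributes[hit], perl = TRUE)
  out
}

gtf$gene_name     <- getAttrValue(gtf$attributes, "gene_name")
gtf$transcript_id <- getAttrValue(gtf$attributes, "transcript_id")

# Tabulate the transcript_id column.
tab <- table(gtf$transcript_id)

# Hypothetical helper: subset by gene name, NULL when nothing matches.
extractByName <- function(gtf, geneNames) {
  res <- gtf[gtf$gene_name %in% geneNames, , drop = FALSE]
  if (nrow(res) == 0L) NULL else res
}

geneA <- extractByName(gtf, "geneA")
none  <- extractByName(gtf, "noSuchGene")
```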