easyRNASeq (version 2.8.2)

createSyntheticTranscripts,AnnotParamCharacter-method: Methods to create synthetic transcripts

Description

This function creates a set of synthetic transcripts from a provided annotation file in "gff3" or "gtf" format.

As detailed in http://www.epigenesys.eu/en/protocols/bio-informatics/1283-guidelines-for-rna-seq-data-analysis, one major caveat of estimating gene expression using aligned RNA-Seq reads is that a single read, which originated from a single mRNA molecule, might sometimes align to several features (e.g. transcripts or genes) with alignments of equivalent quality. This might happen, for example, as a result of gene duplication or the presence of repetitive or common domains. To avoid counting unique mRNA fragments multiple times, the stringent approach is to keep only uniquely mapping reads, while being aware of the potential consequences.

"Multiple counting" can arise not only for biological reasons but also from technical artifacts, introduced mostly by poorly formatted gff3/gtf annotation files. To avoid this, it is best practice to adopt a conservative approach by collapsing all existing transcripts of a single gene locus into a "synthetic" transcript containing every exon of that gene. In the case of overlapping exons, the longest genomic interval is kept, i.e. an artificial exon is created. This process results in a flattened transcript: a gene structure with a one (gene) to one (transcript) relationship.
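The collapsing step described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the exon table, its coordinates and the helper `merge_intervals` are hypothetical, strand is ignored, and overlapping exons are simply replaced by the interval spanning them.

```r
# Toy exon table: geneA has two transcripts with overlapping exons
# (hypothetical coordinates, not taken from any real annotation).
exons <- data.frame(
  gene  = c("geneA", "geneA", "geneA", "geneA", "geneB"),
  tx    = c("tx1",   "tx1",   "tx2",   "tx2",   "tx3"),
  start = c(100,     500,     150,     900,     2000),
  end   = c(300,     700,     350,     1000,    2500),
  stringsAsFactors = FALSE
)

# Hypothetical helper: merge overlapping intervals into their union.
merge_intervals <- function(start, end) {
  ord   <- order(start, end)
  start <- start[ord]
  end   <- end[ord]
  # A new block begins whenever an exon starts beyond the furthest end seen so far.
  furthest  <- cummax(end)
  new_block <- c(TRUE, start[-1] > furthest[-length(furthest)])
  block     <- cumsum(new_block)
  data.frame(
    start = as.vector(tapply(start, block, min)),
    end   = as.vector(tapply(end, block, max))
  )
}

# One flattened ("synthetic") transcript per gene.
synthetic <- do.call(rbind, lapply(split(exons, exons$gene), function(g) {
  data.frame(gene = g$gene[1], merge_intervals(g$start, g$end),
             stringsAsFactors = FALSE)
}))
rownames(synthetic) <- NULL
print(synthetic)
```

In this toy input, the exons 100-300 (tx1) and 150-350 (tx2) of geneA overlap and are replaced by a single artificial exon, so geneA ends up with three exons instead of four.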

Usage

## S4 method for signature 'AnnotParamCharacter'
createSyntheticTranscripts(obj,
  features = c("mRNA", "miRNA", "tRNA", "transcript"), verbose = TRUE)

## S4 method for signature 'character'
createSyntheticTranscripts(obj,
  features = c("mRNA", "miRNA", "tRNA", "transcript"), verbose = TRUE,
  output = c("Genome_intervals", "GRanges"), input = c("gff3", "gtf"))

Arguments

obj
an AnnotParamCharacter object or the annotation filename as a character string
features
one or more of 'mRNA', 'miRNA', 'tRNA', 'transcript'
verbose
increase the verbosity (default TRUE)
output
the output type, one of 'Genome_intervals' or 'GRanges'
input
the type of input, one of 'gff3' or 'gtf'
...
If obj is a character string, input and output - see below
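The multi-choice defaults listed above (`features`, `output`, `input`) follow a common R idiom in which the default is the full vector of allowed values. The sketch below shows how such arguments are typically resolved with `match.arg`; whether easyRNASeq resolves them this way internally is an assumption, and `resolve_args` is a hypothetical stand-in, not a package function.

```r
# Hypothetical stand-in that mirrors the argument defaults shown in Usage.
resolve_args <- function(features = c("mRNA", "miRNA", "tRNA", "transcript"),
                         output = c("Genome_intervals", "GRanges"),
                         input = c("gff3", "gtf")) {
  # 'features' accepts one or more of the allowed values.
  features <- match.arg(features, several.ok = TRUE)
  # 'output' and 'input' accept exactly one value; the first is the default.
  output <- match.arg(output)
  input  <- match.arg(input)
  list(features = features, output = output, input = input)
}

# Defaults: all four feature types, the first output type, the first input type.
defaults <- resolve_args()

# Explicit choices.
custom <- resolve_args(features = c("mRNA", "tRNA"),
                       output = "GRanges", input = "gtf")

# A value outside the allowed set raises an error.
bad <- tryCatch(resolve_args(input = "bed"), error = function(e) e)
```

Under this convention, omitting `output` and `input` would select 'Genome_intervals' and 'gff3' respectively, so a gtf file would need `input = "gtf"` to be passed explicitly.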

Value

  • Depending on the obj class:
    • AnnotParamCharacter: an AnnotParamObject object
    • a character filename: depending on the selected output value, a Genome_intervals or a GRanges object.

Details

The createSyntheticTranscripts function implements the collapsing approach outlined in the Description, taking advantage of the hierarchical structure of the gff3/gtf file. Exon features are related to their transcript (parent), which themselves derive from their gene parents. Using this relationship, exons are combined per gene into a flattened transcript structure. Note that this might not avoid multiple counting if genes overlap on opposing strands. In such cases, only strand-specific sequencing data has the power to disentangle the overlapping genes.
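The parent/child lookup described above can be sketched in base R as follows. This is an illustration only: the records and IDs are hypothetical, the `ID` and `Parent` columns mimic gff3 attributes that have already been parsed, and the package's actual parsing code may differ.

```r
# Toy gff3-like records (hypothetical IDs); Parent links exon -> transcript -> gene.
gff <- data.frame(
  type   = c("gene", "mRNA", "mRNA", "exon", "exon", "exon",
             "gene", "tRNA", "exon"),
  ID     = c("g1",   "t1",   "t2",   "e1",   "e2",   "e3",
             "g2",   "t3",   "e4"),
  Parent = c(NA,     "g1",   "g1",   "t1",   "t1",   "t2",
             NA,     "g2",   "t3"),
  stringsAsFactors = FALSE
)

# Transcript-level feature types supported by the function.
features <- c("mRNA", "miRNA", "tRNA", "transcript")
tx   <- gff[gff$type %in% features, ]
exon <- gff[gff$type == "exon", ]

# Transcript ID -> gene ID lookup, then follow exon -> transcript -> gene.
tx2gene   <- setNames(tx$Parent, tx$ID)
exon$gene <- unname(tx2gene[exon$Parent])

# Exons grouped per gene, ready to be flattened into one synthetic transcript.
exons_per_gene <- split(exon$ID, exon$gene)
print(exons_per_gene)
```

Grouping by gene rather than by transcript is what yields the one (gene) to one (transcript) relationship; exons e1, e2 and e3 all end up under g1 even though they belong to two different transcripts.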

As gff3/gtf files can contain a large number of feature types, createSyntheticTranscripts currently only supports mRNA, miRNA, tRNA and transcript. Please contact me if you need additional features to be considered. Note, however, that I will only add features that are part of the sequenceontology.org SOFA (SO_Feature_Annotation) ontology.

See Also

  • For the input:
    • AnnotParam
  • For the output:

Examples

## the data and the package providing createSyntheticTranscripts
library("RnaSeqTutorial")
library("easyRNASeq")

## get the example file
library(curl)
curl_download(paste0("https://microasp.upsc.se/root/upscb-public/raw/",
                     "master/tutorial/easyRNASeq/",
                     "Drosophila_melanogaster.BDGP5.77.with-chr.gtf.gz"),
              "Drosophila_melanogaster.BDGP5.77.with-chr.gtf.gz")

## create the AnnotParam
annotParam <- AnnotParam(
  datasource = "Drosophila_melanogaster.BDGP5.77.with-chr.gtf.gz",
  type = "gtf")

## create the synthetic transcripts
annotParam <- createSyntheticTranscripts(annotParam, verbose = FALSE)