easyRNASeq (version 2.8.2)

easyRNASeq,character-method: easyRNASeq method

Description

This function is a wrapper around the lower-level functionality of the package. It is the easiest way to get a count matrix from a set of read files. It does the following:

  • fetches the annotations, depending on the provided arguments
  • gets the read coverage from the provided file(s)
  • summarizes the reads according to the selected summarization features
  • optionally applies a data correction (i.e. generating RPKM)
  • optionally uses edgeR or DESeq methods for post-processing the data (either of them being recommended over RPKM)
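As a rough illustration of the optional RPKM correction, the following base R sketch computes reads per kilobase per million reads from a toy count matrix. The counts, the feature widths and the helper name `rpkm_sketch` are made up for this example, and the library size is assumed to be the per-sample column sum; the computation inside easyRNASeq may differ in its details.

```r
# Toy count matrix: 3 features (rows) x 2 samples (columns). Values are invented.
counts <- matrix(c(100, 200, 300, 50, 100, 850), nrow = 3,
                 dimnames = list(c("e1", "e2", "e3"), c("s1", "s2")))

# Feature widths in bp, in the same order as the rows of 'counts' (invented).
widths <- c(e1 = 1000, e2 = 2000, e3 = 500)

# Hypothetical helper, not part of easyRNASeq.
# RPKM = count * 1e9 / (feature width in bp * library size in reads)
rpkm_sketch <- function(counts, widths, lib.sizes = colSums(counts)) {
  per.bp <- counts / widths              # each row divided by its feature width
  t(t(per.bp) / lib.sizes) * 1e9         # each column divided by its library size
}

rpkm.values <- rpkm_sketch(counts, widths)
print(rpkm.values)
```

Because RPKM rescales each sample by its own library size, the result is no longer a table of raw counts, which is one reason the edgeR and DESeq routes expect unnormalized input.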

Usage

## S4 method for signature 'character'
easyRNASeq(filesDirectory = getwd(),
  organism = character(1), chr.sizes = c("auto"), readLength = integer(1),
  annotationMethod = c("biomaRt", "env", "gff", "gtf", "rda"),
  annotationFile = character(1), annotationObject = RangedData(),
  format = c("bam", "aln"), gapped = FALSE, count = c("exons", "features",
  "genes", "islands", "transcripts"), outputFormat = c("matrix",
  "SummarizedExperiment", "DESeq", "edgeR", "RNAseq"), pattern = character(1),
  filenames = character(0), nbCore = 1, filter = srFilter(),
  type = "SolexaExport", chr.sel = c(), summarization = c("bestExons",
  "geneModels"), normalize = FALSE, max.gap = integer(1), min.cov = 1L,
  min.length = integer(1), plot = TRUE, conditions = c(),
  validity.check = TRUE, chr.map = data.frame(), ignoreWarnings = FALSE,
  silent = FALSE, ...)

Arguments

filesDirectory
The directory where the files to be used are located. Defaults to the current directory.
organism
A character string describing the organism
chr.sizes
A vector or a list containing the chromosomes' size of the selected organism or simply the string "auto". See details.
readLength
The read length in bp
annotationMethod
The method to fetch the annotation, one of "biomaRt","env","gff","gtf" or "rda". All methods but "biomaRt" and "env" require the annotationFile to be set. The "env" method requires the annotationObject to be set.
annotationFile
The location (full path) of the annotation file
annotationObject
A RangedData or GRangesList object containing the annotation.
format
The format of the reads, one of "aln" or "bam". With "aln", all the types supported by the ShortRead package can be used. As of version 1.3.5, it defaults to "bam".
gapped
Does the provided BAM file contain gapped alignments?
count
The feature used to summarize the reads. One of 'exons','features','genes','islands' or 'transcripts'. See details.
outputFormat
By default, easyRNASeq returns a matrix. If one of DESeq,edgeR,RNAseq, SummarizedExperiment is provided then the respective object is returned.
pattern
For easyRNASeq, the pattern of the file names to look for, e.g. "bam$"
filenames
The name, not the path, of the files to use
nbCore
Defines how many CPU cores to use when computing the geneModels. Uses the default parallel library.
filter
The filter to be applied when loading the data using the "aln" format
type
The type of data when using the "aln" format. See the ShortRead library.
chr.sel
A vector of chromosome names to subset the final results.
summarization
A character defining which method to use when summarizing reads by genes. So far, only "geneModels" is available.
normalize
A boolean to convert the returned counts into RPKM. Valid when the outputFormat is left undefined (i.e. when a matrix is returned) and when it is DESeq or edgeR. Note that it is not advised to normalize the data prior to DESeq or edgeR usage!
max.gap
When computing read islands, the maximal gap size allowed between two islands to merge them
min.cov
When computing read islands, the minimal coverage to take into account for calling an island
min.length
The minimal size an island should have to be kept
plot
Whether or not to plot assessment graphs.
conditions
A vector of descriptors. Each sample must have a descriptor if you use the outputFormat DESeq or edgeR. The length of this vector must be equal to the number of samples. In addition, the vector should be named with the file names of the corresponding samples.
validity.check
Shall UCSC chromosome name convention be enforced? This is only supported for a set of organisms, which are Dmelanogaster, Hsapiens, Mmusculus and Rnorvegicus; otherwise the argument 'chr.map' can be used to complement it.
chr.map
A data.frame describing the mapping from the original chromosome names to the desired chromosome names. See details.
ignoreWarnings
Set to TRUE (bad idea! they have a good reason to be there) if you do not want warning messages.
silent
Set to TRUE if you do not want messages to be printed out.
...
additional arguments. See details
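The pattern, filenames and conditions arguments work together, so a small base R sketch may help. It selects file names with a pattern and builds a conditions vector named after them, as described above. The file names and condition labels are hypothetical, and the pattern is only an example.

```r
# Hypothetical directory listing: names only, not paths.
files <- c("ACACTG.bam", "ACTAGC.bam", "ATGGCT.bam", "TTGCGA.bam",
           "ACACTG.bam.bai", "notes.txt")

# Keep the files whose name ends in a six-letter barcode followed by ".bam".
bam.files <- grep("[ACGT]{6}\\.bam$", files, value = TRUE)

# One descriptor per sample, named with the corresponding file name.
conditions <- c("A", "A", "B", "B")
names(conditions) <- bam.files

# Sanity checks mirroring the documented requirements.
stopifnot(length(conditions) == length(bam.files))
stopifnot(!is.null(names(conditions)), all(names(conditions) %in% bam.files))

print(conditions)
```

Anchoring the pattern with `$` keeps index files such as `ACACTG.bam.bai` out of the selection.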

Value

  • Returns a count table (a matrix of m features x n samples). If the outputFormat option has been set, a corresponding object is returned: a RangedSummarizedExperiment, a DESeq:newCountDataset, an edgeR:DGEList or an RNAseq object.

Details

  • annotationObject: When the annotationMethod is set to "env" or "rda", a properly formatted RangedData or GRangesList object needs to be provided. See the RangedData paragraph in the vignette or the examples at the bottom of this page. The data.frame-like structure of these objects is where easyRNASeq looks for the exon, feature, transcript, or gene identifier. Depending on the count method selected, it is essential that the corresponding column name is present in the annotationObject. E.g. when counting "features", the annotationObject has to contain a "feature" field.
  • chr.map: The chr.map argument of the easyRNASeq function only works for an "organismName" of value 'custom' with the "validity.check" parameter set to 'TRUE'. This data.frame should contain two columns named 'from' and 'to'. Each row should give a chromosome name in your original data and the desired name in the output of the function.
  • count: The counts can be summarized by exons, features, genes, islands or transcripts. While exons, genes and transcripts are obvious, "features" describes any features provided by the user, e.g. enhancer loci. These are processed as the exons are. "islands" refers to a function still under development that identifies de novo expression loci and counts the number of reads overlapping them.
  • chr.sizes: If set to "auto", then the format has to be "bam", in which case the chromosome names and sizes are extracted from the BAM header.
  • ...: Additional arguments for different functions:
      • for the biomaRt getBM function
      • for the readGffGtf internal function, which takes an optional argument, annotation.type, that defaults to "exon" (used to select the proper rows of the gff or gtf file)
      • for the DESeq estimateDispersions method
      • for the list.files function used to locate the read files
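To make the roles of min.cov, max.gap and min.length in the "islands" count more concrete, here is a minimal base R sketch that calls islands on a per-base coverage vector. It is an assumption about how such thresholds are typically combined, not the package's implementation; the helper `call_islands_sketch` and the coverage values are invented for this example.

```r
# Hypothetical helper, not part of easyRNASeq.
call_islands_sketch <- function(cov, min.cov = 1L, max.gap = 0L, min.length = 1L) {
  # Runs of consecutive positions with coverage >= min.cov
  covered <- rle(cov >= min.cov)
  ends <- cumsum(covered$lengths)
  starts <- ends - covered$lengths + 1L
  starts <- starts[covered$values]
  ends <- ends[covered$values]
  if (length(starts) == 0L) {
    return(data.frame(start = integer(0), end = integer(0)))
  }
  # Merge neighbouring runs separated by at most max.gap uncovered positions
  if (length(starts) > 1L) {
    gaps <- starts[-1L] - ends[-length(ends)] - 1L
    is.break <- gaps > max.gap
    starts <- starts[c(TRUE, is.break)]
    ends <- ends[c(is.break, TRUE)]
  }
  islands <- data.frame(start = starts, end = ends)
  # Drop islands shorter than min.length
  islands[islands$end - islands$start + 1L >= min.length, , drop = FALSE]
}

# Invented coverage along 14 positions
cov <- c(0, 2, 3, 0, 0, 1, 1, 1, 0, 0, 0, 0, 5, 0)
islands <- call_islands_sketch(cov, min.cov = 1L, max.gap = 2L, min.length = 2L)
print(islands)
```

In this toy run the two covered stretches at positions 2 to 3 and 6 to 8 are merged because only two uncovered positions separate them, while the single covered position 13 is discarded by the min.length filter.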

See Also

RNAseq RangedSummarizedExperiment edgeR:DGEList DESeq:newCountDataset ShortRead:readAligned

Examples

library("RnaSeqTutorial")
library(BSgenome.Dmelanogaster.UCSC.dm3)

## creating a count table from 4 bam files
count.table <- easyRNASeq(
  filesDirectory = system.file("extdata", package = "RnaSeqTutorial"),
  pattern = "[A,C,T,G]{6}\\.bam$",
  format = "bam",
  readLength = 36L,
  organism = "Dmelanogaster",
  chr.sizes = as.list(seqlengths(Dmelanogaster)),
  annotationMethod = "rda",
  annotationFile = system.file("data", "gAnnot.rda",
                               package = "RnaSeqTutorial"),
  count = "exons"
)

## an example of a chr.map
chr.map <- data.frame(from = c("2L", "2R", "MT"),
                      to = c("chr2L", "chr2R", "chrMT"))

## an example of a RangedData annotation
gAnnot <- RangedData(
  IRanges(start = c(10, 30, 100),
          end = c(21, 53, 123)),
  space = c("chr01", "chr01", "chr02"),
  strand = c("+", "+", "-"),
  transcript = c("trA1", "trA2", "trB"),
  gene = c("gA", "gA", "gB"),
  exon = c("e1", "e2", "e3"),
  universe = "Hs19"
)

## an example of a GRangesList annotation
grngs <- as(gAnnot, "GRanges")
grngsList <- split(grngs, seqnames(grngs))
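
The chr.map example above only builds the mapping table. The following base R sketch shows what a 'from'/'to' renaming amounts to when applied to a vector of chromosome names. The helper `rename_chr_sketch` and the input names are hypothetical; easyRNASeq applies the mapping internally, and leaving unmapped names unchanged is an assumption of this sketch.

```r
chr.map <- data.frame(from = c("2L", "2R", "MT"),
                      to = c("chr2L", "chr2R", "chrMT"),
                      stringsAsFactors = FALSE)

# Hypothetical helper, not part of easyRNASeq.
rename_chr_sketch <- function(x, chr.map) {
  idx <- match(x, chr.map$from)             # row of chr.map for each name, NA if absent
  ifelse(is.na(idx), x, chr.map$to[idx])    # keep the original name when no mapping exists
}

renamed <- rename_chr_sketch(c("2L", "MT", "X", "2R"), chr.map)
print(renamed)
```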
