Learn R Programming

DEGseq (version 1.26.0)

DEGseq: DEGseq: Identify Differentially Expressed Genes from RNA-seq data

Description

This function is used to identify differentially expressed genes from RNA-seq data. It takes uniquely mapped reads from RNA-seq data for the two samples with a gene annotation as input. So users should map the reads (obtained from sequencing libraries of the samples) to the corresponding genome in advance.

Usage

DEGseq(mapResultBatch1, mapResultBatch2, fileFormat="bed", readLength=32,
       strandInfo=FALSE, refFlat, groupLabel1="group1", groupLabel2="group2",
       method=c("LRT", "CTR", "FET", "MARS", "MATR", "FC"), 
       pValue=1e-3, zScore=4, qValue=1e-3, foldChange=4, thresholdKind=1,
       outputDir="none", normalMethod=c("none", "loess", "median"),
       depthKind=1, replicate1="none", replicate2="none",
       replicateLabel1="replicate1", replicateLabel2="replicate2")

Arguments

mapResultBatch1
vector containing uniquely mapping result files for technical replicates of sample1 (or replicate1 when method="CTR").
mapResultBatch2
vector containing uniquely mapping result files for technical replicates of sample2 (or replicate2 when method="CTR").
fileFormat
file format: "bed" or "eland". example of "bed" format: chr12 7 38 readID 2 + example of "eland" format: readID chr12.fa 7 U2 F Note: The field separator character is TAB. And the files must follow the format as one of the examples.
readLength
the length of the reads (only used if fileFormat="eland").
strandInfo
whether the strand information was retained during the cloning of the cDNAs.
  • "TRUE": retained,
  • "FALSE": not retained.
refFlat
gene annotation file in UCSC refFlat format. See http://genome.ucsc.edu/goldenPath/gbdDescriptionsOld.html#RefFlat.
groupLabel1
label of group1 on the plots.
groupLabel2
label of group2 on the plots.
method
method to identify differentially expressed genes. Possible methods are:
  • "LRT": Likelihood Ratio Test (Marioni et al. 2008),
  • "CTR": Check whether the variation between two Technical Replicates can be explained by the random sampling model (Wang et al. 2009),
  • "FET": Fisher's Exact Test (Joshua et al. 2009),
  • "MARS": MA-plot-based method with Random Sampling model (Wang et al. 2009),
  • "MATR": MA-plot-based method with Technical Replicates (Wang et al. 2009),
  • "FC": Fold-Change threshold on MA-plot.
pValue
pValue threshold (for the methods: LRT, FET, MARS, MATR). only used when thresholdKind=1.
zScore
zScore threshold (for the methods: MARS, MATR). only used when thresholdKind=2.
qValue
qValue threshold (for the methods: LRT, FET, MARS, MATR). only used when thresholdKind=3 or thresholdKind=4.
thresholdKind
the kind of threshold. Possible kinds are:
  • 1: pValue threshold,
  • 2: zScore threshold,
  • 3: qValue threshold (Benjamini et al. 1995),
  • 4: qValue threshold (Storey et al. 2003),
  • 5: qValue threshold (Storey et al. 2003) and Fold-Change threshold on MA-plot are both required (can be used only whenmethod="MARS").
foldChange
fold change threshold on MA-plot (for the method: FC).
outputDir
the output directory.
normalMethod
the normalization method: "none", "loess", "median" (Yang,Y.H. et al. 2002). recommend: "none".
depthKind
1: take the total number of reads uniquely mapped to genome as the depth for each replicate, 0: take the total number of reads uniquely mapped to all annotated genes as the depth for each replicate. We recommend taking depthKind=1, especially when the genes in annotation file are part of all genes.
replicate1
files containing uniquely mapped reads obtained from replicate batch1 (only used when method="MATR").
replicate2
files containing uniquely mapped reads obtained from replicate batch2 (only used when method="MATR").
replicateLabel1
label of replicate batch1 on the plots (only used when method="MATR").
replicateLabel2
label of replicate batch2 on the plots (only used when method="MATR").

References

Benjamini,Y. and Hochberg,Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289-300.

Jiang,H. and Wong,W.H. (2009) Statistical inferences for isoform expression in RNA-seq. Bioinformatics, 25, 1026-1032.

Bloom,J.S. et al. (2009) Measuring differential gene expression by short read sequencing: quantitative comparison to 2-channel gene expression microarrays. BMC Genomics, 10, 221.

Marioni,J.C. et al. (2008) RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res., 18, 1509-1517.

Storey,J.D. and Tibshirani,R. (2003) Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. 100, 9440-9445. Wang,L.K. and et al. (2010) DEGseq: an R package for identifying differentially expressed genes from RNA-seq data, Bioinformatics 26, 136 - 138. Yang,Y.H. et al. (2002) Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research, 30, e15.

See Also

DEGexp, getGeneExp, readGeneExp, kidneyChr21.bed, liverChr21.bed, refFlatChr21.

Examples

Run this code
kidneyR1L1 <- system.file("extdata", "kidneyChr21.bed.txt", package="DEGseq")
  liverR1L2  <- system.file("extdata", "liverChr21.bed.txt", package="DEGseq")
  refFlat    <- system.file("extdata", "refFlatChr21.txt", package="DEGseq")
  mapResultBatch1 <- c(kidneyR1L1)  ## only use the data from kidneyR1L1 and liverR1L2
  mapResultBatch2 <- c(liverR1L2)
  outputDir <- file.path(tempdir(), "DEGseqExample")
  DEGseq(mapResultBatch1, mapResultBatch2, fileFormat="bed", refFlat=refFlat,
         outputDir=outputDir, method="LRT")
  cat("outputDir:", outputDir, "")

Run the code above in your browser using DataLab