TCC (version 1.12.1)

calcNormFactors: Calculate normalization factors

Description

This function calculates normalization factors from a TCC-class object using a specified multi-step normalization method. The procedure can generally be described as the $STEP1 - (STEP2 - STEP3)_{n}$ pipeline.

Usage

calcNormFactors(tcc, norm.method = NULL, test.method = NULL, iteration = TRUE, FDR = NULL, floorPDEG = NULL, increment = FALSE, ...)

Arguments

tcc
TCC-class object.
norm.method
character specifying the normalization method used in both $STEP1$ and $STEP3$. Possible values are "tmm" for the TMM normalization method implemented in the edgeR package, "edger" (same as "tmm"), "deseq2" for the method implemented in the DESeq2 package, and "deseq" for the method implemented in the DESeq package. The default is "tmm" when analyzing count data with multiple replicates (i.e., min(table(tcc$group[, 1])) > 1) and "deseq" when analyzing count data without replicates (i.e., min(table(tcc$group[, 1])) == 1).
test.method
character specifying the method for identifying differentially expressed genes (DEGs) used in $STEP2$: one of "edger", "deseq", "deseq2", "bayseq", "samseq", "voom" and "wad". See the "Details" field in estimateDE for details. The default is "edger" when analyzing count data with multiple replicates (i.e., min(table(tcc$group[, 1])) > 1), and "deseq" (2 groups) or "deseq2" (more than 2 groups) when analyzing count data without replicates (i.e., min(table(tcc$group[, 1])) == 1).
iteration
logical or numeric value specifying the number of iterations ($n$) in the proposed normalization pipeline: the $STEP1 - (STEP2 - STEP3)_{n}$ pipeline. If FALSE or 0 is specified, only the normalization method in $STEP1$ is performed. If TRUE or 1 is specified, the three-step normalization pipeline is performed once. Integers higher than 1 give the number of iterations of the pipeline.
FDR
numeric value (between 0 and 1) specifying the threshold for determining potential DEGs after $STEP2$.
floorPDEG
numeric value (between 0 and 1) specifying the minimum proportion of genes to be eliminated as potential DEGs before performing $STEP3$.
increment
logical value. If increment = TRUE, the DEGES pipeline is run again starting from the result of the current iteration.
...
arguments to identify potential DEGs at $STEP2$. See the "Arguments" field in estimateDE for details.
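
The way the iteration argument drives the $STEP1 - (STEP2 - STEP3)_{n}$ pipeline can be illustrated with a minimal base R sketch. The helpers normalize_toy, flag_deg_toy and deges_sketch are hypothetical stand-ins written for this illustration; they are not TCC internals, and the toy normalization (library-size scaling) and toy DEG detection (top 10% by absolute log-ratio) replace the real TMM/DESeq and edgeR/DESeq steps.

```r
# Toy stand-in for STEP1/STEP3 (hypothetical, not TCC code):
# scaling factors from the column sums of the retained genes.
normalize_toy <- function(counts, keep = rep(TRUE, nrow(counts))) {
  lib <- colSums(counts[keep, , drop = FALSE])
  lib / mean(lib)
}

# Toy stand-in for STEP2 (hypothetical, not TCC code):
# flag the top `prop` fraction of genes by absolute log-ratio.
flag_deg_toy <- function(counts, factors, group, prop = 0.1) {
  scaled <- sweep(counts, 2, factors, "/")
  m1 <- rowMeans(scaled[, group == 1, drop = FALSE])
  m2 <- rowMeans(scaled[, group == 2, drop = FALSE])
  score <- abs(log2((m1 + 1) / (m2 + 1)))
  n_deg <- ceiling(prop * nrow(counts))
  seq_len(nrow(counts)) %in% order(score, decreasing = TRUE)[seq_len(n_deg)]
}

deges_sketch <- function(counts, group, iteration = TRUE) {
  n <- as.integer(iteration)        # FALSE -> 0, TRUE -> 1, 3 -> 3
  factors <- normalize_toy(counts)  # STEP1
  for (i in seq_len(n)) {
    is_deg <- flag_deg_toy(counts, factors, group)    # STEP2
    factors <- normalize_toy(counts, keep = !is_deg)  # STEP3 on non-DEGs
  }
  factors
}

set.seed(1)
counts <- matrix(rpois(600, lambda = 50), nrow = 100)  # 100 genes, 6 samples
group <- c(1, 1, 1, 2, 2, 2)
f0 <- deges_sketch(counts, group, iteration = FALSE)  # STEP1 only
f3 <- deges_sketch(counts, group, iteration = 3)      # three STEP2-STEP3 rounds
```

With iteration = FALSE the loop body never runs, so the result equals the STEP1 factors; each additional iteration re-identifies potential DEGs using the latest factors and renormalizes without them.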

Value

After performing the calcNormFactors function, the calculated normalization factors are populated in the norm.factors field (i.e., tcc$norm.factors). Parameters used for DEGES normalization (e.g., potential DEGs identified in $STEP2$, execution times for the identification, etc.) are stored in the DEGES field (i.e., tcc$DEGES) as follows:
iteration
the iteration number $n$ for the $STEP1 - (STEP2 - STEP3)_{n}$ pipeline.
pipeline
the DEGES normalization pipeline.
threshold
it stores (i) the type of threshold (threshold$type), (ii) the threshold value (threshold$input), and (iii) the percentage of potential DEGs actually used (threshold$PDEG). These values depend on whether the percentage of DEGs identified in $STEP2$ is higher or lower than the value given by floorPDEG. Consider, for example, running calcNormFactors with FDR = 0.1 and floorPDEG = 0.05. If the percentage of DEGs identified in $STEP2$ satisfying FDR = 0.1 is 0.14 (i.e., higher than the floorPDEG of 0.05), the values in the threshold fields will be threshold$type = "FDR", threshold$input = 0.1, and threshold$PDEG = 0.14. If the percentage is 0.03 (i.e., lower than the floorPDEG of 0.05), the values in the threshold fields will be threshold$type = "floorPDEG", threshold$input = 0.05, and threshold$PDEG = 0.05.
potDEG
numeric binary vector (0 for non-DEG or 1 for DEG) after the percentage of DEGs identified in $STEP2$ has been compared with the predefined floorPDEG value. If the percentage (e.g., 2%) is lower than the floorPDEG value (e.g., 17%), 17% of the elements are set to 1 (DEG).
prePotDEG
numeric binary vector (0 for non-DEG or 1 for DEG) before the evaluation of the percentage of DEGs identified in $STEP2$ with the predefined floorPDEG value. Regardless of the floorPDEG value, the percentage of elements with 1 is always the same as that of DEGs identified in $STEP2$.
execution.time
computation time required for normalization.
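
The relationship between FDR, floorPDEG, prePotDEG and potDEG described above can be sketched in base R. The function choose_pot_deg is a hypothetical helper written for this illustration, not part of TCC. It assumes that genes with a q-value below FDR count as DEGs and that, when the floor applies, the genes with the smallest q-values are used to fill it; the actual ranking used by TCC may differ.

```r
# Hypothetical helper (not a TCC function) mimicking the threshold decision.
choose_pot_deg <- function(q.value, FDR = 0.1, floorPDEG = 0.05) {
  pre <- as.numeric(q.value < FDR)  # plays the role of prePotDEG
  if (mean(pre) >= floorPDEG) {
    # Enough DEGs were found: the FDR threshold is used as is.
    list(type = "FDR", input = FDR, PDEG = mean(pre),
         prePotDEG = pre, potDEG = pre)
  } else {
    # Too few DEGs: flag the floorPDEG fraction of top-ranked genes instead.
    n_deg <- round(floorPDEG * length(q.value))
    pot <- numeric(length(q.value))
    pot[order(q.value)[seq_len(n_deg)]] <- 1
    list(type = "floorPDEG", input = floorPDEG, PDEG = floorPDEG,
         prePotDEG = pre, potDEG = pot)
  }
}

# 14 of 100 genes pass FDR = 0.1, so the FDR threshold is kept.
q_many <- c(rep(0.01, 14), rep(0.5, 86))
res_many <- choose_pot_deg(q_many)

# Only 3 of 100 genes pass, so floorPDEG = 0.05 takes over.
q_few <- c(0.01, 0.02, 0.03, seq(0.2, 1, length.out = 97))
res_few <- choose_pot_deg(q_few)
```

In the second case prePotDEG still marks only the 3 genes that passed the FDR threshold, while potDEG marks 5 genes (5% of 100).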

Details

The calcNormFactors function is the main function in the TCC package. Because the pipeline applies a DEG identification method at $STEP2$, our multi-step strategy can eliminate the negative effect of potential DEGs before the second normalization at $STEP3$. To fully utilize the DEG elimination strategy (DEGES), we strongly recommend against using iteration = 0 or iteration = FALSE. This function internally calls functions implemented in other R packages according to the specified norm.method value.

  • norm.method = "tmm" The calcNormFactors function implemented in edgeR is used for obtaining the TMM normalization factors at both $STEP1$ and $STEP3$.
  • norm.method = "deseq2" The estimateSizeFactors function implemented in DESeq2 is used for obtaining the size factors at both $STEP1$ and $STEP3$. The size factors are internally converted to normalization factors that are comparable to the TMM normalization factors.
  • norm.method = "deseq" The estimateSizeFactors function implemented in DESeq is used for obtaining the size factors at both $STEP1$ and $STEP3$. The size factors are internally converted to normalization factors that are comparable to the TMM normalization factors.
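
The conversion from size factors to TMM-like normalization factors mentioned above can be illustrated with a base R sketch. Both functions are hypothetical and written for this illustration. size_factors_sketch reimplements the median-of-ratios idea rather than calling estimateSizeFactors, and to_norm_factors assumes the conversion divides each size factor by its library size and rescales the result to a mean of 1. The division follows from TMM factors multiplying the library size to give an effective library size; the rescaling convention is an assumption and may not match the TCC internals.

```r
# Median-of-ratios size factors in base R (sketch of the idea only).
size_factors_sketch <- function(counts) {
  log_counts <- log(counts)
  log_geo_means <- rowMeans(log_counts)
  usable <- is.finite(log_geo_means)  # drop genes with a zero count
  apply(log_counts[usable, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_means[usable]))
  })
}

# Assumed conversion: remove the library-size component, rescale to mean 1.
to_norm_factors <- function(size.factors, counts) {
  nf <- size.factors / colSums(counts)
  nf / mean(nf)
}

# Three samples that differ only in sequencing depth.
base_profile <- c(10, 20, 30, 40, 50)
counts <- cbind(a = base_profile, b = 2 * base_profile, c = 4 * base_profile)
sf <- size_factors_sketch(counts)
nf <- to_norm_factors(sf, counts)
```

Because the three toy samples differ only in depth, the size factors are proportional to the library sizes and the converted normalization factors are all equal.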

Examples

library(TCC)
data(hypoData)
group <- c(1, 1, 1, 2, 2, 2)

# Calculating normalization factors using the DEGES/edgeR method 
# (the TMM-edgeR-TMM pipeline).
tcc <- new("TCC", hypoData, group)
tcc <- calcNormFactors(tcc, norm.method = "tmm", test.method = "edger",
                       iteration = 1, FDR = 0.1, floorPDEG = 0.05)
tcc$norm.factors

# Calculating normalization factors using the iterative DEGES/edgeR method 
# (iDEGES/edgeR) with n = 3.
tcc <- new("TCC", hypoData, group)
tcc <- calcNormFactors(tcc, norm.method = "tmm", test.method = "edger",
                       iteration = 3, FDR = 0.1, floorPDEG = 0.05)
tcc$norm.factors

# Calculating normalization factors for simulation data without replicates.
tcc <- simulateReadCounts(replicates = c(1, 1))
tcc <- calcNormFactors(tcc, norm.method = "deseq", test.method = "deseq",
                       iteration = 1, FDR = 0.1, floorPDEG = 0.05)
tcc$norm.factors
