CCcorrect: Dimensional Reduction by PCA or ICA

Description

This function performs dimensional reduction by PCA or ICA and removes components enriched for particular gene sets, e.g. cell cycle related genes or genes associated with technical batch effects.

Usage

CCcorrect(
  object,
  vset = NULL,
  CGenes = NULL,
  ccor = 0.4,
  pvalue = 0.01,
  quant = 0.01,
  nComp = NULL,
  dimR = FALSE,
  mode = "pca",
  logscale = FALSE,
  FSelect = TRUE
)

Value

The function returns an updated SCseq object with the principal or independent component matrix written to the slot dimRed$x of the SCseq object. Additional information on the PCA or ICA is stored in slot dimRed.

Arguments

object: SCseq class object.
vset: List of vectors with gene sets. The loadings of each component are tested for enrichment in any of these gene sets. If the lower quant or upper 1 - quant fraction of genes ordered by loading is enriched at a p-value < pvalue, the component is discarded. Default is NULL.
CGenes: Vector of gene names. If this argument is given, the gene sets to be tested for enrichment in PCA- or ICA-components are defined by all genes with a Pearson's correlation > ccor to a gene in CGenes. The loadings of each component are tested for enrichment in any of these gene sets. If the lower quant or upper 1 - quant fraction of genes ordered by loading is enriched at a p-value < pvalue, the component is discarded. Default is NULL.
ccor: Positive number between 0 and 1. Correlation threshold used to determine correlating gene sets for all genes in CGenes. Default is 0.4.
pvalue: Positive number between 0 and 1. P-value cutoff for determining enriched components. See vset or CGenes. Default is 0.01.
quant: Positive number between 0 and 1. Upper and lower fraction of gene loadings used for determining enriched components. See vset or CGenes. Default is 0.01.
nComp: Number of PCA- or ICA-components to use. Default is NULL, in which case the maximal number of components is computed.
dimR: logical. If TRUE, then the number of principal components to use for downstream analysis is derived from a saturation criterion. See function plotdimsat. Default is FALSE and all nComp components are used.
mode: "pca" or "ica" to perform either principal component analysis or independent component analysis. Default is "pca".
logscale: logical. If TRUE data are log-transformed prior to PCA or ICA. Default is FALSE.
FSelect: logical. If TRUE, then PCA or ICA is performed on the filtered expression matrix using only the features stored in slot cluster$features as computed in the function filterdata. See FSelect for function filterdata. Default is TRUE.
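
The interplay of CGenes, ccor, quant and pvalue can be illustrated with a small base R sketch on simulated data. This is not the RaceID implementation: the simulated matrix, the helper tail_enriched and the choice of a hypergeometric test are assumptions made for illustration only, and quant is set larger than the package default because the toy data contain only 200 genes.

```r
set.seed(1)

# Simulated expression matrix (assumption): 200 genes (rows) x 60 cells (columns).
n_genes <- 200
n_cells <- 60
expr <- matrix(rnorm(n_genes * n_cells), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("cell", seq_len(n_cells))))

# Add a shared signal to the first 10 genes to mimic a cell cycle program.
signal <- rnorm(n_cells, sd = 3)
expr[1:10, ] <- expr[1:10, ] + matrix(signal, nrow = 10, ncol = n_cells, byrow = TRUE)

# Thresholds named after the CCcorrect arguments.
ccor   <- 0.4
pvalue <- 0.01
quant  <- 0.05

# Gene set in the spirit of CGenes: all genes with Pearson's correlation > ccor
# to a seed gene (here the hypothetical seed "gene1").
seed_gene <- "gene1"
r <- cor(t(expr), expr[seed_gene, ])[, 1]
gene_set <- names(r)[r > ccor]

# PCA with cells as observations; the rotation matrix holds the gene loadings.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
loadings <- pca$rotation

# Hypothetical helper: test whether the lower or upper quant fraction of genes,
# ordered by loading, is enriched in gene_set (one-sided hypergeometric test).
tail_enriched <- function(load, gene_set, quant, pvalue) {
  k <- max(1, round(quant * length(load)))
  ord <- names(sort(load))
  tails <- list(lower = head(ord, k), upper = tail(ord, k))
  p <- sapply(tails, function(tl) {
    hits <- sum(tl %in% gene_set)
    # P(X >= hits) when drawing k genes out of all genes without replacement
    phyper(hits - 1, length(gene_set), length(load) - length(gene_set), k,
           lower.tail = FALSE)
  })
  any(p < pvalue)
}

# Flag components whose loading tails are enriched, then drop them.
discard <- vapply(colnames(loadings),
                  function(pc) tail_enriched(loadings[, pc], gene_set, quant, pvalue),
                  logical(1))
reduced <- pca$x[, !discard, drop = FALSE]
```

In this toy setting the component carrying the simulated signal is flagged and removed, and reduced holds the remaining component scores per cell, analogous to the matrix that CCcorrect writes to dimRed$x.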

Examples

sc <- SCseq(intestinalDataSmall)
sc <- filterdata(sc)
sc <- CCcorrect(sc, dimR = TRUE, nComp = 3)
