preprocess_cds: Preprocess a cds to prepare for trajectory inference

Description

Most analyses in Monocle3, including trajectory inference and clustering, require several normalization and preprocessing steps. preprocess_cds executes these steps and stores the results.

Specifically, depending on the options selected, preprocess_cds first normalizes the data, either by log transformation and size factor (to address differences in sequencing depth) or by size factor only. Next, preprocess_cds calculates a lower-dimensional space that is used as the input for further dimensionality reduction with methods such as t-SNE and UMAP.
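The two stages described above can be sketched in base R. This is a minimal illustration and not the Monocle3 implementation: the toy counts are made up, and the size-factor definition (cell total divided by the geometric mean of all cell totals), the use of log2, and the point at which the pseudocount is added are assumptions of this sketch.

```r
# Toy count matrix: genes in rows, cells in columns (values are made up).
counts <- matrix(
  c(10, 0, 3,
    20, 4, 6,
     0, 2, 1,
     5, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Assumed size-factor definition: each cell's total count divided by the
# geometric mean of all cell totals.
cell_totals <- colSums(counts)
size_factors <- cell_totals / exp(mean(log(cell_totals)))

# "size_only"-style normalization: divide each cell (column) by its size factor.
norm_size_only <- sweep(counts, 2, size_factors, "/")

# "log"-style normalization: add a pseudocount of 1, then log-transform.
# The base (log2) is an assumption of this sketch.
norm_log <- log2(norm_size_only + 1)

# Lower-dimensional space: PCA with cells as observations (rows).
pca <- prcomp(t(norm_log), center = TRUE, scale. = FALSE, rank. = 2)
dim(pca$x)  # cells x retained components
```

In Monocle3 itself, the corresponding call would be preprocess_cds(cds, method = "PCA", norm_method = "log"), with num_dim controlling the number of retained components.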

Usage

preprocess_cds(
  cds,
  method = c("PCA", "LSI"),
  num_dim = 50,
  norm_method = c("log", "size_only", "none"),
  use_genes = NULL,
  pseudo_count = NULL,
  scaling = TRUE,
  verbose = FALSE,
  ...
)

Arguments

cds

the cell_data_set upon which to perform this operation

method

a string specifying the initial dimensionality reduction method to use, currently either "PCA" or "LSI". LSI (latent semantic indexing) converts the (sparse) expression matrix into a tf-idf matrix and then performs SVD to decompose the gene expression / cells into modules / topics. Default is "PCA".
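The LSI idea (tf-idf weighting followed by SVD) can be sketched in base R as follows. This is a hedged illustration only: the toy counts are made up, and the particular tf-idf formula used here is one common variant, assumed for this sketch rather than taken from the Monocle3 source.

```r
# Toy count matrix: genes in rows, cells in columns (values are made up).
counts <- matrix(
  c(5, 0, 1, 0,
    3, 1, 0, 0,
    0, 4, 0, 2,
    0, 2, 0, 6,
    1, 0, 7, 1,
    0, 0, 3, 2),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), paste0("cell", 1:4))
)

# Term frequency: each gene's share of the counts within a cell.
tf <- sweep(counts, 2, colSums(counts), "/")

# Inverse document frequency (assumed variant): down-weights genes with
# large total counts across cells.
idf <- log(1 + ncol(counts) / rowSums(counts))

# tf-idf matrix: row i of tf is multiplied by idf[i].
tf_idf <- tf * idf

# SVD of the cells x genes matrix; keep the top two components ("topics").
svd_res <- svd(t(tf_idf), nu = 2, nv = 2)
cell_embedding <- svd_res$u %*% diag(svd_res$d[1:2])  # cells x 2
gene_loadings <- svd_res$v                            # genes x 2
dim(cell_embedding)
```

For large sparse matrices a truncated SVD would normally replace the dense svd() call used here for brevity.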

num_dim

the dimensionality of the reduced space. Default is 50.

norm_method

Determines how to transform expression values prior to reducing dimensionality. Options are "log", "size_only", and "none". Default is "log". Users should only use "none" if they are confident that their data is already normalized.

use_genes

NULL or a list of gene IDs. If a list of gene IDs is supplied, only this subset of genes is used for dimensionality reduction. Default is NULL.
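One way to build such a gene list is to rank genes by variance and keep the top few. The base R sketch below uses a made-up count matrix, and the variance-based selection rule is only an illustrative assumption, not a recommendation from the Monocle3 documentation.

```r
# Toy count matrix: genes in rows, cells in columns (values are made up).
counts <- matrix(
  c(5, 0, 1, 0,
    3, 1, 0, 0,
    0, 4, 0, 2,
    0, 2, 0, 6,
    1, 0, 7, 1,
    0, 0, 3, 2),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), paste0("cell", 1:4))
)

# Rank genes by their variance across cells and keep the three most variable.
gene_var <- apply(counts, 1, var)
use_genes <- names(sort(gene_var, decreasing = TRUE))[1:3]

# Restricting the matrix to these genes mirrors what use_genes asks for.
subset_counts <- counts[use_genes, , drop = FALSE]
dim(subset_counts)
```

A vector built this way could then be supplied as preprocess_cds(cds, use_genes = use_genes), assuming its entries match the gene IDs used as row names of the cell_data_set.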

pseudo_count

NULL or the amount to increase expression values before normalization and dimensionality reduction. If NULL (default), a pseudo_count of 1 is added for log normalization and 0 is added for size factor only normalization.
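The reason a pseudocount is needed for log normalization can be seen with a few made-up expression values. The use of log2 is an assumption of this sketch.

```r
# Made-up expression values for one gene across four cells.
expr <- c(0, 1, 3, 7)

# Without a pseudocount, zero expression maps to -Inf.
raw_log <- log2(expr)

# With a pseudocount of 1, zeros map to 0 and all values stay finite.
shifted_log <- log2(expr + 1)

raw_log[1]   # -Inf
shifted_log  # 0 1 2 3
```

With size-factor-only normalization there is no logarithm to protect against zeros, which is consistent with the default pseudo_count of 0 in that case.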

scaling

When this argument is set to TRUE (default), each gene is scaled before dimensionality reduction, and therefore before trajectory reconstruction. Relevant for method = "PCA" only.
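The effect of per-gene scaling on PCA can be illustrated in base R with simulated data. This sketch uses stats::prcomp on made-up values and only demonstrates the general principle; it is not the code path used by preprocess_cds.

```r
set.seed(1)
n_cells <- 50

# Simulated expression: cells in rows, genes in columns.
# gene_high has a much larger variance than the other two genes.
expr <- cbind(
  gene_high = rnorm(n_cells, sd = 100),
  gene_a    = rnorm(n_cells, sd = 1),
  gene_b    = rnorm(n_cells, sd = 1)
)

# Without scaling, the high-variance gene dominates the first component.
pca_unscaled <- prcomp(expr, center = TRUE, scale. = FALSE)

# With scaling, every gene has unit variance and contributes comparably.
pca_scaled <- prcomp(expr, center = TRUE, scale. = TRUE)

abs(pca_unscaled$rotation[, "PC1"])
abs(pca_scaled$rotation[, "PC1"])
```

Scaling is usually preferable when genes differ widely in expression range, since otherwise a handful of highly expressed genes can determine the leading components.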

verbose

whether to emit verbose output during dimensionality reduction. Default is FALSE.

...

additional arguments to pass to limma::lmFit if residual_model_formula is not NULL

Value

an updated cell_data_set object