Most analyses in Monocle3 (including trajectory inference and clustering)
require various normalization and preprocessing steps. preprocess_cds
executes and stores these preprocessing steps.
Specifically, depending on the options selected, preprocess_cds first
normalizes the data, either by log and size factor (to address depth
differences) or by size factor only. Next, preprocess_cds calculates a
lower-dimensional space that will be used as the input for further
dimensionality reduction such as t-SNE and UMAP.
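The two steps described above can be sketched in base R on a toy matrix. This is only an illustration of the idea, not Monocle3's internal code: the counts are made up, the size factor formula (cell total divided by the geometric mean of all cell totals) and the use of log base 2 are assumptions, and prcomp stands in for whatever PCA routine the package actually calls.

```r
# Toy counts matrix: genes in rows, cells in columns (made-up values).
counts <- matrix(
  c(10, 0, 5, 20,   # cell1
     2, 8, 0,  4,   # cell2
     6, 40, 1, 9),  # cell3
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Step 1a: size factors (assumed formula: cell total / geometric mean of totals).
cell_totals  <- colSums(counts)
size_factors <- cell_totals / exp(mean(log(cell_totals)))

# Step 1b: divide each cell (column) by its size factor to remove depth differences.
size_norm <- sweep(counts, 2, size_factors, "/")

# Step 1c: log-transform with a pseudo_count of 1 (base 2 is an assumption here).
log_norm <- log2(size_norm + 1)

# Step 2: PCA on cells (rows after transposing), scaling each gene,
# keeping 2 dimensions for this tiny example.
pca <- prcomp(t(log_norm), center = TRUE, scale. = TRUE, rank. = 2)
dim(pca$x)  # cells x retained dimensions
```

Skipping step 1c corresponds conceptually to size-factor-only normalization, and skipping all of step 1 corresponds to data that is already normalized.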
preprocess_cds(
  cds,
  method = c("PCA", "LSI"),
  num_dim = 50,
  norm_method = c("log", "size_only", "none"),
  use_genes = NULL,
  pseudo_count = NULL,
  scaling = TRUE,
  verbose = FALSE,
  ...
)
cds: the cell_data_set upon which to perform this operation.
method: a string specifying the initial dimension method to use, currently either "PCA" or "LSI". For LSI (latent semantic indexing), the (sparse) expression matrix is converted into a tf-idf matrix, and SVD is then performed to decompose the gene expression / cells into certain modules / topics. Default is "PCA".
num_dim: the dimensionality of the reduced space.
norm_method: determines how to transform expression values prior to reducing dimensionality. Options are "log", "size_only", and "none". Default is "log". Users should only use "none" if they are confident that their data is already normalized.
use_genes: NULL or a list of gene IDs. If a list of gene IDs, only this subset of genes is used for dimensionality reduction. Default is NULL.
pseudo_count: NULL or the amount to increase expression values before normalization and dimensionality reduction. If NULL (default), a pseudo_count of 1 is added for log normalization and 0 is added for size factor only normalization.
scaling: when set to TRUE (default), each gene is scaled before running trajectory reconstruction. Relevant for method = "PCA" only.
verbose: whether to emit verbose output during dimensionality reduction.
...: additional arguments to pass to limma::lmFit if residual_model_formula is not NULL.
an updated cell_data_set object
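A minimal usage sketch follows, assuming the monocle3, Matrix, and SingleCellExperiment packages are installed. The simulated Poisson counts, the gene and cell names, and the choice of num_dim = 10 are hypothetical and chosen only so the example runs quickly. The whole block is wrapped in a requireNamespace check so that it does nothing when monocle3 is unavailable.

```r
if (requireNamespace("monocle3", quietly = TRUE)) {
  library(monocle3)
  set.seed(1)

  # Simulated counts: 200 genes x 60 cells (hypothetical data).
  n_genes <- 200
  n_cells <- 60
  expr <- matrix(
    rpois(n_genes * n_cells, lambda = 5),
    nrow = n_genes,
    dimnames = list(paste0("gene", seq_len(n_genes)),
                    paste0("cell", seq_len(n_cells)))
  )
  expr <- Matrix::Matrix(expr, sparse = TRUE)  # store as a sparse matrix

  # Minimal cell and gene metadata; row names must match the matrix dimnames.
  cell_meta <- data.frame(row.names = colnames(expr),
                          batch = rep(c("a", "b"), length.out = n_cells))
  gene_meta <- data.frame(row.names = rownames(expr),
                          gene_short_name = rownames(expr))

  cds <- new_cell_data_set(expr,
                           cell_metadata = cell_meta,
                           gene_metadata = gene_meta)

  # Log + size-factor normalization followed by PCA down to 10 dimensions.
  cds <- preprocess_cds(cds, method = "PCA", num_dim = 10, norm_method = "log")

  # The reduced space is stored on the returned object; it should have
  # one row per cell and num_dim columns.
  pca <- SingleCellExperiment::reducedDims(cds)[["PCA"]]
  dim(pca)
}
```

The returned object can then be passed on to further dimensionality reduction, for example with reduce_dimension.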