Scales and centers genes in the dataset. If variables are provided in vars.to.regress, they are individually regressed against each gene, and the resulting residuals are then scaled and centered.
ScaleData(object, genes.use = NULL, data.use = NULL, vars.to.regress,
model.use = "linear", use.umi = FALSE, do.scale = TRUE,
do.center = TRUE, scale.max = 10, block.size = 1000,
min.cells.to.block = 3000, display.progress = TRUE, assay.type = "RNA",
do.cpp = TRUE, check.for.norm = TRUE, do.par = FALSE, num.cores = 1)
object: Seurat object.
genes.use: Vector of gene names to scale/center. Default is all genes in object@data.
data.use: Optional matrix of data to scale. Default is object@data[genes.use, ].
vars.to.regress: Variables to regress out (previously latent.vars in RegressOut). For example, nUMI or percent.mito.
model.use: Use a linear model or a generalized linear model (Poisson, negative binomial) for the regression. Options are 'linear' (default), 'poisson', and 'negbinom'.
use.umi: Regress on UMI count data. Default is FALSE for linear modeling, but automatically set to TRUE if model.use is 'negbinom' or 'poisson'.
do.scale: Whether to scale the data.
do.center: Whether to center the data.
scale.max: Max value to return for scaled data. The default is 10. Setting this can help reduce the effects of genes that are only expressed in a very small number of cells. If regressing out latent variables and using a non-linear model, the default is 50.
block.size: Default number of genes to scale in a single computation. Increasing block.size may speed up calculations but at an additional memory cost.
min.cells.to.block: If the object contains fewer than this number of cells, don't block for scaling calculations.
display.progress: Displays a progress bar for the scaling procedure.
assay.type: Assay to scale data for. Default is RNA. Can be changed for multimodal analyses.
do.cpp: By default (TRUE), most of the heavy lifting is done in C++. Support for the previous R implementation is maintained for reproducibility (set this to FALSE), as results can change slightly due to differences in numerical precision, which could affect downstream calculations.
check.for.norm: Check whether the data has been normalized; if not, output a warning (TRUE by default).
do.par: Use parallel processing to regress out variables faster. If set to TRUE, will use half of the machine's available cores (FALSE by default).
num.cores: If do.par = TRUE, specify the number of cores to use.
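The idea behind block.size can be illustrated in base R. The following is a minimal sketch, not Seurat's implementation: the matrix, the block size, and the variable names are all made up for illustration. It scales genes (rows) in chunks and checks that the result matches scaling everything at once.

```r
# Toy expression matrix: 50 genes x 8 cells (hypothetical values).
set.seed(42)
mat <- matrix(rnorm(50 * 8), nrow = 50,
              dimnames = list(paste0("gene", 1:50), paste0("cell", 1:8)))

block_size <- 20  # illustrative stand-in for block.size

# Split the gene indices into consecutive blocks of at most block_size genes.
gene_idx <- seq_len(nrow(mat))
blocks <- split(gene_idx, ceiling(gene_idx / block_size))

# Scale each block separately; scale() works on columns, hence the t() calls.
scaled_blocked <- do.call(rbind, lapply(blocks, function(idx) {
  t(scale(t(mat[idx, , drop = FALSE])))
}))

# Scaling all genes in one computation gives the same values.
scaled_all <- t(scale(t(mat)))
max_diff <- max(abs(scaled_blocked - scaled_all))
```

Because each gene is scaled independently of the others, blocking only changes how much data is held in a single computation, not the result.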
Returns a Seurat object with object@scale.data updated with scaled and/or centered data.
ScaleData now incorporates the functionality of the function formerly known as RegressOut (which regressed out the effects of the provided variables and then scaled the residuals). To make use of the regression functionality, simply pass the variables you want to remove to the vars.to.regress parameter.
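To make the "regress, then scale the residuals" step concrete, here is a minimal base R sketch of the linear case. It is an illustration under stated assumptions, not Seurat's code: the toy matrix, the n_umi covariate, and all variable names are hypothetical.

```r
# Toy data: 3 genes x 6 cells, plus one per-cell covariate (hypothetical values).
set.seed(1)
expr <- matrix(rnorm(18), nrow = 3,
               dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))
n_umi <- c(500, 800, 1200, 650, 900, 1500)  # stand-in for a variable such as nUMI

# Regress the covariate against one gene at a time and keep the residuals.
resid_mat <- t(apply(expr, 1, function(gene) residuals(lm(gene ~ n_umi))))
colnames(resid_mat) <- colnames(expr)

# Scale and center each gene's residuals (scale() works on columns, hence t()).
scaled <- t(scale(t(resid_mat)))
```

After this step each gene's residuals are uncorrelated with the covariate, and each row of scaled has mean 0 and standard deviation 1.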
Setting do.center to TRUE will center the expression for each gene by subtracting the average expression for that gene. Setting do.scale to TRUE will scale the expression level for each gene by dividing the centered gene expression levels by their standard deviations if do.center is TRUE, and by their root mean square otherwise.
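The arithmetic can be checked on a single gene with base R. This is a sketch under assumptions: the values are made up, the root mean square uses the n - 1 denominator that base R's scale() uses (whether ScaleData matches this exactly is not confirmed here), and the clipping bound of 1 is chosen only to make the effect visible.

```r
x <- c(1, 2, 3, 4, 10)  # one gene's expression across five cells (toy values)
n <- length(x)

# Centering and scaling: subtract the mean, divide by the standard deviation.
z <- (x - mean(x)) / sd(x)

# Scaling without centering: divide by the root mean square.
# Assumption: n - 1 denominator, as in base R's scale().
rms <- sqrt(sum(x^2) / (n - 1))
u <- x / rms

# Both agree with base R's scale().
stopifnot(isTRUE(all.equal(z, as.vector(scale(x)))))
stopifnot(isTRUE(all.equal(u, as.vector(scale(x, center = FALSE)))))

# Capping values from above, in the spirit of scale.max.
z_clipped <- pmin(z, 1)
```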
# NOT RUN {
pbmc_small <- ScaleData(object = pbmc_small)
# }
# NOT RUN {
# To regress out certain effects
# (effects_list is a placeholder for a vector of variable names to regress out)
pbmc_small <- ScaleData(object = pbmc_small, vars.to.regress = effects_list)
# }