ScaleData: Scale and center the data.

Description

Scales and centers genes in the dataset. If variables are provided in vars.to.regress, they are individually regressed against each gene, and the resulting residuals are then scaled and centered.

Usage

ScaleData(object, genes.use = NULL, data.use = NULL, vars.to.regress,
  model.use = "linear", use.umi = FALSE, do.scale = TRUE,
  do.center = TRUE, scale.max = 10, block.size = 1000,
  min.cells.to.block = 3000, display.progress = TRUE, assay.type = "RNA",
  do.cpp = TRUE, check.for.norm = TRUE, do.par = FALSE, num.cores = 1)

Arguments

object

Seurat object

genes.use

Vector of gene names to scale/center. Default is all genes in object@data.

data.use

Optional matrix of data to scale. Default is object@data[genes.use, ].

vars.to.regress

Variables to regress out (previously latent.vars in RegressOut), for example nUMI or percent.mito.

model.use

Use a linear model or generalized linear model (poisson, negative binomial) for the regression. Options are 'linear' (default), 'poisson', and 'negbinom'

use.umi

Regress on UMI count data. Default is FALSE for linear modeling, but automatically set to TRUE if model.use is 'negbinom' or 'poisson'

do.scale

Whether to scale the data.

do.center

Whether to center the data.

scale.max

Max value to return for scaled data. The default is 10. Setting this can help reduce the effects of genes that are only expressed in a very small number of cells. If regressing out latent variables and using a non-linear model, the default is 50.

block.size

Number of genes to scale in a single computation. Increasing block.size may speed up calculations, but at an additional memory cost.

min.cells.to.block

If object contains fewer than this number of cells, don't block for scaling calculations.

display.progress

Display a progress bar for the scaling procedure.

assay.type

Assay to scale data for. Default is RNA. Can be changed for multimodal analyses.

do.cpp

By default (TRUE), most of the heavy lifting is done in C++. Set this to FALSE to use the previous R implementation, which is maintained for reproducibility: results can change slightly between the two because of differences in numerical precision, and this could affect downstream calculations.

check.for.norm

Check whether the data has been normalized; if not, output a warning (TRUE by default).

do.par

Use parallel processing to regress out variables faster. If set to TRUE, uses half of the machine's available cores (FALSE by default).

num.cores

If do.par = TRUE, specify the number of cores to use.

Value

Returns a Seurat object with object@scale.data updated with scaled and/or centered data.

Details

ScaleData now incorporates the functionality of the function formerly known as RegressOut (which regressed out the effects of the provided variables and then scaled the residuals). To make use of the regression functionality, simply pass the variables you want to remove to the vars.to.regress parameter.
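
The regress-then-scale idea can be sketched in base R. This is only an illustration of the concept for the 'linear' model, not Seurat's implementation: the toy matrix expr, the data frame latent and its nUMI values are all made up for the example.

```r
# Toy data (hypothetical, not from Seurat): 3 genes x 6 cells.
set.seed(1)
expr <- matrix(rnorm(18), nrow = 3,
               dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))

# One latent variable per cell (made-up values standing in for nUMI).
latent <- data.frame(nUMI = c(100, 200, 150, 300, 250, 120),
                     row.names = colnames(expr))

# Regress each gene (row) on the latent variable and keep the residuals.
resid_mat <- t(apply(expr, 1, function(y) {
  residuals(lm(y ~ nUMI, data = latent))
}))
dimnames(resid_mat) <- dimnames(expr)

# Center and scale the residuals gene by gene.
# scale() works on columns, so transpose before and after.
scaled <- t(scale(t(resid_mat), center = TRUE, scale = TRUE))
```

After this step, each row of scaled has mean 0 and standard deviation 1, and the residuals are uncorrelated with nUMI.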

Setting do.center to TRUE will center the expression for each gene by subtracting the average expression for that gene. Setting do.scale to TRUE will scale the expression level for each gene by dividing the centered gene expression levels by their standard deviations if do.center is TRUE and by their root mean square otherwise.
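
The two scaling modes described above can be reproduced on a small matrix with base R's scale(). This is a sketch under assumptions: it shows base R's behavior, including its root-mean-square definition sqrt(sum(x^2) / (n - 1)), and it assumes scale.max acts as a simple upper cap. Whether Seurat's C++ code path matches in every detail is not confirmed here.

```r
# Toy matrix (hypothetical values): 2 genes x 4 cells.
x <- matrix(c(1, 2, 3, 4,
              2, 4, 6, 8), nrow = 2, byrow = TRUE,
            dimnames = list(c("geneA", "geneB"), paste0("cell", 1:4)))

# Center and scale: subtract each gene's mean, then divide by its
# standard deviation. scale() works on columns, hence the transposes.
centered_scaled <- t(scale(t(x), center = TRUE, scale = TRUE))

# Scale without centering: divide each gene by its root mean square,
# which base R defines as sqrt(sum(x^2) / (n - 1)).
scaled_only <- t(scale(t(x), center = FALSE, scale = TRUE))
rms <- sqrt(rowSums(x^2) / (ncol(x) - 1))

# Assumed behavior of scale.max: cap scaled values from above.
scale_max <- 10
clipped <- pmin(centered_scaled, scale_max)
```

Here scaled_only equals x / rms row by row, and clipped is identical to centered_scaled because no value in this toy example exceeds 10.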

Examples

# NOT RUN {
pbmc_small <- ScaleData(object = pbmc_small)
# }
# NOT RUN {
# To regress out certain effects (effects_list is a vector of variable names)
pbmc_small <- ScaleData(object = pbmc_small, vars.to.regress = effects_list)
# }