- cell_protein_matrix
Raw protein (ADT) UMI count matrix to be normalized, with cells in columns and proteins (antibodies) in rows.
- empty_drop_matrix
Raw empty droplet / background ADT UMI count matrix used for background correction.
Columns are empty droplets and rows are proteins (ADTs). See the vignettes for how to define the background matrix from the
raw_feature_bc_matrix output of Cell Ranger or from other alignment tools such as kallisto and CITE-seq-Count.
For datasets without access to empty droplets, use `dsb::ModelNegativeADTnorm`.
- denoise.counts
Recommended function default `denoise.counts = TRUE` and `use.isotype.control = TRUE`.
This removes cell-to-cell technical noise, described as "step II" in Mulè et al. 2022.
- use.isotype.control
Recommended function default `denoise.counts = TRUE` and `use.isotype.control = TRUE`.
This includes isotype controls in defining the dsb technical component.
- isotype.control.name.vec
A vector of the names of the isotype control proteins as they appear in the rows of the cell
and background matrices, e.g. `isotype.control.name.vec = c('isotype1', 'isotype2')`.
- define.pseudocount
`FALSE` (default) uses the value 10 optimized for protein ADT data.
Any pseudocount can be used by setting this argument to `TRUE` and specifying `pseudocount.use`.
- pseudocount.use
Must be defined if `define.pseudocount = TRUE`. This is the pseudocount to be added to
raw ADT UMI counts. Otherwise the default pseudocount is used.
- quantile.clipping
`FALSE` (default). If outliers or a large range of values are observed for some proteins
(e.g. -50 to 50), these often come from rare outlier cells. Re-running the function with `quantile.clipping = TRUE`
clips values at the quantiles given in `quantile.clip` (by default the 0.001 and 0.9995 quantiles). If the
range of normalized values is still very broad and high (e.g. above 40), try setting `scale.factor = 'mean.subtract'`.
- quantile.clip
If `quantile.clipping = TRUE`, a vector of the lowest and highest quantiles to clip. These can
be tuned to the dataset size. The default `c(0.001, 0.9995)` is optimized to clip only a few of the most extreme outliers.
- fast.km
Recommended to set this parameter to `TRUE` for large datasets. If `fast.km = TRUE`, the function defines the
cell-level background for step II with a k=2 k-means clustering instead of a 2-component Gaussian mixture. This increases speed
roughly 10x on a 1 million cell benchmark with minimal impact on results.
- scale.factor
One of `standardize` or `mean.subtract`.
Scale factor specifies how to implement protein level denoising. The recommended default is `standardize` which is the method
described in Mulè et al 2022 as "Step I". For each protein, this subtracts the mean and divides by the standard deviation
of that protein observed in the empty droplets, making the resulting value interpretable as the number of standard deviations
above the average of the background for that protein observed in empty droplets.
`mean.subtract` subtracts the mean without dividing by the standard deviation. It can be used when low
background levels are detected systematically for most proteins in the dataset, in which case the background standard deviation may be unstable.
- return.stats
If `TRUE`, returns a list. Element 1, `$dsb_normalized_matrix`, is the normalized ADT matrix. Element 2,
`$dsb_stats`, contains the internal stats used by dsb during denoising (the background mean, isotype control values, and the
final dsb technical component that is regressed out of the counts).
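
The protein-level options described above (`pseudocount.use`, `scale.factor`, `quantile.clip`) can be illustrated with a minimal base R sketch on simulated counts. This is not the dsb source code: the toy matrices, the helper `clip_rows`, the use of the natural log, and per-protein clipping are all assumptions made for illustration, and step II denoising is not shown.

```r
# Illustrative base R sketch only; not the dsb implementation.
set.seed(1)
proteins <- c("CD3", "CD19", "isotype1")  # hypothetical protein names

# Toy raw ADT counts: proteins in rows, droplets in columns.
cells <- matrix(rpois(3 * 200, lambda = 60), nrow = 3,
                dimnames = list(proteins, paste0("cell", 1:200)))
empty <- matrix(rpois(3 * 500, lambda = 8), nrow = 3,
                dimnames = list(proteins, paste0("empty", 1:500)))

pseudocount <- 10  # the default value named under define.pseudocount

# Log transform after adding the pseudocount (natural log is an assumption).
cells_log <- log(cells + pseudocount)
empty_log <- log(empty + pseudocount)

# Per-protein background mean and standard deviation from empty droplets.
bg_mean <- rowMeans(empty_log)
bg_sd   <- apply(empty_log, 1, sd)

# scale.factor = 'standardize': subtract the background mean and divide by
# the background sd. The length-3 vectors recycle down each column, so each
# row (protein) is paired with its own mean and sd.
standardized <- (cells_log - bg_mean) / bg_sd

# scale.factor = 'mean.subtract': subtract the background mean only.
mean_subtracted <- cells_log - bg_mean

# Hypothetical helper: clip each protein (row) to its own low/high quantiles.
clip_rows <- function(m, probs = c(0.001, 0.9995)) {
  t(apply(m, 1, function(x) {
    q <- quantile(x, probs = probs)
    pmin(pmax(x, q[1]), q[2])
  }))
}
clipped <- clip_rows(standardized)
```

Under `standardize`, each value reads as the number of background standard deviations above the background mean for that protein. Under `mean.subtract`, values stay on the log scale, which avoids dividing by a very small or unstable background standard deviation.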