This function computes the local gene expression variability across the pruned k nearest neighbours at a given link probability cutoff. The estimated variance is corrected for its dependence on the mean using the baseline model of gene expression variance.
compNoise(
x,
res,
pvalue = 0.01,
genes = NULL,
regNB = FALSE,
batch = NULL,
regVar = NULL,
offsetModel = TRUE,
thetaML = FALSE,
theta = 10,
ngenes = NULL,
span = 0.75,
step = 0.01,
thr = 0.05,
no_cores = NULL,
seed = 12345
)
List object of three components:
the baseline noise model as computed by the noiseBaseFit function.
matrix with local gene expression variability estimates, corrected for the mean dependence.
If regNB=TRUE this component contains a list of four components: pearsonRes contains a matrix of the Pearson residuals computed from the negative binomial regression, nbRegr contains a matrix with the regression coefficients, nbRegrSmooth contains a matrix with the smoothed regression coefficients, and log_umi is a vector with the total log UMI count for each cell. The regression coefficients comprise the dispersion parameter theta, the intercept, the regression coefficient beta for the log UMI count, and the regression coefficients of the batches (if batch is not NULL).
x: Matrix of gene expression values with genes as rows and cells as columns. The matrix needs to contain the same cell IDs as columns as the input matrix used to derive the pruned k nearest neighbours with the pruneKnn function. However, it may contain a different set of genes.
res: List object with k nearest neighbour information returned by the pruneKnn function.
pvalue: Positive real number between 0 and 1. All nearest neighbours with link probability < pvalue are discarded. Default is 0.01.
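To make the cutoff concrete, the following base-R sketch discards neighbours whose link probability falls below pvalue. The matrices nn and link_p are hypothetical toy inputs chosen for illustration; they are not the actual structure returned by pruneKnn.

```r
# Toy inputs (hypothetical): 3 candidate neighbours for each of 4 cells.
# Each column holds the neighbour indices of one cell.
nn <- matrix(c(2, 3, 4,
               1, 3, 4,
               1, 2, 4,
               1, 2, 3), nrow = 3,
             dimnames = list(NULL, paste0("cell", 1:4)))

# Link probability of each neighbour, same layout as nn.
link_p <- matrix(c(0.50,  0.005, 0.20,
                   0.30,  0.02,  0.001,
                   0.009, 0.80,  0.10,
                   0.60,  0.40,  0.05), nrow = 3,
                 dimnames = dimnames(nn))

pvalue <- 0.01

# Keep a neighbour only if its link probability is not below the cutoff.
pruned <- lapply(seq_len(ncol(nn)), function(j) nn[link_p[, j] >= pvalue, j])
names(pruned) <- colnames(nn)
pruned
```

With a stricter cutoff fewer neighbours survive, so the local variability is estimated from a smaller but more homogeneous neighbourhood.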
genes: Vector of gene names corresponding to a subset of rownames of x. Local gene expression variability is computed only for these genes. Default is NULL and values for all genes are returned.
regNB: Logical. If TRUE then gene expression variability is derived from the Pearson residuals obtained from a negative binomial regression to eliminate the dependence of the expression variance on the mean. If FALSE then the mean dependence is regressed out from the raw variance using the baseline variance estimate. Default is FALSE.
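As an illustration of why Pearson residuals remove the mean dependence, the sketch below uses the standard negative binomial variance, Var(y) = mu + mu^2/theta. The helper nb_pearson_residuals is hypothetical, and whether compNoise uses exactly this form internally is an assumption.

```r
# Pearson residual under a negative binomial model with mean mu and
# dispersion theta, where Var(y) = mu + mu^2 / theta.
nb_pearson_residuals <- function(y, mu, theta) {
  (y - mu) / sqrt(mu + mu^2 / theta)
}

set.seed(12345)
mu <- rep(c(0.5, 5, 50), each = 2000)          # three expression levels
y  <- rnbinom(length(mu), mu = mu, size = 10)  # simulated counts, theta = 10
r  <- nb_pearson_residuals(y, mu, theta = 10)

# Raw variance grows with the mean; residual variance does not.
raw_var <- tapply(y, mu, var)
res_var <- tapply(r, mu, var)
raw_var
res_var
```

When the model fits, the residual variance stays near 1 at every expression level, so any excess can be read as variability beyond the baseline.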
batch: Vector of batch variables. Component names need to correspond to valid cell IDs, i.e. column names of x. If regNB is TRUE, then the batch variable will be regressed out simultaneously with the log UMI count per cell. An interaction term is included for the log UMI count with the batch variable. Default value is NULL.
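The sketch below shows one way to build a named batch vector and what a design with a log UMI by batch interaction looks like in base R. The cell names, counts and the formula are illustrative assumptions; how compNoise constructs its regression internally is not confirmed here.

```r
# Hypothetical cell IDs, total UMI counts and batch labels.
cells   <- paste0("cell", 1:6)
log_umi <- setNames(log(c(1200, 900, 1500, 800, 1100, 1300)), cells)
batch   <- setNames(c("A", "A", "A", "B", "B", "B"), cells)

# Names of batch must cover the cell IDs (column names of the expression matrix).
stopifnot(all(cells %in% names(batch)))

# Main effects plus the log UMI by batch interaction.
design <- model.matrix(~ log_umi * batch,
                       data = data.frame(log_umi = log_umi[cells],
                                         batch   = factor(batch[cells])))
colnames(design)
```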
regVar: data.frame with additional variables to be regressed out simultaneously with the log UMI count and the batch variable (if batch is not NULL). Column names indicate variable names (the name beta is reserved for the coefficient of the log UMI count), and rownames need to correspond to valid cell IDs, i.e. column names of x. Interaction terms are included for each variable in regVar with the batch variable (if batch is not NULL). Default value is NULL.
offsetModel: Logical parameter. Only considered if regNB is TRUE. If TRUE then the beta (log UMI count) coefficient is set to 1 and the intercept is computed analytically as the log ratio of the UMI counts for a gene and the total UMI count across all cells. Batch variables and additional variables in regVar are regressed out with an offset term given by the sum of the intercept and the log UMI count. Default is TRUE.
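The analytic intercept described for the offset model can be sketched in a few lines of base R. The toy count matrix and the variable names are assumptions made for illustration only.

```r
# Toy UMI count matrix (hypothetical): genes as rows, cells as columns.
x <- matrix(c(0, 2, 1, 3,
              5, 8, 4, 7,
              1, 0, 0, 1), nrow = 3, byrow = TRUE,
            dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

log_umi   <- log(colSums(x))              # total log UMI count per cell
intercept <- log(rowSums(x) / sum(x))     # log ratio: gene total over grand total

# With beta fixed to 1, the expected count is exp(intercept + log_umi).
mu <- exp(outer(intercept, log_umi, "+"))
mu
```

A useful property of this choice is that the expected counts reproduce the observed totals: the row sums of mu equal the per-gene totals and the column sums equal the per-cell totals.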
thetaML: Logical parameter. Only considered if offsetModel equals TRUE. If TRUE then the dispersion parameter is estimated by a maximum likelihood fit. Otherwise, it is set to theta. Default is FALSE.
theta: Positive real number. Fixed value of the dispersion parameter. Only considered if thetaML equals FALSE. Default is 10.
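For intuition about the difference between a fixed and a fitted dispersion, the sketch below estimates theta by maximum likelihood with MASS::theta.ml on simulated counts. Using theta.ml here is an assumption for illustration; the source does not state which routine compNoise relies on.

```r
library(MASS)

set.seed(12345)
mu <- rep(5, 2000)                                 # known mean for every observation
y  <- rnbinom(length(mu), mu = mu, size = 10)      # simulated with true theta = 10

theta_fixed <- 10                                  # corresponds to thetaML = FALSE
theta_hat   <- as.numeric(theta.ml(y, mu))         # ML estimate, thetaML = TRUE analogue
theta_hat
```

A fixed theta is faster and more stable for sparse genes, while a fitted value adapts to the data at the cost of noisier estimates.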
ngenes: Positive integer number. Randomly sampled number of genes (from rownames of x) used for predicting regression coefficients (if regNB=TRUE). Smoothed coefficients are derived for all genes. Default is NULL and all genes are used.
span: Positive real number. Parameter for loess regression (see regNB) controlling the degree of smoothing. Default is 0.75.
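The effect of span can be seen with stats::loess on simulated per-gene coefficients. The covariate log_mean and the simulated values are hypothetical; which variable compNoise smooths against is an assumption.

```r
set.seed(12345)
# Hypothetical per-gene values: a noisy coefficient that drifts with expression.
log_mean <- sort(runif(200, -2, 3))
coef_raw <- 1 / (1 + exp(-log_mean)) + rnorm(200, sd = 0.2)

# Larger span means each local fit uses more genes, giving a smoother curve.
fit         <- loess(coef_raw ~ log_mean, span = 0.75)
coef_smooth <- predict(fit, newdata = data.frame(log_mean = log_mean))

# Roughness (sum of squared successive differences) before and after smoothing.
c(raw = sum(diff(coef_raw)^2), smooth = sum(diff(coef_smooth)^2))
```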
step: Positive real number between 0 and 1. See function noiseBaseFit. Default is 0.01.
thr: Positive real number between 0 and 1. See function noiseBaseFit. Default is 0.05.
no_cores: Positive integer number. Number of cores for multithreading. If set to NULL then the number of available cores minus two is used. Default is NULL.
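The default rule for the core count can be written out as below. The helper resolve_cores is hypothetical, and flooring the result at one core is an added safeguard for small machines, not documented behaviour.

```r
# Hypothetical helper mirroring the documented default:
# NULL means "available cores minus two".
resolve_cores <- function(no_cores = NULL) {
  if (is.null(no_cores)) {
    available <- parallel::detectCores()
    if (is.na(available)) available <- 1L   # detectCores() may return NA
    no_cores <- available - 2L
  }
  max(1L, as.integer(no_cores))             # assumption: never go below one core
}

resolve_cores()    # depends on the machine
resolve_cores(4)
```

On shared servers it is usually safer to pass no_cores explicitly rather than rely on the default.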
seed: Integer number. Seed used to initialize the random number generator for stochastic routines. Default is 12345.
res <- pruneKnn(intestinalDataSmall,knn=10,alpha=1,no_cores=1,FSelect=FALSE)
noise <- compNoise(intestinalDataSmall,res,pvalue=0.01,genes = NULL,no_cores=1)
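A variation of the call above restricts the computation to a subset of genes and then inspects the returned list. It uses only arguments documented on this page; the choice of the first 50 row names is arbitrary, and the block is guarded so that it does nothing when the RaceID package is not installed.

```r
if (requireNamespace("RaceID", quietly = TRUE)) {
  library(RaceID)

  res <- pruneKnn(intestinalDataSmall, knn = 10, alpha = 1, no_cores = 1, FSelect = FALSE)

  # Arbitrary subset of genes, taken from the row names of the input matrix.
  sel <- head(rownames(intestinalDataSmall), 50)

  noise <- compNoise(intestinalDataSmall, res, pvalue = 0.01, genes = sel, no_cores = 1)

  # Top-level overview of the three components of the result.
  str(noise, max.level = 1)
}
```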