compMean: Function for computing local gene expression averages

Description

This function computes locally averaged gene expression for each cell across its pruned k nearest neighbours, keeping only the neighbours whose link probability passes the given cutoff.
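
The idea of averaging over pruned neighbours can be illustrated with a minimal base-R sketch. It is not the package implementation: the objects nn and prob, the helper local_mean, the unweighted mean, and the inclusion of the cell itself in its neighbourhood are all assumptions made for illustration.

```r
# Toy expression matrix: 3 genes x 4 cells (hypothetical values).
x <- matrix(c(1, 0, 2,
              3, 1, 0,
              2, 2, 1,
              0, 4, 3),
            nrow = 3,
            dimnames = list(paste0("g", 1:3), paste0("c", 1:4)))

# Hypothetical neighbour table: column j holds the 2 nearest neighbours of cell j.
nn <- matrix(c(2, 3,
               1, 3,
               1, 4,
               3, 2), nrow = 2)

# Hypothetical link probabilities for the same neighbour slots.
prob <- matrix(c(0.50, 0.001,
                 0.20, 0.30,
                 0.005, 0.90,
                 0.40, 0.02), nrow = 2)

# Hypothetical helper: unweighted mean over a cell and its retained neighbours.
local_mean <- function(x, nn, prob, pvalue = 0.01) {
  out <- sapply(seq_len(ncol(x)), function(j) {
    keep <- nn[prob[, j] >= pvalue, j]        # drop links with probability < pvalue
    rowMeans(x[, c(j, keep), drop = FALSE])   # average over the cell and kept neighbours
  })
  dimnames(out) <- dimnames(x)
  out
}

avg <- local_mean(x, nn, prob, pvalue = 0.01)
print(avg)
```

In this toy example, cell c1 loses its second neighbour (probability 0.001) and is averaged with c2 only, whereas c2 keeps both of its neighbours.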

Usage

compMean(
  x,
  res,
  pvalue = 0.01,
  genes = NULL,
  regNB = FALSE,
  batch = NULL,
  regVar = NULL,
  offsetModel = TRUE,
  thetaML = FALSE,
  theta = 10,
  ngenes = NULL,
  span = 0.75,
  no_cores = NULL,
  seed = 12345
)

Value

List object with the following components:

mean: Matrix with local gene expression averages, computed from Pearson residuals (if regNB=TRUE) or normalized UMI counts (if regNB=FALSE). In the latter case, the average UMI count for a local neighbourhood is normalized to one and rescaled by the median UMI count across neighbourhoods.
regData: If regNB=TRUE, this component contains a list of four components: pearsonRes, a matrix of the Pearson residuals computed from the negative binomial regression; nbRegr, a matrix with the regression coefficients; nbRegrSmooth, a matrix with the smoothed regression coefficients; and log_umi, a vector with the total log UMI count for each cell. The regression coefficients comprise the dispersion parameter theta, the intercept, the regression coefficient beta for the log UMI count, and the regression coefficients of the batches (if batch is not NULL).
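
The rescaling described for regNB=FALSE can be read as follows; this is a sketch of one plausible interpretation, not the package code. It assumes that "normalized to one" means each neighbourhood's averages are divided by their total, and the matrix avg and the names frac and rescaled are hypothetical.

```r
# Hypothetical local averages: 3 genes x 3 neighbourhoods.
avg <- matrix(c(2, 1, 1,
                4, 4, 2,
                1, 0, 1),
              nrow = 3,
              dimnames = list(paste0("g", 1:3), paste0("n", 1:3)))

totals <- colSums(avg)                # total UMI count per neighbourhood
frac <- sweep(avg, 2, totals, "/")    # each column now sums to one
rescaled <- frac * median(totals)     # rescale by the median total across neighbourhoods

print(colSums(rescaled))
```

After this step every neighbourhood has the same total, so differences between columns reflect relative expression rather than sequencing depth.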

Arguments

x: Matrix of gene expression values with genes as rows and cells as columns. The matrix needs to contain the same cell IDs as columns as the input matrix used to derive the pruned k nearest neighbours with the pruneKnn function. However, it may contain a different set of genes.
res: List object with k nearest neighbour information returned by pruneKnn function.
pvalue: Positive real number between 0 and 1. All nearest neighbours with link probability < pvalue are discarded. Default is 0.01.
genes: Vector of gene names corresponding to a subset of rownames of x. Local gene expression averages are computed only for these genes. Default is NULL, in which case values for all genes are returned.
regNB: Logical. If TRUE then gene expression averages are computed from the Pearson residuals obtained from a negative binomial regression to eliminate the dependence of the expression variance on the mean. If FALSE then averages are computed from raw UMI counts. Default is FALSE.
batch: Vector of batch variables. Component names need to correspond to valid cell IDs, i.e. column names of x. If regNB is TRUE, then the batch variable will be regressed out simultaneously with the log UMI count per cell. An interaction term is included for the log UMI count with the batch variable. Default value is NULL.
regVar: data.frame with additional variables to be regressed out simultaneously with the log UMI count and the batch variable (if batch is not NULL). Column names indicate variable names (the name beta is reserved for the coefficient of the log UMI count), and rownames need to correspond to valid cell IDs, i.e. column names of x. Interaction terms are included for each variable in regVar with the batch variable (if batch is not NULL). Default value is NULL.
offsetModel: Logical parameter. Only considered if regNB is TRUE. If TRUE then the beta (log UMI count) coefficient is set to 1 and the intercept is computed analytically as the log ratio of the UMI count for a gene and the total UMI count across all cells. Batch variables and additional variables in regVar are regressed out with an offset term given by the sum of the intercept and the log UMI count. Default is TRUE.
thetaML: Logical parameter. Only considered if offsetModel equals TRUE. If TRUE then the dispersion parameter is estimated by a maximum likelihood fit. Otherwise, it is set to theta. Default is FALSE.
theta: Positive real number. Fixed value of the dispersion parameter. Only considered if thetaML equals FALSE. Default is 10.
ngenes: Positive integer number. Randomly sampled number of genes (from rownames of x) used for predicting regression coefficients (if regNB=TRUE). Smoothed coefficients are derived for all genes. Default is NULL and all genes are used.
span: Positive real number. Parameter for loess-regression (see regNB) controlling the degree of smoothing. Default is 0.75.
no_cores: Positive integer number. Number of cores for multithreading. If set to NULL then the number of available cores minus two is used. Default is NULL.
seed: Integer number. Seed used to initialize stochastic routines. Default is 12345.
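
The offset model described for offsetModel=TRUE with a fixed theta can be sketched in base R. This is an illustrative reading of the argument descriptions, not the package implementation: it ignores batch and regVar, uses the standard negative binomial variance mu + mu^2/theta for the Pearson residuals, and the toy matrix and variable names are assumptions.

```r
# Toy UMI count matrix: 3 genes x 3 cells (hypothetical values).
x <- matrix(c(5, 0, 1,
              2, 3, 0,
              8, 1, 4),
            nrow = 3,
            dimnames = list(paste0("g", 1:3), paste0("c", 1:3)))

theta <- 10                              # fixed dispersion, as with thetaML = FALSE

log_umi <- log(colSums(x))               # total log UMI count per cell
intercept <- log(rowSums(x) / sum(x))    # analytic intercept: log ratio of gene total to grand total

# With beta fixed to 1, the expected count is exp(intercept + log_umi).
mu <- exp(outer(intercept, log_umi, "+"))

# Pearson residuals under a negative binomial with variance mu + mu^2 / theta.
pearson <- (x - mu) / sqrt(mu + mu^2 / theta)

print(round(pearson, 3))
```

Because the intercept is analytic and beta is fixed, no per-gene regression fit is needed in this simplified setting, and the expected counts reproduce the observed gene and cell totals.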

Examples

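library(RaceID)
# Derive the pruned k nearest neighbours, then compute local expression averages
res <- pruneKnn(intestinalDataSmall, knn = 10, alpha = 1, no_cores = 1, FSelect = FALSE)
mexp <- compMean(intestinalDataSmall, res, pvalue = 0.01, genes = NULL, no_cores = 1)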