selectGenes: Select a subset of informative genes

Description

This function identifies highly variable genes in each dataset and combines the per-dataset gene sets (by union or intersection) for use in downstream analysis. It assumes that gene expression approximately follows a Poisson distribution, and selects genes whose expression variance, relative to their mean expression, exceeds a given variance threshold. When do.plot = TRUE, it also draws a log-scale plot of gene variance vs. gene expression, with a line indicating the variance expected across genes and cells under the Poisson assumption. Selected genes are plotted in green.
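The selection rule described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code selectGenes runs. The helpers select_var_genes_sketch and combine_gene_sets are hypothetical. The sketch assumes a dense gene-by-cell count matrix normalized by dividing each cell by its total count, and takes a gene's Poisson-expected variance to be its mean times the average of 1/(total counts per cell). The exact formula inside selectGenes may differ.

```r
# Hypothetical sketch (not rliger's code) of Poisson-based variable gene
# selection for one dataset, plus union/intersection across datasets.
select_var_genes_sketch <- function(counts, var.thresh = 0.1,
                                    alpha.thresh = 0.99) {
  # Assumed normalization: divide each cell (column) by its total count
  lib_size <- colSums(counts)
  norm <- sweep(counts, 2, lib_size, "/")
  gene_mean <- rowMeans(norm)
  gene_var <- apply(norm, 1, var)
  # Assumed Poisson expectation: Var(gene) ~ gene_mean * mean(1 / lib_size)
  poisson_const <- mean(1 / lib_size)
  # Upper bound on the expected mean, Bonferroni-corrected over genes
  alpha_corr <- alpha.thresh / nrow(counts)
  mean_upper <- gene_mean + qnorm(1 - alpha_corr / 2) *
    sqrt(gene_mean * poisson_const / ncol(counts))
  # Keep genes whose variance clears the mean bound and sits more than
  # var.thresh (on a log10 scale) above the Poisson expectation
  keep <- gene_var / poisson_const > mean_upper &
    log10(gene_var) > log10(gene_mean) + log10(poisson_const) + var.thresh
  rownames(counts)[which(keep)]
}

# Combine per-dataset gene vectors, mirroring the `combine` argument
combine_gene_sets <- function(gene_lists,
                              combine = c("union", "intersection")) {
  combine <- match.arg(combine)
  if (combine == "union") Reduce(union, gene_lists) else Reduce(intersect, gene_lists)
}

# Toy data: 45 Poisson genes plus 4 genes expressed in only half the cells
# (paired so that every cell keeps roughly the same total count)
set.seed(1)
n_cells <- 200
pois <- matrix(rpois(45 * n_cells, lambda = 5), nrow = 45)
half_on <- rep(c(20, 0), each = n_cells / 2)
half_off <- rep(c(0, 20), each = n_cells / 2)
counts <- rbind(pois, half_on, half_off, half_on, half_off)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))
selected <- select_var_genes_sketch(counts)
print(selected)
```

With several datasets, one would apply the per-dataset function to each count matrix and pass the resulting list of gene vectors to combine_gene_sets with "union" or "intersection".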

Usage

selectGenes(
  object,
  var.thresh = 0.1,
  alpha.thresh = 0.99,
  num.genes = NULL,
  tol = 1e-04,
  datasets.use = 1:length(object@raw.data),
  combine = "union",
  capitalize = FALSE,
  do.plot = FALSE,
  cex.use = 0.3,
  chunk = 1000,
  unshared = FALSE,
  unshared.datasets = NULL,
  unshared.thresh = NULL
)

Value

A liger object with the var.genes slot set to the selected genes.

Arguments

object: liger object. normalize should already have been called on it.
var.thresh: Variance threshold. Main threshold used to identify variable genes. Genes whose expression variance, relative to their mean expression, exceeds this threshold are selected (higher threshold -> fewer selected genes). Accepts a single value or a vector with a separate var.thresh for each dataset. (default 0.1)
alpha.thresh: Alpha threshold. Controls the upper bound for expected mean gene expression (lower threshold -> higher upper bound). (default 0.99)
num.genes: Number of genes to find for each dataset. Optimises the value of var.thresh for each dataset to get this number of genes. Accepts a single value or a vector with the same length as the number of datasets. (optional, default NULL)
tol: Tolerance to use for optimization if num.genes values are passed in. (default 0.0001)
datasets.use: Datasets to include in the discovery of highly variable genes. (default 1:length(object@raw.data))
combine: How to combine variable genes across experiments. Either "union" or "intersection". (default "union")
capitalize: Capitalize gene names to match homologous genes (i.e. across species). (default FALSE)
do.plot: Display log plot of gene variance vs. gene expression for each dataset. Selected genes are plotted in green. (default FALSE)
cex.use: Point size for plot. (default 0.3)
chunk: Size of chunks in the HDF5 file. (default 1000)
unshared: Whether to consider unshared features. (default FALSE)
unshared.datasets: A list of the datasets for which to consider unshared features, e.g. list(2) to use the second dataset.
unshared.thresh: A list of threshold values to apply to each unshared dataset. If only one value is provided, it is applied to all unshared datasets. If a list is provided, it must have the same length as unshared.datasets.
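The num.genes option turns gene selection into a search over var.thresh. A minimal sketch of such a search follows; it is an illustration under assumptions, not the routine selectGenes uses. The helper find_var_thresh is hypothetical. Each gene is summarized by an assumed score, log_disp (the log10 ratio of its observed variance to its Poisson-expected variance), and plain bisection is used because the number of genes with log_disp > var.thresh can only fall as var.thresh rises. The search interval is an illustrative choice; tol plays the role of the tol argument.

```r
# Hypothetical sketch (not rliger's code): choose var.thresh so that about
# num.genes genes pass the filter log_disp > var.thresh.
find_var_thresh <- function(log_disp, num.genes, lower = 0, upper = 1.5,
                            tol = 1e-4) {
  n_selected <- function(thresh) sum(log_disp > thresh)
  # Bisection: the selected-gene count is non-increasing in the threshold
  while (upper - lower > tol) {
    mid <- (lower + upper) / 2
    if (n_selected(mid) > num.genes) {
      lower <- mid  # too many genes pass: raise the threshold
    } else {
      upper <- mid  # num.genes or fewer pass: try a lower threshold
    }
  }
  # Smallest threshold found (within tol) selecting at most num.genes genes;
  # if even the initial `upper` selects more, it is returned unchanged
  upper
}

# Toy scores for six genes; ask for three genes
log_disp <- c(0.05, 0.2, 0.35, 0.5, 0.8, 1.2)
thresh <- find_var_thresh(log_disp, num.genes = 3)
print(sum(log_disp > thresh))
```

Calling the helper once per dataset (for example with sapply over a list of score vectors) yields one threshold per dataset, in the same way that num.genes accepts one value per dataset.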

Examples

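# ctrl and stim: gene-by-cell count matrices for two datasets
ligerex <- createLiger(list(ctrl = ctrl, stim = stim))
# normalize must be run before selectGenes
ligerex <- normalize(ligerex)
# Select variable genes with the default thresholds (union across datasets)
ligerex <- selectGenes(ligerex)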