
rliger (version 1.0.1)

suggestK: Visually suggest an appropriate k value

Description

This can be used to select an appropriate value of k for the factorization of a particular dataset. It plots the median (across cells in all datasets) K-L divergence from uniform of the cell factor loadings as a function of k. This value should increase with k but is expected to level off once k is sufficiently high. This is because, once an appropriate number of factors is reached, each cell's factor loadings should already be far from uniformly distributed, so additional factors increase the divergence only slightly.
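The plotted quantity can be illustrated with a small base R sketch. It assumes the loadings for one dataset form a cells-by-factors matrix (called H here for illustration), normalizes each cell's row to sum to 1, and computes the divergence in bits as D(p) = log2(k) + sum_j p_j * log2(p_j). This is 0 for perfectly uniform loadings and reaches its maximum, log2(k), when a cell loads on a single factor. The helper names are hypothetical and this is not rliger's internal implementation; the package may preprocess the loadings differently (for example, by rescaling factors first).

```r
# Hypothetical helper (not an rliger function): K-L divergence from uniform,
# in bits, for each row (cell) of a cells x factors loading matrix H.
kl_from_uniform <- function(H) {
  k <- ncol(H)
  # Normalize each cell's loadings to a probability distribution over k factors
  P <- H / rowSums(H)
  # Convention 0 * log2(0) = 0 for factors with zero loading
  plogp <- ifelse(P > 0, P * log2(P), 0)
  # D(p || uniform) = sum_j p_j * log2(p_j * k) = log2(k) + sum_j p_j * log2(p_j)
  log2(k) + rowSums(plogp)
}

# Median across cells pooled from all datasets (a list of per-dataset matrices)
median_kl <- function(H_list) {
  median(unlist(lapply(H_list, kl_from_uniform)))
}

# Usage with simulated loadings (hypothetical data): two datasets, k = 5
set.seed(1)
H1 <- matrix(runif(20 * 5), nrow = 20)
H2 <- matrix(runif(30 * 5), nrow = 30)
median_kl(list(H1, H2))
```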

Depending on the number of cores used, this process can take 10-20 minutes.

Usage

suggestK(
  object,
  k.test = seq(5, 50, 5),
  lambda = 5,
  thresh = 1e-04,
  max.iters = 100,
  num.cores = 1,
  rand.seed = 1,
  gen.new = FALSE,
  nrep = 1,
  plot.log2 = TRUE,
  return.data = FALSE,
  return.raw = FALSE,
  verbose = TRUE
)

Value

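If return.data is TRUE, the results as a list of matrices (raw) or a data frame (processed), depending on return.raw; otherwise a ggplot object. Also plots K-L divergence vs. k to the active graphics device.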

Arguments

object

liger object. The data should be normalized, have genes selected, and be scaled before calling.

k.test

Set of factor numbers to test (default seq(5, 50, 5)).

lambda

Lambda to use for all factorizations (default 5).

thresh

Convergence threshold. Convergence occurs when |obj0 - obj| / mean(obj0, obj) < thresh, where obj0 and obj are the objective values from two consecutive iterations (default 1e-04).
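A minimal base R sketch of this convergence test, assuming obj0 and obj are scalar objective values; has_converged is a hypothetical helper, not an rliger function. Written literally in R, the mean needs c(): mean(obj0, obj) would pass obj to mean()'s trim argument and return obj0 unchanged.

```r
# Hypothetical helper: relative change in the objective between two iterations
has_converged <- function(obj0, obj, thresh = 1e-4) {
  # c() is required; mean(obj0, obj) treats obj as the `trim` argument
  abs(obj0 - obj) / mean(c(obj0, obj)) < thresh
}

has_converged(1000, 999.95)  # relative change of about 5e-5, below 1e-4
has_converged(1000, 990)     # relative change of about 1e-2, not converged
```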

max.iters

Maximum number of block coordinate descent iterations to perform (default 100).

num.cores

Number of cores to use for optimizing factorizations in parallel (default 1)

rand.seed

Random seed for reproducibility (default 1).

gen.new

If TRUE, do not use optimizeNewK in factorizations, which results in slower factorizations (default FALSE).

nrep

Number of restarts to perform at each k value tested; increase to produce a smoother curve if the results are unclear (default 1).

plot.log2

Plot a log2 curve for reference on the K-L plot; log2(k) is an upper bound and can sometimes help in identifying the "elbow" of the plot (default TRUE).

return.data

Whether to return a list of data matrices (raw) or a data frame (processed) instead of a ggplot object (default FALSE).

return.raw

If return.data is TRUE, whether to return the raw data (in the format described below) or the data frame used to produce the ggplot object. The raw data is a list of matrices of K-L divergences (length(k.test) by n_cells); the length of the list corresponds to nrep (default FALSE).
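The raw output can be summarized with base R. The sketch below uses simulated values in the documented layout (a list of nrep matrices, each length(k.test) by n_cells); the object names are illustrative, and taking the median across cells and then averaging over restarts is one reasonable summary, not necessarily how suggestK builds its own data frame.

```r
# Simulated stand-in for the raw output (hypothetical values, documented layout)
k.test <- c(5, 10, 15)
n_cells <- 4
nrep <- 2
set.seed(1)
raw <- lapply(seq_len(nrep), function(i) {
  matrix(runif(length(k.test) * n_cells), nrow = length(k.test))
})

# Median K-L divergence across cells for each k: one column per restart
med_by_rep <- sapply(raw, function(m) apply(m, 1, median))

# Average over restarts, with the log2(k) upper bound alongside for reference
summary_df <- data.frame(k = k.test,
                         median_kl = rowMeans(med_by_rep),
                         log2_ref = log2(k.test))
summary_df
```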

verbose

Print progress bar/messages (default TRUE).

Examples

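# \donttest{
library(rliger)
# ctrl, stim: example count matrices, assumed available (e.g. rliger's example data)
ligerex <- createLiger(list(ctrl = ctrl, stim = stim))
ligerex <- normalize(ligerex)       # normalize counts per cell
ligerex <- selectGenes(ligerex)     # select variable genes
ligerex <- scaleNotCenter(ligerex)  # scale genes without centering
# k.test and max.iters are kept tiny so the example runs quickly
suggestK(ligerex, k.test = c(5, 6), max.iters = 1)
# }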
