nmfEstimateRank: Estimate optimal rank for Nonnegative Matrix Factorization (NMF) models

Description

A critical parameter in NMF algorithms is the factorization rank $r$. It defines the number of basis effects used to approximate the target matrix. Function nmfEstimateRank helps in choosing an optimal rank by implementing simple approaches proposed in the literature.

Usage

nmfEstimateRank(x, range, method = nmf.getOption("default.algorithm"), nrun = 30, conf.interval = FALSE, ...)
plot.NMF.rank(x, what = c("all", "cophenetic", "rss", "residuals", "dispersion"), ...)

Arguments

conf.interval

a single logical specifying if confidence intervals should be estimated for all the computed consensus measures. For each rank in range, the confidence intervals are estimated by bootstrap, resampling 5nrun

method

A single NMF algorithm, in one of the formats accepted by interface nmf.

nrun

a numeric giving the number of runs to perform for each value in range.

range

a numeric vector containing the ranks of factorization to try.

what

a character string that partially matches one of the following items: 'all', 'cophenetic', 'rss', 'residuals', 'dispersion'. It specifies which measure must be plotted (

x

For nmfEstimateRank, a target object to be estimated, in one of the formats accepted by interface nmf. For plot.NMF.rank, an object of class NMF.rank as returned by

...

For nmfEstimateRank, these are extra parameters passed to interface nmf. Note that the same parameters are used for each value of the rank. See nmf. For plot.NMF.ran

Value

An S3 object (i.e. a list) of class NMF.rank with the following elements:

measures

a data.frame containing the quality measures for each rank of factorization in range. Each row corresponds to a measure, each column to a rank.

consensus

a list of consensus matrices, indexed by the rank of factorization (as a character string).

Details

Given an NMF algorithm and the target matrix, a common way of estimating $r$ is to try different values, compute some quality measures of the results, and choose the best value according to these quality criteria. See Brunet et al. (2004) and Hutchins et al. (2008).

The function nmfEstimateRank launches this estimation procedure. It performs multiple NMF runs for each rank of factorization in a given range and, for each rank, returns a set of quality measures together with the associated consensus matrix.
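To make the idea of a consensus-based quality measure more concrete, the following is a minimal base-R sketch of how a consensus matrix and a cophenetic correlation coefficient can be computed from the cluster assignments of several runs, in the spirit of Brunet et al. (2004). It does not call the NMF package and is not its internal implementation: the helpers consensus_from_labels and cophenetic_coef are hypothetical names, and the cluster labels are simulated rather than derived from actual NMF fits.

```r
# Hypothetical helper: average the connectivity matrices of several runs.
# Each element of label_list is a vector of cluster labels, one per sample.
consensus_from_labels <- function(label_list) {
  n <- length(label_list[[1]])
  acc <- matrix(0, n, n)
  for (labels in label_list) {
    # connectivity matrix: 1 if samples i and j share a cluster, else 0
    acc <- acc + outer(labels, labels, "==")
  }
  acc / length(label_list)
}

# Hypothetical helper: correlation between the distances 1 - consensus and
# the cophenetic distances of an average-linkage hierarchical clustering.
cophenetic_coef <- function(consensus) {
  d <- as.dist(1 - consensus)
  tree <- hclust(d, method = "average")
  cor(as.vector(d), as.vector(cophenetic(tree)))
}

set.seed(1)
n <- 30
truth <- rep(1:3, each = 10)

# Stable case: every simulated run recovers the same three groups
stable_runs <- replicate(10, truth, simplify = FALSE)
# Unstable case: every simulated run assigns samples at random
noisy_runs <- replicate(10, sample(1:3, n, replace = TRUE), simplify = FALSE)

cons_stable <- consensus_from_labels(stable_runs)
cons_noisy <- consensus_from_labels(noisy_runs)

coef_stable <- cophenetic_coef(cons_stable)
coef_noisy <- cophenetic_coef(cons_noisy)
```

Because the connectivity matrix only records whether two samples share a cluster, the measure does not depend on how clusters are numbered in each run. Under this sketch, a rank whose runs agree yields a coefficient close to 1, while unstable assignments lower it, which is the kind of comparison across ranks that the estimation procedure relies on.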

References

Brunet, J.-P., Tamayo, P., Golub, T. R., and Mesirov, J. P. (2004). Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci U S A 101(12), 4164-4169.

Examples


library(NMF)
set.seed(123456)
n <- 50; r <- 3; m <- 20
V <- syntheticNMF(n, r, m, noise=TRUE)

# Use a seed that will be set before each first run
res.estimate <- nmfEstimateRank(V, seq(2,5), method='brunet', nrun=10, seed=123456)

# plot all the measures
plot(res.estimate)
# or only one: e.g. the cophenetic correlation coefficient
plot(res.estimate, 'cophenetic')
