Let \(V\) be an \(n \times m\) non-negative matrix and
\(r\) a positive integer. In its standard form (see
references below), an NMF of \(V\) is commonly defined
as a pair of matrices \((W, H)\) such that:
$$V \equiv W H,$$
where:
- \(W\) and \(H\) are \(n \times r\) and \(r \times m\) matrices, respectively, with non-negative entries;
- \(\equiv\) is to be understood with respect to some loss function. Common choices of loss function are based on the Frobenius norm or the Kullback-Leibler divergence.
The integer \(r\) is called the factorization rank.
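As a minimal sketch, assuming matrices stored as plain Python lists of rows, the two loss functions mentioned above can be computed directly: half the squared Frobenius norm \(\frac{1}{2}\|V - WH\|_F^2\), and the generalized Kullback-Leibler divergence \(D(V \,\|\, WH) = \sum_{ij} \left( V_{ij}\log\frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \right)\). The helper names `matmul`, `frobenius_loss` and `kl_divergence` are hypothetical and are not functions of the NMF package; the factor \(\frac{1}{2}\) is one common convention, and references differ on whether it is included.

```python
import math


def matmul(W, H):
    """Product WH of an n x r matrix W and an r x m matrix H (lists of rows)."""
    r, m = len(H), len(H[0])
    return [[sum(W[i][k] * H[k][j] for k in range(r)) for j in range(m)]
            for i in range(len(W))]


def frobenius_loss(V, W, H):
    """Half the squared Frobenius norm of V - WH (the 1/2 is a convention)."""
    WH = matmul(W, H)
    return 0.5 * sum((v - wh) ** 2
                     for row_v, row_wh in zip(V, WH)
                     for v, wh in zip(row_v, row_wh))


def kl_divergence(V, W, H):
    """Generalized Kullback-Leibler divergence D(V || WH).

    Entries with V_ij = 0 contribute (WH)_ij (convention 0 * log 0 = 0);
    the divergence is infinite if (WH)_ij = 0 while V_ij > 0.
    """
    WH = matmul(W, H)
    total = 0.0
    for row_v, row_wh in zip(V, WH):
        for v, wh in zip(row_v, row_wh):
            if v > 0:
                if wh == 0:
                    return math.inf
                total += v * math.log(v / wh) - v + wh
            else:
                total += wh
    return total


# Toy example: n = 3 features, m = 2 samples, factorization rank r = 2.
W = [[1, 0], [0, 1], [1, 1]]    # 3 x 2, non-negative
H = [[1, 2], [3, 4]]            # 2 x 2, non-negative
V = [[1, 2], [3, 4], [4, 7]]    # differs from WH = [[1, 2], [3, 4], [4, 6]] in one entry
print(frobenius_loss(V, W, H))  # 0.5
print(kl_divergence(V, W, H))
```

Both losses are zero exactly when \(V = WH\); they differ in how they weight the discrepancy, the Kullback-Leibler divergence penalizing relative rather than absolute errors.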
Depending on the context in which NMF is applied, the
columns of \(W\) and \(H\), as well as the rows of \(H\), are given different names:
- columns of \(W\): basis vectors, metagenes, factors, source, image basis
- columns of \(H\): mixture coefficients, metagene sample expression profiles, weights
- rows of \(H\): basis profiles, metagene expression profiles
NMF approaches have been successfully applied in several
fields. The NMF package was therefore implemented with
names for objects and methods that are as generic as possible.
The following terminology is used:
- samples: the columns of the target matrix \(V\)
- features: the rows of the target matrix \(V\)
- basis matrix: the first matrix factor \(W\)
- basis vectors: the columns of the first matrix factor \(W\)
- mixture matrix: the second matrix factor \(H\)
- mixture coefficients: the columns of the second matrix factor \(H\)
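The terminology can be illustrated with a small sketch in plain Python (matrices as lists of rows; the function names `column` and `approximate_sample` are hypothetical and not part of the NMF package). Under \(V \equiv WH\), each sample, i.e. column \(j\) of \(V\), is approximated by a non-negative combination of the basis vectors (columns of \(W\)) weighted by that sample's mixture coefficients (column \(j\) of \(H\)); the result is exactly column \(j\) of \(WH\), with one entry per feature.

```python
def column(M, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in M]


def approximate_sample(W, H, j):
    """Approximate sample j as sum_k H[k][j] * (k-th basis vector of W)."""
    coeffs = column(H, j)          # mixture coefficients of sample j (length r)
    n_features = len(W)            # rows of W = rows of V = features
    approx = [0.0] * n_features
    for k, h_kj in enumerate(coeffs):
        basis_k = column(W, k)     # k-th basis vector (column of the basis matrix)
        for i in range(n_features):
            approx[i] += h_kj * basis_k[i]
    return approx


W = [[1, 0], [0, 1], [1, 1]]       # basis matrix: 3 features x rank 2
H = [[1, 2], [3, 4]]               # mixture matrix: rank 2 x 2 samples
print(approximate_sample(W, H, 1)) # [2.0, 4.0, 6.0], i.e. column 1 of WH
```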
Because the NMF package was primarily developed to work
with gene expression microarray data, it also provides a
layer for working easily and intuitively with objects from
the Bioconductor base framework. See bioc-NMF for more
details.