predict: Clustering and Prediction

Description

The predict methods for NMF models return the cluster membership of each sample or each feature. Classification/prediction of new data is not currently implemented.

Usage

predict(object, ...)
  # S4 method for NMF
predict(object,
    what = c("columns", "rows", "samples", "features"),
    prob = FALSE, dmatrix = FALSE)
  # S4 method for NMFfitX
predict(object,
    what = c("columns", "rows", "samples", "features", "consensus", "chc"),
    dmatrix = FALSE, ...)

Arguments

object: an NMF model
what: a character string that indicates the type of cluster membership that should be returned: ‘columns’ or ‘rows’ for clustering the columns or the rows of the target matrix respectively. The values ‘samples’ and ‘features’ are aliases for ‘columns’ and ‘rows’ respectively.
prob: logical that indicates if the relative contributions of/to the dominant basis component should be computed and returned. See Details.
dmatrix: logical that indicates if a dissimilarity matrix should be attached to the result. This is notably used internally when computing NMF clustering silhouettes.
...: additional arguments affecting the predictions produced.

Methods

predict

signature(object = "NMF"): Default method for NMF models

predict

signature(object = "NMFfitX"): Returns the cluster membership index from an NMF model fitted with multiple runs.

Besides the type of clustering available for any NMF models ('columns', 'rows', 'samples', 'features'), this method can return the cluster membership index based on the consensus matrix, computed from the multiple NMF runs.

Argument what accepts the following extra types:

'chc': returns the cluster membership based on the hierarchical clustering of the consensus matrix, as performed by consensushc.
'consensus': same as 'chc', but the levels of the membership index are relabeled to match the order in which the clusters appear on the associated dendrogram, i.e. the order shown on the default annotation track of the consensus heatmap produced by consensusmap.
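
The difference between the two types can be illustrated with base R alone. This is only a sketch: the toy consensus matrix `C`, the use of `1 - C` as a dissimilarity, and the average linkage are assumptions made for illustration, and may not match what consensushc actually does.

```r
# Hypothetical consensus matrix for 4 samples: entry [i, j] is the
# fraction of runs in which samples i and j were clustered together.
s <- paste0("s", 1:4)
C <- matrix(c(1.0, 0.9, 0.1, 0.0,
              0.9, 1.0, 0.2, 0.1,
              0.1, 0.2, 1.0, 0.8,
              0.0, 0.1, 0.8, 1.0),
            nrow = 4, dimnames = list(s, s))

# Assumed scheme: hierarchical clustering on 1 - consensus with
# average linkage, cut into k = 2 clusters (in the spirit of 'chc').
hc <- hclust(as.dist(1 - C), method = "average")
chc <- cutree(hc, k = 2)

# Relabel clusters by the order in which they appear along the
# dendrogram leaves (in the spirit of 'consensus').
leaf_order_labels <- unique(chc[hc$order])
consensus <- factor(match(chc, leaf_order_labels))
names(consensus) <- names(chc)
```

Both results describe the same partition of the samples; only the cluster labels may differ.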

Details

The cluster membership is computed as the index of the dominant basis component for each sample (what='samples' or 'columns') or each feature (what='features' or 'rows'), based on their corresponding entries in the coefficient matrix or basis matrix respectively.

For example, if what='samples', then the dominant basis component is computed for each column of the coefficient matrix as the row index of the maximum within the column.
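
The rule above can be reproduced on plain matrices with base R. This is a minimal sketch: `H` and `W` are hand-made toy matrices, not the output of a fitted model, and ties are resolved by `which.max` (first maximum), which is an assumption about tie handling.

```r
# Toy coefficient matrix H: 3 basis components (rows) x 4 samples (columns).
H <- matrix(c(0.7, 0.2, 0.1,
              0.1, 0.1, 0.8,
              0.3, 0.6, 0.1,
              0.5, 0.4, 0.1),
            nrow = 3, dimnames = list(NULL, paste0("s", 1:4)))

# Toy basis matrix W: 5 features (rows) x 3 basis components (columns).
W <- matrix(c(0.90, 0.1, 0.2, 0.3, 0.1,
              0.05, 0.8, 0.1, 0.3, 0.2,
              0.05, 0.1, 0.7, 0.4, 0.7),
            nrow = 5, dimnames = list(paste0("f", 1:5), NULL))

# Samples: row index of the maximum within each column of H.
sample_cluster <- factor(apply(H, 2, which.max))

# Features: column index of the maximum within each row of W.
feature_cluster <- factor(apply(W, 1, which.max))
```

Here sample `s2` is assigned to component 3 because `H[3, "s2"]` is the largest entry in its column.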

If argument prob=FALSE (default), the result is a factor. Otherwise a list with two elements is returned: element predict contains the cluster membership index (as a factor) and element prob contains the relative contribution of the dominant component to each sample (resp. the relative contribution of each feature to the dominant basis component):

Samples: $$p_j = x_{k_0} / \sum_k x_k$$, for each sample $1\leq j \leq p$ (with $p$ the number of samples), where $x_k$ is the contribution of the $k$-th basis component to the $j$-th sample (i.e. H[k, j]), and $x_{k_0}$ is the maximum of these contributions.
Features: $$p_i = y_{k_0} / \sum_k y_k$$, for each feature $1\leq i \leq n$ (with $n$ the number of features), where $y_k$ is the contribution of the $k$-th basis component to the $i$-th feature (i.e. W[i, k]), and $y_{k_0}$ is the maximum of these contributions.
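
As a worked illustration of the two formulas, the relative contributions can be computed by hand in base R. This is a sketch on hypothetical toy matrices `H` and `W`; it mirrors the formulas above rather than the package's internal code.

```r
# Toy coefficient matrix H (3 components x 4 samples) and
# toy basis matrix W (2 features x 3 components).
H <- matrix(c(0.7, 0.2, 0.1,
              0.1, 0.1, 0.8,
              0.3, 0.6, 0.1,
              0.5, 0.4, 0.1),
            nrow = 3)
W <- matrix(c(0.9, 0.2,
              0.3, 0.2,
              0.3, 0.4),
            nrow = 2)

# Samples: p_j = max_k H[k, j] / sum_k H[k, j]
prob_samples <- apply(H, 2, max) / colSums(H)

# Features: p_i = max_k W[i, k] / sum_k W[i, k]
prob_features <- apply(W, 1, max) / rowSums(W)

# A list shaped like the one described for prob=TRUE (samples case).
res <- list(predict = factor(apply(H, 2, which.max)),
            prob = prob_samples)
```

A value close to 1 means the sample (or feature) is dominated by a single basis component; a value close to 1/k, for k components, means the contributions are nearly uniform.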

References

Brunet J, Tamayo P, Golub TR and Mesirov JP (2004). "Metagenes and molecular pattern discovery using matrix factorization." _Proceedings of the National Academy of Sciences of the United States of America_, *101*(12), pp. 4164-9. ISSN 0027-8424, <URL: http://dx.doi.org/10.1073/pnas.0308531101>, <URL: http://www.ncbi.nlm.nih.gov/pubmed/15016911>.

Pascual-Montano A, Carazo JM, Kochi K, Lehmann D and Pascual-Marqui RD (2006). "Nonsmooth nonnegative matrix factorization (nsNMF)." _IEEE Trans. Pattern Anal. Mach. Intell_, *28*, pp. 403-415.

Examples

# roxygen generated flag
options(R_CHECK_RUNNING_EXAMPLES_=TRUE)


# load the NMF package
library(NMF)

# random target matrix (20 features x 10 samples)
v <- rmatrix(20, 10)
# fit an NMF model
x <- nmf(v, 5)

# predicted column and row clusters
predict(x)
predict(x, 'rows')

# with relative contributions of each basis component
predict(x, prob=TRUE)
predict(x, 'rows', prob=TRUE)
