albatross (version 0.3-8)

cmf: Implementation notes for constrained matrix factorisation

Description

cmf

Compute a low-rank matrix factorisation \( \min_{\mathbf A, \mathbf B} || (\mathbf X - \mathbf A \mathbf{B}^\top ) \circ \mathbf W ||_\mathrm F \) subject to weights \(\mathbf W\) (set to \(0\) where \(\mathbf X\) is not defined) and constraints on rows of \(\mathbf{A}, \mathbf{B}\).

wcmls

Solve the weighted multivariate least squares problem \( \min_\mathbf{B} || (\mathbf X - \mathbf A \mathbf{B}^\top) \circ \mathbf W ||_\mathrm F \) subject to constraints on rows of \(\mathbf B\).

This is not a public interface. Subject to change without further notice. Please do not call from outside albatross.

Usage

cmf(
  X, nfac = 1,
  const = list(list(const = "nonneg"), list(const = "nonneg")),
  start = c("svd", "random"), ctol = 1e-04, maxit = 10
)
# S3 method for cmf
fitted(object, ...)
wcmls(X, A, W, ..., struc = NULL)

Value

cmf

A list of class cmf containing the \(\mathbf A, \mathbf B\) matrices.

wcmls

The \(\mathbf B\) matrix solving the constrained weighted multivariate least squares problem.

fitted.cmf

A matrix reconstructed from its nfac-rank decomposition.

Arguments

X

The matrix for a low-rank approximation.

nfac

The rank of the factorisation; the number of columns in matrices \(\mathbf A, \mathbf B\).

const

Constraints on the two matrices: a list of two lists of arguments to pass to wcmls when computing the corresponding matrix.

start

A cmf object to take the starting values from. Alternatively, a string:

svd

Compute a truncated SVD \( \mathbf X \approx \mathbf U \, \mathrm{diag}(\sigma_1, \dots, \sigma_k) \, \mathbf{V}^\top \). Use \( \mathbf A = \mathbf U \, \mathrm{diag}(\sqrt{\sigma_1}, \dots, \sqrt{\sigma_k}) \), \( \mathbf B = \mathbf V \, \mathrm{diag}(\sqrt{\sigma_1}, \dots, \sqrt{\sigma_k}) \) as the starting values.

random

Use uniformly distributed nonnegative starting values rescaled to be of comparable norms.

ctol

Given \(L = || (\mathbf X - \mathbf A \mathbf{B}^\top ) \circ \mathbf W ||_\mathrm F\), stop when \( \frac{|\Delta L|}{L} \le \mathtt{ctol} \).

maxit

Iteration number limit.

object

An object of class cmf.

A

The predictor matrix in the weighted multivariate least squares problem.

W

The weights matrix.

..., struc

wcmls

Passed to cmls.

fitted.cmf

Ignored.
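
The svd starting values described under start can be sketched in base R as follows. This is an illustration only, not the code albatross runs: the data are made up, k plays the role of nfac, and X is assumed to have no undefined cells (how missing values are handled before the SVD is not stated here).

```r
# Sketch of the "svd" starting values using base R only.
set.seed(1)
X <- matrix(runif(6 * 4), 6, 4)   # illustrative data, no missing values
k <- 2                            # plays the role of nfac

s <- svd(X, nu = k, nv = k)       # keep the k leading singular vectors
d <- sqrt(s$d[seq_len(k)])        # square roots of the k leading singular values
A <- s$u %*% diag(d, nrow = k)    # (m x k) starting value for A
B <- s$v %*% diag(d, nrow = k)    # (n x k) starting value for B

# A %*% t(B) reproduces the rank-k truncated SVD approximation of X
Xk <- s$u %*% diag(s$d[seq_len(k)], nrow = k) %*% t(s$v)
all.equal(A %*% t(B), Xk)
```

Splitting each singular value evenly between the two factors gives the columns of A and B equal norms, which seems to be the same goal as the rescaling mentioned for the random option.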

Details

The CMLS package function cmls can solve constrained multivariate least squares problems of the form:

$$ \min_\mathbf{B} L(\mathbf X, \mathbf A, \mathbf B), \quad L(\mathbf X, \mathbf A, \mathbf B) = || \mathbf X - \mathbf A \mathbf B ||_\mathrm F $$

We use it to solve a weighted problem. Let \(\mathbf X, \mathbf W\) be \((m \times n)\) matrices, \(\mathbf A\) be an \((m \times k)\) matrix, \(\mathbf B\) be an \((n \times k)\) matrix, \(\mathbf{J}_{p,q}\) be a \((p \times q)\) matrix of ones:

$$ \min_\mathbf{B} || \mathbf W \circ (\mathbf X - \mathbf A \mathbf B^\top) ||_\mathrm F^2 = \min_\mathbf{B} \sum_{i,j} ( w_{i,j} x_{i,j} - w_{i,j} \mathbf{a}_{i,\cdot} \mathbf{b}_{j,\cdot}^\top )^2 = {} $$ $$ {} = \min_\mathbf{B} \sum_j || \mathbf{w}_{\cdot,j} \circ \mathbf{x}_{\cdot,j} - ( (\mathbf{w}_{\cdot,j} \mathbf{J}_{1,k}) \circ \mathbf A ) \mathbf{b}_{j,\cdot}^\top ||_\mathrm F^2 = \min_\mathbf{B} \sum_j L( \mathbf{w}_{\cdot,j} \circ \mathbf{x}_{\cdot,j}, (\mathbf{w}_{\cdot,j} \mathbf{J}_{1,k}) \circ \mathbf A, \mathbf{b}_{j,\cdot}^\top )^2 $$

Here, \(\mathbf{w}_{\cdot,j}\) and \(\mathbf{x}_{\cdot,j}\) are columns of \(\mathbf W\) and \(\mathbf X\), while \(\mathbf{a}_{i,\cdot}\) and \(\mathbf{b}_{j,\cdot}\) are rows of \(\mathbf A\) and \(\mathbf B\), respectively. Thus, in the weighted case, the \(\mathbf B\) matrix is determined row by row by calling the cmls function for pre-processed \(\mathbf A\) matrix and columns of \(\mathbf X\).
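
The row-by-row reduction can be sketched in base R as below. This is a minimal illustration under stated assumptions: plain unconstrained least squares (qr.solve) stands in for cmls, the helper name wls_rows is hypothetical, and the data are random. It is not the wcmls implementation.

```r
set.seed(42)
m <- 8; n <- 5; k <- 2
A <- matrix(rnorm(m * k), m, k)     # predictor matrix, (m x k)
X <- matrix(rnorm(m * n), m, n)     # data matrix, (m x n)
W <- matrix(runif(m * n), m, n)     # weights, (m x n)
W[cbind(1:5, 1:5)] <- 0             # one zero-weight ("undefined") cell per column

# Hypothetical helper: one ordinary least squares fit per column of X.
# Unconstrained qr.solve() is an assumption standing in for cmls here.
wls_rows <- function(X, A, W) {
  B <- matrix(NA_real_, ncol(X), ncol(A))
  for (j in seq_len(ncol(X))) {
    w <- W[, j]
    # w * A scales row i of A by w[i], i.e. (w J_{1,k}) o A
    B[j, ] <- qr.solve(w * A, w * X[, j])
  }
  B
}
B <- wls_rows(X, A, W)

# The weighted squared loss equals the sum of the per-column squared losses
total <- sum((W * (X - A %*% t(B)))^2)
by_col <- sum(vapply(seq_len(n), function(j) {
  w <- W[, j]
  sum((w * X[, j] - (w * A) %*% B[j, ])^2)
}, numeric(1)))
all.equal(total, by_col)
```

Each row of B depends only on one column of X and W, so the n small problems are independent of each other.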

The problem we're actually interested in is a low-rank approximation of \(\mathbf X\). It doesn't have a unique solution, especially if the rank is more than \(1\), unless we apply constraints and have some luck. We solve it by starting with (typically) the SVD and refining the solution with alternating least squares so that it satisfies the constraints, solving in turn \( \min_\mathbf{B} || (\mathbf X - \mathbf A \mathbf{B}^\top) \circ \mathbf W ||_\mathrm F \) and \( \min_\mathbf{A} || (\mathbf{X}^\top - \mathbf B \mathbf{A}^\top) \circ \mathbf{W}^\top ||_\mathrm F \).
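
The alternating scheme can be sketched as follows. Everything here is an assumption made for illustration: the helper names (wls_rows, wloss, als_sketch) are hypothetical, the per-column solver is unconstrained least squares rather than cmls, the start is plain uniform random without rescaling, and the relative change is measured against the new loss value. Only the stopping rule and the ctol and maxit defaults mirror the arguments documented above.

```r
set.seed(7)
m <- 10; n <- 6; k <- 2
# Illustrative data: a noisy nonnegative rank-2 matrix
X <- matrix(runif(m * k), m, k) %*% t(matrix(runif(n * k), n, k)) +
  matrix(rnorm(m * n, sd = 0.01), m, n)
W <- matrix(1, m, n)
W[cbind(1:4, 1:4)] <- 0             # four cells treated as undefined

# Hypothetical unconstrained stand-in for wcmls(): one fit per column of X
wls_rows <- function(X, A, W) {
  B <- matrix(NA_real_, ncol(X), ncol(A))
  for (j in seq_len(ncol(X))) {
    B[j, ] <- qr.solve(W[, j] * A, W[, j] * X[, j])
  }
  B
}

# Weighted Frobenius loss L used in the stopping rule
wloss <- function(X, A, B, W) sqrt(sum((W * (X - A %*% t(B)))^2))

als_sketch <- function(X, W, k, ctol = 1e-4, maxit = 10) {
  A <- matrix(runif(nrow(X) * k), nrow(X), k)   # random nonnegative start
  B <- matrix(runif(ncol(X) * k), ncol(X), k)
  L <- wloss(X, A, B, W)
  losses <- L
  for (it in seq_len(maxit)) {
    B <- wls_rows(X, A, W)                      # update B with A fixed
    A <- wls_rows(t(X), B, t(W))                # update A via the transposed problem
    Lnew <- wloss(X, A, B, W)
    losses <- c(losses, Lnew)
    converged <- abs(Lnew - L) / Lnew <= ctol   # |delta L| / L <= ctol
    L <- Lnew
    if (converged) break
  }
  list(A = A, B = B, losses = losses)
}

fit <- als_sketch(X, W, k)
fit$losses
```

Because each half-step solves its subproblem exactly, the loss in this unconstrained sketch cannot increase from one iteration to the next.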

References

albatross:::.Rdreference('deJuan2014')

See Also

cmls; the ALS package.

Examples

  data(feems)
  z <- feemscatter(feems$a, rep(25, 4), 'omit')
  str(zf <- albatross:::cmf(unclass(z)))
  str(albatross:::fitted.cmf(zf))
