feemcorcondia: Core consistency diagnostic for PARAFAC models

Description

Compute the core consistency diagnostic (“CORCONDIA”) by fitting a “Tucker3” core array to the existing PARAFAC loadings.

Usage

feemcorcondia(
    model, divisor = c("nfac", "core"),
    kind = c('pinv', 'iterative', 'vec'), ...
  )
  # S3 method for feemcorcondia
print(x, ...)

Value

A numeric scalar of class feemcorcondia with the following attributes:

divisor: The divisor argument, expanded to one of the valid options.
core: A three-way array containing the least-squares Tucker core for the given PARAFAC model.

Arguments

model

A PARAFAC model returned by feemparafac.

divisor

The divisor used in computation of the CORCONDIA value, see Details.

kind

A string choosing the method used to compute the least squares Tucker3 core. Defaults to “pinv” for PARAFAC models without missing data and “iterative” for models where missing data is present. See Details.

x

An object returned by feemcorcondia.

...

feemcorcondia: For kind = 'iterative', forwarded to optim (see Details). Otherwise, not allowed.

print.feemcorcondia

Ignored.

Details

The “Tucker3” model uses three loading matrices and a small three-way “core array” to describe a larger three-way array:

$$ X_{i,j,k} = \sum_r \sum_s \sum_t A_{i,r} B_{j,s} C_{k,t} G_{r,s,t} $$

It's easy to show that constraining $ G_{r,s,t} = 1_{r = s = t}$ makes the Tucker3 model equivalent to a PARAFAC model. The core consistency diagnostic works by constraining the loading matrices of a Tucker3 model to the existing loading matrices from a PARAFAC model and estimating the core array. The closer the resulting $\mathbf{G}$ tensor is to a diagonal one, the better.

Given the least-squares estimated core tensor $\mathbf{G}$, the ideal core tensor $ T_{r,s,t} = 1_{r = s = t}$ and the denominator $D$, the CORCONDIA metric is defined as follows:

$$ \left( 1 - \frac{ \sum_r \sum_s \sum_t (G_{r,s,t} - T_{r,s,t})^2 }{D} \right) \cdot 100\% $$

The denominator can be chosen to be either $ \sum_r \sum_s \sum_t T_{r,s,t}^2 $, which is equal to the number of factors in the model (divisor = 'nfac'), or $ \sum_r \sum_s \sum_t G_{r,s,t}^2 $, which will avoid resulting negative values (divisor = 'core').

There are multiple ways how the least squares Tucker3 core can be computed. When no data is missing, the matricised form of the model can be used to derive the expression (kind = 'pinv', the default in such cases):

$$ \mathbf{X} = \mathbf{A} \mathbf{G} (\mathbf{C} \otimes \mathbf{B})^\top + \epsilon $$ $$ \hat{\mathbf{G}} = \mathbf{A}^{+} \mathbf{X} ( (\mathbf{C}^\top)^{+} \otimes (\mathbf{B}^\top)^{+} ) $$

With missing data present, a binary matrix of weights $\mathbf{W}$ appears:

$$ \min_\mathbf{G} \left\| \mathbf W \circ ( \mathbf{A} \mathbf{G} (\mathbf{C} \otimes \mathbf{B})^\top - \mathbf{X} ) \right\|^2 $$

A gradient-based method can be used to solve this problem iteratively without allocating too much memory, but care must be taken to ensure convergence. For kind = 'iterative' (which is the default for models with missing data), optim is used with parameters method = 'L-BFGS-B', control = list(maxit = 1000, factr = 1). Warnings will be produced if the method doesn't indicate successful convergence.

The problem can also be solved exactly by unfolding the tensor into a vector and skipping the elements marked as missing:

$$ \min_\mathbf{G} \left\| \mbox{vec}( \mathbf{A} \mathbf{G} (\mathbf{C} \otimes \mathbf{B})^\top )_{\mbox{non-missing}} - \mbox{vec}(\mathbf{X})_{\mbox{non-missing}} \right\|^2 $$ $$ \mbox{vec}(\mathbf{C} \otimes \mathbf{B} \otimes \mathbf{A}) _{\mbox{non-missing}} \times \mbox{vec}\:\mathbf{G} = \mbox{vec}(\mathbf{X})_{\mbox{non-missing}} $$

Unfortunately, when this method is used (kind = 'vec'), the left-hand side of the equation has the size of $\mathbf{X}$ times the number of components cubed, which grows very quickly for models with large numbers of components.

References

albatross:::.Rdreference('Bro2003-CORCONDIA')

Examples

Run this code

  data(feems)
  cube <- feemscale(feemscatter(cube, c(20, 14)), na.rm = TRUE)
  # kind = 'vec' is exact but may take a lot of memory
  feemcorcondia(feemparafac(cube, nfac = 3, ctol = 1e-4), kind = 'vec')
  # kind = 'iterative' used by default for models with missing data
  feemcorcondia(feemparafac(cube, nfac = 4, ctol = 1e-4))

Run the code above in your browser using DataLab