feemjackknife: Jack-knife outlier detection in PARAFAC models

Description

Perform leave-one-out fitting and validation of PARAFAC models on a given FEEM cube.

Usage

feemjackknife(cube, ..., progress = TRUE)

# S3 method for feemjackknife
plot(x, kind = c('estimations', 'RIP', 'IMP'), ...)

# S3 method for feemjackknife
coef(object, kind = c('estimations', 'RIP', 'IMP'), ...)

Value

feemjackknife

A list of class feemjackknife containing the following entries:

overall: Result of fitting the overall cube with feemparafac.
leaveone: A list of length dim(cube)[3] containing the feemparafac models fitted to the reduced datasets, each with one sample omitted. Every feemparafac object in the list has an additional Chat attribute containing the result of fitting the excluded spectrum back to the loadings of the reduced model.

plot.feemjackknife

A lattice plot object. Its print or plot method will draw the plot on an appropriate plotting device.

coef.feemjackknife

A data.frame containing various columns, depending on the value of the kind argument:

estimations

loading

Values of the loadings.

mode

The axis of the loadings, “Emission” or “Excitation”.

wavelength

Emission or excitation wavelength the loading values correspond to.

factor

The component number.

omitted

The sample (name if cube had names, integer if it didn't) that was omitted to get the resulting loading values.

RIP

msq.resid

Mean squared residual value for the model with a given sample omitted.

Emission

Mean squared difference in emission mode loadings between the overall model and the model with a given sample omitted.

Excitation

Mean squared difference in excitation mode loadings between the overall model and the model with a given sample omitted.

omitted

The sample (name if cube had names, integer if it didn't) that was omitted from a given model.

IMP

score.overall

Score values for the overall model.

score.predicted

Score values estimated from the loadings of the model missing a given sample.

factor

The component number.

omitted

The sample (name if cube had names, integer if it didn't) that was omitted from a given model.

Arguments

cube

A feemcube object.

progress

Set to FALSE to disable the progress bar.

x, object

An object returned by feemjackknife.

kind

Chooses what to plot (when called as plot(...)) or return as a data.frame (when called as coef(...)):

estimations

Produce the loadings from every leave-one-out model.

RIP

Produce a Resample Influence Plot, i.e. mean squared difference between loadings in overall and leave-one-out models plotted against mean squared residuals in leave-one-out models.

IMP

Produce an Identity Match Plot, i.e. scores in leave-one-out models plotted against scores in the overall model.

...

feemjackknife

Passed as-is to feemparafac and, eventually, to the parafac function of the multiway package.

plot.feemjackknife

When kind is “RIP” or “IMP”, pass a q argument to specify the quantile of residual values (for RIP) or absolute score differences (IMP) above which sample names (or numbers) should be plotted. Default value for q is $0.9$.

Remaining arguments are passed as-is to xyplot.

coef.feemjackknife

No further parameters are allowed.
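
The q threshold described for plot.feemjackknife can be illustrated in base R. The sketch below uses a made-up data frame in the shape documented for coef(jk, 'IMP') and labels the samples whose absolute score difference exceeds the q-th quantile. Whether the package computes the quantile over all factors at once or per factor is not stated here, so treat the pooling as an assumption.

```r
# Mock data in the shape documented for coef(jk, 'IMP'); all values are made up
imp <- data.frame(
  score.overall   = c(1.0, 2.0, 3.0, 4.0, 5.0),
  score.predicted = c(1.1, 2.0, 2.9, 4.1, 9.0),
  factor          = 1,
  omitted         = c("s1", "s2", "s3", "s4", "s5"),
  stringsAsFactors = FALSE
)

q <- 0.9  # documented default of the q argument

# Absolute difference between predicted and overall scores
abs_diff <- abs(imp$score.predicted - imp$score.overall)

# Samples above the q-th quantile would get their names drawn on the plot
threshold <- quantile(abs_diff, q)
labelled <- imp$omitted[abs_diff > threshold]
```

With real results, the same effect should be obtainable by passing q directly, for example plot(jk, 'IMP', q = 0.95).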

Details

The function takes each sample out of the dataset, fits a PARAFAC model without it, then fits the omitted sample to the model with emission and excitation factors fixed:

$$ \hat{\mathbf{c}} = (\mathbf{A} \ast \mathbf{B})^{+} \times \mathrm{vec}(\mathbf{X}) $$
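
The least-squares step above can be sketched in base R on synthetic data. This is a minimal illustration, not the package's code: it assumes that $\ast$ denotes the column-wise Khatri-Rao product, builds it in the order that matches R's column-major vec, and ignores missing values, which real FEEMs usually contain after scatter removal. All matrices below are made up.

```r
set.seed(1)
A <- matrix(runif(4 * 2), 4, 2)  # emission loadings (hypothetical values)
B <- matrix(runif(3 * 2), 3, 2)  # excitation loadings (hypothetical values)
c_true <- c(2, 0.5)              # scores of the omitted sample

# Omitted spectrum following the PARAFAC model: X = A diag(c) B^T
X <- A %*% diag(c_true) %*% t(B)

# Column-wise Khatri-Rao product; column k is kronecker(B[, k], A[, k]),
# which lines up with as.vector(X) (column-major vec)
Z <- sapply(seq_len(ncol(A)), function(k) kronecker(B[, k], A[, k]))

# Least-squares solution of Z c = vec(X); for a full-rank Z this equals
# applying the pseudo-inverse of Z to vec(X)
c_hat <- qr.solve(Z, as.vector(X))
```

Because X is noise-free here, c_hat recovers c_true up to rounding error.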

The individual leave-one-out models (fitted loadings $\mathbf A$, $\mathbf B$ and scores $\mathbf C$) are reordered according to best Tucker's congruence coefficient match and rescaled by minimising $ || \mathbf A \, \mathrm{diag}(\mathbf s_\mathrm A) - \mathbf A^\mathrm{orig} ||^2 $ and $ || \mathbf{B} \, \mathrm{diag}(\mathbf s_\mathrm B) - \mathbf B^\mathrm{orig} ||^2 $ over $\mathbf s_\mathrm A$ and $\mathbf s_\mathrm B$, subject to $ \mathrm{diag}(\mathbf s_\mathrm A) \times \mathrm{diag}(\mathbf s_\mathrm B) \times \mathrm{diag}(\mathbf s_\mathrm C) = \mathbf I $, to make them comparable.
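
The reordering and rescaling can be sketched in base R as follows. The helper names congruence and col_scales are hypothetical, the data are synthetic, and matching on the product of the emission and excitation congruences is an assumption; the exact matching rule used by the package is not spelled out here. Each column scale has the closed form $ s_k = \mathbf a_k^\top \mathbf a_k^\mathrm{orig} / \mathbf a_k^\top \mathbf a_k $.

```r
# Tucker's congruence coefficient between all column pairs of X and Y
congruence <- function(X, Y) {
  crossprod(X, Y) / sqrt(outer(colSums(X^2), colSums(Y^2)))
}

# Hypothetical helper: column scales s minimising || M diag(s) - M_orig ||^2
col_scales <- function(M, M_orig) colSums(M * M_orig) / colSums(M^2)

set.seed(2)
A_orig <- matrix(runif(5 * 2), 5, 2)
B_orig <- matrix(runif(4 * 2), 4, 2)
C_orig <- matrix(runif(3 * 2), 3, 2)

# A synthetic "leave-one-out" model: same components, swapped and rescaled
# so that the product of the three scales is 1 for each component
perm <- c(2, 1)
A <- A_orig[, perm] %*% diag(c(2, 4))
B <- B_orig[, perm] %*% diag(c(0.5, 5))
C <- C_orig[, perm] %*% diag(c(1, 0.05))

# 1. Reorder: for each original component pick the best-matching column.
#    This greedy rule is enough for well-separated components only.
match_idx <- apply(congruence(A_orig, A) * congruence(B_orig, B), 1, which.max)
A <- A[, match_idx]; B <- B[, match_idx]; C <- C[, match_idx]

# 2. Rescale A and B towards the overall model; s_C follows from the
#    constraint diag(s_A) diag(s_B) diag(s_C) = I
s_A <- col_scales(A, A_orig)
s_B <- col_scales(B, B_orig)
s_C <- 1 / (s_A * s_B)
A <- A %*% diag(s_A); B <- B %*% diag(s_B); C <- C %*% diag(s_C)
```

In this noise-free case the aligned A, B and C coincide with the original matrices; with real leave-one-out models the remaining differences are what the resample influence plot summarises.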

Once the models are fitted, resample influence plots and identity match plots can be produced from resulting data to detect outliers.
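
As a rough illustration of the loading-difference axes of a resample influence plot, the base R sketch below computes the mean squared difference between overall and leave-one-out loadings for synthetic, already aligned models. The list layout and all values are hypothetical; the msq.resid column is left out because it needs the actual data and fitted models.

```r
set.seed(3)
A_overall <- matrix(runif(6 * 2), 6, 2)  # emission loadings, overall model
B_overall <- matrix(runif(5 * 2), 5, 2)  # excitation loadings, overall model

# Hypothetical leave-one-out loadings: tiny perturbations, except that
# omitting sample 3 shifts the emission loadings noticeably
samples <- 1:4
loo <- lapply(samples, function(i) {
  shift <- if (i == 3) 0.2 else 0.001
  list(A = A_overall + shift, B = B_overall + 0.001)
})

# Same column names as documented for coef(jk, 'RIP'), minus msq.resid
rip <- data.frame(
  Emission   = sapply(loo, function(m) mean((m$A - A_overall)^2)),
  Excitation = sapply(loo, function(m) mean((m$B - B_overall)^2)),
  omitted    = samples
)

# The sample whose removal changes the emission loadings the most
influential <- rip$omitted[which.max(rip$Emission)]
```

A sample that stands apart from the rest on these axes is a candidate outlier, since removing it changes the model more than removing any other sample does.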

To conserve memory, feemjackknife puts the user-provided cube in an environment and passes it via envir and subset options of feemparafac. This means that, unlike in feemparafac, the cube argument has to be a feemcube object and passing envir and subset options to feemjackknife is not supported. It is recommended to fully name the parameters to be passed to feemparafac to avoid problems.

plot.feemjackknife provides sane defaults for xyplot parameters xlab, ylab, scales, as.table, but they can be overridden.

References

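Riu J, Bro R (2003). "Jack-knife technique for outlier detection and estimation of standard errors in PARAFAC models." Chemometrics and Intelligent Laboratory Systems, 65(1), 35-49.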

Examples


# \donttest{
  data(feems)
  cube <- feemscale(feemscatter(cube, rep(14, 4)), na.rm = TRUE)
  # takes a long time; the stopping criterion is weakened for speed
  jk <- feemjackknife(cube, nfac = 3, ctol = 1e-4)
  # feemparafac methods should be able to use the environment and subset
  plot(jk$leaveone[[1]])
  plot(jk)
  plot(jk, 'IMP')
  plot(jk, 'RIP')
  head(coef(jk))
# }
