lsbclust: Least-squares Bilinear Clustering of Three-way Data

Description

This function clusters along one way of a three-way array (as specified by margin) while decomposing along the other two dimensions. Four types of clustering are allowed, each based on the two-way slices of the array: on the overall means, the row margins, the column margins and the interactions between rows and columns. Which clusterings are fitted is determined by delta, a vector of four binary elements. All orthogonal models are fitted. The nonorthogonal case delta = (1, 1, 0, 0) returns an error. See the reference for further details.
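To make the four components concrete, the following sketch splits a single two-way slice into an overall mean, row margins, column margins and interactions using ordinary double centering. It uses only base R; the names X, m, a, b and E are illustrative, and the centering that lsbclust actually applies depends on delta as described in the reference.

```r
set.seed(1)
X <- matrix(rnorm(12), nrow = 3, ncol = 4)  # one hypothetical two-way slice

m <- mean(X)                          # overall mean (a scalar)
a <- rowMeans(X) - m                  # row margins, summing to zero
b <- colMeans(X) - m                  # column margins, summing to zero
E <- sweep(sweep(X - m, 1, a), 2, b)  # interactions: every row and column sums to zero

# The four parts add back up to the original slice
X_hat <- m + outer(a, rep(1, ncol(X))) + outer(rep(1, nrow(X)), b) + E
```

Each of the four parts corresponds to one of the clusterings listed above.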

Usage

lsbclust(data, margin = 3L, delta = c(1L, 1L, 1L, 1L), nclust,
  ndim = 2L, fixed = c("none", "rows", "columns"), nstart = 20L,
  starts = NULL, nstart.kmeans = 500L, alpha = 0.5,
  parallel = FALSE, maxit = 100L, verbose = 1, method = "diag",
  type = NULL, sep.nclust = TRUE, ...)

Arguments

data

A three-way array representing the data.

margin

An integer giving the single subscript of data over which the clustering will be applied.

delta

A four-element binary vector (logical or numeric) indicating which sum-to-zero constraints must be enforced.

nclust

A vector of length four giving the number of clusters for the overall mean, the row margins, the column margins and the interactions (in that order) respectively. Alternatively, a vector of length one, in which case all components will have the same number of clusters.

ndim

The required rank for the approximation of the interactions (a scalar).

fixed

One of "none", "rows" or "columns" indicating whether to fix neither set of coordinates, or to fix the row or the column coordinates across clusters respectively. If a vector is supplied, only the first element will be used (passed to int.lsbclust).

nstart

The number of random starts to use for the interaction clustering.

starts

A list containing starting configurations for the cluster membership vector. If not supplied, random initializations will be generated (passed to int.lsbclust).

nstart.kmeans

The number of random starts to use in kmeans.

alpha

Numeric value in [0, 1] which determines how the singular values are distributed between rows and columns (passed to int.lsbclust).

parallel

Logical indicating whether to parallelize over different starts or not (passed to int.lsbclust).

maxit

The maximum number of iterations allowed in the interaction clustering.

verbose

Integer controlling the amount of information printed: 0 = no information, 1 = information on random starts and progress, and 2 = information is printed after each iteration for the interaction clustering.

method

The method for calculating cluster agreement across random starts, passed on to cl_agreement (via int.lsbclust).

type

One of "rows", "columns" or "overall" (or a unique abbreviation of one of these) indicating whether clustering should be done on row margins, column margins or the overall means of the two-way slices respectively. If more than one option is supplied, the algorithm is run for all (unique) options supplied (passed to orc.lsbclust). This is an optional argument.

sep.nclust

Logical indicating how nclust should be used across the different values of type. If sep.nclust is TRUE, nclust is recycled so that each type can have a different number of clusters. If sep.nclust is FALSE, the same vector nclust is used for all types.

…

Additional arguments passed to kmeans.
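The role of alpha can be pictured with a plain singular value decomposition. The sketch below assumes the common biplot convention in which row coordinates are U D^alpha and column coordinates are V D^(1 - alpha); whether int.lsbclust uses exactly this scaling is an assumption here, and the names E, C and D are illustrative only.

```r
set.seed(2)
E <- matrix(rnorm(20), nrow = 5, ncol = 4)  # hypothetical interaction matrix

ndim  <- 2    # rank of the approximation
alpha <- 0.5  # share of the singular values given to the rows

s <- svd(E, nu = ndim, nv = ndim)
d <- s$d[seq_len(ndim)]                        # leading singular values

C <- s$u %*% diag(d^alpha, nrow = ndim)        # row coordinates
D <- s$v %*% diag(d^(1 - alpha), nrow = ndim)  # column coordinates

# The product does not depend on alpha: it is the rank-ndim
# least-squares approximation of E
approx_E <- C %*% t(D)
```

Under this convention, alpha = 1 puts all of the singular values on the row coordinates and alpha = 0 puts them on the column coordinates, while the fitted approximation stays the same.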

Value

Returns an object of S3 class lsbclust with the following components:

overall

Object of class ovl.kmeans for the overall means clustering

rows

Object of class row.kmeans for the row means clustering

columns

Object of class col.kmeans for the column means clustering

interactions

Object of class int.lsbclust for the interaction clustering

call

The function call used to create the object

delta

The value of delta in the fit

df

Breakdown of the degrees-of-freedom across the different subproblems

loss

Breakdown of the loss across subproblems

time

Time taken in seconds to calculate the solution

cluster

Matrix of cluster membership per observation for all cluster types
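As a minimal sketch of a call and of how the returned components might be inspected, the example below fits the full model to a simulated array. The data, dimension names and tuning values are invented for illustration, and the code is skipped if the lsbclust package is not installed.

```r
fit <- NULL
if (requireNamespace("lsbclust", quietly = TRUE)) {
  set.seed(3)
  # Hypothetical data: 5 rows x 4 columns x 30 slices; the slices are clustered
  dat <- array(rnorm(5 * 4 * 30), dim = c(5, 4, 30),
               dimnames = list(paste0("r", 1:5), paste0("c", 1:4),
                               paste0("obs", 1:30)))

  fit <- lsbclust::lsbclust(data = dat, margin = 3L,
                            delta = c(1L, 1L, 1L, 1L), nclust = 2L,
                            ndim = 2L, nstart = 2L, nstart.kmeans = 10L,
                            verbose = 0)

  print(head(fit$cluster))  # cluster membership per slice, by cluster type
  print(fit$loss)           # loss broken down by subproblem
  print(fit$time)           # time taken, in seconds
}
```

The small values of nstart and nstart.kmeans only keep the sketch fast; the defaults shown under Usage are considerably larger.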

References

Schoonees, P.C., Groenen, P.J.F., Van de Velden, M. Least-squares Bilinear Clustering of Three-way Data. Econometric Institute Report, EI2014-23.
