This function clusters along one way of a three-way array (as specified by margin) while decomposing along the other two dimensions. Four types of clusterings are allowed, based on the respective two-way slices of the array: on the overall means, the row margins, the column margins and the interactions between rows and columns. Which clusterings can be fitted is determined by the vector delta, which has four binary elements. All orthogonal models can be fitted; the nonorthogonal case delta = c(1, 1, 0, 0) returns an error. See the reference for further details.
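To make the four components concrete, here is a minimal base-R sketch of the usual additive decomposition of a single two-way slice into an overall mean, row effects, column effects and a double-centred interaction matrix, each satisfying sum-to-zero constraints. We assume this corresponds to the fully constrained case delta = c(1, 1, 1, 1); the exact role of each element of delta is defined in the reference, and the helper decompose_slice() is hypothetical, not part of lsbclust.

```r
# Hypothetical helper (not part of lsbclust): additive decomposition of one
# two-way slice X into overall mean, row effects, column effects and a
# double-centred interaction matrix, all with sum-to-zero constraints.
decompose_slice <- function(X) {
  m <- mean(X)                       # overall mean
  a <- rowMeans(X) - m               # row effects, sum to zero
  b <- colMeans(X) - m               # column effects, sum to zero
  # Interactions: X minus the three main-effect terms (double centring)
  C <- X - m - outer(a, rep(1, ncol(X))) - outer(rep(1, nrow(X)), b)
  list(overall = m, rows = a, columns = b, interactions = C)
}

set.seed(1)
X <- matrix(rnorm(12), nrow = 4, ncol = 3)
parts <- decompose_slice(X)

# The four parts add back up to the original slice
recon <- parts$overall + outer(parts$rows, rep(1, 3)) +
  outer(rep(1, 4), parts$columns) + parts$interactions
all.equal(recon, X)  # TRUE
```

Because every row and column of the interaction matrix sums to zero, the four parts are orthogonal to one another, which is what allows them to be clustered separately.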
lsbclust(data, margin = 3L, delta = c(1L, 1L, 1L, 1L), nclust,
ndim = 2L, fixed = c("none", "rows", "columns"), nstart = 20L,
starts = NULL, nstart.kmeans = 500L, alpha = 0.5,
parallel = FALSE, maxit = 100L, verbose = 1, method = "diag",
type = NULL, sep.nclust = TRUE, ...)
data
A three-way array representing the data.
margin
An integer giving the single subscript of data over which the clustering will be applied.
delta
A four-element binary vector (logical or numeric) indicating which sum-to-zero constraints must be enforced.
nclust
A vector of length four giving the number of clusters for the overall means, the row margins, the column margins and the interactions, in that order. Alternatively, a vector of length one, in which case all components will have the same number of clusters.
ndim
The required rank for the approximation of the interactions (a scalar).
fixed
One of "none", "rows" or "columns", indicating whether to fix neither set of coordinates, or to fix the row or the column coordinates across clusters, respectively. If a vector is supplied, only the first element will be used (passed to int.lsbclust).
nstart
The number of random starts to use for the interaction clustering.
starts
A list containing starting configurations for the cluster membership vector. If not supplied, random initializations will be generated (passed to int.lsbclust).
nstart.kmeans
The number of random starts to use in kmeans.
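As a simplified illustration of clustering a margin with kmeans, and of what nstart.kmeans controls, the base-R sketch below computes the row effects of every slice along margin 3, stacks them into a matrix with one row per slice, and clusters that matrix with stats::kmeans() using several random starts. This is an assumption-laden sketch, not the package's implementation: the centring and the handling of delta in lsbclust may differ.

```r
set.seed(2)
J <- 5; K <- 4; N <- 30
dat <- array(rnorm(J * K * N), dim = c(J, K, N))

# Row effects of each slice dat[, , i] (margin = 3): one row per slice
row_effects <- t(apply(dat, 3, function(X) rowMeans(X) - mean(X)))

# k-means with multiple random starts; nstart plays the role of nstart.kmeans
km <- kmeans(row_effects, centers = 3, nstart = 50)
table(km$cluster)
```

More random starts make it less likely that kmeans() returns a poor local optimum, at the cost of run time.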
alpha
Numeric value in [0, 1] which determines how the singular values are distributed between rows and columns (passed to int.lsbclust).
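The base-R sketch below shows one common convention for splitting the singular values of a rank-ndim approximation between row and column coordinates: rows receive U D^alpha and columns receive V D^(1 - alpha), so their cross-product is the same for every alpha. We assume, without having verified it against the package source, that int.lsbclust follows this convention; the helper biplot_coords() is hypothetical.

```r
# Hypothetical helper: rank-ndim row and column coordinates of a matrix C,
# with the singular values split as D^alpha (rows) and D^(1 - alpha) (columns)
biplot_coords <- function(C, ndim = 2, alpha = 0.5) {
  s <- svd(C, nu = ndim, nv = ndim)
  d <- s$d[seq_len(ndim)]
  list(rows = s$u %*% diag(d^alpha, ndim),
       cols = s$v %*% diag(d^(1 - alpha), ndim))
}

set.seed(3)
C <- matrix(rnorm(20), nrow = 5, ncol = 4)
# Best rank-2 approximation of C, computed directly from the SVD
low_rank <- with(svd(C), u[, 1:2] %*% diag(d[1:2]) %*% t(v[, 1:2]))

# The fitted approximation does not depend on alpha, only the coordinates do
same <- sapply(c(0, 0.5, 1), function(a) {
  P <- biplot_coords(C, ndim = 2, alpha = a)
  isTRUE(all.equal(P$rows %*% t(P$cols), low_rank))
})
same  # TRUE TRUE TRUE
```

With alpha = 0.5 the singular values are shared symmetrically, which is the default in the usage above.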
parallel
Logical indicating whether to parallelize over the different starts (passed to int.lsbclust).
maxit
The maximum number of iterations allowed in the interaction clustering.
verbose
Integer controlling the amount of information printed: 0 = no information, 1 = information on random starts and progress, and 2 = information printed after each iteration of the interaction clustering.
method
The method for calculating cluster agreement across random starts, as used by cl_agreement (passed to int.lsbclust).
type
One of "rows", "columns" or "overall" (or a unique abbreviation of one of these), indicating whether clustering should be done on the row margins, the column margins or the overall means of the two-way slices, respectively. If more than one option is supplied, the algorithm is run for all (unique) options supplied (passed to orc.lsbclust). This is an optional argument.
sep.nclust
Logical indicating how nclust should be used across different types. If sep.nclust is TRUE, nclust is recycled so that each type can have a different number of clusters. If sep.nclust is FALSE, the same vector nclust is used for all types.
...
Additional arguments passed to kmeans.
Returns an object of S3 class lsbclust, a list with the following components:
overall
Object of class ovl.kmeans for the overall means clustering
rows
Object of class row.kmeans for the row means clustering
columns
Object of class col.kmeans for the column means clustering
interactions
Object of class int.lsbclust for the interaction clustering
call
The function call used to create the object
delta
The value of delta in the fit
df
Breakdown of the degrees of freedom across the different subproblems
loss
Breakdown of the loss across subproblems
time
Time taken in seconds to calculate the solution
cluster
Matrix of cluster membership per observation for all cluster types
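A minimal usage sketch on simulated data follows. It assumes the lsbclust package is installed (the call is skipped otherwise); the data are pure noise, so the resulting clusters carry no meaning, and nstart and verbose are lowered only to keep the run short and quiet. The components accessed are those listed above.

```r
set.seed(4)
# Simulated three-way array: 10 x 8 slices for 40 observations along margin 3
dat <- array(rnorm(10 * 8 * 40), dim = c(10, 8, 40))

if (requireNamespace("lsbclust", quietly = TRUE)) {
  # Length-one nclust: the same number of clusters for all four components
  fit <- lsbclust::lsbclust(data = dat, margin = 3L, delta = c(1, 1, 1, 1),
                            nclust = 3, ndim = 2, nstart = 5L, verbose = 0)
  print(fit$loss)           # loss broken down by subproblem
  print(head(fit$cluster))  # cluster memberships per observation and type
}
```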
Schoonees, P.C., Groenen, P.J.F., Van de Velden, M. Least-squares Bilinear Clustering of Three-way Data. Econometric Institute Report, EI2014-23.