default.mass: Default Mass Selection

Description

This function selects an optimal mass value for Cluster Analysis via Random Partition Distributions, using the Ewens-Pitman Attraction distribution.

Usage

default.mass(
  mass,
  list.epam,
  dis,
  new.draws = TRUE,
  w = c(1, 1, 1),
  discount = 0,
  temp = 10,
  loss = "binder",
  n.draws = 100L,
  two.stage = TRUE,
  parallel = TRUE
)
# S3 method for shallot.default.mass
print(x, ...)

Arguments

mass

optional, a vector of mass values.

list.epam

optional, a list of expected pairwise allocation matrices. Each matrix in the list needs the attributes "mass" and "n.draws".

dis

a dissimilarity structure of class dist.

new.draws

logical; if TRUE then new draws are obtained at each mass value.

w

a vector of length 3 of the weights to be used in the mass.algorithm.

discount

parameter of the Ewens-Pitman Attraction distribution.

temp

temperature parameter of the Ewens-Pitman Attraction distribution.

loss

One of "binder" or "VI.lb" to indicate the optimization should seek to minimize the expectation of the Binder loss (Binder 1978) or the lower bound of the expectation of the variation of information loss (Wade & Ghahramani 2017), respectively.

n.draws

number of draws of partitions to be obtained at each mass value.

two.stage

logical; if TRUE, the two stage algorithm is implemented in mass.algorithm.

parallel

logical; if TRUE, computations will take advantage of multiple CPU cores.

x

An object from the default.mass function.

...

currently ignored

Value

An object of class shallot.default.mass. This object is a list containing a matrix of "best" possible mass values to maximize partition confidence and minimize the variance ratio, the clustering estimate, the expected pairwise allocation matrix, the parameters used for optimization and the EPA distribution, and the list of expected pairwise allocation matrices for each mass value.

Details

The function draws n.draws partitions at each specified mass value. If a vector of mass values is not given, then the default of seq(0.1,10,0.2) is used for loss "VI.lb" and seq(0.1,5,0.05) is used for the other loss functions.
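To make the size of the default grids concrete, the following base R sketch reproduces the two sequences quoted above. The helper `default_grid` is hypothetical and is not part of the package; it only mirrors the rule as described here.

```r
# Hypothetical helper (not a package function): choose the default mass grid
# for a given loss when no `mass` vector is supplied, as described above.
default_grid <- function(loss) {
  if (identical(loss, "VI.lb")) {
    seq(0.1, 10, 0.2)
  } else {
    seq(0.1, 5, 0.05)
  }
}

grid_vi_lb  <- default_grid("VI.lb")
grid_binder <- default_grid("binder")

# Number of mass values in each grid.
length(grid_vi_lb)
length(grid_binder)

# Total number of partitions drawn with the default n.draws = 100.
n.draws <- 100L
length(grid_vi_lb) * n.draws
length(grid_binder) * n.draws
```

The grids hold 50 and 99 mass values respectively, so the default run for "binder" draws roughly twice as many partitions as the default run for "VI.lb".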

If a list of expected pairwise allocation matrices (EPAMs) is provided, additional draws at matching mass values are added to the corresponding matrix. When such a list is provided, no new draws are needed for estimation.
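As an illustration of what an EPAM holds and how additional draws could be folded into an existing one, here is a minimal base R sketch. The functions `epam_from_draws` and `pool_epam` are hypothetical names, and the pooling rule (an average weighted by the number of draws) is an assumption about how draws are combined, not the package's documented implementation. The "mass" and "n.draws" attributes follow the requirement stated for list.epam.

```r
# Estimate an EPAM from partition draws (hypothetical helper).
# `draws` is an (n.draws x n.items) matrix of cluster labels, one draw per row.
epam_from_draws <- function(draws, mass) {
  n_items <- ncol(draws)
  epam <- matrix(0, n_items, n_items)
  for (d in seq_len(nrow(draws))) {
    # TRUE where two items share a cluster in this draw
    epam <- epam + outer(draws[d, ], draws[d, ], "==")
  }
  epam <- epam / nrow(draws)
  attr(epam, "mass") <- mass
  attr(epam, "n.draws") <- nrow(draws)
  epam
}

# Assumed pooling rule: average of two EPAMs weighted by their draw counts.
pool_epam <- function(old, new) {
  stopifnot(isTRUE(all.equal(attr(old, "mass"), attr(new, "mass"))))
  n_old <- attr(old, "n.draws")
  n_new <- attr(new, "n.draws")
  pooled <- (n_old * old + n_new * new) / (n_old + n_new)
  attr(pooled, "mass") <- attr(old, "mass")
  attr(pooled, "n.draws") <- n_old + n_new
  pooled
}

# Two small batches of draws for three items at the same mass value.
first  <- rbind(c(1, 1, 2), c(1, 2, 2))
second <- rbind(c(1, 1, 1), c(1, 1, 2))
pooled <- pool_epam(epam_from_draws(first, mass = 1),
                    epam_from_draws(second, mass = 1))
combined <- epam_from_draws(rbind(first, second), mass = 1)
```

Under this weighting, pooling two batches gives the same matrix as computing the EPAM from all draws at once, which is why tracking "n.draws" on each matrix is enough to add draws later.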

A partition/clustering estimate from each EPAM is obtained using the SALSO method in salso. The estimate given minimizes the specified loss function with respect to the EPAM.
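The idea of minimizing a loss with respect to an EPAM can be sketched for the Binder loss in base R. This is not the SALSO method: it is a brute-force comparison over all five partitions of three items, with equal misclassification costs assumed, and the function name `expected_binder` and the example matrix are made up for illustration.

```r
# Expected Binder loss of a labelling given an EPAM, up to a constant factor:
# the sum over pairs i < j of |1(c_i = c_j) - p_ij|. Hypothetical helper.
expected_binder <- function(labels, epam) {
  same <- outer(labels, labels, "==")
  upper <- upper.tri(epam)
  sum(abs(same[upper] - epam[upper]))
}

# Illustrative EPAM: items 1 and 2 are usually together, item 3 is usually apart.
epam <- matrix(c(1.0, 0.9, 0.1,
                 0.9, 1.0, 0.2,
                 0.1, 0.2, 1.0), nrow = 3, byrow = TRUE)

# All set partitions of three items, written as label vectors.
candidates <- list(c(1, 1, 1), c(1, 1, 2), c(1, 2, 2), c(1, 2, 1), c(1, 2, 3))

losses <- vapply(candidates, expected_binder, numeric(1), epam = epam)
estimate <- candidates[[which.min(losses)]]
estimate
```

Here the minimizer groups items 1 and 2 and leaves item 3 on its own. Exhaustive enumeration is only feasible for a handful of items, which is why a search method such as SALSO is used in practice.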

The function then uses the mass.algorithm to select the optimal mass value for clustering estimation.
