sim_lsbclust: Simulate and Analyze LSBCLUST

Description

Perform a single simulation run for the LSBCLUST model. Multiple data sets are generated for a single set of underlying parameters.

Usage

sim_lsbclust(ndata, nobs, size, nclust, clustsize = NULL,
  delta = rep(1L, 4L), ndim = 2L, alpha = 0.5, fixed = c("none",
  "rows", "columns"), err_sd = 1, svmins = 0.5, svmax = 5,
  seed = NULL, parallel = FALSE, parallel_data = TRUE, verbose = 0,
  nstart_T3 = 20L, nstart_ak = 20L, mc.cores = detectCores() - 1,
  include_fits = FALSE, include_data = FALSE, nstart, nstart.kmeans)

Arguments

ndata

Integer giving the number of data sets to generate with the same underlying parameters.

nobs

Integer giving the number of observations to sample.

size

Vector with two elements giving the number of rows and columns respectively of each simulated observation.

nclust

A vector of length four giving the number of clusters for the overall mean, the row margins, the column margins and the interactions, in that order. Alternatively, a vector of length one, in which case all components will have the same number of clusters.

clustsize

A list of length four, with each element containing a vector whose length equals the corresponding entry in nclust, giving the number of observations assigned to each cluster. Each of these vectors must sum to nobs, or an error will result. Positional matching is used, in the order "overall", "rows", "columns" and "interactions". If NULL, all clusters will be of equal size.

delta

A four-element binary vector (logical or numeric) indicating which sum-to-zero constraints must be enforced.

ndim

The required rank for the approximation of the interactions (a scalar).

alpha

Numeric value in [0, 1] which determines how the singular values are distributed between rows and columns (passed to int.lsbclust).

fixed

One of "none", "rows" or "columns", indicating whether to fix neither set of coordinates, or to fix the row or column coordinates across clusters respectively. If a vector is supplied, only the first element will be used (passed to int.lsbclust).

err_sd

The standard deviation of the error distribution, as passed to rnorm.

svmins

Vector of minimum values for the singular values (as passed to simsv). Optionally, if all minima are equal, a single numeric value which will be expanded to the correct length.

svmax

The maximum possible singular value (as passed to simsv).

seed

An optional seed to be set for the random number generator.

parallel

Logical indicating whether to parallelize over random starts. Note that parallel_data takes precedence over this argument.

parallel_data

Logical indicating whether to parallelize over the data sets. If FALSE, parallelization is done over random starts (depending on parallel).

verbose

Integer giving the number of iterations after which the loss value is printed.

nstart_T3

The number of random starts to use for T3Clusf.

nstart_ak

The number of random starts to use for akmeans.

mc.cores

The number of cores to use, passed to makeCluster.

include_fits

Logical indicating whether to include the model fits, or only the fit statistics.

include_data

Logical indicating whether to include the simulated data that the models were fitted on, or only the results.

nstart

Passed on to lsbclust; see the argument of the same name there.

nstart.kmeans

Passed on to lsbclust; see the argument of the same name there.
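
The interaction between parallel and parallel_data described above can be summarised with a small sketch. The helper parallel_mode below is hypothetical and is not part of the package; it only restates the documented precedence rule and makes no claim about how sim_lsbclust implements it internally.

```r
# Hypothetical helper (not part of lsbclust): restates the documented
# precedence of parallel_data over parallel.
parallel_mode <- function(parallel = FALSE, parallel_data = TRUE) {
  if (isTRUE(parallel_data)) {
    "data sets"        # parallel_data takes precedence
  } else if (isTRUE(parallel)) {
    "random starts"    # only consulted when parallel_data is FALSE
  } else {
    "none"             # everything runs sequentially
  }
}

# The three cases, using the defaults from the Usage section first
modes <- c(
  defaults    = parallel_mode(),
  starts_only = parallel_mode(parallel = TRUE, parallel_data = FALSE),
  sequential  = parallel_mode(parallel = FALSE, parallel_data = FALSE)
)
print(modes)
```

Under this reading, the default arguments parallelize over data sets, and parallel = TRUE only has an effect once parallel_data = FALSE is set.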

Examples


# NOT RUN {
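set.seed(1)
# Five data sets of 100 observations, each a 10 x 8 matrix, with five
# clusters per component; few random starts to keep the run short
res <- sim_lsbclust(ndata = 5, nobs = 100, size = c(10, 8), nclust = rep(5, 4),
                    verbose = 0, nstart_T3 = 2, nstart_ak = 1, parallel_data = FALSE,
                    nstart = 2, nstart.kmeans = 5)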

# }
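
The example above leaves clustsize at its default of NULL, so all clusters are of equal size. A minimal sketch of unequal cluster sizes follows, assuming the list layout described under clustsize (one vector per component, in the order overall, rows, columns, interactions). The specific sizes are illustrative only, and the checks use base R so they run without the package.

```r
nobs   <- 100L
nclust <- c(2L, 3L, 3L, 4L)   # overall, rows, columns, interactions

# One vector per component; each vector has as many entries as the
# matching element of nclust (illustrative sizes, not from the package)
clustsize <- list(
  c(60L, 40L),             # overall
  c(50L, 30L, 20L),        # rows
  c(40L, 30L, 30L),        # columns
  c(40L, 30L, 20L, 10L)    # interactions
)

# Check the two documented requirements before simulating
lengths_ok <- all(lengths(clustsize) == nclust)
sums_ok    <- all(vapply(clustsize, sum, integer(1)) == nobs)
stopifnot(lengths_ok, sums_ok)
```

Once the checks pass, the list can be supplied as clustsize = clustsize together with the same nobs and nclust in a call to sim_lsbclust.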
