blca.boot: Bayesian Latent Class Analysis via an EM Algorithm and Using Empirical Bootstrapping

Description

Latent class analysis (LCA) attempts to find G hidden classes in binary data X. blca.boot repeatedly samples from X with replacement, then uses an EM algorithm to find maximum a posteriori (MAP) estimates of the parameters and their standard errors.

Usage

blca.boot(X, G, alpha = 1, beta = 1, delta = rep(1, G),
          start.vals = c("single", "across"), counts.n = NULL,
          fit = NULL, iter = 50, B = 100, relabel = FALSE,
          verbose = TRUE, verbose.update = 10, small = 1e-100)

Arguments

X

The data matrix. This may take one of several forms; see data.blca.

G

The number of classes to fit.

alpha, beta

The prior values for the data conditional on group membership. These may take several forms: a single value, recycled across all groups and columns; a vector of length G or M (the number of columns in the data); or a \(G \times M\) matrix specifying each prior value separately. Defaults to 1, i.e., a uniform prior, for each value.
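The accepted forms of alpha and beta can be illustrated with a small base R sketch. The helper expand_prior below is hypothetical and is not part of the package; it only shows one plausible way the four forms could map onto a G x M matrix. How the package resolves the ambiguous case G = M is not stated here, so the sketch assumes a per-class reading in that case.

```r
# Hypothetical helper (not part of the package): expand a prior
# specification to a G x M matrix, one entry per class and column.
expand_prior <- function(p, G, M) {
  if (is.matrix(p)) {
    stopifnot(nrow(p) == G, ncol(p) == M)
    return(p)
  }
  if (length(p) == 1) return(matrix(p, G, M))   # single value, recycled
  if (length(p) == G) return(matrix(p, G, M))   # one value per class (row)
  if (length(p) == M) return(matrix(p, G, M, byrow = TRUE))  # one per column
  stop("prior must have length 1, G or M, or be a G x M matrix")
}

a_scalar <- expand_prior(1, G = 2, M = 4)            # uniform prior everywhere
a_class  <- expand_prior(c(1, 2), G = 2, M = 4)      # row g filled with p[g]
a_column <- expand_prior(c(1, 2, 3, 4), G = 2, M = 4) # column m filled with p[m]
```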

delta

Prior values for the mixture components in the model. Defaults to 1, i.e., a uniform prior. May be a single value or a vector of length G.

start.vals

Denotes how class membership is to be assigned during the initial step of the algorithm. Two character values may be chosen: "single", which randomly assigns each data point exclusively to one class, or "across", which assigns class membership via runif. Alternatively, class membership may be pre-specified, either as a vector of class membership or as a matrix of probabilities. Defaults to "single".
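The difference between the two initialisation schemes can be sketched in base R as follows. This is an illustration only, not the package's code; in particular, normalising the runif draws so that each row sums to one is an assumption.

```r
set.seed(42)
N <- 6  # number of data points (illustrative)
G <- 3  # number of classes (illustrative)

# "single": each data point is assigned to exactly one class,
# giving a 0/1 membership matrix.
labels <- sample(G, N, replace = TRUE)
Z_single <- diag(G)[labels, , drop = FALSE]

# "across": membership weights drawn with runif and spread over all classes.
# Row normalisation is an assumption made for this sketch.
Z_across <- matrix(runif(N * G), N, G)
Z_across <- Z_across / rowSums(Z_across)
```

A pre-specified start would take the same shape: either a vector like labels or a matrix of probabilities like Z_across.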

counts.n

If data patterns have already been counted, a data matrix consisting of each unique data pattern can be supplied to the function, in addition to a vector counts.n, which supplies the corresponding number of times each pattern occurs in the data.
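As a sketch of what pre-counted input looks like, the base R code below reduces a small binary matrix to its unique patterns and their frequencies. The object names patterns and key are illustrative only.

```r
# A small binary data set with repeated response patterns.
X <- rbind(c(1, 1, 0, 0),
           c(0, 0, 1, 1),
           c(1, 1, 0, 0),
           c(1, 0, 0, 0),
           c(0, 0, 1, 1),
           c(1, 1, 0, 0))

# Collapse each row to a string key so identical patterns can be matched.
key <- apply(X, 1, paste, collapse = "")

# Unique patterns, in order of first appearance.
patterns <- X[!duplicated(key), , drop = FALSE]

# Number of times each unique pattern occurs, in the same order.
counts.n <- as.vector(table(key)[unique(key)])
```

Following the argument description above, patterns would then be supplied as the data matrix together with counts.n, for example blca.boot(patterns, 2, counts.n = counts.n).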

fit

Previously fitted models may be supplied in order to approximate standard error and unbiased point estimates. fit should be an object of class "blca.em". Defaults to NULL if no object is supplied.

iter

The maximum number of iterations that the algorithm runs for on each bootstrapped sample. The algorithm stops earlier if it converges. Defaults to 50.

B

The number of bootstrap samples to run. Defaults to 100.

relabel

Logical valued. Because the data are repeatedly resampled, label switching may occur in the parameter estimates. If TRUE, parameter estimates are checked at each iteration and relabeled if necessary. Defaults to FALSE.
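One simple way to detect and undo label switching is to compare each bootstrap estimate with a reference estimate and pick the class ordering that matches best. The sketch below does this by brute force over all permutations, which is only practical for small G. Both helpers are hypothetical, and this is not necessarily the check that blca.boot performs.

```r
# Hypothetical helper: all permutations of 1..n, one per row.
perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  p <- perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(i) {
    # Put i first, then shift the remaining labels to skip i.
    cbind(i, ifelse(p >= i, p + 1L, p))
  }))
}

# Hypothetical helper: find the row ordering of theta_boot that is
# closest (in squared distance) to the reference item probabilities.
relabel_to_reference <- function(theta_boot, theta_ref) {
  P <- perms(nrow(theta_ref))
  cost <- apply(P, 1, function(p) {
    sum((theta_boot[p, , drop = FALSE] - theta_ref)^2)
  })
  unname(P[which.min(cost), ])
}

theta_ref  <- rbind(c(0.80, 0.80, 0.20, 0.20),
                    c(0.20, 0.20, 0.80, 0.80))
# A bootstrap estimate in which the two classes have swapped labels.
theta_boot <- rbind(c(0.25, 0.18, 0.79, 0.83),
                    c(0.77, 0.82, 0.21, 0.19))

ord <- relabel_to_reference(theta_boot, theta_ref)
theta_fixed <- theta_boot[ord, ]
```

The same ordering ord would also be applied to the class probabilities so that all parameters stay aligned.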

verbose

Logical valued. If TRUE, the current number of completed bootstrap samples is printed at regular intervals.

verbose.update

If verbose=TRUE, verbose.update determines the periodicity with which updates are printed.

small

To ensure numerical stability a small constant is added to certain parameter estimates. Defaults to 1e-100.

Value

A list of class "blca.boot" is returned, containing:

call

The initial call passed to the function.

itemprob

The item probabilities, conditional on class membership.

classprob

The class probabilities.

Z

Estimate of class membership for each unique datapoint.

itemprob.sd

Posterior standard deviation estimates of the item probabilities.

classprob.sd

Posterior standard deviation estimates of the class probabilities.

classprob.initial, itemprob.initial

Initial parameter values for classprob and itemprob, used to run over each bootstrapped sample.

samples

A list containing the parameter estimates for each bootstrapped sample.

logpost

The log-posterior of the estimated model.

BIC

The Bayesian Information Criterion for the estimated model.

AIC

Akaike's Information Criterion for the estimated model.

label

Logical value, indicating whether label switching has been checked for.

counts

The number of times each unique data point occurred.

prior

A list containing the prior values specified for the model.

Details

Bootstrapping methods can be used to estimate properties of a distribution's parameters, such as the standard error estimates, by constructing multiple resamples of an observed dataset, obtained by sampling with replacement from said dataset. The multiple parameter estimates obtained from these resamples may then be analysed. This method is implemented in blca.boot by first running blca.em over the full data set and then using the returned values of the item and class probabilities as the initial values when running the algorithm for each bootstrapped sample. Alternatively, initial parameter estimates may be specified using the fit argument.

Note that if a previously fitted model is supplied, then the prior values with which the model was fitted will be used for the sampling run, regardless of the values supplied to the prior arguments.
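The resampling scheme described above can be sketched in base R. The function em_lca below is a hypothetical, minimal EM routine written for this illustration; it assumes uniform priors (so MAP and maximum likelihood estimates coincide), runs for a fixed number of iterations without a convergence check, and is not the implementation used by blca.em or blca.boot.

```r
set.seed(1)

# Simulate binary data from a two-class model.
N <- 500
theta_true <- rbind(c(0.8, 0.8, 0.2, 0.2), c(0.2, 0.2, 0.8, 0.8))
z <- sample(1:2, N, replace = TRUE, prob = c(0.6, 0.4))
X <- matrix(rbinom(N * 4, 1, theta_true[z, ]), N, 4)

# Hypothetical minimal EM for LCA under uniform priors.
em_lca <- function(X, tau, theta, iter = 50, small = 1e-100) {
  for (it in seq_len(iter)) {
    # E-step: class membership probabilities for each row.
    logp <- X %*% t(log(theta + small)) + (1 - X) %*% t(log(1 - theta + small))
    logp <- sweep(logp, 2, log(tau + small), "+")
    Z <- exp(logp - apply(logp, 1, max))
    Z <- Z / rowSums(Z)
    # M-step: update class and item probabilities.
    tau <- colMeans(Z)
    theta <- (t(Z) %*% X) / colSums(Z)
  }
  list(tau = tau, theta = theta)
}

# Step 1: fit the full data set once.
fit0 <- em_lca(X, tau = c(0.5, 0.5),
               theta = matrix(runif(8, 0.3, 0.7), 2, 4), iter = 200)

# Step 2: refit each bootstrap resample, starting from the full-data estimates.
B <- 50
boot_theta <- replicate(B, {
  idx <- sample(N, replace = TRUE)
  em_lca(X[idx, ], fit0$tau, fit0$theta, iter = 50)$theta
})

# Step 3: the spread across resamples estimates the standard errors.
theta_se <- apply(boot_theta, c(1, 2), sd)
```

Starting each resample from the full-data estimates keeps the runs short and makes it less likely that class labels swap between resamples.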

References

Wasserman, L. (2007). All of Nonparametric Statistics. Springer-Verlag.

Examples

library(BayesLCA)

# Simulate 1000 observations from a two-class model
type1 <- c(0.8, 0.8, 0.2, 0.2)
type2 <- c(0.2, 0.2, 0.8, 0.8)
x <- rlca(1000, rbind(type1, type2), c(0.6, 0.4))

# Bootstrap fit, with initial values taken from a blca.em run on the full data
fit.boot <- blca.boot(x, 2)
summary(fit.boot)

# Alternatively, supply a previously fitted blca.em model via the fit argument
fit <- blca.em(x, 2, se = FALSE)
fit.boot <- blca.boot(x, 2, fit = fit)
fit.boot
plot(fit.boot, which = 1:4)
