clusternomics (version 0.1.0)

contextCluster: Clusternomics: Context-dependent clustering

Description

This function fits the context-dependent clustering model to the data using Gibbs sampling. The user can specify the number of clusters separately at the global level and at the local (context-specific) level.

Usage

contextCluster(datasets, clusterCounts, dataDistributions = "diagNormal", prior = NULL, maxIter = 1000, burnin = NULL, lag = 3, verbose = FALSE)

Arguments

datasets
List of data matrices where each matrix represents a context-specific dataset. Each data matrix has the size N times M, where N is the number of data points and M is the dimensionality of the data. The full list of matrices has length C. The number of data points N must be the same for all data matrices.
clusterCounts
Number of clusters on the global level and in each context. List with the following structure: clusterCounts = list(global=global, context=context) where global is the number of global clusters, and context is the list of numbers of clusters in the individual contexts (datasets) of length C where context[c] is the number of clusters in dataset c.
dataDistributions
Distribution of data in each dataset. Can be either a list of length C where dataDistributions[c] is the distribution of dataset c, or a single string when all datasets have the same distribution. The currently implemented distribution is the 'diagNormal' option, a multivariate Normal distribution with a diagonal covariance matrix.
prior
Prior distribution. If NULL then the prior is estimated using the datasets. The 'diagNormal' distribution uses the Normal-Gamma distribution as a prior for each dimension.
maxIter
Number of iterations of the Gibbs sampling algorithm.
burnin
Number of burn-in iterations that will be discarded. If not specified, the algorithm discards the first half of the maxIter samples.
lag
Lag used for thinning the MCMC samples (default 3).
verbose
Whether to print progress; defaults to FALSE.
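
The shape requirements on the arguments above can be checked before calling contextCluster. The following is a minimal base-R sketch, not part of the package: the random matrices, the choice of N = 100 and the variable names nContexts and rowCounts are made up for illustration.

```r
# Hypothetical input: two contexts (C = 2) with the same N = 100 data points
set.seed(1)
N <- 100
datasets <- list(
  matrix(rnorm(N * 2), nrow = N, ncol = 2),
  matrix(rnorm(N * 2), nrow = N, ncol = 2)
)
nContexts <- length(datasets)

# Every data matrix must have the same number of rows N
rowCounts <- vapply(datasets, nrow, integer(1))
stopifnot(length(unique(rowCounts)) == 1)

# One global cluster count plus one cluster count per context
clusterCounts <- list(global = 10, context = c(3, 3))
stopifnot(length(clusterCounts$context) == nContexts)

# One distribution name per context; a single string would also be accepted
# according to the argument description
dataDistributions <- as.list(rep("diagNormal", nContexts))
stopifnot(length(dataDistributions) == nContexts)
```

Running these checks first gives a clearer error than a failure inside the sampler when the contexts do not line up.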

Value

Returns a list containing the sequence of MCMC states and the log likelihoods of the individual states.
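
How many states end up in the returned sequence depends on maxIter, burnin and lag. The sketch below shows generic burn-in and thinning in base R. It is not the package's internal code, and the assumption that the sampler keeps exactly these iteration indices is not confirmed by this page.

```r
# Settings mirroring the documented defaults
maxIter <- 1000
burnin <- floor(maxIter / 2)  # default: discard the first half of the iterations
lag <- 3

# Generic thinning (an assumption about the exact indexing):
# keep every lag-th iteration after the burn-in period
keptIterations <- seq(from = burnin + 1, to = maxIter, by = lag)

# Number of retained states under this scheme
nKept <- length(keptIterations)
print(nKept)
```

A larger lag reduces autocorrelation between retained states at the cost of keeping fewer of them, so maxIter usually needs to grow with lag.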

Examples

# Example with simulated data (see vignette for details)
# Number of elements in each cluster
groupCounts <- c(50, 10, 40, 60)
# Centers of clusters
means <- c(-1.5,1.5)
testData <- generateTestData_2D(groupCounts, means)
datasets <- testData$data

# Fit the model
# 1. specify number of clusters
clusterCounts <- list(global=10, context=c(3,3))
# 2. Run inference
# Number of iterations is just for demonstration purposes, use
# a larger number of iterations in practice!
results <- contextCluster(datasets, clusterCounts,
     maxIter = 10, burnin = 5, lag = 1,
     dataDistributions = 'diagNormal',
     verbose = TRUE)

# Extract results from the samples
# Final state:
state <- results$samples[[length(results$samples)]]
# 1) assignment to global clusters
globalAssgn <- state$Global
# 2) context-specific assignments: assignment in a specific dataset (context)
contextAssgn <- state[,"Context 1"]
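
The example above inspects only the final state. One way to summarise all retained states is a co-clustering matrix, sketched below in base R. The object mockSamples is a hypothetical stand-in for results$samples, built by hand so the snippet runs without the package; it assumes each state behaves like a data frame with a Global column and a "Context 1" column, as the example's accessors suggest.

```r
# Hypothetical stand-in for results$samples: three states for five data points
mockSamples <- list(
  data.frame(Global = c(1, 1, 2, 2, 3), `Context 1` = c(1, 1, 2, 2, 2),
             check.names = FALSE),
  data.frame(Global = c(1, 1, 2, 3, 3), `Context 1` = c(1, 1, 2, 2, 2),
             check.names = FALSE),
  data.frame(Global = c(1, 2, 2, 3, 3), `Context 1` = c(1, 1, 1, 2, 2),
             check.names = FALSE)
)

# For each state, mark the pairs of data points that share a global cluster
sameCluster <- lapply(mockSamples, function(state) {
  outer(state$Global, state$Global, FUN = "==") * 1
})

# Fraction of states in which each pair of data points is clustered together
coClustering <- Reduce(`+`, sameCluster) / length(mockSamples)

# Cluster sizes in the final state, per global cluster label
finalState <- mockSamples[[length(mockSamples)]]
finalSizes <- table(finalState$Global)
```

Averaging over states in this way is insensitive to cluster labels being permuted between iterations, which a direct comparison of the label vectors is not.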
