generatePrior: Generate a basic prior distribution for the datasets.

Description

Creates a basic prior distribution for the clustering model, assuming a unit prior covariance matrix for clusters in each dataset.

Usage

generatePrior(datasets, distributions = "diagNormal", globalConcentration = 0.1, localConcentration = 0.1)

Arguments

datasets

List of data matrices where each matrix represents a context-specific dataset. Each data matrix has the size N times M, where N is the number of data points and M is the dimensionality of the data. The full list of matrices has length C. The number of data points N must be the same for all data matrices.
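As a rough illustration of the expected input shape, the sketch below builds a list of C = 2 matrices with the same number of rows and checks the constraint described above. It uses base R only. The sizes and the random data are hypothetical, and the checks are a convenience for the reader, not something the package is known to perform.

```r
# Hypothetical input: two contexts measured on the same N = 5 data points.
set.seed(1)
N <- 5
M <- 2
datasets <- list(
  matrix(rnorm(N * M), nrow = N, ncol = M),  # context 1
  matrix(rnorm(N * M), nrow = N, ncol = M)   # context 2
)

# Every element should be a matrix, and all matrices should share N rows.
stopifnot(all(vapply(datasets, is.matrix, logical(1))))
rowCounts <- vapply(datasets, nrow, integer(1))
stopifnot(length(unique(rowCounts)) == 1)
```

Running a check like this before calling generatePrior can surface a mismatched number of rows early.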

distributions

Distribution of the data in each dataset. Either a list of length C, where distributions[[c]] is the distribution of dataset c, or a single string when all datasets share the same distribution. The only distribution currently implemented is 'diagNormal', a multivariate Normal distribution with a diagonal covariance matrix.

globalConcentration

Prior concentration parameter for the global clusters. Smaller values place more prior probability on a smaller number of clusters.

localConcentration

Prior concentration parameter for the local, context-specific clusters. Smaller values place more prior probability on a smaller number of clusters.
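To build intuition for the effect of a concentration parameter, the following base R sketch draws mixture weights from a symmetric Dirichlet distribution and counts how many components receive a non-negligible weight. This is a generic illustration, not the package's internal code: the helper names, the 1% threshold, and the assumption that the concentration enters as a symmetric Dirichlet parameter are all hypothetical.

```r
# Draw one weight vector from a symmetric Dirichlet(alpha, ..., alpha) of size K
# by normalising independent Gamma(alpha, 1) draws.
rSymmetricDirichlet <- function(K, alpha) {
  g <- rgamma(K, shape = alpha, rate = 1)
  g / sum(g)
}

# Average number of components holding more than `threshold` of the total weight.
meanActiveComponents <- function(K, alpha, nDraws = 2000, threshold = 0.01) {
  mean(replicate(nDraws, sum(rSymmetricDirichlet(K, alpha) > threshold)))
}

set.seed(42)
K <- 10
activeSmall <- meanActiveComponents(K, alpha = 0.1)  # small concentration
activeLarge <- meanActiveComponents(K, alpha = 10)   # large concentration
```

Under these assumptions, activeSmall comes out lower than activeLarge, matching the statement that small concentration values favour fewer clusters.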

Value

Returns a prior object that can be passed to the contextCluster function through its prior argument.

Examples


# Example with simulated data (see vignette for details)
nContexts <- 2
# Number of elements in each cluster
groupCounts <- c(50, 10, 40, 60)
# Centers of clusters
means <- c(-1.5,1.5)
testData <- generateTestData_2D(groupCounts, means)
datasets <- testData$data

# Generate the prior
fullDataDistributions <- rep('diagNormal', nContexts)
prior <- generatePrior(datasets, fullDataDistributions,
     globalConcentration = 0.01, localConcentration = 0.1)

# Fit the model
# 1. Specify the number of clusters
clusterCounts <- list(global=10, context=c(3,3))
# 2. Run inference
# The number of iterations here is for demonstration purposes only;
# use a larger number of iterations in practice!
results <- contextCluster(datasets, clusterCounts,
     maxIter = 10, burnin = 5, lag = 1,
     dataDistributions = 'diagNormal', prior = prior,
     verbose = TRUE)
