Rlda (version 0.1.0)

rlda.binomial: LDA with binomial entry and Stick-Breaking prior.

Description

This method implements Latent Dirichlet Allocation with a Stick-Breaking prior for binomial data. rlda.binomial takes two inputs: a frequency data.frame and a population data.frame.

Usage

rlda.binomial(data, pop, n_community, alpha0, alpha1, gamma,
  n_gibbs, ll_prior = TRUE, display_progress = TRUE)

Arguments

data
An abundance data.frame where each row is a sampling unit (e.g., plots, locations, time points) and each column is a categorical type of element (e.g., species, firms, issues).
pop
A population data.frame where each row is a sampling unit (e.g., plots, locations, time points) and each column is a categorical type of element (e.g., species, firms, issues). Every element of this data.frame must be greater than the corresponding element of the data data.frame.
n_community
Total number of communities to return. It must be less than the number of columns in the data and pop data.frames.
alpha0
Hyperparameter associated with the Beta prior Beta(alpha0, alpha1).
alpha1
Hyperparameter associated with the Beta prior Beta(alpha0, alpha1).
gamma
Hyperparameter associated with the Stick-Breaking prior.
n_gibbs
Total number of Gibbs Samples.
ll_prior
Boolean scalar: TRUE if the priors must also be included when computing the log-likelihood, FALSE otherwise.
display_progress
Boolean scalar: TRUE if the progress bar must be shown, FALSE otherwise.
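
The constraints stated above on data, pop, and n_community can be checked before starting the sampler. The following is a minimal sketch in base R. check_rlda_inputs is a hypothetical helper, not part of Rlda, and it reads the "greater than" requirement literally; whether equal entries are accepted is not stated on this page. It also assumes that data and pop must have the same dimensions, which the element-wise comparison implies.

```r
# Hypothetical helper (not part of Rlda): validates inputs against the
# constraints described in the Arguments section.
check_rlda_inputs <- function(data, pop, n_community) {
  stopifnot(is.data.frame(data), is.data.frame(pop))
  # Both tables must be non-empty and (assumed) have the same shape
  stopifnot(nrow(data) > 0, ncol(data) > 0)
  stopifnot(identical(dim(data), dim(pop)))
  # Every pop entry must exceed the matching data entry (strict, as documented)
  if (!all(as.matrix(pop) > as.matrix(data))) {
    stop("every element of 'pop' must be greater than the matching element of 'data'")
  }
  # The number of communities must be below the number of columns
  if (n_community >= ncol(data)) {
    stop("'n_community' must be less than ncol(data)")
  }
  invisible(TRUE)
}

# Toy inputs: 4 sampling units, 5 element types
toy_data <- as.data.frame(matrix(c(3, 0, 7, 2, 5), nrow = 4, ncol = 5, byrow = TRUE))
toy_pop  <- as.data.frame(matrix(10, nrow = 4, ncol = 5))
ok <- check_rlda_inputs(toy_data, toy_pop, n_community = 3)
```

Failing early with a descriptive message is cheaper than discovering a bad input after a long Gibbs run.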

Value

An R list with three elements:
Theta
The probability that each observation (e.g., location) belongs to each cluster (e.g., community). It is a matrix with n_gibbs rows and nrow(data) * n_community columns.
Phi
The probability that each variable (e.g., species) belongs to each cluster (e.g., community). It is a matrix with n_gibbs rows and ncol(data) * n_community columns.
LogLikelihood
The vector of log-likelihoods computed for each Gibbs sample.
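
Because each row of Theta holds one flattened Gibbs sample, a point estimate usually requires averaging over rows and reshaping. The sketch below uses a random stand-in matrix rather than real output. The burn-in fraction is an arbitrary choice, and the column-by-column fill order is an assumption, since this page does not state how the values are flattened.

```r
# Stand-in for res$Theta: n_gibbs rows, nrow(data) * n_community columns.
n_gibbs <- 6; n_units <- 4; n_community <- 3
set.seed(1)
fake_theta <- matrix(runif(n_gibbs * n_units * n_community), nrow = n_gibbs)

# Discard the first half of the chain as burn-in (an arbitrary choice here)
keep <- seq(floor(n_gibbs / 2) + 1, n_gibbs)
theta_mean <- colMeans(fake_theta[keep, , drop = FALSE])

# ASSUMPTION: the flattened values fill an n_units x n_community matrix
# column by column; verify against the package before relying on this.
theta_hat <- matrix(theta_mean, nrow = n_units, ncol = n_community)
dim(theta_hat)
```

The same pattern would apply to Phi with ncol(data) in place of nrow(data), subject to the same ordering caveat.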

Details

rlda.binomial uses a modified Latent Dirichlet Allocation method to construct Mixed-Membership Clusters using Bayesian inference. data must be a non-empty data.frame holding the frequency of each variable (column) in each observation (row). pop must be a non-empty data.frame laid out the same way, with each entry greater than the corresponding entry of data.
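
To make the roles of gamma, alpha0, alpha1, data, and pop concrete, the following sketch simulates data from one plausible form of the model: membership weights from a truncated stick-breaking construction, community-level probabilities from Beta(alpha0, alpha1), and counts from a binomial distribution with pop trials. This formulation is an assumption for illustration; the exact parameterisation used inside Rlda is not given on this page. All names and values below are hypothetical.

```r
set.seed(42)
n_units <- 5; n_types <- 6; n_community <- 3
gamma <- 0.5; alpha0 <- 1; alpha1 <- 1  # illustrative values only

# Truncated stick-breaking (assumed form): v_k ~ Beta(1, gamma),
# with the last stick taking whatever remains so the weights sum to 1.
stick_break <- function(k, gamma) {
  v <- c(rbeta(k - 1, 1, gamma), 1)
  v * cumprod(c(1, 1 - v[-k]))
}
theta <- t(replicate(n_units, stick_break(n_community, gamma)))  # units x communities

# Community-level success probabilities: phi_ks ~ Beta(alpha0, alpha1)
phi <- matrix(rbeta(n_community * n_types, alpha0, alpha1), nrow = n_community)

# Binomial counts: pop trials per cell, success probability theta %*% phi
pop_mat  <- matrix(100, nrow = n_units, ncol = n_types)
prob     <- theta %*% phi
data_mat <- matrix(rbinom(length(prob), size = pop_mat, prob = prob), nrow = n_units)

sim_data <- as.data.frame(data_mat)
sim_pop  <- as.data.frame(pop_mat)
```

Note that a binomial draw can equal its number of trials, so simulated counts are only guaranteed to be at most the matching pop entry, not strictly below it.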

References

  • Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet allocation." Journal of Machine Learning Research 3.Jan (2003): 993-1022. http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf
  • Valle, Denis, et al. "Decomposing biodiversity data using the Latent Dirichlet Allocation model, a probabilistic multivariate statistical method." Ecology letters 17.12 (2014): 1591-1601.

See Also

rlda.multinomial, rlda.bernoulli

Examples

## Not run: ------------------------------------
# library(Rlda)
# # Read the SP500 data
# data(sp500)
# # Create the population data.frame: 100 in every cell, same shape as sp500
# spSize <- as.data.frame(matrix(100,
#                                ncol = ncol(sp500),
#                                nrow = nrow(sp500)))
# # Set seed
# set.seed(5874)
# # Hyperparameters for each prior distribution
# gamma  <- 0.01
# alpha0 <- 0.01
# alpha1 <- 0.01
# # Execute the LDA for the Binomial entry
# res <- rlda.binomial(data = sp500, pop = spSize, n_community = 10,
#                      alpha0 = alpha0, alpha1 = alpha1, gamma = gamma,
#                      n_gibbs = 500, ll_prior = TRUE, display_progress = TRUE)
## ---------------------------------------------