tclust (version 2.1-0)

estepRR: Function to perform the E-step for a Gaussian mixture distribution

Description

Computes the log PDF for each observation, the posterior probabilities and the objective function (total log-likelihood) for a Gaussian mixture distribution.

Value

The function returns a list with the following elements:

  • obj The value of the objective function (total log-likelihood)

  • postprob An n-by-k matrix with the posterior probabilities

  • logpdf A vector of length n containing the log PDF for each observation

Arguments

ll

Rcpp::NumericMatrix, n-by-k, where n is the number of observations and k is the number of clusters; entry (i, j) holds the weighted log density \(\log \pi_j + \log \phi(y_i; \; \theta_j)\) of observation i under mixture component j (see Examples).
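For illustration, ll can be assembled with any routine that evaluates the multivariate normal log density. The sketch below uses mvtnorm::dmvnorm; the mvtnorm package and the helper name build_ll are assumptions made for this illustration only (the Examples section instead builds ll with the package's internal dmvnrm helper).

       ## Illustrative (hypothetical) helper:
       ## ll[i,j] = log(pi_j) + log phi(y_i; mu_j, Sigma_j)
       build_ll <- function(Y, prob, mu, sigma) {
           n <- nrow(Y)
           k <- length(prob)
           ll <- matrix(0, nrow=n, ncol=k)
           for (j in 1:k)
               ll[,j] <- log(prob[j]) +
                   mvtnorm::dmvnorm(Y, mean=mu[j,], sigma=sigma[,,j], log=TRUE)
           ll
       }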

Details

Formally a mixture model corresponds to the mixture distribution that represents the probability distribution of observations in the overall population. Mixture models are used to make statistical inferences about the properties of the sub-populations given only observations on the pooled population, without sub-population identity information. Mixture modeling approaches assume that the data at hand \(y_1, \ldots, y_n\) in \(R^p\) come from a probability distribution with density given by the sum of k components $$\sum_{j=1}^k \pi_j \phi( \cdot, \theta_j)$$ with \(\phi( \cdot, \theta_j)\) being the \(p\)-variate (generally multivariate normal) densities with parameters \(\theta_j\), \(j=1, \ldots, k\). Generally \(\theta_j = (\mu_j, \Sigma_j)\), where \(\mu_j\) is the population mean and \(\Sigma_j\) is the covariance matrix for component \(j\), and \(\pi_j\) is the (prior) probability of component \(j\). The objective function obj is equal to $$ obj = \log \left( \prod_{i=1}^n \sum_{j=1}^k \pi_j \phi (y_i; \; \theta_j) \right) $$

or

$$ obj = \sum_{i=1}^n \log \left( \sum_{j=1}^k \pi_j \phi (y_i; \; \theta_j) \right) $$ where \(k\) is the number of components of the mixture, \(\pi_j\) are the component probabilities and \(\theta_j\) are the parameters of the \(j\)-th mixture component.
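Given a matrix ll of weighted log densities \(\log \pi_j + \log \phi(y_i; \; \theta_j)\), the three quantities returned by estepRR can be reproduced in plain R. The following is a minimal sketch under that assumption, not the package's C++ implementation; the helper name estep_sketch is hypothetical and a log-sum-exp step is used for numerical stability.

       ## Plain-R sketch of the E-step, assuming ll[i,j] = log(pi_j) + log phi(y_i; theta_j)
       estep_sketch <- function(ll) {
           m <- apply(ll, 1, max)                   ## row maxima for log-sum-exp
           logpdf <- m + log(rowSums(exp(ll - m)))  ## log f(y_i) = log sum_j exp(ll[i,j])
           postprob <- exp(ll - logpdf)             ## posterior probabilities; rows sum to 1
           list(obj=sum(logpdf), postprob=postprob, logpdf=logpdf)
       }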

References

McLachlan, G.J.; Peel, D. (2000). Finite Mixture Models. Wiley. ISBN 0-471-00626-2

Examples

##      Generate data from two bivariate normal distributions
##      (the two components of a Gaussian mixture)

       mu1 = c(1,2)
       sigma1 = matrix(c(2, 0, 0, 0.5), nrow=2, byrow=TRUE)
       mu2 = c(-3, -5)
       sigma2 = matrix(c(1, 0, 0, 1), nrow=2, byrow=TRUE)
       n1 = 100
       n2 = 200
       Y = rbind(MASS::mvrnorm(n1, mu1, sigma1), 
                 MASS::mvrnorm(n2, mu2, sigma2))
       k = 2
       pi = c(1/3, 2/3)                    ## mixing proportions
       mu = rbind(mu1, mu2)                ## component means (one row per component)
       sigma = array(0, dim=c(2,2,2))      ## component covariance matrices
       sigma[,,1] = sigma1
       sigma[,,2] = sigma2
       
       ## Weighted log densities: ll[i,j] = log(pi_j) + log phi(y_i; mu_j, Sigma_j)
       ll = matrix(0, nrow=n1+n2, ncol=2)
       for(j in 1:k)
           ll[,j] = log(pi[j]) + tclust:::dmvnrm(Y, mu[j,], sigma[,,j])

       ## E-step: objective function, log PDFs and posterior probabilities
       dd = tclust:::estepRR(ll)
       dd$obj
       dd$logpdf
       dd$postprob
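
       ## Sanity checks (illustrative additions, not part of the original example):
       ## obj should equal the sum of the per-observation log PDFs,
       ## each row of postprob should sum to 1, and logpdf should match
       ## the log of the row sums of exp(ll)
       all.equal(dd$obj, sum(dd$logpdf))
       range(rowSums(dd$postprob))
       all.equal(as.vector(dd$logpdf), log(rowSums(exp(ll))))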
