SSL (version 0.1)

sslGmmEM: Gaussian Mixture Model with an EM Algorithm

Description

sslGmmEM implements a Gaussian Mixture Model fitted with an EM algorithm, and weights the unlabeled data using the lambda-EM technique.

Usage

sslGmmEM(xl, yl, xu, seed = 0, improvement = 1e-04, p = 0.3)

Arguments

xl
an n * p matrix or data.frame of labeled data.
yl
an n * 1 integer vector of labels.
xu
an m * p matrix or data.frame of unlabeled data.
seed
an integer specifying the random number generation state used for splitting the labeled data into a training set and a cross-validation set.
improvement
numeric. Minimum allowed improvement of the parameters.
p
fraction of the labeled data that is split off into the cross-validation set (the default 0.3 corresponds to 30%).
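
As an illustration of how seed and p could interact, the following is a minimal base-R sketch of a seeded split of the labeled rows into a training set and a cross-validation set. The names n, p_cv, cv_idx and train_idx are hypothetical, and whether sslGmmEM stratifies the split by class is not stated on this page, so this should be read as an assumption rather than the package's actual procedure.

```r
# Hypothetical sketch: seeded split of n labeled rows into training and CV sets.
# This is NOT taken from the SSL package source; names are illustrative only.
n    <- 60     # number of labeled observations (assumed)
p_cv <- 0.3    # fraction sent to the cross-validation set, like the `p` argument
set.seed(0)    # plays the role of the `seed` argument

cv_idx    <- sample(n, size = round(p_cv * n))   # rows held out for cross-validation
train_idx <- setdiff(seq_len(n), cv_idx)         # remaining rows used for training
```

With these values, 18 of the 60 labeled rows are held out and the remaining 42 are used for fitting.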

Value

A list with the following fields is returned:

Fields

para
a numeric matrix of estimated parameters in which each column corresponds to a variable and the rows hold the estimated mean and standard deviation of each class. For example, the first and second rows hold the mean and standard deviation of the first class, the third and fourth rows hold the mean and standard deviation of the second class, and so on.
classProb
the estimated class probabilities.
yu
the predicted labels of the unlabeled data.
optLambda
the optimal lambda chosen by cross-validation.
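
Because para interleaves means and standard deviations row by row, it can help to split it into two matrices. The sketch below assumes the layout described above and uses a small made-up matrix (2 classes, 3 variables) in place of real output; the numbers and the names means and sds are illustrative only.

```r
# Made-up 'para' matrix with the documented layout:
# rows 1-2 = mean and sd of class 1, rows 3-4 = mean and sd of class 2.
para <- rbind(
  c(5.0, 3.4, 1.5),    # class 1 means
  c(0.35, 0.38, 0.17), # class 1 standard deviations
  c(5.9, 2.8, 4.3),    # class 2 means
  c(0.52, 0.31, 0.47)  # class 2 standard deviations
)

K     <- nrow(para) / 2                                          # number of classes
means <- para[seq(1, by = 2, length.out = K), , drop = FALSE]    # odd rows
sds   <- para[seq(2, by = 2, length.out = K), , drop = FALSE]    # even rows
```

Row k of means and sds then describes class k, with one column per variable.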

Details

sslGmmEM introduces unlabeled data into the parameter estimation process. The weight lambda given to the unlabeled data is chosen by cross-validation. The Gaussian Mixture Model is estimated by maximizing the log-likelihood with an EM algorithm. The E-step computes the probability of each class for every observation. The M-step re-estimates the parameters from the probabilities obtained in the E-step.
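
The E-step and M-step described above can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: it assumes independent Gaussians per variable (suggested by the mean and standard deviation layout of para), takes lambda as a fixed argument instead of selecting it by cross-validation, and assumes labels coded 1..K. The helper names log_dens and lambda_em, and the arguments tol and max_iter, are hypothetical.

```r
# Log-density of every row of x under every class, assuming independent
# Gaussians per variable. Returns an n x K matrix.
log_dens <- function(x, mu, sigma) {
  do.call(cbind, lapply(seq_len(nrow(mu)), function(k) {
    z <- t((t(x) - mu[k, ]) / sigma[k, ])          # standardise each column
    rowSums(dnorm(z, log = TRUE)) - sum(log(sigma[k, ]))
  }))
}

# Sketch of lambda-weighted EM: labeled rows have weight 1, unlabeled rows lambda.
lambda_em <- function(xl, yl, xu, lambda = 0.5, tol = 1e-4, max_iter = 200) {
  xl <- as.matrix(xl); xu <- as.matrix(xu)
  K  <- max(yl)
  x  <- rbind(xl, xu)
  w  <- c(rep(1, nrow(xl)), rep(lambda, nrow(xu)))
  rl <- diag(K)[yl, , drop = FALSE]   # labeled responsibilities are fixed (one-hot)
  ru <- matrix(0, nrow(xu), K)        # first M-step therefore uses labeled data only
  old <- NULL
  for (iter in seq_len(max_iter)) {
    # M-step: weighted class sizes, means, standard deviations, class probabilities
    r     <- rbind(rl, ru) * w
    nk    <- colSums(r)
    mu    <- t(r) %*% x / nk
    sigma <- sqrt(pmax(t(r) %*% x^2 / nk - mu^2, 1e-8))
    prior <- nk / sum(nk)
    # E-step: class probabilities for each unlabeled observation
    lp <- sweep(log_dens(xu, mu, sigma), 2, log(prior), "+")
    lp <- lp - apply(lp, 1, max)      # subtract row maxima for numerical stability
    ru <- exp(lp) / rowSums(exp(lp))
    # Stop once the parameters change by less than tol
    para <- c(mu, sigma)
    if (!is.null(old) && max(abs(para - old)) < tol) break
    old <- para
  }
  list(mu = mu, sigma = sigma, classProb = prior,
       yu = max.col(ru, ties.method = "first"))
}

# Usage on iris, with the first twenty observations of each class labeled
known <- c(1:20, 51:70, 101:120)
fit <- lambda_em(iris[known, -5], rep(1:3, each = 20), iris[-known, -5], lambda = 0.5)
acc <- mean(fit$yu == as.integer(iris$Species)[-known])   # share of correct predictions
```

Setting lambda = 0 in this sketch reduces it to a fit on the labeled data alone, while lambda = 1 treats labeled and unlabeled observations equally; a cross-validation loop over candidate lambda values would sit around the call to lambda_em.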

References

Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom Mitchell (1999). Text Classification from Labeled and Unlabeled Documents using EM.

Examples

library(SSL)
data(iris)
x <- iris[, -5]
# Suppose we know the labels of the first twenty observations of each class
# and want to predict the remaining ones with a Gaussian Mixture Model.
# 1 = setosa, 2 = versicolor, 3 = virginica
yl <- rep(1:3, each = 20)
known.label <- c(1:20, 51:70, 101:120)
xu <- x[-known.label, ]   # unlabeled observations
xl <- x[known.label, ]    # labeled observations
l <- sslGmmEM(xl, yl, xu)
