mclustDA: MclustDA discriminant analysis.

Description

Trains MclustDA discriminant analysis on labeled training data and classifies test data in a single call.

Usage

mclustDA(trainingData, labels, testData, G=1:6, verbose = FALSE)

Arguments

trainingData

A numeric vector, matrix, or data frame of training observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

labels

A numeric or character vector assigning a class label to each training observation.

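testData

A numeric vector, matrix, or data frame of test observations, in the same format as trainingData.

G

An integer vector specifying the numbers of mixture components (clusters) to be considered for each class. Default: 1:6.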

verbose

A logical value indicating whether to print a message that the function is in the training phase, which may take some time to complete. Default: FALSE.

Value

A list with the following components:
testClassification: mclustDA classification of the test data.
trainingClassification: mclustDA classification of the training data.
VofIindex: Meila's Variation of Information index, comparing the classification of the training data to the known labels.
summary: The best model and number of clusters for each training class.
models: The mixture models used to fit the known classes.
postProb: A matrix whose [i,k]th entry is the probability that observation i in the test data belongs to the kth class.
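
The Variation of Information index mentioned above can be illustrated with a short base R sketch. It follows the definition in Meila (2002), VI = H(A) + H(B) - 2 I(A, B), where H is entropy and I is mutual information. The function name variationOfInformation is hypothetical and not part of mclust, and the use of natural logarithms is an assumption; the value reported in VofIindex may be computed or scaled differently.

```r
# Hypothetical helper (not part of mclust): Variation of Information
# between two labelings of the same observations.
variationOfInformation <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) > 0)
  # Joint distribution of label pairs, as a plain numeric matrix.
  joint <- unclass(table(a, b)) / length(a)
  pa <- rowSums(joint)  # marginal distribution of the first labeling
  pb <- colSums(joint)  # marginal distribution of the second labeling
  # Entropy in nats (natural log is an assumption of this sketch).
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
  }
  # Mutual information, summing only over non-empty cells.
  nz <- joint > 0
  mutualInfo <- sum(joint[nz] * log(joint[nz] / outer(pa, pb)[nz]))
  entropy(pa) + entropy(pb) - 2 * mutualInfo
}

known     <- c(1, 1, 2, 2)
relabeled <- c("b", "b", "a", "a")  # same partition, different label names
unrelated <- c(1, 2, 1, 2)          # partition independent of 'known'

viSame  <- variationOfInformation(known, relabeled)
viIndep <- variationOfInformation(known, unrelated)
```

The index depends only on the partitions, not on the label names, so viSame is zero up to rounding, while two balanced, independent two-class partitions give 2 * log(2). Smaller values indicate closer agreement between the fitted classification and the known labels.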

Details

The following models are compared in Mclust:

"E": spherical, equal variance (one-dimensional)
"V": spherical, variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"VVV": ellipsoidal, varying volume, shape, and orientation

mclustDA is a simplified function combining mclustDAtrain and mclustDAtest and their summaries.
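
The multivariate model names above describe constraints on the component covariance matrices. A minimal base R sketch, assuming the volume-shape-orientation decomposition Sigma_k = lambda_k * D_k %*% A_k %*% t(D_k) described in Fraley and Raftery (2002a), shows what each name allows to vary across two components in two dimensions. The helper makeSigma and all numeric values are illustrative assumptions, not mclust code or mclust output.

```r
# Hypothetical helper: build a 2-D covariance matrix from
#   lambda : volume (scalar)
#   shape  : diagonal of A, chosen so that prod(shape) == 1
#   theta  : rotation angle defining the orientation matrix D
makeSigma <- function(lambda, shape = c(1, 1), theta = 0) {
  D <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
  lambda * D %*% diag(shape) %*% t(D)
}

# Two components per model; values are arbitrary illustrations.
sigmaEII <- list(makeSigma(2), makeSigma(2))                  # spherical, equal volume
sigmaVII <- list(makeSigma(1), makeSigma(4))                  # spherical, unequal volume
sigmaEEI <- list(makeSigma(2, c(3, 1/3)),
                 makeSigma(2, c(3, 1/3)))                     # diagonal, equal volume and shape
sigmaVVI <- list(makeSigma(1, c(3, 1/3)),
                 makeSigma(4, c(1/2, 2)))                     # diagonal, varying volume and shape
sigmaEEE <- list(makeSigma(2, c(3, 1/3), pi / 6),
                 makeSigma(2, c(3, 1/3), pi / 6))             # ellipsoidal, all equal
sigmaVVV <- list(makeSigma(1, c(3, 1/3), pi / 6),
                 makeSigma(4, c(1/2, 2), -pi / 4))            # ellipsoidal, all varying
```

Because the shape entries multiply to 1, the determinant of each matrix equals lambda^2 in two dimensions, which is why lambda is read as the volume parameter. Models with fewer free parameters (such as "EII") are cheaper to fit, while "VVV" is the most flexible.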

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

M. Meila (2002). Comparing clusterings. Technical Report 418, Department of Statistics, University of Washington. See http://www.stat.washington.edu/www/research/reports.

Examples

n <- 250 ## create artificial data
set.seed(0)
x <- rbind(matrix(rnorm(n*2), n, 2) %*% diag(c(1,9)),
           matrix(rnorm(n*2), n, 2) %*% diag(c(1,9))[,2:1])
xclass <- c(rep(1,n),rep(2,n))

par(pty = "s")
mclust2Dplot(x, classification = xclass, type="classification", ask=FALSE)

odd <- seq(from = 1, to = 2*n, by = 2)
even <- odd + 1
testMclustDA <- mclustDA(trainingData = x[odd, ], labels = xclass[odd], 
                         testData = x[even,])

clEven <- testMclustDA$testClassification ## classification of the test set
compareClass(clEven,xclass[even])
plot(testMclustDA, trainingData = x[odd, ], labels = xclass[odd], 
              testData = x[even,])
