DA: Discriminant analysis using the mixture of generalized hyperbolic distributions.

Description

Carries out model-based discriminant analysis using five different models: the mixture of generalized hyperbolic distributions (MGHD), the mixture of generalized hyperbolic factor analyzers (MGHFA), the mixture of multiple scaled generalized hyperbolic distributions (MSGHD), the mixture of convex multiple scaled generalized hyperbolic distributions (cMSGHD), and the mixture of coalesced generalized hyperbolic distributions (MCGHD).

Usage

DA(train,trainL,test,testL,method="MGHD",starting="km",max.iter=100,
	eps=1e-2,q=2,scale=TRUE)

Arguments

train

An n1 x p matrix or data frame of training data, with rows corresponding to observations and columns to variables.

trainL

An n1-dimensional vector of group memberships for the units of the training set. If trainL[i]=k, then observation i belongs to group k.

test

An n2 x p matrix or data frame of test data, with rows corresponding to observations and columns to variables.

testL

An n2-dimensional vector of group memberships for the units of the test set. If testL[i]=k, then observation i belongs to group k.
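As an illustration of how the four data arguments fit together, the sketch below builds them from the iris data that ships with base R. The 30% hold-out fraction and the object names `features`, `labels`, and `test_idx` are assumptions made for this example, not part of the function's interface; the call to DA is left commented out because it requires the package to be installed.

```r
# Build train, trainL, test and testL from a labelled data frame.
# iris is used only because it ships with base R.
set.seed(1)
labels   <- as.numeric(factor(iris$Species))  # groups coded as 1, 2, 3
features <- iris[, 1:4]                       # n x p matrix of variables

# Hold out 30% of each group as the test set (an arbitrary choice).
test_idx <- unlist(lapply(
  split(seq_along(labels), labels),
  function(idx) sample(idx, size = round(0.3 * length(idx)))
))

train  <- features[-test_idx, ]
trainL <- labels[-test_idx]
test   <- features[test_idx, ]
testL  <- labels[test_idx]

# With the package loaded, these could then be passed on as:
# fit <- DA(train, trainL, test, testL, method = "MGHD")
```

Sampling within each group keeps every group represented in both sets, which matters because the training labels define the groups the model is fitted to.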

method

(optional) A string indicating the method to be used for discriminant analysis; if not specified, MGHD is used. Alternative methods are: MGHFA, MSGHD, cMSGHD, MCGHD.

starting

(optional) A string indicating the initialization criterion; if not specified, k-means clustering ("km") is used. Alternative methods are: hierarchical ("hierarchical"), random ("random"), k-medoids ("kmedoids"), and model-based ("modelBased").

max.iter

(optional) A numerical parameter giving the maximum number of iterations each EM algorithm is allowed to use.

eps

(optional) A number specifying the epsilon value for the convergence criteria used in the EM algorithms. For each algorithm, the criterion is based on the difference between the log-likelihood at an iteration and an asymptotic estimate of the log-likelihood at that iteration. This asymptotic estimate is based on the Aitken acceleration.
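The stopping rule described for eps can be sketched in base R as follows. This is a minimal illustration of one common form of the Aitken criterion, not the package's internal code: the function name `aitken_converged` and its input vector `loglik` are hypothetical, and published variants differ in which log-likelihood value is compared with the asymptotic estimate.

```r
# Aitken-accelerated stopping rule applied to a vector of log-likelihoods.
# loglik: log-likelihood values from successive EM iterations (hypothetical input).
# Returns TRUE when the last three values satisfy the criterion.
aitken_converged <- function(loglik, eps = 1e-2) {
  k <- length(loglik)
  if (k < 3) return(FALSE)            # three consecutive values are needed
  l_prev <- loglik[k - 2]
  l_curr <- loglik[k - 1]
  l_next <- loglik[k]
  denom <- l_curr - l_prev
  if (denom == 0) return(TRUE)        # log-likelihood has stopped changing
  a <- (l_next - l_curr) / denom      # Aitken acceleration factor
  l_inf <- l_curr + (l_next - l_curr) / (1 - a)  # asymptotic estimate
  if (!is.finite(l_inf)) return(FALSE)
  abs(l_inf - l_next) < eps
}

# A sequence that approaches -100 geometrically.
ll <- -100 - 10 * 0.5^(1:10)
aitken_converged(ll[1:3])   # early iterations: far from the asymptote
aitken_converged(ll)        # later iterations: within eps of the asymptote
```

Compared with stopping on a small change between successive log-likelihoods, this rule is less likely to stop early when the algorithm is merely progressing slowly.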

q

(optional) Used only if the MGHFA method is selected. A numerical parameter giving the number of factors.

scale

(optional) A logical value indicating whether or not the data should be scaled; TRUE by default.

Value

A list with components

model

An S4 object of class MixGHD with the model parameters.

testMembership

A vector of integers indicating the predicted group membership of the units in the test set.

ARItest

A value indicating the adjusted Rand index for the test set.

ARItrain

A value indicating the adjusted Rand index for the training set.
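For readers unfamiliar with the adjusted Rand index, the sketch below computes it in base R from two label vectors using the standard contingency-table formula. The function name `adjusted_rand_index` is hypothetical and is not how the package itself necessarily computes ARItest or ARItrain. The index equals 1 for identical partitions, regardless of how the groups are numbered, and is close to 0 for agreement at chance level.

```r
# Adjusted Rand index between two partitions of the same units.
# x, y: vectors of group labels of equal length (hypothetical inputs).
adjusted_rand_index <- function(x, y) {
  stopifnot(length(x) == length(y))
  tab <- table(x, y)                        # contingency table of the two labelings
  if (all(dim(tab) == c(1, 1))) return(1)   # both partitions have a single group
  n <- sum(tab)
  sum_cells <- sum(choose(tab, 2))          # pairs placed together in both partitions
  sum_rows  <- sum(choose(rowSums(tab), 2))
  sum_cols  <- sum(choose(colSums(tab), 2))
  expected  <- sum_rows * sum_cols / choose(n, 2)  # value expected by chance
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

truth <- c(1, 1, 1, 2, 2, 2)
adjusted_rand_index(truth, c(2, 2, 2, 1, 1, 1))  # same partition, labels swapped
adjusted_rand_index(truth, c(1, 1, 2, 2, 2, 2))  # one unit assigned to the wrong group
```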

References

R.P. Browne and P.D. McNicholas (2015). A Mixture of Generalized Hyperbolic Distributions. Canadian Journal of Statistics, 43(2), 176-198.

C. Tortora, B.C. Franczak, R.P. Browne, and P.D. McNicholas (2019). A Mixture of Coalesced Generalized Hyperbolic Distributions. Journal of Classification (to appear).

C. Tortora, P.D. McNicholas, and R.P. Browne (2016). Mixtures of Generalized Hyperbolic Factor Analyzers. Advances in Data Analysis and Classification, 10(4), 423-440.

Examples

# NOT RUN {
##load the package and the banknote data
library(MixGHD)
data(banknote)
##recode the group labels in column 1 as integers
banknote[,1]=as.numeric(factor(banknote[,1]))


##divide the data into a training set and a test set
train=banknote[c(1:74,126:200),]
test=banknote[75:125,]

##model estimation
model=DA(train[,2:7],train[,1],test[,2:7],test[,1],method="MGHD",max.iter=20)

##adjusted Rand index on the test set
model$ARItest
# }