Learn R Programming

Rdimtools (version 1.0.6)

do.fssem: Feature Subset Selection using Expectation-Maximization

Description

Feature Subset Selection using Expectation-Maximization (FSSEM) takes a wrapper approach to feature selection problem. It iterates over optimizing the selection of variables by incrementally including each variable that adds the most significant amount of scatter separability from a labeling obtained by Gaussian mixture model. This method is quite computation intensive as it pertains to multiple fitting of GMM. Setting smaller max.k for each round of EM algorithm as well as target dimension ndim would ease the burden.

Usage

do.fssem(
  X,
  ndim = 2,
  max.k = 10,
  preprocess = c("null", "center", "scale", "cscale", "whiten", "decorrelate")
)

Arguments

X

an \((n\times p)\) matrix or data frame whose rows are observations and columns represent independent variables.

ndim

an integer-valued target dimension.

max.k

maximum number of clusters for GMM fitting with EM algorithms.

preprocess

an additional option for preprocessing the data. Default is "null". See also aux.preprocess for more details.

Value

a named list containing

Y

an \((n\times ndim)\) matrix whose rows are embedded observations.

trfinfo

a list containing information for out-of-sample prediction.

projection

a \((p\times ndim)\) whose columns are basis for projection.

References

dy_feature_2004Rdimtools

Examples

Run this code
# NOT RUN {
## run FSSEM with IRIS dataset - select 2 of 4 variables
data(iris)
irismat = as.matrix(iris[,2:4])

## select 50 observations for CRAN-purpose small example
id50 = sample(1:nrow(irismat), 50)
sel.dat = irismat[id50,]
sel.lab = as.factor(iris[id50,5])

## run and visualize
out0 = do.fssem(sel.dat, ndim=2, max.k=3)
opar = par(no.readonly=TRUE)
plot(out0$Y, main="small run", col=sel.lab, pch=19)
par(opar)

# }
# NOT RUN {
## NOT-FOR-CRAN example; run at your machine !
## try different maximum number of clusters
out3 = do.fssem(irismat, ndim=2, max.k=3)
out6 = do.fssem(irismat, ndim=2, max.k=6)
out9 = do.fssem(irismat, ndim=2, max.k=9)

## visualize
cols = as.factor(iris[,5])
opar = par(no.readonly=TRUE)
par(mfrow=c(3,1))
plot(out3$Y, main="max k=3", col=cols)
plot(out6$Y, main="max k=6", col=cols)
plot(out9$Y, main="max k=9", col=cols)
par(opar)
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab