This function implements variable selection in model-based clustering using a lasso ranking of the variables, as described in Sedki et al. (2014). The variable ranking step uses the penalized EM algorithm of Zhou et al. (2009).
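As background for the \(S\), \(R\), \(U\) and \(W\) sets that appear in the arguments and return values, the variable role model of Maugis et al. (2009) can be sketched as the factorization below. The labels \(f_{\text{clust}}\), \(f_{\text{reg}}\) and \(f_{\text{indep}}\) are notation chosen here for illustration and are not taken from the package documentation.

\[
f(x) \;=\; f_{\text{clust}}\!\left(x^{S}\right)\; f_{\text{reg}}\!\left(x^{U} \mid x^{R}\right)\; f_{\text{indep}}\!\left(x^{W}\right), \qquad R \subseteq S
\]

Here \(x^{S}\) (the relevant clustering variables) follows a Gaussian mixture, \(x^{U}\) (the redundant variables) is explained by a linear regression on the regressors \(x^{R}\), and \(x^{W}\) (the independent variables) follows a single Gaussian distribution that does not depend on the clusters.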
SelvarClustLasso(x, nbcluster, lambda, rho, type, rank, hsize, criterion,
models, rmodel, imodel, nbcores)
x: matrix or data frame containing quantitative data. Rows correspond to observations and columns correspond to variables
nbcluster: numeric vector listing the candidate numbers of clusters (must be positive integers)
lambda: numeric vector listing the tuning parameters for the \(\ell_1\) mean penalty
rho: numeric vector listing the tuning parameters for the \(\ell_1\) precision matrix penalty
type: character defining the type of ranking procedure, must be "lasso" or "likelihood". Default is "lasso"
rank: integer vector giving the rank of the variables (the length of this vector must equal the number of variables in the dataset)
hsize: optional parameter that makes the forward and backward algorithms used to select the \(S\) and \(W\) sets less strict
criterion: character vector defining the criterion used to select the best model. The best model is the one with the highest criterion value. Possible values: "BIC", "ICL", c("BIC", "ICL"). Default is "BIC"
rmodel: character vector defining the covariance matrix form for the linear regression of \(U\) on the \(R\) set of variables: "LI" for spherical form, "LB" for diagonal form and "LC" for general form. Possible values: "LI", "LB", "LC", c("LI", "LB"), c("LI", "LC"), c("LB", "LC") and c("LI", "LB", "LC"). Default is c("LI", "LB", "LC")
imodel: character vector defining the covariance matrix form for the independent variables \(W\): "LI" for spherical form and "LB" for diagonal form. Possible values: "LI", "LB", c("LI", "LB"). Default is c("LI", "LB")
nbcores: number of CPUs to be used when parallel computing is used (default is 2)
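To make the roles of lambda and rho concrete, the penalized criterion maximized in the ranking step can be sketched as follows. This assumes the usual form of the Zhou et al. (2009) penalty with centered data, a mixture of \(K\) Gaussian components with proportions \(\pi_k\), means \(\mu_k\) and covariance matrices \(\Sigma_k\); the exact expression used inside the package may differ.

\[
\sum_{i=1}^{n} \log\!\left( \sum_{k=1}^{K} \pi_k \,\phi\!\left(x_i ;\, \mu_k, \Sigma_k\right) \right)
\;-\; \lambda \sum_{k=1}^{K} \lVert \mu_k \rVert_1
\;-\; \rho \sum_{k=1}^{K} \lVert \Sigma_k^{-1} \rVert_1
\]

Under this form, a larger \(\lambda\) shrinks more component means to zero, and a larger \(\rho\) yields sparser precision matrices. A variable whose component means are shrunk to zero early along the grid of penalties carries less clustering information, which is presumably what the lasso ranking exploits.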
For each criterion (BIC or ICL), the following are returned:
The selected set of relevant clustering variables
The selected subset of regressors
The selected set of redundant variables
The selected set of independent variables
The criterion value for the selected model
The selected number of clusters
The selected Gaussian mixture form
The selected covariance form for the regression
The selected covariance form for the independent Gaussian distribution
Rmixmod Parameter object containing all mixture parameters
Matrix containing all regression coefficients; each column holds the regression coefficients of one redundant variable on the selected R set
Matrix containing the conditional probabilities of belonging to each cluster for all observations
Vector of length n containing the cluster assignments of the n observations according to the Maximum-a-Posteriori rule
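The link between the last two return values (the matrix of conditional probabilities and the Maximum-a-Posteriori partition) can be illustrated with a small base R sketch. The matrix `proba` and the helper `map_partition` below are hypothetical names with made-up numbers; they are not output of SelvarClustLasso.

```r
# Toy matrix of conditional probabilities: 4 observations (rows), 3 clusters (columns).
# The values are invented for illustration only.
proba <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.8, 0.1,
                  0.3, 0.3, 0.4,
                  0.5, 0.4, 0.1),
                ncol = 3, byrow = TRUE)

# MAP rule: assign each observation to the cluster with the largest probability.
# Ties are broken in favour of the first cluster reaching the maximum.
map_partition <- function(proba) {
  max.col(proba, ties.method = "first")
}

partition <- map_partition(proba)
print(partition)
```

Each entry of `partition` is the column index of the largest probability in the corresponding row, so it has one cluster label per observation.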
Zhou, H., Pan, W., and Shen, X., 2009. "Penalized model-based clustering with unconstrained covariance matrices". Electronic Journal of Statistics, vol. 3, pp.1473-1496.
Maugis, C., Celeux, G., and Martin-Magniette, M. L., 2009. "Variable selection in model-based clustering: A general variable role modeling". Computational Statistics and Data Analysis, vol. 53/11, pp. 3872-3882.
Sedki, M., Celeux, G., Maugis-Rabusseau, C., 2014. "SelvarMix: A R package for variable selection in model-based clustering and discriminant analysis with a regularization approach". Inria Research Report available at http://hal.inria.fr/hal-01053784
# NOT RUN {
## wine data set
## n = 178 observations, p = 27 variables
data(wine)
set.seed(123)
obj <- SelvarClustLasso(x=wine[,1:27], nbcluster=1:5, nbcores=4)
summary(obj)
print(obj)
# }
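A variant of the call that sets more of the documented arguments explicitly might look like the sketch below. It assumes that the SelvarMix package named in the references is installed and provides both SelvarClustLasso and the wine data; the particular argument values are chosen here only for illustration.

```r
# Illustrative settings, all taken from the documented possible values.
criteria   <- c("BIC", "ICL")   # select a model under both criteria
reg_forms  <- c("LI", "LB")     # spherical and diagonal forms for the regression
ind_forms  <- "LI"              # spherical form for the independent variables

# Assumption: SelvarMix is installed; the call is skipped otherwise.
if (requireNamespace("SelvarMix", quietly = TRUE)) {
  library(SelvarMix)
  data(wine)
  set.seed(123)
  obj <- SelvarClustLasso(x = wine[, 1:27], nbcluster = 2:4,
                          criterion = criteria,
                          rmodel = reg_forms, imodel = ind_forms,
                          nbcores = 2)
  print(obj)
}
```

Restricting nbcluster, rmodel and imodel reduces the number of candidate models to fit, which can shorten the run time on a machine with few cores.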