TDDclust: Trimmed clustering based on L1 data depth

Description

This is the trimmed version of the clustering algorithm based on the L1 depth proposed by Rebecka Jornsten (2004). The algorithm partitions all the observations into clusters and assigns to each point z in the data space its L1 depth value with respect to its cluster. A trimming procedure is incorporated to remove the most extreme individuals of each cluster (those with the lowest depth values), in line with trimowa.
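To make the two ideas above concrete, the following base R sketch computes an unweighted L1 depth and then flags the observations with the lowest depth within their own cluster. The helpers l1_depth and trim_by_depth are hypothetical names introduced here for illustration. They are not the internal code of TDDclust, and the rule of discarding floor(alpha * n) observations overall is an assumption.

```r
# L1 depth of a point z with respect to the rows of a numeric matrix X
# (unweighted version: every observation has weight 1).
l1_depth <- function(z, X) {
  X <- as.matrix(X)
  # z - x_i for every row x_i of X
  diffs <- matrix(z, nrow(X), ncol(X), byrow = TRUE) - X
  dists <- sqrt(rowSums(diffs^2))
  ties <- dists == 0                         # observations equal to z
  # Unit vectors pointing from each distinct observation towards z
  units <- diffs[!ties, , drop = FALSE] / dists[!ties]
  e_bar <- colSums(units) / nrow(X)          # average over all n observations
  f_z <- mean(ties)                          # proportion of observations equal to z
  1 - max(0, sqrt(sum(e_bar^2)) - f_z)
}

# Depth of every observation within its own cluster, then flag the
# proportion alpha with the lowest depth (assumed trimming rule).
trim_by_depth <- function(X, cluster, alpha) {
  X <- as.matrix(X)
  depth <- numeric(nrow(X))
  for (k in unique(cluster)) {
    idx <- which(cluster == k)
    Xk <- X[idx, , drop = FALSE]
    depth[idx] <- vapply(seq_along(idx),
                         function(i) l1_depth(Xk[i, ], Xk),
                         numeric(1))
  }
  n_trim <- floor(alpha * nrow(X))
  list(depth = depth, discarded = order(depth)[seq_len(n_trim)])
}

# Toy data: a small cross around the origin plus one isolated point (row 6)
X <- rbind(c(-1, 0), c(1, 0), c(0, -1), c(0, 1), c(0, 0), c(50, 50))
out <- trim_by_depth(X, cluster = rep(1, 6), alpha = 0.2)
out$depth
out$discarded
```

In this toy data set the origin is the deepest point and the isolated point has the lowest depth, so it is the one flagged for trimming.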

Usage

TDDclust(data,numClust,lambda,Th,niter,T0,simAnn,alpha,data1,verbose=TRUE)

Value

A list with the following elements:

NN: Cluster assignment, NN[1,] is the final partition.

cases: Anthropometric cases (the multivariate median cluster representatives).

DD: Depth values of the observations (only if there are trimmed observations).

Cost: Final value of the optimal partition.

discarded: Discarded (trimmed) observations.

klBest: Iteration in which the optimal partition was found.

Arguments

data: Data frame. Each row corresponds to an observation, and each column corresponds to a variable. All variables must be numeric.
numClust: Number of clusters.
lambda: Tuning parameter that controls the influence the data depth has over the clustering, see Jornsten (2004).
Th: Threshold for observations to be relocated, usually set to 0.
niter: Number of random initializations (iterations).
T0: Simulated annealing parameter. It is the current temperature in the simulated annealing procedure.
simAnn: Simulated annealing parameter. It is the decay rate, usually set to 0.9.
alpha: Proportion of trimmed sample.
data1: A copy of the data frame passed as data, used to reincorporate the trimmed observations with the remaining ones at the next iteration.
verbose: A logical specifying whether to provide descriptive output about the running process. Default TRUE.
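As a reading aid for lambda, the criterion described in Jornsten (2004) can be summarised as a weighted combination of the silhouette width and the relative data depth of each observation. The notation below (Sil_i, ReD_i, a_i, b_i, D) is assumed here for illustration and should be checked against the paper before being relied upon.

```latex
\[
C = \frac{1}{N}\sum_{i=1}^{N} c_i ,
\qquad
c_i = (1-\lambda)\,\mathrm{Sil}_i + \lambda\,\mathrm{ReD}_i ,
\]
\[
\mathrm{Sil}_i = \frac{b_i - a_i}{\max(a_i, b_i)} ,
\qquad
\mathrm{ReD}_i = D(x_i \mid \text{own cluster}) - D(x_i \mid \text{nearest competing cluster}) .
\]
```

Here a_i is the average distance from x_i to the members of its own cluster, b_i is the average distance to the nearest competing cluster, and D is the L1 depth. Under this reading, lambda = 0 uses only the silhouette width, lambda = 1 uses only the relative depth, and the partition sought is the one that maximises C.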

Author

This function is adapted from the original functions developed by Rebecka Jornsten, which were freely available at http://www.stat.rutgers.edu/home/rebecka/DDcl/. However, that page is no longer available following a website redesign.

References

Jornsten, R. (2004). Clustering and classification based on the L1 data depth. Journal of Multivariate Analysis 90, 67--89.

Vinue, G., and Ibanez, M. V. (2014). Data depth and Biclustering applied to anthropometric data. Exploring their utility in apparel design. Technical report.

Examples


library(Anthropometry)

#To keep the computation light, only 15 points are selected:
dataTDDcl <- sampleSpanishSurvey[1 : 15, c(2, 3, 5)]
#Copy of the data, passed as the data1 argument:
dataTDDcl_aux <- sampleSpanishSurvey[1 : 15, c(2, 3, 5)]

numClust <- 3 ; alpha <- 0.01 ; lambda <- 0.5 ; niter <- 2
Th <- 0 ; T0 <- 0 ; simAnn <- 0.9  

#To reproduce the results, set the seed for the random number generator:
#suppressWarnings(RNGversion("3.5.0"))
#set.seed(2014)
res_TDDcl <- TDDclust(dataTDDcl, numClust, lambda, Th, niter, T0, simAnn,
                      alpha, dataTDDcl_aux, FALSE)

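#Cluster representatives (multivariate medians):
prototypes <- anthrCases(res_TDDcl)

#Cluster sizes, cost of the best partition and iteration where it was found:
table(res_TDDcl$NN[1,])
res_TDDcl$Cost
res_TDDcl$klBest

#Trimmed observations:
trimmed <- trimmOutl(res_TDDcl)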
