Aggregated Data Ensemble Clustering (ADEC) is a direct multi-source clustering technique. ADEC is an iterative procedure that starts by merging the data sets. In each iteration, a random sample of the features is selected and/or the resulting dendrogram is divided into k clusters for a range of values of k.
ADEC(List, distmeasure = "tanimoto", normalize = FALSE, method = NULL,
t = 10, r = NULL, nrclusters = NULL, clust = "agnes",
linkage = "flexible", alpha = 0.625)
List: A list of data matrices of the same type. The rows are assumed to correspond to the objects.
distmeasure: Choice of metric for the dissimilarity matrix (character). Should be one of "tanimoto", "euclidean", "jaccard", "hamming". Defaults to "tanimoto".
normalize: Logical. Indicates whether to normalize the distance matrices. Defaults to FALSE. Normalization is recommended if different distance types are used. More details on normalization can be found in Normalization.
method: A method of normalization. Should be one of "Quantile", "Fisher-Yates", "standardize", "Range" or any of the first letters of these names. Default is NULL.
t: The number of iterations. Defaults to 10.
r: The number of features to take for the random sample. If NULL (default), all features are considered.
nrclusters: A sequence of numbers of clusters into which to cut the dendrogram. If NULL (default), the function stops.
clust: Choice of clustering function (character). Defaults to "agnes".
linkage: Choice of inter-group dissimilarity (character). Defaults to "flexible".
alpha: The parameter alpha to be used in the "flexible" linkage of the agnes function. Defaults to 0.625 and is only used if linkage is set to "flexible".
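As an illustration of what the "flexible" linkage and its alpha parameter mean, the following minimal sketch calls agnes from the cluster package directly. It assumes that ADEC forwards alpha as the par.method argument of agnes, which this page does not confirm. The toy matrix x and the Euclidean distance are illustrative only.

```r
library(cluster)  # provides agnes()

set.seed(42)
x <- matrix(rnorm(15 * 4), nrow = 15)  # toy data: 15 objects, 4 features
d <- dist(x)                           # Euclidean, purely for illustration

# Flexible (Lance-Williams) linkage; par.method carries the alpha parameter.
# Assumption: ADEC passes its `alpha` argument on in this way.
tree <- agnes(d, diss = TRUE, method = "flexible", par.method = 0.625)

# Cut the resulting dendrogram into 3 clusters
groups <- cutree(as.hclust(tree), k = 3)
```

With a single value for par.method, agnes uses alpha for both merged clusters and sets beta to 1 - 2 * alpha, so 0.625 corresponds to beta = -0.25.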
The returned value is a list with the following three elements.
Fused data matrix of the data matrices
The resulting co-association matrix
The resulting clustering
If r is specified and nrclusters is a fixed number, only random sampling of the features is performed over the t iterations (ADECa). If r is NULL and nrclusters is a sequence, the clustering is performed on all features and the dendrogram is divided into clusters for each value in nrclusters (ADECb). If r is specified and nrclusters is a sequence, both are combined (ADECc). After every iteration, whether it involves random sampling, multiple divisions of the dendrogram, or both, an incidence matrix is set up. All incidence matrices are summed, and the sum represents the distance matrix on which a final clustering is performed.
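The procedure described above can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: adec_sketch is a hypothetical helper, Euclidean distance and hclust stand in for the chosen distance measure and agnes, and the conversion of the summed incidence matrices into a dissimilarity (one minus the co-occurrence proportion) is an assumption.

```r
# Sketch of the ADECc-style co-association idea using base R only.
# Assumptions: dist()/hclust() replace the package's distance and agnes();
# adec_sketch and its return names are hypothetical.
adec_sketch <- function(List, t = 10, r = NULL, nrclusters = 2:4) {
  fused <- do.call(cbind, List)          # merge the data sets column-wise
  n <- nrow(fused)
  p <- ncol(fused)
  coassoc <- matrix(0, n, n)             # running sum of incidence matrices
  for (i in seq_len(t)) {
    # Random feature sample of size r, or all features if r is NULL
    cols <- if (is.null(r)) seq_len(p) else sample(p, r)
    tree <- hclust(dist(fused[, cols, drop = FALSE]), method = "average")
    for (k in nrclusters) {
      labels <- cutree(tree, k = k)
      # Incidence matrix: 1 if two objects share a cluster, else 0
      coassoc <- coassoc + outer(labels, labels, "==")
    }
  }
  # Assumed conversion: co-occurrence counts -> dissimilarity in [0, 1]
  diss <- 1 - coassoc / (t * length(nrclusters))
  final <- hclust(as.dist(diss), method = "average")
  list(coassoc = coassoc, final = final)
}

set.seed(1)
A <- matrix(rnorm(20 * 6), nrow = 20)    # toy source 1: 20 objects, 6 features
B <- matrix(rnorm(20 * 4), nrow = 20)    # toy source 2: 20 objects, 4 features
res <- adec_sketch(list(A, B), t = 5, r = 5, nrclusters = 2:4)
```

Setting r = NULL with a sequence for nrclusters mimics ADECb, while a single value for nrclusters with r specified mimics ADECa.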
Fodeh2013IntClust
# NOT RUN {
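library(IntClust)
# Load the example data sets
data(fingerprintMat)
data(targetMat)
L <- list(fingerprintMat, targetMat)
# ADECc: sample r = 100 features in each of t = 100 iterations and cut at k = 1, ..., 10
MCF7_ADEC <- ADEC(List = L, distmeasure = "tanimoto", normalize = FALSE, method = NULL, t = 100,
                  r = 100, nrclusters = seq(1, 10, 1), clust = "agnes", linkage = "flexible", alpha = 0.625)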
# }