This function collapses the rows and columns of the input contingency table on the basis of the results of a hierarchical clustering.
table.collapse(data, graph = FALSE)
data: name of the dataset (must be in data frame format).
graph: logical (TRUE/FALSE); set it to TRUE to produce the dendrograms of the row and column profiles.
The function returns a list containing the input table, the rows-collapsed table, the columns-collapsed table, and a table with both rows and columns collapsed. It optionally returns two dendrograms (one for the row profiles, one for the column profiles) representing the clusters.
The hierarchical clustering is obtained using FactoMineR's 'HCPC()' function.

Rationale: clustering the rows and/or columns of a table can interest users who want to know where a "significant association is concentrated" by "collecting together similar rows (or columns) in discrete groups" (Greenacre M, Correspondence Analysis in Practice, Boca Raton-London-New York, Chapman & Hall/CRC 2007, pp. 116, 120). Rows (or columns) are aggregated progressively, so that each successive merge produces the smallest possible change in the table's inertia. The underlying logic is that rows (or columns) whose merging changes the table's inertia only slightly have similar profiles. Since the total inertia is the sum of the between-group and the within-group inertia, the procedure can be thought of as keeping the between-group inertia as large, and the within-group inertia as small, as possible at each step. The 'FactoMineR' package provides an essentially similar method (Husson F, Le S, Pages J, Exploratory Multivariate Analysis by Example Using R, Boca Raton-London-New York, CRC Press, pp. 177-185).

The number of clusters is chosen according to the following rationale: a partition into Q clusters is suggested when the increase in between-group inertia obtained by passing from Q-1 to Q clusters is greater than the increase obtained by passing from Q to Q+1 clusters. In other words, if, during the merging of rows (or columns), the next aggregation sharply increases the within-group inertia, very different profiles are being merged at that step, so the merging should stop just before it.
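The merging criterion and the stopping rule described above can be illustrated with a small base-R sketch. It is a brute-force illustration under stated assumptions, not the code of table.collapse() or of FactoMineR's HCPC(): the helpers table_inertia(), collapse_rows_greedy() and suggest_q(), and the toy table, are hypothetical; inertia is computed as the chi-square statistic divided by the grand total; and the rule "the gain from Q-1 to Q is greater than the gain from Q to Q+1" is operationalised, as an assumption, by picking the Q with the largest ratio between successive gains.

```r
# Hypothetical helpers: a base-R sketch of the idea, not the code of
# table.collapse() or FactoMineR::HCPC().

# Total inertia of a contingency table = chi-square statistic / grand total.
# Assumes every row and column total is positive.
table_inertia <- function(tab) {
  tab <- as.matrix(tab)
  n <- sum(tab)
  expected <- outer(rowSums(tab), colSums(tab)) / n
  sum((tab - expected)^2 / expected) / n
}

# Repeatedly merge the pair of rows whose merging loses the least inertia.
# inertia[Q] is the (between-group) inertia of the Q-row collapsed table and
# partitions[[Q]] is that table. Brute force over all pairs: small tables only.
collapse_rows_greedy <- function(tab) {
  tab <- as.matrix(tab)
  if (is.null(rownames(tab))) rownames(tab) <- paste0("r", seq_len(nrow(tab)))
  inertia <- numeric(0)
  partitions <- list()
  repeat {
    q <- nrow(tab)
    inertia[q] <- table_inertia(tab)
    partitions[[q]] <- tab
    if (q == 1) break
    pairs <- combn(q, 2)
    # Inertia lost by merging each candidate pair of rows
    losses <- vapply(seq_len(ncol(pairs)), function(k) {
      p <- pairs[, k]
      merged <- rbind(tab[-p, , drop = FALSE], tab[p[1], ] + tab[p[2], ])
      inertia[q] - table_inertia(merged)
    }, numeric(1))
    best <- pairs[, which.min(losses)]
    merged_row <- matrix(tab[best[1], ] + tab[best[2], ], nrow = 1,
                         dimnames = list(paste(rownames(tab)[best], collapse = "+"),
                                         colnames(tab)))
    tab <- rbind(tab[-best, , drop = FALSE], merged_row)
  }
  list(inertia = inertia, partitions = partitions)
}

# Assumed reading of the stopping rule: gain[Q] is the between-group inertia
# gained by passing from Q-1 to Q clusters; suggest the Q at which gain[Q]
# exceeds gain[Q+1] by the largest ratio.
suggest_q <- function(inertia) {
  K <- length(inertia)
  stopifnot(K >= 3)
  gain <- c(NA, diff(inertia))  # gain[Q] = inertia[Q] - inertia[Q-1]
  q <- 2:(K - 1)
  q[which.max(gain[q] / gain[q + 1])]
}

# Toy table with two visibly different groups of row profiles: (a, b) and (c, d)
toy <- matrix(c(10, 2, 1,
                 9, 3, 1,
                 1, 2, 10,
                 2, 1, 9), nrow = 4, byrow = TRUE,
              dimnames = list(c("a", "b", "c", "d"), c("x", "y", "z")))
res <- collapse_rows_greedy(toy)
q_hat <- suggest_q(res$inertia)
print(res$partitions[[q_hat]])  # rows-collapsed table at the suggested Q
```

Columns can be treated the same way by passing the transposed table, e.g. collapse_rows_greedy(t(toy)). HCPC() applies Ward's criterion to the factorial coordinates and, by default, consolidates the partition with k-means, so its output need not match this sketch exactly.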
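library(CAinterprTools)  # assumed to be the package providing table.collapse() and greenacre_data
data(greenacre_data)
# Collapse the table, store the results in an object called 'res', and plot the 2 dendrograms
res <- table.collapse(greenacre_data, graph = TRUE)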