
mvtboost (version 0.5.0)

mvtb.cluster: Clustering the covariance explained or relative influence matrix

Description

The 'covariance explained' by each predictor is the reduction in covariance between each pair of outcomes due to splitting on each predictor over all trees ($covex). To aid in the interpretability of the covariance explained matrix, this function clusters the rows (pairs of outcomes) and the columns (predictors) of object$covex so that groups of predictors that explain similar pairs of covariances are closer together. This function can also be used to cluster the relative influence matrix. In this case, the rows (usually outcomes) and columns (usually predictors) with similar values will be clustered together.

Usage

mvtb.cluster(x, clust.method = "complete", dist.method = "euclidean", plot = FALSE, ...)

Arguments

x
Any matrix, such as mvtb.covex(object) or mvtb.ri(object).
clust.method
clustering method for rows and columns. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).
dist.method
method for computing the distance between two lower triangular covariance matrices. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Any unambiguous substring can be given.
plot
If TRUE, produces a heatmap of the clustered matrix. See ?mvtb.heat.
...
Arguments passed to mvtb.heat
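
As a hedged illustration of the arguments above, the following sketch calls mvtb.cluster on a small hand-made matrix, which is allowed because x can be any matrix. The matrix ri and its values are invented for the example and do not come from a fitted model. The call is guarded so that the snippet still runs when mvtboost is not installed.

```r
# Made-up relative influence matrix: rows are outcomes, columns are predictors.
ri <- matrix(
  c(60, 5, 30, 5,
    10, 50, 5, 35,
    55, 10, 30, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("y1", "y2", "y3"), c("x1", "x2", "x3", "x4"))
)

# Only call the package function if mvtboost is available.
if (requireNamespace("mvtboost", quietly = TRUE)) {
  # Non-default methods, no heatmap; set plot = TRUE to also draw one.
  ri_clustered <- mvtboost::mvtb.cluster(
    ri,
    clust.method = "average",
    dist.method = "manhattan",
    plot = FALSE
  )
  print(ri_clustered)
}
```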

Value

The clustered matrix (covariance explained or relative influence), with re-ordered rows and columns.

Details

The covariance explained by each predictor is only unambiguous if the predictors are uncorrelated and interaction.depth = 1. If predictors are not independent, the decomposition of covariance explained is only approximate (like the decomposition of R^2 by each predictor in a linear model). If interaction.depth > 1, the following heuristic is used: the covariance explained by the tree is assigned to the predictor with the largest influence in each tree.

Note that different distance measures (e.g. "manhattan", "euclidean") provide different ways to measure (dis)similarities between the covariance explained patterns for each predictor. See ?dist for further details. After the distances have been computed, hclust is used to form clusters. Different clustering methods (e.g. "ward.D", "complete") generally group rows and columns differently (see ?hclust for further details). It is suggested to try different distance measures and clustering methods to obtain the most interpretable solution. The defaults are "euclidean" distances and "complete" clustering. Transposing the rows and columns may also lead to different results.
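
The dist-then-hclust procedure described above can be sketched in base R. This is a minimal illustration of the general idea, not the package's actual source: the helper cluster_reorder and the toy matrix m are hypothetical, and the sketch assumes rows and columns are clustered separately and then re-ordered by the resulting dendrogram order.

```r
# Toy matrix standing in for a covariance explained matrix:
# rows are pairs of outcomes, columns are predictors (values are made up).
m <- matrix(
  c(0.9, 0.1, 0.8, 0.0,
    0.1, 0.7, 0.2, 0.6,
    0.8, 0.0, 0.9, 0.1,
    0.0, 0.6, 0.1, 0.7),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("y1-y1", "y1-y2", "y2-y2", "y2-y3"),
                  c("x1", "x2", "x3", "x4"))
)

# Hypothetical helper: cluster rows and columns separately, then re-order.
cluster_reorder <- function(x, clust.method = "complete",
                            dist.method = "euclidean") {
  # Distances between rows, then hierarchical clustering of the rows.
  row_hc <- hclust(dist(x, method = dist.method), method = clust.method)
  # Same for the columns, by working on the transpose.
  col_hc <- hclust(dist(t(x), method = dist.method), method = clust.method)
  # Re-order so that similar rows and similar columns sit next to each other.
  x[row_hc$order, col_hc$order, drop = FALSE]
}

reordered <- cluster_reorder(m)
reordered_alt <- cluster_reorder(m, clust.method = "average",
                                 dist.method = "manhattan")
print(reordered)
```

Comparing reordered with reordered_alt on real data is one way to follow the suggestion to try several distance measures and clustering methods.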

A simple heatmap of the clustered matrix can be obtained by setting plot=TRUE. Details of the plotting procedure are available via mvtb.heat.

covex values smaller than getOption("digits") are truncated to 0. Note that it is possible to obtain negative variance explained due to sampling fluctuation. These can be truncated or ignored.
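
If truncating negative values is preferred over ignoring them, one simple option is to set them to zero before clustering. The matrix covex below is invented for illustration, and this is only one possible way to truncate, not something the package is known to do internally.

```r
# Made-up covariance explained values, including one small negative entry.
covex <- matrix(
  c(0.42, -0.003, 0.10, 0.25),
  nrow = 2,
  dimnames = list(c("y1-y1", "y1-y2"), c("x1", "x2"))
)

# Copy the matrix and replace negative entries with 0.
covex_trunc <- covex
covex_trunc[covex_trunc < 0] <- 0
print(covex_trunc)
```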

See Also

mvtb.heat