hdi (version 0.1-9)

clusterGroupBound: Hierarchical structure group tests in linear model

Description

Computes confidence intervals for the l1-norm of groups of linear regression coefficients in a hierarchical clustering tree.

Usage

clusterGroupBound(x, y, method = "average",
                  dist = as.dist(1 - abs(cor(x))), alpha = 0.05,
                  eps = 0.1, hcloutput, nsplit = 11,
                  s = min(10, ncol(x) - 1),
                  silent = FALSE, setseed = TRUE, lpSolve = TRUE)

Arguments

x

numeric design matrix of the regression \(n \times p\) with \(p\) columns for \(p\) predictor variables and \(n\) rows corresponding to \(n\) observations.

y

numeric response variable of length \(n\).

method

a character string; the method used for constructing the hierarchical clustering tree (default: "average" for “average linkage”) via hclust. Alternatively, you can provide your own hierarchical clustering through the optional argument hcloutput.

dist

a distance matrix on which the hierarchical clustering is based (see dist). By default, the distance between two variables is 1 minus the absolute value of their correlation. Alternatively, you can provide your own hierarchical clustering through the optional argument hcloutput.

alpha

numeric level in \((0, 1)\) at which the test / confidence intervals are to be constructed.

eps

a level of eps*alpha is used and the values of different splits are aggregated using the (1-eps) quantile. See reference below for more details.

hcloutput

optionally, the value of a hclust() call. If it is provided, the arguments dist and method are ignored.

nsplit

the number of data splits used.

s

the dimensionality of the projection that is used. Lower values lead to faster computation. If \(n > 50\) and s is left unspecified, then s is set to 50 to avoid lengthy computations.

silent

logical; if TRUE, progress output is suppressed.

setseed

a logical; if this is true (recommended), then the same random seeds are used for all groups, which makes the confidence intervals simultaneously valid over all groups of variables tested.

lpSolve

logical; only set it to false if lpSolve() is not working on the current machine: setting it to false will result in much slower computations; only use on small problems.
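
The interplay of dist, method and hcloutput described above can be sketched as follows. This is a minimal illustration, not part of the package documentation: the simulated data and the column names are assumptions, and the call to clusterGroupBound is only attempted if hdi is installed. The clustering is built by hand with the documented default distance and linkage, then passed in through hcloutput.

```r
## Sketch: build the clustering tree by hand and pass it via hcloutput.
## The data below are simulated purely for illustration.
set.seed(1)
n <- 20
p <- 4
x <- matrix(rnorm(n * p), nrow = n)
colnames(x) <- paste0("V", seq_len(p))   # hypothetical variable names
y <- as.numeric(2 * x[, 1] + rnorm(n))

## Same distance as the documented default for `dist`
d <- as.dist(1 - abs(cor(x)))

## Average linkage, matching the documented default for `method`
hc <- hclust(d, method = "average")

## When hcloutput is supplied, `dist` and `method` are ignored
if (requireNamespace("hdi", quietly = TRUE)) {
  out <- hdi::clusterGroupBound(x, y, hcloutput = hc, nsplit = 4)
  print(out$lowerBound)
}
```

Any other hclust() result on the columns of x (for example with a different linkage or a domain-specific distance) could be supplied in the same way.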

Value

Returns a list with components

groupNumber

The index of the group tested in the original hierarchical clustering tree

members

A list containing the variables that belong to each tested group

noMembers

A vector containing the number of members in each group

lowerBound

The lower bound on the l1-norm in each group

position

The position on the x-axis of each group (used for plotting)

leftChild

Gives the index of the group that corresponds to the left child node in the tested tree (negative values correspond to leaf nodes)

rightChild

Same as leftChild for the right child of each node

isLeaf

Logical vector. Is TRUE for a group if it is a leaf node in the tested tree or if both child nodes have a zero lower bound on their group l1-norm
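
One way to read the components above is to keep only the groups whose lower bound is positive and that are flagged by isLeaf, which, going by the description of isLeaf, should be the finest groups for which a non-zero l1-norm is detected. The helper finestGroups below is hypothetical (it is not part of hdi), and the list toy is hand-made with invented values purely to show the shape of the components; it is not actual output of clusterGroupBound.

```r
## Hypothetical helper (not part of hdi): pick the finest groups with a
## positive lower bound from a list shaped like the documented return value.
finestGroups <- function(out) {
  keep <- which(out$lowerBound > 0 & out$isLeaf)
  data.frame(index      = keep,
             size       = out$noMembers[keep],
             lowerBound = out$lowerBound[keep])
}

## Hand-made toy list with invented values, NOT real function output.
toy <- list(
  members    = list(1:3, 1:2, 3),
  noMembers  = c(3, 2, 1),
  lowerBound = c(1.8, 1.5, 0),
  isLeaf     = c(FALSE, TRUE, TRUE)
)

res <- finestGroups(toy)
print(res)

## Variables in the selected groups
toy$members[res$index]
```

With a real result, the same helper would be called as finestGroups(out), assuming out has the components listed above.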

References

Meinshausen, N. (2015); JRSS B, see groupBound.

See Also

Use groupBound to compute the lower bound for selected groups of variables; use clusterGroupBound to test all groups in a hierarchical clustering tree.

Examples

# NOT RUN {
## Create a regression problem with correlated design (n = 10, p = 3):
## a block of size 2 and a block of size 1, within-block correlation is 0.99

set.seed(29)
p   <- 3
n   <- 10

Sigma <- diag(p)
Sigma[1,2] <- Sigma[2,1] <- 0.99

x <- matrix(rnorm(n * p), nrow = n) %*% chol(Sigma)

## Create response with active variable 1
beta    <- rep(0, p)
beta[1] <- 5

y  <- as.numeric(x %*% beta + rnorm(n))
# }
# NOT RUN {
out <- clusterGroupBound(x, y, nsplit = 4) ## use larger value for nsplit!

## Plot and print the hierarchical group-test
plot(out)
print(out)
out$members
out$lowerBound
# }
