
hsm (version 0.2.0)

hsm: Solves the proximal operator of the latent group lasso in Hierarchical Sparse Modeling.

Description

Solves the proximal operator of the latent group lasso appearing in Yan & Bien (2017), $$\min_\beta \frac{1}{2} \|y - \beta\|_2^2 + lam \cdot \Omega(\beta; w),$$ where \(\Omega(\beta; w) = \min_{\sum_l v^{(l)} = \beta,\ \mathrm{supp}(v^{(l)}) \subseteq g_l} \sum_l w_l \|v^{(l)}\|_2\) is the latent group lasso penalty as defined in Jacob et al. (2009). In this problem, \(\beta\) is a length-\(p\) parameter vector whose elements are embedded in a directed acyclic graph (DAG). The desired sparsity pattern is a subgraph of the DAG such that if the parameters \(\beta_i\) embedded in node \(i\) are set to zero, all parameters embedded in the descendant nodes of \(i\) are zeroed out as well. The problem is solved by breaking the DAG into path graphs, for each of which the corresponding proximal operator has a closed-form solution, and performing block coordinate descent across the path graphs. See Section 4.3 of the paper for more details and explanations.
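As a quick illustration of the hierarchical sparsity pattern described above, the following minimal sketch encodes a two-node chain DAG (node 1 -> node 2) with one parameter per node and calls hsm. The chain, the value lam = 1, and the random seed are illustrative choices, not taken from the paper; any zero in the parent node forces a zero in its descendant (but not conversely).

library(hsm)
# One directed edge: node 1 -> node 2.
map <- matrix(c(1, 2), ncol = 2)
# One parameter embedded in each node; parameter index equals node index.
var <- list(1, 2)
set.seed(1)
y <- rnorm(2)
fit <- hsm(y = y, lam = 1, map = map, var = var)
fit$beta  # if the first entry is zero, the second is zero as well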

Usage

hsm(y, lam, w = NULL, map, var, assign = NULL, w.assign = NULL,
  get.penalval = FALSE, tol = 1e-08, maxiter = 10000, beta.ma = NULL)

Arguments

y

Length-p vector used in the proximal operator.

lam

Non-negative tuning parameter that controls the sparsity level.

w

Length-n.nodes vector of positive values, for which w_l gives the weight for group g_l and n.nodes is the number of nodes in the DAG. If this is NULL, w_l = sqrt(|g_l|) will be used. A necessary condition is that w_l increases with |g_l|.

map

Matrix of dimension n.edges-by-2, where n.edges is the number of directed edges in the DAG. The first column gives the indices of the nodes the edges direct from, and the second column gives the indices of the nodes the corresponding edges direct towards. If a node indexed i has no edge linked to it, record the corresponding row as c(i, NA).

var

Length-n.nodes list for which the lth element contains the indices of variables embedded in the lth node.

assign

Matrix with p columns that gives the assignments of variables over the different path graphs. Each row of assign corresponds to a path graph decomposed from the DAG. If this is NULL, hsm first breaks the DAG down into path graphs and then derives assign based on map and var; otherwise, map and var are ignored. Refer to paths for more details.

w.assign

List of length nrow(assign), for which the lth element contains the weights corresponding to the lth row of assign (the lth path graph). For example, if the lth path graph is made up of the nodes indexed {3, 4, 6, 8}, then w.assign[[l]] = {w_3, w_4, w_6, w_8}. If this is NULL, hsm will derive w.assign along with assign; otherwise, map and var are ignored. Refer to paths for more details.

get.penalval

If TRUE, \(lam * \Omega(\beta; w)\) is computed and returned; otherwise NA is returned.

tol

Tolerance level used in BCD. Convergence is declared when no parameter of interest in any path graph changes by more than tol in a BCD cycle.

maxiter

Upper bound on the number of iterations of BCD to perform.

beta.ma

Matrix of dimension n.paths-by-p giving initial beta values for the n.paths path graphs decomposed from the DAG. Do not use unless you know the decomposition of the DAG.

Value

Returns an estimate of the solution to the proximal operator of the latent group Lasso. The returned value is an exact solution if the DAG is a directed path graph.

beta

A length-p vector giving the solution to the proximal operator defined above.

ite

Number of cycles of BCD performed.

penalval

Value of the penalty \(lam * \Omega(\beta; w)\) if get.penalval is TRUE, otherwise NA.

assign

Value of assign.

w.assign

Value of w.assign.

beta.ma

Matrix of dimension n.paths-by-p giving the decomposed beta values for all path graphs, taken from the last iteration of hsm.

Details

See Section 2.2 of the paper for the problem setup and group structure specifications. See Figure 7 in Section 4.3 for an example of decomposing a DAG into path graphs. See Algorithm 4 in the paper for details of the path-based BCD.
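For intuition on the block coordinate descent idea, the sketch below runs BCD directly over the latent variables \(v^{(l)}\) of the objective given in the Description, for a hand-coded three-node chain 1 -> 2 -> 3 with one parameter per node. This is not the package's path-based Algorithm 4 (which updates entire path graphs in closed form); the groups, weights, data, and tolerance here are illustrative assumptions. Each block update is an exact group soft-thresholding of the partial residual.

set.seed(1)
y <- rnorm(3)
lam <- 0.3
# Group g_l for node l contains node l and all of its descendants
# in the chain 1 -> 2 -> 3 (one parameter per node).
groups <- list(1:3, 2:3, 3)
w <- sqrt(lengths(groups))                    # default-style weights w_l = sqrt(|g_l|)
v <- matrix(0, nrow = length(groups), ncol = length(y))
for (iter in 1:1000) {
  v.old <- v
  for (l in seq_along(groups)) {
    g <- groups[[l]]
    r <- y - colSums(v[-l, , drop = FALSE])   # partial residual
    nrm <- sqrt(sum(r[g]^2))
    v[l, ] <- 0
    if (nrm > 0) v[l, g] <- r[g] * max(0, 1 - lam * w[l] / nrm)
  }
  if (max(abs(v - v.old)) < 1e-8) break       # stop when no latent block moves
}
beta <- colSums(v)                            # approximate proximal-operator solution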

References

Yan, X. and Bien, J. (2017). Hierarchical Sparse Modeling: A Choice of Two Group Lasso Formulations. Statist. Sci. 32, no. 4, 531--560. doi:10.1214/17-STS622.

Jacob, L., Obozinski, G. and Vert, J. (2009). Group Lasso with Overlap and Graph Lasso. In Proceedings of the 26th Annual International Conference on Machine Learning. ICML'09 433-440. ACM, New York.

See Also

hsm.path

paths

Examples

# The following example appears in Figure 7 of Yan & Bien (2017).
# Generate map defining DAG.
map <- matrix(0, ncol=2, nrow=8)
map[1, ] <- c(1, 2)
map[2, ] <- c(2, 7)
map[3, ] <- c(3, 4)
map[4, ] <- c(4, 6)
map[5, ] <- c(6, 7)
map[6, ] <- c(6, 8)
map[7, ] <- c(3, 5)
map[8, ] <- c(5, 6)
# Assume one parameter per node.
# Let parameter and node share the same index.
var <- as.list(1:8)
set.seed(100)
y <- rnorm(8)
result <- hsm(y=y, lam=0.5, map=map, var=var, get.penalval=TRUE)

# Another example in which DAG contains two separate nodes
map <- matrix(0, ncol=2, nrow=2)
map[1, ] <- c(1, NA)
map[2, ] <- c(2, NA)
# Assume ten parameters per node.
var <- list(1:10, 11:20)
set.seed(100)
y <- rnorm(20)
lam <- 0.5
result <- hsm(y=y, lam=lam, map=map, var=var, get.penalval=TRUE)
# The solution is equivalent to performing group-wise soft-thresholdings
beta.st <- c(y[1:10] * max(0, 1 - lam * sqrt(10) / norm(y[1:10], "2")),
             y[11:20] * max(0, 1 - lam * sqrt(10) / norm(y[11:20], "2")))
all.equal(result$beta, beta.st)
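# A further optional check (a sketch, using only quantities defined above):
# with two disconnected nodes, the penalty returned when get.penalval=TRUE
# reduces to lam * sqrt(10) * (||beta[1:10]||_2 + ||beta[11:20]||_2).
penal.st <- lam * sqrt(10) * (norm(beta.st[1:10], "2") + norm(beta.st[11:20], "2"))
all.equal(result$penalval, penal.st)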
