stages_hclust: Learn a staged tree with hierarchical clustering

Description

Build a staged event tree with k stages for each variable by clustering the stage probabilities with hierarchical clustering.

Usage

stages_hclust(
  object,
  distance = "totvar",
  k = NA,
  method = "complete",
  ignore = object$name_unobserved,
  limit = length(object$tree),
  scope = NULL,
  score = function(x) {
    return(-BIC(x))
  }
)

Value

A staged event tree object.

Arguments

object: an object of class sevt with fitted probabilities and data, as returned by full or sevt_fit.
distance: character, the distance measure to be used, either a possible method for dist or one of the following: "totvar", "hellinger".
k: integer or (named) vector: number of clusters, that is, stages per variable. Values will be recycled if needed. If NA (default), the number of stages is searched for by maximizing the score function. NA and integer values can be mixed to fix the number of stages for some variables and use the score to select it for the others.
method: the agglomeration method to be used in hclust.
ignore: vector of stages which will be ignored and left untouched. By default, the name of the unobserved stages stored in object$name_unobserved.
limit: the maximum number of variables to consider.
scope: names of the variables to consider.
score: A function. Score to maximize for automatic selection of the number of stages. Used if k=NA for some variables.
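
The way k and score interact can be sketched as follows. This is a hedged example, not taken from the package documentation: it assumes the stagedtrees package is installed, that the names of k match the variable names of the Titanic table (Sex, Age, Survived; the first variable, Class, is never clustered), and that AIC can be applied to the fitted object in the same way as BIC.

```r
library(stagedtrees)

data("Titanic")
full_tree <- full(Titanic, join_unobserved = TRUE, lambda = 1)

# Fix 2 stages for Sex and Survived; NA lets the score pick the number for Age
# (names are assumed to match the variable names of the Titanic table)
mixed <- stages_hclust(full_tree, k = c(Sex = 2, Age = NA, Survived = 2))

# Select every number of stages automatically, maximizing -AIC instead of -BIC
# (using AIC here is an assumption; the documented default score is -BIC)
aic_model <- stages_hclust(full_tree, k = NA, score = function(x) -AIC(x))

summary(mixed)
```

Since the score is maximized, criteria for which lower is better (BIC, AIC) are passed with a minus sign, as in the default.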

Details

stages_hclust performs hierarchical clustering of the initial stage probabilities in object and aggregates them into the specified number of stages (k). A different number of stages can be specified for each variable in the model by supplying a (named) vector via the argument k. If k is NA for some variables, all possible numbers of stages are checked and the one that maximizes the score is selected.
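
The clustering step can be illustrated with base R alone. The sketch below is not the package's internal code: the probability matrix is made up, and the total variation distance is taken in its textbook form (half the L1 distance), which may be scaled differently from the package's "totvar". The scaling does not change the resulting clusters.

```r
# Hypothetical conditional probability vectors, one row per initial stage
probs <- rbind(
  s1 = c(0.90, 0.10),
  s2 = c(0.85, 0.15),
  s3 = c(0.30, 0.70),
  s4 = c(0.25, 0.75)
)

# Total variation distance between rows: half the Manhattan (L1) distance
d <- dist(probs, method = "manhattan") / 2

# Agglomerate with complete linkage and cut the dendrogram into k = 2 groups
hc <- hclust(d, method = "complete")
stages <- cutree(hc, k = 2)

# s1 and s2 end up in one stage, s3 and s4 in the other
stages
```

Rows assigned to the same group would then be merged into a single stage with a shared probability vector.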

Examples


data("Titanic")
model <- stages_hclust(full(Titanic, join_unobserved = TRUE, lambda = 1), k = 2)
summary(model)

### or search k via BIC minimization
model1 <- stages_hclust(full(Titanic), k = NA)
