Learn R Programming

⚠️There's a newer version (2.3.0) of this package.Take me there.

stagedtrees

Preprint

F Carli, M Leonelli, E Riccomagno, G Varando, The R Package stagedtrees for Structural Learning of Stratified Staged Trees, 2020 arXiv:2004.06459

@misc{2004.06459,
Author = {Federico Carli and Manuele Leonelli and Eva Riccomagno and Gherardo Varando},
Title = {The R Package stagedtrees for Structural Learning of Stratified Staged Trees},
Year = {2020},
Eprint = {arXiv:2004.06459},
}

Overview

stagedtrees is a package that implements staged event trees, a probability model for discrete random variables.

Installation

#stable version from CRAN 
install.packages("stagedtrees")

#development version from github
# install.packages("devtools")
devtools::install_github("gherardovarando/stagedtrees")

Usage

library("stagedtrees")

With the stagedtrees package it is possible to fit (stratified) staged event trees to data, use them to compute probabilities, make predictions, visualize and compare different models.

Creating the model

A staged event tree object (sevt class) can be initialized as the full (saturated) or as the fully independent model with, respectively, the functions indep and full. It is possible to build a staged event tree from data stored in a data.frame or a table object.

# Load the PhDArticles data
data("PhDArticles")

# define order of variables
order <- c("Gender", "Kids",  "Married", "Articles")

# Independence model 
mod_indep <- indep(PhDArticles, order)
mod_indep
#> Staged event tree (fitted) 
#> Gender[2] -> Kids[2] -> Married[2] -> Articles[3]  
#> 'log Lik.' -2774.754 (df=7)

# Full (saturated) model
mod_full <- full(PhDArticles, order) 
mod_full
#> Staged event tree (fitted) 
#> Gender[2] -> Kids[2] -> Married[2] -> Articles[3]  
#> 'log Lik.' -2546.796 (df=21)
Structural zeros and unobserved situations

By default staged trees object are defined assuming structural zeros in the contingency tables. This is implemented by joining all unobserved situations in particular stages (named by default "UNOBSERVED") which are, by default, ignored by other methods and functions (see the ignore argument in ?stages_bhc or ?plot.sevt).

## there are no observations for Gender=male (female), Kids = yes, Married = no
get_stage(mod_full, c("male", "yes", "no"))
#> [1] "UNOBSERVED"

## and obviously 
prob(mod_full, c(Kids = "yes", Married = "no"))
#> [1] 0
Initialize a model without structural zeros

It is possible to initialize a staged tree without structural zeros by setting the argument join_unobserved=FALSE. In that case, it can be useful to set lambda > 0 to avoid problems with probabilities on unobserved situations.

mod_full0 <- full(PhDArticles, join_unobserved = FALSE, lambda = 1)

Model selection

stagedtrees implements methods to perform automatic model selection. All methods can be initialized from an arbitrary staged event tree object.

Score methods

This methods perform optimization for a given score using different heuristics.

  • Hill-Climbing stages_hc(object, score, max_iter, scope, ignore, trace)
mod1 <- stages_hc(mod_indep)
mod1
#> Staged event tree (fitted) 
#> Gender[2] -> Kids[2] -> Married[2] -> Articles[3]  
#> 'log Lik.' -2555.887 (df=10)
  • Backward Hill-Climbing stages_bhc(object, score, max_iter, scope, ignore, trace)
mod2 <- stages_bhc(mod_full)
mod2
#> Staged event tree (fitted) 
#> Gender[2] -> Kids[2] -> Married[2] -> Articles[3]  
#> 'log Lik.' -2555.887 (df=10)
  • Backward Fast Hill-Climbing stages_fbhc(object, score, max_iter, scope, ignore, trace)
mod3 <- stages_fbhc(mod_full, score = function(x) -BIC(x))
mod3
#> Staged event tree (fitted) 
#> Gender[2] -> Kids[2] -> Married[2] -> Articles[3]  
#> 'log Lik.' -2555.887 (df=10)
Clustering methods
  • Backward Joining stages_bj(object, distance, thr, scope, ignore, trace)
mod4 <- stages_bj(mod_full)
mod4
#> Staged event tree (fitted) 
#> Gender[2] -> Kids[2] -> Married[2] -> Articles[3]  
#> 'log Lik.' -2561.404 (df=9)
  • Hierarchical Clustering stages_hclust(object, distance, k, method, ignore, limit, scope)
mod5 <- stages_hclust(mod_full,
                    k = 2, 
                    distance = "totvar",
                   method = "mcquitty")
mod5
#> Staged event tree (fitted) 
#> Gender[2] -> Kids[2] -> Married[2] -> Articles[3]  
#> 'log Lik.' -2554.904 (df=11)
  • K-Means Clustering stages_kmeans(object, k, algorithm, ignore, limit, scope, nstart)
mod6 <- stages_kmeans(mod_full,
                    k = 2, 
                   algorithm = "Hartigan-Wong")
mod6
#> Staged event tree (fitted) 
#> Gender[2] -> Kids[2] -> Married[2] -> Articles[3]  
#> 'log Lik.' -2554.904 (df=11)

Combining model selections with %>%

The pipe operator from the magrittr package can be used to combine easily various model selection algorithms and to specify models easily.

library(magrittr)
model <- PhDArticles %>% full(lambda = 1) %>% 
           stages_hclust %>% stages_hc

## extract a sub_tree and join two stages
sub_model <- model %>% subtree(path = c(">2"))  %>%  
              join_stages("Mentor", "1", "2")

Probabilities, predictions and sampling

Marginal probabilities

Obtain marginal probabilities with the prob function.

# estimated probability of c(Gender = "male", Married = "yes")
# using different models
prob(mod_indep, c(Gender = "male", Married = "yes")) 
#> [1] 0.357567
prob(mod3, c(Gender = "male", Married = "yes"))
#> [1] 0.4163934

Or for a data.frame of observations:

obs <- expand.grid(mod_full$tree)
p <- prob(mod2, obs)
cbind(obs, P = p)
#>    Gender Kids Married Articles          P
#> 1    male  yes      no        0 0.00000000
#> 2  female  yes      no        0 0.00000000
#> 3    male   no      no        0 0.03711667
#> 4  female   no      no        0 0.06437935
#> 5    male  yes     yes        0 0.07751799
#> 6  female  yes     yes        0 0.02627729
#> 7    male   no     yes        0 0.04762758
#> 8  female   no     yes        0 0.04762758
#> 9    male  yes      no      1-2 0.00000000
#> 10 female  yes      no      1-2 0.00000000
#> 11   male   no      no      1-2 0.05722715
#> 12 female   no      no      1-2 0.09926125
#> 13   male  yes     yes      1-2 0.11951865
#> 14 female  yes     yes      1-2 0.04051480
#> 15   male   no     yes      1-2 0.07343307
#> 16 female   no     yes      1-2 0.07343307
#> 17   male  yes      no       >2 0.00000000
#> 18 female  yes      no       >2 0.00000000
#> 19   male   no      no       >2 0.02915345
#> 20 female   no      no       >2 0.05056705
#> 21   male  yes     yes       >2 0.06088686
#> 22 female  yes     yes       >2 0.02063961
#> 23   male   no     yes       >2 0.03740930
#> 24 female   no     yes       >2 0.03740930
Predictions

A staged event tree object can be used to make predictions with the predict method. The class variable can be specified, otherwise the first variable (root) in the tree will be used.

## check accuracy over the PhDArticles data
predicted <- predict(mod3, newdata = PhDArticles)
table(predicted, PhDArticles$Gender)
#>          
#> predicted male female
#>    male    311    145
#>    female  183    276

Conditional probabilities (or log-) can be obtained setting prob = TRUE:

## obtain estimated conditional probabilities in mod3 for first 5 obs
## P(Articles|Gender, Kids, Married)
predict(mod3, newdata = PhDArticles[1:5,], prob = TRUE)
#>        male    female
#> 1 0.5000000 0.5000000
#> 2 0.3656958 0.6343042
#> 3 0.3656958 0.6343042
#> 4 0.7468354 0.2531646
#> 5 0.3656958 0.6343042
Sampling
sample_from(mod4, 5)
#>   Gender Kids Married Articles
#> 1   male   no     yes       >2
#> 2   male   no     yes      1-2
#> 3   male  yes     yes      1-2
#> 4   male   no      no      1-2
#> 5 female  yes     yes        0

Explore the model

Model info
# variables 
variable.names(mod1)
#> NULL

# stages
stages(mod1, "Kids")
#> [1] "3" "1"

# summary
summary(mod1)
#> Call: 
#> stages_hc(mod_indep)
#> lambda:  0 
#> Stages: 
#>   Variable:  Gender 
#>  stage npaths sample.size      male    female
#>      1      0         915 0.5398907 0.4601093
#>   ------------ 
#>   Variable:  Kids 
#>  stage npaths sample.size       yes        no
#>      3      1         494 0.4777328 0.5222672
#>      1      1         421 0.1900238 0.8099762
#>   ------------ 
#>   Variable:  Married 
#>  stage npaths sample.size        no       yes
#>      3      2         316 0.0000000 1.0000000
#>      4      1         258 0.4379845 0.5620155
#>      1      1         341 0.5747801 0.4252199
#>   ------------ 
#>   Variable:  Articles 
#>       stage npaths sample.size         0      1-2        >2
#>  UNOBSERVED      2           0        NA       NA        NA
#>           1      6         915 0.3005464 0.463388 0.2360656
#>   ------------
Plot
plot(mod4, main = "Staged tree learned with bj.sevt", 
     cex_label_edges = 0.6, cex_nodes = 1.5)

By default stages associated with the unobserved situations are not plotted, if the model has been created with join_unobserved = TRUE. But we can plot also the unobserved stages in a specific color.

plot(stndnaming(mod5, uniq = TRUE), 
     main = "Staged tree learned with stages_hclust 
     (unobserved in grey)",  
     ignore = FALSE, ## do not ignore stages
     col = function(stages) ifelse(stages=="UNOBSERVED", "grey", stages),
     cex_label_edges = 0.6, cex_nodes = 1.5)

Barplot

barplot_stages permit to easily plot barplots (via barplot) representing the different probabilities defined for the different stages of a variable.

barplot(mod4, "Kids", legend.text = TRUE)

Subtrees

A subtree can be extracted, the result is another staged event tree object in the remaining variables.

sub <- subtree(mod4, c("female"))
plot(sub)

Comparing models

Check if models are equal.

compare_stages(mod1, mod4)
#> [1] FALSE

compare_stages(mod1, mod4, method = "hamming", plot = TRUE, 
             cex_label_nodes = 0, cex_label_edges = 0)

#> [1] FALSE

hamming_stages(mod1, mod4)
#> [1] 1

difftree <- compare_stages(mod1, mod4, method = "stages", plot = FALSE, 
             return_tree = TRUE)

difftree$Married
#> [1] 0 1 0 1

Penalized log-likelihood.

BIC(mod_indep, mod_full, mod1, mod2, mod3, mod4, mod5)
#>           df      BIC
#> mod_indep  7 5597.240
#> mod_full  21 5236.789
#> mod1      10 5179.964
#> mod2      10 5179.964
#> mod3      10 5179.964
#> mod4       9 5184.178
#> mod5      11 5184.816

Copy Link

Version

Install

install.packages('stagedtrees')

Monthly Downloads

176

Version

2.1.0

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Gherardo Varando

Last Published

April 22nd, 2021

Functions in stagedtrees (2.1.0)

as_adj_matrix

Convert to an adjacency matrix
Pokemon

Pokemon Go Users
compare_stages

Compare two staged event tree
distance_mat_stages

Compute the distance matrix
as_sevt

Coerce to sevt
barplot.sevt

Bar plots of stage probabilities
has_ctables

Check if the stages event tree has ctables field
as_parentslist

Obtain the equivalent DAG as list of parents
as_bn

Convert to a bnlearn object
check_sevt

check sevt object
generate_linear_dataset

Generate a random binary dataset for classification
has_prob

Check if the stages event tree has probabilities
check_sevt_prob

check
plot.sevt

Plot method for staged event trees
logLik.sevt

Log-Likelihood of a staged event tree
prob

Probabilities for a staged event tree
Asym

Asym dataset
make_ctables

Distribute counts along tree
probdist

Distances between probabilities
inclusions_stages

Inclusions of stages
as.character.parentslist

Print a parentslist object
get_stage

Get stage or path
generate_xor_dataset

Generate a xor dataset
sevt

Staged event tree (sevt) class
is_fitted_sevt

Check if the stages event tree is fitted
set_stage

Set stage to path
predict.sevt

Predict method for staged event tree
PhDArticles

PhD Students Publications
print.sevt

Print a staged event tree
generate_random_dataset

Generate a random binary dataset
find_stage

Find the stage of the path
full_indep

Full and independent staged event tree
simple_clustering

Perform a simple clustering in 2 groups
sevt_varnames

Variable names
split_stage_random

Split randomly a stage
stages_bhc

Backward hill-climbing
sevt_nvar

Number of variables
stages_bhcr

Backward random hill-climbing
noisy_xor

noisy xor function
path_probability

Compute probability of a path from root
sevt_df

Number of parameters of a staged event tree
sevt_fit

Fit a staged event tree
stages

Stages of a variable
stagedtrees

Staged event trees.
text.sevt

Add text to a staged event tree plot
rename_stage

Rename stage(s) in staged event tree
subtree

Extract subtree
tree_idx

return path index
join_stages

Join stages
stages_hclust

Learn a staged tree with hierarchical clustering
stages_hc

Hill-climbing
summary.sevt

Summarizing staged event trees
uni_idx

Unique id from named list
join_unobserved

Join situations with no observations
new_label

New label
which_class

Find maximum value
node

Plot a node
sample_from

Sample from a staged event tree
stages_kmeans

Learn a staged tree with k-means clustering
stages_bj

Backward joining of stages
stndnaming

Standard renaming of stages
stages_fbhc

Fast backward hill-climbing
ceg2adjmat

Ceg to adjmat of graph
ceg

Chain event graph (CEG)
edge

Plot an edge
expand_prob

Expand probabilities of a staged event tree