isotree (version 0.6.1-1)

isotree.to.graphviz: Generate GraphViz Dot Representation of Tree

Description

Generate GraphViz representations of model trees in 'dot' format, either separately for every tree (the default) or for a single tree (if passing `tree`). Can also be made to output terminal node numbers (numbered starting at one).

These can be loaded as graphs through e.g. `DiagrammeR::grViz(x)`, where `x` would be the output of this function for a given tree.

Graph format is based on XGBoost's.
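
Because each tree is returned as plain 'dot' text, it can also be written to disk for use with other GraphViz tooling. The sketch below is one way to do that with base R. The temporary file path is arbitrary, and the `dot` command in the final comment assumes a local GraphViz installation.

```r
library(isotree)

set.seed(123)
X <- matrix(rnorm(100 * 3), nrow = 100)
model <- isolation.forest(X, ndim = 1, max_depth = 3, ntrees = 2, nthreads = 1)

# 'dot' text for the first tree only
dot_text <- isotree.to.graphviz(model, tree = 1)

# Write it to a temporary .dot file (any writable path would do)
dot_path <- tempfile(fileext = ".dot")
writeLines(dot_text, dot_path)

# With GraphViz installed, the file could then be rendered from a shell, e.g.:
#   dot -Tpng <path to the .dot file> -o tree1.png
```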

Usage

isotree.to.graphviz(
  model,
  output_tree_num = FALSE,
  tree = NULL,
  column_names = NULL,
  column_names_categ = NULL,
  nthreads = model$nthreads
)

Value

If passing `tree=NULL`, will return a list with one element per tree in the model, where each element consists of an R character / string with the 'dot' format representation of the tree. If passing `tree`, the output will be instead a single character / string element with the 'dot' representation for that tree.
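
As a rough illustration of the two return shapes described above, the following sketch fits a small model on random data and calls the function both ways. The data and the variable names `all_trees` and `one_tree` are made up for this example, and tree indices are assumed to start at one, as is usual in R.

```r
library(isotree)

# Small synthetic data set, used only for illustration
set.seed(1)
X <- matrix(rnorm(100 * 3), nrow = 100)
model <- isolation.forest(X, ndim = 1, max_depth = 3, ntrees = 2, nthreads = 1)

# tree = NULL (the default): one 'dot' string per tree in the model
all_trees <- isotree.to.graphviz(model)
length(all_trees)

# tree = 1: a single 'dot' string for the first tree only
one_tree <- isotree.to.graphviz(model, tree = 1)
is.character(one_tree)
```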

Arguments

model

An Isolation Forest object as returned by isolation.forest.

output_tree_num

Whether to make the statements / outputs return the terminal node number instead of the isolation depth. The numeration will start at one.

tree

Tree for which to generate the graph representation. If passed, will generate the output only for that single tree. If passing `NULL`, will generate outputs for all trees in the model.

column_names

Column names to use for the numeric columns. If not passed and the model was fit to a `data.frame`, will use the column names from that `data.frame`, which can be found under `model$metadata$cols_num`. If not passed and the model was fit to data in a format other than `data.frame`, the columns will be named `column_N` in the output. Note that the names will be taken verbatim: when exporting to SQL, this function will not check whether they constitute valid SQL and will not escape characters such as double quotation marks.

column_names_categ

Column names to use for the categorical columns. If not passed, will use the column names from the `data.frame` to which the model was fit. These can be found under `model$metadata$cols_cat`.

nthreads

Number of parallel threads to use.
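
The `column_names` argument described above matters most when the model was fit to a plain matrix, which carries no column names of its own. A minimal sketch, in which the names in `num_names` are hypothetical and chosen only for this example:

```r
library(isotree)

set.seed(1)
# A plain matrix has no column names, so they are supplied explicitly below
X <- matrix(rnorm(100 * 3), nrow = 100)
model <- isolation.forest(X, ndim = 1, max_depth = 3, ntrees = 2, nthreads = 1)

# Hypothetical names, one per numeric column of X
num_names <- c("height", "width", "depth")
graphs <- isotree.to.graphviz(model, column_names = num_names)

# Print the 'dot' text of the first tree to inspect the split labels
cat(graphs[[1]])
```

Since the names are taken verbatim, it is safest to stick to plain alphanumeric names like the ones above.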

Details

  • The generated graphs will not include range penalizations, thus predictions might differ from calls to `predict` when using `penalize_range=TRUE`.

  • The generated graphs will only include handling of missing values when using `missing_action="impute"`. When using the single-variable model with categorical variables + subset splits, the rule buckets might be incomplete due to not including categories that were not present in a given node - this last point can be avoided by using `new_categ_action="smallest"`, `new_categ_action="random"`, or `missing_action="impute"` (in the latter case will treat them as missing, but the `predict` function might treat them differently).

  • If using `scoring_metric="density"` or `scoring_metric="boxed_ratio"` plus `output_tree_num=FALSE`, the outputs will correspond to the logarithm of the density rather than the density.
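
To see what `output_tree_num` changes in the terminal nodes, the two variants can be generated for the same tree and compared side by side. This is a small sketch on synthetic data with otherwise default settings. The names `depth_graph` and `node_graph` are made up for the example.

```r
library(isotree)

set.seed(1)
X <- matrix(rnorm(100 * 3), nrow = 100)
model <- isolation.forest(X, ndim = 1, max_depth = 3, ntrees = 2, nthreads = 1)

# Default: terminal nodes carry the isolation depth
depth_graph <- isotree.to.graphviz(model, tree = 1)

# Alternative: terminal nodes carry the terminal node number (starting at one)
node_graph <- isotree.to.graphviz(model, tree = 1, output_tree_num = TRUE)

# Print both 'dot' texts to compare the leaf labels
cat(depth_graph)
cat(node_graph)
```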

Examples

library(isotree)
set.seed(123)
X <- matrix(rnorm(100 * 3), nrow = 100)
model <- isolation.forest(X, ndim=1, max_depth=3, ntrees=2, nthreads=1)
model_as_graphviz <- isotree.to.graphviz(model)

# These can be parsed and plotted with library 'DiagrammeR'
if (require("DiagrammeR")) {
    # first tree
    DiagrammeR::grViz(model_as_graphviz[[1]])

    # second tree
    DiagrammeR::grViz(model_as_graphviz[[2]])
}
