
isotree (version 0.5.15)

isotree.append.trees: Append isolation trees from one model into another

Description

This function is intended for merging models that use the same hyperparameters but were fitted to different subsets of data.

In order for this to work, both models must have been fit to data in the same format - that is, same number of columns, same order of the columns, and same column types, although not necessarily same object classes (e.g. can mix `base::matrix` and `Matrix::dgCMatrix`).

If the data has categorical variables, the models should have been built with parameter `recode_categ=FALSE` in the call to isolation.forest, and the categorical columns passed as type `factor` with the same `levels`. Otherwise, each model might use a different encoding for the same categorical column, and those encodings will not be preserved: only the trees are appended, without any associated metadata.
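As a minimal sketch of the categorical setup described above, the snippet below fits two models on data frames whose factor column shares one explicit set of levels. The helper `make_data`, the column names, and the level names are hypothetical and exist only for illustration.

```r
library(isotree)

set.seed(1)

# Shared factor levels, declared once so both subsets are encoded identically
# (level names are made up for this illustration)
lev <- c("a", "b", "c")

# Hypothetical helper: one numeric column plus one factor column with fixed levels
make_data <- function(n) {
    data.frame(
        num = rnorm(n),
        cat = factor(sample(lev, n, replace = TRUE), levels = lev)
    )
}
df1 <- make_data(100)
df2 <- make_data(100)

# recode_categ=FALSE asks the model to use the factors' own encoding as passed
iso_a <- isolation.forest(df1, ntrees = 3, nthreads = 1, recode_categ = FALSE)
iso_b <- isolation.forest(df2, ntrees = 2, nthreads = 1, recode_categ = FALSE)

# Verify manually that the levels match, since the function itself checks nothing
stopifnot(identical(levels(df1$cat), levels(df2$cat)))

# Append the trees from 'iso_b' into 'iso_a' (modifies 'iso_a' in place)
isotree.append.trees(iso_a, iso_b)
scores <- predict(iso_a, df1)
```

Fixing `levels` explicitly when building each factor avoids relying on whichever values happen to appear in a given subset.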

Note that this function does not perform any checks on its inputs: passing two incompatible models (e.g. fit to different numbers of columns) will produce wrong results and may crash the R process when the resulting object is used.

Also be aware that the first input will be modified in-place.
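The in-place behaviour can be observed with a short sketch like the one below, which calls the function without assigning its result. It assumes, as the Examples section of this page suggests, that `predict` with `type="tree_num"` returns a matrix with one column per tree. The variable names are illustrative.

```r
library(isotree)

set.seed(1)
X <- matrix(rnorm(200), nrow = 100)
iso_x <- isolation.forest(X, ntrees = 3, nthreads = 1)
iso_y <- isolation.forest(X, ntrees = 2, nthreads = 1)

# Count trees through the number of columns in the "tree_num" output
trees_before <- ncol(predict(iso_x, X, type = "tree_num"))

# No assignment here: 'iso_x' itself is modified by the call
isotree.append.trees(iso_x, iso_y)

trees_after <- ncol(predict(iso_x, X, type = "tree_num"))
c(before = trees_before, after = trees_after)
```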

Usage

isotree.append.trees(model, other)

Value

The same input `model` object, now with the new trees appended, returned invisibly.

Arguments

model

An Isolation Forest model (as returned by function isolation.forest) to which the trees from `other` (another Isolation Forest model) will be appended.

Will be modified in-place, and on exit will contain the resulting merged model.

other

Another Isolation Forest model, from which trees will be appended into `model`. It will not be modified during the call to this function.

Details

Be aware that, if an out-of-memory error occurs during the call, the resulting object might be left unusable (calling certain functions on it might crash).

For safety purposes, the model object can be deep copied (including the underlying C++ object) through function isotree.deep.copy before undergoing an in-place modification like this.
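A minimal sketch of that precaution follows: the trees are appended to a deep copy, so the original model stays untouched. It assumes `isotree.deep.copy` takes the model as its only required argument, and that `type="tree_num"` yields one column per tree as in the Examples section. The variable names are illustrative.

```r
library(isotree)

set.seed(1)
X <- matrix(rnorm(200), nrow = 100)
iso_main  <- isolation.forest(X, ntrees = 3, nthreads = 1)
iso_extra <- isolation.forest(X, ntrees = 2, nthreads = 1)

# Work on an independent copy so that 'iso_main' is preserved
iso_merged <- isotree.deep.copy(iso_main)
isotree.append.trees(iso_merged, iso_extra)

# If the copy is independent, only 'iso_merged' gains the extra trees
n_main   <- ncol(predict(iso_main, X, type = "tree_num"))
n_merged <- ncol(predict(iso_merged, X, type = "tree_num"))
c(original = n_main, merged = n_merged)
```

Keeping the original around also leaves a usable model if the append step fails partway through.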

Examples

library(isotree)

### Generate two random sets of data
m <- 100
n <- 2
set.seed(1)
X1 <- matrix(rnorm(m*n), nrow=m)
X2 <- matrix(rnorm(m*n), nrow=m)

### Fit a model to each dataset
iso1 <- isolation.forest(X1, ntrees=3, nthreads=1)
iso2 <- isolation.forest(X2, ntrees=2, nthreads=1)

### Check the terminal nodes for some observations
nodes1 <- predict(iso1, head(X1, 3), type="tree_num")
nodes2 <- predict(iso2, head(X1, 3), type="tree_num")

### Check also the average isolation depths
nodes1.depths <- predict(iso1, head(X1, 3), type="avg_depth")
nodes2.depths <- predict(iso2, head(X1, 3), type="avg_depth")

### Append the trees from 'iso2' into 'iso1'
iso1 <- isotree.append.trees(iso1, iso2)

### Check that it predicts the same as the two models
nodes.comb <- predict(iso1, head(X1, 3), type="tree_num")
nodes.comb == cbind(nodes1, nodes2)

### The new average depths will be a weighted average of the two models'
### (Be aware that, due to round-off, they might not match exactly under '==')
nodes.comb.depths <- predict(iso1, head(X1, 3), type="avg_depth")
nodes.comb.depths
(3*nodes1.depths + 2*nodes2.depths) / 5
