This program implements Hoeffding trees, a form of streaming decision tree
best suited to large (or streaming) datasets. It supports both categorical
and numeric data. Given an input dataset, the program can train a tree under
a variety of training options and save the resulting model to a file. It can
also use a newly trained model, or a model loaded from file, to predict
classes for a given test set.
The training file and associated labels are specified with the "training" and
"labels" parameters, respectively. If "labels" is not specified, the labels
are assumed to be the last dimension of the training dataset.
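To illustrate the fallback just described, the following minimal Python sketch
separates labels from a dataset when no separate label set is given. It is not
the program's own code: the helper name `split_last_dimension` is hypothetical,
and it assumes a row-per-point layout (as in a CSV file) in which the last
value of each row is an integer class label.

```python
from typing import List, Sequence, Tuple


def split_last_dimension(
    rows: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[int]]:
    """Split each row into (features, label), using the last value as the label.

    Hypothetical helper for illustration only; assumes one data point per row.
    """
    features: List[List[float]] = []
    labels: List[int] = []
    for row in rows:
        if len(row) < 2:
            raise ValueError("each row needs at least one feature and a label")
        features.append(list(row[:-1]))  # everything except the last value
        labels.append(int(row[-1]))      # the last value is the class label
    return features, labels


# Made-up sample data: two features followed by a class label.
dataset = [
    [5.1, 3.5, 0],
    [6.2, 2.9, 1],
    [4.7, 3.2, 0],
]
features, labels = split_last_dimension(dataset)
```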
Training may be performed in batch mode (like a typical decision tree
algorithm) by specifying the "batch_mode" option, but this may not be the
best option for large datasets, since batch training typically needs more
memory and runtime than streaming training.
When a model is trained, it may be saved via the "output_model" output
parameter. A model may be loaded from file for further training or testing
with the "input_model" parameter.
Test data may be specified with the "test" parameter, and if performance
statistics are desired for that test set, labels may be specified with the
"test_labels" parameter. Predictions for each test point may be saved with
the "predictions" output parameter, and class probabilities for each
prediction may be saved with the "probabilities" output parameter.
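As an illustration of the kind of performance statistic that test labels make
possible, the sketch below computes plain classification accuracy from
predicted and true labels. The function name `accuracy` and the sample values
are made up for this example; the program's own report may use other
statistics or a different format.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], true_labels: Sequence[int]) -> float:
    """Return the fraction of predictions that match the true labels."""
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have equal length")
    if len(predictions) == 0:
        raise ValueError("at least one prediction is required")
    correct = sum(1 for p, t in zip(predictions, true_labels) if p == t)
    return correct / len(predictions)


# Made-up sample values: one predicted class per test point.
predicted = [0, 1, 1, 0, 2]
actual = [0, 1, 0, 0, 2]
score = accuracy(predicted, actual)
```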