This program is an implementation of the standard random forest
classification algorithm by Leo Breiman. A random forest can be trained and
saved for later use, or a random forest may be loaded and predictions or
class probabilities for points may be generated.
The training set and associated labels are specified with the "training" and
"labels" parameters, respectively. The labels should be in the range `[0,
num_classes - 1]`. If "labels" is not specified, the labels are
assumed to be the last dimension of the training dataset.
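
The label handling described above can be sketched in plain Python. This is only an illustration, not the program's own code: it assumes each point is stored as one row whose final entry is the label, and the helper names `split_labels` and `check_label_range` are hypothetical.

```python
def split_labels(rows):
    """Split each row into features and a label taken from the last entry.

    Assumption: one point per row, with the label stored as the final value.
    """
    features = [list(row[:-1]) for row in rows]
    labels = [int(row[-1]) for row in rows]
    return features, labels


def check_label_range(labels, num_classes):
    """Raise ValueError if any label falls outside [0, num_classes - 1]."""
    bad = [label for label in labels if not 0 <= label < num_classes]
    if bad:
        raise ValueError(
            f"labels must lie in [0, {num_classes - 1}], got {sorted(set(bad))}"
        )


# Three points with two features each; the last column holds the label.
training = [
    [5.1, 3.5, 0.0],
    [6.2, 2.9, 1.0],
    [7.7, 3.0, 2.0],
]

features, labels = split_labels(training)
check_label_range(labels, num_classes=3)
print(features)
print(labels)
```

Checking the range before training makes a mislabeled dataset fail early, rather than producing a forest with an unexpected number of classes.
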
When a model is trained, the "output_model" output parameter may be used to
save the trained model. A model may be loaded for predictions with the
"input_model" parameter. The "input_model" parameter may not be specified when
the "training" parameter is specified. The "minimum_leaf_size" parameter
specifies the minimum number of training points that must fall into each leaf
for it to be split. The "num_trees" parameter controls the number of trees in the
random forest. The "minimum_gain_split" parameter controls the minimum
required gain for a decision tree node to split. Larger values will force
higher-confidence splits. The "maximum_depth" parameter specifies the
maximum depth of each tree. The "subspace_dim" parameter controls
the number of random dimensions chosen for an individual node's split. If
"print_training_accuracy" is specified, the calculated accuracy on the
training set will be printed.
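
The rule that "input_model" and "training" cannot be combined can be expressed as a small check. The sketch below is not the program's implementation: the function name `validate_options` and the dictionary of options are hypothetical, and the requirement that at least one of the two parameters be present is an assumption rather than something stated above.

```python
def validate_options(options):
    """Return "train" or "load" depending on which input is given.

    `options` is a hypothetical dict keyed by the parameter names used in
    this documentation; a missing key or a None value means "not specified".
    """
    has_training = options.get("training") is not None
    has_model = options.get("input_model") is not None

    if has_training and has_model:
        raise ValueError(
            '"input_model" may not be specified when "training" is specified'
        )
    if not has_training and not has_model:
        # Assumption: with neither input there is nothing to train or load.
        raise ValueError('one of "training" or "input_model" is required')

    return "train" if has_training else "load"


# Training a new forest; the numeric values are arbitrary examples.
mode = validate_options({
    "training": [[5.1, 3.5, 0.0], [6.2, 2.9, 1.0]],
    "num_trees": 10,
    "minimum_leaf_size": 1,
})
print(mode)
```
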
Test data may be specified with the "test" parameter, and if performance
measures are desired for that test set, labels for the test points may be
specified with the "test_labels" parameter. Predictions for each test point
may be saved via the "predictions" output parameter. Class probabilities for
each prediction may be saved with the "probabilities" output parameter.
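
As a rough illustration of how the two outputs relate to the test labels, the sketch below derives a prediction from each row of class probabilities and scores it against known labels. It rests on assumptions: the predicted class is taken to be the one with the highest probability, accuracy is used as the performance measure, and the function names and sample values are made up for the example.

```python
def predict_from_probabilities(probabilities):
    """Pick, for each point, the class index with the highest probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


def accuracy(predictions, test_labels):
    """Fraction of predictions that match the given test labels."""
    if len(predictions) != len(test_labels):
        raise ValueError("predictions and test_labels must have equal length")
    if not test_labels:
        raise ValueError("at least one test point is required")
    correct = sum(p == t for p, t in zip(predictions, test_labels))
    return correct / len(test_labels)


# One row of class probabilities per test point (three classes).
probabilities = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
]
test_labels = [0, 2, 0]

predictions = predict_from_probabilities(probabilities)
score = accuracy(predictions, test_labels)
print(predictions)
print(score)
```

Saving the probabilities as well as the predictions keeps the confidence of each decision available, which is useful when a different decision threshold is applied later.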