A Fast Regularized Greedy Forest regressor
# init <- FastRGF_Regressor$new(n_estimators = 500, max_depth = 6,
#                               max_leaf = 50, tree_gain_ratio = 1.0,
#                               min_samples_leaf = 5, l1 = 1.0,
#                               l2 = 1000.0, opt_algorithm = "rgf",
#                               learning_rate = 0.001, max_bin = NULL,
#                               min_child_weight = 5.0, data_l2 = 2.0,
#                               sparse_max_features = 80000,
#                               sparse_min_occurences = 5,
#                               n_jobs = 1, verbose = 0)

FastRGF_Regressor$new(n_estimators = 500, max_depth = 6,
                      max_leaf = 50, tree_gain_ratio = 1.0,
                      min_samples_leaf = 5, l1 = 1.0,
                      l2 = 1000.0, opt_algorithm = "rgf",
                      learning_rate = 0.001, max_bin = NULL,
                      min_child_weight = 5.0, data_l2 = 2.0,
                      sparse_max_features = 80000,
                      sparse_min_occurences = 5,
                      n_jobs = 1, verbose = 0)
--------------
fit(x, y, sample_weight = NULL)
--------------
predict(x)
--------------
cleanup()
--------------
get_params(deep = TRUE)
--------------
score(x, y, sample_weight = NULL)
--------------
Super class: RGF::Internal_class -> FastRGF_Regressor
new()
FastRGF_Regressor$new(
n_estimators = 500,
max_depth = 6,
max_leaf = 50,
tree_gain_ratio = 1,
min_samples_leaf = 5,
l1 = 1,
l2 = 1000,
opt_algorithm = "rgf",
learning_rate = 0.001,
max_bin = NULL,
min_child_weight = 5,
data_l2 = 2,
sparse_max_features = 80000,
sparse_min_occurences = 5,
n_jobs = 1,
verbose = 0
)
n_estimators
an integer. The number of trees in the forest (Original name: forest.ntrees.)
max_depth
an integer. Maximum tree depth (Original name: dtree.max_level.)
max_leaf
an integer. Maximum number of leaf nodes in best-first search (Original name: dtree.max_nodes.)
tree_gain_ratio
a float. New tree is created when leaf-nodes gain < this value * estimated gain of creating new tree (Original name: dtree.new_tree_gain_ratio.)
min_samples_leaf
an integer or float. Minimum number of training data points in each leaf node. If an integer, then consider min_samples_leaf as the minimum number. If a float, then min_samples_leaf is a percentage and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node (Original name: dtree.min_sample.)
l1
a float. Used to control the degree of L1 regularization (Original name: dtree.lamL1.)
l2
a float. Used to control the degree of L2 regularization (Original name: dtree.lamL2.)
opt_algorithm
a character string. Either "rgf" or "epsilon-greedy". Optimization method for training forest (Original name: forest.opt.)
learning_rate
a float. Step size of epsilon-greedy boosting. Meant for being used with opt_algorithm = "epsilon-greedy" (Original name: forest.stepsize.)
max_bin
an integer or NULL. Maximum number of discretized values (bins). If NULL, 65000 is used for dense data and 200 for sparse data (Original name: discretize.(sparse/dense).max_buckets.)
min_child_weight
a float. Minimum sum of data weights for each discretized value (bin) (Original name: discretize.(sparse/dense).min_bucket_weights.)
data_l2
a float. Used to control the degree of L2 regularization for discretization (Original name: discretize.(sparse/dense).lamL2.)
sparse_max_features
an integer. Maximum number of selected features. Meant for being used with sparse data (Original name: discretize.sparse.max_features.)
sparse_min_occurences
an integer. Minimum number of occurrences for a feature to be selected. Meant for being used with sparse data (Original name: discretize.sparse.min_occrrences.)
n_jobs
an integer. The number of jobs to run in parallel for both fit and predict. If -1, all CPUs are used. If -2, all CPUs but one are used. If < -1, (n_cpus + 1 + n_jobs) are used (Original name: set.nthreads.)
verbose
an integer. Controls the verbosity of the tree building process (Original name: set.verbose.)
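For illustration only (the values below are arbitrary choices, not package defaults, and the sketch assumes the python module "rgf.sklearn" is available), several of the arguments described above can be combined as follows:

library(RGF)

init <- FastRGF_Regressor$new(
    n_estimators = 1000,               # forest.ntrees
    max_depth = 4,                     # dtree.max_level
    min_samples_leaf = 0.01,           # a float is treated as a fraction: ceil(0.01 * n_samples) per leaf
    max_bin = 200,                     # explicit number of bins instead of the NULL default
    opt_algorithm = "epsilon-greedy",  # epsilon-greedy boosting ...
    learning_rate = 0.05,              # ... with this step size
    n_jobs = -1                        # use all available CPUs
)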
clone()
The objects of this class are cloneable with this method.
FastRGF_Regressor$clone(deep = FALSE)
deep
Whether to make a deep clone.
The fit function builds a regressor from the training set (x, y).
The predict function predicts the regression target for x.
The cleanup function removes the temporary files used by this model. See the issue https://github.com/RGF-team/rgf/issues/75, which explains in which cases the cleanup function applies.
The get_params function returns the parameters of the model.
The score function returns the coefficient of determination (R^2) of the predictions.
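A minimal usage sketch of these methods (not part of the package's own examples; it assumes the python module "rgf.sklearn" is available and uses illustrative data):

if (reticulate::py_module_available("rgf.sklearn")) {

    library(RGF)

    set.seed(1)
    x = matrix(runif(100000), nrow = 100, ncol = 1000)
    y = runif(100)

    regr = FastRGF_Regressor$new(max_leaf = 50)
    regr$fit(x, y)                    # build the regressor from the training set (x, y)
    preds = regr$predict(x)           # predicted regression targets for x
    r2 = regr$score(x, y)             # coefficient of determination (R^2) of the predictions
    params = regr$get_params()        # list with the parameters of the model
    regr$cleanup()                    # remove the temporary files used by the model
}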
https://github.com/RGF-team/rgf/tree/master/python-package

Tong Zhang, FastRGF: Multi-core Implementation of Regularized Greedy Forest (https://github.com/RGF-team/rgf/tree/master/FastRGF)
try({
    if (reticulate::py_available(initialize = FALSE)) {
        if (reticulate::py_module_available("rgf.sklearn")) {

            library(RGF)

            # simulated regression data: 100 observations, 1000 features
            set.seed(1)
            x = matrix(runif(100000), nrow = 100, ncol = 1000)
            y = runif(100)

            # fit the regressor and predict on the training data
            fast_RGF_regr = FastRGF_Regressor$new(max_leaf = 50)
            fast_RGF_regr$fit(x, y)
            preds = fast_RGF_regr$predict(x)
        }
    }
}, silent = TRUE)