The CORElearn package is an R port of the CORElearn data mining system. This document briefly describes the C++ part, which can also serve as a standalone data mining system on Linux or Windows: its organization, main classes, and data structures.
The C++ part is called from R functions collected in the file Rinterface.R.
The C++ functions that are called from R and provide the interface to R are collected in Rfront.cpp and Rconvert.cpp. The front end for the standalone version is in the file frontend.cpp.
Many parts of the code come in two variants, one for classification and one for regression.
The regression variant usually has Reg somewhere in its name.
The main classes are:
marray, mmatrix
    templates for storing vectors and matrices.
dataStore
    contains data storage and data manipulation methods. Its most important members are:
    mmatrix<int> DiscData, DiscPredictData
        contain the values of discrete attributes and the class for training and (optionally) prediction.
        In classification, column 0 always stores the class values.
    mmatrix<double> ContData, ContPredictData
        contain the values of numeric attributes and the prediction values for training and (optionally) prediction.
        In regression, column 0 always stores the target values.
    marray<attribute> AttrDesc
        holds information about the attributes: their types, number of values, min, max, column index in DiscData or ContData, ...
estimation, estimationReg
    evaluate attributes for different purposes: decision/regression tree splitting, binarization,
    discretization, constructive induction, feature selection, etc. For efficiency, these classes store their own data in:
    mmatrix<int> DiscValues
        containing discrete attribute and class values,
    mmatrix<double> ContValues
        containing numeric attribute and prediction values.
Options
    stores and handles all the parameters of the system.
featureTree, regressionTree
    build all the models, predict with them, and create output.
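The storage layout described above can be illustrated with a minimal C++ sketch. It is not CORElearn code: SimpleMatrix, AttrInfo, ToyDataStore and makeToyData are hypothetical stand-ins for mmatrix, attribute and dataStore, whose real interfaces differ. The sketch shows only two conventions from the description: in classification, column 0 of the discrete matrix holds the class, and each attribute descriptor records the column where its values live. Placing the first numeric attribute in column 0 of the numeric matrix is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the mmatrix template: a dense row-major matrix.
template <typename T>
class SimpleMatrix {
public:
    SimpleMatrix(std::size_t rows, std::size_t cols, T init = T())
        : rows_(rows), cols_(cols), data_(rows * cols, init) {}
    T& operator()(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }
    const T& operator()(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
private:
    std::size_t rows_, cols_;
    std::vector<T> data_;
};

// Hypothetical, reduced attribute descriptor (the real one also keeps
// number of values, min, max, and more).
struct AttrInfo {
    std::string name;
    bool continuous;     // true: values are in the numeric matrix
    std::size_t column;  // column index in the discrete or numeric matrix
};

// Hypothetical miniature of a classification data store.
struct ToyDataStore {
    SimpleMatrix<int> disc;      // column 0 = class, further columns = discrete attributes
    SimpleMatrix<double> cont;   // numeric attributes (column 0 used here by assumption)
    std::vector<AttrInfo> attrs; // one descriptor per attribute, class excluded

    // Class value of an instance: always column 0 of the discrete matrix.
    int classOf(std::size_t row) const { return disc(row, 0); }

    // Value of an attribute, looked up through its descriptor.
    double valueOf(std::size_t row, std::size_t attrIdx) const {
        const AttrInfo& a = attrs[attrIdx];
        return a.continuous ? cont(row, a.column)
                            : static_cast<double>(disc(row, a.column));
    }
};

// Builds three instances with one discrete and one numeric attribute.
ToyDataStore makeToyData() {
    ToyDataStore d{SimpleMatrix<int>(3, 2), SimpleMatrix<double>(3, 1), {}};
    d.attrs.push_back({"colour", false, 1});  // discrete, column 1 of disc
    d.attrs.push_back({"height", true, 0});   // numeric, column 0 of cont
    const int cls[3] = {1, 2, 1};
    const int colour[3] = {2, 1, 3};
    const double height[3] = {1.5, 0.7, 2.1};
    for (std::size_t i = 0; i < 3; ++i) {
        d.disc(i, 0) = cls[i];     // class value goes to column 0
        d.disc(i, 1) = colour[i];
        d.cont(i, 0) = height[i];
    }
    return d;
}
```

With this layout, code that needs the class reads column 0 directly, while code that works with attributes goes through the descriptor and never has to know which matrix holds a given attribute.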
See also: CORElearn, CoreModel, predict.CoreModel, modelEval, attrEval, ordEval, plot.ordEval, helpCore, paramCoreIO, infoCore, versionCore.