semiArtificial (version 2.4.1)
Generator of Semi-Artificial Data
Description
Contains methods to generate and evaluate semi-artificial data sets.
Based on a given data set, different methods learn the data properties using machine learning algorithms and
generate new data with the same properties.
The package currently includes the following data generators:
i) an RBF network-based generator using rbfDDA() from package 'RSNNS',
ii) a random forest-based generator for both classification and regression problems,
iii) a density forest-based generator for unsupervised data.
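The generators share one idea: fit a model that summarises where the data lie, then sample new instances from that model. The Python/NumPy sketch below illustrates the RBF case only, and it is not the package's R implementation. It assumes a network has already been trained (the rbfDDA() fitting step is not reproduced). It also assumes each Gaussian unit is described by a centre, an isotropic width, a weight (for example, how many training instances it covers) and a class label. The function name `sample_from_gaussian_units` and this parameterisation are hypothetical. Each new instance picks a unit in proportion to its weight and adds Gaussian noise around that unit's centre.

```python
import numpy as np


def sample_from_gaussian_units(centers, widths, weights, labels, n, rng=None):
    """Draw n synthetic (x, y) pairs from a weighted set of Gaussian units.

    Hypothetical illustration, not the semiArtificial API.
    centers: (k, d) unit centres
    widths:  (k,) per-unit standard deviations (isotropic Gaussians assumed)
    weights: (k,) non-negative unit weights, normalised to probabilities
    labels:  (k,) class label attached to each unit
    """
    rng = np.random.default_rng(rng)
    centers = np.asarray(centers, dtype=float)
    widths = np.asarray(widths, dtype=float)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    # Choose a unit for every new instance, proportionally to its weight.
    idx = rng.choice(len(p), size=n, p=p)
    # Perturb the chosen centre with Gaussian noise scaled by that unit's width.
    noise = rng.standard_normal((n, centers.shape[1])) * widths[idx, None]
    X = centers[idx] + noise
    # Each generated instance inherits the label of the unit it came from.
    y = np.asarray(labels)[idx]
    return X, y


# Two made-up units: class "a" around (0, 0), class "b" around (5, 5).
X, y = sample_from_gaussian_units(
    centers=[[0.0, 0.0], [5.0, 5.0]],
    widths=[0.5, 1.0],
    weights=[30, 10],
    labels=["a", "b"],
    n=1000,
    rng=0,
)
```

Sampling unit by unit keeps the class-conditional structure. About three quarters of the generated instances come from the heavier unit and carry its label.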
Data evaluation support tools include:
a) single-attribute statistical evaluation: mean, median, standard deviation, skewness, kurtosis, medcouple, L/RMC (left and right medcouple), Kolmogorov-Smirnov (KS) test, and Hellinger distance,
b) clustering-based evaluation using the Adjusted Rand Index (ARI) and the Fowlkes-Mallows (FM) index,
c) evaluation based on classification performance with various learning models, e.g., random forests.
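To make items a) and b) concrete, the following Python/NumPy sketch computes three of the listed measures from their standard definitions. It is not the package's R code, and the function names are hypothetical. The Hellinger distance is computed on a shared histogram of one attribute, with 1/√2 scaling so that it lies in [0, 1]. How the package discretises continuous attributes is an assumption here. ARI and FM are computed from the usual pair counts of the contingency table between two partitions of the same set of instances. How the package forms those partitions is not reproduced.

```python
import numpy as np


def hellinger_distance(a, b, bins=20):
    """Hellinger distance between binned distributions of two samples.

    Both samples share the same bin edges (an assumed discretisation);
    the 1/sqrt(2) factor keeps the result in [0, 1], 0 = identical histograms.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    p = np.histogram(a, bins=edges)[0] / a.size
    q = np.histogram(b, bins=edges)[0] / b.size
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def _pair_counts(labels_a, labels_b):
    """Pair-counting sums from the contingency table of two partitions."""
    _, a = np.unique(labels_a, return_inverse=True)
    _, b = np.unique(labels_b, return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=np.int64)
    np.add.at(table, (a, b), 1)  # instances per (cluster in A, cluster in B)

    def pairs(x):
        return float((x * (x - 1) / 2.0).sum())

    together_both = pairs(table)           # pairs grouped together in A and B
    together_a = pairs(table.sum(axis=1))  # pairs grouped together in A
    together_b = pairs(table.sum(axis=0))  # pairs grouped together in B
    all_pairs = len(a) * (len(a) - 1) / 2.0
    return together_both, together_a, together_b, all_pairs


def adjusted_rand_index(labels_a, labels_b):
    both, in_a, in_b, total = _pair_counts(labels_a, labels_b)
    if total == 0:
        return 1.0  # fewer than two instances: nothing to compare
    expected = in_a * in_b / total  # pair agreement expected by chance
    max_index = 0.5 * (in_a + in_b)
    if max_index == expected:
        return 1.0  # both partitions trivial (one cluster, or all singletons)
    return float((both - expected) / (max_index - expected))


def fowlkes_mallows(labels_a, labels_b):
    both, in_a, in_b, _ = _pair_counts(labels_a, labels_b)
    if in_a == 0 or in_b == 0:
        return 0.0  # one partition has no within-cluster pairs
    return float(both / np.sqrt(in_a * in_b))


rng = np.random.default_rng(0)
original = rng.normal(0.0, 1.0, 500)   # made-up "original" attribute
generated = rng.normal(0.1, 1.0, 500)  # made-up "generated" attribute
print(hellinger_distance(original, generated))  # small value close to 0
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same partition, renamed labels
print(fowlkes_mallows([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Because ARI is corrected for chance, it is 0 in expectation for random partitions and can be negative. FM, by contrast, always lies in [0, 1].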