
semiArtificial (version 2.4.1)

performanceCompare: Evaluate similarity of two data sets based on predictive performance

Description

Depending on the type of problem (classification or regression), a classification performance measure (accuracy, AUC, brierScore, etc.) or a regression performance measure (RMSE, MSE, MAE, RMAE, etc.) computed on the two data sets is used to assess their similarity.

Usage

performanceCompare(data1, data2, formula, model="rf", stat=NULL, ...)

Arguments

data1

A data.frame containing the reference data.

data2

A data.frame with the same number of columns, and the same column names, as data1.

formula

A formula specifying the response and predictor variables.

model

A predictive model used for the performance comparison. The default value "rf" stands for random forest, but any classification or regression model supported by the function CoreModel in the CORElearn package can be used.

stat

The statistic used as the performance indicator. The default value NULL means that "accuracy" is used for classification and "RMSE" (relative mean squared error) for regression. Any other statistic supported and output by modelEval from the CORElearn package can be used, e.g., "AUC" or "brierScore" (see the example after this list).

...

Additional parameters passed to the CoreModel function.
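
For a regression problem, a call might look like the following sketch. The data frames d1 and d2 and the response y are hypothetical placeholders; "regTree" and "MAE" are values documented for CORElearn's CoreModel and modelEval, respectively.

# hypothetical regression call; d1, d2 and y are placeholder names
performanceCompare(d1, d2, y ~ ., model = "regTree", stat = "MAE")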

Value

The function returns a list of performance indicators computed on both data sets:

diff.m1

The difference between the performance of the model built on data1 when evaluated on data1 and when evaluated on data2.

diff.m2

The difference between the performance of the model built on data2 when evaluated on data1 and when evaluated on data2.

perf.m1d1

The performance of the model built on data1, evaluated on data1.

perf.m1d2

The performance of the model built on data1, evaluated on data2.

perf.m2d1

The performance of the model built on data2, evaluated on data1.

perf.m2d2

The performance of the model built on data2, evaluated on data2.
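
The returned components can be inspected individually; the relationships sketched in the comments below are assumptions based on the component names, not taken from this page.

res <- performanceCompare(iris, irisNew, Species ~ .)
res$perf.m1d1   # model from data1 evaluated on data1
res$perf.m1d2   # model from data1 evaluated on data2
res$diff.m1     # assumption: the gap between the two values above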

Details

The function compares the data stored in data1 with data2 by building models on data1 and evaluating them on both data1 and data2, and likewise building models on data2 and evaluating them on both data sets. The differences between these performances are indicative of the similarity of the two data sets when they are used in machine learning and data mining. The performance indicator used is determined by the parameter stat.
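
A minimal sketch of the underlying cross-evaluation idea, assuming the CoreModel, predict, and modelEval interface of CORElearn (classification case with "accuracy"); this is an illustration of the idea, not the package's exact implementation:

library(CORElearn)

# build a model on trainData and measure its performance on testData
crossPerf <- function(trainData, testData, formula) {
  m <- CoreModel(formula, trainData, model = "rf")
  p <- predict(m, testData, type = "both")   # predicted classes and probabilities
  response <- all.vars(formula)[1]           # name of the response variable
  ev <- modelEval(m, correctClass = testData[[response]],
                  predictedClass = p$class, predictedProb = p$probabilities)
  destroyModels(m)                           # release the underlying native model
  ev$accuracy
}

# the four perf.* components then correspond to
# crossPerf(data1, data1, f), crossPerf(data1, data2, f),
# crossPerf(data2, data1, f), crossPerf(data2, data2, f)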

See Also

newdata.RBFgenerator.

Examples

library(semiArtificial)

# create an RBF generator for the iris data set
irisGenerator <- rbfDataGen(Species ~ ., iris)

# use the generator to create new data
irisNew <- newdata(irisGenerator, size = 200)

# compare predictive performance on the original and the generated data
performanceCompare(iris, irisNew, Species ~ .)
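
# a hedged variation: use a different model and statistic; "tree" and
# "brierScore" are values documented for CORElearn's CoreModel and modelEval
performanceCompare(iris, irisNew, Species ~ ., model = "tree", stat = "brierScore")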
