
semiArtificial (version 2.4.1)

rbfDataGen: A data generator based on RBF network

Description

Using the given formula and data, the method builds an RBF network and extracts its properties, thereby preparing a data generator that can be used with the newdata.RBFgenerator method to generate semi-artificial data.

Usage

rbfDataGen(formula, data, eps=1e-4, minSupport=1, 
            nominal=c("encodeBinary","asInteger"))

Arguments

formula

A formula specifying the response and variables to be modeled.

data

A data frame with training data.

eps

The minimal probability that the data generator treats as larger than 0.

minSupport

The minimal number of instances that must define a Gaussian kernel for the kernel to be copied to the data generator.

nominal

How nominal features are treated. The option "asInteger" converts factors into integers and treats them as numeric features. The option "encodeBinary" converts each nominal attribute into a set of binary features that encode the nominal value; e.g., for a three-valued attribute, three binary attributes are constructed, each encoding the presence of one nominal value with 0 or 1.
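
The two treatments of nominal features can be illustrated with a minimal base R sketch. The toy factor colour and the variable names are hypothetical, and the code only mimics the described encodings; it is not taken from the package's internals.

```r
# Toy nominal attribute with three values (hypothetical data, not from the package)
colour <- factor(c("red", "green", "blue", "green"))

# "asInteger"-style treatment: each value is replaced by its integer level code
asInt <- as.integer(colour)

# "encodeBinary"-style treatment: one 0/1 column per nominal value
encBin <- sapply(levels(colour), function(lv) as.integer(colour == lv))

print(asInt)
print(encBin)
```

With the binary encoding, every row contains exactly one 1, so no artificial ordering is imposed on the nominal values, whereas the integer encoding implies an order that follows the factor levels.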

Value

The created model is returned as a structure of class RBFgenerator, containing the following items:

noGaussians

The number of extracted Gaussian kernels.

centers

A matrix of Gaussian kernels' centers, with one row for each Gaussian kernel.

probs

A vector of kernel probabilities. Probabilities are defined as relative frequencies of training set instances with maximal activation in the given kernel.

unitClass

A vector of class values, one for each kernel.

bias

A vector of kernels' biases, one for each kernel. The bias is multiplied by the kernel activation to produce the output value of the given RBF network unit.

spread

A matrix of estimated variances for the kernels, one row for each kernel. The j-th value in the i-th row is the variance of the j-th attribute over the training instances with maximal activation in the i-th Gaussian.

gNoActivated

A vector containing numbers of training instances with maximal activation in each kernel.

noAttr

The number of attributes in training data.

datNames

A vector of attributes' names.

originalNames

A vector of original attribute names.

attrClasses

A vector of attributes' classes (i.e., data types like numeric or factor).

attrLevels

A list of levels for discrete attributes (with class factor).

attrOrdered

A vector of type logical indicating whether the attribute is ordered (only possible for attributes of type factor).

normParameters

A list of parameters for normalization of attributes to [0,1].

noCol

The number of columns in the internally generated data set.

isDiscrete

A vector of type logical, each value indicating whether a respective attribute is discrete.

noAttrGen

The number of attributes to generate.

nominal

The value of parameter nominal.
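
The relationship between gNoActivated, probs, and spread described above can be sketched in base R. This is an illustration under stated assumptions only: the toy data, the two kernel centers, and the unit-width Gaussian activation are hypothetical and are not the values or the exact computation used by rbfDataGen.

```r
# Toy data: 6 instances, 2 numeric attributes (hypothetical values)
x <- matrix(c(0.10, 0.20,
              0.20, 0.10,
              0.15, 0.25,
              0.80, 0.90,
              0.90, 0.80,
              0.85, 0.95), ncol = 2, byrow = TRUE)

# Two hypothetical kernel centers, one row per kernel
centers <- matrix(c(0.15, 0.20,
                    0.85, 0.90), ncol = 2, byrow = TRUE)

# Gaussian activation of every instance in every kernel (unit width assumed)
activation <- sapply(seq_len(nrow(centers)), function(k) {
  exp(-rowSums(sweep(x, 2, centers[k, ])^2))
})

# Index of the kernel with maximal activation for each instance
winner <- max.col(activation, ties.method = "first")

# Number of instances with maximal activation in each kernel
gNoActivated <- tabulate(winner, nbins = nrow(centers))

# Kernel probabilities as relative frequencies
probs <- gNoActivated / sum(gNoActivated)

# Per-kernel, per-attribute variance of the instances assigned to the kernel
spread <- t(sapply(seq_len(nrow(centers)), function(k) {
  apply(x[winner == k, , drop = FALSE], 2, var)
}))
```

Here spread has one row per kernel and one column per attribute, matching the layout described for the returned structure.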

Details

Parameter formula is used as a mechanism to select features (attributes) and the prediction variable (response, class). Only simple terms can be used; interaction terms are not supported. The simplest way is to specify just the response variable, e.g., class ~ . (see the examples below).
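
What a simple formula selects can be checked with base R's model.frame before calling the generator. How rbfDataGen processes the formula internally is not stated here, so the sketch below is only an assumption-free illustration of the formula notation itself, using the built-in iris data.

```r
# Response plus all remaining columns as features
mfAll <- model.frame(Species ~ ., data = iris)

# Response plus two explicitly named features
mfSub <- model.frame(Species ~ Sepal.Length + Petal.Width, data = iris)

print(names(mfAll))
print(names(mfSub))
```

In both cases the response appears first, followed by the selected features.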

An RBF network is built using rbfDDA from the RSNNS package. The learned Gaussian kernels are extracted and used in data generation with the newdata.RBFgenerator method.
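
The idea of generating data from extracted Gaussian kernels can be sketched roughly as follows. This is not the implementation of newdata.RBFgenerator: the kernel values are hypothetical, attributes are assumed to be independent within a kernel, and details such as eps, minSupport, normalization, and nominal attributes are ignored.

```r
set.seed(42)

# Hypothetical extracted kernels: centers, per-attribute variances,
# kernel probabilities, and the class assigned to each kernel
centers <- matrix(c(0.2, 0.3,
                    0.8, 0.7), ncol = 2, byrow = TRUE)
spread <- matrix(c(0.01, 0.02,
                   0.02, 0.01), ncol = 2, byrow = TRUE)
probs <- c(0.25, 0.75)
unitClass <- c("a", "b")

n <- 100

# Pick a kernel for each new instance according to the kernel probabilities
k <- sample(seq_along(probs), size = n, replace = TRUE, prob = probs)

# Draw each attribute from a normal distribution around the kernel center
noise <- matrix(rnorm(n * ncol(centers)), nrow = n)
gen <- centers[k, , drop = FALSE] + noise * sqrt(spread[k, , drop = FALSE])

# Attach the class of the generating kernel
generated <- data.frame(gen, class = factor(unitClass[k], levels = unitClass))
summary(generated)
```

Each generated instance inherits the class of the kernel it was drawn from, so the class distribution of the new data follows the kernel probabilities.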

References

Marko Robnik-Sikonja: Not enough data? Generate it! Technical Report, University of Ljubljana, Faculty of Computer and Information Science, 2014.

Other references are available from http://lkm.fri.uni-lj.si/rmarko/papers/

See Also

newdata.RBFgenerator.

Examples

# NOT RUN {
# load the package that provides rbfDataGen and newdata
library(semiArtificial)

# use iris data set, split into training and testing, inspect the data
set.seed(12345)
train <- sample(1:nrow(iris),size=nrow(iris)*0.5)
irisTrain <- iris[train,]
irisTest <- iris[-train,]

# inspect properties of the original data
plot(irisTrain, col=irisTrain$Species)
summary(irisTrain)

# create rbf generator
irisGenerator <- rbfDataGen(Species ~ ., irisTrain)

# use the generator to create new data
irisNew <- newdata(irisGenerator, size=200)

# inspect properties of the new data
plot(irisNew, col = irisNew$Species) # plot generated data
summary(irisNew)
# }
