kernelFactory: Binary classification with Kernel Factory

Description

kernelFactory implements an ensemble method for kernel machines (Ballings and Van den Poel, 2013).

Usage

kernelFactory(x = NULL, y = NULL, cp = 1, rp = round(log(nrow(x), 10)),
              method = "burn", ntree = 500, filter = 0.01,
              popSize = rp * cp * 7, iters = 80,
              mutationChance = 1/(rp * cp),
              elitism = max(1, round((rp * cp) * 0.05)),
              oversample = TRUE)
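Several defaults in the signature are expressions that depend on the number of rows of x and on cp. As a quick worked example, the sketch below evaluates those expressions for two data sizes. The helper kf_defaults is hypothetical and not part of the package; it only repeats the arithmetic shown in the Usage signature.

```r
# Hypothetical helper (not part of kernelFactory): evaluates the default
# expressions from the Usage signature for a given number of rows.
kf_defaults <- function(n_rows, cp = 1) {
  rp <- round(log(n_rows, 10))
  list(
    rp = rp,
    popSize = rp * cp * 7,
    mutationChance = 1 / (rp * cp),
    elitism = max(1, round((rp * cp) * 0.05))
  )
}

d1000 <- kf_defaults(1000)  # rp = 3, popSize = 21, mutationChance = 1/3, elitism = 1
d50 <- kf_defaults(50)      # rp = 2, popSize = 14, mutationChance = 1/2, elitism = 1
str(d1000)
str(d50)
```

Because rp is rounded, a very small x (3 rows or fewer) would give rp = 0 under this formula, so setting rp explicitly is the safer choice for tiny data sets.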

Arguments

x

A data frame of predictors (numeric, integer or factor). Categorical variables need to be factors. Indicator variables should not be too imbalanced because this might produce constants in the subsetting process.

y

A factor containing the response. Only the values {0,1} are allowed.

cp

The number of column partitions.

rp

The number of row partitions.
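To make the two partition counts concrete, here is a minimal sketch of splitting a data frame into rp row groups and cp column groups at random. It is an illustration under assumptions, not the package's internal code: partition_blocks is a hypothetical helper, and the exact way kernelFactory assigns rows and columns to partitions is not described on this page.

```r
# Illustrative sketch only: random row/column partitioning of a data frame.
# partition_blocks() is a hypothetical helper, not a kernelFactory function.
partition_blocks <- function(x, rp, cp) {
  row_id <- sample(rep_len(seq_len(rp), nrow(x)))  # random row-group labels
  col_id <- sample(rep_len(seq_len(cp), ncol(x)))  # random column-group labels
  blocks <- list()
  for (i in seq_len(rp)) {
    for (j in seq_len(cp)) {
      blocks[[length(blocks) + 1]] <- x[row_id == i, col_id == j, drop = FALSE]
    }
  }
  blocks
}

set.seed(1)
toy <- data.frame(a = rnorm(12), b = rnorm(12), c = rnorm(12), d = rnorm(12))
blocks <- partition_blocks(toy, rp = 3, cp = 2)
length(blocks)       # rp * cp = 6 blocks
sapply(blocks, dim)  # every block has 4 rows and 2 columns
```

In this sketch, increasing rp shrinks the number of rows per block, which is why a large rp on a small or imbalanced data set can leave a block with only one class.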

method

Can be one of the following: polynomial kernel function (pol), linear kernel function (lin), radial basis kernel function (rbf), random choice among pol, lin and rbf (random), or burn-in choice of the best function among pol, lin and rbf (burn). Use random or burn if you don't know in advance which kernel function is best.

ntree

Number of trees in the Random Forest base classifiers.

filter

Either NULL (deactivate) or a proportion denoting the minimum class size of dummy predictors. This parameter is used to remove near constants. For example, if nrow(xTRAIN)=100 and filter=0.01, then all dummy predictors with any class size equal to 1 will be removed. Set this higher (e.g., 0.05 or 0.10) in case of errors.
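The arithmetic behind filter can be sketched in a few lines. The helper below is hypothetical and is not the package's internal implementation; in particular, treating the threshold as "smaller class size at most filter * nrow(x)" is an assumption chosen to match the example above (100 rows, filter=0.01, class size 1).

```r
# Hypothetical helper: flags 0/1 dummy columns whose smaller class has
# at most filter * nrow(x) observations. Not the package's internal code.
near_constant_dummies <- function(x, filter = 0.01) {
  min_size <- filter * nrow(x)
  is_dummy <- sapply(x, function(col) all(col %in% c(0, 1)))
  smallest <- sapply(x, function(col) min(sum(col == 1), sum(col == 0)))
  names(x)[is_dummy & smallest <= min_size]
}

toy <- data.frame(
  rare = c(1, rep(0, 99)),    # one class has a single observation
  common = rep(c(0, 1), 50),  # balanced dummy
  age = 21:120                # numeric predictor, not a dummy
)
flagged <- near_constant_dummies(toy, filter = 0.01)
flagged
```

Here only the column rare is flagged, because its smaller class has exactly 1 observation out of 100.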

popSize

Population size of the genetic algorithm.

iters

Number of generations of the genetic algorithm.

mutationChance

Mutation chance of the genetic algorithm.

elitism

Elitism parameter of the genetic algorithm.

oversample

Logical. Whether to oversample the smallest class. This helps avoid problems related to the subsetting procedure (e.g., if rp is too high).
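For readers unfamiliar with oversampling, the following is a minimal sketch of random oversampling of the smallest class in base R. It is an illustration only: oversample_minority is a hypothetical helper, and this page does not state whether the package balances the classes exactly or uses a different scheme.

```r
# Hypothetical helper: random oversampling of the minority class with
# replacement until both classes have the same size. Illustration only.
oversample_minority <- function(x, y) {
  counts <- table(y)
  minority <- names(counts)[which.min(counts)]
  deficit <- max(counts) - min(counts)
  if (deficit == 0) return(list(x = x, y = y))
  idx <- which(y == minority)
  # sample.int avoids the sample() pitfall when idx has length 1
  extra <- idx[sample.int(length(idx), deficit, replace = TRUE)]
  keep <- c(seq_along(y), extra)
  list(x = x[keep, , drop = FALSE], y = y[keep])
}

set.seed(42)
x <- data.frame(v = rnorm(20))
y <- factor(c(rep(0, 15), rep(1, 5)), levels = c(0, 1))
bal <- oversample_minority(x, y)
table(bal$y)  # both classes now have 15 observations
```

With more minority rows available, each row partition is less likely to end up containing a single class.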

Value

trn: Training data set.
trnlst: List of training partitions.
rbfstre: List of used kernel functions.
rbfmtrX: List of augmented kernel matrices.
rsltsKF: List of models.
cpr: Number of column partitions.
rpr: Number of row partitions.
cntr: Number of partitions.
wghts: Weights of the ensemble members.
nmDtrn: Vector indicating the numeric (and integer) features.
rngs: Ranges of numeric predictors.
constants: Constant predictors to exclude from newdata.
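As a rough illustration of how member weights such as wghts can be used, the sketch below combines the predictions of three ensemble members by a weighted average. All numbers are made up for the example, and the exact combination rule used inside the package is an assumption here rather than something this page states.

```r
# Illustration only: combine member predictions with normalized weights.
# All values below are made up for the example.
member_preds <- cbind(m1 = c(0.9, 0.2, 0.6),
                      m2 = c(0.7, 0.4, 0.5),
                      m3 = c(0.8, 0.1, 0.3))
wghts <- c(2, 1, 1)
wghts <- wghts / sum(wghts)  # normalize so the weights sum to 1
ensemble <- as.vector(member_preds %*% wghts)
ensemble  # 0.825 0.225 0.500
```

Members with larger weights pull the combined score toward their own predictions.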

References

Ballings, M. and Van den Poel, D. (2013), Kernel Factory: An Ensemble of Kernel Machines. Expert Systems With Applications, 40(8), 2904-2913.

Examples

library(kernelFactory)

#Credit Approval data available at UCI Machine Learning Repository
data(Credit)
#take a subset (for the purpose of a quick example) and split it into training and test sets
Credit <- Credit[1:100,]
train.ind <- sample(nrow(Credit),round(0.5*nrow(Credit)))

#Train Kernel Factory on training data
kFmodel <- kernelFactory(x=Credit[train.ind,names(Credit)!= "Response"],
          y=Credit[train.ind,"Response"], method="random")

#Deploy Kernel Factory to predict response for test data
#predictedresponse <- predict(kFmodel, newdata=Credit[-train.ind,names(Credit)!= "Response"])