ChemometricsWithR (version 0.1.13)

GA: Genetic Algorithms for variable selection in classification

Description

A set of functions implementing simple variable selection in classification applications using genetic algorithms.

Usage

GAfun(X, C, eval.fun, kmin, kmax, popsize = 20, niter = 50,
      mut.prob = 0.05, ...)
GAfun2(X, C, eval.fun, kmin, kmax, popsize = 20, niter = 50,
       mut.prob = 0.05, ...)

GA.init.pop(popsize, nvar, kmin, kmax)
GA.select(pop, number, qlts, min.qlt = 0.4, qlt.exp = 1)
GA.mut(subset, maxvar, mut.prob = 0.01)
GA.XO(subset1, subset2)

Arguments

X

Data matrix: independent variables used by eval.fun

C

Class vector, used by eval.fun

eval.fun

Evaluation function. It should take a data matrix, a class vector (or factor), and a subset argument, and return a quality measure for that subset; the search favours subsets with larger values

kmin

Minimal number of variables to retain

kmax

Maximal number of variables to retain

popsize

Size of the GA population

niter

Number of iterations

mut.prob

Mutation probability

...

Further arguments to the evaluation function

nvar

The total number of variables to choose from

pop, subset, subset1, subset2

A (part of a) population of trial solutions

number

The number of trial solutions that may produce offspring

qlts

Vector of quality measures for members in a population

min.qlt

Minimal quality of a trial solution to be considered as a future parent

qlt.exp

Quality scaling parameter: the larger this number, the more discrimination between good and bad solutions, and the more greedy the search characteristics

maxvar

Number of variables to choose from
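
As an illustration of the interface described for eval.fun, the following is a minimal sketch of a user-written evaluation function. The name loo_1nn_eval, the argument order (X, C, subset, ...) and the assumption that subset holds column indices are not taken from the package; the only requirement stated above is that the function accepts a data matrix, a class vector (or factor) and a subset. The sketch returns the leave-one-out accuracy of a 1-nearest-neighbour classifier, so larger values indicate better subsets.

```r
# Hypothetical evaluation function (not part of ChemometricsWithR).
# Assumes 'subset' is a vector of column indices into X.
loo_1nn_eval <- function(X, C, subset, ...) {
  # Pairwise Euclidean distances using only the selected variables
  D <- as.matrix(dist(X[, subset, drop = FALSE]))
  diag(D) <- Inf                       # a sample may not be its own neighbour
  nearest <- apply(D, 1, which.min)    # index of the closest other sample
  mean(C[nearest] == C)                # fraction classified correctly
}

# Usage on a built-in data set: quality of the subset {3, 4}
X <- as.matrix(iris[, 1:4])
q <- loo_1nn_eval(X, iris$Species, subset = c(3, 4))
```

A function of this shape could presumably be passed as eval.fun, provided its argument order matches what GAfun expects; this has not been checked against the package source.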

Value

Functions GAfun and GAfun2 both return a list containing the following fields:

best

The best subset

best.q

The quality of the best subset

n.iter

The number of iterations

The outcome of GAfun2 additionally contains:

qualities

A matrix containing the best, median and worst quality value throughout the optimization

Details

GAfun and GAfun2 generate a population of trial solutions, each consisting of a subset of variables to be retained. For every member of the population, the evaluation function calculates a quality measure, which determines the chance that the member creates offspring. In a process of "survival of the fittest", this leads to subsets for which the evaluation function has a maximal value.

The initialization is done randomly. Selection is simple threshold selection. Mutation swaps variables in or out of the subset; the cross-over type is uniform. Functions GA.init.pop, GA.select, GA.mut and GA.XO are auxiliary functions, not meant to be called directly by the user.
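
The four steps named above (random initialization, threshold selection, swap mutation, uniform cross-over) can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the function names init_pop, select_parents, mutate and crossover are hypothetical, and the rescaling of qualities to the range 0 to 1 before thresholding, as well as the handling of subset sizes in the cross-over, are guesses about reasonable behaviour.

```r
# Hypothetical helpers; NOT the package's GA.init.pop, GA.select, GA.mut, GA.XO.

# Random initialization: each member is a sorted subset of variable indices
# whose size is drawn between kmin and kmax.
init_pop <- function(popsize, nvar, kmin, kmax) {
  lapply(seq_len(popsize), function(i) {
    k <- if (kmin == kmax) kmin else sample(kmin:kmax, 1)
    sort(sample(nvar, k))
  })
}

# Threshold selection (assumed form): rescale qualities to [0, 1], discard
# members below min_qlt, and draw parents with weight scaled^qlt_exp.
select_parents <- function(pop, number, qlts, min_qlt = 0.4, qlt_exp = 1) {
  rng <- range(qlts)
  scaled <- if (diff(rng) > 0) (qlts - rng[1]) / diff(rng) else rep(1, length(qlts))
  ok <- which(scaled >= min_qlt)
  w <- scaled[ok]^qlt_exp
  pop[ok[sample.int(length(ok), number, replace = TRUE, prob = w)]]
}

# Swap mutation: with probability mut_prob, replace one selected variable
# by a variable that is currently not in the subset.
mutate <- function(subset, maxvar, mut_prob = 0.01) {
  candidates <- setdiff(seq_len(maxvar), subset)
  if (runif(1) < mut_prob && length(candidates) > 0) {
    out <- sample.int(length(subset), 1)
    subset[out] <- candidates[sample.int(length(candidates), 1)]
  }
  sort(subset)
}

# Uniform cross-over: variables shared by both parents are always inherited;
# every other variable is inherited with probability 0.5.
# Size limits (kmin, kmax) are not enforced in this sketch.
crossover <- function(subset1, subset2) {
  common <- intersect(subset1, subset2)
  differing <- setdiff(union(subset1, subset2), common)
  child <- sort(c(common, differing[runif(length(differing)) < 0.5]))
  if (length(child) == 0) subset1 else child   # avoid an empty subset
}

# One generation step on toy data
set.seed(1)
pop     <- init_pop(popsize = 10, nvar = 50, kmin = 3, kmax = 8)
qlts    <- seq(0.1, 1, length.out = 10)        # made-up quality values
parents <- select_parents(pop, number = 4, qlts = qlts)
child   <- mutate(crossover(parents[[1]], parents[[2]]), maxvar = 50, mut_prob = 0.05)
```

Raising qlt_exp in this sketch concentrates the selection weight on the best members, which corresponds to the greedier search behaviour described for qlt.exp.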

References

R. Wehrens. "Chemometrics with R - Multivariate Data Analysis in the Natural Sciences and Life Sciences". Springer, Heidelberg, 2011.

See Also

Evaluation, SA

Examples

# NOT RUN {
if (require("pls")) {
  data(gasoline, package = "pls")
  ## Usually more iterations are needed
  GAobj <- GAfun(gasoline$NIR, gasoline$octane,
                 eval.fun = pls.cvfun, niter = 20,
                 kmin = 3, kmax = 25, ncomp = 2)
  GAobj
} else {
  cat("Package pls not available.\nInstall it by typing 'install.packages(\"pls\")'\n")
}
# }
