EPPlab: Function for Exploratory Projection Pursuit.

Description

REPPlab optimizes a projection pursuit (PP) index using a Genetic Algorithm (GA) or one of two Particle Swarm Optimisation (PSO) algorithms over several runs, implemented in the Java program EPP-lab. One PSO algorithm is the classic version; the other is a parameter-free extension called Tribes. The parameters of the algorithms (maxiter and individuals for GA, maxiter and particles for PSO) can be modified by the user. The available PP indices are the well-known Friedman and Friedman-Tukey indices, the kurtosis, and a so-called discriminant index devoted to the detection of groups. At each run, the function finds a local optimum of the PP index and returns the associated projection direction and criterion value.

Usage

EPPlab(
  x,
  PPindex = "KurtosisMax",
  PPalg = "GA",
  n.simu = 20,
  sphere = FALSE,
  maxiter = NULL,
  individuals = NULL,
  particles = NULL,
  step_iter = 10,
  eps = 10^(-6)
)

Value

A list with class 'epplab' containing the following components:

PPdir: Matrix containing the PP directions as columns, see details.
PPindexVal: Vector containing the objective criterion value of each run.
PPindex: Name of the used projection index.
PPiter: Vector containing the number of iterations of each run.
PPconv: Logical vector. TRUE if the run converged, FALSE otherwise.
PPalg: Name of the used algorithm.
maxiter: Maximum number of iterations, as given in function call.
x: Matrix containing the data (centered!).
sphere: Logical, as given in the function call.
transform: The transformation matrix from the whitening or standardization step.
backtransform: The back-transformation matrix from the whitening or standardization step.
center: The mean vector of the data.

Arguments

x: Matrix where each row is an observation and each column a dimension.
PPindex: The used index, see details.
PPalg: The used algorithm, see details.
n.simu: Number of simulation runs.
sphere: Logical, sphere the data. Default is FALSE, in which case the data is only standardized.
maxiter: Maximum number of iterations.
individuals: Size of the generated population in GA.
particles: Number of generated particles in the standard PSO algorithm.
step_iter: Convergence criterion parameter, see details. (Default: 10)
eps: Convergence criterion parameter, see details. (Default: 10^(-6))

Author

Daniel Fischer, Klaus Nordhausen

Details

The function always centers the data using colMeans and divides by the standard deviation. Sphering the data is optional. If sphering is requested the function WhitenSVD is used, which automatically tries to determine the rank of the data.
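
The preprocessing described above can be sketched in base R. This is only an illustration under stated assumptions: the helper names standardize and sphere_svd are hypothetical, the sphering uses a plain SVD and assumes full column rank, and it is not the code of WhitenSVD (which also estimates the rank of the data).

```r
# Hypothetical helper: center with colMeans, then divide by the column SDs.
standardize <- function(x) {
  x <- as.matrix(x)
  center <- colMeans(x)
  xc <- sweep(x, 2, center, "-")
  sds <- apply(xc, 2, sd)
  list(x = sweep(xc, 2, sds, "/"), center = center, scale = sds)
}

# Hypothetical helper: sphere the data via SVD so that cov() becomes the identity.
# Assumes full column rank (no rank estimation is attempted here).
sphere_svd <- function(x) {
  x <- as.matrix(x)
  xc <- sweep(x, 2, colMeans(x), "-")
  s <- svd(xc)
  # xc = U D V', so W = V D^{-1} sqrt(n - 1) gives cov(xc %*% W) = I
  w <- s$v %*% diag(sqrt(nrow(xc) - 1) / s$d, nrow = length(s$d))
  list(x = xc %*% w, transform = w)
}

set.seed(1)
X <- matrix(rnorm(200), ncol = 4) %*% matrix(runif(16), 4, 4)
Z <- standardize(X)$x   # standardized only (sphere = FALSE analogue)
S <- sphere_svd(X)$x    # sphered (sphere = TRUE analogue)
```

Standardizing leaves correlations between columns intact, while sphering also removes them, which is why the two options can lead to different projection directions.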

Currently the function provides the following projection pursuit indices: KurtosisMax, Discriminant, Friedman, FriedmanTukey, KurtosisMin.

Three algorithms can be used to find the projection directions: a Genetic Algorithm (GA) and two Particle Swarm Optimisation algorithms (PSO and Tribe).

Since the algorithms might find local optima they are run several times. The function sorts the found directions according to the optimization criterion.
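
As a rough illustration of sorting several runs by their criterion value, the following base-R sketch uses random unit-length directions as a stand-in for the optimizer output. The function kurtosis_index is a hypothetical, simplified kurtosis measure and may differ from the index implemented in EPP-lab; sorting in decreasing order is an assumption that fits a maximization index.

```r
# Hypothetical kurtosis index of a 1-D projection (fourth standardized moment).
kurtosis_index <- function(x, dir) {
  z <- as.vector(x %*% dir)
  z <- z - mean(z)
  mean(z^4) / mean(z^2)^2
}

set.seed(42)
X <- scale(matrix(rnorm(300), ncol = 3))

# Stand-in for optimizer output: one random unit-length direction per run.
dirs <- apply(matrix(rnorm(3 * 5), nrow = 3), 2, function(d) d / sqrt(sum(d^2)))
vals <- apply(dirs, 2, function(d) kurtosis_index(X, d))

# Order the runs from best to worst (assumed: larger index value is better).
ord <- order(vals, decreasing = TRUE)
PPdir <- dirs[, ord, drop = FALSE]   # directions as columns, best first
PPindexVal <- vals[ord]
```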

The algorithms have different default settings. For GA: maxiter=50 and individuals=20. For PSO: maxiter=20 and particles=50. For Tribe: maxiter=20.
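
One way to picture how NULL arguments could be mapped to these algorithm-specific defaults is the small sketch below. The helper resolve_defaults is hypothetical and is not part of the package; only the default values themselves are taken from the text above.

```r
# Hypothetical helper: fill in the documented defaults when an argument is NULL.
resolve_defaults <- function(PPalg, maxiter = NULL, individuals = NULL,
                             particles = NULL) {
  defaults <- switch(PPalg,
    GA    = list(maxiter = 50, individuals = 20),
    PSO   = list(maxiter = 20, particles = 50),
    Tribe = list(maxiter = 20),
    stop("unknown PPalg: ", PPalg)
  )
  user <- list(maxiter = maxiter, individuals = individuals,
               particles = particles)
  # A user-supplied value overrides the default, but only for the
  # parameters that the chosen algorithm actually uses.
  for (nm in names(defaults)) {
    if (!is.null(user[[nm]])) defaults[[nm]] <- user[[nm]]
  }
  defaults
}

ga_settings  <- resolve_defaults("GA")
pso_settings <- resolve_defaults("PSO", maxiter = 100)
```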

For GA, the size of the generated population is fixed by the user (individuals). The algorithm is based on tournament selection with three participants. It uses a 2-point crossover with a probability of 0.65, and the mutation operator is applied to all individuals with a probability of 0.05. The termination criterion corresponds to the number of generations and is also fixed by the user (maxiter).
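
The three GA operators named above can be sketched generically in base R. This is not the EPP-lab Java implementation: all function names are hypothetical, the Gaussian perturbation used for mutation and the renormalization to unit length are assumptions, and the fitness values are a placeholder rather than a real PP index.

```r
# Tournament selection: draw k individuals at random and keep the fittest.
tournament_select <- function(pop, fitness, k = 3) {
  idx <- sample(nrow(pop), k)
  pop[idx[which.max(fitness[idx])], ]
}

# 2-point crossover: with probability p_cross, swap the segment between
# two randomly chosen cut points.
crossover_2pt <- function(a, b, p_cross = 0.65) {
  if (length(a) > 1 && runif(1) < p_cross) {
    cuts <- sort(sample(length(a), 2))
    seg <- cuts[1]:cuts[2]
    tmp <- a[seg]; a[seg] <- b[seg]; b[seg] <- tmp
  }
  list(a, b)
}

# Mutation: with probability p_mut, perturb the individual
# (Gaussian noise is an assumption), then rescale to unit length.
mutate <- function(ind, p_mut = 0.05, sd = 0.1) {
  if (runif(1) < p_mut) ind <- ind + rnorm(length(ind), sd = sd)
  ind / sqrt(sum(ind^2))
}

set.seed(7)
# Population of 20 unit-length candidate directions in 4 dimensions.
pop <- t(apply(matrix(rnorm(20 * 4), 20, 4), 1, function(d) d / sqrt(sum(d^2))))
fitness <- rowSums(pop^4)  # placeholder fitness, not a real PP index

parents <- crossover_2pt(tournament_select(pop, fitness),
                         tournament_select(pop, fitness))
children <- lapply(parents, mutate)
```

Repeating the last two steps until a new population of the same size is filled, and doing so for maxiter generations, gives the overall shape of one GA run.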

For PSO, the user can give the number of initial generated particles and also the maximum number of iterations. The other parameters are fixed following Clerc (2006) and using a "cosine" neighborhood adapted to PP for the PSO algorithm. For Tribes, only the maximum number of iterations needs to be fixed. The algorithm proposed by Cooren and Clerc (2009) and adapted to PP using a "cosine neighborhood" is used.

The algorithms stop as soon as one of the following two conditions holds: the maximum number of iterations is reached, or the relative difference between the index value at the current iteration i and the value at iteration i-step_iter is less than eps. In the latter case, the algorithm is said to have converged and EPPlab returns the number of iterations needed to attain convergence. If the maximum number of iterations is reached without convergence, the function issues a warning. The default values are 10 for step_iter and 1E-06 for eps. Note that a few non-converged runs are not necessarily a problem, and even non-converged projections might reveal some structure.
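
The stopping rule can be illustrated with a short base-R sketch applied to a recorded trace of index values. The helpers has_converged and run_until_stop are hypothetical, and normalizing the difference by the earlier value is an assumption, since the exact form of the relative difference is not specified here.

```r
# Hypothetical check: is the relative change over the last step_iter
# iterations smaller than eps? (Normalizing by the earlier value is assumed.)
has_converged <- function(values, i, step_iter = 10, eps = 1e-6) {
  if (i <= step_iter) return(FALSE)
  prev <- values[i - step_iter]
  isTRUE(abs(values[i] - prev) / abs(prev) < eps)
}

# Walk through the trace and stop at convergence or at maxiter.
run_until_stop <- function(values, maxiter, step_iter = 10, eps = 1e-6) {
  for (i in seq_len(maxiter)) {
    if (has_converged(values, i, step_iter, eps)) {
      return(list(iter = i, converged = TRUE))
    }
  }
  list(iter = maxiter, converged = FALSE)
}

# An index trace that reaches a plateau, and one that keeps increasing.
plateau <- c(1, 2, 2.5, 2.9, rep(3, 30))
res_plateau <- run_until_stop(plateau, maxiter = 34)
res_rising  <- run_until_stop(seq_len(20), maxiter = 20)
```

With a larger step_iter the index has to stay flat for longer before a run counts as converged, so raising it trades extra iterations for a lower risk of stopping too early.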

References

Larabi Marie-Sainte, S. (2011). Biologically inspired algorithms for exploratory projection pursuit. PhD thesis, University of Toulouse.

Ruiz-Gazen, A., Larabi Marie-Sainte, S. and Berro, A. (2010). Detecting multivariate outliers using projection pursuit with particle swarm optimization. COMPSTAT2010, pp. 89-98.

Berro, A., Larabi Marie-Sainte, S. and Ruiz-Gazen, A. (2010). Genetic algorithms and particle swarm optimization for exploratory projection pursuit. Annals of Mathematics and Artificial Intelligence, 60, 153-178.

Larabi Marie-Sainte, S., Berro, A. and Ruiz-Gazen, A. (2010). An efficient optimization method for revealing local optima of projection pursuit indices. Swarm Intelligence, pp. 60-71.

Clerc, M. (2006). Particle Swarm Optimization. ISTE, Wiley.

Cooren, Y., Clerc, M. and Siarry, P. (2009). Performance evaluation of TRIBES, an adaptive particle swarm optimization algorithm. Swarm Intelligence, 3(2), 149-178.

Examples

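  # Olive oil data from the tourr package, using columns 3 to 10
  library(tourr)
  data(olive)
  olivePP <- EPPlab(olive[, 3:10], PPalg = "PSO", PPindex = "KurtosisMax",
                    n.simu = 5, maxiter = 20)
  summary(olivePP)

  # Lubischew data from the amap package: first 70 rows, columns 2 to 7,
  # with row names taken from column 1
  library(amap)
  data(lubisch)
  X <- lubisch[1:70, 2:7]
  rownames(X) <- lubisch[1:70, 1]
  res <- EPPlab(X, PPalg = "PSO", PPindex = "FriedmanTukey",
                n.simu = 15, maxiter = 20, sphere = TRUE)
  print(res)
  summary(res)
  fitted(res)
  plot(res)
  pairs(res)
  # Apply the fitted object to the held-out rows 71 to 74
  predict(res, data = lubisch[71:74, 2:7])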
