Find optimal subset of a fixed size k
from a finite sequence of length n
.
A distributed multiple-population genetic algorithm (GA) is used to do
subset selection based on the maximization of a user-supplied fitness function.
FindOptimalSubset(
n,
k,
Fitness,
...,
popSize = 100,
numIslands = 4,
migrationRate = 0.1,
migrationInterval = 10,
pcrossover = 0.8,
pmutation = 0.1,
elitism = max(1, round(popSize/numIslands * 0.05)),
maxiter = 1000,
run = maxiter,
suggestions = NULL,
parallel = TRUE,
monitor = NULL,
seed = NULL
)
'integer' count.
Maximum permissible index, that is, the length of the finite sequence (1:n
).
The GA chooses a subset from this sequence.
'integer' count. Number of indices to choose, that is, the fixed size of the subset.
'function'.
Fitness function, also known as the objective function, is any allowable R function which
takes as its first argument the binary string
representing a potential solution.
And as its second argument the maximum permissible index, n
.
Use the DecodeChromosome(string, n)
command to decode the binary string
.
The fitness function returns a single numerical value describing its fitness.
Recall that the GA searches for a maximum fitness value.
Additional arguments to be passed to the fitness function.
'integer' count. Population size that is distributed evenly between islands.
'integer' count. Number of islands
'numeric' number. Proportion of individuals that should migrate between islands.
'integer' count. Number of generations at which exchange of individuals (or migration) takes place. This interval between migrations is called an epoch.
'numeric' number. Probability of crossover between pairs of chromosomes.
'numeric' number. Probability of mutation in a parent chromosome.
'integer' count. Number of chromosomes to survive into the next generation. Defaults to 5-percent of the island population.
'integer' count. Maximum number of generations to run on each island before the GA search is halted.
'integer' count. Number of consecutive generations without any improvement in the “best” fitness value before the GA is stopped.
integer 'matrix'.
Integer chromosomes to be included in the initial population.
See returned solution
component for a suggested value for this argument.
'logical' flag or 'integer' count. Whether to use parallel computing. This argument can also be used to specify the number of cores to employ; by default, this is the number of physical CPUs/cores. The parallel and doParallel packages must be installed for parallel computing to work.
'function'.
Function that takes as input the current state of the gaisl-class
object,
and is run at each epoch of the islands GA search.
'integer' count. Random number generator state for random number generation, used to replicate the results. The doRNG package must be installed if using parallel computing.
A 'list' with components:
call
original call which can be used for later re-use.
solution
a 'matrix' representation of the best solution found.
Each row represents a unique solution giving the best fitness at the final generation.
More than one row indicates a non-unique solution.
The number of columns is equal to the subset size k
.
ga_output
output from the ISLPGAs,
see gaisl-class
for format description.
ga_time
time required to run the ISLPGAs,
see system.time
for details.
The fitness function (see Fitness
argument) is solved using
the gaisl
function in the GA package (Scrucca, 2013, 2016).
The function implements an islands evolution model (first proposed by Cohoon and others, 1987).
to maximize a fitness function using islands parallel genetic algorithms (ISLPGAs)
(Luke, 2013, p. 103-104; Scrucca, 2016, p. 197-200).
Independent GAs are configured to use integer chromosomes
represented with a binary codification, linear-rank selection,
uniform crossover, and uniform mutation.
Cohoon, J.P., Hegde, S.U., Martin, W.N., and Richards, D., 1987, Punctuated Equilibria: A Parallel Genetic Algorithm, in Genetic Algorithms and their Applications: Proceedings of the Second International Conference on Genetic Algorithms, Grefenstette, J.J., Lawrence Earlbaum Associates, p. 155-161.
Luke, Sean, 2015, Essentials of metaheuristics (2nd ed.): Lulu, 263 p., available for free at https://cs.gmu.edu/~sean/book/metaheuristics/.
Scrucca, Luca, 2013, GA: A Package for Genetic Algorithms in R: Journal of Statistical Software, v. 53, no. 4, p. 1-37, 10.18637/jss.v053.i04.
Scrucca, Luca, 2017, On some extensions to GA package: hybrid optimisation, parallelisation and islands evolution: The R Journal, v. 9, no. 1, p. 187-206, https://journal.r-project.org/archive/2017/RJ-2017-008/.
# NOT RUN {
# Problem: Choose the 4 smallest numbers from a list
# of 100 values generated from a standard
# uniform distribution.
k <- 4
n <- 100
seed <- 123
set.seed(seed); numbers <- sort.int(runif(n))
Fitness <- function(string, n, numbers) {
idxs <- DecodeChromosome(string, n)
-1 * sum(numbers[idxs])
}
# }
# NOT RUN {
out <- FindOptimalSubset(n, k, Fitness, numbers, run = 10,
monitor = GA::gaislMonitor, seed = seed)
plot(out[["ga_output"]])
summary(out[["ga_output"]])
print(out[["solution"]])
print(out[["ga_output"]]@fitnessValue)
# }
# NOT RUN {
# }
Run the code above in your browser using DataLab