Optimization of sample configurations using spatial simulated annealing
spsann is a package for the optimization of spatial sample configurations using spatial simulated annealing. It includes multiple objective functions to optimize spatial sample configurations for various purposes such as variogram estimation, spatial trend estimation, and spatial interpolation. Most of the objective functions were designed to optimize spatial sample configurations when a) multiple spatial variables must be modelled, b) we know very little about the model of spatial variation of those variables, and c) sampling is limited to a single phase.
Spatial simulated annealing is a well-known method that is widely used to solve combinatorial optimization problems in the environmental sciences. This is mainly due to its robustness against local optima and its ease of implementation. In short, the algorithm randomly changes the spatial location of one candidate sampling point at a time and evaluates whether the resulting spatial sample configuration is better than the previous one with regard to the chosen quality criterion, i.e. an objective function. Sometimes a worse spatial sample configuration is accepted so that the algorithm is able to escape from local optima, i.e. those spatial sample configurations that are too good and appear too early in the optimization to be true. The chance of accepting a worse spatial sample configuration decreases as the optimization proceeds so that we can get very close to the optimum spatial sample configuration.
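The loop described above can be sketched in a few lines. The following Python snippet is only a language-neutral illustration and is not spsann code: the function names (mssd, anneal), the acceptance rule exp(-delta / temp), the geometric cooling factor, and the way a point is moved (to any free candidate location) are all assumptions chosen for brevity. The criterion being minimized is the mean squared shortest distance between prediction points and sample points.

```python
import math
import random


def mssd(sample, grid):
    """Mean squared shortest distance from each prediction point to its nearest sample point."""
    total = 0.0
    for gx, gy in grid:
        total += min((gx - sx) ** 2 + (gy - sy) ** 2 for sx, sy in sample)
    return total / len(grid)


def anneal(candidates, grid, n_points, n_iter=2000, temp0=1.0, cooling=0.995, seed=42):
    """Illustrative spatial simulated annealing (hypothetical parameters, not spsann's)."""
    rng = random.Random(seed)
    current = rng.sample(candidates, n_points)  # random starting configuration
    current_value = mssd(current, grid)
    best, best_value = list(current), current_value
    temp = temp0
    for _ in range(n_iter):
        # Move one randomly chosen sampling point to a random unoccupied candidate location.
        proposal = list(current)
        free = [c for c in candidates if c not in current]
        proposal[rng.randrange(n_points)] = rng.choice(free)
        value = mssd(proposal, grid)
        delta = value - current_value
        # Improvements are always accepted; a worse configuration is accepted
        # with a probability that shrinks as the temperature decreases.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_value = proposal, value
            if value < best_value:
                best, best_value = list(proposal), value
        temp *= cooling  # geometric cooling (an assumption)
    return best, best_value


# Usage: place 5 sampling points on a 10 x 10 grid of prediction points.
grid = [(x, y) for x in range(10) for y in range(10)]
best, value = anneal(grid, grid, n_points=5)
print(len(best), round(value, 3))
```

Keeping track of the best configuration seen so far is a design choice of this sketch: because worse configurations are sometimes accepted, the last configuration visited is not necessarily the best one.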
spsann also combines multiple objective functions so that spatial sample configurations can be optimized regarding more than one modelling objective. Combining multiple objective functions gives rise to a multi-objective combinatorial optimization problem (MOCOP). A MOCOP usually has multiple possible solutions. spsann finds a single solution by aggregating the objective functions using the weighted-sum method. With this method, the relative importance of every objective function can be specified at the beginning of the optimization so that their relative influence on the resulting optimized spatial sample configuration can be different. However, this requires that the objective functions first be scaled to approximately the same range of values. The upper-lower bound approach is used to that end. In this approach, every objective function is scaled using as reference the respective minimum and maximum attainable objective function values, also known as the Pareto minimum and maximum.
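As a rough illustration of the scaling and aggregation just described, the Python sketch below maps each objective function value to the range between its lower and upper bound and then takes the weighted sum. It is not spsann code: the function names (scale, weighted_sum), the dictionary layout, the example numbers, and the requirement that the weights sum to one are assumptions made for this example.

```python
def scale(value, lower, upper):
    """Upper-lower bound scaling: 0 at the minimum attainable value, 1 at the maximum."""
    return (value - lower) / (upper - lower)


def weighted_sum(values, bounds, weights):
    """Aggregate scaled objective function values into a single value.

    values  -- objective function values, keyed by objective name
    bounds  -- (lower, upper) attainable values per objective
    weights -- relative importance per objective (assumed here to sum to 1)
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[k] * scale(values[k], *bounds[k]) for k in values)


# Usage with made-up numbers for two objectives of very different magnitude.
values = {"MSSD": 3.0, "CORR": 0.4}
bounds = {"MSSD": (1.0, 5.0), "CORR": (0.0, 1.0)}
weights = {"MSSD": 0.5, "CORR": 0.5}
print(weighted_sum(values, bounds, weights))
```

Without the scaling step, the objective with the larger numerical range would dominate the sum regardless of the weights assigned by the user.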
spsann has a very simple structure composed of three families of functions. The first is the family of optim functions. These are the functions that include the spatial simulated annealing algorithm, that is, the functions that perform the optimization regarding the chosen quality criterion (objective function). Every optim function is named after the objective function used as quality criterion. For example, the quality criterion used by optimMSSD is the mean squared shortest distance (MSSD) between sample and prediction points. As the example shows, the name of the optim functions is composed of the string 'optim' followed by a suffix that indicates the respective objective function. In the example this is 'MSSD'.
There are currently nine functions in the optim family: optimACDC, optimCLHS, optimCORR, optimDIST, optimMSSD, optimMKV, optimPPL, optimSPAN, and optimUSER. The latter is a general-purpose function that enables users to define their own objective function and plug it into the spatial simulated annealing algorithm.
The second family of functions is the obj family. This family of functions is used to return the current objective function value of a spatial sample configuration. Like the family of optim functions, the name of the obj functions is composed of the string 'obj' plus a suffix that indicates the objective function being used. For example, objMSSD computes the value of the mean squared shortest distance between sample and prediction points of any spatial sample configuration. Accordingly, there is an obj function for every optim function, except for optimUSER. A ninth obj function, objSPSANN, returns the objective function value at any point of the optimization, irrespective of the objective function used.
The third family of functions implemented in spsann corresponds to a set of auxiliary functions. These auxiliary functions can be used for several purposes, such as organizing the information needed to feed an optim function, retrieving information from an object of class OptimizedSampleConfiguration, i.e. an object containing an optimized sample configuration, generating plots of the spatial distribution of an optimized sample configuration, and so on. These functions are named after the purpose for which they have been designed. For example: countPPL, minmaxPareto, scheduleSPSANN, spJitter, and plot.
Although spsann functions are classified into three general families defined according to the purpose for which they were designed, the documentation is organized by objective function. For example, every spsann function that uses the MSSD as quality criterion is documented on the same documentation page. The exception is the auxiliary functions, which are generally documented separately.
spsann was initially developed as part of the PhD research project entitled ‘Contribution to the Construction of Models for Predicting Soil Properties’, developed by Alessandro Samuel-Rosa under the supervision of Lúcia Helena Cunha dos Anjos lanjos@ufrrj.br (Universidade Federal Rural do Rio de Janeiro, Brazil), Gustavo de Mattos Vasques gustavo.vasques@embrapa.br (Embrapa Solos, Brazil), and Gerard B. M. Heuvelink gerard.heuvelink@wur.nl (ISRIC – World Soil Information, the Netherlands). The project was supported from March/2012 to February/2016 by the CAPES Foundation, Ministry of Education of Brazil, and the CNPq Foundation, Ministry of Science and Technology of Brazil.
Some of the solutions used to build spsann were found in the source code of other R-packages and scripts developed and published by other researchers. For example, the original skeleton of the optimization functions was adopted from the intamapInteractive package with the approval of the package authors, Edzer Pebesma edzer.pebesma@uni-muenster.de and Jon Skoien jon.skoien@gmail.com. The current skeleton is based on the later adoption of several solutions implemented in the script developed and published by Murray Lark mlark@bgs.ac.uk as part of a short course (‘Computational tools to optimize spatial sampling’) offered for the first time at the 2015 EGU General Assembly in Vienna, Austria.
A few small solutions were adopted from the packages SpatialTools, authored by Joshua French joshua.french@ucdenver.edu, clhs, authored by Pierre Roudier roudierp@landcareresearch.co.nz, and spcosa, authored by Dennis Walvoort dennis.Walvoort@wur.nl, Dick Brus dick.brus@wur.nl, and Jaap de Gruijter Jaap.degruijter@wur.nl.
Major conceptual contributions were made by Gerard Heuvelink gerard.heuvelink@wur.nl, Dick Brus dick.brus@wur.nl, Murray Lark mlark@bgs.ac.uk, and Edzer Pebesma edzer.pebesma@uni-muenster.de.