Optimize a sample configuration for spatial trend identification and estimation. A criterion is defined so that the sample reproduces the marginal distribution of the covariates (DIST).
optimDIST(points, candi, covars, strata.type = "area",
  use.coords = FALSE, schedule = scheduleSPSANN(), plotit = FALSE,
  track = FALSE, boundary, progress = "txt", verbose = FALSE)

objDIST(points, candi, covars, strata.type = "area",
  use.coords = FALSE)
points: Integer value, integer vector, data frame or matrix, or list.
Integer value. The number of points. These points are randomly sampled from candi to form the starting sample configuration.
Integer vector. The row indexes of candi that correspond to the points that form the starting sample configuration. The length of the vector defines the number of points.
Data frame or matrix. An object with three columns in the following order: [, "id"], the row indexes of candi that correspond to each point, [, "x"], the projected x-coordinates, and [, "y"], the projected y-coordinates.
List. An object with two named sub-arguments: fixed, a data frame or matrix with the projected x- and y-coordinates of the existing sample configuration (kept fixed during the optimization), and free, an integer value defining the number of points that should be added to the existing sample configuration (free to move during the optimization).
candi: Data frame or matrix with the candidate locations for the jittered points. candi must have two columns in the following order: [, "x"], the projected x-coordinates, and [, "y"], the projected y-coordinates.
covars: Data frame or matrix with the covariates in the columns.
strata.type: (Optional) Character value setting the type of stratification used to create the marginal sampling strata (or factor levels) for the numeric covariates. Available options are "area", for equal-area, and "range", for equal-range. Defaults to strata.type = "area".
use.coords: (Optional) Logical value. Should the spatial x- and y-coordinates be used as covariates? Defaults to use.coords = FALSE.
schedule: List with 11 named sub-arguments defining the control parameters of the cooling schedule. See scheduleSPSANN.
plotit: (Optional) Logical for plotting the optimization results, including a) the progress of the objective function, and b) the starting (gray circles) and current (black dots) sample configurations, and the maximum jitter in the x- and y-coordinates. The plots are updated every 10 jitters. When adding points to an existing sample configuration, fixed points are indicated with black crosses. Defaults to plotit = FALSE.
track: (Optional) Logical value. Should the evolution of the energy state be recorded and returned along with the result? If track = FALSE (the default), only the starting and ending energy states are returned along with the results.
boundary: (Optional) SpatialPolygon defining the boundary of the spatial domain. If missing and plotit = TRUE, boundary is estimated from candi.
progress: (Optional) Type of progress bar to use, with options "txt", for a text progress bar in the R console, "tk", to put up a Tk progress bar widget, and NULL to omit the progress bar. A Tk progress bar widget is useful when using parallel processors. Defaults to progress = "txt".
verbose: (Optional) Logical for printing messages about the progress of the optimization. Defaults to verbose = FALSE.
optimDIST returns an object of class OptimizedSampleConfiguration: the optimized sample configuration with details about the optimization.
objDIST returns a numeric value: the energy state of the sample configuration, that is, the objective function value.
Details about the mechanism used to generate a new sample configuration from the current one by randomly perturbing the coordinates of a sample point are available in the help page of spJitter.
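To give a feel for this kind of perturbation, here is a deliberately simplified, hypothetical sketch in base R. It is not spJitter: it ignores x.min, y.min and the cooling schedule, and simply moves one randomly chosen sample point to another unsampled candidate location within x.max and y.max of its current position. The function name jitter_one is an assumption made for this example.

```r
# Hypothetical, simplified perturbation step (NOT the spJitter implementation):
# move one randomly chosen sample point to another candidate location that is
# no farther than x.max / y.max away along each axis and not already sampled.
jitter_one <- function(idx, candi, x.max, y.max) {
  i <- sample.int(length(idx), 1L)          # which sample point to move
  p <- candi[idx[i], ]                      # its current coordinates
  ok <- abs(candi$x - p$x) <= x.max &
        abs(candi$y - p$y) <= y.max
  ok[idx] <- FALSE                          # exclude currently sampled rows
  pool <- which(ok)
  if (length(pool) == 0L) return(idx)       # nowhere to move: keep as is
  idx[i] <- pool[sample.int(length(pool), 1L)]
  idx
}

set.seed(1)
candi <- expand.grid(x = seq(0, 400, by = 100), y = seq(0, 300, by = 100))
old <- c(1L, 7L, 12L, 20L)                  # row indexes of candi
new <- jitter_one(old, candi, x.max = 100, y.max = 100)
```

Indexing into pool with sample.int avoids the sample() pitfall where a single-element vector would be treated as a range 1:n.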
Reproducing the marginal distribution of the numeric covariates depends upon the definition of marginal sampling strata. These marginal sampling strata are also used to define the factor levels of all numeric covariates that are passed together with factor covariates. Two types of marginal sampling strata can be used: equal-area and equal-range.
Equal-area marginal sampling strata are defined using the sample quantiles estimated with quantile using a discontinuous function (type = 3). Using a discontinuous function avoids creating breakpoints that do not occur in the population of existing covariate values. Depending on the level of discretization of the covariate values, quantile can produce repeated breakpoints: a breakpoint is repeated if that value has a relatively high frequency in the population of covariate values. The number of repeated breakpoints increases with the number of marginal sampling strata. Repeated breakpoints result in empty marginal sampling strata. To avoid this, only the unique breakpoints are used.
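The equal-area rule can be sketched in base R as follows, assuming a hypothetical number of strata n_strata (this page does not state how it is chosen). The example covariate is coarsely discretized, so quantile(type = 3) returns the value 2 three times, and only the unique breakpoints are kept. The function name equal_area_breaks is hypothetical.

```r
# Equal-area breakpoints for one numeric covariate (illustrative sketch)
equal_area_breaks <- function(x, n_strata) {
  probs <- seq(0, 1, length.out = n_strata + 1)
  # type = 3 is a discontinuous sample quantile, so every breakpoint is an
  # observed value of x
  br <- quantile(x, probs = probs, type = 3, names = FALSE)
  unique(br)  # repeated breakpoints would create empty strata: drop them
}

# A coarsely discretized covariate: the value 2 is very frequent
x <- c(1, 2, 2, 2, 2, 2, 2, 3, 4, 5)
br <- equal_area_breaks(x, n_strata = 5)    # 1, 2, 3, 5
strata <- cut(x, breaks = br, include.lowest = TRUE)
table(strata)  # 7, 1 and 2 values in [1,2], (2,3] and (3,5]
```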
Equal-range marginal sampling strata are defined by breaking the range of covariate values into pieces of equal size. Depending on the level of discretization of the covariate values, this method can create breakpoints that do not occur in the population of existing covariate values. Such breakpoints are replaced with the nearest existing covariate value, identified using Euclidean distances.
Like the equal-area method, the equal-range method can produce empty marginal sampling strata. The solution used here is to merge any empty marginal sampling stratum with the closest non-empty marginal sampling stratum, which is also identified using Euclidean distances.
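A comparable base R sketch of the equal-range rule: cut the range into pieces of equal size, then snap each breakpoint to the nearest observed value (in one dimension, the Euclidean distance is the absolute difference). In this simplified sketch, empty strata disappear only because breakpoints that coincide after snapping are dropped; the explicit merge with the closest non-empty stratum described above is not reproduced. The function name equal_range_breaks is hypothetical.

```r
# Equal-range breakpoints for one numeric covariate (illustrative sketch)
equal_range_breaks <- function(x, n_strata) {
  br <- seq(min(x), max(x), length.out = n_strata + 1)
  ux <- sort(unique(x))
  # Replace each breakpoint with the nearest observed covariate value
  br <- vapply(br, function(b) ux[which.min(abs(ux - b))], numeric(1))
  unique(br)  # snapped breakpoints can coincide: keep one of each
}

x <- c(0, 1, 1, 2, 9, 10)
br <- equal_range_breaks(x, n_strata = 4)   # 0, 2, 9, 10
table(cut(x, breaks = br, include.lowest = TRUE))  # 4, 1 and 1 values
```

Without snapping, the breakpoints 0, 2.5, 5, 7.5 and 10 would leave the two middle strata empty for these values.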
The approaches used to define the marginal sampling strata result in each numeric covariate having a different number of marginal sampling strata, some of them with different area/size. Because the goal is to have a sample that reproduces the marginal distribution of the covariate, each marginal sampling stratum may receive a different number of sample points. The wanted distribution of the number of sample points per marginal sampling stratum is estimated empirically as the proportion of points in the population of existing covariate values that fall in each marginal sampling stratum.
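The wanted distribution itself is simple arithmetic: count the population values that fall in each marginal sampling stratum and divide by the population size. The sketch below uses equal-area style breakpoints for a small example population and assumes a hypothetical sample of 10 points (n_points); it does not reproduce how objDIST turns the comparison between the sample and the wanted distribution into an energy value.

```r
# Wanted distribution of sample points over the marginal strata of one
# covariate: the population proportion in each stratum (sketch)
x <- c(1, 2, 2, 2, 2, 2, 2, 3, 4, 5)       # population of covariate values
br <- c(1, 2, 3, 5)                         # breakpoints (equal-area here)
strata <- cut(x, breaks = br, include.lowest = TRUE)
pop_prop <- as.vector(table(strata)) / length(x)  # 0.7, 0.1 and 0.2

n_points <- 10                              # hypothetical sample size
wanted <- n_points * pop_prop               # about 7, 1 and 2 sample points
```

For other sample sizes, n_points * pop_prop is usually not a whole number, so these values act as targets rather than exact allocations.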
Hyndman, R. J.; Fan, Y. Sample quantiles in statistical packages. The American Statistician, v. 50, p. 361-365, 1996.
Everitt, B. S. The Cambridge dictionary of statistics. Cambridge: Cambridge University Press, p. 432, 2006.
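# Requires the sp and spsann packages
library(sp)
library(spsann)
data(meuse.grid)
candi <- meuse.grid[, 1:2]   # candidate locations: projected x and y
covars <- meuse.grid[, 5]    # fifth column of meuse.grid (dist) as covariate
schedule <- scheduleSPSANN(initial.temperature = 1, chains = 1,
                           x.max = 1540, y.max = 2060, x.min = 0,
                           y.min = 0, cellsize = 40)
set.seed(2001)
res <- optimDIST(points = 10, candi = candi, covars = covars,
                 use.coords = TRUE, schedule = schedule)
# Energy state stored in res minus the energy recomputed by objDIST;
# the two values are expected to agree
objSPSANN(res) -
  objDIST(points = res, candi = candi, covars = covars, use.coords = TRUE)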