SamplingStrata (version 1.5-4)

optimStrata: Optimization of the stratification of a sampling frame given a sample survey

Description

Wrapper function that calls one of the optimization functions, depending on the value of 'method': (i) optimizeStrata (method = "atomic"); (ii) optimizeStrata2 (method = "continuous"); (iii) optimizeStrataSpatial (method = "spatial"). For continuity, these functions can still be called directly.

Usage

optimStrata(method=c("atomic","continuous","spatial"),
            # common parameters
            framesamp,
            framecens=NULL,
            model=NULL,
            nStrata=NA,
            errors,
            alldomains=TRUE,
            dom=NULL,
            strcens=FALSE,
            minnumstr=2,
            iter=50,
            pops=20,
            mut_chance=NA,
            elitism_rate=0.2,
            suggestions=NULL,
            writeFiles=FALSE,
            showPlot=TRUE,
            parallel=TRUE,
            cores=NA,
            # parameters only for optimizeStrataSpatial
            fitting=NA,
            range=NA,
            kappa=NA)

Value

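List containing (1) the solution vector, (2) the optimal aggregated strata (element "aggr_strata" in the examples below), (3) the complete sampling frame with the label of the aggregated stratum assigned to each unit (element "framenew" in the examples below).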

Arguments

method

This parameter selects the method to be applied in the optimization step: (i) optimizeStrata (method = "atomic"); (ii) optimizeStrata2 (method = "continuous"); (iii) optimizeStrataSpatial (method = "spatial").

errors

This is the (mandatory) dataframe containing the precision levels, expressed as the maximum expected values of the Coefficients of Variation for the target variables of the survey.
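As a minimal sketch of the expected layout, modeled on the examples at the end of this page: one row per domain, a DOM label, one CVn column per target variable, and a domainvalue column. The number of domains and the 10% and 5% thresholds are illustrative assumptions, not recommended values.

```r
# Illustrative values only: 3 domains and two target variables (Y1, Y2).
ndom <- 3
cv <- data.frame(DOM = rep("DOM1", ndom),      # label of the domain level
                 CV1 = rep(0.10, ndom),        # maximum expected CV for Y1
                 CV2 = rep(0.05, ndom),        # maximum expected CV for Y2
                 domainvalue = seq_len(ndom))  # one row per domain
str(cv)
```

A dataframe built this way would be passed as errors = cv, as in the examples below.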

framesamp

This is the dataframe containing the information related to the sampling frame.

framecens

This is the dataframe containing the units to be selected in any case. It has the same structure as the "framesamp" dataframe.

nStrata

Indicates the number of strata to be obtained in the final solution.

model

When the Y variables are not directly observed but are estimated by means of other explanatory variables, the information needed to compute the anticipated variance is given in a dataframe "model" with as many rows as there are target variables. Each row indicates whether the model is linear or loglinear, and gives the values of the model parameters beta, sig2 and gamma (> 1 in case of heteroscedasticity). Default is NULL.

alldomains

Flag (TRUE/FALSE) to indicate if the optimization must be carried out on all domains (default is TRUE). If it is set to FALSE, then a value must be given to parameter 'dom'.

dom

Indicates the domain on which the optimization must be carried out. It is an integer value between 1 and the number of domains. If 'alldomains' is set to TRUE, it is ignored.
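The interplay between 'alldomains' and 'dom' can be sketched as a small check. The helper check_dom below is hypothetical (it is not part of SamplingStrata), and it assumes that the bounds 1 and the number of domains are both valid values.

```r
# Hypothetical helper, for illustration only: mirrors the rule stated above.
check_dom <- function(alldomains, dom, ndom) {
  if (alldomains) return(NULL)  # 'dom' is ignored when all domains are optimized
  if (is.null(dom)) stop("'dom' must be given when alldomains = FALSE")
  if (dom != round(dom) || dom < 1 || dom > ndom)
    stop("'dom' must be an integer between 1 and ", ndom)
  as.integer(dom)
}

check_dom(alldomains = TRUE,  dom = 9, ndom = 7)  # 'dom' ignored, returns NULL
check_dom(alldomains = FALSE, dom = 4, ndom = 7)  # returns the domain index
```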

strcens

Flag (TRUE/FALSE) to indicate whether take-all strata exist. Default is FALSE.

minnumstr

Indicates the minimum number of units that must be allocated in each stratum. Default is 2.

iter

Indicates the maximum number of iterations (= generations) of the genetic algorithm. Default is 50.

pops

The size of each generation, in number of individuals. Default is 20.

mut_chance

Mutation chance: for each new individual, the probability of changing each single chromosome, i.e. one bit of the solution vector. High values of this parameter allow a deeper exploration of the solution space but slow down convergence, while low values give faster convergence, at the risk of a final solution that is distant from the optimal one. Default is NA, in which case it is computed as 1/(vars+1), where vars is the number of elements in the solution vector.
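To give a sense of scale for the default, here is a minimal sketch of the 1/(vars+1) rule. The value of vars is a made-up assumption; in practice it depends on the length of the solution vector for the frame being optimized.

```r
# Hypothetical length of the solution vector, for illustration only.
vars <- 200

mut_chance <- NA
if (is.na(mut_chance)) {
  mut_chance <- 1 / (vars + 1)  # default rule described above
}
mut_chance

# Expected number of mutated positions per new individual: just under one.
vars * mut_chance
```

Under this rule, longer solution vectors get a proportionally smaller per-position mutation probability.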

elitism_rate

This parameter indicates the rate of better solutions that must be preserved from one generation to another. Default is 0.2 (20%).
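As a rough illustration with the default values, the number of individuals carried over unchanged can be estimated as follows. Rounding down with floor() is an assumption made for this sketch; the page does not state how the package rounds.

```r
pops <- 20           # default generation size
elitism_rate <- 0.2  # default elitism rate

# Number of best solutions preserved per generation
# (the rounding rule is an assumption of this sketch).
n_elite <- floor(pops * elitism_rate)
n_elite
```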

suggestions

Optional parameter for the genetic algorithm that indicates a suggested solution to be introduced in the initial population. The most convenient is the one found by the function "KmeansSolution". Default is NULL.

writeFiles

Indicates if the various dataframes and plots produced during the execution have to be written in the working directory. Default is FALSE.

showPlot

Indicates whether the plot showing the trend in the value of the objective function has to be shown. Default is TRUE; if parallel = TRUE, it is set to FALSE.

parallel

Indicates whether the analysis should be run in parallel. Default is TRUE.

cores

If the analysis is run in parallel, the number of cores to be used. If not specified, n-1 of the n available cores are used; if the number of domains is smaller than n-1, the number of cores is set equal to the number of domains.
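The default rule for 'cores' can be sketched with the base "parallel" package. The helper default_cores is hypothetical and is not the package's own code; the lower bound of one core and the handling of an NA core count are safety assumptions added for this sketch.

```r
library(parallel)

# Hypothetical helper, for illustration only: the smaller of (n - 1) and
# the number of domains, never below one core (the lower bound is an
# assumption of this sketch).
default_cores <- function(ndom, total = detectCores()) {
  if (is.na(total)) total <- 1  # detectCores() may return NA
  max(1, min(total - 1, ndom))
}

default_cores(ndom = 7, total = 8)  # n - 1 and the domain count coincide
default_cores(ndom = 3, total = 8)  # fewer domains than n - 1
```

Passing an explicit value, e.g. cores = 2, overrides this default.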

fitting

Fitting of the model(s) (in terms of R squared). It is a vector with as many elements as the number of target variables Y.

range

Maximum range for spatial autocorrelation. It is a vector with as many elements as the number of target variables Y.

kappa

Factor used in evaluating spatial autocorrelation.

Author

Giulio Barcaroli

Examples

if (FALSE) {
library(SamplingStrata)
############################
# Example of "atomic" method
############################
data(swissmunicipalities)
swissmunicipalities$id <- c(1:nrow(swissmunicipalities))
frame <- buildFrameDF(df = swissmunicipalities,
                      id = "id",
                      domainvalue = "REG",
                      X = c("POPTOT","HApoly"),
                      Y = c("Surfacesbois", "Airind"))
ndom <- length(unique(frame$domainvalue))
cv <- as.data.frame(list(DOM = rep("DOM1",ndom),
                         CV1 = rep(0.1,ndom),
                         CV2 = rep(0.1,ndom),
                         domainvalue = c(1:ndom)))
strata <- buildStrataDF(frame)
kmean <- KmeansSolution(strata,cv,maxclusters=30)
nstrat <- tapply(kmean$suggestions, kmean$domainvalue,
                 FUN=function(x) length(unique(x)))
solution <- optimStrata(method ="atomic",
                        framesamp = frame,
                        errors = cv,
                        nStrata = nstrat,
                        suggestions = kmean,
                        iter = 50,
                        pops = 10)
outstrata <- solution$aggr_strata
framenew <- solution$framenew
s <- selectSample(framenew, outstrata)
################################
# Example of "continuous" method
################################
kmean <- KmeansSolution2(frame = frame, 
                         errors = cv, 
                         maxclusters = 10)
nstrat <- tapply(kmean$suggestions, kmean$domainvalue,
                 FUN=function(x) length(unique(x)))
sugg <- prepareSuggestion(kmean = kmean, 
                          frame = frame, 
                          nstrat = nstrat)
solution <- optimStrata(method = "continuous",
                        framesamp = frame,
                        errors = cv,
                        nStrata = nstrat,
                        iter = 50,
                        pops = 10,
                        suggestions = sugg)
framenew <- solution$framenew
outstrata <- solution$aggr_strata
s <- selectSample(framenew,outstrata)
#############################
# Example of "spatial" method
#############################
library(sp)
data("meuse")
data("meuse.grid")
meuse.grid$id <- c(1:nrow(meuse.grid))
coordinates(meuse) <- c('x','y')
coordinates(meuse.grid) <- c('x','y')
library(gstat)
library(automap)
v <- variogram(lead ~ dist + soil, data = meuse)
fit.vgm.lead <- autofitVariogram(lead ~ dist + soil,
                                 meuse,
                                 model = "Exp")
plot(v, fit.vgm.lead$var_model)
lead.kr <- krige(lead ~ dist + soil, meuse, meuse.grid,
                 model = fit.vgm.lead$var_model)
lead.pred <- ifelse(lead.kr[1]$var1.pred < 0,0, lead.kr[1]$var1.pred)
lead.var <- ifelse(lead.kr[2]$var1.var < 0, 0, lead.kr[2]$var1.var)
df <- as.data.frame(list(dom = rep(1,nrow(meuse.grid)),
                         lead.pred = lead.pred,
                         lead.var = lead.var,
                         lon = meuse.grid$x,
                         lat = meuse.grid$y,
                         id = c(1:nrow(meuse.grid))))
frame <-buildFrameSpatial(df = df,
                          id = "id",
                          X = c("lead.pred"),
                          Y = c("lead.pred"),
                          variance = c("lead.var"),
                          lon = "lon",
                          lat = "lat",
                          domainvalue = "dom")
cv <- as.data.frame(list(DOM = rep("DOM1",1),
                         CV1 = rep(0.05,1),
                         domainvalue = c(1:1) ))
solution <- optimStrata(method = "spatial",
                        errors = cv, 
                        framesamp = frame, 
                        iter = 25,
                        pops = 10,
                        nStrata = 5, 
                        fitting = 1, 
                        kappa = 1,
                        range = fit.vgm.lead$var_model$range[2])
framenew <- solution$framenew
outstrata <- solution$aggr_strata
frameres <- SpatialPixelsDataFrame(points = framenew[c("LON","LAT")],
                                   data = framenew)
frameres$LABEL <- as.factor(frameres$LABEL)
spplot(frameres,c("LABEL"), col.regions=bpy.colors(5))
s <- selectSample(framenew,outstrata)
}
