Learn R Programming

SamplingStrata (version 1.5-4)

KmeansSolution2: Initial solution obtained by applying kmeans clustering of frame units

Description

This function has to be used only in conjunction with "optimizeStrata2", or with "optimStrata" when method = "continuous", i.e. in the case of optimizing with only continuous stratification variables. The function "KmeansSolution2" has a twofold objective: - to give indication about a possible best number of final strata (by fixing a convenient value for "maxclusters", and leaving NA to "nstrata"; - to give an initial solution fo the optimization step. If the parameter "nstrata" is not indicated, the optimal number of clusters is determined inside each domain, and the overall solution is obtained by concatenating optimal clusters obtained in domains. The result is a dataframe with two columns: the first indicates the clusters, the second the domains.

Usage

KmeansSolution2(frame,
                model=NULL,
                errors,
                nstrata = NA,
                minnumstrat =2,
                maxclusters = NA,
                showPlot = TRUE)

Value

A dataframe containing the solution

Arguments

frame

The (mandatory) dataframe containing the information related to each unit in the sampling frame.

model

The (optional) dataframe containing the information related to models used to predict values of the target variables.

errors

The (mandatory) dataframe containing the precision constraints on target variables.

nstrata

Number of aggregate strata (if NULL, it is optimized by varying the number of cluster from 2 to half number of atomic strata). Default is NA.

minnumstrat

Minimum number of units to be selected in each stratum. Default is 2.

maxclusters

Maximum number of clusters to be considered in the execution of kmeans algorithm. If not indicated it will be set equal to the number of atomic strata divided by 2.

showPlot

Allows to visualise on a plot the different sample sizes for each number of aggregate strata. Default is TRUE.

Author

Giulio Barcaroli

Examples

Run this code
if (FALSE) {
library(SamplingStrata)
data("swissmunicipalities")
swissmunicipalities$id <- c(1:nrow(swissmunicipalities))
swissmunicipalities$dom <- 1
frame <- buildFrameDF(swissmunicipalities,
                      id = "id",
                      domainvalue = "REG",
                      X = c("Pop020", "Pop2040"),
                      Y = c("Pop020", "Pop2040")
)
cv <- NULL
cv$DOM <- "DOM1"
cv$CV1 <- 0.1
cv$CV2 <- 0.1
cv <- as.data.frame(cv)
cv <- cv[rep(row.names(cv),7),]
cv$domainvalue <- c(1:7)
cv

# Solution with kmean clustering 
kmean <- KmeansSolution2(frame,model=NULL,errors=cv,nstrata=NA,maxclusters=4)
# number of strata to be obtained in each domain in final solution	
nstrat <- tapply(solutionKmean$suggestions,
                 solutionKmean$domainvalue,
                 FUN=function(x) length(unique(x)))
# Prepare suggestion for optimization
sugg <- prepareSuggestion(kmean,frame,nstrat)
# Optimization
# Attention: number of strata must be the same than in kmeans
solution <- optimStrata (
  method = "continuous",
  cv, 
  framesamp=frame,
  iter = 50,
  pops = 20,
  nStrata = nstrat,
  suggestions = sugg,
  writeFiles = FALSE,
  showPlot = FALSE,
  parallel = FALSE
)
}

Run the code above in your browser using DataLab