Learn R Programming

BalancedSampling (version 2.0.6)

lpm: The (Local) Pivotal Methods

Description

Selects spatially balanced samples with prescribed inclusion probabilities from a finite population using the Local Pivotal Method 1 (LPM1).

Usage

lpm(prob, x, type = "kdtree2", bucketSize = 50, eps = 1e-12)

lpm1(prob, x, type = "kdtree2", bucketSize = 50, eps = 1e-12)

lpm2(prob, x, type = "kdtree2", bucketSize = 50, eps = 1e-12)

lpm1s(prob, x, type = "kdtree2", bucketSize = 50, eps = 1e-12)

spm(prob, eps = 1e-12)

rpm(prob, eps = 1e-12)

Value

A vector of selected indices in 1,2,...,N.

Arguments

prob

A vector of length N with inclusion probabilities, or an integer > 1. If an integer n, then the sample will be drawn with equal probabilities n / N.

x

An N by p matrix of (standardized) auxiliary variables. Squared euclidean distance is used in the x space.

type

The method used in finding nearest neighbours. Must be one of "kdtree0", "kdtree1", "kdtree2", and "notree".

bucketSize

The maximum size of the terminal nodes in the k-d-trees.

eps

A small value used to determine when an updated probability is close enough to 0.0 or 1.0.

Functions

  • lpm1():

  • lpm2():

  • lpm1s():

  • spm():

  • rpm():

k-d-trees

The types "kdtree" creates k-d-trees with terminal node bucket sizes according to bucketSize.

  • "kdtree0" creates a k-d-tree using a median split on alternating variables.

  • "kdtree1" creates a k-d-tree using a median split on the largest range.

  • "kdtree2" creates a k-d-tree using a sliding-midpoint split.

  • "notree" does a naive search for the nearest neighbour.

Details

If prob sum to an integer n, a fixed sized sample (n) will be produced. For spm and rpm, prob must be a vector of inclusion probabilities. If equal inclusion probabilities is wanted, this can be produced by rep(n / N, N).

The available pivotal methods are:

  • lpm1: The Local Pivotal Mehtod 1 (Grafström et al., 2012). Updates only units which are mutual nearest neighbours. Selects such a pair at random.

  • lpm2, lpm: The Local Pivotal Method 2 (Grafström et al., 2012). Selects a unit at random, which competes with this units nearest neighbour.

  • lpm1s: The Local Pivotal Method 1 search: (Prentius, 2023). Updates only units which are mutual nearest neighbours. Selects such a pair by branching the remaining units, giving higher probabilities to update a pair with a long branch. This changes the algorithm of lpm1, but makes it faster.

  • spm: The Sequential Pivotal Method. Selects the two units with smallest indices to compete against each other. If the list is ordered, the algorithm is similar to systematic sampling.

  • rpm: The Random Pivotal Method. Selects two units at random to compete against each other. Produces a design with high entropy.

References

Friedman, J. H., Bentley, J. L., & Finkel, R. A. (1977). An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software (TOMS), 3(3), 209-226.

Deville, J.-C., & Tillé, Y. (1998). Unequal probability sampling without replacement through a splitting method. Biometrika 85, 89-101.

Maneewongvatana, S., & Mount, D. M. (1999, December). It’s okay to be skinny, if your friends are fat. In Center for geometric computing 4th annual workshop on computational geometry (Vol. 2, pp. 1-8).

Chauvet, G. (2012). On a characterization of ordered pivotal sampling. Bernoulli, 18(4), 1320-1340.

Grafström, A., Lundström, N.L.P. & Schelin, L. (2012). Spatially balanced sampling through the Pivotal method. Biometrics 68(2), 514-520.

Lisic, J. J., & Cruze, N. B. (2016, June). Local pivotal methods for large surveys. In Proceedings of the Fifth International Conference on Establishment Surveys.

Prentius, W. (2023) Manuscript.

See Also

Other sampling: cube(), hlpm2(), lcube(), scps()

Examples

Run this code
if (FALSE) {
set.seed(12345);
N = 1000;
n = 100;
prob = rep(n/N, N);
x = matrix(runif(N * 2), ncol = 2);
s = lpm2(prob, x);
plot(x[, 1], x[, 2]);
points(x[s, 1], x[s, 2], pch = 19);

set.seed(12345);
prob = c(0.2, 0.25, 0.35, 0.4, 0.5, 0.5, 0.55, 0.65, 0.7, 0.9);
N = length(prob);
x = matrix(runif(N * 2), ncol = 2);
ep = rep(0L, N);
r = 10000L;
for (i in seq_len(r)) {
  s = lpm2(prob, x);
  ep[s] = ep[s] + 1L;
}
print(ep / r);

set.seed(12345);
N = 1000;
n = 100;
prob = rep(n/N, N);
x = matrix(runif(N * 2), ncol = 2);
lpm1(prob, x);
lpm2(prob, x);
lpm1s(prob, x);
spm(prob);
rpm(prob);
}

Run the code above in your browser using DataLab