rchemo (version 0.1-3)

sampks: Kennard-Stone sampling

Description

The function divides the data \(X\) into two sets, "train" and "test", using the Kennard-Stone (KS) algorithm (Kennard & Stone, 1969). The two sets correspond to two different underlying probability distributions: set "train" has higher dispersion than set "test".
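
To illustrate the selection principle, here is a minimal base-R sketch of the usual Kennard-Stone procedure: start with the two most distant observations, then repeatedly add the candidate whose distance to its nearest already-selected observation is largest. The function name `ks_sketch` is hypothetical and is not part of rchemo; the sketch assumes Euclidean distance and distinct rows in \(X\), and it may differ from the package's implementation in details such as tie handling.

```r
# Hypothetical helper, NOT rchemo's implementation: base-R Kennard-Stone sketch
ks_sketch <- function(X, k) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(n >= 2, k >= 2, k <= n)
  D <- as.matrix(dist(X))  # pairwise Euclidean distances (n x n)
  # Initial selection: the two observations that are farthest apart
  s <- as.vector(which(D == max(D), arr.ind = TRUE)[1, ])
  while (length(s) < k) {
    cand <- setdiff(seq_len(n), s)
    # For each candidate, distance to its nearest selected observation
    dmin <- apply(D[cand, s, drop = FALSE], 1, min)
    # Add the candidate that is farthest from the selected set
    s <- c(s, cand[which.max(dmin)])
  }
  list(train = s, test = setdiff(seq_len(n), s))
}

set.seed(1)
X <- matrix(rnorm(10 * 3), ncol = 3)
res <- ks_sketch(X, k = 7)
res$train  # row numbers selected for training
res$test   # remaining row numbers
```

Because each step picks the observation farthest from those already selected, the training set tends to cover the extremes of the data, which is consistent with its higher dispersion.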

Usage

sampks(X, k, diss = c("eucl", "mahal"))

Value

train

Indexes (i.e. row numbers in \(X\)) of the observations selected for the training set.

test

Indexes (i.e. row numbers in \(X\)) of the remaining observations, which form the test set.

Arguments

X

X-data (\(n \times p\) matrix or data frame) to be sampled.

k

An integer defining the number of training observations to select.

diss

The type of dissimilarity used for selecting the observations in the algorithm. Possible values are "eucl" (default; Euclidean distance) or "mahal" (Mahalanobis distance).
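
As an illustration of how the two dissimilarities relate, the sketch below checks in base R that the squared Mahalanobis distance equals the squared Euclidean distance computed after whitening the data with the Cholesky factor of the covariance matrix. It assumes the covariance is estimated from \(X\) itself with `cov()`; how rchemo estimates it internally is not stated here, so treat this as background rather than a description of the package.

```r
set.seed(1)
X <- matrix(rnorm(20 * 3), ncol = 3)
S <- cov(X)  # assumption: covariance estimated from X itself

# Squared Mahalanobis distance from every row to row 1 (stats::mahalanobis)
d2 <- mahalanobis(X, center = X[1, ], cov = S)

# Same quantity via whitening: with S = R'R (Cholesky), Z = X R^{-1}
Z <- X %*% solve(chol(S))
d2_w <- rowSums(sweep(Z, 2, Z[1, ])^2)  # squared Euclidean distances in Z

all.equal(d2, d2_w)
```

In practice this means that "mahal" accounts for the scale and correlation of the variables, whereas "eucl" treats all columns of \(X\) on their raw scale.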

References

Kennard, R.W., Stone, L.A., 1969. Computer aided design of experiments. Technometrics, 11(1), 137-148.

Examples

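library(rchemo)

# Simulated data: 10 observations, 3 variables
n <- 10 ; p <- 3
X <- matrix(rnorm(n * p), ncol = p)

# Select 7 training observations (Euclidean distance by default)
k <- 7
sampks(X, k = k)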

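# Noisy 10 x 10 grid in two dimensions, to visualize the selection
n <- 10 ; k <- 25
X <- expand.grid(1:n, 1:n)
X <- X + rnorm(nrow(X) * ncol(X), 0, .1)
s <- sampks(X, k)$train
plot(X)
# Highlight the 25 training observations in red
points(X[s, ], pch = 19, col = 2, cex = 1.5)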