rchemo (version 0.1-3)

sampdp: Duplex sampling

Description

The function divides the data \(X\) into two sets, "train" and "test", using the Duplex algorithm (Snee, 1977). The two sets are of equal size. If needed, the user can afterwards (a posteriori) add any remaining observations (those in neither "train" nor "test") to "train".
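To make the selection rule concrete, here is a minimal base-R sketch of a Duplex-style split. It is an illustration only, not the code used by `sampdp`: the function name `duplex_sketch` and its helpers are hypothetical, it works on raw Euclidean distances, and it assumes \(k \ge 2\) and distinct rows in \(X\). The two mutually farthest points seed "train", the next two farthest seed "test", and the sets then grow in alternation by taking the candidate whose nearest already-selected point is farthest away.

```r
# Hypothetical sketch of a Duplex-style split (not the rchemo implementation).
duplex_sketch <- function(X, k) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(k >= 2, 2 * k <= n)
  D <- as.matrix(dist(X))            # pairwise Euclidean distances
  remain <- seq_len(n)

  # The two candidates that are farthest from each other
  farthest_pair <- function(cand) {
    sub <- D[cand, cand, drop = FALSE]
    pos <- which(sub == max(sub), arr.ind = TRUE)[1, ]
    cand[pos]
  }
  # The candidate whose nearest selected point is farthest away (maximin rule)
  maximin <- function(cand, sel) {
    dmin <- apply(D[cand, sel, drop = FALSE], 1, min)
    cand[which.max(dmin)]
  }

  train <- farthest_pair(remain); remain <- setdiff(remain, train)
  test  <- farthest_pair(remain); remain <- setdiff(remain, test)
  while (length(train) < k) {
    s <- maximin(remain, train); train <- c(train, s); remain <- setdiff(remain, s)
    s <- maximin(remain, test);  test  <- c(test, s);  remain <- setdiff(remain, s)
  }
  list(train = train, test = test, remain = remain)
}

set.seed(1)
X <- matrix(rnorm(10 * 3), ncol = 3)
res <- duplex_sketch(X, k = 4)
# Adding the leftover observations to "train" a posteriori
train_all <- c(res$train, res$remain)
```

The returned list mirrors the `train`, `test` and `remain` components documented for `sampdp`, so the last line shows how leftover rows could be merged into the training set with either function.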

Usage

sampdp(X, k, diss = c("eucl", "mahal"))

Value

train

Indexes (i.e. row numbers in \(X\)) of the selected observations, for the training set.

test

Indexes (i.e. row numbers in \(X\)) of the selected observations, for the test set.

remain

Indexes (i.e., row numbers in \(X\)) of the remaining observations.

Arguments

X

X-data (\(n, p\)) to be sampled.

k

An integer defining the number of training observations to select. The test set has the same size, so \(k\) must be <= \(n / 2\).

diss

The type of dissimilarity used for selecting the observations in the algorithm. Possible values are "eucl" (default; Euclidean distance) or "mahal" (Mahalanobis distance).
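As a rough illustration of how the two dissimilarities differ, the sketch below computes both pairwise distance matrices in base R. It assumes the Mahalanobis distance is based on the sample covariance of \(X\); whether `sampdp` computes it this way internally is an assumption, and the variable names are illustrative. Mahalanobis distances are obtained here as Euclidean distances on whitened data.

```r
set.seed(1)
X <- matrix(rnorm(10 * 3), ncol = 3)

# Euclidean dissimilarities between all pairs of rows
D_eucl <- as.matrix(dist(X))

# Mahalanobis dissimilarities: whiten X with the Cholesky factor of cov(X),
# then take Euclidean distances on the whitened data
S <- cov(X)
U <- chol(S)                         # upper triangular, S = t(U) %*% U
Z <- X %*% solve(U)
D_mahal <- as.matrix(dist(Z))

# Cross-check one pair against stats::mahalanobis(), which returns the
# squared distance
d2_ref <- mahalanobis(X[1, ], X[2, ], S)
```

Because the Mahalanobis distance rescales and decorrelates the variables, it may select different observations than the Euclidean distance when the columns of \(X\) have unequal variances or are strongly correlated.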

References

Kennard, R.W., Stone, L.A., 1969. Computer aided design of experiments. Technometrics, 11(1), 137-148.

Snee, R.D., 1977. Validation of regression models: methods and examples. Technometrics, 19, 415-428. https://doi.org/10.1080/00401706.1977.10489581

Examples

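library(rchemo)

n <- 10 ; p <- 3
X <- matrix(rnorm(n * p), ncol = p)   # random (n, p) data

k <- 4                                # size of "train" and of "test" (k <= n / 2)
sampdp(X, k = k)                      # Euclidean distance (default)
sampdp(X, k = k, diss = "mahal")      # Mahalanobis distance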