The function divides the data \(X\) into two sets, "train" and "test", using the Duplex algorithm (Snee, 1977). The two sets are of equal size. If needed, the user can add a posteriori any remaining observations (in neither "train" nor "test") to "train".
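To make the selection rule concrete, here is a minimal sketch of the Duplex procedure as it is commonly described: the two most distant observations start "train", the two most distant remaining observations start "test", and the sets then alternately receive the remaining observation that is farthest from the points they already contain. The helper name `duplex_sketch` is hypothetical, only the Euclidean case is covered, and this is not presented as the code behind `sampdp`.

```r
# Hypothetical helper for illustration only (Euclidean distance, k >= 2).
duplex_sketch <- function(X, k) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(k >= 2, k <= n / 2)
  D <- as.matrix(dist(X))   # n x n matrix of Euclidean distances
  remain <- seq_len(n)

  # Two observations of `idx` that are farthest apart.
  farthest_pair <- function(idx) {
    sub <- D[idx, idx, drop = FALSE]
    diag(sub) <- -Inf       # never pair an observation with itself
    pos <- arrayInd(which.max(sub), dim(sub))
    idx[as.vector(pos)]
  }
  # Observation of `idx` whose nearest neighbour in `set` is farthest away.
  next_point <- function(idx, set) {
    dmin <- apply(D[idx, set, drop = FALSE], 1, min)
    idx[which.max(dmin)]
  }

  train <- farthest_pair(remain)
  remain <- setdiff(remain, train)
  test <- farthest_pair(remain)
  remain <- setdiff(remain, test)

  # Alternate between the two sets until each holds k observations.
  while (length(train) < k) {
    new <- next_point(remain, train)
    train <- c(train, new)
    remain <- setdiff(remain, new)
    new <- next_point(remain, test)
    test <- c(test, new)
    remain <- setdiff(remain, new)
  }
  list(train = train, test = test, remain = remain)
}

# Toy call: 20 observations, 2 variables, 8 observations per set.
set.seed(1)
res_sketch <- duplex_sketch(matrix(rnorm(40), nrow = 20), k = 8)
```

Because the two sets are filled in turn, they end up with the same size, and any observations left over when both reach `k` stay in `remain`.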
Usage
sampdp(X, k, diss = c("eucl", "mahal"))
Value
train
Indexes (i.e., row numbers in \(X\)) of the observations selected for the training set.
test
Indexes (i.e., row numbers in \(X\)) of the observations selected for the test set.
remain
Indexes (i.e., row numbers in \(X\)) of the remaining observations.
Arguments
X
X-data (\(n, p\)) to be sampled.
k
An integer defining the number of training observations to select; the test set has the same size. Must be <= \(n / 2\).
diss
The type of dissimilarity used for selecting the observations in the algorithm. Possible values are "eucl" (default; Euclidean distance) or "mahal" (Mahalanobis distance).
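Examples
The call below is a hedged usage sketch on simulated data. It assumes that the package providing `sampdp` is already attached and that the three components listed under Value are returned as a named list; the object names `res`, `train`, `Xtrain` and `Xtest` are illustrative only.

```r
# Toy data: n = 20 observations, p = 3 variables.
set.seed(1)
X <- matrix(rnorm(20 * 3), nrow = 20, ncol = 3)
k <- 8   # must satisfy k <= n / 2

# Run only if sampdp() is available in the current session.
if (exists("sampdp", mode = "function")) {
  res <- sampdp(X, k, diss = "eucl")

  # Assumption: components are accessed as list elements.
  # Optionally add the leftover observations to "train".
  train <- c(res$train, res$remain)

  Xtrain <- X[train, , drop = FALSE]
  Xtest <- X[res$test, , drop = FALSE]
}
```

With `n = 20` and `k = 8`, four observations are expected to be left in `remain` before they are merged into the training set.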