rminer (version 1.4.1)

holdout: Computes indexes for holdout data split into training and test sets.

Description

Computes indexes for holdout data split into training and test sets.

Usage

holdout(y, ratio = 2/3, internalsplit = FALSE, mode = "stratified", iter = 1, seed = NULL, window=10, increment=1)

Arguments

y
desired target: a numeric vector; or a factor, in which case a stratified holdout is applied (i.e. the class proportions are kept approximately the same in each set).
ratio
split ratio: if expressed as a proportion (e.g. 2/3), it sets the training set size; if expressed as a total number of examples, it sets the test set size.
internalsplit
if TRUE then the training data is further split into training and validation sets. The same ratio parameter is used for the internal split.
mode
sampling mode. Options are:
  • stratified -- stratified randomized holdout if y is a factor; else it behaves as standard randomized holdout;
  • random -- standard randomized holdout;
  • order -- static mode, where the first examples are used for training and the later ones for testing (useful for time series data);
  • rolling -- rolling window, also known as sliding window (e.g. useful for stock market prediction), similar to order except that window is the window size, iter is the rolling iteration and increment is the number of examples the window slides forward at each iteration. In each iteration, the training set size is fixed to window, while the test set size is equal to ratio, except for the last iteration (where it may be smaller);
  • incremental -- incremental retraining mode, also known as growing window, similar to order except that window is the initial window size, iter is the incremental iteration and increment is the number of examples added at each iteration. In each iteration, the training set size grows (by increment), while the test set size is equal to ratio, except for the last iteration (where it may be smaller). An illustrative index sketch for both windowed modes is given after this argument list.

iter
iteration of the windowed modes (only used when mode="rolling" or mode="incremental"; typically iter is set within a cycle, see the example below).
seed
if NULL then a random seed is used; else a fixed seed is adopted (the same seed always returns the same split).
window
training window size (if mode="rolling") or initial training window size (if mode="incremental").
increment
number of examples added to the training window (if mode="incremental") or by which the window slides forward (if mode="rolling") at each iteration.
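
The following sketch (not part of the original page) contrasts the rolling and incremental index patterns referenced above; it only prints whatever indexes holdout returns, and the parameter values (window=6, increment=2, ratio=3) are illustrative choices:
# illustrative sketch: contrast rolling vs incremental index patterns
library(rminer)
y=1:20
for(b in 1:3) # three iterations
  {
   R=holdout(y,ratio=3,mode="rolling",iter=b,window=6,increment=2)
   I=holdout(y,ratio=3,mode="incremental",iter=b,window=6,increment=2)
   cat("iter:",b,
       "rolling TR:",min(R$tr),"-",max(R$tr),"TS:",min(R$ts),"-",max(R$ts),
       "| incremental TR:",min(I$tr),"-",max(I$tr),"TS:",min(I$ts),"-",max(I$ts),"\n")
  }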

Value

A list with the components:
  • $tr -- numeric vector with the training examples indexes;
  • $ts -- numeric vector with the test examples indexes;
  • $itr -- numeric vector with the internal training examples indexes;
  • $val -- numeric vector with the internal validation examples indexes.
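
An illustrative snippet (not from the original page; seed=1 and the data are arbitrary choices) that simply prints the returned components when internalsplit=TRUE is requested:
library(rminer)
H=holdout(1:20,ratio=2/3,internalsplit=TRUE,mode="random",seed=1)
print(H$tr)  # training indexes
print(H$ts)  # test indexes
print(H$itr) # internal training indexes
print(H$val) # internal validation indexes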

Details

Computes indexes for holdout data split into training and test sets.

References

See fit.

See Also

fit, predict.fit, mining, mgraph, mmetric, savemining, Importance.

Examples

### simple examples:
# preserves order, last two elements go into test set
H=holdout(1:10,ratio=2,internalsplit=TRUE,mode="order")
print(H)
# no seed or NULL returns different splits:
H=holdout(1:10,ratio=2/3,mode="random")
print(H)
H=holdout(1:10,ratio=2/3,mode="random",seed=NULL)
print(H)
# same seed returns identical split:
H=holdout(1:10,ratio=2/3,mode="random",seed=12345)
print(H)
H=holdout(1:10,ratio=2/3,mode="random",seed=12345)
print(H)

### classification example
data(iris)
# random stratified holdout
H=holdout(iris$Species,ratio=2/3,mode="stratified") 
print(table(iris[H$tr,]$Species))
print(table(iris[H$ts,]$Species))
M=fit(Species~.,iris[H$tr,],model="rpart") # training data only
P=predict(M,iris[H$ts,]) # test data
print(mmetric(iris$Species[H$ts],P,"CONF"))
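
# As an optional, illustrative check (not in the original example), the stratified
# split can be verified by comparing class proportions across the two sets:
print(prop.table(table(iris[H$tr,"Species"])))
print(prop.table(table(iris[H$ts,"Species"])))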

### regression example with incremental and rolling window holdout:
## Not run: 
# ts=c(1,4,7,2,5,8,3,6,9,4,7,10,5,8,11,6,9)
# d=CasesSeries(ts,c(1,2,3))
# print(d) # with 14 examples
# # incremental holdout example (growing window)
# for(b in 1:4) # iterations
#   {
#    H=holdout(d$y,ratio=4,mode="incremental",iter=b,window=5,increment=2)
#    M=fit(y~.,d[H$tr,],model="mlpe",search=2)
#    P=predict(M,d[H$ts,])
#    cat("batch :",b,"TR from:",H$tr[1],"to:",H$tr[length(H$tr)],"size:",length(H$tr),
#        "TS from:",H$ts[1],"to:",H$ts[length(H$ts)],"size:",length(H$ts),
#        "mae:",mmetric(d$y[H$ts],P,"MAE"),"\n")
#   }
# # rolling holdout example (sliding window)
# for(b in 1:4) # iterations
#   {
#    H=holdout(d$y,ratio=4,mode="rolling",iter=b,window=5,increment=2)
#    M=fit(y~.,d[H$tr,],model="mlpe",search=2)
#    P=predict(M,d[H$ts,])
#    cat("batch :",b,"TR from:",H$tr[1],"to:",H$tr[length(H$tr)],"size:",length(H$tr),
#        "TS from:",H$ts[1],"to:",H$ts[length(H$ts)],"size:",length(H$ts),
#        "mae:",mmetric(d$y[H$ts],P,"MAE"),"\n")
#   }
## End(Not run)
