stuart (version 0.10.2)

holdout: Data selection for holdout validation.

Description

Split a data.frame into two subsets for holdout validation.

Usage

holdout(data, prop = 0.5, grouping = NULL, seed = NULL, determined = NULL)

Value

Returns a list containing two data.frames, called calibrate and validate. The first corresponds to the calibration sample, the second to the validation sample.

Arguments

data

A data.frame.

prop

A single value or a vector of proportions of the data to assign to the calibration sample. Defaults to .5, an even split.

grouping

Name of the grouping variable. Providing a grouping variable ensures that the provided proportion is selected within each group.
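
To illustrate what selecting the proportion within each group amounts to, here is a minimal base R sketch. It does not call holdout and is not the package's implementation; the toy data.frame, the variable name group, and the rounding rule are assumptions made for illustration.

```r
# Toy data (hypothetical, not part of stuart): 12 cases in group "a", 8 in "b"
set.seed(1)
toy <- data.frame(id = 1:20, group = rep(c("a", "b"), times = c(12, 8)))
prop <- 0.75

# Within each group, draw round(prop * n) row indices at random.
# The rounding rule is an assumption; holdout may resolve fractions differently.
picked <- unlist(lapply(split(seq_len(nrow(toy)), toy$group), function(rows) {
  rows[sample.int(length(rows), size = round(prop * length(rows)))]
}))

calibrate <- toy[sort(picked), ]  # selected rows form the calibration sample
validate  <- toy[-picked, ]       # the remaining rows form the validation sample

table(calibrate$group)  # 9 cases from "a" and 6 from "b"
```

Because the draw happens separately per group, both samples keep the group composition of the full data, which a single draw over all rows would only approximate.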

seed

A random seed. See Random for more details.

determined

Name of a variable indicating the predetermined assignment to the calibration or the validation sample. This variable must be a factor containing only NA (no predetermined assignment), "calibrate", or "validate". If no variable is provided (the default), all cases are assigned randomly.
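
A variable in the required format could be built as follows. This is a sketch under assumptions: the toy data.frame and the column name fixed are hypothetical, and the call to holdout only runs if stuart is installed.

```r
# Toy data (hypothetical): ten cases, three of them assigned in advance
toy <- data.frame(id = 1:10)

# Start with NA everywhere (no predetermined assignment), using the two
# permitted factor levels
toy$fixed <- factor(rep(NA_character_, nrow(toy)),
                    levels = c("calibrate", "validate"))
toy$fixed[1:2] <- "calibrate"  # cases 1 and 2 must be in the calibration sample
toy$fixed[3]   <- "validate"   # case 3 must be in the validation sample

# Check the format described above: a factor holding only NA,
# "calibrate", or "validate"
ok <- is.factor(toy$fixed) &&
  all(is.na(toy$fixed) | toy$fixed %in% c("calibrate", "validate"))

# Pass the variable by name; the remaining cases are assigned randomly
if (requireNamespace("stuart", quietly = TRUE)) {
  samples <- stuart::holdout(toy, prop = 0.5, seed = 1, determined = "fixed")
}
```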

Author

Martin Schultze

See Also

crossvalidate

Examples

# seeded selection, 25% validation sample
library(stuart) # provides holdout() and the fairplayer data
data(fairplayer)
split <- holdout(fairplayer, .75, seed = 55635)
lapply(split, nrow) # check size of samples