ResampleInstance, when given the size of the data set.

makeResampleDesc(method, predict = "test", ..., stratify = FALSE,
  stratify.cols = NULL)
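As a minimal sketch of the sentence above: a description holds no indices by itself and only yields concrete train/test sets once a data set size (or a task) is supplied. The example assumes the mlr package is installed; the 3 folds and the size of 20 observations are arbitrary values chosen for illustration.

```r
library(mlr)

# Describe 3-fold cross-validation without referring to any data.
rdesc <- makeResampleDesc("CV", iters = 3)

# Instantiate the description for a data set of 20 observations
# (size chosen only for illustration).
rin <- makeResampleInstance(rdesc, size = 20)

# One train/test index set per iteration.
length(rin$train.inds)
rin$test.inds[[1]]
```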
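method [character(1)]
“CV” for cross-validation, “LOO” for leave-one-out, “RepCV” for
repeated cross-validation, “Bootstrap” for out-of-bag bootstrap, “Subsample” for
subsampling, “Holdout” for holdout.

predict [character(1)]
What to predict during resampling: “train”, “test” or “both” sets.
Default is “test”.

...
Further parameters for the chosen strategy:
iters [integer(1)]
Number of iterations, for “CV”, “Subsample” and “Bootstrap”.
split [numeric(1)]
Proportion of training cases for “Holdout” and “Subsample”, between 0 and 1.
reps [integer(1)]
Repeats for “RepCV”. Here iters = folds * reps.
Default is 10.
folds [integer(1)]
Folds in the repeated CV for RepCV.
Here iters = folds * reps. Default is 10.

stratify [logical(1)]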
Should stratification be done for the target variable?
For classification tasks, this means that the resampling strategy is applied to each class
individually and the resulting index sets are joined, so that the class proportions
in each training set match those of the original data set. Useful for imbalanced class sizes.
For survival tasks, stratification is done on the events, resulting in training sets with comparable
censoring rates.

stratify.cols [character]
Stratify on specific columns referenced by name. All columns have to be factors.
Note that you have to ensure yourself that stratification is possible, i.e.
that each stratum contains enough observations.
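This argument and stratify are mutually exclusive.

Returns: [ResampleDesc].

Notes on special strategies:
Repeated cross-validation: use “RepCV” and set the aggregation function of the performance measure to “testgroup.mean” via setAggregation.
B632 bootstrap: use “Bootstrap”, set predict to “both”, and set the aggregation function to “b632” via setAggregation.
B632+ bootstrap: use “Bootstrap”, set predict to “both”, and set the aggregation function to “b632plus” via setAggregation.
Fixed holdout set: use makeFixedHoldoutInstance.

Object slots:
id [character(1)]
Name of the resampling strategy.
iters [integer(1)]
Number of iterations.
predict [character(1)]
See argument.
stratify [logical(1)]
See argument.

See also: ResamplePrediction, ResampleResult, addRRMeasure, getRRPredictionList,
getRRPredictions, getRRTaskDescription, makeResampleInstance, resample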
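The stratify argument is described as applying the resampling strategy to each class individually and joining the resulting index sets. The following base R sketch illustrates that idea for a single holdout split. It is not mlr's internal code; the label vector `y`, the name `train_frac` and its value of 2/3 are hypothetical choices for illustration.

```r
# Stratified holdout split in base R (illustration only).
set.seed(1)
y <- factor(rep(c("a", "b"), times = c(90, 10)))  # imbalanced classes
train_frac <- 2 / 3  # assumed proportion of training cases

# Apply the holdout strategy to each class separately ...
train_by_class <- lapply(split(seq_along(y), y), function(idx) {
  n_train <- round(train_frac * length(idx))
  idx[sample.int(length(idx), size = n_train)]
})

# ... then join the per-class index sets into one training set.
train_inds <- sort(unlist(train_by_class, use.names = FALSE))
test_inds <- setdiff(seq_along(y), train_inds)

# Class proportions in the training set match the full data set
# up to rounding.
prop.table(table(y))
prop.table(table(y[train_inds]))
```

Sampling within each class keeps the rare class "b" represented in the training set, which a plain random split does not guarantee.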
# Bootstrapping
makeResampleDesc("Bootstrap", iters = 10)
makeResampleDesc("Bootstrap", iters = 10, predict = "both")
# Subsampling
makeResampleDesc("Subsample", iters = 10, split = 3/4)
makeResampleDesc("Subsample", iters = 10)
# Holdout a.k.a. test sample estimation
makeResampleDesc("Holdout")
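For repeated cross-validation, the arguments above state iters = folds * reps, and the page points to setAggregation. A hedged sketch of both, assuming the mlr package with its exported mmce measure and testgroup.mean aggregation; the values folds = 5 and reps = 2 and the name `mmce.rep` are arbitrary:

```r
library(mlr)

# Repeated cross-validation: 5 folds, repeated 2 times.
rdesc <- makeResampleDesc("RepCV", folds = 5, reps = 2)
rdesc$iters  # total number of train/test sets, folds * reps

# Average within each repetition first, then across repetitions
# (mmce and testgroup.mean are objects exported by mlr).
mmce.rep <- setAggregation(mmce, testgroup.mean)
```

The resulting measure can then be passed in the measures argument of resample together with `rdesc`.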