The whole dataset is split into multiple folds, either randomly (batch=NULL) or according to the batch information (batch is specified). In the former case, the number of folds is defined by nFold. In the latter case, each batch is used as one fold if nBatch=0; otherwise the dataset is split into nBatch folds according to the batch information (i.e., data from the same batch are used exclusively in one fold).
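The splitting rule described above can be sketched in base R as follows. This is only an illustration of the described behaviour, not the package's implementation: splitSketch is a hypothetical name, and the round-robin assignment of batches to folds when nBatch > 0 is an assumption, since the text does not say how batches are grouped.

```r
# Illustrative sketch only; NOT the package's dataSplit() implementation.
# splitSketch is a hypothetical name used for this example.
splitSketch <- function(ixData, batch = NULL, nBatch = 0, nFold = 10, seed = NULL) {
  if (is.null(batch)) {
    # Random k-fold: shuffle the positions, then deal them into nFold groups.
    if (!is.null(seed)) set.seed(seed)
    pos <- sample(length(ixData))
    foldId <- rep_len(seq_len(nFold), length(ixData))
    return(unname(split(ixData[pos], foldId)))
  }

  stopifnot(length(batch) == length(ixData))
  ub <- unique(batch)
  if (nBatch == 0) {
    # One fold per batch.
    groupOfBatch <- seq_along(ub)
  } else {
    # Assign whole batches to nBatch folds (round-robin: an assumption).
    groupOfBatch <- rep_len(seq_len(nBatch), length(ub))
  }
  # Every sample inherits the fold of its batch, so a batch never spans folds.
  foldId <- groupOfBatch[match(batch, ub)]
  unname(split(ixData, foldId))
}

# Example: three patients, two spectra each, grouped into two folds.
batch <- c("p1", "p1", "p2", "p2", "p3", "p3")
folds <- splitSketch(1:6, batch = batch, nBatch = 2)
print(folds)
```

Because the fold label is looked up per batch rather than per spectrum, all spectra of one patient always land in the same fold, which is the property the batch-wise mode is meant to guarantee.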
dataSplit(ixData, batch = NULL,
nBatch = 0, nFold = 10,
verbose = TRUE, seed = NULL)
ixData: a vector of integers giving the indices of the spectra.
batch: a vector of sample identifiers (e.g., batch or patient ID); must be the same length as ixData. Ideally, this should identify the samples at the highest level of the hierarchy (e.g., the patient ID rather than the spectral ID). If NULL (the default), the data are split randomly into nFold folds.
nBatch: an integer, the number of folds for batch-wise cross-validation (if nBatch=0, each batch is used as one fold). Ignored if batch is NULL.
nFold: an integer, the number of folds for ordinary k-fold cross-validation. Ignored if batch is given.
verbose: a boolean value, whether to print logging information.
seed: an integer; if given, it is used as the random seed when splitting the data for k-fold cross-validation. Ignored if batch is given.
A list in which each element holds the indices of the samples belonging to one fold.
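As a hedged illustration of how a fold list of this shape might be consumed, the snippet below iterates over hand-written folds and checks that no batch appears in both the training and the test side. The data are hypothetical, and the indices are assumed to point directly into the batch vector.

```r
# Hand-written fold list for illustration (hypothetical data, not package output).
batch <- c("A", "A", "B", "B", "C", "C")
folds <- list(c(1L, 2L), c(3L, 4L), c(5L, 6L))

# Each fold in turn serves as the test set; the remaining folds form the training set.
for (k in seq_along(folds)) {
  testIx  <- folds[[k]]
  trainIx <- unlist(folds[-k])
  # No batch may appear on both sides of the split.
  stopifnot(length(intersect(batch[testIx], batch[trainIx])) == 0)
}

# Every index appears exactly once across the folds.
stopifnot(identical(sort(unlist(folds)), seq_along(batch)))
```

A check like this fails as soon as spectra from one patient are spread over several folds, which is the kind of leakage the batch-wise split is intended to prevent.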
S. Guo, T. Bocklitz, et al., Common mistakes in cross-validating classification models. Analytical Methods 2017, 9 (30): 4410-4417.