The whole dataset is split into multiple folds, either randomly (batch = NULL) or according to the batch information (batch is specified). In the former case, the number of folds is defined by nFold. In the latter case, the data belonging to each batch are used as one fold if nBatch = 0; otherwise the dataset is split into nBatch folds according to the batch information (i.e., data from the same batch are used exclusively in one fold).
dataSplit(ixData, batch = NULL,
nBatch = 0, nFold = 10,
verbose = TRUE, seed = NULL)
ixData: a vector of integers giving the indices of the spectra.
batch: a vector of sample identifications (e.g., batch/patient ID); must be the same length as ixData. Ideally, this should be the identification of the samples at the highest level of the hierarchy (e.g., the patient ID rather than the spectral ID). If missing, the data are split randomly into nFold folds.
nBatch: an integer, the number of data folds in case of batch-wise cross-validation (if nBatch = 0, each batch is used as one fold). Ignored if batch is missing.
nFold: an integer, the number of data folds in case of normal k-fold cross-validation. Ignored if batch is given.
verbose: a boolean value, whether to print the logging information.
seed: an integer; if given, it is used as the random seed for splitting the data in case of k-fold cross-validation. Ignored if batch is given.
Returns a list in which each element contains the indices of the samples belonging to one fold.
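The two splitting modes described above can be illustrated with a minimal base-R sketch. This is not the package's implementation of dataSplit: the helper name sketchSplit, the toy patient vector, and the round-robin assignment of batches to folds when nBatch > 0 are all assumptions made for illustration only. The actual function may assign batches to folds differently.

```r
# Hypothetical helper, NOT the package's dataSplit(): a base-R sketch of the
# behaviour described in this page, under the assumptions stated above.
sketchSplit <- function(ixData, batch = NULL, nBatch = 0, nFold = 10, seed = NULL) {
  if (is.null(batch)) {
    # Random k-fold: shuffle the indices, then deal them into nFold groups.
    if (!is.null(seed)) set.seed(seed)
    shuffled <- ixData[sample.int(length(ixData))]
    foldId <- rep_len(seq_len(nFold), length(shuffled))
    return(unname(split(shuffled, foldId)))
  }

  stopifnot(length(batch) == length(ixData))
  batchIds <- unique(batch)

  if (nBatch == 0) {
    # Batch-wise: every batch becomes its own fold.
    foldOfBatch <- seq_along(batchIds)
  } else {
    # Whole batches are distributed over nBatch folds.
    # Round-robin assignment is an assumption of this sketch.
    foldOfBatch <- rep_len(seq_len(nBatch), length(batchIds))
  }

  # Every sample inherits the fold of its batch, so a batch never spans folds.
  foldId <- foldOfBatch[match(batch, batchIds)]
  unname(split(ixData, foldId))
}

# Toy data: 12 spectra from 4 patients (3 spectra each).
ixData  <- 1:12
patient <- rep(c("P1", "P2", "P3", "P4"), each = 3)

foldsRandom   <- sketchSplit(ixData, nFold = 3, seed = 42)          # random 3-fold
foldsPerBatch <- sketchSplit(ixData, batch = patient)               # one fold per patient
foldsGrouped  <- sketchSplit(ixData, batch = patient, nBatch = 2)   # patients grouped into 2 folds
```

With the package loaded, the corresponding calls would presumably follow the usage shown above, e.g. dataSplit(ixData, batch = patient, nBatch = 2). Supplying the patient ID rather than the spectral ID as batch keeps all spectra of one patient in the same fold, which is the situation the batch argument is meant for.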
S. Guo, T. Bocklitz, et al., Common mistakes in cross-validating classification models. Analytical Methods 2017, 9 (30): 4410-4417.