The best k is selected for each sample, according to the chosen index.
Because different samples will probably yield different values of k, we
calculate the mean of these values and return it as an integer. Alternatively, we can
return a more detailed result in the form of a list.
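The averaging step described above can be sketched in base R as follows. This is only an illustration: the sample names, the per-sample values, and the helper `combine_best_k()` are hypothetical, and the use of `round()` before the integer conversion is an assumption (the function itself may truncate instead).

```r
# Hypothetical per-sample results: the best k found for each sample.
best_k_per_sample <- c(sample_A = 3L, sample_B = 4L, sample_C = 3L, sample_D = 5L)

# Collapse the per-sample values into a single k: take the mean,
# then convert it to an integer.
# Assumption: the mean is rounded; the actual function may truncate.
combine_best_k <- function(best_k) {
  as.integer(round(mean(best_k)))
}

single_k <- combine_best_k(best_k_per_sample)
```

Here the per-sample values average to 3.75, so the sketch returns a single integer k for the whole dataset rather than one k per sample.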
Note: this function is used within define_rb(), with default parameters, for the
optional automatic selection of k.
Detailed option
If detailed = TRUE, then the output is a list with information to help decide on k.
More specifically, the list will include:
A data.frame summarizing what information each index provides and how to interpret the value.
A brief summary indicating the number of samples in the dataset and the range of k values used.
A data.frame with the best k for each sample, based on each index.
Automatic k selection
If detailed = FALSE, this function will provide a single integer with the best k.
The default decision is based on the maximum average Silhouette score obtained
for the values of k between 3 and 10. To better understand why the average Silhouette score and
this range of k's were selected, we refer to Pascoal et al., 2025 and to
vignette("explore-classifications").
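For a single sample, the default decision rule can be sketched as below. This is a minimal illustration, not the package's implementation: the abundance values are simulated, and the use of k-medoids clustering via `cluster::pam()` is an assumption, since this section does not state which clustering algorithm is used.

```r
library(cluster)  # provides pam(); a recommended package distributed with R

set.seed(42)
# Hypothetical abundance scores for one sample (simulated, not real data).
abundance <- c(rlnorm(40, 1, 0.5), rlnorm(15, 4, 0.4), rlnorm(5, 7, 0.3))
x <- matrix(abundance, ncol = 1)

# Candidate values of k, matching the default range described above.
k_range <- 3:10

# Average Silhouette score for each k.
# Assumption: clusters come from a k-medoids partition (pam).
avg_sil <- vapply(k_range, function(k) {
  pam(x, k = k)$silinfo$avg.width
}, numeric(1))
names(avg_sil) <- k_range

# The best k for this sample is the one with the maximum average score.
best_k <- k_range[which.max(avg_sil)]
```

Repeating this for every sample gives one best k per sample, which is then reduced to a single integer as described at the top of this section.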
Alternatively, this function can also provide the best k, as an integer, based on another index
(Davies-Bouldin or Calinski-Harabasz), and can compare the entire range of possible k's.