A dataset containing the performance of several predictive models over three different datasets, where models are built using data from multiple time points (where time 1 has less data than time 2, but each subsequent time point T also uses data from all prior time points up to that time t<= T.) This represents the typical output of a machine learning experiment where several models are being considered across multiple datasets, often with different variants of each model type being considered (i.e., different hyperparameter settings of each model).
sample_experiment_data
A data frame with 180 rows and 10 variables:
Area Under the Receiver Operating Characteristic Curve, or AUC, for this model configuration.
Precision for this model configuration.
Accuracy for this model configuration.
Number of observations in this dataset.
Number of negative observations (i.e., outcome == 0) in this dataset (required for standard error estimation of AUC statistic).
Number of positive observations (i.e., outcome == 1) in this dataset (required for standard error estimation of AUC statistic).
indicator for different datasets.
indicator for different time points used to build each dataset; these represent dependent observations of model performance.
Indicator for the statistical algorithm used (this could be 'Logistic Regression', 'SVM', etc.).
Indicator for different variants of each model which are not equivalent and should be used individually (model should not be averaged over these, and instead should be held fixed when comparing to other model). Example of this could be various hyperparameter settings for a given model (i.e., cost for an SVM).