The PMLB repository contains a curated collection of data sets for evaluating and comparing machine learning algorithms. These data sets cover a range of applications and include binary and multi-class classification problems and regression problems, as well as combinations of categorical, ordinal, and continuous features. The PMLB repository includes approximately 290 data sets, none of which contain missing values.
Maintainer: Trang Le grixor@gmail.com (https://trang.page/)
Authors:
makeyourownmaker makeyourownmaker@gmx.com (https://github.com/makeyourownmaker)
Jason Moore jhmoore@upenn.edu (http://www.epistasisblog.org/)
Other contributors:
University of Pennsylvania [copyright holder]
This R package includes summaries of the classification and regression data sets but does NOT include any of the PMLB data sets themselves. The data sets can be downloaded using the fetch_data function, which is similar to the corresponding PMLB Python function.
See fetch_data and pmlb_metadata for usage examples and further information.
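As a rough illustration of the download step described above, the sketch below wraps fetch_data in a small helper. It assumes the package is installed under the name pmlbr, that network access is available, and that "iris" is a valid PMLB data set name. The helper safe_fetch is hypothetical and is not part of the package; it only exists so that a missing package or a failed download yields NULL rather than an error.

```r
# Hypothetical helper (not part of the package): returns NULL instead of
# stopping when the package is missing or the download fails.
safe_fetch <- function(name, ...) {
  tryCatch(
    # Assumption: the package is installed under the name "pmlbr".
    pmlbr::fetch_data(name, ...),
    error = function(e) {
      message("Could not fetch '", name, "': ", conditionMessage(e))
      NULL
    }
  )
}

# Assumption: "iris" is a valid PMLB data set name.
iris_pmlb <- safe_fetch("iris")

if (!is.null(iris_pmlb)) {
  # Inspect the structure of whatever was downloaded.
  str(iris_pmlb)
}
```

Calling fetch_data directly is enough for interactive use; the wrapper is mainly useful in scripts that should keep running when a download is unavailable.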
If you use PMLB in a scientific publication, please consider citing the following paper:
Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, and Jason H. Moore (2017). PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Mining 10, page 36.
https://biodatamining.biomedcentral.com/articles/10.1186/s13040-017-0154-4
Useful links: