pmlb: R interface to the Penn Machine Learning Benchmarks data repository

Description

The PMLB repository contains a curated collection of data sets for evaluating and comparing machine learning algorithms. The data sets span a range of applications, cover binary and multi-class classification as well as regression problems, and include categorical, ordinal, and continuous features in various combinations. The PMLB repository includes approximately 290 data sets, none of which contain missing values.

Author

Maintainer: Trang Le <grixor@gmail.com> (https://trang.page/)

Authors:

makeyourownmaker <makeyourownmaker@gmx.com> (https://github.com/makeyourownmaker)
Jason Moore <jhmoore@upenn.edu> (http://www.epistasisblog.org/)

Other contributors:

University of Pennsylvania [copyright holder]

Details

This R package includes summaries of the classification and regression data sets but does NOT include the PMLB data sets themselves. The data sets can be downloaded with the fetch_data function, which mirrors the corresponding function in the PMLB Python package.
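A minimal sketch of downloading a data set with fetch_data. It assumes the pmlb package is installed, that a network connection is available (the data are fetched from the PMLB repository at call time), and that "mushroom" is a valid data set name. The return_X_y argument, which returns features and outcome separately, follows the package's usage examples rather than this page; check ?fetch_data for the exact signature.

```r
# Sketch only: assumes pmlb is installed and the machine is online,
# because fetch_data() downloads the data set on demand.
library(pmlb)

# Download the "mushroom" classification data set as a single data frame
# containing both the features and the outcome column.
mushroom <- fetch_data("mushroom")
str(mushroom)

# Download the same data set with features and outcome returned separately
# (assumed: a list with elements x = features and y = outcome).
mushroom_xy <- fetch_data("mushroom", return_X_y = TRUE)
dim(mushroom_xy$x)    # dimensions of the feature table
length(mushroom_xy$y) # number of outcome values
```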

See fetch_data and pmlb_metadata for usage examples and further information.

If you use PMLB in a scientific publication, please consider citing the following paper:

Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, and Jason H. Moore (2017). PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Mining 10, article 36. https://biodatamining.biomedcentral.com/articles/10.1186/s13040-017-0154-4
