
pmlbr (version 0.2.3)

nearest_datasets: Select nearest datasets given input `x`.

Description

If `x` is a data.frame, the dataset characteristics are computed from it. If `x` is a character string naming a PMLB dataset, the precomputed dataset statistics/characteristics in `summary_stats` are used instead.

Usage

nearest_datasets(x, ...)

# S3 method for default
nearest_datasets(x, ...)

# S3 method for character
nearest_datasets(
  x,
  n_neighbors = 5,
  dimensions = c("n_instances", "n_features"),
  target_name = "target",
  ...
)

# S3 method for data.frame
nearest_datasets(
  x,
  y = NULL,
  n_neighbors = 5,
  dimensions = c("n_instances", "n_features"),
  task = c("classification", "regression"),
  target_name = "target",
  ...
)

Value

Character vector of the names of the datasets most similar to `x`, most similar dataset first.

Arguments

x

Character string of dataset name from PMLB, or data.frame of n_samples x n_features (or n_features + 1 with a target column).

...

Further arguments passed to each method.

n_neighbors

Integer. The number of dataset names to return as neighbors.

dimensions

Character vector specifying the dataset characteristics to include in the similarity calculation. Dimensions must correspond to numeric columns of [all_summary_stats.tsv](https://github.com/EpistasisLab/pmlb/blob/master/pmlb/all_summary_stats.tsv). The default shown in Usage is `c("n_instances", "n_features")`. If 'all', uses all numeric columns.

target_name

Character string specifying column of target/dependent variable.

y

Vector of target column. Required when `x` does not contain the target column.

task

Character string specifying classification or regression for summary stat generation.
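
To make the roles of `dimensions` and `n_neighbors` concrete, here is a minimal base-R sketch of a nearest-neighbor lookup over a table of dataset characteristics. It is an illustration only, not the pmlbr implementation: the helper `toy_nearest`, the table `toy_stats`, and its values are hypothetical, and the choice of Euclidean distance on standardized columns is an assumption that the package may not share.

```r
# Toy summary table (hypothetical values, not real PMLB statistics).
toy_stats <- data.frame(
  dataset     = c("a", "b", "c", "d"),
  n_instances = c(100, 150, 5000, 120),
  n_features  = c(4, 5, 20, 30),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rank datasets by Euclidean distance to a query
# in the space of the chosen dimensions, after standardizing each column.
toy_nearest <- function(query, stats,
                        dimensions = c("n_instances", "n_features"),
                        n_neighbors = 2) {
  ref <- as.matrix(stats[, dimensions, drop = FALSE])
  centers <- colMeans(ref)
  scales <- apply(ref, 2, sd)
  # Standardize the reference table and the query with the same parameters.
  ref_scaled <- scale(ref, center = centers, scale = scales)
  query_scaled <- (unlist(query[dimensions]) - centers) / scales
  # Subtract the query from every row, then take the row-wise Euclidean norm.
  dists <- sqrt(rowSums(sweep(ref_scaled, 2, query_scaled)^2))
  # Most similar dataset first, truncated to n_neighbors names.
  stats$dataset[order(dists)][seq_len(min(n_neighbors, nrow(stats)))]
}

query <- c(n_instances = 110, n_features = 6)
nearest <- toy_nearest(query, toy_stats)
print(nearest)
```

Standardizing before measuring distance keeps a large-valued column such as `n_instances` from dominating the ranking. Whether and how pmlbr rescales its dimensions is not stated on this page.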

Examples

if (interactive()){
  nearest_datasets('penguins')
  nearest_datasets(fetch_data('penguins'))
}
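
The non-default arguments can be exercised in the same way. The sketch below follows the signatures listed under Usage and passes the target separately through `y`, as the `y` argument describes. Using the built-in `iris` data as input is an assumption made for illustration, and the returned names are not shown because they depend on the installed package version and its summary statistics.

```r
# Feature columns only; the target is supplied separately through `y`.
features <- iris[, 1:4]
target <- iris$Species

# Guarded so the call runs only in an interactive session with pmlbr installed.
if (interactive() && requireNamespace("pmlbr", quietly = TRUE)) {
  # Three nearest PMLB datasets by the default dimensions.
  pmlbr::nearest_datasets(
    features,
    y = target,
    n_neighbors = 3,
    task = "classification"
  )

  # Character method: look up a PMLB dataset by name, using one dimension.
  pmlbr::nearest_datasets(
    "penguins",
    n_neighbors = 3,
    dimensions = "n_features"
  )
}
```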
