If `x` is a data.frame, computes its dataset characteristics. If `x` is a character string naming a PMLB dataset, uses the precomputed dataset statistics/characteristics in `summary_stats`.
nearest_datasets(x, ...)
# S3 method for default
nearest_datasets(x, ...)
# S3 method for character
nearest_datasets(
x,
n_neighbors = 5,
dimensions = c("n_instances", "n_features"),
target_name = "target",
...
)
# S3 method for data.frame
nearest_datasets(
x,
y = NULL,
n_neighbors = 5,
dimensions = c("n_instances", "n_features"),
task = c("classification", "regression"),
target_name = "target",
...
)
Returns a character vector of the names of the datasets most similar to `x`, most similar dataset first.
`x`: Character string giving a dataset name from PMLB, or a data.frame of n_samples x n_features (or n_features + 1 with a target column).
`...`: Further arguments passed to each method.
`n_neighbors`: Integer. The number of dataset names to return as neighbors.
`dimensions`: Character vector specifying dataset characteristics to include in the similarity calculation. Dimensions must correspond to numeric columns of [all_summary_stats.tsv](https://github.com/EpistasisLab/pmlb/blob/master/pmlb/all_summary_stats.tsv). If 'all', uses all numeric columns. The default shown in the usage above is `c("n_instances", "n_features")`.
`target_name`: Character string specifying the column of the target/dependent variable.
`y`: Vector of target values. Required when `x` does not contain the target column.
`task`: Character string specifying classification or regression for summary stat generation.
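if (interactive()) {
  # Look up neighbors by PMLB dataset name (uses the precomputed summary_stats)
  nearest_datasets('penguins')
  # Compute characteristics from a data.frame, then look up its neighbors
  nearest_datasets(fetch_data('penguins'))
}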
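To make the similarity calculation over `dimensions` more concrete, here is a minimal base-R sketch of one plausible approach: derive `n_instances` and `n_features` from a data.frame, then rank reference datasets by Euclidean distance in that space. It is not the package's implementation. The table `toy_stats`, the helper `nearest_by_shape`, and the log10 scaling are all assumptions made for illustration, and the sketch handles only the two dimensions it computes.

```r
# Hypothetical stand-in for PMLB's precomputed summary statistics.
toy_stats <- data.frame(
  dataset     = c("small_a", "small_b", "medium_c", "large_d"),
  n_instances = c(150, 340, 5000, 100000),
  n_features  = c(4, 7, 20, 50),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): rank reference datasets by
# Euclidean distance to a query data.frame in the space given by `dimensions`.
nearest_by_shape <- function(df, stats, n_neighbors = 5,
                             dimensions = c("n_instances", "n_features"),
                             target_name = "target") {
  # Characteristics of the query: row count, and columns excluding the target.
  query <- c(
    n_instances = nrow(df),
    n_features  = length(setdiff(names(df), target_name))
  )[dimensions]

  # Assumption: log-scale so n_instances does not swamp n_features.
  ref <- log10(as.matrix(stats[, dimensions, drop = FALSE]))
  q   <- log10(query)

  # Euclidean distance from the query to every reference row.
  dists <- sqrt(rowSums(sweep(ref, 2, q)^2))

  # Closest first, truncated to the requested number of neighbors.
  head(stats$dataset[order(dists)], n_neighbors)
}

# A 344-row data.frame with 7 feature columns plus a `target` column.
query_df <- data.frame(matrix(0, nrow = 344, ncol = 7), target = 0)
nearest_by_shape(query_df, toy_stats, n_neighbors = 2)
```

With this toy table, the query is closest to `small_b` (340 rows, 7 features), followed by `small_a`. Whether the real function rescales the dimensions before measuring distance is not stated on this page, so treat the log10 step as a design choice of the sketch only.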