BenchmarkResult: Container for Benchmarking Results

Description

This is the result container object returned by benchmark(). A BenchmarkResult consists of the row-bound data of multiple ResampleResults, which can easily be reconstructed.

BenchmarkResults can be visualized via mlr3viz's autoplot() function.

For statistical analysis of benchmark results and more advanced plots, see mlr3benchmark.

S3 Methods

as.data.table(rr, ..., reassemble_learners = TRUE, convert_predictions = TRUE, predict_sets = "test")
BenchmarkResult -> data.table::data.table()
Returns a tabular view of the internal data.

c(...)
(BenchmarkResult, ...) -> BenchmarkResult
Combines multiple objects convertible to BenchmarkResult into a new BenchmarkResult.
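The two S3 methods can be tried on a pair of small benchmarks. This is a minimal sketch, assuming mlr3 is installed; the task, learners and resampling were picked only for illustration.

```r
library(mlr3)

# Two small benchmarks on the same task, one learner each.
task = tsk("iris")
bmr1 = benchmark(benchmark_grid(task, lrn("classif.featureless"), rsmp("holdout")))
bmr2 = benchmark(benchmark_grid(task, lrn("classif.rpart"), rsmp("holdout")))

# c() builds a new BenchmarkResult from its arguments.
bmr = c(bmr1, bmr2)
bmr$n_resample_results

# Tabular view with one row per resampling iteration.
tab = as.data.table(bmr)
nrow(tab)
```

Holdout has a single iteration per resampling, so the table has one row per combined ResampleResult.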

Public fields

data: (ResultData) Internal data storage object of type ResultData. We discourage users from working with this field directly. Use as.data.table(BenchmarkResult) instead.

Active bindings

task_type

(character(1)) Task type of objects in the BenchmarkResult. All stored objects (Task, Learner, Prediction) in a single BenchmarkResult are required to have the same task type, e.g., "classif" or "regr". This is NA for empty BenchmarkResults.

tasks

(data.table::data.table()) Table of included Tasks with three columns:

"task_hash" (character(1)),
"task_id" (character(1)), and
"task" (Task).

learners

(data.table::data.table()) Table of included Learners with three columns:

"learner_hash" (character(1)),
"learner_id" (character(1)), and
"learner" (Learner).

Note that it is not feasible to access learned models via this field, as the training task would be ambiguous. For this reason, the learners are reset before they are returned. Instead, select a row from the table returned by $score().
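The difference between the reset learners in this field and the per-iteration learners from $score() can be sketched as follows. This assumes the benchmark was run with store_models = TRUE (an argument of benchmark(), not of this class), since otherwise no models are kept at all.

```r
library(mlr3)

design = benchmark_grid(tsk("iris"), lrn("classif.rpart"), rsmp("holdout"))
# Assumption: models are only available if they were stored during benchmark().
bmr = benchmark(design, store_models = TRUE)

# Learners in $learners are reset, so no model is attached.
is.null(bmr$learners$learner[[1]]$model)

# Each row of $score() belongs to one resampling iteration,
# so its learner carries the model fitted in that iteration.
scores = bmr$score()
fitted = scores$learner[[1]]
class(fitted$model)
```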

resamplings

(data.table::data.table()) Table of included Resamplings with three columns:

"resampling_hash" (character(1)),
"resampling_id" (character(1)), and
"resampling" (Resampling).

resample_results

(data.table::data.table()) Returns a table with two columns:

uhash (character()).
resample_result (ResampleResult).

n_resample_results

(integer(1)) Returns the total number of stored ResampleResults.

uhashes

(character()) Set of (unique) hashes of all included ResampleResults.

Methods

Public methods

Method `new()`

Creates a new instance of this R6 class.

Usage

BenchmarkResult$new(data = NULL)

Arguments

data: (ResultData) An object of type ResultData, either extracted from another ResampleResult, another BenchmarkResult, or manually constructed with as_result_data().

Method `help()`

Opens the help page for this object.

Usage

BenchmarkResult$help()

Method `format()`

Helper for print outputs.

Usage

BenchmarkResult$format()

Method `print()`

Printer.

Usage

BenchmarkResult$print()

Method `combine()`

Fuses a second BenchmarkResult into itself, mutating the BenchmarkResult in-place. If the second BenchmarkResult bmr is NULL, simply returns self. Note that you can alternatively use the combine function c() which calls this method internally.

Usage

BenchmarkResult$combine(bmr)

Arguments

bmr: (BenchmarkResult) A second BenchmarkResult object.

Returns

Returns the object itself, but modified by reference. You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
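The in-place behaviour can be sketched as below. The example data is illustrative, and deep = TRUE is a cautious assumption so that the internal data storage is copied rather than shared with the clone.

```r
library(mlr3)

task = tsk("iris")
bmr1 = benchmark(benchmark_grid(task, lrn("classif.featureless"), rsmp("holdout")))
bmr2 = benchmark(benchmark_grid(task, lrn("classif.rpart"), rsmp("holdout")))

# Keep a copy of the previous state before mutating bmr1.
backup = bmr1$clone(deep = TRUE)

# combine() modifies bmr1 by reference.
bmr1$combine(bmr2)

bmr1$n_resample_results    # includes the result from bmr2
backup$n_resample_results  # still the original state
```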

Method `score()`

Returns a table with one row for each resampling iteration, including all involved objects: Task, Learner, Resampling, iteration number (integer(1)), and Prediction. If ids is set to TRUE, character columns of extracted ids are added to the table for convenient filtering: "task_id", "learner_id", and "resampling_id".

Additionally calculates the provided performance measures and binds the performance scores as extra columns. These columns are named using the id of the respective Measure.

Usage

BenchmarkResult$score(
  measures = NULL,
  ids = TRUE,
  conditions = FALSE,
  predict_sets = "test"
)

Arguments

measures

(Measure | list of Measure) Measure(s) to calculate.

ids

(logical(1)) Adds object ids ("task_id", "learner_id", "resampling_id") as extra character columns to the returned table.

conditions

(logical(1)) Adds condition messages ("warnings", "errors") as extra list columns of character vectors to the returned table.

predict_sets

(character()) Vector of predict sets ({"train", "test"}) to construct the Prediction objects from. Default is "test".

Returns

data.table::data.table().
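A minimal sketch of scoring each resampling iteration with two measures, assuming mlr3 is installed; the measure ids classif.acc and classif.ce ship with mlr3, and the design is illustrative only.

```r
library(mlr3)

design = benchmark_grid(tsk("iris"), lrn("classif.rpart"), rsmp("cv", folds = 3))
bmr = benchmark(design)

# One row per resampling iteration; score columns are named after the measure ids.
scores = bmr$score(msrs(c("classif.acc", "classif.ce")))
scores[, c("task_id", "learner_id", "iteration", "classif.acc", "classif.ce")]
```

With three folds the table has three rows, and accuracy and classification error sum to one in each row.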

Method `aggregate()`

Returns a result table where resampling iterations are combined into ResampleResults. A column with the aggregated performance score is added for each Measure, named with the id of the respective measure.

For convenience, different flags can be set to extract more information from the returned ResampleResult; they are described under Arguments.

Usage

BenchmarkResult$aggregate(
  measures = NULL,
  ids = TRUE,
  uhashes = FALSE,
  params = FALSE,
  conditions = FALSE
)

Arguments

measures

(Measure | list of Measure) Measure(s) to calculate.

ids

(logical(1)) Adds object ids ("task_id", "learner_id", "resampling_id") as extra character columns for convenient subsetting.

uhashes

(logical(1)) Adds the uhash values of the ResampleResult as extra character column "uhash".

params

(logical(1)) Adds the hyperparameter values as extra list column "params". You can unnest them with mlr3misc::unnest().

conditions

(logical(1)) Adds the number of resampling iterations with at least one warning as extra integer column "warnings", and the number of resampling iterations with errors as extra integer column "errors".

Returns

data.table::data.table().
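The flags can be combined in one call. The following sketch assumes mlr3 is installed and uses an illustrative design; the column names follow the argument descriptions above.

```r
library(mlr3)

design = benchmark_grid(
  tsk("iris"),
  lrns(c("classif.featureless", "classif.rpart")),
  rsmp("cv", folds = 3)
)
bmr = benchmark(design)

# One row per ResampleResult, with uhash and condition counts added.
agg = bmr$aggregate(msr("classif.acc"), uhashes = TRUE, conditions = TRUE)
agg[, c("learner_id", "uhash", "classif.acc", "warnings", "errors")]
```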

Method `filter()`

Subsets the benchmark result. If task_ids is not NULL, keeps all tasks with the provided task ids and discards all other tasks. The same procedure applies to learner_ids and resampling_ids.

Usage

BenchmarkResult$filter(
  task_ids = NULL,
  task_hashes = NULL,
  learner_ids = NULL,
  learner_hashes = NULL,
  resampling_ids = NULL,
  resampling_hashes = NULL
)

Arguments

task_ids

(character()) Ids of Tasks to keep.

task_hashes

(character()) Hashes of Tasks to keep.

learner_ids

(character()) Ids of Learners to keep.

learner_hashes

(character()) Hashes of Learners to keep.

resampling_ids

(character()) Ids of Resamplings to keep.

resampling_hashes

(character()) Hashes of Resamplings to keep.

Returns

Returns the object itself, but modified by reference. You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
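As a sketch of filtering by learner id while keeping the unfiltered state, assuming mlr3 is installed (deep = TRUE is again a cautious assumption so that the internal storage is not shared):

```r
library(mlr3)

design = benchmark_grid(
  tsk("iris"),
  lrns(c("classif.featureless", "classif.rpart")),
  rsmp("holdout")
)
bmr = benchmark(design)

# Copy first, because filter() changes bmr itself.
full = bmr$clone(deep = TRUE)
bmr$filter(learner_ids = "classif.rpart")

bmr$aggregate()$learner_id   # only the kept learner
full$aggregate()$learner_id  # both learners
```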

Method `resample_result()`

Retrieve the i-th ResampleResult, by position or by unique hash uhash. i and uhash are mutually exclusive.

Usage

BenchmarkResult$resample_result(i = NULL, uhash = NULL)

Arguments

i

(integer(1)) The position of the ResampleResult to retrieve.

uhash

(character(1)) The uhash value of the ResampleResult to retrieve.

Returns

ResampleResult.
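Both ways of retrieving a ResampleResult can be sketched as follows, assuming mlr3 is installed. The example takes the first entry of $uhashes as the hash to look up; the design is illustrative only.

```r
library(mlr3)

design = benchmark_grid(
  tsk("iris"),
  lrns(c("classif.featureless", "classif.rpart")),
  rsmp("holdout")
)
bmr = benchmark(design)

# Retrieve by position ...
rr_by_pos = bmr$resample_result(1)

# ... or by unique hash (pass either i or uhash, not both).
first_uhash = bmr$uhashes[1]
rr_by_hash = bmr$resample_result(uhash = first_uhash)

identical(rr_by_pos$uhash, rr_by_hash$uhash)
```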

Method `clone()`

The objects of this class are cloneable with this method.

Usage

BenchmarkResult$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

library(mlr3)
set.seed(123)
learners = list(
  lrn("classif.featureless", predict_type = "prob"),
  lrn("classif.rpart", predict_type = "prob")
)

design = benchmark_grid(
  tasks = list(tsk("sonar"), tsk("spam")),
  learners = learners,
  resamplings = rsmp("cv", folds = 3)
)
print(design)

bmr = benchmark(design)
print(bmr)

bmr$tasks
bmr$learners

# first 5 resampling iterations
head(as.data.table(bmr, measures = c("classif.acc", "classif.auc")), 5)

# aggregate results
bmr$aggregate()

# aggregate results with hyperparameters as separate columns
mlr3misc::unnest(bmr$aggregate(params = TRUE), "params")

# extract resample result for classif.rpart
rr = bmr$aggregate()[learner_id == "classif.rpart", resample_result][[1]]
print(rr)

# access the confusion matrix of the first resampling iteration
rr$predictions()[[1]]$confusion

# reduce to subset with task id "sonar"
bmr$filter(task_ids = "sonar")
print(bmr)