Function to check the input data before running score().
The data should come in one of three different formats:
A format for binary predictions (see example_binary)
A sample-based format for discrete or continuous predictions (see example_continuous and example_integer)
A quantile-based format (see example_quantile)
check_forecasts(data)
A list whose elements describe what scoringutils infers you are trying to do and flag potential issues.
target_type
the type of the prediction target as inferred from the input: 'binary' if all values in true_value are either 0 or 1 and all values in prediction are between 0 and 1, 'discrete' if all true values are integers, and 'continuous' otherwise.
prediction_type
inferred type of the prediction: 'quantile' if there is a column called 'quantile', else 'discrete' if all values in prediction are integers, else 'continuous'.
forecast_unit
unit of a single forecast, i.e. the grouping that uniquely defines a single forecast. This is assumed to be all present columns apart from the following protected columns:
c("prediction", "true_value", "sample", "quantile", "range", "boundary").
It is important that you remove all unnecessary columns before scoring, because any additional column is treated as part of the forecast unit.
rows_per_forecast
a data.frame that shows how many rows (usually quantiles or samples) are available per forecast. If a model has several entries, then there are forecasts with differing numbers of quantiles / samples.
unique_values
A data.frame that shows how many unique values are present per model and column in the data. This does not directly show missing values, but rather the maximum number of unique values across the whole data.
warnings
A vector with warnings. These can be ignored if you know what
you are doing.
errors
A vector with issues that will cause an error when running score().
messages
A verbal explanation of the information provided above.
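The inference rules listed above can be sketched in base R to make them concrete. The helpers below (is_whole(), infer_target_type(), infer_prediction_type(), get_forecast_unit(), count_rows_per_forecast()) and the toy data are hypothetical and written only for illustration; they follow the rules as described on this page and are not scoringutils' internal functions, whose handling of edge cases may differ.

```r
# Hypothetical helpers that follow the inference rules described above.
# Illustrative sketch only; these are NOT scoringutils' internal functions.
protected_columns <- c("prediction", "true_value", "sample",
                       "quantile", "range", "boundary")

# TRUE if every non-missing value is a whole number
is_whole <- function(x) {
  x <- x[!is.na(x)]
  all(x == round(x))
}

infer_target_type <- function(data) {
  binary_truth <- all(data$true_value %in% c(0, 1))
  prob_preds <- all(data$prediction >= 0 & data$prediction <= 1, na.rm = TRUE)
  if (binary_truth && prob_preds) {
    "binary"
  } else if (is_whole(data$true_value)) {
    "discrete"
  } else {
    "continuous"
  }
}

infer_prediction_type <- function(data) {
  if ("quantile" %in% names(data)) {
    "quantile"
  } else if (is_whole(data$prediction)) {
    "discrete"
  } else {
    "continuous"
  }
}

# Forecast unit: every column that is not protected
get_forecast_unit <- function(data) {
  setdiff(names(data), protected_columns)
}

# Number of rows (samples or quantiles) belonging to each forecast
count_rows_per_forecast <- function(data) {
  unit <- get_forecast_unit(data)
  aggregate(list(n_rows = rep(1L, nrow(data))), by = data[unit], FUN = sum)
}

# Made-up quantile forecasts from two models for one location
toy <- data.frame(
  model      = c("a", "a", "a", "b", "b"),
  location   = "x",
  quantile   = c(0.25, 0.5, 0.75, 0.5, 0.75),
  prediction = c(1.5, 2, 3.2, 2.1, 2.9),
  true_value = 2.4
)

infer_target_type(toy)        # "continuous"
infer_prediction_type(toy)    # "quantile"
get_forecast_unit(toy)        # c("model", "location")
count_rows_per_forecast(toy)  # model "a" has 3 rows, model "b" has 2
```

In this toy data the two models provide different numbers of quantiles, which is the kind of inconsistency the rows_per_forecast element is meant to surface.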
data
A data.frame or data.table with the predictions and observations.
For scoring using score(), the following columns need to be present:
true_value
- the true observed values
prediction
- predictions or predictive samples for one true value. The prediction column can be omitted only when scoring quantile forecasts in a wide range format.
For scoring integer and continuous forecasts, a sample column is needed:
sample
- an index to identify the predictive samples in the
prediction column generated by one model for one true value. Only
necessary for continuous and integer forecasts, not for
binary predictions.
For scoring forecasts in a quantile-based format, you should provide a column called quantile:
quantile
- the quantile to which the prediction corresponds
In addition, a model column is suggested. If it is not present, this will be flagged and a model column will be added to the input data, with all forecasts assigned to an "unspecified model".
You can check the format of your data using check_forecasts(), and there are examples for each format (example_quantile, example_continuous, example_integer, and example_binary).
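The column requirements above can be illustrated with small toy data frames in each of the three formats. The data values, the helper names check_required_columns() and add_model_if_missing(), and the simplified rules they encode (for example, the wide range format without a prediction column is ignored) are assumptions made for this sketch; they are not part of scoringutils, and check_forecasts() remains the way to validate real data.

```r
# Toy inputs in the three formats; all values are made up for illustration.
binary_df <- data.frame(
  model      = "model_a",
  id         = 1:3,
  true_value = c(0, 1, 1),          # observed outcomes are 0 or 1
  prediction = c(0.1, 0.8, 0.6)     # predicted probabilities
)

sample_df <- data.frame(
  model      = "model_a",
  id         = 1,
  sample     = 1:4,                 # index of each predictive sample
  true_value = 5,
  prediction = c(3, 4, 6, 7)
)

quantile_df <- data.frame(          # note: no model column
  id         = 1,
  quantile   = c(0.05, 0.5, 0.95),
  true_value = 5,
  prediction = c(2.1, 4.8, 8.3)
)

# Hypothetical helper: which required columns are missing for a given format?
# Simplification: the wide range format (no prediction column) is not handled.
check_required_columns <- function(data,
                                   format = c("binary", "sample", "quantile")) {
  format <- match.arg(format)
  extra <- switch(format,
                  binary   = character(0),
                  sample   = "sample",
                  quantile = "quantile")
  setdiff(c("true_value", "prediction", extra), names(data))
}

# Hypothetical helper mimicking the documented behaviour for a missing model column
add_model_if_missing <- function(data) {
  if (!"model" %in% names(data)) {
    message("No 'model' column found; assigning all forecasts to 'unspecified model'.")
    data$model <- "unspecified model"
  }
  data
}

check_required_columns(sample_df, "sample")    # character(0): nothing missing
check_required_columns(binary_df, "quantile")  # "quantile"
quantile_df <- add_model_if_missing(quantile_df)
```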
Nikos Bosse nikosbosse@gmail.com
Function to move from sample-based to quantile format: sample_to_quantile()
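The idea behind moving from a sample-based to a quantile format can be sketched in base R: group the rows of each forecast and replace its samples by empirical quantiles computed with stats::quantile(). The function name sample_to_quantile_sketch(), its default quantile levels and the use of quantile type 7 (R's default) are assumptions for illustration; this is not the implementation of sample_to_quantile(), which may differ in its arguments and details.

```r
# Hypothetical sketch of converting a sample-based forecast to quantile format.
# NOT the scoringutils implementation; details such as the quantile type may differ.
sample_to_quantile_sketch <- function(data,
                                      quantiles = c(0.05, 0.25, 0.5, 0.75, 0.95)) {
  # every column except the samples themselves identifies a forecast
  keys <- setdiff(names(data), c("prediction", "sample"))
  groups <- split(data, data[keys], drop = TRUE)
  pieces <- lapply(groups, function(g) {
    # one output row per requested quantile, copying the forecast identifiers
    out <- g[rep(1L, length(quantiles)), keys, drop = FALSE]
    out$quantile <- quantiles
    out$prediction <- stats::quantile(g$prediction, probs = quantiles,
                                      names = FALSE, type = 7)
    out
  })
  result <- do.call(rbind, pieces)
  rownames(result) <- NULL
  result
}

# Five made-up predictive samples for a single forecast
samples <- data.frame(
  model = "model_a", id = 1, true_value = 5,
  sample = 1:5, prediction = c(1, 2, 3, 4, 5)
)
sample_to_quantile_sketch(samples, quantiles = c(0.25, 0.5, 0.75))
# prediction column: 2, 3, 4 (type 7 quantiles of 1:5)
```

Grouping by every column other than prediction and sample keeps true_value and the forecast identifiers attached to each quantile row, so the output has the columns the quantile format expects.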
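library(scoringutils)
# Check a quantile-based example dataset and print the summary
check <- check_forecasts(example_quantile)
print(check)
# Binary forecasts are checked the same way
check_forecasts(example_binary)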