Conditional quantiles are a very useful way of considering model performance
against observations for continuous measurements (Wilks, 2005). The
conditional quantile plot splits the data into evenly spaced bins. For each
predicted value bin, e.g. from 0 to 10 ppb, the corresponding values of
the observations are identified and the median, 25th/75th and 10th/90th
percentiles (quantiles) are calculated for that bin. The data are plotted to
show how these values vary across all bins. For a time series of observations
and predictions that agree precisely, the median value of the predictions will
equal that for the observations for each bin.
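
The binning step described above can be sketched in base R. This is a minimal illustration on synthetic data, not the code openair itself uses; the vectors obs and mod, the 10 ppb bin width and the noise level are all assumptions made for the example.

```r
# Synthetic data, purely illustrative: predictions plus noise as "observations"
set.seed(1)
mod <- runif(1000, min = 0, max = 100)        # hypothetical predictions (ppb)
obs <- mod + rnorm(1000, mean = 0, sd = 10)   # hypothetical observations (ppb)

# Evenly spaced bins over the range of the predictions (10 ppb wide here)
breaks <- seq(0, 100, by = 10)
bin <- cut(mod, breaks = breaks, include.lowest = TRUE)

# For each prediction bin, summarise the corresponding observations
probs <- c(0.10, 0.25, 0.50, 0.75, 0.90)
cond_q <- t(sapply(split(obs, bin), quantile, probs = probs))

# One row per prediction bin, one column per quantile
cond_q
```

Each row of cond_q holds the quantiles of the observations that fall in one prediction bin; plotting the columns against the bin mid-points would give the lines of a conditional quantile plot.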
The conditional quantile plot differs from the quantile-quantile plot (Q-Q
plot) that is often used to compare observations and predictions. A Q-Q plot
separately considers the distributions of observations and predictions,
whereas the conditional quantile uses the corresponding observations for a
particular interval in the predictions. Take as an example two time series,
the first a series of real observations and the second a lagged time series
of the same observations representing the predictions. These two time series
will have identical (or very nearly identical) distributions (e.g. same
median, minimum and maximum). A Q-Q plot would show a straight line,
indicating perfect agreement, whereas the conditional quantile plot would
not. This is because, in any interval of the predictions, the corresponding
observations now have different values.
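
The lagged time series example can be reproduced numerically. The sketch below uses a simulated autoregressive series as a stand-in for real observations and a lag of 24 steps; both are assumptions chosen for illustration, and quantile-based bins are used instead of evenly spaced ones only to keep the bin counts equal.

```r
set.seed(42)
n <- 2000
# Hypothetical autocorrelated "observations" (AR(1) process)
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = n + 24))
obs <- x[25:(n + 24)]   # observations
mod <- x[1:n]           # "predictions": the same series lagged by 24 steps

# Marginal quantiles (what a Q-Q plot compares) are nearly identical
probs <- seq(0.1, 0.9, by = 0.1)
qq <- cbind(obs = quantile(obs, probs), mod = quantile(mod, probs))

# Conditional view: median observation within each prediction bin
bin <- cut(mod, breaks = quantile(mod, seq(0, 1, by = 0.2)),
           include.lowest = TRUE)
cond_median <- tapply(obs, bin, median)
pred_median <- tapply(mod, bin, median)

qq
cbind(pred_median, cond_median)
```

The two columns of qq track each other closely, which is what a Q-Q plot would display as a straight line. The conditional medians, by contrast, stay roughly flat across the prediction bins rather than following pred_median, because the lagged predictions carry almost no information about the matching observations.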
Plotting the data in this way shows how well predictions agree with
observations across the full distribution of values and can help reveal many
useful characteristics of model behaviour. A single plot can therefore convey
a considerable amount of information concerning model performance. The
conditionalQuantile function in openair allows conditional quantiles to be
considered in a flexible way, e.g. by considering how they vary by season.
The function requires a data frame consisting of a column of observations and
a column of predictions. The observations are split up into bins
according to values of the predictions. The median line, together with the
25/75th and 10/90th quantile values, is plotted alongside a line
showing a “perfect” model. Also shown is a histogram of predicted
values (shaded grey) and a histogram of observed values (shown as a blue
line).
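
A basic call might look like the following sketch. The data frame dat and its contents are synthetic and hypothetical; the column names "obs" and "mod" are passed through the obs and mod arguments of conditionalQuantile. The plotting call is guarded so that the example still runs when openair is not installed.

```r
# Hypothetical data frame with one column of observations and one of predictions
set.seed(1)
n_hours <- 24 * 365
dat <- data.frame(
  date = seq(as.POSIXct("2020-01-01", tz = "GMT"), by = "hour",
             length.out = n_hours),
  mod = runif(n_hours, min = 0, max = 100)   # synthetic predictions
)
dat$obs <- dat$mod + rnorm(n_hours, sd = 10) # synthetic observations

# Draw the conditional quantile plot if openair is available
if (requireNamespace("openair", quietly = TRUE)) {
  openair::conditionalQuantile(dat, obs = "obs", mod = "mod")
}
```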
Far more insight can be gained into model performance through conditioning
using type. For example, type = "season" will plot conditional quantiles by
each season. type can also be a factor or character field, e.g. representing
different models used.
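
As a sketch of both kinds of conditioning, the example below stacks two hypothetical models in long format with a character column named model. The data, the column name and the model labels are assumptions made for illustration; type = "season" needs a date column so that seasons can be derived.

```r
set.seed(2)
dates <- seq(as.POSIXct("2020-01-01", tz = "GMT"), by = "hour",
             length.out = 24 * 366)
n <- length(dates)
# Synthetic observations with a seasonal cycle
obs <- 40 + 15 * sin(2 * pi * seq_len(n) / n) + rnorm(n, sd = 8)

# Two hypothetical models stacked in long format
dat <- rbind(
  data.frame(date = dates, obs = obs,
             mod = obs + rnorm(n, sd = 5), model = "model A"),
  data.frame(date = dates, obs = obs,
             mod = 0.8 * obs + rnorm(n, sd = 10), model = "model B")
)

if (requireNamespace("openair", quietly = TRUE)) {
  # One panel per season, for a single model
  openair::conditionalQuantile(subset(dat, model == "model A"),
                               obs = "obs", mod = "mod", type = "season")
  # One panel per model, using the character column as the conditioning field
  openair::conditionalQuantile(dat, obs = "obs", mod = "mod", type = "model")
}
```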
See Wilks (2005) for more details and the examples below.