This method creates calibration plots from calibration data stored in a familiarCollection object. In these figures, the expected (predicted) values are plotted against the observed values. The curve of a well-calibrated model should lie close to the identity line.
plot_calibration_data(
object,
draw = FALSE,
dir_path = NULL,
split_by = NULL,
color_by = NULL,
facet_by = NULL,
facet_wrap_cols = NULL,
ggtheme = NULL,
discrete_palette = NULL,
x_label = waiver(),
x_label_shared = "column",
y_label = waiver(),
y_label_shared = "row",
legend_label = waiver(),
plot_title = waiver(),
plot_sub_title = waiver(),
caption = NULL,
x_range = NULL,
x_n_breaks = 5,
x_breaks = NULL,
y_range = NULL,
y_n_breaks = 5,
y_breaks = NULL,
conf_int_style = c("ribbon", "step", "none"),
conf_int_alpha = 0.4,
show_density = TRUE,
show_calibration_fit = TRUE,
show_goodness_of_fit = TRUE,
density_plot_height = grid::unit(1, "cm"),
width = waiver(),
height = waiver(),
units = waiver(),
export_collection = FALSE,
...
)
# S4 method for ANY
plot_calibration_data(
object,
draw = FALSE,
dir_path = NULL,
split_by = NULL,
color_by = NULL,
facet_by = NULL,
facet_wrap_cols = NULL,
ggtheme = NULL,
discrete_palette = NULL,
x_label = waiver(),
x_label_shared = "column",
y_label = waiver(),
y_label_shared = "row",
legend_label = waiver(),
plot_title = waiver(),
plot_sub_title = waiver(),
caption = NULL,
x_range = NULL,
x_n_breaks = 5,
x_breaks = NULL,
y_range = NULL,
y_n_breaks = 5,
y_breaks = NULL,
conf_int_style = c("ribbon", "step", "none"),
conf_int_alpha = 0.4,
show_density = TRUE,
show_calibration_fit = TRUE,
show_goodness_of_fit = TRUE,
density_plot_height = grid::unit(1, "cm"),
width = waiver(),
height = waiver(),
units = waiver(),
export_collection = FALSE,
...
)
# S4 method for familiarCollection
plot_calibration_data(
object,
draw = FALSE,
dir_path = NULL,
split_by = NULL,
color_by = NULL,
facet_by = NULL,
facet_wrap_cols = NULL,
ggtheme = NULL,
discrete_palette = NULL,
x_label = waiver(),
x_label_shared = "column",
y_label = waiver(),
y_label_shared = "row",
legend_label = waiver(),
plot_title = waiver(),
plot_sub_title = waiver(),
caption = NULL,
x_range = NULL,
x_n_breaks = 5,
x_breaks = NULL,
y_range = NULL,
y_n_breaks = 5,
y_breaks = NULL,
conf_int_style = c("ribbon", "step", "none"),
conf_int_alpha = 0.4,
show_density = TRUE,
show_calibration_fit = TRUE,
show_goodness_of_fit = TRUE,
density_plot_height = grid::unit(1, "cm"),
width = waiver(),
height = waiver(),
units = waiver(),
export_collection = FALSE,
...
)
NULL, or a list of plot objects if dir_path is NULL.
object
A familiarCollection object, or one or more familiarData objects, that will be internally converted to a familiarCollection object. It is also possible to provide a familiarEnsemble or one or more familiarModel objects together with the data from which the plotted data are computed prior to export. Paths to such files can also be provided.
draw
(optional) Draws the plot if TRUE.
dir_path
(optional) Path to the directory where the created calibration plots are saved. Output is saved in the calibration subdirectory. If NULL, no figures are saved; they are returned instead.
split_by
(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables.
color_by
(optional) Variables used to determine fill colour of plot objects. The variables cannot overlap with those provided to the split_by argument, but may overlap with other arguments. See details for available variables.
facet_by
(optional) Variables used to determine how and if facets of each figure appear. In case the facet_wrap_cols argument is NULL, the first variable is used to define columns, and the remaining variables are used to define rows of facets. The variables cannot overlap with those provided to the split_by argument, but may overlap with other arguments. See details for available variables.
facet_wrap_cols
(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead.
ggtheme
(optional) ggplot theme to use for plotting.
discrete_palette
(optional) Palette to use to color the different data points and fit lines in case a non-singular variable was provided to the color_by argument.
x_label
(optional) Label to provide to the x-axis. If NULL, no label is shown.
x_label_shared
(optional) Sharing of x-axis labels between facets. One of three values:
overall: A single label is placed at the bottom of the figure. Tick text (but not the ticks themselves) is removed for all but the bottom facet plot(s).
column: A label is placed at the bottom of each column. Tick text (but not the ticks themselves) is removed for all but the bottom facet plot(s).
individual: A label is placed below each facet plot. Tick text is kept.
y_label
(optional) Label to provide to the y-axis. If NULL, no label is shown.
y_label_shared
(optional) Sharing of y-axis labels between facets. One of three values:
overall: A single label is placed to the left of the figure. Tick text (but not the ticks themselves) is removed for all but the left-most facet plot(s).
row: A label is placed to the left of each row. Tick text (but not the ticks themselves) is removed for all but the left-most facet plot(s).
individual: A label is placed to the left of each facet plot. Tick text is kept.
legend_label
(optional) Label to provide to the legend. If NULL, the legend will not have a name.
plot_title
(optional) Label to provide as figure title. If NULL, no title is shown.
plot_sub_title
(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown.
caption
(optional) Label to provide as figure caption. If NULL, no caption is shown.
x_range
(optional) Value range for the x-axis.
x_n_breaks
(optional) Number of breaks to show on the x-axis of the plot. x_n_breaks is used to determine the x_breaks argument in case it is unset.
x_breaks
(optional) Break points on the x-axis of the plot.
y_range
(optional) Value range for the y-axis.
y_n_breaks
(optional) Number of breaks to show on the y-axis of the plot. y_n_breaks is used to determine the y_breaks argument in case it is unset.
y_breaks
(optional) Break points on the y-axis of the plot.
conf_int_style
(optional) Confidence interval style. See details for allowed styles.
conf_int_alpha
(optional) Alpha value to determine transparency of confidence intervals or, alternatively, other plot elements with which the confidence interval overlaps. Only values between 0.0 (fully transparent) and 1.0 (fully opaque) are allowed.
show_density
(optional) Show point density in the top margin of the figure. If color_by is set, this information will not be shown.
show_calibration_fit
(optional) Specifies whether the calibration-in-the-large and calibration slope are annotated in the plot. If color_by is set, this information will not be shown.
show_goodness_of_fit
(optional) Specifies whether the results of goodness-of-fit tests are annotated in the plot. If color_by is set, this information will not be shown.
density_plot_height
(optional) Height of the density plot. The height is 1 cm by default. Height is expected to be a grid unit (see grid::unit), which also allows for specifying relative heights. Will be ignored if show_density is FALSE.
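As an illustration of the grid unit requirement for the density plot height, the sketch below builds one absolute and one relative height with grid::unit. The variable names are placeholders, and whether a particular relative unit suits a given figure layout is an assumption that should be checked against the rendered plot.

```r
# Absolute height, matching the default shown in the signature.
absolute_height <- grid::unit(1, "cm")

# Relative height: 15% of the parent viewport ("npc" units).
relative_height <- grid::unit(0.15, "npc")

# Both are grid unit objects, which is what density_plot_height expects.
is_unit <- c(grid::is.unit(absolute_height), grid::is.unit(relative_height))
```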
width
(optional) Width of the plot. A default value is derived from the number of facets.
height
(optional) Height of the plot. A default value is derived from the number of features and the number of facets.
units
(optional) Plot size unit. Either cm (default), mm or in.
export_collection
(optional) Exports the collection if TRUE.
...
Arguments passed on to as_familiar_collection, ggplot2::ggsave, extract_calibration_data
familiar_data_names
Names of the dataset(s). Only used if the object parameter is one or more familiarData objects.
collection_name
Name of the collection.
device
Device to use. Can either be a device function (e.g. png), or one of "eps", "ps", "tex" (pictex), "pdf", "jpeg", "tiff", "png", "bmp", "svg" or "wmf" (windows only). If NULL (default), the device is guessed based on the filename extension.
scale
Multiplicative scaling factor.
dpi
Plot resolution. Also accepts a string input: "retina" (320), "print" (300), or "screen" (72). Applies only to raster output types.
limitsize
When TRUE (the default), ggsave() will not save images larger than 50x50 inches, to prevent the common error of specifying dimensions in pixels.
bg
Background colour. If NULL, uses the plot.background fill value from the plot theme.
create.dir
Whether to create new directories if a non-existing directory is specified in the filename or path (TRUE) or return an error (FALSE, default). If FALSE and run in an interactive session, a prompt will appear asking to create a new directory when necessary.
data
A dataObject object, data.table or data.frame that constitutes the data that are assessed.
is_pre_processed
Flag that indicates whether the data was already pre-processed externally, e.g. normalised and clustered. Only used if the data argument is a data.table or data.frame.
cl
Cluster created using the parallel package. This cluster is then used to speed up computation through parallelisation.
evaluation_times
One or more time points that are used in the analysis of survival problems when data has to be assessed at a set time, e.g. calibration. If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects. Only used for survival outcomes.
ensemble_method
Method for ensembling predictions from models for the same sample. Available methods are:
median (default): Use the median of the predicted values as the ensemble value for a sample.
mean: Use the mean of the predicted values as the ensemble value for a sample.
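The two ensembling methods can be sketched in base R as follows. The prediction matrix, with one row per model and one column per sample, is made up for illustration; this is not the package's internal implementation.

```r
# Hypothetical predicted values: rows are models, columns are samples.
predictions <- rbind(
  model_1 = c(0.10, 0.80, 0.55),
  model_2 = c(0.20, 0.70, 0.60),
  model_3 = c(0.90, 0.75, 0.50)
)
colnames(predictions) <- c("sample_a", "sample_b", "sample_c")

# "median": per-sample median over the models.
ensemble_median <- apply(predictions, 2, stats::median)

# "mean": per-sample mean over the models.
ensemble_mean <- colMeans(predictions)
```

For sample_a the median (0.20) is much less affected by the outlying prediction of model_3 than the mean (0.40), which illustrates why the median can be the more robust choice.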
verbose
Flag to indicate whether feedback should be provided on the computation and extraction of various data elements.
message_indent
Number of indentation steps for messages shown during computation and extraction of various data elements.
detail_level
(optional) Sets the level at which results are computed and aggregated.
ensemble: Results are computed at the ensemble level, i.e. over all models in the ensemble. This means that, for example, bias-corrected estimates of model performance are assessed by creating (at least) 20 bootstraps and computing the model performance of the ensemble model for each bootstrap.
hybrid (default): Results are computed at the level of models in an ensemble. This means that, for example, bias-corrected estimates of model performance are directly computed using the models in the ensemble. If there are at least 20 trained models in the ensemble, performance is computed for each model, in contrast to ensemble where performance is computed for the ensemble of models. If there are fewer than 20 trained models in the ensemble, bootstraps are created so that at least 20 point estimates can be made.
model: Results are computed at the model level. This means that, for example, bias-corrected estimates of model performance are assessed by creating (at least) 20 bootstraps and computing the performance of the model for each bootstrap.
Note that each level of detail has a different interpretation for bootstrap confidence intervals. For ensemble and model these are the confidence intervals for the ensemble and an individual model, respectively. That is, the confidence interval describes the range where an estimate produced by a respective ensemble or model trained on a repeat of the experiment may be found with the probability of the confidence level. For hybrid, it represents the range where any single model trained on a repeat of the experiment may be found with the probability of the confidence level. By definition, confidence intervals obtained using hybrid are at least as wide as those for ensemble. hybrid offers the correct interpretation if the goal of the analysis is to assess the result of a single, unspecified, model.
hybrid is generally computationally less expensive than ensemble, which in turn is somewhat less expensive than model.
A non-default detail_level parameter can be specified for separate evaluation steps by providing a parameter value in a named list with data elements, e.g. list("auc_data"="ensemble", "model_performance"="hybrid"). This parameter can be set for the following data elements: auc_data, decision_curve_analyis, model_performance, permutation_vimp, ice_data, prediction_data and confusion_matrix.
estimation_type
(optional) Sets the type of estimation that should be possible. This has the following options:
point: Point estimates.
bias_correction or bc: Bias-corrected estimates. A bias-corrected estimate is computed from (at least) 20 point estimates, and familiar may bootstrap the data to create them.
bootstrap_confidence_interval or bci (default): Bias-corrected estimates with bootstrap confidence intervals (Efron and Hastie, 2016). The number of point estimates required depends on the confidence_level parameter, and familiar may bootstrap the data to create them.
As with detail_level, a non-default estimation_type parameter can be specified for separate evaluation steps by providing a parameter value in a named list with data elements, e.g. list("auc_data"="bci", "model_performance"="point"). This parameter can be set for the following data elements: auc_data, decision_curve_analyis, model_performance, permutation_vimp, ice_data, and prediction_data.
aggregate_results
(optional) Flag that signifies whether results should be aggregated during evaluation. If estimation_type is bias_correction or bc, aggregation leads to a single bias-corrected estimate. If estimation_type is bootstrap_confidence_interval or bci, aggregation leads to a single bias-corrected estimate with lower and upper boundaries of the confidence interval. This has no effect if estimation_type is point.
The default value is TRUE, except when assessing metrics of model performance, as the default violin plot requires the underlying data.
As with detail_level and estimation_type, a non-default aggregate_results parameter can be specified for separate evaluation steps by providing a parameter value in a named list with data elements, e.g. list("auc_data"=TRUE, "model_performance"=FALSE). This parameter exists for the same elements as estimation_type.
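The named-list form shared by detail_level, estimation_type and aggregate_results can be sketched as below. The element names are copied from the descriptions above; the validity check is an illustrative helper, not something the package is known to perform in this way.

```r
# Data elements that, per the descriptions above, accept element-specific
# settings for estimation_type and aggregate_results.
configurable_elements <- c(
  "auc_data", "decision_curve_analyis", "model_performance",
  "permutation_vimp", "ice_data", "prediction_data"
)

# Element-specific settings, following the examples in the text.
detail_level <- list("auc_data" = "ensemble", "model_performance" = "hybrid")
estimation_type <- list("auc_data" = "bci", "model_performance" = "point")
aggregate_results <- list("auc_data" = TRUE, "model_performance" = FALSE)

# Illustrative check: every name refers to a documented data element.
# detail_level additionally allows confusion_matrix.
all_named_correctly <- all(
  names(detail_level) %in% c(configurable_elements, "confusion_matrix"),
  names(estimation_type) %in% configurable_elements,
  names(aggregate_results) %in% configurable_elements
)
```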
confidence_level
(optional) Numeric value for the level at which confidence intervals are determined. If bootstraps are used to determine the confidence intervals, familiar uses the rule of thumb \(n = 20 / ci.level\) to determine the number of required bootstraps.
The default value is 0.95.
bootstrap_ci_method
(optional) Method used to determine bootstrap confidence intervals (Efron and Hastie, 2016). The following methods are implemented:
percentile (default): Confidence intervals obtained using the percentile method.
bc: Bias-corrected confidence intervals.
Note that the standard method is not implemented because this method is often not suitable due to non-normal distributions. The bias-corrected and accelerated (BCa) method is not implemented yet.
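To make the difference between the two methods concrete, here is a minimal base R sketch of textbook percentile and bias-corrected bootstrap intervals for a sample mean. The data, the statistic and the number of bootstraps are arbitrary choices for illustration, and the code is not taken from the package.

```r
set.seed(42)

# Hypothetical sample and statistic (the mean), for illustration only.
x <- rnorm(100, mean = 5, sd = 1)
point_estimate <- mean(x)

# Bootstrap the statistic by resampling with replacement.
boot_estimates <- replicate(400, mean(sample(x, replace = TRUE)))

confidence_level <- 0.95
alpha <- 1 - confidence_level
tail_probs <- c(alpha / 2, 1 - alpha / 2)

# Percentile method: use the empirical quantiles directly.
ci_percentile <- stats::quantile(boot_estimates, probs = tail_probs, names = FALSE)

# Bias-corrected method: shift the quantile levels by the bias constant z0,
# derived from the share of bootstrap estimates below the point estimate.
z0 <- stats::qnorm(mean(boot_estimates < point_estimate))
bc_probs <- stats::pnorm(2 * z0 + stats::qnorm(tail_probs))
ci_bc <- stats::quantile(boot_estimates, probs = bc_probs, names = FALSE)
```

When the bootstrap distribution is centred on the point estimate, z0 is close to zero and both intervals nearly coincide.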
This function generates a calibration plot for each model in each dataset. Any data used for calibration (e.g. baseline survival) is obtained during model creation.
Available splitting variables are: fs_method, learner, data_set, evaluation_time (survival analysis only) and positive_class (multinomial endpoints only). By default, separate figures are created for each combination of fs_method and learner, with faceting by data_set.
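A call that spells out these defaults might look like the sketch below. It assumes that a familiarCollection object named collection (a hypothetical name) has been created elsewhere and that the familiar package is installed; without both, the call is skipped.

```r
# Arguments taken from the signature above; the values mirror the defaults
# described in the text (split by fs_method and learner, facet by data_set).
plot_args <- list(
  draw = FALSE,
  dir_path = NULL,  # NULL: return the plots instead of writing files
  split_by = c("fs_method", "learner"),
  facet_by = "data_set"
)

plots <- NULL

# `collection` is a hypothetical familiarCollection object created elsewhere.
if (requireNamespace("familiar", quietly = TRUE) && exists("collection")) {
  plots <- do.call(
    familiar::plot_calibration_data,
    c(list(object = collection), plot_args)
  )
}
```

Because dir_path is NULL, plots would hold a list of plot objects rather than files on disk.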
Calibration in survival analysis is performed at set time points so that survival probabilities can be computed from the model and compared with observed survival probabilities. How this is done depends on the underlying model. For Cox proportional hazards regression models, the baseline survival (of the development samples) is used, whereas accelerated failure time models (e.g. Weibull) and survival random forests can directly predict survival probabilities at a given time point. For survival analysis, evaluation_time is an additional facet variable (by default).
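For the Cox case, the standard relation between baseline survival and predicted survival is S(t | x) = S0(t)^exp(lp), where lp is the linear predictor. The sketch below applies it to made-up numbers; the baseline survival value and the linear predictors are hypothetical, and the package's own computation may differ in detail.

```r
# Hypothetical baseline survival S0(t) at the evaluation time point,
# estimated from the development samples.
baseline_survival <- 0.80

# Hypothetical linear predictors (log relative hazards) for three samples.
linear_predictor <- c(-1, 0, 1)

# Predicted survival probability at the evaluation time for each sample.
predicted_survival <- baseline_survival^exp(linear_predictor)
```

A sample with a linear predictor of 0 has exactly the baseline survival, and higher linear predictors give lower predicted survival.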
Calibration for multinomial endpoints is performed in a one-against-all manner. This yields calibration information for each individual class of the endpoint. For such endpoints, positive_class is an additional facet variable (by default).
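The one-against-all scheme can be illustrated in base R: each class is in turn treated as the positive class, with its predicted probability as the expected value and a binary indicator as the observed value. The class labels and probabilities below are invented for the example.

```r
# Hypothetical observed classes and predicted class probabilities.
observed_class <- factor(c("a", "b", "c", "a", "b"))
predicted_probabilities <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.6, 0.3,
    0.2, 0.2, 0.6,
    0.5, 0.4, 0.1,
    0.3, 0.5, 0.2),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, levels(observed_class))
)

# One calibration table per class: the class itself against all others.
one_vs_all <- lapply(levels(observed_class), function(positive_class) {
  data.frame(
    positive_class = positive_class,
    expected = predicted_probabilities[, positive_class],
    observed = as.integer(observed_class == positive_class)
  )
})
names(one_vs_all) <- levels(observed_class)
```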
Calibration plots have a density plot in the margin, which shows the density of the plotted points, ordered by the expected probability or value. For binomial and multinomial outcomes, the densities for positive and negative classes are shown separately. Note that this information is only provided when color_by is not used as a splitting variable (i.e. one calibration plot per facet).
Calibration plots are annotated with the intercept and the slope of a linear model fitted to the sample points. A well-calibrated model has an intercept close to 0.0 and a slope close to 1.0. Intercept and slope are shown with their respective 95% confidence intervals. In addition, goodness-of-fit tests may be shown. For most endpoints these are based on the Hosmer-Lemeshow (HL) test, but for survival endpoints both the Nam-D'Agostino (ND) and the Greenwood-Nam-D'Agostino (GND) tests are shown. Note that this information is only annotated when color_by is not used as a splitting variable (i.e. one calibration plot per facet).
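The intercept and slope annotation can be approximated outside the package with an ordinary linear model of observed on expected values. The sketch below uses simulated, well-calibrated data; it is an assumption that this mirrors the package's fit closely enough for illustration, and the internal procedure may differ.

```r
set.seed(1)

# Simulated calibration points for a well-calibrated model.
expected <- runif(200)
observed <- expected + rnorm(200, sd = 0.05)

# Linear model fitted to the sample points.
fit <- stats::lm(observed ~ expected)

# Intercept and slope with their 95% confidence intervals.
estimates <- stats::coef(fit)
intervals <- stats::confint(fit, level = 0.95)
```

For these data the intercept is near 0 and the slope near 1, as expected for good calibration.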
Available palettes for discrete_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
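The palette sources named above can be explored directly in grDevices. The sketch below generates colours from two built-in palettes and checks a custom palette that mixes colour names and hexadecimal strings; the specific palette and colour choices are arbitrary, and how discrete_palette resolves palette names internally is not shown here.

```r
n_colours <- 4

# Colours from built-in palettes (hcl.colors requires R >= 3.6.0).
hcl_colours <- grDevices::hcl.colors(n_colours, palette = "Viridis")
rainbow_colours <- grDevices::rainbow(n_colours)

# A custom palette mixing colour names and hexadecimal RGB strings.
custom_palette <- c("firebrick", "steelblue", "#1B9E77", "#E7298A")

# Colour names should be known to grDevices::colors(); hexadecimal strings
# should consist of "#" followed by six hexadecimal digits.
is_hex <- grepl("^#[0-9A-Fa-f]{6}$", custom_palette)
is_valid <- is_hex | custom_palette %in% grDevices::colors()
```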
Labeling methods such as set_risk_group_names or set_data_set_names can be applied to the familiarCollection object to update labels, and order the output in the figure.
Hosmer, D. W., Hosmer, T., Le Cessie, S. & Lemeshow, S. A comparison of goodness-of-fit tests for the logistic regression model. Stat. Med. 16, 965–980 (1997).
D’Agostino, R. B. & Nam, B.-H. Evaluation of the Performance of Survival Analysis Models: Discrimination and Calibration Measures. in Handbook of Statistics vol. 23 1–25 (Elsevier, 2003).
Demler, O. V., Paynter, N. P. & Cook, N. R. Tests of calibration and goodness-of-fit in the survival setting. Stat. Med. 34, 1659–1680 (2015).