This method creates a heatmap based on data stored in a familiarCollection object. Features in the heatmap are ordered so that more similar features appear together.
plot_feature_similarity(
object,
feature_cluster_method = waiver(),
feature_linkage_method = waiver(),
feature_cluster_cut_method = waiver(),
feature_similarity_threshold = waiver(),
draw = FALSE,
dir_path = NULL,
split_by = NULL,
facet_by = NULL,
facet_wrap_cols = NULL,
ggtheme = NULL,
gradient_palette = NULL,
gradient_palette_range = NULL,
x_label = waiver(),
x_label_shared = "column",
y_label = waiver(),
y_label_shared = "row",
legend_label = waiver(),
plot_title = waiver(),
plot_sub_title = waiver(),
caption = NULL,
y_range = NULL,
y_n_breaks = 3,
y_breaks = NULL,
rotate_x_tick_labels = waiver(),
show_dendrogram = c("top", "right"),
dendrogram_height = grid::unit(1.5, "cm"),
width = waiver(),
height = waiver(),
units = waiver(),
export_collection = FALSE,
...
)
# S4 method for ANY
plot_feature_similarity(
object,
feature_cluster_method = waiver(),
feature_linkage_method = waiver(),
feature_cluster_cut_method = waiver(),
feature_similarity_threshold = waiver(),
draw = FALSE,
dir_path = NULL,
split_by = NULL,
facet_by = NULL,
facet_wrap_cols = NULL,
ggtheme = NULL,
gradient_palette = NULL,
gradient_palette_range = NULL,
x_label = waiver(),
x_label_shared = "column",
y_label = waiver(),
y_label_shared = "row",
legend_label = waiver(),
plot_title = waiver(),
plot_sub_title = waiver(),
caption = NULL,
y_range = NULL,
y_n_breaks = 3,
y_breaks = NULL,
rotate_x_tick_labels = waiver(),
show_dendrogram = c("top", "right"),
dendrogram_height = grid::unit(1.5, "cm"),
width = waiver(),
height = waiver(),
units = waiver(),
export_collection = FALSE,
...
)
# S4 method for familiarCollection
plot_feature_similarity(
object,
feature_cluster_method = waiver(),
feature_linkage_method = waiver(),
feature_cluster_cut_method = waiver(),
feature_similarity_threshold = waiver(),
draw = FALSE,
dir_path = NULL,
split_by = NULL,
facet_by = NULL,
facet_wrap_cols = NULL,
ggtheme = NULL,
gradient_palette = NULL,
gradient_palette_range = NULL,
x_label = waiver(),
x_label_shared = "column",
y_label = waiver(),
y_label_shared = "row",
legend_label = waiver(),
plot_title = waiver(),
plot_sub_title = waiver(),
caption = NULL,
y_range = NULL,
y_n_breaks = 3,
y_breaks = NULL,
rotate_x_tick_labels = waiver(),
show_dendrogram = c("top", "right"),
dendrogram_height = grid::unit(1.5, "cm"),
width = waiver(),
height = waiver(),
units = waiver(),
export_collection = FALSE,
...
)
NULL, or a list of plot objects if dir_path is NULL.
A familiarCollection object, or other objects from which a familiarCollection can be extracted. See details for more information.
The method used to perform clustering. These are the same methods as for the cluster_method configuration parameter: none, hclust, agnes, diana and pam.
none cannot be used when extracting data regarding mutual correlation or feature expressions.
If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.
The method used for agglomerative clustering in hclust and agnes. These are the same methods as for the cluster_linkage_method configuration parameter: average, single, complete, weighted, and ward.
If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.
The method used to divide features into separate clusters. The available methods are the same as for the cluster_cut_method configuration parameter: silhouette, fixed_cut and dynamic_cut.
silhouette is available for all cluster methods, but fixed_cut only applies to methods that create hierarchical trees (hclust, agnes and diana). dynamic_cut requires the dynamicTreeCut package and can only be used with agnes and hclust.
If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.
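The compatibility rules above can be summarised as a small lookup. The sketch below is illustrative only: cut_methods_for is a hypothetical helper, not a function from the familiar package, and the none cluster method is left out because the text does not say which cut methods apply to it.

```r
# Hypothetical helper (not part of familiar): lists the cut methods that the
# description above allows for each cluster method.
cut_methods_for <- function(cluster_method) {
  switch(
    cluster_method,
    hclust = c("silhouette", "fixed_cut", "dynamic_cut"),
    agnes  = c("silhouette", "fixed_cut", "dynamic_cut"),
    diana  = c("silhouette", "fixed_cut"),
    pam    = c("silhouette"),
    stop("Unknown cluster method: ", cluster_method)
  )
}

# Cut methods that can be combined with divisive clustering (diana).
cut_methods_for("diana")
```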
The threshold level for pair-wise similarity that is required to form feature clusters with the fixed_cut method.
If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.
(optional) Draws the plot if TRUE.
(optional) Path to the directory where the created feature similarity plots are saved. Output is saved in the feature_similarity subdirectory. If NULL, no figures are saved; they are returned instead.
(optional) Splitting variables. This refers to column names on which datasets are split. A separate figure is created for each split. See details for available variables.
(optional) Variables used to determine how and if facets of each figure appear. In case the facet_wrap_cols argument is NULL, the first variable is used to define columns, and the remaining variables are used to define rows of facets. The variables cannot overlap with those provided to the split_by argument, but may overlap with other arguments. See details for available variables.
(optional) Number of columns to generate when facet wrapping. If NULL, a facet grid is produced instead.
(optional) ggplot theme to use for plotting.
(optional) Sequential or divergent palette used to colour the similarity or distance between features in a heatmap.
(optional) Numerical range used to span the gradient. This should be a range of two values, e.g. c(0, 1). The lower or upper boundary can be unset by using NA. If not set, the full metric-specific range is used.
(optional) Label to provide to the x-axis. If NULL, no label is shown.
(optional) Sharing of x-axis labels between facets. One of three values:
overall: A single label is placed at the bottom of the figure. Tick text (but not the ticks themselves) is removed for all but the bottom facet plot(s).
column: A label is placed at the bottom of each column. Tick text (but not the ticks themselves) is removed for all but the bottom facet plot(s).
individual: A label is placed below each facet plot. Tick text is kept.
(optional) Label to provide to the y-axis. If NULL, no label is shown.
(optional) Sharing of y-axis labels between facets. One of three values:
overall: A single label is placed to the left of the figure. Tick text (but not the ticks themselves) is removed for all but the left-most facet plot(s).
row: A label is placed to the left of each row. Tick text (but not the ticks themselves) is removed for all but the left-most facet plot(s).
individual: A label is placed to the left of each facet plot. Tick text is kept.
(optional) Label to provide to the legend. If NULL, the legend will not have a name.
(optional) Label to provide as figure title. If NULL, no title is shown.
(optional) Label to provide as figure subtitle. If NULL, no subtitle is shown.
(optional) Label to provide as figure caption. If NULL, no caption is shown.
(optional) Value range for the y-axis.
(optional) Number of breaks to show on the y-axis of the plot. y_n_breaks is used to determine the y_breaks argument in case it is unset.
(optional) Break points on the y-axis of the plot.
(optional) Rotate tick labels on the x-axis by 90 degrees. Defaults to TRUE. Rotation of x-axis tick labels may also be controlled through the ggtheme. In this case, FALSE should be provided explicitly.
(optional) Show dendrogram around the main panel. Can be TRUE, FALSE, NULL, or a position, i.e. top, bottom, left and right. Up to two positions may be provided, but only as long as the dendrograms are not on opposite sides of the heatmap: top and bottom, and left and right cannot be used together.
A dendrogram can only be drawn from cluster methods that produce dendrograms, such as hclust. For example, a dendrogram cannot be constructed using the partitioning around medoids method (pam).
By default, a dendrogram is drawn to the top and right of the panel.
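The position constraints above can be expressed as a short check. This is a minimal sketch; is_valid_dendrogram_position is a hypothetical helper written for illustration and is not the validation that familiar performs internally.

```r
# Hypothetical helper (not part of familiar): checks a character vector of
# dendrogram positions against the constraints described above.
is_valid_dendrogram_position <- function(position) {
  allowed <- c("top", "bottom", "left", "right")

  # Only known positions, and at most two of them.
  if (!all(position %in% allowed)) return(FALSE)
  if (length(position) > 2) return(FALSE)

  # Dendrograms on opposite sides of the heatmap cannot be combined.
  if (all(c("top", "bottom") %in% position)) return(FALSE)
  if (all(c("left", "right") %in% position)) return(FALSE)

  TRUE
}

is_valid_dendrogram_position(c("top", "right"))   # the default value
is_valid_dendrogram_position(c("top", "bottom"))  # opposite sides
```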
(optional) Height of the dendrogram. The height is 1.5 cm by default. Height is expected to be a grid unit (see grid::unit), which also allows for specifying relative heights.
(optional) Width of the plot. A default value is derived from the number of facets.
(optional) Height of the plot. A default value is derived from the number of features and the number of facets.
(optional) Plot size unit. Either cm (default), mm or in.
(optional) Exports the collection if TRUE.
Arguments passed on to as_familiar_collection, ggplot2::ggsave, extract_feature_similarity
familiar_data_names
Names of the dataset(s). Only used if the object parameter is one or more familiarData objects.
collection_name
Name of the collection.
device
Device to use. Can either be a device function (e.g. png), or one of "eps", "ps", "tex" (pictex), "pdf", "jpeg", "tiff", "png", "bmp", "svg" or "wmf" (windows only). If NULL (default), the device is guessed based on the filename extension.
scale
Multiplicative scaling factor.
dpi
Plot resolution. Also accepts a string input: "retina" (320), "print" (300), or "screen" (72). Applies only to raster output types.
limitsize
When TRUE (the default), ggsave() will not save images larger than 50x50 inches, to prevent the common error of specifying dimensions in pixels.
bg
Background colour. If NULL, uses the plot.background fill value from the plot theme.
create.dir
Whether to create new directories if a non-existing directory is specified in the filename or path (TRUE) or return an error (FALSE, default). If FALSE and run in an interactive session, a prompt will appear asking to create a new directory when necessary.
data
A dataObject object, data.table or data.frame that constitutes the data that are assessed.
is_pre_processed
Flag that indicates whether the data was already pre-processed externally, e.g. normalised and clustered. Only used if the data argument is a data.table or data.frame.
cl
Cluster created using the parallel package. This cluster is then used to speed up computation through parallelisation.
feature_similarity_metric
Metric to determine pairwise similarity between features. Similarity is computed in the same manner as for clustering, and feature_similarity_metric therefore has the same options as cluster_similarity_metric: mcfadden_r2, cox_snell_r2, nagelkerke_r2, spearman, kendall and pearson.
If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.
verbose
Flag to indicate whether feedback should be provided on the computation and extraction of various data elements.
message_indent
Number of indentation steps for messages shown during computation and extraction of various data elements.
estimation_type
(optional) Sets the type of estimation that should be possible. This has the following options:
point: Point estimates.
bias_correction or bc: Bias-corrected estimates. A bias-corrected estimate is computed from (at least) 20 point estimates, and familiar may bootstrap the data to create them.
bootstrap_confidence_interval or bci (default): Bias-corrected estimates with bootstrap confidence intervals (Efron and Hastie, 2016). The number of point estimates required depends on the confidence_level parameter, and familiar may bootstrap the data to create them.
As with detail_level, a non-default estimation_type parameter can be specified for separate evaluation steps by providing a parameter value in a named list with data elements, e.g. list("auc_data"="bci", "model_performance"="point"). This parameter can be set for the following data elements: auc_data, decision_curve_analyis, model_performance, permutation_vimp, ice_data, and prediction_data.
aggregate_results
(optional) Flag that signifies whether results should be aggregated during evaluation. If estimation_type is bias_correction or bc, aggregation leads to a single bias-corrected estimate. If estimation_type is bootstrap_confidence_interval or bci, aggregation leads to a single bias-corrected estimate with lower and upper boundaries of the confidence interval. This has no effect if estimation_type is point.
The default value is TRUE, except when assessing model performance metrics, as the default violin plot requires the underlying data.
As with detail_level and estimation_type, a non-default aggregate_results parameter can be specified for separate evaluation steps by providing a parameter value in a named list with data elements, e.g. list("auc_data"=TRUE, "model_performance"=FALSE). This parameter exists for the same elements as estimation_type.
confidence_level
(optional) Numeric value for the level at which confidence intervals are determined. If bootstraps are used to determine the confidence intervals, familiar uses the rule of thumb \(n = 20 / ci.level\) to determine the number of required bootstraps.
The default value is 0.95.
bootstrap_ci_method
(optional) Method used to determine bootstrap confidence intervals (Efron and Hastie, 2016). The following methods are implemented:
percentile (default): Confidence intervals obtained using the percentile method.
bc: Bias-corrected confidence intervals.
Note that the standard method is not implemented because this method is often not suitable due to non-normal distributions. The bias-corrected and accelerated (BCa) method is not implemented yet.
This function generates heatmaps of the similarity between features.
Available splitting variables are: fs_method, learner, and data_set. By default, the data is split by fs_method and learner, with facetting by data_set.
Note that similarity is determined based on the underlying data. Hence the ordering of features may differ between facets, and tick labels are maintained for each panel.
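To illustrate why feature order depends on the data, the sketch below orders simulated features by hierarchical clustering on a correlation-based distance, using only base R. It is not the familiar implementation: the simulated data, the choice of Spearman correlation and the 1 - |r| distance are assumptions made for illustration.

```r
set.seed(1)

# Simulated data: 50 samples and 4 features. x2 closely tracks x1, and x4
# closely tracks x3; the columns are deliberately interleaved.
x1 <- rnorm(50)
x3 <- rnorm(50)
data <- data.frame(
  x1 = x1,
  x3 = x3,
  x2 = x1 + rnorm(50, sd = 0.1),
  x4 = x3 + rnorm(50, sd = 0.1)
)

# Pairwise similarity between features (absolute Spearman correlation).
similarity <- abs(stats::cor(data, method = "spearman"))

# Convert similarity to a distance and build a hierarchical tree.
tree <- stats::hclust(stats::as.dist(1 - similarity), method = "average")

# The leaf order of the tree places similar features next to each other.
ordered_features <- colnames(data)[tree$order]
ordered_features
```

Because the tree is built from the data in each facet, a different subset of samples can yield a different leaf order, which is why tick labels are kept for each panel.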
Available palettes for gradient_palette are those listed by grDevices::palette.pals() (requires R >= 4.0.0), grDevices::hcl.pals() (requires R >= 3.6.0) and rainbow, heat.colors, terrain.colors, topo.colors and cm.colors, which correspond to the palettes of the same name in grDevices. If not specified, a default palette based on palettes in Tableau is used. You may also specify your own palette by using colour names listed by grDevices::colors() or through hexadecimal RGB strings.
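A quick way to see which palette names and colours are usable is to query grDevices directly. The sketch below uses base R only; the palette name "Viridis" and the two custom colours are arbitrary choices for illustration.

```r
# Named HCL palettes available in this R installation (R >= 3.6.0).
hcl_names <- grDevices::hcl.pals()
"Viridis" %in% hcl_names

# A custom palette built from a colour name and a hexadecimal RGB string.
custom_palette <- c("white", "#1F77B4")

# col2rgb() raises an error for strings that are not valid colours, so it
# doubles as a simple validity check before plotting.
rgb_values <- grDevices::col2rgb(custom_palette)
rgb_values
```

Going by the description above, either a palette name such as "Viridis" or a vector such as custom_palette could then be supplied as gradient_palette.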
Labeling methods such as set_fs_method_names or set_data_set_names can be applied to the familiarCollection object to update labels, and order the output in the figure.
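A call might look like the sketch below. It only uses arguments listed in the usage section; draw_similarity_heatmap is a hypothetical wrapper, and collection is assumed to be an existing familiarCollection object created elsewhere. The chosen argument values are examples, not recommendations.

```r
# Hypothetical wrapper around plot_feature_similarity. `collection` is assumed
# to be a familiarCollection object; the familiar package must be installed
# for the call inside the function to work.
draw_similarity_heatmap <- function(collection) {
  familiar::plot_feature_similarity(
    object = collection,
    feature_cluster_method = "hclust",
    feature_linkage_method = "average",
    gradient_palette_range = c(0, 1),
    show_dendrogram = c("top", "right"),
    dir_path = NULL,  # no files are written; plot objects are returned
    draw = TRUE       # draw the plot(s) on the active device
  )
}
```

With dir_path = NULL the function is expected to return a list of plot objects, which can then be customised or saved manually.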