Computes and extracts the feature distance table for features
used in a familiarEnsemble object. This table can be used to cluster
features, and is exported directly by export_feature_similarity.
extract_feature_similarity(
  object,
  data,
  cl = NULL,
  estimation_type = waiver(),
  aggregate_results = waiver(),
  confidence_level = waiver(),
  bootstrap_ci_method = waiver(),
  is_pre_processed = FALSE,
  feature_cluster_method = waiver(),
  feature_linkage_method = waiver(),
  feature_cluster_cut_method = waiver(),
  feature_similarity_threshold = waiver(),
  feature_similarity_metric = waiver(),
  verbose = FALSE,
  message_indent = 0L,
  ...
)
A data.table containing pairwise distances between features. Only the upper triangle of the complete matrix is included (i.e. the sparse unitriangular representation). Diagonal elements are always 0.0, and the lower triangle mirrors the upper triangle.
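As a minimal sketch of how such a table might be used for clustering, the base R snippet below expands an upper-triangular pairwise table into a full symmetric matrix and passes it to hclust. The column names feature_name_1, feature_name_2 and value, and the toy distances, are assumptions for illustration only; check the column names of the table that is actually returned.

```r
# Toy upper-triangular distance table. Column names are hypothetical.
pairs <- data.frame(
  feature_name_1 = c("a", "a", "b"),
  feature_name_2 = c("b", "c", "c"),
  value = c(0.1, 0.8, 0.7),
  stringsAsFactors = FALSE
)

features <- sort(unique(c(pairs$feature_name_1, pairs$feature_name_2)))

# Start from an all-zero matrix so that the diagonal is 0.0.
dist_matrix <- matrix(
  0.0,
  nrow = length(features),
  ncol = length(features),
  dimnames = list(features, features)
)

# Fill the upper triangle, then mirror it into the lower triangle.
idx <- cbind(pairs$feature_name_1, pairs$feature_name_2)
dist_matrix[idx] <- pairs$value
dist_matrix[idx[, 2:1, drop = FALSE]] <- pairs$value

# Cluster the features using average linkage and cut the tree at height 0.5.
tree <- stats::hclust(stats::as.dist(dist_matrix), method = "average")
clusters <- stats::cutree(tree, h = 0.5)
```

In this toy example, features a and b end up in the same cluster because their distance (0.1) lies below the cut height, whereas c forms its own cluster.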
object: A familiarEnsemble object, which is an ensemble of one or more
familiarModel objects.
data: A dataObject object, data.table or data.frame that
constitutes the data that are assessed.
cl: Cluster created using the parallel package. This cluster is then
used to speed up computation through parallelisation.
estimation_type: (optional) Sets the type of estimation that should be possible. This has the following options:
point: Point estimates.
bias_correction or bc: Bias-corrected estimates. A bias-corrected
estimate is computed from (at least) 20 point estimates, and familiar may
bootstrap the data to create them.
bootstrap_confidence_interval or bci (default): Bias-corrected
estimates with bootstrap confidence intervals (Efron and Hastie, 2016). The
number of point estimates required depends on the confidence_level
parameter, and familiar may bootstrap the data to create them.
As with detail_level, a non-default estimation_type parameter can be
specified for separate evaluation steps by providing a parameter value in a
named list with data elements, e.g. list("auc_data"="bci", "model_performance"="point"). This parameter can be set for the following
data elements: auc_data, decision_curve_analyis, model_performance,
permutation_vimp, ice_data, and prediction_data.
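To illustrate what a bias-corrected estimate computed from bootstrap point estimates can look like, here is a minimal base R sketch of the standard bootstrap bias correction. It is not taken from familiar, and whether familiar computes its bias-corrected estimates in exactly this way is an assumption; the data and the statistic are toy choices.

```r
set.seed(1)

# Toy data and a statistic of interest (standard deviation).
x <- stats::rexp(50)
stat <- function(v) stats::sd(v)

# Point estimate on the full data.
point_estimate <- stat(x)

# 20 point estimates obtained from bootstrap resamples of the data.
boot_estimates <- replicate(20, stat(sample(x, replace = TRUE)))

# Estimated bias is the mean bootstrap estimate minus the point estimate.
bias <- mean(boot_estimates) - point_estimate

# Bias-corrected estimate: subtract the estimated bias.
bias_corrected <- point_estimate - bias
```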
aggregate_results: (optional) Flag that signifies whether results
should be aggregated during evaluation. If estimation_type is
bias_correction or bc, aggregation leads to a single bias-corrected
estimate. If estimation_type is bootstrap_confidence_interval or bci,
aggregation leads to a single bias-corrected estimate with lower and upper
boundaries of the confidence interval. This has no effect if
estimation_type is point.
The default value is TRUE, except when assessing model performance
metrics, as the default violin plot requires the underlying data.
As with detail_level and estimation_type, a non-default
aggregate_results parameter can be specified for separate evaluation steps
by providing a parameter value in a named list with data elements, e.g.
list("auc_data"=TRUE, "model_performance"=FALSE). This parameter exists
for the same elements as estimation_type.
confidence_level: (optional) Numeric value for the level at which
confidence intervals are determined. If bootstraps are used to
determine the confidence intervals, familiar uses the
rule of thumb \(n = 20 / ci.level\) to determine the number of required
bootstraps.
The default value is 0.95.
bootstrap_ci_method: (optional) Method used to determine bootstrap confidence intervals (Efron and Hastie, 2016). The following methods are implemented:
percentile (default): Confidence intervals obtained using the percentile
method.
bc: Bias-corrected confidence intervals.
Note that the standard method is not implemented because this method is often not suitable due to non-normal distributions. The bias-corrected and accelerated (BCa) method is not implemented yet.
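The difference between the percentile and bias-corrected (bc) intervals can be sketched in base R as follows. This is a textbook-style illustration rather than familiar's own implementation; the toy data, the choice of statistic and the 1000 bootstrap replicates are arbitrary assumptions.

```r
set.seed(42)

# Toy data and point estimate of the statistic of interest (the mean).
x <- stats::rnorm(100, mean = 5)
point_estimate <- mean(x)

# Bootstrap point estimates (the number of replicates is arbitrary here).
boot_estimates <- replicate(1000, mean(sample(x, replace = TRUE)))

confidence_level <- 0.95
alpha <- 1 - confidence_level
probs <- c(alpha / 2, 1 - alpha / 2)

# Percentile method: take the quantiles of the bootstrap distribution.
ci_percentile <- stats::quantile(boot_estimates, probs = probs, names = FALSE)

# Bias-corrected method: z0 measures how far the point estimate lies from
# the median of the bootstrap distribution, on the normal scale.
z0 <- stats::qnorm(mean(boot_estimates < point_estimate))

# Shift the percentiles by 2 * z0 before taking quantiles.
bc_probs <- stats::pnorm(2 * z0 + stats::qnorm(probs))
ci_bc <- stats::quantile(boot_estimates, probs = bc_probs, names = FALSE)
```

When the bootstrap distribution is centred on the point estimate, z0 is close to 0 and both intervals nearly coincide.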
is_pre_processed: Flag that indicates whether the data was already
pre-processed externally, e.g. normalised and clustered. Only used if the
data argument is a data.table or data.frame.
feature_cluster_method: The method used to perform clustering. These are
the same methods as for the cluster_method configuration parameter:
none, hclust, agnes, diana and pam.
none cannot be used when extracting data regarding mutual correlation or
feature expressions.
If not provided explicitly, this parameter is read from settings used at
creation of the underlying familiarModel objects.
feature_linkage_method: The method used for agglomerative clustering in
hclust and agnes. These are the same methods as for the
cluster_linkage_method configuration parameter: average, single,
complete, weighted, and ward.
If not provided explicitly, this parameter is read from settings used at
creation of the underlying familiarModel objects.
feature_cluster_cut_method: The method used to divide features into
separate clusters. The available methods are the same as for the
cluster_cut_method configuration parameter: silhouette, fixed_cut and
dynamic_cut.
silhouette is available for all cluster methods, but fixed_cut only
applies to methods that create hierarchical trees (hclust, agnes and
diana). dynamic_cut requires the dynamicTreeCut package and can only
be used with agnes and hclust.
If not provided explicitly, this parameter is read from settings used at
creation of the underlying familiarModel objects.
feature_similarity_threshold: The threshold level for pair-wise
similarity that is required to form feature clusters with the fixed_cut
method.
If not provided explicitly, this parameter is read from settings used at
creation of the underlying familiarModel objects.
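A rough base R sketch of how a similarity threshold can drive a fixed cut of a hierarchical tree is given below. The conversion from similarity to distance (1 - |similarity|), the cut height (1 - threshold) and the threshold value of 0.9 are all assumptions made for illustration; they are not confirmed to match familiar's internals.

```r
set.seed(7)
n <- 200
base <- stats::rnorm(n)

# Toy feature set: f1 and f2 are strongly related, f3 is unrelated.
features <- data.frame(
  f1 = base,
  f2 = base + stats::rnorm(n, sd = 0.1),
  f3 = stats::rnorm(n)
)

# Pairwise similarity (absolute Spearman correlation), converted to distance.
similarity <- abs(stats::cor(features, method = "spearman"))
distance <- stats::as.dist(1 - similarity)

# Agglomerative clustering with average linkage.
tree <- stats::hclust(distance, method = "average")

# Fixed cut: features join a cluster only if they are at least this similar.
similarity_threshold <- 0.9  # hypothetical value
clusters <- stats::cutree(tree, h = 1 - similarity_threshold)
```

Here f1 and f2 are assigned to the same cluster, while f3 remains a singleton.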
feature_similarity_metric: Metric to determine pairwise similarity
between features. Similarity is computed in the same manner as for
clustering, and feature_similarity_metric therefore has the same options
as cluster_similarity_metric: mcfadden_r2, cox_snell_r2,
nagelkerke_r2, spearman, kendall and pearson.
If not provided explicitly, this parameter is read from settings used at
creation of the underlying familiarModel objects.
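For the correlation-based metrics, the following base R sketch compares spearman, kendall and pearson on one pair of toy features with a monotone but non-linear relationship. The pseudo-R2 metrics are not shown, and the conversion to a distance as 1 - |similarity| is an assumption, not necessarily what familiar does internally.

```r
set.seed(11)

# Two toy features with a monotone, non-linear relationship.
x <- stats::rnorm(100)
y <- exp(x) + stats::rnorm(100, sd = 0.1)

# Compute pairwise similarity with each correlation-based metric.
metrics <- c("spearman", "kendall", "pearson")
similarity <- vapply(
  metrics,
  function(m) stats::cor(x, y, method = m),
  numeric(1)
)

# Assumed conversion of similarity to a distance in [0, 1].
distance <- 1 - abs(similarity)
```

Rank-based metrics such as spearman and kendall are insensitive to monotone transformations of a feature, which is one reason to prefer them over pearson when relationships are not linear.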
verbose: Flag to indicate whether feedback should be provided on the computation and extraction of various data elements.
message_indent: Number of indentation steps for messages shown during computation and extraction of various data elements.
...: Unused arguments.