Computes and extracts the sample distance table for samples
analysed using a familiarEnsemble object to form a familiarData object.
This table can be used to cluster samples, and is exported directly by
extract_feature_expression.
extract_sample_similarity(
object,
data,
cl = NULL,
is_pre_processed = FALSE,
sample_limit = waiver(),
sample_cluster_method = waiver(),
sample_linkage_method = waiver(),
sample_similarity_metric = waiver(),
verbose = FALSE,
message_indent = 0L,
...
)
A data.table containing the pairwise distance between samples. Only the upper triangle of the complete matrix is included (i.e. the sparse unitriangular representation). Diagonal elements are always 0.0, and the lower triangle mirrors the upper triangle.
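Because only the upper triangle is returned, the full symmetric matrix has to be rebuilt before the distances can be passed to functions that expect one. The sketch below shows one way to do this in base R. The column names `sample_1`, `sample_2` and `value` are assumptions made for illustration and may not match the columns of the returned table; a plain data.frame stands in for the data.table.

```r
# Hypothetical long-format table: one row per sample pair (upper triangle only).
# Column names are assumed, not taken from the package.
pairs <- data.frame(
  sample_1 = c("a", "a", "b"),
  sample_2 = c("b", "c", "c"),
  value = c(0.2, 0.5, 0.3),
  stringsAsFactors = FALSE
)

samples <- union(pairs$sample_1, pairs$sample_2)

# Start from zeros, so that the diagonal is already 0.0.
full <- matrix(
  0.0,
  nrow = length(samples),
  ncol = length(samples),
  dimnames = list(samples, samples)
)

# Fill the upper triangle, then mirror it into the lower triangle.
full[cbind(pairs$sample_1, pairs$sample_2)] <- pairs$value
full[cbind(pairs$sample_2, pairs$sample_1)] <- pairs$value

# Convert to a dist object for use with clustering functions.
distance <- as.dist(full)
```

The resulting `distance` object can be passed to functions such as `stats::hclust`.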
object: A familiarEnsemble object, which is an ensemble of one or more
familiarModel objects.
data: A dataObject object, data.table or data.frame that
constitutes the data that are assessed.
cl: Cluster created using the parallel package. This cluster is then
used to speed up computation through parallelisation.
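A minimal sketch of creating such a cluster with the parallel package follows. The worker count of two is arbitrary, and `ensemble` and `new_data` are hypothetical placeholders, so the call to `extract_sample_similarity` is left commented out.

```r
library(parallel)

# Start a small local cluster; two workers is an arbitrary choice.
cl <- makeCluster(2L)

# Check that the workers respond before handing the cluster over.
squares <- unlist(parLapply(cl, 1:4, function(i) i^2))

# Hypothetical call: `ensemble` and `new_data` stand for a trained
# familiarEnsemble object and a dataset, and are not defined here.
# similarity <- familiar::extract_sample_similarity(
#   object = ensemble,
#   data = new_data,
#   cl = cl
# )

# Release the workers when done.
stopCluster(cl)
```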
is_pre_processed: Flag that indicates whether the data was already
pre-processed externally, e.g. normalised and clustered. Only used if the
data argument is a data.table or data.frame.
sample_limit: (optional) Set the upper limit of the number of samples that are used during evaluation steps. Cannot be less than 20.
This setting can be specified per data element by providing a parameter
value in a named list with data elements, e.g.
list("sample_similarity"=100, "permutation_vimp"=1000).
This parameter can be set for the following data elements:
sample_similarity and ice_data.
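As an illustration, a per-element limit could be set up as shown below. The values are arbitrary, `ensemble` and `new_data` are hypothetical placeholders, and whether this function accepts the named-list form exactly as written is an assumption based on the description above.

```r
# Per-element limits; names follow the data elements listed above.
sample_limit <- list("sample_similarity" = 100L, "ice_data" = 200L)

# The documentation states that a limit cannot be less than 20.
stopifnot(all(unlist(sample_limit) >= 20L))

# Hypothetical call: `ensemble` and `new_data` are not defined here.
# similarity <- familiar::extract_sample_similarity(
#   object = ensemble,
#   data = new_data,
#   sample_limit = sample_limit
# )
```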
sample_cluster_method: The method used to perform clustering based on
distance between samples. These are the same methods as for the
cluster_method configuration parameter: hclust, agnes, diana and
pam.
none cannot be used when extracting data for feature expressions.
If not provided explicitly, this parameter is read from settings used at
creation of the underlying familiarModel objects.
sample_linkage_method: The method used for agglomerative clustering in
hclust and agnes. These are the same methods as for the
cluster_linkage_method configuration parameter: average, single,
complete, weighted, and ward.
If not provided explicitly, this parameter is read from settings used at
creation of the underlying familiarModel objects.
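To make the role of the linkage method concrete, the sketch below runs agglomerative clustering with average linkage using `stats::hclust` on toy data. It illustrates the general technique only and is not the package's internal code; how the option names above map onto `hclust` method names is not shown here.

```r
# Toy data: two well-separated groups of three samples each.
x <- matrix(
  c(0.0, 0.1, 0.2, 5.0, 5.1, 5.2),
  ncol = 1,
  dimnames = list(paste0("sample_", 1:6), "feature")
)

# Pairwise distances between samples (rows).
distance <- dist(x, method = "euclidean")

# Agglomerative clustering with average linkage.
tree <- hclust(distance, method = "average")

# Cut the dendrogram into two clusters.
clusters <- cutree(tree, k = 2)
```

With data this clearly separated, other linkage methods such as `single` or `complete` produce the same two groups; the methods differ mainly when clusters are close together or elongated.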
sample_similarity_metric: Metric to determine pairwise similarity
between samples. Similarity is computed in the same manner as for
clustering, but sample_similarity_metric has different options that are
better suited to computing distance between samples instead of between
features: gower, euclidean.
The underlying feature data is scaled to the \([0, 1]\) range (for
numerical features) using the feature values across the samples. The
normalisation parameters required can optionally be computed from feature
data with the outer 5% (on both sides) of feature values trimmed or
winsorised. To do so append _trim (trimming) or _winsor (winsorising) to
the metric name. This reduces the effect of outliers somewhat.
If not provided explicitly, this parameter is read from settings used at
creation of the underlying familiarModel objects.
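The scaling and distance computation described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: `scale_unit` is a hypothetical helper, only numerical features are handled, and values outside the trimmed or winsorised bounds are not clipped, so they may fall outside the [0, 1] range.

```r
# Hypothetical helper: rescale a numeric feature to [0, 1]. The bounds can
# optionally be derived after trimming or winsorising the outer 5% of values.
scale_unit <- function(x, method = c("none", "trim", "winsor"), fraction = 0.05) {
  method <- match.arg(method)
  q <- quantile(x, probs = c(fraction, 1 - fraction), names = FALSE)
  bounds <- switch(
    method,
    none = range(x),
    # Trimming: drop values outside the quantiles, then take the range.
    trim = range(x[x >= q[1] & x <= q[2]]),
    # Winsorising: clamp values to the quantiles, then take the range.
    winsor = range(pmin(pmax(x, q[1]), q[2]))
  )
  (x - bounds[1]) / (bounds[2] - bounds[1])
}

# Toy feature data with one row per sample.
features <- data.frame(
  age = c(20, 30, 40, 50, 60),
  weight = c(50, 60, 70, 80, 90)
)
scaled <- as.data.frame(lapply(features, scale_unit))

# Euclidean distance on the scaled features.
d_euclidean <- dist(scaled, method = "euclidean")

# Gower distance for numeric-only data: mean absolute difference per feature.
d_gower <- dist(scaled, method = "manhattan") / ncol(scaled)
```

Passing `method = "trim"` or `method = "winsor"` to `scale_unit` mimics appending `_trim` or `_winsor` to the metric name.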
verbose: Flag to indicate whether feedback should be provided on the computation and extraction of various data elements.
message_indent: Number of indentation steps for messages shown during computation and extraction of various data elements.
...: Unused arguments.