Measure to compare two or more sets with respect to their similarity.
phi(sets, p, na_value = NaN, ...)
Performance value as numeric(1).
sets: (list()) List of character or integer vectors. sets must have at least 2 elements.
p: (integer(1)) Total number of possible elements.
na_value: (numeric(1)) Value that should be returned if the measure is not defined for the input (as described in the note). Default is NaN.
...: (any) Additional arguments. Currently ignored.
Type: "similarity"
Range: \([-1, 1]\)
Minimize: FALSE
The Phi Coefficient is defined as the Pearson correlation between the binary representation of two sets \(A\) and \(B\). The binary representation for \(A\) is a logical vector of length \(p\) with the i-th element being 1 if the corresponding element is in \(A\), and 0 otherwise.
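Writing the Pearson correlation out for two binary vectors gives an equivalent closed form in terms of set sizes. This is a derivation sketch based on the definition above, not a formula taken from the package source:

\[
\phi(A, B) = \frac{p \, |A \cap B| - |A| \, |B|}{\sqrt{|A| \, (p - |A|) \, |B| \, (p - |B|)}}
\]

For example, with \(p = 3\), \(A = \{a\}\) and \(B = \{a, b\}\), this gives \((3 \cdot 1 - 1 \cdot 2) / \sqrt{1 \cdot 2 \cdot 2 \cdot 1} = 0.5\). The denominator is zero exactly when \(|A|\) or \(|B|\) equals \(0\) or \(p\), in which case the correlation does not exist.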
If more than two sets are provided, the mean of all pairwise scores is calculated.
This measure is undefined if one of the sets contains none or all of the \(p\) possible elements.
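The definition can be illustrated with a minimal base-R sketch. The names phi_pair, phi_sketch and universe are hypothetical and not part of the documented interface: universe is an assumed vector listing all p possible elements, which the documented function does not take. Returning na_value as soon as any pairwise score is undefined is also an assumption, not confirmed behavior.

```r
# Hypothetical helper: phi for a single pair of sets.
# `universe` (assumed) lists all p possible elements.
phi_pair = function(a, b, universe) {
  x = as.numeric(universe %in% a)  # binary representation of a
  y = as.numeric(universe %in% b)  # binary representation of b
  # A constant vector (empty or full set) has no defined correlation
  if (all(x == x[1]) || all(y == y[1])) {
    return(NA_real_)
  }
  cor(x, y)  # Pearson correlation of the two binary vectors
}

# Hypothetical wrapper: mean of all pairwise scores.
phi_sketch = function(sets, universe, na_value = NaN) {
  stopifnot(is.list(sets), length(sets) >= 2)
  pairs = combn(length(sets), 2)  # one column per pair of set indices
  scores = apply(pairs, 2, function(ij) {
    phi_pair(sets[[ij[1]]], sets[[ij[2]]], universe)
  })
  # Assumption: any undefined pair makes the overall result na_value
  if (anyNA(scores)) na_value else mean(scores)
}

phi_sketch(list("a", c("a", "b")), universe = letters[1:3])  # 0.5
```

Passing the universe explicitly keeps the sketch short. Only the number of possible elements is needed in principle, since elements that appear in no set contribute a 0 to every binary vector.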
Nogueira S, Brown G (2016). “Measuring the Stability of Feature Selection.” In Machine Learning and Knowledge Discovery in Databases, 442-457. Springer International Publishing. doi:10.1007/978-3-319-46227-1_28.
Bommert A, Rahnenführer J, Lang M (2017). “A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data.” Computational and Mathematical Methods in Medicine, 2017, 1-18. doi:10.1155/2017/7907163.
Bommert A, Lang M (2021). “stabm: Stability Measures for Feature Selection.” Journal of Open Source Software, 6(59), 3010. doi:10.21105/joss.03010.
The package stabm implements many more stability measures, including measures with a correction for chance.
Other Similarity Measures: jaccard()
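set.seed(1)
# Two random subsets of the p = 3 possible elements "a", "b", "c"
sets = list(
  sample(letters[1:3], 1),
  sample(letters[1:3], 2)
)
# Phi coefficient between the two sets
phi(sets, p = 3)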