This is the underlying function for textstat_dist and textstat_simil, but it returns a TsparseMatrix.
textstat_proxy(
  x,
  y = NULL,
  margin = c("documents", "features"),
  method = c("cosine", "correlation", "jaccard", "ejaccard", "dice", "edice", "hamman",
    "simple matching", "euclidean", "chisquared", "hamming", "kullback", "manhattan",
    "maximum", "canberra", "minkowski"),
  p = 2,
  min_proxy = NULL,
  rank = NULL,
  use_na = FALSE
)
x, y: dfm objects. y is an optional target dfm matching x in the margin on which the similarity or distance will be computed; if it is provided, proximity between the documents or features in x and y is computed.
margin: identifies the margin of the dfm on which similarity or distance will be computed: "documents" for documents or "features" for word/term features.
method: character; the method identifying the similarity or distance measure to be used; see Details.
p: the power of the Minkowski distance.
min_proxy: the minimum proximity value to be recorded.
rank: an integer value specifying the top-n highest proximity values to be recorded.
use_na: if TRUE, return NA for proximity to empty vectors. Note that use of NA makes the proximity matrices denser.
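
The arguments above can be illustrated with a minimal sketch. It assumes the quanteda and quanteda.textstats packages are installed; the three toy documents and the object names (toks, x, sim, sim_feat, sim_top) are made up for illustration and are not part of the package.

```r
library(quanteda)
library(quanteda.textstats)

# Hypothetical toy corpus: d1 and d2 share two features, d3 shares none.
toks <- tokens(c(d1 = "a b c", d2 = "a b d", d3 = "e f g"))
x <- dfm(toks)

# Document-by-document cosine similarity, returned as a sparse matrix.
sim <- textstat_proxy(x, margin = "documents", method = "cosine")
dim(sim)

# Feature-level proximity against a target dfm that shares x's documents.
sim_feat <- textstat_proxy(x, y = x[, c("a", "b")],
                           margin = "features", method = "cosine")

# Keep only stronger proximities: a value threshold plus a top-n cut.
sim_top <- textstat_proxy(x, margin = "documents", method = "cosine",
                          min_proxy = 0.5, rank = 2)
```

Because the result is a sparse matrix rather than a textstat_simil or textstat_dist object, it can be converted with as.matrix() when a dense view is needed for a small example like this one.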