This is the underlying function for textstat_dist and textstat_simil, but it returns a TsparseMatrix.
textstat_proxy(x, y = NULL, margin = c("documents", "features"),
method = c("cosine", "correlation", "jaccard", "ejaccard", "dice",
"edice", "hamman", "simple matching", "euclidean", "chisquared",
"hamming", "kullback", "manhattan", "maximum", "canberra", "minkowski"),
p = 2, min_proxy = NULL, rank = NULL, use_na = FALSE)
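A minimal usage sketch follows. It assumes that the quanteda and quanteda.textstats packages are installed and that textstat_proxy is exported by quanteda.textstats, which may differ in older releases. The three short texts are made up for illustration.

```r
# Assumption: quanteda and quanteda.textstats are installed.
library(quanteda)
library(quanteda.textstats)

# Three tiny illustrative documents (invented for this sketch).
txt <- c(d1 = "a b c d", d2 = "a b e f", d3 = "x y z")
dfmat <- dfm(tokens(txt))

# Cosine similarity between documents (rows of the dfm).
sim_docs <- textstat_proxy(dfmat, margin = "documents", method = "cosine")

# Euclidean distance between features (columns of the dfm).
dist_feats <- textstat_proxy(dfmat, margin = "features", method = "euclidean")

# Convert the sparse result to an ordinary matrix for inspection.
as.matrix(sim_docs)
```

The result is indexed by document names for margin = "documents" and by feature names for margin = "features".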
x: a dfm object.
y: an optional target dfm object matching x in the margin on which the similarity or distance will be computed. If it is provided, proximity between the documents or features in x and those in y is computed.
margin: identifies the margin of the dfm on which similarity or distance will be computed: "documents" for documents or "features" for word/term features.
method: character; the similarity or distance measure to be used; see Details.
p: the power of the Minkowski distance.
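As a sketch of what the power parameter means, the standard Minkowski distance can be written in base R. The helper name minkowski is hypothetical and is not part of the package. With p = 1 it reduces to the Manhattan distance and with p = 2 to the Euclidean distance.

```r
# Hypothetical helper: textbook Minkowski distance between two numeric vectors.
minkowski <- function(a, b, p = 2) {
  sum(abs(a - b)^p)^(1 / p)
}

a <- c(1, 0, 2)
b <- c(0, 3, 2)

minkowski(a, b, p = 1)  # Manhattan: |1| + |-3| + |0| = 4
minkowski(a, b, p = 2)  # Euclidean: sqrt(1 + 9) = sqrt(10)

# Cross-check against base R's stats::dist for p = 3.
as.numeric(dist(rbind(a, b), method = "minkowski", p = 3))
```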
min_proxy: the minimum proximity value to be recorded.
rank: an integer specifying the number of highest proximity values (top-n) to be recorded.
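The idea behind a minimum value and a top-n cut-off can be sketched on a small dense matrix. This is an illustration only: it assumes that values below the threshold are dropped and that the top-n selection is applied per column, which may not match the function's internals. The matrix values and the helper keep_top_n are invented for the example.

```r
library(Matrix)  # recommended package shipped with R

# Invented similarity matrix for three documents.
sim <- matrix(c(1.0, 0.5, 0.1,
                0.5, 1.0, 0.3,
                0.1, 0.3, 1.0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("d1", "d2", "d3"), c("d1", "d2", "d3")))

# Thresholding (assumed semantics of a minimum proximity): drop values below 0.4.
thresholded <- sim
thresholded[thresholded < 0.4] <- 0

# Hypothetical helper: keep only the n largest values in each column.
keep_top_n <- function(m, n) {
  apply(m, 2, function(col) {
    cutoff <- sort(col, decreasing = TRUE)[n]
    ifelse(col >= cutoff, col, 0)
  })
}
top2 <- keep_top_n(sim, 2)

# Fewer recorded values means a sparser triplet-form matrix.
sparse <- as(Matrix(thresholded, sparse = TRUE), "TsparseMatrix")
```

Either filter reduces the number of stored entries, which is the practical reason to use them on large collections.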
use_na: if TRUE, return NA for proximity to empty vectors. Note that using NA makes the proximity matrices denser.
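A short base R sketch shows why proximity to an empty vector is undefined for a measure such as cosine similarity. The helper cosine is hypothetical and only mirrors the textbook formula; it is not how the package computes the value.

```r
# Hypothetical helper: textbook cosine similarity.
cosine <- function(a, b) {
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

doc   <- c(2, 1, 0)  # a document with some features
empty <- c(0, 0, 0)  # an empty document: all counts are zero

# The denominator is zero, so the result is 0 / 0, which R reports as NaN.
cosine(doc, empty)
```

Because a sparse matrix stores only its non-zero entries, every such missing value has to be stored explicitly, which is why the matrices become denser.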