Compute Gower's distance pairwise between records in two data sets x and y. Records from the smaller data set are recycled.
gower_dist(
x,
y,
pair_x = NULL,
pair_y = NULL,
eps = 1e-08,
weights = NULL,
ignore_case = FALSE,
nthread = getOption("gd_num_thread")
)
A numeric vector of length max(nrow(x), nrow(y)).
When there are no columns to compare, a message is printed and numeric(0) is returned invisibly.
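The recycling rule and the length of the result can be checked with a minimal sketch like the one below. It assumes the gower package is installed and uses the built-in iris data set purely for illustration.

```r
library(gower)

# x has one record, y has 150: the single record of x is recycled,
# so it is compared with every record of y in turn.
d <- gower_dist(iris[1, ], iris)

length(d)  # max(nrow(x), nrow(y)), here 150
d[1]       # the first record compared with itself, so the distance is 0
range(d)   # Gower distances lie between 0 and 1
```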
x: [data.frame]
y: [data.frame]
pair_x: [numeric|character] (optional) Columns in x used for comparison. See Details below.
pair_y: [numeric|character] (optional) Columns in y used for comparison. See Details below.
eps: [numeric] (optional) Computed numbers (variable ranges) smaller than eps are treated as zero.
weights: [numeric] (optional) A vector of weights of length ncol(x) that defines the weight applied to each component of the Gower distance.
ignore_case: [logical] Ignore case when matching column names. Applies only when neither pair_x nor pair_y is user-defined.
nthread: Number of threads to use for parallelization. By default, 2 threads are used on a dual-core machine; on any other machine, n-1 cores are used so that the machine stays responsive during a large computation. The maximum number of threads is determined using omp_get_max_threads at the C level.
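As a hedged illustration of the optional arguments, the sketch below passes weights and nthread explicitly. It assumes the gower package is installed. The weight values are arbitrary choices for demonstration, and the equal-weights comparison rests on the assumption that weights = NULL means every column is weighted equally.

```r
library(gower)

x <- iris[1:5, ]
y <- iris[6:10, ]

# Default call: weights = NULL.
d_default <- gower_dist(x, y)

# Explicit equal weights, one per column of x (assumed to match the default).
d_ones <- gower_dist(x, y, weights = rep(1, ncol(x)))

# Arbitrary illustrative weights: emphasize the two sepal measurements.
d_weighted <- gower_dist(x, y, weights = c(2, 2, 1, 1, 0.5))

# Force a single thread; the distances themselves should not change.
d_single <- gower_dist(x, y, nthread = 1)

all.equal(d_default, d_ones)
all.equal(d_default, d_single)
```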
There are three ways to specify which columns of x should be compared with which columns of y. The first option is to give no specification; in that case, columns with matching names are used. The second option is to use only the pair_y argument, specifying for each column in x, in order, which column in y it must be paired with (use 0 to skip a column in x). The third option is to explicitly specify the columns to be matched using pair_x and pair_y.
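The three pairing options can be sketched as follows. The data frames and their column names are invented for illustration, the gower package is assumed to be installed, and columns are referred to by numeric index.

```r
library(gower)

# Hypothetical data: same quantities, different column names and order.
x <- data.frame(height = c(1.60, 1.75, 1.90), weight = c(55, 70, 95))
y <- data.frame(kg = c(55, 72, 90), m = c(1.60, 1.80, 1.90))

# Option 1: no specification. x and y share no column names, so there
# is nothing to compare: a message is printed and numeric(0) is returned.
d_none <- gower_dist(x, y)

# Option 2: pair_y only. For each column of x, in order, give the index
# of the column in y to compare it with: height ~ m, weight ~ kg.
d_pair_y <- gower_dist(x, y, pair_y = c(2, 1))

# Option 3: name both sides of every pair explicitly.
d_both <- gower_dist(x, y, pair_x = c(1, 2), pair_y = c(2, 1))

# Options 2 and 3 describe the same pairing here.
all.equal(d_pair_y, d_both)

# Use 0 to skip a column of x: only height ~ m is compared.
d_skip <- gower_dist(x, y, pair_y = c(2, 0))
d_skip[1]  # the first records have identical heights, so the distance is 0
```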
Gower, John C. "A general coefficient of similarity and some of its properties." Biometrics (1971): 857-871.
gower_topn