Compute Gower's distance pairwise between records in two data sets x
and y. Records from the smaller data set are recycled.
gower_dist(
x,
y,
pair_x = NULL,
pair_y = NULL,
eps = 1e-08,
weights = NULL,
ignore_case = FALSE,
nthread = getOption("gd_num_thread")
)
A numeric vector of length max(nrow(x),nrow(y)).
When there are no columns to compare, a message is printed and
numeric(0) is returned invisibly.
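The recycling and the length of the result can be sketched as follows. This is a minimal illustration, assuming the gower package is installed; the built-in iris data and the row subsets are chosen only for the example.

```r
library(gower)

# Illustrative data: x has 3 records, y has 6 records.
x <- iris[1:3, ]
y <- iris[1:6, ]

# The 3 records of x are recycled over the 6 records of y, so record i of y
# is compared with record ((i - 1) %% 3) + 1 of x.
d <- gower_dist(x, y)

# One distance per record of the larger data set.
length(d) == max(nrow(x), nrow(y))

# The first records of x and y are identical, so their distance is zero.
d[1]
```

The row counts here are chosen so that the larger is a multiple of the smaller, which keeps the recycling pattern unambiguous.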
x: [data.frame]
y: [data.frame]
pair_x: [numeric|character] (optional) Columns in x used for comparison.
See Details below.
pair_y: [numeric|character] (optional) Columns in y used for comparison.
See Details below.
eps: [numeric] (optional) Computed numbers (variable ranges)
smaller than eps are treated as zero.
weights: [numeric] (optional) A vector of weights of length ncol(x)
that defines the weight applied to each component of the Gower distance.
ignore_case: [logical] Toggle case-insensitive matching of column names
when neither pair_x nor pair_y is user-defined.
nthread: Number of threads to use for parallelization. By default,
2 threads are used on a dual-core machine. On any other machine,
n-1 cores are used so that the machine does not freeze during a large computation.
The maximum number of threads is determined using omp_get_max_threads at the C level.
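The optional arguments can be combined as in the sketch below. It assumes the gower package is installed; the weight values and the choice of nthread = 1 are illustrative assumptions, not recommended settings.

```r
library(gower)

# Illustrative data: two disjoint subsets of iris with identical columns.
x <- iris[1:3, ]
y <- iris[4:6, ]

# One weight per column of x; here the first column counts double.
w <- c(2, 1, 1, 1, 1)

d_default  <- gower_dist(x, y)
d_weighted <- gower_dist(x, y, weights = w)

# Restrict the computation to a single thread; the result should not
# depend on the number of threads used.
d_single <- gower_dist(x, y, nthread = 1)
isTRUE(all.equal(d_default, d_single))
```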
There are three ways to specify which columns of x should be compared
with which columns of y. The first option is to give no specification.
In that case columns with matching names will be used. The second option
is to use only the pair_y argument, specifying for each column in x,
in order, which column in y it must be paired with (use 0
to skip a column in x). The third option is to explicitly specify the
columns to be matched using pair_x and pair_y.
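The three options can be compared side by side. The sketch below assumes the gower package is installed; the data frames and their column names (height, width) are hypothetical and exist only to show the pairing.

```r
library(gower)

# Hypothetical data: the same variables, stored in a different column order.
x <- data.frame(height = c(1.0, 2.0, 3.0), width = c(10, 20, 30))
y <- data.frame(width = c(10, 20, 40), height = c(1.0, 2.5, 3.0))

# Option 1: no specification, columns are paired by matching names.
d_names <- gower_dist(x, y)

# Option 2: pair_y only. Column 1 of x pairs with column 2 of y,
# and column 2 of x pairs with column 1 of y.
d_pair_y <- gower_dist(x, y, pair_y = c(2, 1))

# Option 3: both sides of each pair given explicitly.
d_explicit <- gower_dist(x, y, pair_x = c(1, 2), pair_y = c(2, 1))

# All three describe the same pairing here, so the distances agree.
isTRUE(all.equal(d_names, d_pair_y))
isTRUE(all.equal(d_names, d_explicit))

# Using 0 in pair_y skips the second column of x (width).
d_height_only <- gower_dist(x, y, pair_y = c(2, 0))
```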
Gower, John C. "A general coefficient of similarity and some of its properties." Biometrics (1971): 857-871.
gower_topn