rearrr (version 0.3.4)

expand_distances: Expand the distances to an origin

Description

Lifecycle: experimental.

Moves the data points in n-dimensional space such that their distance to a specified origin is increased/decreased. A `multiplier` greater than 1 leads to expansion, while a positive `multiplier` lower than 1 leads to contraction.

The origin can be supplied as coordinates or as a function that returns coordinates. The latter can be useful when supplying a grouped data.frame and expanding around e.g. the centroid of each group.

The multiplier/exponent can be supplied as a constant or as a function that returns a constant. The latter can be useful when supplying a grouped data.frame and the multiplier/exponent depends on the data in the groups.

For expansion in each dimension separately, use expand_distances_each().

NOTE: When exponentiating, the default is to first add 1 to the distances, to ensure expansion even when the distance is between 0 and 1. If you need the purely exponentiated distances, disable `add_one_exp`.

Usage

expand_distances(
  data,
  cols = NULL,
  multiplier = NULL,
  multiplier_fn = NULL,
  origin = NULL,
  origin_fn = NULL,
  exponentiate = FALSE,
  add_one_exp = TRUE,
  suffix = "_expanded",
  keep_original = TRUE,
  mult_col_name = ifelse(isTRUE(exponentiate), ".exponent", ".multiplier"),
  origin_col_name = ".origin",
  overwrite = FALSE
)

Value

data.frame (tibble) with the expanded columns, along with the applied multiplier/exponent and origin coordinates.

Arguments

data

data.frame or vector.

cols

Names of columns in `data` to expand coordinates of. Each column is considered a dimension.

multiplier

Constant to multiply/exponentiate the distances to the origin by.

N.B. When `exponentiate` is TRUE, the `multiplier` becomes an exponent.

multiplier_fn

Function for finding the `multiplier`.

Input: Each column will be passed as a vector in the order of `cols`.

Output: A numeric scalar.
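
As a minimal sketch of this interface, assuming the function is called with one numeric vector per column in `cols`, a multiplier function could look like the following. The name `spread_multiplier` is hypothetical and not part of rearrr.

```r
# Hypothetical multiplier function (not part of rearrr).
# Assumes it receives one numeric vector per column in `cols`.
spread_multiplier <- function(...) {
  dims <- list(...)
  # Range (max - min) of each dimension
  ranges <- vapply(dims, function(v) diff(range(v)), numeric(1))
  # Return a single numeric scalar: the inverse of the largest range
  1 / max(ranges)
}

# Stand-alone check with two dimensions
x <- c(0, 2, 4)
y <- c(1, 2, 3)
m <- spread_multiplier(x, y)
m
```

A function of this shape could then be supplied as `multiplier_fn = spread_multiplier`, for instance with a grouped data.frame so that each group gets its own multiplier.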

origin

Coordinates of the origin to expand around. A scalar to use in all dimensions or a vector with one scalar per dimension.

N.B. Ignored when `origin_fn` is not NULL.

origin_fn

Function for finding the origin coordinates.

Input: Each column will be passed as a vector in the order of `cols`.

Output: A vector with one scalar per dimension.

Can be created with create_origin_fn() if you want to apply the same function to each dimension.

E.g. `create_origin_fn(median)` would find the median of each column.

Built-in functions are centroid(), most_centered(), and midrange().
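
When a different statistic per dimension is needed, a custom origin function can be written by hand. The sketch below assumes the input/output contract described above; `trimmed_mean_origin` is a hypothetical name, not a rearrr function.

```r
# Hypothetical origin function (not part of rearrr).
# Assumes one numeric vector per column in `cols`, passed in order.
trimmed_mean_origin <- function(...) {
  dims <- list(...)
  # One scalar per dimension: the 20% trimmed mean of each column
  vapply(dims, function(v) mean(v, trim = 0.2), numeric(1))
}

# Stand-alone check: the outlier in `x` does not pull the origin away
x <- c(1, 2, 3, 4, 100)
y <- c(10, 20, 30, 40, 50)
origin <- trimmed_mean_origin(x, y)
origin
```

The returned vector has one element per dimension, in the same order as the columns were passed.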

exponentiate

Whether to exponentiate instead of multiplying. (Logical)

add_one_exp

Whether to add 1 to the distances before exponentiating to ensure they don't contract when between 0 and 1. The added value is subtracted after the exponentiation. (Logical)

The distances to the origin (`d`) are exponentiated as such:

d <- d + 1

d <- d ^ multiplier

d <- d - 1

N.B. Ignored when `exponentiate` is FALSE.
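
The effect of adding 1 can be checked with plain arithmetic. The following base R sketch uses made-up distances and does not call rearrr; it only compares the two formulas.

```r
# Distances to the origin, some of them between 0 and 1
d <- c(0.25, 0.5, 1, 2)
exponent <- 2

# Pure exponentiation (the formula used when add_one_exp is FALSE):
# distances below 1 shrink
d_pure <- d ^ exponent

# Add 1 first and subtract it afterwards (add_one_exp is TRUE):
# every positive distance grows
d_add_one <- (d + 1) ^ exponent - 1

data.frame(d, d_pure, d_add_one)
```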

suffix

Suffix to add to the names of the generated columns.

Use an empty string (i.e. "") to overwrite the original columns.

keep_original

Whether to keep the original columns. (Logical)

Some columns may have been overwritten, in which case only the newest versions are returned.

mult_col_name

Name of new column with the `multiplier`. If NULL, no column is added.

origin_col_name

Name of new column with the origin coordinates. If NULL, no column is added.

overwrite

Whether to allow overwriting of existing columns. (Logical)

Author

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

Details

Increases the distance to the origin in n-dimensional space by multiplying or exponentiating it by the multiplier.

We first move the origin to the zero-coordinates (e.g. c(0, 0, 0)) and normalize each vector to unit length. We then multiply this unit vector by the multiplied/exponentiated distance and move the origin back to its original coordinates.

The distance to the specified origin is calculated with: $$d(P_1, P_2) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2 + \dots}$$

Note: By default (when `add_one_exp` is TRUE), we add 1 to the distance before the exponentiation and subtract it afterwards. See `add_one_exp`.
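
The steps above can be sketched in base R for the multiplication case. This is an illustration under stated assumptions, not the package's implementation: `expand_sketch` is a hypothetical helper that takes a numeric matrix with one row per point and one column per dimension.

```r
# Hypothetical helper (not part of rearrr): expand the rows of a
# numeric matrix around `origin` by multiplying their distances.
expand_sketch <- function(coords, origin, multiplier) {
  # Move the origin to the zero-coordinates
  centered <- sweep(coords, 2, origin, "-")
  # Euclidean distance of each point to the origin
  d <- sqrt(rowSums(centered^2))
  # Normalize each vector to unit length (points at the origin stay put)
  unit <- centered / ifelse(d == 0, 1, d)
  # Scale the unit vectors by the new distances and move the origin back
  sweep(unit * (d * multiplier), 2, origin, "+")
}

coords <- cbind(x = c(1, 2, 0.5), y = c(1, 0.5, 0.5))
expanded <- expand_sketch(coords, origin = c(0.5, 0.5), multiplier = 2)
expanded
```

With a multiplier of 2, each point ends up twice as far from the origin along the same direction, and a point located exactly at the origin does not move.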

See Also

Other mutate functions: apply_transformation_matrix(), cluster_groups(), dim_values(), expand_distances_each(), flip_values(), roll_values(), rotate_2d(), rotate_3d(), shear_2d(), shear_3d(), swirl_2d(), swirl_3d()

Other expander functions: expand_distances_each()

Other distance functions: closest_to(), dim_values(), distance(), expand_distances_each(), furthest_from(), swirl_2d(), swirl_3d()

Examples

# Attach packages
library(rearrr)
library(dplyr)
library(purrr)
has_ggplot <- require(ggplot2)  # Attach if installed

# Set seed
set.seed(1)

# Create a data frame
df <- data.frame(
  "x" = runif(20),
  "y" = runif(20),
  "g" = rep(1:4, each = 5)
)

# Expand distances in the two dimensions (x and y)
# With the origin at x=0.5, y=0.5
# We multiply the distances by 2
expand_distances(
  data = df,
  cols = c("x", "y"),
  multiplier = 2,
  origin = c(0.5, 0.5)
)

# Expand distances in the two dimensions (x and y)
# With the origin at x=0.5, y=0.5
# We exponentiate the distances by 2
expand_distances(
  data = df,
  cols = c("x", "y"),
  multiplier = 2,
  exponentiate = TRUE,
  origin = 0.5
)

# Expand values in one dimension (x)
# With the origin at x=0.5
# We exponentiate the distances by 3
expand_distances(
  data = df,
  cols = c("x"),
  multiplier = 3,
  exponentiate = TRUE,
  origin = 0.5
)

# Expand x and y around the centroid
# We use exponentiation for a more drastic effect
# add_one_exp = TRUE makes sure the distances expand
# even when a distance is between 0 and 1
# To compare multiple exponents, we wrap the
# call in purrr::map_dfr
df_expanded <- purrr::map_dfr(
  .x = c(1, 3, 5),
  .f = function(exponent) {
    expand_distances(
      data = df,
      cols = c("x", "y"),
      multiplier = exponent,
      origin_fn = centroid,
      exponentiate = TRUE,
      add_one_exp = TRUE
    )
  }
)
df_expanded

# Plot the expansions of x and y around the overall centroid
if (has_ggplot){
  ggplot(df_expanded, aes(x = x_expanded, y = y_expanded, color = factor(.exponent))) +
    geom_vline(
      xintercept = df_expanded[[".origin"]][[1]][[1]],
      size = 0.2, alpha = .4, linetype = "dashed"
    ) +
    geom_hline(
      yintercept = df_expanded[[".origin"]][[1]][[2]],
      size = 0.2, alpha = .4, linetype = "dashed"
    ) +
    geom_path(size = 0.2) +
    geom_point() +
    theme_minimal() +
    labs(x = "x", y = "y", color = "Exponent")
}

# Expand x and y around the centroid using multiplication
# To compare multiple multipliers, we wrap the
# call in purrr::map_dfr
df_expanded <- purrr::map_dfr(
  .x = c(1, 3, 5),
  .f = function(multiplier) {
    expand_distances(df,
      cols = c("x", "y"),
      multiplier = multiplier,
      origin_fn = centroid,
      exponentiate = FALSE
    )
  }
)
df_expanded

# Plot the expansions of x and y around the overall centroid
if (has_ggplot){
  ggplot(df_expanded, aes(x = x_expanded, y = y_expanded, color = factor(.multiplier))) +
    geom_vline(
      xintercept = df_expanded[[".origin"]][[1]][[1]],
      size = 0.2, alpha = .4, linetype = "dashed"
    ) +
    geom_hline(
      yintercept = df_expanded[[".origin"]][[1]][[2]],
      size = 0.2, alpha = .4, linetype = "dashed"
    ) +
    geom_path(size = 0.2, alpha = .8) +
    geom_point() +
    theme_minimal() +
    labs(x = "x", y = "y", color = "Multiplier")
}

#
# Contraction
#

# Group-wise contraction to create clusters
df_contracted <- df %>%
  dplyr::group_by(g) %>%
  expand_distances(
    cols = c("x", "y"),
    multiplier = 0.07,
    suffix = "_contracted",
    origin_fn = centroid
  )

# Plot the clustered data points on top of the original data points
if (has_ggplot){
  ggplot(df_contracted, aes(x = x_contracted, y = y_contracted, color = factor(g))) +
    geom_point(aes(x = x, y = y, color = factor(g)), alpha = 0.3, shape = 16) +
    geom_point() +
    theme_minimal() +
    labs(x = "x", y = "y", color = "g")
}