rearrr (version 0.3.4)

generate_clusters: Generate n-dimensional clusters

Description

[Experimental]

Generates a data.frame (tibble) with clustered groups.

Usage

generate_clusters(
  num_rows,
  num_cols,
  num_clusters,
  compactness = 1.6,
  generator = runif,
  name_prefix = "D",
  cluster_col_name = ".cluster"
)

Value

A data.frame (tibble) with the clustered columns and the cluster grouping factor.

Arguments

num_rows

Number of rows.

num_cols

Number of columns (dimensions).

num_clusters

Number of clusters.

compactness

How compact the clusters should be. A larger value leads to more compact clusters (on average).

Technically, it is passed to the `multiplier` argument in cluster_groups() as `0.1 / compactness`.
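As a quick illustration of the formula above, the following sketch computes the `multiplier` implied by a few `compactness` values. The specific values are chosen for illustration only; the point is that a larger `compactness` yields a smaller `multiplier`.

```r
# Multiplier implied by the documented formula: 0.1 / compactness
compactness <- c(0.5, 1, 1.6, 2.5)  # illustrative values only
multiplier <- 0.1 / compactness

# Side-by-side view of each compactness and its multiplier
print(data.frame(compactness, multiplier))
```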

generator

Function to generate the numeric values.

Must have the number of values to generate as its first (and only required) argument, as that is the only argument we pass to it.
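A generator that normally needs extra arguments can be wrapped so that only the number of values remains required. The sketch below is one way to do that; `normal_generator` and its mean and standard deviation are hypothetical choices for illustration, and the call to `generate_clusters()` only runs if rearrr is installed.

```r
# Hypothetical wrapper: fixes mean and sd so that only `n` is required
normal_generator <- function(n) rnorm(n, mean = 10, sd = 2)

# The wrapper can be called with the number of values alone
set.seed(1)
values <- normal_generator(5)
print(length(values))

# Pass the wrapper as the `generator` (skipped when rearrr is not installed)
if (requireNamespace("rearrr", quietly = TRUE)) {
  clusters <- rearrr::generate_clusters(
    num_rows = 20, num_cols = 2, num_clusters = 3,
    generator = normal_generator
  )
  print(head(clusters))
}
```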

name_prefix

Prefix string for naming columns.

cluster_col_name

Name of the cluster factor column.

Author

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

Details

  • Generates a data.frame with random values using the `generator`.

  • Divides the rows into groups (the clusters).

  • Contracts the distance from each data point to the centroid of its group.

  • Performs MinMax scaling such that the scale of the data points is similar to the generated data.
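The four steps above can be sketched in base R as follows. This is not the package's implementation, only a minimal approximation under stated assumptions: rows are assigned to clusters round-robin, the contraction multiplies each point's offset from its cluster centroid by `0.1 / compactness`, and each column is rescaled to the range of the originally generated values. The helper `rescale()` is hypothetical.

```r
set.seed(10)
num_rows <- 20
num_cols <- 2
num_clusters <- 3
multiplier <- 0.1 / 1.6  # from the documented formula, with compactness = 1.6

# 1. Random values from the generator (runif here)
values <- matrix(runif(num_rows * num_cols), nrow = num_rows)
original_range <- apply(values, 2, range)  # row 1: minima, row 2: maxima

# 2. Divide the rows into groups (assumption: simple round-robin assignment)
cluster <- factor(rep(seq_len(num_clusters), length.out = num_rows))

# 3. Contract each point toward the centroid of its group
contracted <- values
for (k in levels(cluster)) {
  rows <- cluster == k
  centroid <- colMeans(values[rows, , drop = FALSE])
  centered <- sweep(values[rows, , drop = FALSE], 2, centroid)  # subtract centroid
  contracted[rows, ] <- sweep(centered * multiplier, 2, centroid, `+`)
}

# 4. Min-max scale each column back to the range of the generated values
rescale <- function(x, new_min, new_max) {
  (x - min(x)) / (max(x) - min(x)) * (new_max - new_min) + new_min
}
scaled <- sapply(seq_len(num_cols), function(j) {
  rescale(contracted[, j], original_range[1, j], original_range[2, j])
})

# Assemble the result with "D"-prefixed columns and a cluster factor
result <- data.frame(scaled, .cluster = cluster)
names(result)[seq_len(num_cols)] <- paste0("D", seq_len(num_cols))
print(head(result))
```

Because the contraction shrinks within-cluster distances while the final rescaling stretches the whole data set back to the original range, the clusters end up well separated relative to their own spread.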

See Also

Other clustering functions: cluster_groups(), transfer_centroids()

Examples

# Attach packages
library(rearrr)
library(dplyr)
has_ggplot <- require(ggplot2)  # Attach if installed

# Set seed
set.seed(10)

# Generate clusters
generate_clusters(num_rows = 20, num_cols = 3, num_clusters = 3, compactness = 1.6)
generate_clusters(num_rows = 20, num_cols = 5, num_clusters = 6, compactness = 2.5)

# Generate clusters and plot them
# Tip: Call this multiple times
# to see the behavior of `generate_clusters()`
if (has_ggplot){
  generate_clusters(
    num_rows = 50, num_cols = 2,
    num_clusters = 5, compactness = 1.6
  ) %>%
    ggplot(
      aes(x = D1, y = D2, color = .cluster)
    ) +
    geom_point() +
    theme_minimal() +
    labs(x = "D1", y = "D2", color = "Cluster")
}

#
# Plot clusters in 3d view
#

# Generate clusters
clusters <- generate_clusters(
  num_rows = 50, num_cols = 3,
  num_clusters = 5, compactness = 1.6
)

# Plot 3d with plotly (not run by default)
if (FALSE) {
  plotly::plot_ly(
    x = clusters$D1,
    y = clusters$D2,
    z = clusters$D3,
    type = "scatter3d",
    mode = "markers",
    color = clusters$.cluster
  )
}
