FixedGroupsPipeline: Chain multiple transformations with different argument values per group

Description

Lifecycle: experimental

Build a pipeline of transformations to be applied sequentially.

Specify different argument values for each group in a fixed set of groups. E.g. if your data.frame contains 5 groups, you provide 5 argument values for each of the non-constant arguments (see `var_args`).

The number of expected groups is specified during initialization and the input `data` must be grouped such that it contains that exact number of groups.

Transformations are applied to groups separately, so the given transformation function only receives the subset of `data` belonging to the current group.
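Conceptually, per-group application resembles the base-R sketch below. It is not rearrr's implementation: `apply_per_group()` and `add_value()` are hypothetical names, and the sketch assumes a single grouping column and that the i-th value of each varying argument belongs to the i-th group in sorted order.

```r
# Hypothetical helper, not rearrr's implementation: apply `fn` to each group's
# rows separately, passing the shared `args` plus that group's `var_args` values.
apply_per_group <- function(data, group_col, fn, args, var_args) {
  # split() returns one subset per group, ordered by the sorted group values
  subsets <- split(data, data[[group_col]])
  transformed <- lapply(seq_along(subsets), function(i) {
    # Pick the i-th value of every varying argument
    group_args <- lapply(var_args, function(values) values[[i]])
    # The transformation only ever sees the current group's rows
    do.call(fn, c(list(subsets[[i]]), args, group_args))
  })
  # Stack the transformed subsets back into a single data frame
  do.call(rbind, transformed)
}

# Hypothetical transformation: add a constant to one column
add_value <- function(data, col, value) {
  data[[col]] <- data[[col]] + value
  data
}

df <- data.frame(A = 1:6, G = rep(1:3, each = 2))
res <- apply_per_group(
  df, "G", add_value,
  args = list(col = "A"),
  var_args = list(value = list(10, 20, 30))
)
res$A
#> [1] 11 12 23 24 35 36
```

Each group gets its own `value` (10, 20 or 30), while `col` is shared by all groups.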

Standard workflow: Instantiate pipeline -> Add transformations -> Apply to data

To apply the same arguments to all groups, see Pipeline.

To apply generated argument values to an arbitrary number of groups, see GeneratedPipeline.

Author

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

Super class

rearrr::Pipeline -> FixedGroupsPipeline

Public fields

transformations: list of transformations to apply.
names: Names of the transformations.
num_groups: Number of groups the pipeline will be applied to.

Methods

Public methods

Method `new()`

Initialize the pipeline with the number of groups the pipeline will be applied to.

Usage

FixedGroupsPipeline$new(num_groups)

Arguments

num_groups: Number of groups the pipeline will be applied to.

Method `add_transformation()`

Add a transformation to the pipeline.

Usage

FixedGroupsPipeline$add_transformation(fn, args, var_args, name)

Arguments

fn: Function that performs the transformation.

args

Named list with arguments for the `fn` function.

var_args

Named list of arguments, each with a list of differing values, one per group.

For example, `list("a" = list(1, 2, 3), "b" = list("a", "b", "c"))` given 3 groups.

By adding `.apply` with a list of `TRUE`/`FALSE` flags (one per group), the transformation can be disabled for specific groups.

For example, `list(".apply" = list(TRUE, FALSE, TRUE), ...)`.

name

Name of the transformation step. Must be unique.
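The per-group handling of `var_args` and its `.apply` flags could be sketched in base R as follows. `group_arguments()` is a hypothetical helper, not part of rearrr; it assumes each varying argument is a list with exactly one value per group, and that an omitted `.apply` means the transformation runs for every group.

```r
# Hypothetical helper (not rearrr's internals): split `var_args` into one
# argument list per group and pull out the optional `.apply` flags.
group_arguments <- function(var_args, num_groups) {
  # Every varying argument must supply exactly one value per group
  if (!all(lengths(var_args) == num_groups)) {
    stop("each element of `var_args` must have length ", num_groups)
  }
  # Default: apply the transformation to every group
  apply_flags <- if (".apply" %in% names(var_args)) {
    unlist(var_args[[".apply"]])
  } else {
    rep(TRUE, num_groups)
  }
  # `.apply` is a control flag, not an argument for `fn`
  fn_args <- var_args[names(var_args) != ".apply"]
  lapply(seq_len(num_groups), function(i) {
    list(
      apply = apply_flags[[i]],
      args = lapply(fn_args, function(values) values[[i]])
    )
  })
}

per_group <- group_arguments(
  list(multiplier = list(0.5, 1, 5), .apply = list(TRUE, FALSE, TRUE)),
  num_groups = 3
)
per_group[[2]]$apply            # group 2 is skipped
#> [1] FALSE
per_group[[3]]$args$multiplier  # value used for group 3
#> [1] 5
```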

Returns

The pipeline, to allow chaining of methods.
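Returning the pipeline is what lets calls such as `pipe$add_transformation(...)$add_transformation(...)` be chained. The base-R toy below is hypothetical (`new_toy_pipeline()` is not part of rearrr, which uses an R6 class); it only mimics a method returning its own object, plus the documented rule that step names must be unique.

```r
# Hypothetical toy object, not rearrr's R6 class: an environment whose method
# returns the object itself (invisibly), so calls can be chained.
new_toy_pipeline <- function() {
  self <- new.env()
  self$names <- character(0)
  self$add_transformation <- function(name) {
    # Mirror the documented requirement that step names are unique
    if (name %in% self$names) {
      stop("transformation name must be unique: ", name)
    }
    self$names <- c(self$names, name)
    invisible(self)
  }
  self
}

pipe <- new_toy_pipeline()
pipe$add_transformation("Rotate")$add_transformation("Cluster")
pipe$names  # "Rotate" followed by "Cluster"
```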

Method `apply()`

Apply the pipeline to a data.frame.

Usage

FixedGroupsPipeline$apply(data, verbose = FALSE)

Arguments

data

data.frame with the same number of groups as pre-registered in the pipeline.

You can find the number of groups in `data` with `dplyr::n_groups(data)`. The number of groups expected by the pipeline can be accessed with `pipe$num_groups`.

verbose

Whether to print the progress.

Returns

Transformed version of `data`.
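When the grouping columns are known, the group-count requirement on `data` can also be checked up front. `check_num_groups()` below is a hypothetical base-R helper, not part of rearrr. It counts the distinct combinations of the grouping columns, which assumes there are no empty groups (unused factor levels kept with `.drop = FALSE` would make `dplyr::n_groups()` report more).

```r
# Hypothetical helper: fail early if `data` does not contain the number of
# groups the pipeline was initialized with.
check_num_groups <- function(data, group_cols, num_groups) {
  # Count distinct combinations of the grouping columns
  observed <- nrow(unique(data[group_cols]))
  if (observed != num_groups) {
    stop(sprintf("expected %d groups, found %d", num_groups, observed))
  }
  invisible(observed)
}

df <- data.frame(Index = 1:12, G = rep(1:3, each = 4))
check_num_groups(df, "G", num_groups = 3)  # passes silently
```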

Method `print()`

Print an overview of the pipeline.

Usage

FixedGroupsPipeline$print(...)

Arguments

...: further arguments passed to or from other methods.

Returns

The pipeline, to allow chaining of methods.

Method `clone()`

The objects of this class are cloneable with this method.

Usage

FixedGroupsPipeline$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.
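`clone()` matters because R6 objects are environments with reference semantics: `pipe2 <- pipe` does not copy the pipeline, so adding a transformation through `pipe2` also changes `pipe`. The base-R sketch below shows the same behavior with a plain environment; `list2env()` only stands in for an explicit copy and is not how R6 implements `clone()`.

```r
# A plain environment has the same reference semantics as an R6 object
original <- new.env()
original$num_groups <- 3

alias <- original          # no copy: both names refer to one environment
alias$num_groups <- 5
original$num_groups        # 5: the change is visible through both names

# An explicit copy is independent of the original
copy <- list2env(as.list(original), envir = new.env())
copy$num_groups <- 7
original$num_groups        # still 5
```

With the pipeline itself, `pipe$clone()` is the documented way to get a separate copy.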

Examples

# Attach packages
library(rearrr)
library(dplyr)

# Create a data frame
# We group it by G so we have 3 groups
df <- data.frame(
  "Index" = 1:12,
  "A" = c(1:4, 9:12, 15:18),
  "G" = rep(1:3, each = 4)
) %>%
  dplyr::group_by(G)

# Create new pipeline
pipe <- FixedGroupsPipeline$new(num_groups = 3)

# Add 2D rotation transformation
pipe$add_transformation(
  fn = rotate_2d,
  args = list(
    x_col = "Index",
    y_col = "A",
    suffix = "",
    overwrite = TRUE
  ),
  var_args = list(
    degrees = list(45, 90, 180),
    origin = list(c(0, 0), c(1, 2), c(-1, 0))
  ),
  name = "Rotate"
)

# Add the `cluster_groups()` transformation
# As the function is fed an ungrouped subset of `data`,
# i.e. the rows of that group, we need to specify `group_cols` in `args`
# That is specific to `cluster_groups()` though
# Also note `.apply` in `var_args` which tells the pipeline *not*
# to apply this transformation to the second group
pipe$add_transformation(
  fn = cluster_groups,
  args = list(
    cols = c("Index", "A"),
    suffix = "",
    overwrite = TRUE,
    group_cols = "G"
  ),
  var_args = list(
    multiplier = list(0.5, 1, 5),
    .apply = list(TRUE, FALSE, TRUE)
  ),
  name = "Cluster"
)

# Check pipeline object
pipe

# Apply pipeline to already grouped data.frame
# Enable `verbose` to print progress
pipe$apply(df, verbose = TRUE)
