rearrr (version 0.3.4)

FixedGroupsPipeline: Chain multiple transformations with different argument values per group

Description

Lifecycle: experimental

Build a pipeline of transformations to be applied sequentially.

Specify different argument values for each group in a fixed set of groups. For example, if your data.frame contains 5 groups, you provide 5 values for each of the non-constant arguments (see `var_args`).

The number of expected groups is specified during initialization and the input `data` must be grouped such that it contains that exact number of groups.

Transformations are applied to each group separately, so the given transformation function only receives the subset of `data` belonging to the current group.

Standard workflow: Instantiate pipeline -> Add transformations -> Apply to data

To apply the same arguments to all groups, see Pipeline.

To apply generated argument values to an arbitrary number of groups, see GeneratedPipeline.
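
The per-group idea described above can be sketched in base R. This is only a conceptual illustration, not how rearrr implements it; `apply_per_group()` and `shift_col()` are hypothetical helpers invented for this sketch.

```r
# Conceptual sketch in base R (NOT rearrr's implementation):
# call `fn` on each group's rows with that group's own argument values.
apply_per_group <- function(data, group_col, fn, args = list(), var_args = list()) {
  groups <- split(data, data[[group_col]])
  # Every non-constant argument needs exactly one value per group
  stopifnot(all(lengths(var_args) == length(groups)))
  out <- lapply(seq_along(groups), function(i) {
    # Pick the i-th value of each varying argument
    group_args <- lapply(var_args, function(values) values[[i]])
    do.call(fn, c(list(groups[[i]]), args, group_args))
  })
  do.call(rbind, out)
}

# Hypothetical transformation: add `by` to the column named `col`
shift_col <- function(data, col, by) {
  data[[col]] <- data[[col]] + by
  data
}

df <- data.frame(x = 1:6, g = rep(c("a", "b", "c"), each = 2))

shifted <- apply_per_group(
  df,
  group_col = "g",
  fn = shift_col,
  args = list(col = "x"),                # same for all groups
  var_args = list(by = list(10, 20, 30)) # one value per group
)

shifted$x
# 11 12 23 24 35 36
```

The constant `args` are shared by all three groups, while each group receives its own `by` value, which mirrors the split between `args` and `var_args` in `add_transformation()`.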

Author

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

Super class

rearrr::Pipeline -> FixedGroupsPipeline

Public fields

transformations

List of transformations to apply.

names

Names of the transformations.

num_groups

Number of groups the pipeline will be applied to.

Methods


Method new()

Initialize the pipeline with the number of groups the pipeline will be applied to.

Usage

FixedGroupsPipeline$new(num_groups)

Arguments

num_groups

Number of groups the pipeline will be applied to.


Method add_transformation()

Add a transformation to the pipeline.

Usage

FixedGroupsPipeline$add_transformation(fn, args, var_args, name)

Arguments

fn

Function that performs the transformation.

args

Named list with arguments for the `fn` function.

var_args

Named list of arguments whose values differ between groups. Each element is a list with one value per group.

E.g. list("a" = list(1, 2, 3), "b" = list("a", "b", "c")) given 3 groups.

By adding ".apply" with a list of TRUE/FALSE flags, the transformation can be disabled for a specific group.

E.g. list(".apply" = list(TRUE, FALSE, TRUE), ...).

name

Name of the transformation step. Must be unique.

Returns

The pipeline, to allow chaining of methods.
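
As a minimal sketch of how `args`, `var_args`, `.apply`, and method chaining fit together, the following adds two steps built on a user-defined function. `shift_col()` is a hypothetical helper, not part of rearrr, and the sketch assumes the pipeline hands the current group's rows to the function as its first argument (`data`), as the built-in rearrr transformations expect.

```r
library(rearrr)
library(dplyr)

# Hypothetical transformation: add `by` to the column named `col`.
# Assumption: the group subset arrives as the first argument (`data`).
shift_col <- function(data, col, by) {
  data[[col]] <- data[[col]] + by
  data
}

# Two groups, so every entry in `var_args` holds two values
df <- data.frame(
  x = c(1, 2, 3, 4),
  g = rep(c("a", "b"), each = 2)
) %>%
  dplyr::group_by(g)

pipe <- FixedGroupsPipeline$new(num_groups = 2)

# add_transformation() returns the pipeline, so the calls can be chained
pipe$add_transformation(
  fn = shift_col,
  args = list(col = "x"),
  var_args = list(
    by = list(10, 20),
    .apply = list(TRUE, FALSE) # skip this step for the second group
  ),
  name = "Shift"
)$add_transformation(
  fn = shift_col,
  args = list(col = "x"),
  var_args = list(by = list(100, 200)),
  name = "ShiftAgain"
)

out <- pipe$apply(df)
out
```

Each step needs its own unique `name`, even when the same function is reused.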


Method apply()

Apply the pipeline to a data.frame.

Usage

FixedGroupsPipeline$apply(data, verbose = FALSE)

Arguments

data

data.frame with the same number of groups as pre-registered in the pipeline.

You can find the number of groups in `data` with `dplyr::n_groups(data)`. The number of groups expected by the pipeline can be accessed with `pipe$num_groups`.

verbose

Whether to print the progress.

Returns

Transformed version of `data`.
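
Because the group count must match, it can help to compare the two numbers before calling `apply()`. A small sketch, using only `dplyr::n_groups()` and the `num_groups` field mentioned above (the data frame is made up for illustration):

```r
library(rearrr)
library(dplyr)

# Illustrative data with 3 groups
df <- data.frame(x = 1:6, g = rep(1:3, each = 2)) %>%
  dplyr::group_by(g)

pipe <- FixedGroupsPipeline$new(num_groups = 3)

# Compare the groups in the data with the number the pipeline expects
groups_match <- dplyr::n_groups(df) == pipe$num_groups

if (!groups_match) {
  stop(
    "`df` has ", dplyr::n_groups(df), " groups but the pipeline expects ",
    pipe$num_groups, "."
  )
}
```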


Method print()

Print an overview of the pipeline.

Usage

FixedGroupsPipeline$print(...)

Arguments

...

Further arguments passed to or from other methods.

Returns

The pipeline, to allow chaining of methods.


Method clone()

The objects of this class are cloneable with this method.

Usage

FixedGroupsPipeline$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

See Also

Other pipelines: GeneratedPipeline, Pipeline

Examples

# Attach packages
library(rearrr)
library(dplyr)

# Create a data frame
# We group it by G so we have 3 groups
df <- data.frame(
  "Index" = 1:12,
  "A" = c(1:4, 9:12, 15:18),
  "G" = rep(1:3, each = 4)
) %>%
  dplyr::group_by(G)

# Create new pipeline
pipe <- FixedGroupsPipeline$new(num_groups = 3)

# Add 2D rotation transformation
pipe$add_transformation(
  fn = rotate_2d,
  args = list(
    x_col = "Index",
    y_col = "A",
    suffix = "",
    overwrite = TRUE
  ),
  var_args = list(
    degrees = list(45, 90, 180),
    origin = list(c(0, 0), c(1, 2), c(-1, 0))
  ),
  name = "Rotate"
)

# Add the `cluster_groups` transformation
# As the function is fed an ungrouped subset of `data`,
# i.e. the rows of that group, we need to specify `group_cols` in `args`
# (this requirement is specific to `cluster_groups()`)
# Also note `.apply` in `var_args`, which tells the pipeline *not*
# to apply this transformation to the second group
pipe$add_transformation(
  fn = cluster_groups,
  args = list(
    cols = c("Index", "A"),
    suffix = "",
    overwrite = TRUE,
    group_cols = "G"
  ),
  var_args = list(
    multiplier = list(0.5, 1, 5),
    .apply = list(TRUE, FALSE, TRUE)
  ),
  name = "Cluster"
)

# Check pipeline object
pipe

# Apply pipeline to already grouped data.frame
# Enable `verbose` to print progress
pipe$apply(df, verbose = TRUE)