rearrr (version 0.3.4)

Pipeline: Chain multiple transformations

Description

Lifecycle: experimental

Build a pipeline of transformations to be applied sequentially.

Uses the same arguments for all groups in `data`.

Groupings are reset between each transformation. See the `group_cols` argument of `add_transformation()`.

Standard workflow: Instantiate pipeline -> Add transformations -> Apply to data

To apply different argument values to each group, see GeneratedPipeline for generating argument values for an arbitrary number of groups and FixedGroupsPipeline for specifying specific values for a fixed set of groups.
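The standard workflow (instantiate, add transformations, apply) can be sketched as below. This is a minimal illustration, not the package's own example: the data frame, its column names (`x`, `y`, `g`) and the variable names `pl` and `out` are made up for this sketch, and `rotate_2d()` is used only because it appears elsewhere on this page.

```r
library(rearrr)

# Hypothetical example data: two numeric columns and a grouping column
df <- data.frame(
  "x" = 1:6,
  "y" = c(2, 4, 6, 1, 3, 5),
  "g" = rep(1:2, each = 3)
)

# 1) Instantiate the pipeline
pl <- Pipeline$new()

# 2) Add a transformation
#    (rotate the points of each group 90 degrees around the origin)
pl$add_transformation(
  fn = rotate_2d,
  args = list(
    x_col = "x",
    y_col = "y",
    origin = c(0, 0),
    degrees = 90,
    suffix = "",
    overwrite = TRUE
  ),
  name = "Rotate",
  group_cols = "g"
)

# 3) Apply the pipeline to the data
out <- pl$apply(df)
```

The same `args` list is used for every group in `g`; that is the case this class is meant for.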

Author

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

Public fields

transformations

List of transformations to apply.

names

Names of the transformations.

Methods


Method add_transformation()

Add a transformation to the pipeline.

Usage

Pipeline$add_transformation(fn, args, name, group_cols = NULL)

Arguments

fn

Function that performs the transformation.

args

Named list with arguments for the `fn` function.

name

Name of the transformation step. Must be unique.

group_cols

Names of the columns to group the input data by before applying the transformation.

Note that the transformation function is applied separately to each group (subset). If the `fn` function requires access to the entire data.frame, the grouping columns should be specified as part of `args` and handled by the `fn` function.
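To see what `group_cols` changes in practice, the sketch below rotates the same data twice: once with `group_cols = "G"` and once without. It assumes that `rotate_2d()` accepts `origin_fn = centroid` (both from rearrr) so that the rotation origin is computed from whatever data the function receives. The variable names are hypothetical.

```r
library(rearrr)

df <- data.frame(
  "Index" = 1:12,
  "A" = c(1:4, 9:12, 15:18),
  "G" = rep(1:3, each = 4)
)

# Shared arguments: rotate 180 degrees around the centroid
# of the data that `rotate_2d()` is given
# (assumption: `origin_fn = centroid` is supported here)
rotate_args <- list(
  x_col = "Index",
  y_col = "A",
  origin_fn = centroid,
  degrees = 180,
  suffix = "",
  overwrite = TRUE
)

# With `group_cols`: the centroid is found within each group of `G`
per_group <- Pipeline$new()
per_group$add_transformation(
  fn = rotate_2d,
  args = rotate_args,
  name = "Rotate",
  group_cols = "G"
)

# Without `group_cols`: one centroid for the entire data.frame
whole <- Pipeline$new()
whole$add_transformation(
  fn = rotate_2d,
  args = rotate_args,
  name = "Rotate"
)

out_per_group <- per_group$apply(df)
out_whole <- whole$apply(df)

# Compare the rotated `A` column of the two results
all.equal(out_per_group$A, out_whole$A)
```

Under these assumptions the two results should differ, since the points are mirrored around three group centroids in one case and a single overall centroid in the other.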

Returns

The pipeline, to allow method chaining.
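Because `add_transformation()` returns the pipeline, the calls can be chained into a single expression. The following sketch reuses the two transformations and argument values from the Examples section; only the chained layout and the variable name `out` are added here.

```r
library(rearrr)

df <- data.frame(
  "Index" = 1:12,
  "A" = c(1:4, 9:12, 15:18),
  "G" = rep(1:3, each = 4)
)

# Build the pipeline and apply it in one chained expression
out <- Pipeline$new()$
  add_transformation(
    fn = rotate_2d,
    args = list(
      x_col = "Index",
      y_col = "A",
      origin = c(0, 0),
      degrees = 45,
      suffix = "",
      overwrite = TRUE
    ),
    name = "Rotate",
    group_cols = "G"
  )$
  add_transformation(
    fn = cluster_groups,
    args = list(
      cols = c("Index", "A"),
      suffix = "",
      overwrite = TRUE,
      multiplier = 0.05,
      group_cols = "G"
    ),
    name = "Cluster"
  )$
  apply(df)
```

Chaining is a matter of taste; keeping the pipeline in a variable is more convenient when it will be printed or reused on several data frames.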


Method apply()

Apply the pipeline to a data.frame.

Usage

Pipeline$apply(data, verbose = FALSE)

Arguments

data

data.frame.

A grouped data.frame will raise a warning and the grouping will be ignored. Use the `group_cols` argument in the `add_transformation` method to specify how `data` should be grouped for each transformation.

verbose

Whether to print the progress.

Returns

Transformed version of `data`.
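If the input is already grouped (for example with dplyr), one way to avoid the warning is to drop the grouping before calling `apply()` and state it via `group_cols` instead. A minimal sketch, assuming dplyr is installed; the variable names are hypothetical.

```r
library(rearrr)

df <- data.frame(
  "Index" = 1:12,
  "A" = c(1:4, 9:12, 15:18),
  "G" = rep(1:3, each = 4)
)

# A data.frame that was grouped earlier in some workflow
grouped_df <- dplyr::group_by(df, G)

# State the grouping on the transformation itself
pl <- Pipeline$new()
pl$add_transformation(
  fn = rotate_2d,
  args = list(
    x_col = "Index",
    y_col = "A",
    origin = c(0, 0),
    degrees = 45,
    suffix = "",
    overwrite = TRUE
  ),
  name = "Rotate",
  group_cols = "G"
)

# Remove the existing grouping before applying the pipeline
out <- pl$apply(dplyr::ungroup(grouped_df))
```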


Method print()

Print an overview of the pipeline.

Usage

Pipeline$print(...)

Arguments

...

Further arguments passed to or from other methods.

Returns

The pipeline, to allow method chaining.


Method clone()

The objects of this class are cloneable with this method.

Usage

Pipeline$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.
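R6 objects have reference semantics, so a plain assignment (`b <- a`) makes both names point to the same pipeline. `clone()` is useful when a shared base pipeline should be extended in different directions. The sketch below assumes that the `transformations` field is an ordinary R list, so that adding a step to the clone leaves the original's list unchanged; `base_pl` and `extended_pl` are hypothetical names.

```r
library(rearrr)

# A base pipeline with a single step
base_pl <- Pipeline$new()
base_pl$add_transformation(
  fn = rotate_2d,
  args = list(
    x_col = "Index",
    y_col = "A",
    origin = c(0, 0),
    degrees = 45,
    suffix = "",
    overwrite = TRUE
  ),
  name = "Rotate",
  group_cols = "G"
)

# Clone it and add a second step to the copy only
extended_pl <- base_pl$clone()
extended_pl$add_transformation(
  fn = cluster_groups,
  args = list(
    cols = c("Index", "A"),
    suffix = "",
    overwrite = TRUE,
    multiplier = 0.05,
    group_cols = "G"
  ),
  name = "Cluster"
)

# Compare the number of steps in the two pipelines
length(base_pl$transformations)
length(extended_pl$transformations)
```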

See Also

Other pipelines: FixedGroupsPipeline, GeneratedPipeline

Examples

# Attach package
library(rearrr)

# Create a data frame
df <- data.frame(
  "Index" = 1:12,
  "A" = c(1:4, 9:12, 15:18),
  "G" = rep(1:3, each = 4)
)

# Create new pipeline
pipe <- Pipeline$new()

# Add 2D rotation transformation
# Note that we specify the grouping via `group_cols`
pipe$add_transformation(
  fn = rotate_2d,
  args = list(
    x_col = "Index",
    y_col = "A",
    origin = c(0, 0),
    degrees = 45,
    suffix = "",
    overwrite = TRUE
  ),
  name = "Rotate",
  group_cols = "G"
)

# Add the `cluster_groups` transformation
# Note that this function requires the entire input data
# to properly scale the groups. We therefore specify `group_cols`
# as part of `args`. This works as `cluster_groups()` accepts that
# argument.
pipe$add_transformation(
  fn = cluster_groups,
  args = list(
    cols = c("Index", "A"),
    suffix = "",
    overwrite = TRUE,
    multiplier = 0.05,
    group_cols = "G"
  ),
  name = "Cluster"
)

# Check pipeline object
pipe

# Apply pipeline to data.frame
# Enable `verbose` to print progress
pipe$apply(df, verbose = TRUE)