lump_rows: Lump rows of a tibble

Description

A verb for a dplyr pipeline. In the given data frame, the .level column is treated as a set of levels and the .count column as the corresponding counts. The function returns a data frame in which rows are lumped according to these levels and counts, using the parameters n, prop, other_level and ties.method as for lump(). The resulting row for other_level has level = other_level and count = sum(count of all lumped rows).

For the remaining columns, either a default concatenation is used, or you can provide custom summarising statements via the summarising_statements parameter: a list named by the columns you want to summarise, giving statements wrapped in quo(), using the same syntax as in a call to summarise().
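The lumping described above can be pictured with plain dplyr verbs. The following is a minimal sketch of the idea only, not the implementation of lump_rows(): the toy data, the column names fruit, n_sold and note, the helper column .rank, and the ", " separator used for text are all assumptions made for illustration.

```r
library(dplyr)

# Toy data: one row per level, with its count and a free-text column.
df <- tibble(
  fruit  = c("apple", "banana", "cherry", "date"),
  n_sold = c(50, 30, 15, 5),
  note   = c("a", "b", "c", "d")
)

n_keep <- 2  # plays the role of the n parameter

# Rank rows by descending count; tied rows share the smallest rank.
ranked <- df %>%
  mutate(.rank = rank(-n_sold, ties.method = "min"))

# Rows within the top n_keep ranks are preserved unchanged.
kept <- ranked %>%
  filter(.rank <= n_keep) %>%
  select(-.rank)

# All other rows collapse into a single "Other" row:
# counts are summed, text is concatenated (separator is an assumption).
lumped <- ranked %>%
  filter(.rank > n_keep) %>%
  summarise(
    fruit  = "Other",
    n_sold = sum(n_sold),
    note   = paste(note, collapse = ", ")
  )

result <- bind_rows(kept, lumped)
print(result)
```

Here the rows for cherry and date are merged into one row with level "Other" and a count of 20, while apple and banana pass through untouched.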

Usage

lump_rows(
  .df,
  .level,
  .count,
  summarising_statements = quos(),
  n,
  prop,
  remaining_levels,
  other_level = "Other",
  ties.method = c("min", "average", "first", "last", "random", "max")
)

Arguments

.df

A data frame

.level

Column name (symbolic) containing a set of levels

.count

Column name (symbolic) containing counts of the levels

summarising_statements

The "lumped" rows need to have all their columns summarised into one row. This parameter is a list of quoted expressions, as created by quos() or vars(), written as they would be in a call to summarise(): each name is a column name and each value is the summarising expression. If no statement is provided for a column, a default summary is used, which takes the sum if the column is numeric, concatenates the values if it is text, or uses any() if it is logical.
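The default rule described above can be written down as a small base R function. This is a sketch under stated assumptions: default_summary() is a hypothetical helper, not a function exported by the package, and the ", " separator for text is a guess, since the documentation does not state it.

```r
# Hypothetical helper mirroring the documented default summary rule.
default_summary <- function(x) {
  if (is.numeric(x)) {
    sum(x)                      # numeric columns: sum
  } else if (is.logical(x)) {
    any(x)                      # logical columns: TRUE if any value is TRUE
  } else {
    paste(x, collapse = ", ")   # text columns: concatenate (separator assumed)
  }
}

default_summary(c(1, 2, 3))
default_summary(c(FALSE, TRUE))
default_summary(c("a", "b"))
```

A custom statement in summarising_statements would replace this rule for the named column only; columns without a statement fall back to the default.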

n

If specified, n rows shall be preserved.

prop

If specified, rows shall be preserved if their count >= prop

remaining_levels

Levels that should explicitly not be lumped

other_level

Name of the "other" level to be created from lumped rows

ties.method

Method to apply in case of ties
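The ties.method values match those accepted by base R's rank(), so the effect of a tie at the cut-off can be illustrated with rank() directly. Whether lump_rows() ranks counts in exactly this way is an assumption; the counts vector and n_keep below are made up for the example.

```r
# Four levels; "b" and "c" are tied for second place.
counts <- c(a = 10, b = 5, c = 5, d = 1)
n_keep <- 2  # plays the role of the n parameter

# Rank by descending count under three tie-breaking rules.
rank_min   <- rank(-counts, ties.method = "min")    # tied rows get rank 2
rank_first <- rank(-counts, ties.method = "first")  # tied rows get ranks 2 and 3
rank_max   <- rank(-counts, ties.method = "max")    # tied rows get rank 3

# Levels that would be preserved with a cut-off of n_keep.
kept_min   <- names(counts)[rank_min   <= n_keep]
kept_first <- names(counts)[rank_first <= n_keep]
kept_max   <- names(counts)[rank_max   <= n_keep]

print(kept_min)
print(kept_first)
print(kept_max)
```

With "min", both tied levels survive, so more than n_keep rows can be kept; with "first", only the tied level that appears first survives; with "max", neither does.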

Value

The lumped data frame
