A verb for a dplyr pipeline:
In the given data frame, treat the .level column as a set of levels and the .count column
as the corresponding counts. Return a data frame in which the rows are lumped according to these levels and counts,
using the parameters n, prop, other_level, and ties.method in the same way as lump().
The resulting row for other_level has level = other_level and count = sum(count of all lumped rows).
For the remaining columns, either a default concatenation is used, or you can provide
custom summarising statements via the summarising_statements parameter.
Provide a list named by the column you want to summarize, giving statements wrapped in quo(),
using syntax as you would for a call to summarise().
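The lumping described above can be illustrated with plain dplyr verbs. This is only a sketch of the idea, not the package's implementation: the data frame `df`, the variable `n_keep` (standing in for the n parameter), and the choice of "min" for ties are assumptions made for the example.

```r
library(dplyr)

# Hypothetical input: one row per level with its count
df <- tibble(
  level = c("a", "b", "c", "d"),
  count = c(50, 30, 15, 5)
)

n_keep <- 2  # stands in for the n parameter

lumped <- df %>%
  # rank rows by descending count; tied rows share the smallest rank
  mutate(rank = rank(-count, ties.method = "min")) %>%
  # rows outside the top n_keep are relabelled as "Other"
  mutate(level = if_else(rank <= n_keep, level, "Other")) %>%
  # collapse all "Other" rows into one row and add up their counts
  group_by(level) %>%
  summarise(count = sum(count), .groups = "drop")

lumped
```

Here levels "c" and "d" fall outside the top two, so they are merged into a single "Other" row whose count is 15 + 5 = 20.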
lump_rows(
  .df,
  .level,
  .count,
  summarising_statements = quos(),
  n,
  prop,
  remaining_levels,
  other_level = "Other",
  ties.method = c("min", "average", "first", "last", "random", "max")
)
.df: A data frame.
.level: Column name (symbolic) containing a set of levels.
.count: Column name (symbolic) containing the counts of the levels.
summarising_statements: The "lumped" rows need to have all their columns summarised into one row.
This parameter is a vars() list of arguments, as if used in a call to summarise():
each name is a column name and each value is language.
If not provided for a column, a default summary is used,
which takes the sum if numeric, concatenates text, or uses any() if logical.
n: If specified, n rows shall be preserved.
prop: If specified, rows shall be preserved if their count >= prop.
remaining_levels: Levels that should explicitly not be lumped.
other_level: Name of the "other" level to be created from lumped rows.
ties.method: Method to apply in case of ties.
The lumped data frame
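The default summary described for columns without a custom statement (sum if numeric, concatenation for text, any() if logical) might look roughly like the helper below. The name `default_summary` and the ", " separator are hypothetical; the documentation does not say how text values are joined.

```r
# Hypothetical helper mirroring the documented defaults; not the package's code
default_summary <- function(x) {
  if (is.logical(x)) {
    any(x)                     # logical columns: TRUE if any lumped row is TRUE
  } else if (is.numeric(x)) {
    sum(x)                     # numeric columns: add the lumped values
  } else {
    paste(x, collapse = ", ")  # text columns: join values (separator assumed)
  }
}

default_summary(c(1, 2, 3))
default_summary(c("x", "y"))
default_summary(c(FALSE, TRUE))
```

A column that needs different treatment, such as a mean instead of a sum, would be handled through summarising_statements rather than this default.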