Lifecycle: stable
Divides data into groups by a wide range of methods. Splits data by these groups.
Wraps group() with split().
splt(
  data,
  n,
  method = "n_dist",
  starts_col = NULL,
  force_equal = FALSE,
  allow_zero = FALSE,
  descending = FALSE,
  randomize = FALSE,
  remove_missing_starts = FALSE
)
Returns a list of the split `data`.
N.B. If `data` is a grouped data.frame, there's an outer list for each group. The names are based on the group indices (see dplyr::group_indices()).
`data`: A data.frame or vector. When a grouped data.frame, the function is applied group-wise.
`n`: Depends on `method`. Number of groups (default), group size, list of group sizes, list of group starts, number of data points between group members, step size or prime number to start at. See `method`. Passed as whole number(s) and/or percentage(s) (0 < n < 1) and/or character. Method "l_starts" allows 'auto'.
`method`: One of "greedy", "n_dist", "n_fill", "n_last", "n_rand", "l_sizes", "l_starts", "every", "staircase", or "primes". The methods are described below. Note: the examples are sizes of the generated groups based on a vector with 57 elements.
"greedy": Divides up the data greedily given a specified group size (e.g. 10, 10, 10, 10, 10, 7). `n` is group size.
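The greedy sizes can be reproduced with a little base R arithmetic. This is only a sketch of the rule described above, not groupdata2's internal code; the variable names are illustrative.

```r
# Base R sketch of the "greedy" rule (not the package implementation)
n_points <- 57    # length of the example vector
group_size <- 10  # `n` for method "greedy"

# Fill each group up to `group_size` before opening the next one
group_ids <- ceiling(seq_len(n_points) / group_size)
sizes <- as.vector(table(group_ids))
print(sizes)  # sizes: 10, 10, 10, 10, 10, 7
```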
"n_dist": Divides the data into a specified number of groups and distributes excess data points across groups (e.g. 11, 11, 12, 11, 12). `n` is number of groups.
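One way to arrive at the sizes in this example is to scale each index onto the range of group numbers and round up. The formula below is an assumption that happens to reproduce the documented sizes; it is not taken from the package source.

```r
# Base R sketch: spread the excess points evenly across the groups
# (assumed formula, chosen because it reproduces the documented example)
n_points <- 57
n_groups <- 5  # `n` for method "n_dist"

group_ids <- ceiling(seq_len(n_points) * n_groups / n_points)
sizes <- as.vector(table(group_ids))
print(sizes)  # sizes: 11, 11, 12, 11, 12
```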
"n_fill": Divides the data into a specified number of groups and fills up groups with excess data points from the beginning (e.g. 12, 12, 11, 11, 11). `n` is number of groups.
"n_last": Divides the data into a specified number of groups. It finds the most equal group sizes possible, using all data points. Only the last group is able to differ in size (e.g. 11, 11, 11, 11, 13). `n` is number of groups.
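"n_fill" and "n_last" start from the same base size and differ only in where the excess points go. The base R sketch below contrasts the two; it illustrates the arithmetic and is not the package's own code.

```r
# Base R sketch contrasting "n_fill" and "n_last" (illustrative only)
n_points <- 57
n_groups <- 5

base_size <- n_points %/% n_groups  # 11 points per group
excess <- n_points %% n_groups      # 2 points left over

# "n_fill": the first `excess` groups each receive one extra point
sizes_fill <- rep(base_size, n_groups) + (seq_len(n_groups) <= excess)

# "n_last": the last group absorbs all the excess points
sizes_last <- c(rep(base_size, n_groups - 1), base_size + excess)

print(sizes_fill)  # sizes: 12, 12, 11, 11, 11
print(sizes_last)  # sizes: 11, 11, 11, 11, 13
```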
"n_rand": Divides the data into a specified number of groups. Excess data points are placed randomly in groups (max. 1 per group) (e.g. 12, 11, 11, 11, 12). `n` is number of groups.
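The random placement can be sketched by sampling which groups receive one extra point. This is a base R illustration under that assumption; the seed and variable names are arbitrary, and the groups chosen by the package will generally differ.

```r
# Base R sketch of "n_rand" (illustrative; not the package implementation)
set.seed(1)  # arbitrary seed, only for reproducibility of this sketch
n_points <- 57
n_groups <- 5

sizes <- rep(n_points %/% n_groups, n_groups)
excess <- n_points %% n_groups

# Pick `excess` distinct groups at random; each gets at most one extra point
lucky_groups <- sample(n_groups, excess)
sizes[lucky_groups] <- sizes[lucky_groups] + 1
print(sizes)
```

Whatever the seed, the sizes sum to 57 and exactly two groups hold 12 points.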
"l_sizes": Divides up the data by a list of group sizes. Excess data points are placed in an extra group at the end. (E.g. n = list(0.2, 0.3) outputs groups with sizes (11, 17, 29).) `n` is a list of group sizes.
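The sizes in this example follow if percentages are converted to whole data points by rounding down. That rounding rule is an assumption inferred from the example, not confirmed from the package source.

```r
# Base R sketch of "l_sizes" with percentages (assumed: sizes round down)
n_points <- 57
fractions <- c(0.2, 0.3)  # corresponds to n = list(0.2, 0.3)

listed_sizes <- floor(n_points * fractions)  # 11 and 17 data points

# The remaining data points form an extra group at the end
sizes <- c(listed_sizes, n_points - sum(listed_sizes))
print(sizes)  # sizes: 11, 17, 29
```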
"l_starts": Starts new groups at specified values in the `starts_col` vector. `n` is a list of starting positions. Skip values by c(value, skip_to_number) where skip_to_number is the nth appearance of the value in the vector after the previous group start. The first data point is automatically a starting position. (E.g. n = c(1, 3, 7, 25, 50) outputs groups with sizes (2, 4, 18, 25, 8).) To skip: given vector c("a", "e", "o", "a", "e", "o"), n = list("a", "e", c("o", 2)) outputs groups with sizes (1, 4, 1).
If passing n = 'auto', the starting positions are automatically found such that a group is started whenever a value differs from the previous value (see find_starts()). Note that all NAs are first replaced by a single unique value, meaning that they will also cause group starts. See differs_from_previous() to set a threshold for what is considered "different". (E.g. n = "auto" for c(10, 10, 7, 8, 8, 9) would start groups at the first 10, 7, 8 and 9, and give c(1, 1, 2, 3, 3, 4).)
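Both variants of "l_starts" can be mimicked in base R. The sketch below assumes the 57-element example vector is 1:57 (the text does not say so), and it covers only explicit starts and 'auto' detection, not the skip syntax or NA handling. It is not the package's implementation.

```r
# Base R sketch of "l_starts" (illustrative only)

# Explicit starts, assuming the example vector is 1:57
values <- 1:57
starts <- c(1, 3, 7, 25, 50)
start_positions <- match(starts, values)

# Each element belongs to the most recent start at or before its position
group_ids <- findInterval(seq_along(values), start_positions)
sizes <- as.vector(table(group_ids))
print(sizes)  # sizes: 2, 4, 18, 25, 8

# 'auto': start a new group whenever a value differs from the previous one
x <- c(10, 10, 7, 8, 8, 9)
is_start <- c(TRUE, x[-1] != x[-length(x)])
auto_ids <- cumsum(is_start)
print(auto_ids)  # group ids: 1, 1, 2, 3, 3, 4
```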
"every": Combines every `n`th data point into a group (e.g. 12, 12, 11, 11, 11 with n = 5). `n` is the number of data points between group members ("every n").
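Taking every `n`th data point amounts to cycling through the group numbers 1 to n. A base R sketch of that idea, not the package's code:

```r
# Base R sketch of "every" (illustrative only)
n_points <- 57
n <- 5  # number of data points between group members

# Elements 1, 6, 11, ... go to group 1; elements 2, 7, 12, ... to group 2; etc.
group_ids <- (seq_len(n_points) - 1) %% n + 1
sizes <- as.vector(table(group_ids))
print(sizes)  # sizes: 12, 12, 11, 11, 11
```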
"staircase": Uses step size to divide up the data. Group size increases by one step for each group, until there is no more data (e.g. 5, 10, 15, 20, 7). `n` is step size.
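The staircase sizes can be generated with a short loop in which the k-th group asks for k steps' worth of data and the final group takes whatever is left. This is a base R sketch of the rule as described, not groupdata2's code.

```r
# Base R sketch of "staircase" (illustrative only)
n_points <- 57
step <- 5  # `n` for method "staircase"

sizes <- c()
remaining <- n_points
k <- 1
while (remaining > 0) {
  # The k-th group wants k * step points but cannot exceed what is left
  size <- min(k * step, remaining)
  sizes <- c(sizes, size)
  remaining <- remaining - size
  k <- k + 1
}
print(sizes)  # sizes: 5, 10, 15, 20, 7
```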
"primes": Uses prime numbers as group sizes. Group size increases to the next prime number until there is no more data (e.g. 5, 7, 11, 13, 17, 4). `n` is the prime number to start at.
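The prime-sized groups can be sketched the same way, stepping to the next prime after each group. The helper `is_prime()` is a hypothetical function written for this illustration and is not part of groupdata2.

```r
# Base R sketch of "primes" (illustrative only)

# Hypothetical helper: trial division up to sqrt(k)
is_prime <- function(k) {
  k > 1 && all(k %% seq_len(floor(sqrt(k)))[-1] != 0)
}

n_points <- 57
candidate <- 5  # `n`: the prime number to start at

sizes <- c()
remaining <- n_points
while (remaining > 0) {
  # Use the current prime as group size, capped by the remaining data
  size <- min(candidate, remaining)
  sizes <- c(sizes, size)
  remaining <- remaining - size

  # Advance to the next prime number
  candidate <- candidate + 1
  while (!is_prime(candidate)) candidate <- candidate + 1
}
print(sizes)  # sizes: 5, 7, 11, 13, 17, 4
```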
`starts_col`: Name of column with values to match in method "l_starts" when `data` is a data.frame. Pass 'index' to use row names. (Character)
`force_equal`: Create equal groups by discarding excess data points. Implementation varies between methods. (Logical)
`allow_zero`: Whether `n` can be passed as 0. Can be useful when programmatically finding `n`. (Logical)
`descending`: Change the direction of the method. (Not fully implemented) (Logical)
`randomize`: Randomize the grouping factor. (Logical)
`remove_missing_starts`: Recursively remove elements from the list of starts that are not found. For method "l_starts" only. (Logical)
Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk
Other grouping functions: all_groups_identical(), collapse_groups(), collapse_groups_by, fold(), group(), group_factor(), partition()
# Attach packages
library(groupdata2)
library(dplyr)

# Create data frame
df <- data.frame(
  "x" = c(1:12),
  "species" = factor(rep(c("cat", "pig", "human"), 4)),
  "age" = sample(c(1:100), 12)
)

# Using splt()
df_list <- splt(df, 5, method = "n_dist")