Divides data into groups by a range of methods. Splits data by these groups.
splt(data, n, method = "n_dist", starts_col = NULL, force_equal = FALSE,
allow_zero = FALSE, descending = FALSE, randomize = FALSE,
remove_missing_starts = FALSE)
data: Dataframe or Vector.
n: Dependent on method. Number of groups (default), group size, list of group sizes, list of group starts, step size or prime number to start at. See method. Passed as whole number(s) and/or percentage(s) (0 < n < 1) and/or character. Method l_starts allows 'auto'.
method: greedy, n_dist, n_fill, n_last, n_rand, l_sizes, l_starts, staircase, or primes.
Notice: examples are sizes of the generated groups based on a vector with 57 elements.
greedy: n is group size.
n_dist: n is number of groups.
n_fill: n is number of groups.
n_last: n is number of groups.
n_rand: n is number of groups.
l_sizes: n is a list of group sizes.
l_starts: n is a list of starting positions. Groups automatically start from the first data point. E.g. n = c(1,3,7,25,50) outputs groups with sizes (2,4,18,25,8).
To skip occurrences of a value, pass c(value, skip_to_number), where skip_to_number is the nth appearance of the value in the vector. E.g. given the vector c("a", "e", "o", "a", "e", "o"), n = list("a", "e", c("o", 2)) outputs groups with sizes (1,4,1).
If passing n = 'auto', the starting positions are found automatically with find_starts().
staircase: n is step size.
primes: n is the prime number to start at.
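The group-size arithmetic behind two of these methods can be sketched in base R. This is an illustration only, not groupdata2's implementation: the helper names greedy_sizes() and start_sizes() are hypothetical, the greedy behaviour (fill groups of n elements in order, with any remainder in a final smaller group) is an assumption, and start_sizes() treats the starts as positions, which matches the l_starts example only when the vector's values equal their positions.

```r
# Illustrative base R sketch; not groupdata2's internal code.
# greedy_sizes() and start_sizes() are hypothetical helper names.

# Assumed greedy behaviour: fill groups of `size` elements in order;
# any remainder ends up in a final, smaller group.
greedy_sizes <- function(len, size) {
  as.vector(table(ceiling(seq_len(len) / size)))
}

# Group sizes implied by a vector of numeric starting positions.
# As described for l_starts, the first group always starts at position 1.
start_sizes <- function(len, starts) {
  starts <- sort(unique(c(1, starts)))
  diff(c(starts, len + 1))
}

greedy_sizes(57, 10)                 # 10 10 10 10 10 7
start_sizes(57, c(1, 3, 7, 25, 50))  # 2 4 18 25 8
```

The second call reproduces the sizes quoted for n = c(1,3,7,25,50) on a 57-element vector.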
starts_col: Name of column with values to match in method l_starts when data is a dataframe. Pass 'index' to use row names. (Character)
force_equal: Create equal groups by discarding excess data points. Implementation varies between methods. (Logical)
allow_zero: Whether n can be passed as 0. (Logical)
descending: Change direction of method. (Not fully implemented) (Logical)
randomize: Randomize the grouping factor. (Logical)
remove_missing_starts: Recursively remove elements from the list of starts that are not found. For method l_starts only. (Logical)
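As a rough intuition for force_equal and remove_missing_starts, the following base R sketch shows one way each idea could work. It is not groupdata2's implementation (which, as noted, varies between methods): trim_to_equal() and drop_missing_starts() are hypothetical helper names, and the sketch only checks whether a start value occurs at all, not whether its nth appearance exists.

```r
# Illustrative base R sketch; not groupdata2's internal code.
# trim_to_equal() and drop_missing_starts() are hypothetical helper names.

# One way to get equal groups: drop the trailing elements
# that do not fill a whole group of `size`.
trim_to_equal <- function(x, size) {
  keep <- length(x) - (length(x) %% size)
  x[seq_len(keep)]
}

# Keep only the start entries whose value occurs in the data.
# An entry may be a single value or c(value, skip_to_number).
drop_missing_starts <- function(starts, x) {
  Filter(function(s) s[1] %in% x, starts)
}

x <- c("a", "e", "o", "a", "e", "o")
drop_missing_starts(list("a", "u", c("o", 2)), x)  # the "u" entry is dropped
length(trim_to_equal(1:57, 10))                     # 50
```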
List of the split data.
Other grouping functions: group_factor, group
# Attach packages
library(groupdata2)
library(dplyr)

# Create dataframe
df <- data.frame("x" = c(1:12),
                 "species" = rep(c('cat', 'pig', 'human'), 4),
                 "age" = sample(c(1:100), 12))

# Using splt()
df_list <- splt(df, 5, method = 'n_dist')
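A possible follow-up to the example above is to inspect the returned list. This sketch assumes the groupdata2 package is installed and, as the return value description suggests, that each list element is a dataframe holding one group's rows.

```r
library(groupdata2)

# Same shape of data as in the example above, without the random column
df <- data.frame("x" = 1:12,
                 "species" = rep(c("cat", "pig", "human"), 4))

# Split into 5 groups with the default method
df_list <- splt(df, 5, method = "n_dist")

# Number of groups returned
length(df_list)

# Number of rows in each group
sapply(df_list, nrow)
```

Because n_dist treats n as the number of groups, the list should have five elements whose row counts add up to the 12 rows of df.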