Learn R Programming

⚠️There's a newer version (0.11.1) of this package.Take me there.

tidytable

tidytable is a data frame manipulation library for users who need data.table speed but prefer tidyverse-like syntax.

Installation

Install the released version from CRAN with:

install.packages("tidytable")

Or install the development version from GitHub with:

# install.packages("pak")
pak::pak("markfairbanks/tidytable")

General syntax

tidytable replicates tidyverse syntax but uses data.table in the background. In general you can simply use library(tidytable) to replace your existing dplyr and tidyr code with data.table backed equivalents.

A full list of implemented functions can be found here.

library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  select(x, y, z) %>%
  filter(x < 4, y > 1) %>%
  arrange(x, y) %>%
  mutate(double_x = x * 2,
         x_plus_y = x + y)
#> # A tidytable: 3 × 5
#>       x     y z     double_x x_plus_y
#>   <int> <int> <chr>    <dbl>    <int>
#> 1     1     4 a            2        5
#> 2     2     5 a            4        7
#> 3     3     6 b            6        9

Applying functions by group

You can use the normal tidyverse group_by()/ungroup() workflow, or you can use .by syntax to reduce typing. Using .by in a function is shorthand for df %>% group_by() %>% some_function() %>% ungroup().

  • A single column can be passed with .by = z
  • Multiple columns can be passed with .by = c(y, z)
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = c(x, y))
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

All functions that can operate by group have a .by argument built in. (mutate(), filter(), summarize(), etc.)

The above syntax is equivalent to:

df %>%
  group_by(x, y) %>%
  summarize(avg_z = mean(z)) %>%
  ungroup()
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

Both options are available for users, so you can use the syntax that you prefer.

tidyselect support

tidytable allows you to select/drop columns just like you would in the tidyverse by utilizing the tidyselect package in the background.

Normal selection can be mixed with all tidyselect helpers: everything(), starts_with(), ends_with(), any_of(), where(), etc.

df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

df %>%
  select(a, starts_with("b"))
#> # A tidytable: 3 × 3
#>       a    b1    b2
#>   <int> <int> <int>
#> 1     1     4     7
#> 2     2     5     8
#> 3     3     6     9

A full overview of selection options can be found here.

Using tidyselect in .by

tidyselect helpers also work when using .by:

df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = where(is.character))
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

Tidy evaluation compatibility

Tidy evaluation can be used to write custom functions with tidytable functions. The embracing shortcut {{ }} works, or you can use enquo() with !! if you prefer:

df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one <- function(data, add_col) {
  data %>%
    mutate(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
#> # A tidytable: 3 × 4
#>       x     y z     new_col
#>   <dbl> <int> <chr>   <dbl>
#> 1     1     4 a           2
#> 2     1     5 a           2
#> 3     1     6 b           2

The .data and .env pronouns also work within tidytable functions:

var <- 10

df %>%
  mutate(new_col = .data$x + .env$var)
#> # A tidytable: 3 × 4
#>       x     y z     new_col
#>   <dbl> <int> <chr>   <dbl>
#> 1     1     4 a          11
#> 2     1     5 a          11
#> 3     1     6 b          11

A full overview of tidy evaluation can be found here.

dt() helper

The dt() function makes regular data.table syntax pipeable, so you can easily mix tidytable syntax with data.table syntax:

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
#> # A tidytable: 2 × 2
#>   z     avg_x
#>   <chr> <dbl>
#> 1 a       1.5
#> 2 b       3

Speed Comparisons

For those interested in performance, speed comparisons can be found here.

Acknowledgements

tidytable is only possible because of the great contributions to R by the data.table and tidyverse teams. data.table is used as the main data frame engine in the background, while tidyverse packages like rlang, vctrs, and tidyselect are heavily relied upon to give users an experience similar to dplyr and tidyr.

Copy Link

Version

Install

install.packages('tidytable')

Monthly Downloads

2,522

Version

0.10.2

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Last Published

October 4th, 2023

Functions in tidytable (0.10.2)

c_across.

Combine values from multiple columns
bind_cols

Bind data.tables by row and column
case

data.table::fcase() with vectorized default
c_across

Combine values from multiple columns
between.

Do the values from x fall between the left and right bounds?
bind_rows.

Bind data.tables by row and column
bind_cols.

Bind data.tables by row and column
case.

data.table::fcase() with vectorized default
case_match.

Vectorized switch()
between

Do the values from x fall between the left and right bounds?
case_match

Vectorized switch()
case_when

Case when
consecutive_id

Generate a unique id for consecutive values
case_when.

Case when
coalesce.

Coalesce missing values
complete

Complete a data.table with missing combinations of data
consecutive_id.

Generate a unique id for consecutive values
count.

Count observations by group
context

Context functions
desc.

Descending order
cross_join

Cross join
expand

Expand a data.table to use all combinations of values
complete.

Complete a data.table with missing combinations of data
cur_group_rows.

Context functions
enframe.

Convert a vector to a data.table/tidytable
coalesce

Coalesce missing values
crossing

Create a data.table from all unique combinations of inputs
expand.

Expand a data.table to use all combinations of values
crossing.

Create a data.table from all unique combinations of inputs
desc

Descending order
cur_column.

Context functions
enframe

Convert a vector to a data.table/tidytable
distinct

Select distinct/unique rows
count

Count observations by group
extract

Extract a character column into multiple columns using regex
extract.

Extract a character column into multiple columns using regex
distinct.

Select distinct/unique rows
cur_group_id.

Context functions
first.

Extract the first, last, or nth value from a vector
drop_na.

Drop rows containing missing values
drop_na

Drop rows containing missing values
fill

Fill in missing values with previous or next value
fread.

Read/write files
fill.

Fill in missing values with previous or next value
full_join.

Join two data.tables together
cur_data.

Context functions
fread

Read/write files
group_split

Split data frame by groups
fwrite.

Read/write files
group_vars.

Get the grouping variables
dt

Pipeable data.table call
first

Extract the first, last, or nth value from a vector
group_by.

Grouping
if_else.

Fast if_else
if_all.

Create conditions on a selection of columns
filter

Filter rows on one or more conditions
group_vars

Get the grouping variables
group_cols

Selection helper for grouping columns
left_join

Join two data.tables together
group_split.

Split data frame by groups
lags.

Get lagging or leading values
map2_df.

Apply a function to each element of a vector or list
filter.

Filter rows on one or more conditions
last.

Extract the first, last, or nth value from a vector
if_else

Fast if_else
map2_dfc.

Apply a function to each element of a vector or list
map.

Apply a function to each element of a vector or list
leads.

Get lagging or leading values
group_by

Grouping
map

Apply a function to each element of a vector or list
map_dfc.

Apply a function to each element of a vector or list
left_join.

Join two data.tables together
map2.

Apply a function to each element of a vector or list
map_dfr.

Apply a function to each element of a vector or list
if_all

Create conditions on a selection of columns
map_dbl.

Apply a function to each element of a vector or list
if_any.

Create conditions on a selection of columns
is_grouped_df.

Check if the tidytable is grouped
is_grouped_df

Check if the tidytable is grouped
map2_dfr.

Apply a function to each element of a vector or list
mutate.

Add/modify/delete columns
map2_int.

Apply a function to each element of a vector or list
lag

Get lagging or leading values
%in%

Fast %in% and %notin% operators
inv_gc

Run invisible garbage collection
is_tidytable

Test if the object is a tidytable
expand_grid

Create a data.table from all combinations of inputs
n_distinct

Count the number of unique values in a vector
get_dummies.

Convert character and factor columns to dummy variables
expand_grid.

Create a data.table from all combinations of inputs
inner_join.

Join two data.tables together
map2_chr.

Apply a function to each element of a vector or list
map_lgl.

Apply a function to each element of a vector or list
ifelse.

Fast ifelse
map_int.

Apply a function to each element of a vector or list
get_dummies

Convert character and factor columns to dummy variables
new_tidytable

Create a tidytable from a list
pivot_wider

Pivot data from long to wide
nesting.

Expand a data.table to use all combinations of values
na_if.

Convert values to NA
mutate

Add/modify/delete columns
pmap.

Apply a function to each element of a vector or list
map_df.

Apply a function to each element of a vector or list
n

Number of observations in each group
map2_dbl.

Apply a function to each element of a vector or list
mutate_rowwise

Add/modify columns by row
pivot_longer

Pivot data from wide to long
pivot_longer.

Pivot data from wide to long
%>%

Pipe operator
map_chr.

Apply a function to each element of a vector or list
map2_lgl.

Apply a function to each element of a vector or list
n.

Number of observations in each group
nest_join

Nest join
nest_by

Nest data.tables
na_if

Convert values to NA
mutate_across.

Mutate multiple columns simultaneously
pull.

Pull out a single variable
pmap_lgl.

Apply a function to each element of a vector or list
nth.

Extract the first, last, or nth value from a vector
pmap_int.

Apply a function to each element of a vector or list
mutate_rowwise.

Add/modify columns by row
nest.

Nest columns into a list-column
pmap_dfr.

Apply a function to each element of a vector or list
n_distinct.

Count the number of unique values in a vector
nest

Nest columns into a list-column
pivot_wider.

Pivot data from long to wide
replace_na

Replace missing values
separate.

Separate a character column into multiple columns
reframe

Reframe a data frame
separate

Separate a character column into multiple columns
pick

Selection version of across()
right_join.

Join two data.tables together
slice_max.

Choose rows in a data.table
tally.

Count observations by group
slice_min.

Choose rows in a data.table
pmap_chr.

Apply a function to each element of a vector or list
tidytable-vctrs

Internal vctrs methods
unnest_longer

Unnest a list-column of vectors into regular columns
unnest_wider.

Unnest a list-column of vectors into a wide data frame
relocate

Relocate a column to a new position
separate_wider_delim

Separate a character column into multiple columns
rowwise

Convert to a rowwise tidytable
pull

Pull out a single variable
tribble

Rowwise tidytable creation
separate_rows

Separate a collapsed column into multiple rows
transmute

Add new variables and drop all others
summarise.

Aggregate data using summary statistics
rowwise.

Convert to a rowwise tidytable
slice_head.

Choose rows in a data.table
pmap_dbl.

Apply a function to each element of a vector or list
summarize.

Aggregate data using summary statistics
select.

Select or drop columns
row_number

Ranking functions
slice_head

Choose rows in a data.table
select

Select or drop columns
rename_with

Rename multiple columns
relocate.

Relocate a column to a new position
row_number.

Ranking functions
semi_join.

Join two data.tables together
nest_by.

Nest data.tables
rename.

Rename variables by name
uncount.

Uncount a data.table
replace_na.

Replace missing values
ungroup.

Grouping
uncount

Uncount a data.table
unite.

Unite multiple columns by pasting strings together
unite

Unite multiple columns by pasting strings together
unnest.

Unnest list-columns
separate_rows.

Separate a collapsed column into multiple rows
recode

Recode values
pmap_df.

Apply a function to each element of a vector or list
separate_longer_delim

Split a string into rows
reexports

Objects exported from other packages
pmap_dfc.

Apply a function to each element of a vector or list
unnest_wider

Unnest a list-column of vectors into a wide data frame
top_n

Select top (or bottom) n rows (by value)
slice_tail.

Choose rows in a data.table
walk.

Apply a function to each element of a vector or list
transmute.

Add new variables and drop all others
slice_sample.

Choose rows in a data.table
rename

Rename variables by name
summarize

Aggregate data using summary statistics
separate_wider_regex

Separate a character column into multiple columns using regex patterns
slice.

Choose rows in a data.table
rename_with.

Rename multiple columns
summarize_across.

Summarize multiple columns
tidytable

Build a data.table/tidytable
top_n.

Select top (or bottom) n rows (by value)
unnest

Unnest list-columns
unnest_longer.

Unnest a list-column of vectors into regular columns
anti_join.

Join two data.tables together
arrange

Arrange/reorder rows
add_count.

Add a count column to the data frame
arrange.

Arrange/reorder rows
add_tally.

Add a count column to the data frame
add_count

Add a count column to the data frame
across

Apply a function across a selection of columns
across.

Apply a function across a selection of columns
arrange_across.

Arrange by a selection of variables
as_tidytable

Coerce an object to a data.table/tidytable