Learn R Programming

tidytable

tidytable is a data frame manipulation library for users who need data.table speed but prefer tidyverse-like syntax.

Installation

Install the released version from CRAN with:

install.packages("tidytable")

Or install the development version from GitHub with:

# install.packages("pak")
pak::pak("markfairbanks/tidytable")

General syntax

tidytable replicates tidyverse syntax but uses data.table in the background. In general you can simply use library(tidytable) to replace your existing dplyr and tidyr code with data.table backed equivalents.

A full list of implemented functions can be found here.

library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  select(x, y, z) %>%
  filter(x < 4, y > 1) %>%
  arrange(x, y) %>%
  mutate(double_x = x * 2,
         x_plus_y = x + y)
#> # A tidytable: 3 × 5
#>       x     y z     double_x x_plus_y
#>   <int> <int> <chr>    <dbl>    <int>
#> 1     1     4 a            2        5
#> 2     2     5 a            4        7
#> 3     3     6 b            6        9

Applying functions by group

You can use the normal tidyverse group_by()/ungroup() workflow, or you can use .by syntax to reduce typing. Using .by in a function is shorthand for df %>% group_by() %>% some_function() %>% ungroup().

  • A single column can be passed with .by = z
  • Multiple columns can be passed with .by = c(y, z)
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = c(x, y))
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

All functions that can operate by group have a .by argument built in. (mutate(), filter(), summarize(), etc.)

The above syntax is equivalent to:

df %>%
  group_by(x, y) %>%
  summarize(avg_z = mean(z)) %>%
  ungroup()
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

Both options are available for users, so you can use the syntax that you prefer.

tidyselect support

tidytable allows you to select/drop columns just like you would in the tidyverse by utilizing the tidyselect package in the background.

Normal selection can be mixed with all tidyselect helpers: everything(), starts_with(), ends_with(), any_of(), where(), etc.

df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

df %>%
  select(a, starts_with("b"))
#> # A tidytable: 3 × 3
#>       a    b1    b2
#>   <int> <int> <int>
#> 1     1     4     7
#> 2     2     5     8
#> 3     3     6     9

A full overview of selection options can be found here.

Using tidyselect in .by

tidyselect helpers also work when using .by:

df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = where(is.character))
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

Tidy evaluation compatibility

Tidy evaluation can be used to write custom functions with tidytable functions. The embracing shortcut {{ }} works, or you can use enquo() with !! if you prefer:

df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one <- function(data, add_col) {
  data %>%
    mutate(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
#> # A tidytable: 3 × 4
#>       x     y z     new_col
#>   <dbl> <int> <chr>   <dbl>
#> 1     1     4 a           2
#> 2     1     5 a           2
#> 3     1     6 b           2

The .data and .env pronouns also work within tidytable functions:

var <- 10

df %>%
  mutate(new_col = .data$x + .env$var)
#> # A tidytable: 3 × 4
#>       x     y z     new_col
#>   <dbl> <int> <chr>   <dbl>
#> 1     1     4 a          11
#> 2     1     5 a          11
#> 3     1     6 b          11

A full overview of tidy evaluation can be found here.

dt() helper

The dt() function makes regular data.table syntax pipeable, so you can easily mix tidytable syntax with data.table syntax:

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
#> # A tidytable: 2 × 2
#>   z     avg_x
#>   <chr> <dbl>
#> 1 a       1.5
#> 2 b       3

Speed Comparisons

For those interested in performance, speed comparisons can be found here.

Acknowledgements

tidytable is only possible because of the great contributions to R by the data.table and tidyverse teams. data.table is used as the main data frame engine in the background, while tidyverse packages like rlang, vctrs, and tidyselect are heavily relied upon to give users an experience similar to dplyr and tidyr.

Copy Link

Version

Install

install.packages('tidytable')

Monthly Downloads

2,522

Version

0.11.1

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Last Published

July 18th, 2024

Functions in tidytable (0.11.1)

between

Do the values from x fall between the left and right bounds?
extract

Extract a character column into multiple columns using regex
fill

Fill in missing values with previous or next value
c_across

Combine values from multiple columns
crossing

Create a data.table from all unique combinations of inputs
bind_cols

Bind data.tables by row and column
desc

Descending order
dt

Pipeable data.table call
enframe

Convert a vector to a data.table/tidytable
%in%

Fast %in% and %notin% operators
case

data.table::fcase() with vectorized default
consecutive_id

Generate a unique id for consecutive values
context

Context functions
is_grouped_df

Check if the tidytable is grouped
coalesce

Coalesce missing values
distinct

Select distinct/unique rows
add_count

Add a count column to the data frame
inv_gc

Run invisible garbage collection
drop_na

Drop rows containing missing values
filter

Filter rows on one or more conditions
case_match

Vectorized switch()
case_when

Case when
complete

Complete a data.table with missing combinations of data
is_tidytable

Test if the object is a tidytable
cross_join

Cross join
count

Count observations by group
first

Extract the first, last, or nth value from a vector
if_all

Create conditions on a selection of columns
n_distinct

Count the number of unique values in a vector
nest

Nest columns into a list-column
nest_by

Nest data.tables
na_if

Convert values to NA
select

Select or drop columns
if_else

Fast if_else
map

Apply a function to each element of a vector or list
pull

Pull out a single variable
expand

Expand a data.table to use all combinations of values
row_number

Ranking functions
reframe

Reframe a data frame
expand_grid

Create a data.table from all combinations of inputs
separate

Separate a character column into multiple columns
tribble

Rowwise tidytable creation
mutate

Add/modify/delete columns
recode

Recode values
uncount

Uncount a data.table
reexports

Objects exported from other packages
new_tidytable

Create a tidytable from a list
nest_join

Nest join
separate_longer_delim

Split a string into rows
fread

Read/write files
relocate

Relocate a column to a new position
top_n

Select top (or bottom) n rows (by value)
group_split

Split data frame by groups
group_vars

Get the grouping variables
get_dummies

Convert character and factor columns to dummy variables
group_by

Grouping
lag

Get lagging or leading values
separate_rows

Separate a collapsed column into multiple rows
separate_wider_delim

Separate a character column into multiple columns
transmute

Add new variables and drop all others
left_join

Join two data.tables together
unite

Unite multiple columns by pasting strings together
unnest

Unnest list-columns
group_cols

Selection helper for grouping columns
pivot_longer

Pivot data from wide to long
pivot_wider

Pivot data from long to wide
replace_na

Replace missing values
rowwise

Convert to a rowwise tidytable
unnest_longer

Unnest a list-column of vectors into regular columns
unnest_wider

Unnest a list-column of vectors into a wide data frame
slice_head

Choose rows in a data.table
separate_wider_regex

Separate a character column into multiple columns using regex patterns
summarize

Aggregate data using summary statistics
mutate_rowwise

Add/modify columns by row
n

Number of observations in each group
pick

Selection version of across()
%>%

Pipe operator
rename

Rename variables by name
rename_with

Rename multiple columns
tidytable-vctrs

Internal vctrs methods
tidytable

Build a data.table/tidytable
arrange

Arrange/reorder rows
across

Apply a function across a selection of columns
as_tidytable

Coerce an object to a data.table/tidytable