tidyr

Overview

The goal of tidyr is to help you create tidy data. Tidy data is data where:

  1. Each variable is in a column.
  2. Each observation is a row.
  3. Each value is a cell.

Tidy data describes a standard way of storing data that is used wherever possible throughout the tidyverse. If you ensure that your data is tidy, you'll spend less time fighting with the tools and more time working on your analysis.
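
As a small illustration of the three rules above, here is a sketch using a made-up data frame (the countries, years, and counts are hypothetical, not taken from any tidyr dataset). It uses base R only, to show that tidy data can be summarised without reshaping first.

```r
# Hypothetical tidy data: each variable (country, year, cases) is a column,
# each observation (one country in one year) is a row, each value is a cell.
tidy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(1999, 2000, 1999, 2000),
  cases   = c(5, 7, 2, 3),
  stringsAsFactors = FALSE
)

# Because every variable is already a column, a per-country summary
# can be computed directly with a formula.
total_by_country <- aggregate(cases ~ country, data = tidy, FUN = sum)
print(total_by_country)
```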

Installation

# The easiest way to get tidyr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just tidyr:
install.packages("tidyr")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/tidyr")

Getting started

library(tidyr)

There are two fundamental verbs of data tidying:

  • gather() takes multiple columns and gathers them into key-value pairs: it makes "wide" data longer.

  • spread() takes two columns (key and value) and spreads them into multiple columns: it makes "long" data wider.
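
The two verbs can be sketched on a small example. The data frame below is hypothetical (names, years, and scores are invented for illustration); the calls use the gather(data, key, value, ...) and spread(data, key, value) argument order.

```r
library(tidyr)

# Hypothetical "wide" data: one row per person, one column per year.
wide <- data.frame(
  name   = c("Ann", "Bob"),
  `2015` = c(10, 20),
  `2016` = c(11, 21),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# gather(): collapse every column except `name` into key-value pairs.
# The old column names go into `year`, the cell values into `score`.
long <- gather(wide, year, score, -name)
print(long)

# spread(): the inverse step. Each distinct value of `year` becomes a column
# again, filled with the matching `score`.
wide_again <- spread(long, year, score)
print(wide_again)
```

Gathering two year columns for two people yields four rows in the long form; spreading brings back one row per person.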

tidyr also provides the separate() and extract() functions, which make it easier to pull apart a column that represents multiple variables. The complement to separate() is unite().
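
A minimal sketch of these three functions, assuming a hypothetical `rate` column that packs two variables into one string (the column names `cases` and `population` are chosen here for illustration):

```r
library(tidyr)

# Hypothetical data: `rate` holds "cases/population" in a single string.
df <- data.frame(
  id   = 1:2,
  rate = c("745/19987071", "2666/20595360"),
  stringsAsFactors = FALSE
)

# separate(): split one column into several at a separator.
parts <- separate(df, rate, into = c("cases", "population"), sep = "/")
print(parts)

# unite(): the complement, pasting several columns back into one.
united <- unite(parts, rate, cases, population, sep = "/")
print(united)

# extract(): like separate(), but each new column comes from a regular
# expression capture group rather than a separator.
ext <- extract(df, rate, into = c("cases", "population"),
               regex = "([0-9]+)/([0-9]+)")
print(ext)
```

By default the new columns are character vectors; both separate() and extract() accept convert = TRUE to attempt type conversion.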

To get started, read the tidy data vignette (vignette("tidy-data")) and check out the demos (demo(package = "tidyr")).

Related work

tidyr replaces reshape2 (2010-2014) and reshape (2005-2010). Somewhat counterintuitively, each iteration of the package has done less. tidyr is designed specifically for tidying data, not general reshaping (reshape2) or general aggregation (reshape).

If you'd like to read more about data reshaping from a CS perspective, I'd recommend the following three papers:

To guide your reading, here's a translation between the terminology used in different places:

tidyr          gather     spread
reshape(2)     melt       cast
spreadsheets   unpivot    pivot
databases      fold       unfold

Monthly Downloads: 1,202,383
Version: 0.6.1
License: MIT + file LICENSE

Last Published: January 10th, 2017

Functions in tidyr (0.6.1)

expand            Expand data frame to include all combinations of values.
drop_na           Drop rows containing missing values.
extract_          Standard-evaluation version of extract.
fill_             Standard-evaluation version of fill.
complete_         Standard-evaluation version of complete.
complete          Complete a data frame with missing combinations of data.
extract_numeric   Extract numeric component of variable.
expand_           Expand (standard evaluation).
extract           Extract one column into multiple columns.
drop_na_          Standard-evaluation version of drop_na.
gather_           Gather (standard-evaluation).
nest_             Standard-evaluation version of nest.
nest              Nest repeated values in a list-variable.
gather            Gather columns into key-value pairs.
%>%               Pipe operator.
fill              Fill in missing values.
full_seq          Create the full sequence of values in a vector.
replace_na        Replace missing values.
separate_         Standard-evaluation version of separate.
separate_rows_    Standard-evaluation version of separate_rows.
table1            Example tabular representations.
spread            Spread a key-value pair across multiple columns.
unnest            Unnest a list column.
unnest_           Standard-evaluation version of unnest.
unite_            Standard-evaluation version of unite.
spread_           Standard-evaluation version of spread.
smiths            Some data about the Smith family.
unite             Unite multiple columns into one.
separate_rows     Separate a collapsed column into multiple rows.
separate          Separate one column into multiple columns.
who               World Health Organization TB data.
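
Several of the helpers listed above deal with missing values. The following is a rough sketch on a hypothetical data frame (the column names `group` and `x` are invented here), showing fill, replace_na, drop_na, and full_seq:

```r
library(tidyr)

# Hypothetical data with gaps in both columns.
df <- data.frame(
  group = c("a", NA, "b", NA),
  x     = c(1, 2, NA, 4),
  stringsAsFactors = FALSE
)

# fill(): carry the last non-missing value of `group` downward.
filled <- fill(df, group)
print(filled)

# replace_na(): substitute a chosen value for NA, column by column.
replaced <- replace_na(df, list(x = 0))
print(replaced)

# drop_na(): drop the rows where `x` is missing.
dropped <- drop_na(df, x)
print(dropped)

# full_seq(): the complete regular sequence spanning a vector,
# here with a step (period) of 1.
full <- full_seq(c(1, 2, 4), 1)
print(full)
```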