tidyr (version 0.5.1)

expand: Expand data frame to include all combinations of values

Description

expand() is often useful in conjunction with left_join() when you want to convert implicit missing values to explicit missing values. You can also use it with anti_join() to find out which combinations are missing.
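As a rough illustration of both uses, the sketch below builds a small made-up table (the `sales` data frame and its columns are hypothetical, not part of tidyr) in which one combination of `year` and `item` never occurs. It assumes dplyr is installed for the join functions.

```r
library(tidyr)
library(dplyr)

# Hypothetical data: the combination year 2011 / item "b" is implicitly missing
sales <- data.frame(
  year  = c(2010, 2010, 2011),
  item  = c("a", "b", "a"),
  units = c(5, 3, 7),
  stringsAsFactors = FALSE
)

# Every combination of year and item, whether or not it was observed
all_combos <- expand(sales, year, item)

# Implicit -> explicit: each combination gets a row, with NA where no data exists
explicit <- left_join(all_combos, sales, by = c("year", "item"))

# Only the combinations that are absent from the data
missing_combos <- anti_join(all_combos, sales, by = c("year", "item"))
```

Here `explicit` should contain four rows, one of them with `units` set to NA, and `missing_combos` should contain the single missing combination.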

Usage

expand(data, ...)
crossing(...)
nesting(...)

Arguments

data
A data frame
...
Specification of columns to expand.

To find all unique combinations of x, y and z, including those not found in the data, supply each variable as a separate argument. To find only the combinations that occur in the data, use nesting(): expand(df, nesting(x, y, z)).

You can combine the two forms. For example, expand(df, nesting(school_id, student_id), date) would produce a row for every student for each date.
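The difference between the forms is easiest to see by counting rows. The following is a minimal sketch on a made-up data frame (`df` and its columns x, y and z are hypothetical):

```r
library(tidyr)

# Hypothetical data: x and y occur in only two of their four possible pairings
df <- data.frame(
  x = c(1, 1, 2),
  y = c("a", "a", "b"),
  z = c(10, 20, 10),
  stringsAsFactors = FALSE
)

# Separate arguments: all 2 x 2 pairings of x and y, observed or not
every_pair <- expand(df, x, y)

# nesting(): only the pairings that appear in the data, (1, "a") and (2, "b")
observed_pair <- expand(df, nesting(x, y))

# Combined: each observed (x, y) pairing crossed with every value of z
mixed <- expand(df, nesting(x, y), z)
```

With this data, `every_pair` should have four rows, `observed_pair` two, and `mixed` four (two observed pairings times two values of z).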

To fill in values that are missing altogether, use expressions like year = 2010:2020 or year = full_seq(year, 1). Note that full_seq() requires its second argument, the period between successive values.
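A short sketch of the effect, using a made-up data frame in which the year 2011 never appears at all:

```r
library(tidyr)

# Hypothetical data: 2011 is absent from the year column entirely
df <- data.frame(year = c(2010, 2012), qtr = c(1, 2))

# Using the column as-is: only the observed years are crossed with qtr
observed_years <- expand(df, year, qtr)

# full_seq(year, 1) fills the gap, so 2011 is included as well
filled_years <- expand(df, year = full_seq(year, 1), qtr)
```

`observed_years` should have four rows (two years by two quarters), while `filled_years` should have six, including rows for 2011.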

Details

crossing() is similar to expand.grid(), but it never converts strings to factors, returns a tbl_df without additional attributes, and varies the first factors slowest. nesting() is the complement to crossing(): it only keeps combinations of all variables that appear in the data.
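To make the comparison concrete, the sketch below runs both functions on the same two small vectors (the names `x` and `y` are arbitrary). It relies on the default behaviour of base R's expand.grid(), which converts strings to factors unless stringsAsFactors = FALSE is given.

```r
library(tidyr)

x <- c("a", "b")
y <- 1:2

cr <- crossing(x = x, y = y)     # tidyr
eg <- expand.grid(x = x, y = y)  # base R, default settings

# crossing() keeps x as character; expand.grid() turns it into a factor
class(cr$x)
class(eg$x)

# Row order differs: crossing() varies the first variable slowest,
# expand.grid() varies it fastest
cr$x
as.character(eg$x)
```

In `cr` the x column should read "a", "a", "b", "b", whereas in `eg` it should read "a", "b", "a", "b".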

See Also

complete for a common application of expand: completing a data frame with missing combinations.

expand_ for a version that uses regular evaluation and is suitable for programming with.

Examples

library(tidyr)
library(dplyr)
# All possible combinations of vs & cyl, even those that aren't
# present in the data
expand(mtcars, vs, cyl)

# Only combinations of vs and cyl that appear in the data
expand(mtcars, nesting(vs, cyl))

# Implicit missings ---------------------------------------------------------
df <- data_frame(
  year   = c(2010, 2010, 2010, 2010, 2012, 2012, 2012),
  qtr    = c(   1,    2,    3,    4,    1,    2,    3),
  return = rnorm(7)
)
df %>% expand(year, qtr)
df %>% expand(year = 2010:2012, qtr)
df %>% expand(year = full_seq(year, 1), qtr)
df %>% complete(year = full_seq(year, 1), qtr)

# Nesting -------------------------------------------------------------------
# Each person was given one of two treatments, repeated three times
# But some of the replications haven't happened yet, so we have
# incomplete data:
experiment <- data_frame(
  name = rep(c("Alex", "Robert", "Sam"), c(3, 2, 1)),
  trt  = rep(c("a", "b", "a"), c(3, 2, 1)),
  rep = c(1, 2, 3, 1, 2, 1),
  measurement_1 = runif(6),
  measurement_2 = runif(6)
)

# We can figure out the complete set of data with expand()
# Each person only gets one treatment, so we nest name and trt together:
all <- experiment %>% expand(nesting(name, trt), rep)
all

# We can use anti_join to figure out which observations are missing
all %>% anti_join(experiment)

# And use right_join to add in the appropriate missing values to the
# original data (every row of `all` is kept; unmatched rows get NA)
experiment %>% right_join(all)
# Or use the complete() short-hand
experiment %>% complete(nesting(name, trt), rep)
