tidyr (version 0.8.0)

deprecated-se: Deprecated SE versions of main verbs

Description

tidyr used to offer twin versions of each verb suffixed with an underscore. These versions had standard evaluation (SE) semantics: rather than taking arguments by code, like NSE verbs, they took arguments by value. Their purpose was to make it possible to program with tidyr. However, tidyr now uses tidy evaluation semantics. NSE verbs still capture their arguments, but you can now unquote parts of these arguments. This offers full programmability with NSE verbs. Thus, the underscored versions are now superfluous.
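As a rough illustration of the migration described above, the sketch below replaces a call to drop_na_() with the NSE verb plus unquoting. The data frame and the variable names (df, vars, col) are made up for the example; sym() and syms() come from the rlang package.

```r
library(tidyr)
library(rlang)  # sym() and syms() turn strings into symbols

# Hypothetical example data
df <- data.frame(
  x = c(1, NA, 3),
  y = c("a", "b", NA),
  stringsAsFactors = FALSE
)

# Column names held as strings, as they would be inside a function
vars <- c("x", "y")

# Deprecated SE style: drop_na_(df, vars)
# Tidy evaluation style: convert the strings to symbols and splice them in
complete_rows <- drop_na(df, !!!syms(vars))

# A single column name can be unquoted with !! in the same way
col <- "x"
filled <- fill(df, !!sym(col))
```

Here `!!!` splices a list of symbols into the dots, while `!!` unquotes a single symbol.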

Usage

complete_(data, cols, fill = list(), ...)

drop_na_(data, vars)

expand_(data, dots, ...)

crossing_(x)

nesting_(x)

extract_(data, col, into, regex = "([[:alnum:]]+)", remove = TRUE, convert = FALSE, ...)

fill_(data, fill_cols, .direction = c("down", "up"))

gather_(data, key_col, value_col, gather_cols, na.rm = FALSE, convert = FALSE, factor_key = FALSE)

nest_(data, key_col, nest_cols = character())

separate_rows_(data, cols, sep = "[^[:alnum:].]+", convert = FALSE)

separate_(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", ...)

spread_(data, key_col, value_col, fill = NA, convert = FALSE, drop = TRUE, sep = NULL)

unite_(data, col, from, sep = "_", remove = TRUE)

unnest_(data, unnest_cols, .drop = NA, .id = NULL, .sep = NULL, .preserve = NULL)

Arguments

data

A data frame.

fill

A named list that for each variable supplies a single value to use instead of NA for missing combinations.
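A minimal sketch of how the fill list behaves, shown with the NSE verb complete() on a made-up data frame (the names df, x, y and v are assumptions for the example):

```r
library(tidyr)

# Hypothetical data: only two of the four x/y combinations are present
df <- data.frame(
  x = c(1, 2),
  y = c("a", "b"),
  v = c(10, 20),
  stringsAsFactors = FALSE
)

# Missing combinations are added; v is set to 0 for them instead of NA
full <- complete(df, x, y, fill = list(v = 0))
```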

...

Specification of columns to expand.

To find all unique combinations of x, y and z, including those not found in the data, supply each variable as a separate argument. To find only the combinations that occur in the data, use nesting(): expand(df, nesting(x, y, z)).

You can combine the two forms. For example, expand(df, nesting(school_id, student_id), date) would produce a row for every student for each date.

For factors, the full set of levels (not just those that appear in the data) is used. For continuous variables, you may need to fill in values that don't appear in the data: to do so, use expressions like year = 2010:2020 or year = full_seq(year, 1).

Length-zero (empty) elements are automatically dropped.
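The difference between the forms above can be sketched as follows, using the NSE verb expand() on a small invented data frame (df and its columns are assumptions for the example):

```r
library(tidyr)

# Hypothetical data: the pair (2, "b") never occurs
df <- data.frame(
  x = c(1, 1, 2),
  y = c("a", "b", "a"),
  year = c(2010, 2010, 2013),
  stringsAsFactors = FALSE
)

# Every x paired with every y, including pairs not in the data
all_combos <- expand(df, x, y)

# Only the pairs that actually occur in the data
seen_combos <- expand(df, nesting(x, y))

# Fill in years that are absent from the data
all_years <- expand(df, year = full_seq(year, 1))
```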

vars, cols, col

Names of columns.

x

For nesting_ and crossing_, a list of variables.

into

Names of new variables to create, as a character vector.

regex

A regular expression used to extract the desired values. There should be one group (defined by ()) for each element of into.

remove

If TRUE, remove input column from output data frame.

convert

If TRUE, will run type.convert() with as.is = TRUE on new columns. This is useful if the component columns are integer, numeric or logical.
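To show how into, regex and convert fit together, here is a small sketch using the NSE verb extract(). The data frame and the two-group pattern are invented for the example.

```r
library(tidyr)

# Hypothetical data: the last value does not match the pattern
df <- data.frame(id = c("a-1", "b-2", "c"), stringsAsFactors = FALSE)

# Two groups in the regex, so two names in `into`
parts <- extract(
  df, id,
  into = c("letter", "number"),
  regex = "([[:alnum:]]+)-([[:alnum:]]+)"
)

# Same call, but let type.convert() turn "1" and "2" into integers
converted <- extract(
  df, id,
  into = c("letter", "number"),
  regex = "([[:alnum:]]+)-([[:alnum:]]+)",
  convert = TRUE
)
```

Rows that do not match the pattern get NA in every new column.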

fill_cols

Character vector of column names.

.direction

Direction in which to fill missing values. Currently either "down" (the default) or "up".
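The two directions can be compared with a short sketch, using the NSE verb fill() on an invented column g:

```r
library(tidyr)

# Hypothetical data with gaps
df <- data.frame(g = c("a", NA, NA, "b", NA), stringsAsFactors = FALSE)

# Carry the last observed value forward
down <- fill(df, g)

# Carry the next observed value backward
up <- fill(df, g, .direction = "up")
```

With "up", the trailing NA stays missing because no later value exists to fill it.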

key_col, value_col

Strings giving names of key and value columns to create.

gather_cols

Character vector giving column names to be gathered into a pair of key-value columns.

na.rm

If TRUE, will remove rows from output where the value column is NA.

factor_key

If FALSE, the default, the key values will be stored as a character vector. If TRUE, will be stored as a factor, which preserves the original ordering of the columns.
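A brief sketch of na.rm and factor_key, using the NSE verb gather() on a made-up data frame. The column names z and a are chosen so that the original column order differs from alphabetical order.

```r
library(tidyr)

# Hypothetical data: column z comes before column a
df <- data.frame(id = 1:2, z = c(1, NA), a = c(3, 4))

# Gather z and a into key/value columns, dropping the NA value
long <- gather(
  df, key = "name", value = "val", z, a,
  na.rm = TRUE, factor_key = TRUE
)

# The key is a factor whose levels follow the original column order
key_levels <- levels(long$name)
```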

nest_cols

Character vector of columns to nest.

sep

Separator delimiting collapsed values.

extra

If sep is a character vector, this controls what happens when there are too many pieces. There are three valid options:

  • "warn" (the default): emit a warning and drop extra values.

  • "drop": drop any extra values without a warning.

  • "merge": split at most length(into) times.
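The options above can be compared with a minimal sketch, using the NSE verb separate() on an invented column x whose second value has one piece too many:

```r
library(tidyr)

# Hypothetical data: the second value splits into three pieces
df <- data.frame(x = c("a b", "a b c"), stringsAsFactors = FALSE)

# Silently discard the extra piece
dropped <- separate(df, x, into = c("p", "q"), sep = " ", extra = "drop")

# Keep the extra piece attached to the last column
merged <- separate(df, x, into = c("p", "q"), sep = " ", extra = "merge")
```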

drop

If FALSE, will keep factor levels that don't appear in the data, filling in missing combinations with fill.
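As a sketch of drop = FALSE, the example below uses the NSE verb spread() on a made-up data frame whose key is a factor with an unused level "c":

```r
library(tidyr)

# Hypothetical data: level "c" never appears in k
df <- data.frame(
  k = factor(c("a", "b"), levels = c("a", "b", "c")),
  id = c(1, 2),
  v = c(10, 20)
)

# Keep a column for the unused level and fill the gaps with 0
wide <- spread(df, k, v, fill = 0, drop = FALSE)
```

With the default drop = TRUE, no column would be created for the unused level.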

from

Names of existing columns, as a character vector.

unnest_cols

Names of columns that need to be unnested.

.drop

Should additional list columns be dropped? By default, unnest will drop them if unnesting the specified columns requires the rows to be duplicated.

.id

Data frame identifier - if supplied, will create a new column with name .id, giving a unique identifier. This is most useful if the list column is named.

.sep

If non-NULL, the names of unnested data frame columns will combine the name of the original list-col with the names from nested data frame, separated by .sep.

.preserve

Optionally, list-columns to preserve in the output. These will be duplicated in the same way as atomic vectors. This has dplyr::select semantics so you can preserve multiple variables with .preserve = c(x, y) or .preserve = starts_with("list").

expand_cols

Character vector of column names to be expanded.

key_col

For nest_, the name of the column that will contain the nested data frames.

key_col, value_col

For spread_, strings giving the names of the key and value columns.

Details

Unquoting triggers immediate evaluation of its operand and inlines the result within the captured expression. This result can be a value or an expression to be evaluated later with the rest of the argument. See vignette("programming", "dplyr") for more information.
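The two cases described above (inlining a value versus inlining an expression) can be sketched with rlang directly. The variable names are invented for the example; expr(), sym() and eval_tidy() are rlang functions.

```r
library(rlang)

# Case 1: the operand evaluates to a value, which is inlined as a constant
x <- 10
with_value <- expr(sum(!!x, y))    # captured as sum(10, y)

# Case 2: the operand evaluates to a symbol, which is inlined as code
col <- sym("y")
with_symbol <- expr(sum(!!col, 1)) # captured as sum(y, 1)

# The inlined symbol is only looked up when the expression is evaluated
result <- eval_tidy(with_symbol, data = list(y = 3))
```

In both cases the operand of `!!` is evaluated immediately, while the surrounding call stays unevaluated until eval_tidy() runs it.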