This is a method for the tidyr pivot_longer() generic. It is translated to data.table::melt().
# S3 method for dtplyr_step
pivot_longer(
  data,
  cols,
  names_to = "name",
  names_prefix = NULL,
  names_sep = NULL,
  names_pattern = NULL,
  names_ptypes = NULL,
  names_transform = NULL,
  names_repair = "check_unique",
  values_to = "value",
  values_drop_na = FALSE,
  values_ptypes = NULL,
  values_transform = NULL,
  ...
)
data
A lazy_dt().
cols
<tidy-select> Columns to pivot into longer format.
names_to
A character vector specifying the new column or columns to create from the information stored in the column names of data specified by cols.
If length 0, or if NULL is supplied, no columns will be created.
If length 1, a single column will be created which will contain the column names specified by cols.
If length >1, multiple columns will be created. In this case, one of names_sep or names_pattern must be supplied to specify how the column names should be split. There are also two additional character values you can take advantage of:
NA will discard the corresponding component of the column name.
".value" indicates that the corresponding component of the column name defines the name of the output column containing the cell values, overriding values_to entirely.
names_prefix
A regular expression used to remove matching text from the start of each variable name.
names_sep, names_pattern
If names_to contains multiple values, these arguments control how the column name is broken up.
names_sep takes the same specification as separate(), and can either be a numeric vector (specifying positions to break on), or a single string (specifying a regular expression to split on).
names_pattern takes the same specification as extract(), a regular expression containing matching groups (()).
If these arguments do not give you enough control, use pivot_longer_spec() to create a spec object and process manually as needed.
names_ptypes, names_transform, values_ptypes, values_transform
Not currently supported by dtplyr.
names_repair
What happens if the output has invalid column names? The default, "check_unique", is to error if the columns are duplicated. Use "minimal" to allow duplicates in the output, or "unique" to de-duplicate by adding numeric suffixes. See vctrs::vec_as_names() for more options.
values_to
A string specifying the name of the column to create from the data stored in cell values. If names_to is a character vector containing the special .value sentinel, this value will be ignored, and the name of the value column will be derived from part of the existing column names.
values_drop_na
If TRUE, will drop rows that contain only NAs in the values_to column. This effectively converts explicit missing values to implicit missing values, and should generally be used only when missing values in data were created by its structure.
...
Additional arguments passed on to methods.
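The names_sep argument described above can be illustrated with a small sketch. The sales data frame and its column names are hypothetical and exist only for this illustration; the example assumes the dtplyr, tidyr, and dplyr packages are installed.

```r
library(dtplyr)
library(tidyr)
library(dplyr)

# Hypothetical toy data: each column name encodes a measure and a year,
# separated by an underscore.
sales <- data.frame(
  id = 1:2,
  revenue_2020 = c(10, 20),
  revenue_2021 = c(11, 21),
  cost_2020 = c(5, 6),
  cost_2021 = c(7, 8)
)

# Split each pivoted column name on "_" into two new columns.
long <- lazy_dt(sales) %>%
  pivot_longer(
    cols = !id,
    names_to = c("measure", "year"),
    names_sep = "_",
    values_to = "amount"
  ) %>%
  as_tibble()  # force evaluation of the lazy data.table pipeline

long
```

Because names_to has length 2, one of names_sep or names_pattern has to be supplied; here a single-string separator is enough since every pivoted column name contains exactly one underscore.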
library(dtplyr)
library(tidyr)

# Simplest case where column names are character data
relig_income_dt <- lazy_dt(relig_income)
relig_income_dt %>%
  pivot_longer(!religion, names_to = "income", values_to = "count")

# Slightly more complex case where columns have a common prefix,
# and missing values are structural so should be dropped.
billboard_dt <- lazy_dt(billboard)
billboard_dt %>%
  pivot_longer(
    cols = starts_with("wk"),
    names_to = "week",
    names_prefix = "wk",
    values_to = "rank",
    values_drop_na = TRUE
  )

# Multiple variables stored in column names
lazy_dt(who) %>%
  pivot_longer(
    cols = new_sp_m014:newrel_f65,
    names_to = c("diagnosis", "gender", "age"),
    names_pattern = "new_?(.*)_(.)(.*)",
    values_to = "count"
  )

# Multiple observations per row
anscombe_dt <- lazy_dt(anscombe)
anscombe_dt %>%
  pivot_longer(
    everything(),
    names_to = c(".value", "set"),
    names_pattern = "(.)(.)"
  )
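
Since this method is translated to data.table::melt(), it can be useful to look at the generated data.table code before evaluating it. The following is a minimal sketch that assumes dplyr is also loaded, so that show_query() and as_tibble() are available; the exact text of the printed query may differ between dtplyr versions.

```r
library(dtplyr)
library(tidyr)
library(dplyr)

relig_income_dt <- lazy_dt(relig_income)

# Build the lazy step; no computation happens yet.
step <- relig_income_dt %>%
  pivot_longer(!religion, names_to = "income", values_to = "count")

# Print the data.table translation (expected to contain a melt() call).
show_query(step)

# Force evaluation and collect the result as a tibble.
result <- as_tibble(step)
```

Collecting with as_tibble() (or as.data.table() / as.data.frame()) is what actually runs the translated code.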