These selection helpers match variables according to a given pattern.
starts_with()
: Starts with an exact prefix.
ends_with()
: Ends with an exact suffix.
contains()
: Contains a literal string.
matches()
: Matches a regular expression.
num_range()
: Matches a numerical range like x01, x02, x03.
starts_with(match, ignore.case = TRUE, vars = NULL)
ends_with(match, ignore.case = TRUE, vars = NULL)
contains(match, ignore.case = TRUE, vars = NULL)
matches(match, ignore.case = TRUE, perl = FALSE, vars = NULL)
num_range(prefix, range, suffix = "", width = NULL, vars = NULL)
match
: A character vector. If length > 1, the union of the matches is taken. For starts_with(), ends_with(), and contains() this is an exact match. For matches() this is a regular expression, and can be a stringr pattern.
ignore.case
: If TRUE, the default, ignores case when matching names.
vars
: A character vector of variable names. If not supplied, the variables are taken from the current selection context (as established by functions like select() or pivot_longer()).
perl
: Should Perl-compatible regexps be used?
prefix, suffix
: A prefix/suffix added before/after the numeric range.
range
: A sequence of integers, like 1:5.
width
: Optionally, the "width" of the numeric range. For example, a width of 2 gives "01", a width of 3 gives "001", etc.
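The effect of ignore.case, width, and suffix can be hard to picture from the descriptions alone. The following is a minimal sketch, assuming a made-up one-row data frame whose column names (x01, X_total, y01_raw, and so on) are purely illustrative:

```r
library(dplyr)

# Hypothetical data frame; the column names exist only for this illustration.
df <- tibble(x01 = 1, x02 = 2, x10 = 3, X_total = 4, y01_raw = 5)

# Matching ignores case by default, so the prefix "x" also picks up X_total.
any_case <- names(select(df, starts_with("x")))

# With ignore.case = FALSE only the lowercase names match.
exact_case <- names(select(df, starts_with("x", ignore.case = FALSE)))

# width = 2 zero-pads the range, so 1:2 is looked up as "01" and "02".
padded <- names(select(df, num_range("x", 1:2, width = 2)))

# suffix is appended after the padded number: "y" + "01" + "_raw".
with_suffix <- names(select(df, num_range("y", 1, suffix = "_raw", width = 2)))
```

Here any_case holds all four x/X columns, exact_case drops X_total, padded holds x01 and x02, and with_suffix holds y01_raw.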
Selection helpers can be used in functions like dplyr::select() or tidyr::pivot_longer(). Let's first attach the tidyverse:
library(tidyverse)
# For better printing
iris <- as_tibble(iris)
starts_with() selects all variables matching a prefix and ends_with() matches a suffix:
iris %>% select(starts_with("Sepal"))
#> # A tibble: 150 x 2
#> Sepal.Length Sepal.Width
#> <dbl> <dbl>
#> 1 5.1 3.5
#> 2 4.9 3
#> 3 4.7 3.2
#> 4 4.6 3.1
#> # i 146 more rows
iris %>% select(ends_with("Width"))
#> # A tibble: 150 x 2
#> Sepal.Width Petal.Width
#> <dbl> <dbl>
#> 1 3.5 0.2
#> 2 3 0.2
#> 3 3.2 0.2
#> 4 3.1 0.2
#> # i 146 more rows
You can supply multiple prefixes or suffixes. Note how the order of variables depends on the order of the suffixes and prefixes:
iris %>% select(starts_with(c("Petal", "Sepal")))
#> # A tibble: 150 x 4
#> Petal.Length Petal.Width Sepal.Length Sepal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1.4 0.2 5.1 3.5
#> 2 1.4 0.2 4.9 3
#> 3 1.3 0.2 4.7 3.2
#> 4 1.5 0.2 4.6 3.1
#> # i 146 more rows
iris %>% select(ends_with(c("Width", "Length")))
#> # A tibble: 150 x 4
#> Sepal.Width Petal.Width Sepal.Length Petal.Length
#> <dbl> <dbl> <dbl> <dbl>
#> 1 3.5 0.2 5.1 1.4
#> 2 3 0.2 4.9 1.4
#> 3 3.2 0.2 4.7 1.3
#> 4 3.1 0.2 4.6 1.5
#> # i 146 more rows
contains() selects columns whose names contain a literal string:
iris %>% select(contains("al"))
#> # A tibble: 150 x 4
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.1 3.5 1.4 0.2
#> 2 4.9 3 1.4 0.2
#> 3 4.7 3.2 1.3 0.2
#> 4 4.6 3.1 1.5 0.2
#> # i 146 more rows
starts_with(), ends_with(), and contains() do not use regular expressions. To select with a regexp use matches():
# [pt] is matched literally:
iris %>% select(contains("[pt]al"))
#> # A tibble: 150 x 0
# [pt] is interpreted as a regular expression
iris %>% select(matches("[pt]al"))
#> # A tibble: 150 x 4
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.1 3.5 1.4 0.2
#> 2 4.9 3 1.4 0.2
#> 3 4.7 3.2 1.3 0.2
#> 4 4.6 3.1 1.5 0.2
#> # i 146 more rows
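Because matches() takes a regular expression, anchors and escapes can reproduce or tighten what the literal helpers do. A small sketch, assuming the built-in iris data and using names() only to make the selected columns easy to inspect (iris_tbl is a name chosen for this example):

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# "\\." matches a literal dot and "$" anchors the end of the name.
widths <- names(select(iris_tbl, matches("\\.Width$")))

# "^" anchors the start of the name, much like starts_with("Sepal").
sepals <- names(select(iris_tbl, matches("^Sepal")))

# ignore.case applies to matches() as well; switching it off makes the
# lowercase pattern miss the capitalised column names.
none <- names(select(iris_tbl, matches("^sepal", ignore.case = FALSE)))
```

With these patterns, widths contains Sepal.Width and Petal.Width, sepals contains Sepal.Length and Sepal.Width, and none is empty.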
starts_with() selects all variables starting with a prefix. To select a range, use num_range(). Compare:
billboard %>% select(starts_with("wk"))
#> # A tibble: 317 x 76
#> wk1 wk2 wk3 wk4 wk5 wk6 wk7 wk8 wk9 wk10 wk11 wk12 wk13
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 87 82 72 77 87 94 99 NA NA NA NA NA NA
#> 2 91 87 92 NA NA NA NA NA NA NA NA NA NA
#> 3 81 70 68 67 66 57 54 53 51 51 51 51 47
#> 4 76 76 72 69 67 65 55 59 62 61 61 59 61
#> # i 313 more rows
#> # i 63 more variables: wk14 <dbl>, wk15 <dbl>, wk16 <dbl>, wk17 <dbl>,
#> # wk18 <dbl>, wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, ...
billboard %>% select(num_range("wk", 10:15))
#> # A tibble: 317 x 6
#> wk10 wk11 wk12 wk13 wk14 wk15
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 NA NA NA NA NA NA
#> 2 NA NA NA NA NA NA
#> 3 51 51 51 47 44 38
#> 4 61 61 59 61 66 72
#> # i 313 more rows
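The helpers work the same way inside tidyr::pivot_longer(). As a rough sketch, the week columns of billboard can be gathered into one column; the output column names "week" and "rank" are choices made for this example rather than anything the helpers require:

```r
library(dplyr)
library(tidyr)

# Gather every column whose name starts with "wk" into name/value pairs.
long <- billboard %>%
  pivot_longer(
    cols = starts_with("wk"),
    names_to = "week",
    values_to = "rank"
  )

names(long)
```

Each original row contributes one output row per selected wk column, so the result has as many rows as billboard has rows times the number of wk columns.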
The selection language page, which includes links to other selection helpers.