These selection helpers match variables according to a given pattern.
starts_with()
: Starts with a prefix.
ends_with()
: Ends with a suffix.
contains()
: Contains a literal string.
matches()
: Matches a regular expression.
num_range()
: Matches a numerical range like x01, x02, x03.
starts_with(match, ignore.case = TRUE, vars = NULL)ends_with(match, ignore.case = TRUE, vars = NULL)
contains(match, ignore.case = TRUE, vars = NULL)
matches(match, ignore.case = TRUE, perl = FALSE, vars = NULL)
num_range(prefix, range, width = NULL, vars = NULL)
A character vector. If length > 1, the union of the matches is taken.
If TRUE
, the default, ignores case when matching
names.
A character vector of variable names. If not supplied,
the variables are taken from the current selection context (as
established by functions like select()
or pivot_longer()
).
Should Perl-compatible regexps be used?
A prefix that starts the numeric range.
A sequence of integers, like 1:5
.
Optionally, the "width" of the numeric range. For example, a range of 2 gives "01", a range of three "001", etc.
Selection helpers can be used in functions like dplyr::select()
or tidyr::pivot_longer()
. Let's first attach the tidyverse:
library(tidyverse)# For better printing iris <- as_tibble(iris)
starts_with()
selects all variables matching a prefix and
ends_with()
matches a suffix:
iris %>% select(starts_with("Sepal")) #> # A tibble: 150 x 2 #> Sepal.Length Sepal.Width #> <dbl> <dbl> #> 1 5.1 3.5 #> 2 4.9 3 #> 3 4.7 3.2 #> 4 4.6 3.1 #> # ... with 146 more rowsiris %>% select(ends_with("Width")) #> # A tibble: 150 x 2 #> Sepal.Width Petal.Width #> <dbl> <dbl> #> 1 3.5 0.2 #> 2 3 0.2 #> 3 3.2 0.2 #> 4 3.1 0.2 #> # ... with 146 more rows
You can supply multiple prefixes or suffixes. Note how the order of variables depends on the order of the suffixes and prefixes:
iris %>% select(starts_with(c("Petal", "Sepal"))) #> # A tibble: 150 x 4 #> Petal.Length Petal.Width Sepal.Length Sepal.Width #> <dbl> <dbl> <dbl> <dbl> #> 1 1.4 0.2 5.1 3.5 #> 2 1.4 0.2 4.9 3 #> 3 1.3 0.2 4.7 3.2 #> 4 1.5 0.2 4.6 3.1 #> # ... with 146 more rowsiris %>% select(ends_with(c("Width", "Length"))) #> # A tibble: 150 x 4 #> Sepal.Width Petal.Width Sepal.Length Petal.Length #> <dbl> <dbl> <dbl> <dbl> #> 1 3.5 0.2 5.1 1.4 #> 2 3 0.2 4.9 1.4 #> 3 3.2 0.2 4.7 1.3 #> 4 3.1 0.2 4.6 1.5 #> # ... with 146 more rows
contains()
selects columns whose names contain a word:
iris %>% select(contains("al")) #> # A tibble: 150 x 4 #> Sepal.Length Sepal.Width Petal.Length Petal.Width #> <dbl> <dbl> <dbl> <dbl> #> 1 5.1 3.5 1.4 0.2 #> 2 4.9 3 1.4 0.2 #> 3 4.7 3.2 1.3 0.2 #> 4 4.6 3.1 1.5 0.2 #> # ... with 146 more rows
These helpers do not use regular expressions. To select with a
regexp use matches()
# [pt] is matched literally: iris %>% select(contains("[pt]al")) #> # A tibble: 150 x 0# [pt] is interpreted as a regular expression iris %>% select(matches("[pt]al")) #> # A tibble: 150 x 4 #> Sepal.Length Sepal.Width Petal.Length Petal.Width #> <dbl> <dbl> <dbl> <dbl> #> 1 5.1 3.5 1.4 0.2 #> 2 4.9 3 1.4 0.2 #> 3 4.7 3.2 1.3 0.2 #> 4 4.6 3.1 1.5 0.2 #> # ... with 146 more rows
starts_with()
selects all variables starting with a prefix. To
select a range, use num_range()
. Compare:
billboard %>% select(starts_with("wk")) #> # A tibble: 317 x 76 #> wk1 wk2 wk3 wk4 wk5 wk6 wk7 wk8 wk9 wk10 wk11 wk12 wk13 #> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 87 82 72 77 87 94 99 NA NA NA NA NA NA #> 2 91 87 92 NA NA NA NA NA NA NA NA NA NA #> 3 81 70 68 67 66 57 54 53 51 51 51 51 47 #> 4 76 76 72 69 67 65 55 59 62 61 61 59 61 #> # ... with 313 more rows, and 63 more variables: wk14 <dbl>, wk15 <dbl>, #> # wk16 <dbl>, wk17 <dbl>, wk18 <dbl>, wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, ...billboard %>% select(num_range("wk", 10:15)) #> # A tibble: 317 x 6 #> wk10 wk11 wk12 wk13 wk14 wk15 #> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 NA NA NA NA NA NA #> 2 NA NA NA NA NA NA #> 3 51 51 51 47 44 38 #> 4 61 61 59 61 66 72 #> # ... with 313 more rows
The selection language page, which includes links to other selection helpers.