Learn R Programming

tidyselect (version 1.2.1)

starts_with: Select variables that match a pattern

Description

These selection helpers match variables according to a given pattern.

  • starts_with(): Starts with an exact prefix.

  • ends_with(): Ends with an exact suffix.

  • contains(): Contains a literal string.

  • matches(): Matches a regular expression.

  • num_range(): Matches a numerical range like x01, x02, x03.

Usage

starts_with(match, ignore.case = TRUE, vars = NULL)

ends_with(match, ignore.case = TRUE, vars = NULL)

contains(match, ignore.case = TRUE, vars = NULL)

matches(match, ignore.case = TRUE, perl = FALSE, vars = NULL)

num_range(prefix, range, suffix = "", width = NULL, vars = NULL)

Arguments

match

A character vector. If length > 1, the union of the matches is taken.

For starts_with(), ends_with(), and contains() this is an exact match. For matches() this is a regular expression, and can be a stringr pattern.

ignore.case

If TRUE, the default, ignores case when matching names.

vars

A character vector of variable names. If not supplied, the variables are taken from the current selection context (as established by functions like select() or pivot_longer()).

perl

Should Perl-compatible regexps be used?

prefix, suffix

A prefix/suffix added before/after the numeric range.

range

A sequence of integers, like 1:5.

width

Optionally, the "width" of the numeric range. For example, a range of 2 gives "01", a range of three "001", etc.

Examples

Selection helpers can be used in functions like dplyr::select() or tidyr::pivot_longer(). Let's first attach the tidyverse:

library(tidyverse)

# For better printing iris <- as_tibble(iris)

starts_with() selects all variables matching a prefix and ends_with() matches a suffix:

iris %>% select(starts_with("Sepal"))
#> # A tibble: 150 x 2
#>   Sepal.Length Sepal.Width
#>          <dbl>       <dbl>
#> 1          5.1         3.5
#> 2          4.9         3  
#> 3          4.7         3.2
#> 4          4.6         3.1
#> # i 146 more rows

iris %>% select(ends_with("Width")) #> # A tibble: 150 x 2 #> Sepal.Width Petal.Width #> <dbl> <dbl> #> 1 3.5 0.2 #> 2 3 0.2 #> 3 3.2 0.2 #> 4 3.1 0.2 #> # i 146 more rows

You can supply multiple prefixes or suffixes. Note how the order of variables depends on the order of the suffixes and prefixes:

iris %>% select(starts_with(c("Petal", "Sepal")))
#> # A tibble: 150 x 4
#>   Petal.Length Petal.Width Sepal.Length Sepal.Width
#>          <dbl>       <dbl>        <dbl>       <dbl>
#> 1          1.4         0.2          5.1         3.5
#> 2          1.4         0.2          4.9         3  
#> 3          1.3         0.2          4.7         3.2
#> 4          1.5         0.2          4.6         3.1
#> # i 146 more rows

iris %>% select(ends_with(c("Width", "Length"))) #> # A tibble: 150 x 4 #> Sepal.Width Petal.Width Sepal.Length Petal.Length #> <dbl> <dbl> <dbl> <dbl> #> 1 3.5 0.2 5.1 1.4 #> 2 3 0.2 4.9 1.4 #> 3 3.2 0.2 4.7 1.3 #> 4 3.1 0.2 4.6 1.5 #> # i 146 more rows

contains() selects columns whose names contain a word:

iris %>% select(contains("al"))
#> # A tibble: 150 x 4
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#>          <dbl>       <dbl>        <dbl>       <dbl>
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3            1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> # i 146 more rows

starts_with(), ends_with(), and contains() do not use regular expressions. To select with a regexp use matches():

# [pt] is matched literally:
iris %>% select(contains("[pt]al"))
#> # A tibble: 150 x 0

# [pt] is interpreted as a regular expression iris %>% select(matches("[pt]al")) #> # A tibble: 150 x 4 #> Sepal.Length Sepal.Width Petal.Length Petal.Width #> <dbl> <dbl> <dbl> <dbl> #> 1 5.1 3.5 1.4 0.2 #> 2 4.9 3 1.4 0.2 #> 3 4.7 3.2 1.3 0.2 #> 4 4.6 3.1 1.5 0.2 #> # i 146 more rows

starts_with() selects all variables starting with a prefix. To select a range, use num_range(). Compare:

billboard %>% select(starts_with("wk"))
#> # A tibble: 317 x 76
#>     wk1   wk2   wk3   wk4   wk5   wk6   wk7   wk8   wk9  wk10  wk11  wk12  wk13
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1    87    82    72    77    87    94    99    NA    NA    NA    NA    NA    NA
#> 2    91    87    92    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
#> 3    81    70    68    67    66    57    54    53    51    51    51    51    47
#> 4    76    76    72    69    67    65    55    59    62    61    61    59    61
#> # i 313 more rows
#> # i 63 more variables: wk14 <dbl>, wk15 <dbl>, wk16 <dbl>, wk17 <dbl>,
#> #   wk18 <dbl>, wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, ...

billboard %>% select(num_range("wk", 10:15)) #> # A tibble: 317 x 6 #> wk10 wk11 wk12 wk13 wk14 wk15 #> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 NA NA NA NA NA NA #> 2 NA NA NA NA NA NA #> 3 51 51 51 47 44 38 #> 4 61 61 59 61 66 72 #> # i 313 more rows

See Also

The selection language page, which includes links to other selection helpers.