tidyselect (version 1.0.0)

faq-external-vector: FAQ - Note: Using an external vector in selections is ambiguous

Description

Ambiguity between columns and external variables

With selecting functions like dplyr::select() or tidyr::pivot_longer(), you can refer to variables by name:

mtcars %>% select(cyl, am, vs)
#> # A tibble: 32 x 3
#>     cyl    am    vs
#>   <dbl> <dbl> <dbl>
#> 1     6     1     0
#> 2     6     1     0
#> 3     4     1     1
#> 4     6     0     1
#> # ... with 28 more rows

mtcars %>% select(mpg:disp)
#> # A tibble: 32 x 3
#>     mpg   cyl  disp
#>   <dbl> <dbl> <dbl>
#> 1  21       6   160
#> 2  21       6   160
#> 3  22.8     4   108
#> 4  21.4     6   258
#> # ... with 28 more rows

For historical reasons, it is also possible to refer to an external vector of variable names. You get the correct result, but with a note informing you that selecting with an external variable is ambiguous, because it is not clear whether you want a data frame column or an external object.

vars <- c("cyl", "am", "vs")
result <- mtcars %>% select(vars)
#> Note: Using an external vector in selections is ambiguous.
#> i Use `all_of(vars)` instead of `vars` to silence this message.
#> i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.

This note will become a warning in the future, and then an error. We have decided to deprecate this particular approach to using external vectors because they introduce ambiguity. Imagine that the data frame contains a column with the same name as your external variable.

some_df <- mtcars
some_df$vars <- 1:nrow(mtcars)

These are very different objects, but this isn't a problem when the context forces you to be specific about where to find vars:

vars
#> [1] "cyl" "am"  "vs"

some_df$vars
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
#> [29] 29 30 31 32

In a selection context, however, the column wins:

some_df %>% select(vars)
#> # A tibble: 32 x 1
#>    vars
#>   <int>
#> 1     1
#> 2     2
#> 3     3
#> 4     4
#> # ... with 28 more rows
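The same precedence rule is familiar from base R's data-masking functions. The sketch below is only an analogy: it uses base R's with() rather than tidyselect, and needs nothing beyond the datasets package. Inside with(), the columns of the data frame mask variables from the calling environment, so the column vars shadows the character vector vars.

```r
# Analogy only: with() is base R data masking, not tidyselect, but it
# follows the same rule of looking up names in the data before the
# calling environment.
vars <- c("cyl", "am", "vs")
some_df <- datasets::mtcars
some_df$vars <- seq_len(nrow(some_df))

# Inside with(), `vars` resolves to the data frame column (1, 2, ..., 32).
from_mask <- with(some_df, vars)

# Outside the data mask, `vars` is still the external character vector.
from_env <- vars

head(from_mask)
#> [1] 1 2 3 4 5 6
from_env
#> [1] "cyl" "am"  "vs"
```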

Fixing the ambiguity

To make your selection code more robust and silence the message, use all_of() to force the external vector:

some_df %>% select(all_of(vars))
#> # A tibble: 32 x 3
#>     cyl    am    vs
#>   <dbl> <dbl> <dbl>
#> 1     6     1     0
#> 2     6     1     0
#> 3     4     1     1
#> 4     6     0     1
#> # ... with 28 more rows
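A common place where this matters is a helper function that receives column names as a character vector. The following is a minimal sketch, assuming dplyr is installed; select_cols() is a hypothetical helper name, not part of dplyr or tidyselect.

```r
library(dplyr)

# Hypothetical helper: selects the columns named in a character vector.
# all_of() states explicitly that `cols` is an external vector, so a
# data frame column that happens to be called "cols" is never picked up.
select_cols <- function(df, cols) {
  df %>% select(all_of(cols))
}

some_df <- mtcars
some_df$cols <- seq_len(nrow(mtcars))

out <- select_cols(some_df, c("cyl", "am", "vs"))
names(out)
#> [1] "cyl" "am"  "vs"
```

Because all_of() is strict, a misspelled name in cols fails with an error instead of silently selecting something else.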

For more information, or if you have comments about this, please see the GitHub issue tracking the deprecation process.

Arguments