This selection helper selects the variables for which a function returns TRUE.
where(fn)
A function that returns TRUE or FALSE (technically, a predicate function). Can also be a purrr-like formula.
Selection helpers can be used in functions like dplyr::select() or tidyr::pivot_longer(). Let's first attach the tidyverse:
library(tidyverse)

# For better printing
iris <- as_tibble(iris)
where() takes a function and returns all variables for which the function returns TRUE:
is.factor(iris[[4]])
#> [1] FALSE
is.factor(iris[[5]])
#> [1] TRUE

iris %>% select(where(is.factor))
#> # A tibble: 150 x 1
#>   Species
#>   <fct>
#> 1 setosa
#> 2 setosa
#> 3 setosa
#> 4 setosa
#> # ... with 146 more rows

is.numeric(iris[[4]])
#> [1] TRUE
is.numeric(iris[[5]])
#> [1] FALSE

iris %>% select(where(is.numeric))
#> # A tibble: 150 x 4
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#>          <dbl>       <dbl>        <dbl>       <dbl>
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3            1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> # ... with 146 more rows
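One way to picture the selection above is to apply the predicate to every column and keep the columns where it returns TRUE. The following is a conceptual base-R sketch only, not how tidyselect is implemented; the names df, keep, and numeric_only are illustrative and do not come from the package.

```r
# Conceptual base-R counterpart of select(where(is.numeric)).
# Uses the built-in data frame so the sketch does not depend on the tidyverse.
df <- datasets::iris

# Apply the predicate to each column; vapply() insists on exactly one
# TRUE/FALSE per column, which is what a predicate function should return.
keep <- vapply(df, is.numeric, logical(1))

# Keep only the columns for which the predicate returned TRUE.
numeric_only <- df[, keep, drop = FALSE]

names(numeric_only)
```

Under these assumptions, the selected columns are the four measurement columns, and the factor column Species is dropped.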
You can use purrr-like formulas as a shortcut for creating a function on the spot. These expressions are equivalent:
iris %>% select(where(is.numeric))
#> # A tibble: 150 x 4
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#>          <dbl>       <dbl>        <dbl>       <dbl>
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3            1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> # ... with 146 more rows

iris %>% select(where(function(x) is.numeric(x)))
#> # A tibble: 150 x 4
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#>          <dbl>       <dbl>        <dbl>       <dbl>
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3            1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> # ... with 146 more rows

iris %>% select(where(~ is.numeric(.x)))
#> # A tibble: 150 x 4
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#>          <dbl>       <dbl>        <dbl>       <dbl>
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3            1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> # ... with 146 more rows
The shorthand is useful for adding logic inline. Here we select all numeric variables whose mean is greater than 3.5:
iris %>% select(where(~ is.numeric(.x) && mean(.x) > 3.5))
#> # A tibble: 150 x 2
#>   Sepal.Length Petal.Length
#>          <dbl>        <dbl>
#> 1          5.1          1.4
#> 2          4.9          1.4
#> 3          4.7          1.3
#> 4          4.6          1.5
#> # ... with 146 more rows
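The examples so far all use dplyr::select(), but the same helper should also work in tidyr::pivot_longer(), which was mentioned at the start. A minimal sketch, assuming dplyr and tidyr are installed; the names iris_tbl, long, "measurement", and "value" are chosen here for illustration only.

```r
library(dplyr)
library(tidyr)

# Work on a fresh tibble copy of the built-in data set.
iris_tbl <- as_tibble(datasets::iris)

# Pivot every numeric column into name/value pairs; the factor column
# Species is not selected by the predicate and stays as an id column.
long <- iris_tbl %>%
  pivot_longer(
    cols = where(is.numeric),
    names_to = "measurement",
    values_to = "value"
  )

dim(long)
```

Each of the 150 rows contributes one row per numeric column, so the reshaped table should have 600 rows and three columns.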