filter_taxa: Filter taxa with a list of conditions

Description

Filter taxa in a [taxonomy()] or [taxmap()] object with a series of conditions. Any variable name that appears in [all_names()] can be used as if it were a vector on its own. This function was inspired by [dplyr::filter()]; see its documentation for more information.

Calling the function in the `obj$filter_taxa(...)` style edits `obj` in place, unlike most R functions. Calling it in the `filter_taxa(obj, ...)` style imitates R's traditional copy-on-modify semantics: `obj` is not changed, and a modified copy is returned instead.
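The difference between the two calling styles can be checked directly. The sketch below is illustrative only: it assumes that `filter_taxa()` and `ex_taxmap` are provided by the metacoder package (older releases of taxa exported them as well), and that the object supports the usual R6 `$clone()` method so the example data is not modified.

```r
# Assumption: filter_taxa(), taxon_ids() and ex_taxmap come from metacoder
# (older releases of the taxa package exported them as well).
library(metacoder)

# Work on a private copy so the packaged example data stays untouched
obj <- ex_taxmap$clone(deep = TRUE)
n_before <- length(obj$taxon_ids())

# Function style: a filtered copy is returned, `obj` keeps all of its taxa
filtered <- filter_taxa(obj, 1:3)
length(obj$taxon_ids()) == n_before

# Method style: `obj` itself is modified in place
obj$filter_taxa(1:3)
length(obj$taxon_ids()) < n_before
```

If the original object is needed later, either use the function style or clone the object before calling the method.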


filter_taxa(obj, ..., subtaxa = FALSE, supertaxa = FALSE,
  drop_obs = TRUE, reassign_obs = TRUE, reassign_taxa = TRUE,
  invert = FALSE, keep_order = TRUE)
obj$filter_taxa(..., subtaxa = FALSE, supertaxa = FALSE,
  drop_obs = TRUE, reassign_obs = TRUE, reassign_taxa = TRUE,
  invert = FALSE, keep_order = TRUE)

Value

An object of type [taxonomy()] or [taxmap()]

Arguments

obj: An object of class [taxonomy()] or [taxmap()]
...: One or more filtering conditions. Any variable name that appears in [all_names()] can be used as if it were a vector on its own. Each filtering condition must resolve to one of four things:
* `character`: One or more taxon IDs contained in `obj$edge_list$to`
* `integer`: One or more row indexes of `obj$edge_list`
* `logical`: A `TRUE`/`FALSE` vector of length equal to the number of rows in `obj$edge_list`
* `NULL`: ignored
subtaxa: (`logical` or `numeric` of length 1) If `TRUE`, include subtaxa of taxa passing the filter. Positive numbers indicate the number of ranks below the target taxa to return. `0` is equivalent to `FALSE`. Negative numbers are equivalent to `TRUE`.
supertaxa: (`logical` or `numeric` of length 1) If `TRUE`, include supertaxa of taxa passing the filter. Positive numbers indicate the number of ranks above the target taxa to return. `0` is equivalent to `FALSE`. Negative numbers are equivalent to `TRUE`.
drop_obs: (`logical`) This option only applies to [taxmap()] objects. If `FALSE`, include observations (i.e. user-defined data in `obj$data`) even if the taxon they are assigned to is filtered out. Observations assigned to removed taxa will be assigned to NA. This option can be either simply `TRUE`/`FALSE`, meaning that all data sets will be treated the same, or a logical vector can be supplied with names corresponding to one or more data sets in `obj$data`. For example, `c(abundance = FALSE, stats = TRUE)` would include observations whose taxon was filtered out in `obj$data$abundance`, but not in `obj$data$stats`. See the `reassign_obs` option below for further complications.
reassign_obs: (`logical` of length 1) This option only applies to [taxmap()] objects. If `TRUE`, observations (i.e. user-defined data in `obj$data`) assigned to removed taxa will be reassigned to the closest supertaxon that passed the filter. If no supertaxon of such an observation passed the filter, the observation will be filtered out if `drop_obs` is `TRUE`. This option can be either simply `TRUE`/`FALSE`, meaning that all data sets will be treated the same, or a logical vector can be supplied with names corresponding to one or more data sets in `obj$data`. For example, `c(abundance = TRUE, stats = FALSE)` would reassign observations in `obj$data$abundance`, but not in `obj$data$stats`.
reassign_taxa: (`logical` of length 1) If `TRUE`, subtaxa of removed taxa will be reassigned to the closest supertaxon that passed the filter. This is useful for removing intermediate levels of a taxonomy.
invert: (`logical` of length 1) If `TRUE`, do NOT include the selection. This is different from just replacing a `==` with a `!=` because this option negates the selection after taking into account the `subtaxa` and `supertaxa` options. This is useful, for example, for removing a taxon and all of its subtaxa.
keep_order: (`logical` of length 1) If `TRUE`, keep relative order of taxa not filtered out. For example, the result of `filter_taxa(ex_taxmap, 1:3)` and `filter_taxa(ex_taxmap, 3:1)` would be the same. Does not affect dataset order, only taxon order. This is useful for maintaining order correspondence with a dataset that has one value per taxon.
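As an illustration of the `reassign_taxa` argument described above, the following sketch removes an intermediate rank from the example data. It assumes that `filter_taxa()`, `taxon_ranks()`, `taxon_ids()` and `ex_taxmap` are provided by the metacoder package (older releases of taxa exported them as well); it is not taken from the package's own examples.

```r
# Assumption: these functions and ex_taxmap come from metacoder
# (older releases of the taxa package exported them as well).
library(metacoder)

# Drop every genus-level taxon. With the default reassign_taxa = TRUE,
# subtaxa of a removed genus are attached to the closest supertaxon
# that passed the filter.
no_genus <- filter_taxa(ex_taxmap, taxon_ranks != "genus")

# Ranks that remain after filtering
unique(taxon_ranks(no_genus))

# Number of taxa before and after filtering
length(taxon_ids(ex_taxmap))
length(taxon_ids(no_genus))
```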

Examples

# Filter by index
filter_taxa(ex_taxmap, 1:3)

# Filter by taxon ID
filter_taxa(ex_taxmap, c("b", "c", "d"))

# Filter by TRUE/FALSE
filter_taxa(ex_taxmap, taxon_names == "Plantae", subtaxa = TRUE)
filter_taxa(ex_taxmap, n_obs > 3)
filter_taxa(ex_taxmap, ! taxon_ranks %in% c("species", "genus"))
filter_taxa(ex_taxmap, taxon_ranks == "genus", n_obs > 1)

# Filter by an observation characteristic
dangerous_taxa <- sapply(ex_taxmap$obs("info"),
                         function(i) any(ex_taxmap$data$info$dangerous[i]))
filter_taxa(ex_taxmap, dangerous_taxa)

# Include supertaxa
filter_taxa(ex_taxmap, 12, supertaxa = TRUE)
filter_taxa(ex_taxmap, 12, supertaxa = 2)

# Include subtaxa
filter_taxa(ex_taxmap, 1, subtaxa = TRUE)
filter_taxa(ex_taxmap, 1, subtaxa = 2)

# Don't remove rows in user-defined data corresponding to removed taxa
filter_taxa(ex_taxmap, 2, drop_obs = FALSE)
filter_taxa(ex_taxmap, 2, drop_obs = c(info = FALSE))

# Remove a taxon and its subtaxa
filter_taxa(ex_taxmap, taxon_names == "Mammalia",
            subtaxa = TRUE, invert = TRUE)
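To see why `invert = TRUE` is not the same as negating the condition, the two approaches can be compared side by side. This is a sketch under the assumption that `filter_taxa()`, `taxon_names()`, `taxon_ids()` and `ex_taxmap` are provided by the metacoder package (older releases of taxa exported them as well).

```r
# Assumption: these functions and ex_taxmap come from metacoder
# (older releases of the taxa package exported them as well).
library(metacoder)

# Negated condition: only the taxon named "Mammalia" fails the test,
# so its subtaxa pass the filter on their own and are kept
negated <- filter_taxa(ex_taxmap, taxon_names != "Mammalia", subtaxa = TRUE)

# Inverted selection: "Mammalia" and its subtaxa are selected first,
# then the whole selection is removed
inverted <- filter_taxa(ex_taxmap, taxon_names == "Mammalia",
                        subtaxa = TRUE, invert = TRUE)

# Compare how many taxa each approach leaves behind
length(taxon_ids(negated))
length(taxon_ids(inverted))
```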
