filter_ambiguous_taxa: Filter ambiguous taxon names

Description

Filter out taxa with ambiguous names, such as "unknown" or "uncultured". NOTE: some parameters of this function are passed to filter_taxa with the "invert" option set to TRUE, so for the most part it works the same way as filter_taxa.

Usage

filter_ambiguous_taxa(
  obj,
  unknown = TRUE,
  uncultured = TRUE,
  name_regex = ".",
  ignore_case = TRUE,
  subtaxa = FALSE,
  drop_obs = TRUE,
  reassign_obs = TRUE,
  reassign_taxa = TRUE
)

Value

A taxmap object

Arguments

obj: A taxmap object
unknown: If TRUE, remove taxa with names that suggest they are placeholders for unknown taxa (e.g. "unknown ...").
uncultured: If TRUE, remove taxa with names that suggest they are assigned to uncultured organisms (e.g. "uncultured ...").
name_regex: The regex code to match a valid character in a taxon name. For example, "[a-z]" would mean taxon names can only be lower case letters.
ignore_case: If TRUE, do not consider the case of the text when determining a match.
subtaxa: (`logical` or `numeric` of length 1) If `TRUE`, include subtaxa of taxa passing the filter. Positive numbers indicate the number of ranks below the target taxa to return. `0` is equivalent to `FALSE`. Negative numbers are equivalent to `TRUE`.
drop_obs: (`logical`) This option only applies to [taxmap()] objects. If `FALSE`, include observations (i.e. user-defined data in `obj$data`) even if the taxon they are assigned to is filtered out. Observations assigned to removed taxa will be assigned to NA. This option can be either a single `TRUE`/`FALSE`, meaning that all data sets will be treated the same, or a logical vector with names corresponding to one or more data sets in `obj$data`. For example, `c(abundance = FALSE, stats = TRUE)` would include observations whose taxon was filtered out in `obj$data$abundance`, but not in `obj$data$stats`. See the `reassign_obs` option below for further complications.
reassign_obs: (`logical` of length 1) This option only applies to [taxmap()] objects. If `TRUE`, observations (i.e. user-defined data in `obj$data`) assigned to removed taxa will be reassigned to the closest supertaxon that passed the filter. If no supertaxon of such an observation passed the filter, the observation will be filtered out if `drop_obs` is `TRUE`. This option can be either a single `TRUE`/`FALSE`, meaning that all data sets will be treated the same, or a logical vector with names corresponding to one or more data sets in `obj$data`. For example, `c(abundance = TRUE, stats = FALSE)` would reassign observations in `obj$data$abundance`, but not in `obj$data$stats`.
reassign_taxa: (`logical` of length 1) If `TRUE`, subtaxa of removed taxa will be reassigned to the closest supertaxon that passed the filter. This is useful for removing intermediate levels of a taxonomy.
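
The interaction between drop_obs and reassign_obs described above can be explored with a small sketch. The data frame `toy` and its `count` column are hypothetical, invented only for illustration. The sketch assumes that parse_tax_data stores a data frame input as `obj$data$tax_data`. No output is shown here; inspect the printed tables to see how observations of removed taxa are handled in each case.

```r
library(metacoder)

# Hypothetical toy table: one row per observation, lineage in one column
toy <- data.frame(
  lineage = c("Plantae;Solanaceae;Solanum;lycopersicum",
              "Plantae;Solanaceae;Solanum;unknown",
              "Plantae;UNIDENTIFIED"),
  count = c(10, 3, 1),
  stringsAsFactors = FALSE
)
obj <- parse_tax_data(toy, class_cols = "lineage", class_sep = ";")

# Defaults: observations of removed taxa are reassigned to the closest
# supertaxon that passed the filter
reassigned <- filter_ambiguous_taxa(obj)

# No reassignment; observations of removed taxa are dropped (drop_obs = TRUE)
dropped <- filter_ambiguous_taxa(obj, reassign_obs = FALSE)

# No reassignment, but keep the observations (drop_obs = FALSE)
kept <- filter_ambiguous_taxa(obj, reassign_obs = FALSE, drop_obs = FALSE)

# Compare the observation tables (assumed to be stored as tax_data)
print(reassigned$data$tax_data)
print(dropped$data$tax_data)
print(kept$data$tax_data)
```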

Details

If you encounter a taxon name that represents an ambiguous taxon that is not filtered out by this function, let us know and we will add it.

Examples

obj <- parse_tax_data(c("Plantae;Solanaceae;Solanum;lycopersicum",
                        "Plantae;Solanaceae;Solanum;tuberosum",
                        "Plantae;Solanaceae;Solanum;unknown",
                        "Plantae;Solanaceae;Solanum;uncultured",
                        "Plantae;UNIDENTIFIED"))
filter_ambiguous_taxa(obj)
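
As a further sketch, the `unknown` and `uncultured` switches can be toggled independently. The variable names below are illustrative only. According to the argument descriptions, setting `uncultured = FALSE` should leave taxa with "uncultured"-style names in place while still removing "unknown"-style names; compare the printed taxon names to confirm this on your own data.

```r
library(metacoder)

obj <- parse_tax_data(c("Plantae;Solanaceae;Solanum;lycopersicum",
                        "Plantae;Solanaceae;Solanum;tuberosum",
                        "Plantae;Solanaceae;Solanum;unknown",
                        "Plantae;Solanaceae;Solanum;uncultured",
                        "Plantae;UNIDENTIFIED"))

# Default behavior: remove both kinds of ambiguous names
default_filtered <- filter_ambiguous_taxa(obj)

# Only filter placeholder names for unknown taxa
kept_uncultured <- filter_ambiguous_taxa(obj, uncultured = FALSE)

# Compare which taxon names remain in each result
print(default_filtered$taxon_names())
print(kept_uncultured$taxon_names())
```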