Filter out taxa with ambiguous names, such as "unknown" or "uncultured".
NOTE: Some parameters of this function are passed to filter_taxa with the "invert" option set to TRUE. It works the same way as filter_taxa for the most part.
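To make the relationship with filter_taxa concrete, here is a minimal sketch of a hand-written equivalent. The pattern `"unknown|uncultured"` is an assumption chosen for illustration; it is not the list of ambiguous names that the package uses internally, so results may differ from filter_ambiguous_taxa on real data.

```r
library(metacoder)

obj <- parse_tax_data(c("Plantae;Solanaceae;Solanum;lycopersicum",
                        "Plantae;Solanaceae;Solanum;unknown"))

# Illustrative pattern only (an assumption, not the package's internal list).
# With invert = TRUE, taxa matching the condition are removed rather than kept.
filtered <- filter_taxa(
  obj,
  grepl("unknown|uncultured", taxon_names, ignore.case = TRUE),
  invert = TRUE
)

print(taxon_names(filtered))
```

In practice filter_ambiguous_taxa is likely the better choice, since it covers more name variants than a single hand-written pattern.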
filter_ambiguous_taxa(
obj,
unknown = TRUE,
uncultured = TRUE,
name_regex = ".",
ignore_case = TRUE,
subtaxa = FALSE,
drop_obs = TRUE,
reassign_obs = TRUE,
reassign_taxa = TRUE
)
Value: A taxmap object
obj: A taxmap object
unknown: If TRUE, remove taxa with names that suggest they are placeholders for unknown taxa (e.g. "unknown ...").
uncultured: If TRUE, remove taxa with names that suggest they are assigned to uncultured organisms (e.g. "uncultured ...").
name_regex: The regex code to match a valid character in a taxon name. For example, "[a-z]" would mean taxon names can only contain lowercase letters.
ignore_case: If TRUE, do not consider the case of the text when determining a match.
subtaxa: (`logical` or `numeric` of length 1) If `TRUE`, include subtaxa of taxa passing the filter. Positive numbers indicate the number of ranks below the target taxa to return. `0` is equivalent to `FALSE`. Negative numbers are equivalent to `TRUE`.
drop_obs: (`logical`) This option only applies to [taxmap()] objects. If `FALSE`, include observations (i.e. user-defined data in `obj$data`) even if the taxon they are assigned to is filtered out. Observations assigned to removed taxa will be assigned to `NA`. This option can be either simply `TRUE`/`FALSE`, meaning that all data sets will be treated the same, or a logical vector can be supplied with names corresponding to one or more data sets in `obj$data`. For example, `c(abundance = FALSE, stats = TRUE)` would include observations whose taxon was filtered out in `obj$data$abundance`, but not in `obj$data$stats`. See the `reassign_obs` option below for further complications.
reassign_obs: (`logical` of length 1) This option only applies to [taxmap()] objects. If `TRUE`, observations (i.e. user-defined data in `obj$data`) assigned to removed taxa will be reassigned to the closest supertaxon that passed the filter. If there are no supertaxa of such an observation that passed the filter, they will be filtered out if `drop_obs` is `TRUE`. This option can be either simply `TRUE`/`FALSE`, meaning that all data sets will be treated the same, or a logical vector can be supplied with names corresponding to one or more data sets in `obj$data`. For example, `c(abundance = TRUE, stats = FALSE)` would reassign observations in `obj$data$abundance`, but not in `obj$data$stats`.
reassign_taxa: (`logical` of length 1) If `TRUE`, subtaxa of removed taxa will be reassigned to the closest supertaxon that passed the filter. This is useful for removing intermediate levels of a taxonomy.
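The interaction between the per-character regex and the case option described above can be hard to picture, so here is a base R sketch. The helper `matches_ambiguous` and the way it wraps the ambiguous word in zero or more "valid characters" are assumptions made for illustration; the package may build its patterns differently.

```r
# Hypothetical helper, not part of metacoder: flags names that contain an
# ambiguous word surrounded only by characters matching `name_regex`.
matches_ambiguous <- function(taxon_names, word = "unknown",
                              name_regex = ".", ignore_case = TRUE) {
  # Assumed pattern shape: <valid chars>* word <valid chars>*
  pattern <- paste0("^", name_regex, "*", word, name_regex, "*$")
  grepl(pattern, taxon_names, ignore.case = ignore_case)
}

taxa <- c("unknown bacterium", "Unknown", "Solanum", "unknown_sp1")

matches_ambiguous(taxa)                         # TRUE TRUE FALSE TRUE
matches_ambiguous(taxa, name_regex = "[a-z ]")  # TRUE TRUE FALSE FALSE
matches_ambiguous(taxa, ignore_case = FALSE)    # TRUE FALSE FALSE TRUE
```

Under this assumed pattern, a stricter character class means fewer names count as matches: "unknown_sp1" is no longer flagged once "_" and digits are not valid characters.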
If you encounter a taxon name that represents an ambiguous taxon that is not filtered out by this function, let us know and we will add it.
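library(metacoder)

# Build a small taxonomy that contains several ambiguous names
obj <- parse_tax_data(c("Plantae;Solanaceae;Solanum;lycopersicum",
                        "Plantae;Solanaceae;Solanum;tuberosum",
                        "Plantae;Solanaceae;Solanum;unknown",
                        "Plantae;Solanaceae;Solanum;uncultured",
                        "Plantae;UNIDENTIFIED"))

# Remove the ambiguous taxa using the default options
filter_ambiguous_taxa(obj)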
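As a further sketch, the calls below vary the options documented above. Only arguments shown in the usage section are used; the comments about observations restate the argument descriptions and have not been checked against package output here.

```r
library(metacoder)

obj <- parse_tax_data(c("Plantae;Solanaceae;Solanum;lycopersicum",
                        "Plantae;Solanaceae;Solanum;unknown",
                        "Plantae;Solanaceae;Solanum;uncultured"))

# Remove only "uncultured"-style names and leave "unknown"-style names in place
only_uncultured <- filter_ambiguous_taxa(obj, unknown = FALSE)

# Keep observations whose taxon was removed instead of dropping or
# reassigning them (per the argument descriptions, they are assigned to NA)
keep_obs <- filter_ambiguous_taxa(obj, drop_obs = FALSE, reassign_obs = FALSE)

print(taxon_names(only_uncultured))
print(taxon_names(keep_obs))
```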