The ConTax data sets are tables in the FASTA format (see readFasta),
where the Header column contains text that follows a strict format.
The header always starts with a short text, a Tag, which is a unique identifier for every sequence.
The function getTag will extract this from the header.
The Tag is followed by one or more tokens. One of these tokens must be a string with the
following format:
"k__<...>;p__<...>;c__<...>;o__<...>;f__<...>;g__<...>;"
where each <...> is some proper text. Here is an example of a valid string:
"k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus;"
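To make the header layout concrete, here is a minimal sketch in Python of how such a header could be split into its Tag and its six taxonomic ranks. It is an illustration only, not the implementation behind getTag or the other extractors. It assumes that tokens are separated by whitespace and that a rank may be empty; the tag "Tag_1" and the names RANKS, TAXONOMY_PATTERN and parse_header are made up for this example.

```python
import re

# Prefix and rank name, in the order required by the taxonomy string.
RANKS = [("k", "Domain"), ("p", "Phylum"), ("c", "Class"),
         ("o", "Order"), ("f", "Family"), ("g", "Genus")]

# One capture group per rank; every rank is terminated by ';'.
# Assumption: a rank may be empty, hence '*' rather than '+'.
TAXONOMY_PATTERN = re.compile(
    "".join(prefix + r"__([^;]*);" for prefix, _ in RANKS)
)

def parse_header(header):
    """Return (tag, taxa), where taxa maps rank name to taxon text."""
    tokens = header.split()  # assumption: whitespace-separated tokens
    tag = tokens[0]          # the Tag always comes first
    for token in tokens[1:]:
        match = TAXONOMY_PATTERN.fullmatch(token)
        if match:
            names = [name for _, name in RANKS]
            return tag, dict(zip(names, match.groups()))
    raise ValueError("no taxonomy token found in header")

# "Tag_1" is a hypothetical tag used only for illustration.
example = ("Tag_1 k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;"
           "f__Staphylococcaceae;g__Staphylococcus;")
tag, taxa = parse_header(example)
print(tag, taxa["Genus"])
```

Matching the whole token with fullmatch means a token that lacks any of the six ranks, or lists them in another order, is rejected rather than partially parsed.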
The functions getDomain, ..., getGenus extract the
corresponding information from the header. getTaxonomy
runs all the taxonomy extractors, collects the results in a table,
and imputes missing taxa with parent taxa.
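The idea of imputing missing taxa with parent taxa can be sketched as follows. This is a Python illustration under stated assumptions, not the getTaxonomy implementation: it assumes a missing taxon is an empty string and that the imputed value is the name of the nearest known parent copied unchanged. The actual function may label imputed values differently. The name impute_with_parent is made up for this example.

```python
# Ranks from the most general to the most specific.
RANKS = ["Domain", "Phylum", "Class", "Order", "Family", "Genus"]

def impute_with_parent(taxa):
    """Fill each empty rank with the nearest non-empty parent rank."""
    filled = {}
    parent = ""  # nothing to inherit before the first rank
    for rank in RANKS:
        value = taxa.get(rank, "")
        if value == "":
            value = parent  # assumption: copy the parent name as-is
        filled[rank] = value
        parent = value
    return filled

# Family and Genus are unknown for this hypothetical sequence.
row = {"Domain": "Bacteria", "Phylum": "Firmicutes", "Class": "Bacilli",
       "Order": "Bacillales", "Family": "", "Genus": ""}
imputed = impute_with_parent(row)
print(imputed["Family"], imputed["Genus"])
```

Because the parent is updated after every rank, a run of several missing ranks inherits the same nearest known ancestor, here the Order.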