import_qiime_otu_tax: Import a legacy-format QIIME OTU table as a list of two matrices.

Description

Older versions of QIIME produced an OTU file, now considered a legacy format, that typically contains both OTU-abundance and taxonomic identity information in a tab-delimited table. If your file ends with the extension .biom, or if you happen to know that it is a biom-format file, or if you used default settings in QIIME version 1.7 or greater, then YOU SHOULD USE THE BIOM-IMPORT FUNCTION instead, import_biom.

Usage

import_qiime_otu_tax(file, parseFunction = parse_taxonomy_qiime,
  verbose = TRUE, parallel = FALSE)

Arguments

file

(Required). The path to the QIIME-formatted file you want to import into R. The file can be compressed (e.g. .gz), though support for compressed files may be OS-specific, so take particular care on Windows.

parseFunction

(Optional). A custom function for parsing the character string that contains the taxonomic assignment of each OTU. The default parsing function is parse_taxonomy_qiime, specialized for splitting the ";"-delimited strings and also attempting to interpret Greengenes prefixes, if any, as that is a common format of the taxonomy string produced by QIIME.
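
As an illustration of what a custom parser might look like, here is a minimal sketch in base R. The function name parse_taxonomy_simple and the seven rank labels are hypothetical, chosen only for this example. It is also an assumption that the parser may receive either a single ";"-delimited string or an already-split character vector, so the sketch accepts both; it is not the implementation of parse_taxonomy_qiime.

```r
# Hypothetical custom parser (name and rank labels are illustrative only).
parse_taxonomy_simple <- function(char.vec) {
  # Assumption: input is either one ";"-delimited string or a vector
  # that has already been split into one element per rank.
  if (length(char.vec) == 1L) {
    char.vec <- strsplit(char.vec, ";", fixed = TRUE)[[1]]
  }
  # Trim surrounding whitespace and strip prefixes such as "k__" or "p__"
  char.vec <- trimws(char.vec)
  char.vec <- sub("^[a-z]__", "", char.vec)
  # Label the entries by rank, keeping at most seven ranks
  ranks <- c("Kingdom", "Phylum", "Class", "Order",
             "Family", "Genus", "Species")
  char.vec <- head(char.vec, length(ranks))
  names(char.vec) <- ranks[seq_along(char.vec)]
  char.vec
}

# Returns a named character vector with entries Kingdom, Phylum and Class
parse_taxonomy_simple("k__Bacteria; p__Firmicutes; c__Bacilli")
```

If it behaves as expected with your taxonomy strings, such a function could then be supplied as parseFunction = parse_taxonomy_simple.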

verbose

(Optional). A logical. Default is TRUE. Should progress messages be printed to standard output?

parallel

(Optional). Logical. Should the parsing be performed in parallel? Default is FALSE. Only a few steps are actually parallelized, and for most datasets it will be faster and more efficient to keep this set to FALSE. Also, to get any benefit at all, you will need to register a parallel "backend" through one of the backend packages supported by the foreach package.
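
The following sketch shows one way a backend might be registered before calling the importer with parallel = TRUE. It assumes the doParallel package as the backend and two cores, both arbitrary choices for illustration; other foreach-compatible backends should work similarly. The calls are guarded so the code does nothing if the packages are not installed.

```r
# Assumption: doParallel is the chosen foreach backend.
backend_ready <- requireNamespace("doParallel", quietly = TRUE)

if (backend_ready) {
  # Register an implicit two-worker backend for foreach
  doParallel::registerDoParallel(cores = 2)

  if (requireNamespace("phyloseq", quietly = TRUE)) {
    otufile <- system.file("extdata", "GP_otu_table_rand_short.txt.gz",
                           package = "phyloseq")
    res <- phyloseq::import_qiime_otu_tax(otufile, parallel = TRUE)
  }

  # Release any implicitly created workers when finished
  doParallel::stopImplicitCluster()
}
```

For a file as small as this example, the parallel call is unlikely to be faster than the default.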

Value

A list of two matrices. $otutab contains the OTU Table as a numeric matrix, while $taxtab contains a character matrix of the taxonomy assignments.
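
A short sketch of how the returned list might be inspected and combined into a phyloseq object. It assumes that OTUs are stored as rows of $otutab (hence taxa_are_rows = TRUE); check the dimensions of your own result before relying on that.

```r
res <- NULL  # stays NULL if phyloseq is not installed
if (requireNamespace("phyloseq", quietly = TRUE)) {
  otufile <- system.file("extdata", "GP_otu_table_rand_short.txt.gz",
                         package = "phyloseq")
  res <- phyloseq::import_qiime_otu_tax(otufile, verbose = FALSE)

  # Inspect the two components of the list
  print(dim(res$otutab))   # numeric abundance matrix
  print(head(res$taxtab))  # character taxonomy matrix

  # Assumption: OTUs are rows in otutab
  ps <- phyloseq::phyloseq(
    phyloseq::otu_table(res$otutab, taxa_are_rows = TRUE),
    phyloseq::tax_table(res$taxtab)
  )
}
```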

Details

This function uses chunking to perform both the reading and parsing in blocks of optional size, thus constraining the peak memory usage. This feature should make the importer accessible to machines with modest memory, but with the caveat that the full numeric matrix must be a manageable size at the end, too. In principle, the final tables will be large, but much more efficiently represented than the character-stored numbers. If total memory for storing the numeric matrix becomes problematic, a switch to a sparse matrix representation of the abundance, which is typically well-suited to this data, might provide a solution.
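
To make the chunking idea concrete, here is a minimal base-R sketch that reads a small tab-delimited table in blocks and converts each block to numeric before combining. It is not the function's actual implementation; the helper name read_in_chunks, the chunk size, and the toy data are all made up for illustration. The final line shows the sparse alternative using the Matrix package, if it is available.

```r
# Toy tab-delimited table written to a temporary file (made-up data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("#OTU ID\tS1\tS2",
             "otu1\t5\t0",
             "otu2\t0\t3",
             "otu3\t7\t1",
             "otu4\t0\t0",
             "otu5\t2\t9"), tmp)

# Hypothetical helper: read and parse chunk_size lines at a time
read_in_chunks <- function(path, chunk_size = 2L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- strsplit(readLines(con, n = 1L), "\t", fixed = TRUE)[[1]]
  blocks <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    # Split the block into fields; first column holds the OTU IDs
    fields <- do.call(rbind, strsplit(lines, "\t", fixed = TRUE))
    # Convert counts to numeric right away so the character copy
    # of each block can be discarded
    block <- matrix(as.numeric(fields[, -1, drop = FALSE]),
                    nrow = nrow(fields),
                    dimnames = list(fields[, 1], header[-1]))
    blocks[[length(blocks) + 1L]] <- block
  }
  do.call(rbind, blocks)
}

otutab <- read_in_chunks(tmp, chunk_size = 2L)

# Optional: sparse representation, useful when most counts are zero
if (requireNamespace("Matrix", quietly = TRUE)) {
  sparse_otutab <- Matrix::Matrix(otutab, sparse = TRUE)
}
```

Only one block of character data is held at a time, which is what keeps peak memory below that of reading and parsing the whole file at once.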

Examples

# Locate the example legacy-format OTU table shipped with phyloseq, then import it
otufile <- system.file("extdata", "GP_otu_table_rand_short.txt.gz", package="phyloseq")
import_qiime_otu_tax(otufile)
