Originally, QIIME produced its own custom format table
that contained both OTU-abundance
and taxonomic identity information.
This function is still included in phyloseq mainly to accommodate these
now-outdated files. Recent versions of QIIME store output in the
biom-format, an emerging file format standard for microbiome data.
If your data are in the biom-format, that is, if the file name ends with the
.biom extension, then you should use the import_biom function instead.

import_qiime(otufilename = NULL, mapfilename = NULL, treefilename = NULL,
refseqfilename = NULL, refseqFunction = readDNAStringSet,
refseqArgs = NULL, parseFunction = parse_taxonomy_qiime, verbose = TRUE,
...)
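Because the choice between the two importers depends only on whether the file is in the biom-format, it can be made from the file name. The sketch below assumes the .biom extension is a reliable signal for your files; is_biom_file() and import_otu_data() are hypothetical helpers written for illustration, not phyloseq functions. Only import_biom and import_qiime are real phyloseq functions. Since import_biom does not take a QIIME map file, the sketch forwards only the tree and reference-sequence arguments, which both importers accept.

```r
# Hypothetical helper (not part of phyloseq): TRUE when the file name
# ends in ".biom", ignoring case.
is_biom_file <- function(path) {
  grepl("\\.biom$", path, ignore.case = TRUE)
}

# Hypothetical dispatcher (not part of phyloseq): picks the importer
# from the file extension and forwards the arguments both importers share.
import_otu_data <- function(path, treefilename = NULL, refseqfilename = NULL) {
  if (is_biom_file(path)) {
    # biom-format file: use the dedicated biom importer
    phyloseq::import_biom(path, treefilename = treefilename,
                          refseqfilename = refseqfilename)
  } else {
    # Legacy QIIME table with OTU abundances and taxonomy in one file
    phyloseq::import_qiime(otufilename = path, treefilename = treefilename,
                           refseqfilename = refseqfilename)
  }
}
```

Note that a compressed file such as otu_table.biom.gz would not match this simple check.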
otufilename: Default value is NULL.

mapfilename: Default value is NULL.

treefilename: Default value is NULL.
A file representing a phylogenetic tree or a phylo object.
Files can be in NEXUS or Newick format; see read_tree for more details.
If you are using a recent release of the GreenGenes database tree,
try the read_tree_greengenes function instead, which should solve some
issues specific to importing that tree.
If provided, the tree should have the same OTUs/tip-labels as the OTUs
in the other files. Any taxa or samples missing from one of the files
are removed from all of them.
As an example from the QIIME pipeline, this tree would be a tree of the
representative 16S rRNA sequences from each OTU cluster, with the number of
leaves/tips equal to the number of taxa/species/OTUs, or the complete
reference database tree that contains the OTU identifiers of every OTU in
your abundance table.
Note that this argument can also be a tree object (phylo-class) for cases
where the tree has been, or needs to be, imported separately, as in the case
of the GreenGenes tree mentioned above (read_tree_greengenes).
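Conceptually, the pruning described above keeps only the OTU identifiers shared by every component. The base-R sketch below illustrates that idea with made-up identifiers; shared_ids() is a hypothetical helper, and phyloseq's own internal pruning may be implemented differently. Running a similar check on your OTU table row names and your tree's tip labels (for example tree$tip.label of an ape phylo object) before import can reveal mismatched names early.

```r
# Hypothetical helper (not a phyloseq function): returns the IDs present
# in every supplied character vector, i.e. the OTUs that survive pruning.
shared_ids <- function(...) Reduce(intersect, list(...))

# Made-up identifiers for three components
otu_ids  <- c("OTU1", "OTU2", "OTU3", "OTU4")          # OTU table rows
tip_labs <- c("OTU2", "OTU3", "OTU4", "OTU5")          # tree tip labels
seq_ids  <- c("OTU1", "OTU2", "OTU3", "OTU4", "OTU6")  # reference sequence names

kept <- shared_ids(otu_ids, tip_labs, seq_ids)
kept     # "OTU2" "OTU3" "OTU4"

# IDs missing from at least one component, which would be dropped
dropped <- setdiff(Reduce(union, list(otu_ids, tip_labs, seq_ids)), kept)
dropped  # "OTU1" "OTU5" "OTU6"
```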
refseqfilename: Default value is NULL.
The file path of the biological sequence file that contains, at a minimum,
a sequence for each OTU in the dataset.
Alternatively, you may provide an already-imported XStringSet object that
satisfies this condition.
In either case, the name of each OTU must exactly match the taxa_names of
the other components of your data.
If this is not the case, for example if the data file is in FASTA format but
contains additional information after the OTU name in each sequence header,
then some additional parsing is necessary. You can either perform it
separately before calling this function, or describe it explicitly in a
custom function provided in the next argument, refseqFunction.
Note that the XStringSet class can represent arbitrary sequences, including
user-defined subclasses, but is most often used to represent RNA, DNA, or
amino acid sequences. The only constraint is that this special list of
sequences has exactly one named element for each OTU in the dataset.

refseqFunction: Default value is readDNAStringSet, which expects to read a
FASTA-formatted DNA sequence file.
If your reference sequences for each OTU are amino acid, RNA, or something
else, then you will need to specify a different function here.
This is the function used to read the file connection provided in the
previous argument, refseqfilename.
This argument is ignored if refseqfilename is already an XStringSet object.
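One way to handle FASTA headers that carry extra text after the OTU name is to wrap the reader passed as refseqFunction. The sketch below assumes the OTU name is everything before the first whitespace in each header, and that import_qiime calls refseqFunction with the file path as its first argument. make_refseq_reader() is a hypothetical helper, and rep_set.fna in the usage comment is a placeholder file name.

```r
# Hypothetical helper, not part of phyloseq or Biostrings.
# Wraps a sequence reader (e.g. Biostrings::readDNAStringSet) so that each
# sequence name is truncated at its first whitespace character, keeping
# only the leading OTU identifier.
make_refseq_reader <- function(reader) {
  force(reader)  # capture the reader now rather than lazily
  function(filepath, ...) {
    seqs <- reader(filepath, ...)
    names(seqs) <- sub("[[:space:]].*$", "", names(seqs))
    seqs
  }
}

# Usage sketch (not run; "rep_set.fna" is a placeholder file name):
# import_qiime(otufile, mapfile,
#              refseqfilename = "rep_set.fna",
#              refseqFunction = make_refseq_reader(Biostrings::readDNAStringSet))
```

For amino acid sequences, Biostrings::readAAStringSet could be wrapped in the same way.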
refseqArgs: Default value is NULL.
Additional arguments to refseqFunction.
See XStringSet-io for details about additional arguments to the standard
read functions in the Biostrings package.

parseFunction: Default value is parse_taxonomy_qiime, which is specialized
for splitting the ";"-delimited strings and also attempts to interpret
GreenGenes prefixes, if any, as that is a common format of the taxonomy
string produced by QIIME.

...: Additional arguments passed on to read_tree.
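To make the parsing step concrete, here is a simplified base-R sketch of splitting one QIIME-style taxonomy string. It is an illustration only: split_qiime_taxonomy() is hypothetical, it is not the implementation of parse_taxonomy_qiime, and it is not a drop-in replacement for parseFunction. The rank names (Kingdom to Species) and the Rank1, Rank2, ... fallback for unprefixed fields are this sketch's own choices.

```r
# Hypothetical illustration; NOT phyloseq's parse_taxonomy_qiime().
split_qiime_taxonomy <- function(tax_string) {
  # Split on ";" and strip the whitespace that often follows each delimiter
  fields <- trimws(strsplit(tax_string, ";", fixed = TRUE)[[1]])
  # GreenGenes-style prefixes: one lowercase rank letter followed by "__"
  has_prefix <- grepl("^[a-z]__", fields)
  first_char <- substr(fields, 1, 1)
  rank_map <- c(k = "Kingdom", p = "Phylum", c = "Class", o = "Order",
                f = "Family", g = "Genus", s = "Species")
  known <- has_prefix & first_char %in% names(rank_map)
  # Prefixed fields take the mapped rank name; others get a positional name
  rank_labels <- paste0("Rank", seq_along(fields))
  rank_labels[known] <- rank_map[first_char[known]]
  # Remove the prefix itself, leaving only the taxon name
  values <- sub("^[a-z]__", "", fields)
  names(values) <- rank_labels
  values
}

# Named vector: Kingdom = "Bacteria", Phylum = "Firmicutes", Class = "Bacilli"
split_qiime_taxonomy("k__Bacteria; p__Firmicutes; c__Bacilli")
```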
Value: a phyloseq-class object.

The QIIME map file corresponds to the sample_data-class component data type
in the phyloseq-package.
QIIME may also produce a phylogenetic tree with a tip for each OTU, which can
either be specified here or imported separately using read_tree.
The different files useful for import to phyloseq are not collocated in a
typical run of the QIIME pipeline. See the main phyloseq vignette for an
example of where to find the relevant files in the output directory.
"QIIME allows analysis of high-throughput community sequencing data." J Gregory Caporaso, Justin Kuczynski, Jesse Stombaugh, Kyle Bittinger, Frederic D Bushman, Elizabeth K Costello, Noah Fierer, Antonio Gonzalez Pena, Julia K Goodrich, Jeffrey I Gordon, Gavin A Huttley, Scott T Kelley, Dan Knights, Jeremy E Koenig, Ruth E Ley, Catherine A Lozupone, Daniel McDonald, Brian D Muegge, Meg Pirrung, Jens Reeder, Joel R Sevinsky, Peter J Turnbaugh, William A Walters, Jeremy Widmann, Tanya Yatsunenko, Jesse Zaneveld and Rob Knight; Nature Methods, 2010; doi:10.1038/nmeth.f.303
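library(phyloseq)
# Example QIIME output files shipped with phyloseq
otufile <- system.file("extdata", "GP_otu_table_rand_short.txt.gz", package = "phyloseq")
mapfile <- system.file("extdata", "master_map.txt", package = "phyloseq")
trefile <- system.file("extdata", "GP_tree_rand_short.newick.gz", package = "phyloseq")
# Combine the OTU table, taxonomy, sample map, and tree into one phyloseq object
import_qiime(otufile, mapfile, trefile)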