import_biom(BIOMfilename,
treefilename=NULL, refseqfilename=NULL, refseqFunction=readDNAStringSet, refseqArgs=NULL,
parseFunction=parse_taxonomy_default, parallel=FALSE, version=1.0, ...)
Arguments

BIOMfilename
[…] import_biom, and then "merge" the remaining data after you have
imported it with other tools, using the relatively general-purpose
data-merging function merge_phyloseq.

treefilename
Default is NULL.
A file representing a phylogenetic tree, or a phylo object.
Files can be in NEXUS or Newick format.
See read_tree for more details.
If you are using a recent release of the GreenGenes database tree,
try the read_tree_greengenes function instead;
it should solve some issues specific to importing that tree.
If provided, the tree should have the same OTUs/tip-labels
as the OTUs in the other files.
Any taxa or samples missing from one of the files are removed from all.
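The tip-label matching described above can be illustrated with a small sketch. It assumes the phyloseq and ape packages are installed, and it uses a random tree from ape::rtree purely as a stand-in for a real phylogeny. The object names (biom_file, physeq, kept_otus, stand_in_tree) are hypothetical.

```r
library("phyloseq")
library("ape")

# Example BIOM file shipped with phyloseq
biom_file <- system.file("extdata", "rich_dense_otu_table.biom", package = "phyloseq")

# First import without a tree, only to learn the OTU names
physeq <- import_biom(biom_file, parseFunction = parse_taxonomy_greengenes)

# Stand-in tree (random topology, NOT a real phylogeny) that deliberately
# omits the last OTU, so that the tree and the abundance table disagree
kept_otus <- head(taxa_names(physeq), ntaxa(physeq) - 1)
stand_in_tree <- rtree(length(kept_otus), rooted = TRUE, tip.label = kept_otus)

# Supply the already-imported phylo object in place of a file path
physeq_tree <- import_biom(biom_file, treefilename = stand_in_tree,
                           parseFunction = parse_taxonomy_greengenes)

# Compare OTU counts before and after supplying the tree
ntaxa(physeq)
ntaxa(physeq_tree)
```

If the pruning behaves as described, the OTU that is absent from the tree should also be absent from the combined object, so the second count should be lower than the first.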
As an example from the QIIME pipeline,
this tree would be either a tree of the representative 16S rRNA sequences from each OTU
cluster, with the number of leaves/tips equal to the number of taxa/species/OTUs,
or the complete reference database tree that contains the OTU identifiers
of every OTU in your abundance table.
Note that this argument can be a tree object (phylo-class)
for cases where the tree has been, or needs to be, imported separately,
as in the case of the GreenGenes tree mentioned earlier (read_tree_greengenes).

refseqfilename
Default is NULL.
The file path of the biological sequence file that contains, at a minimum,
a sequence for each OTU in the dataset.
Alternatively, you may provide an already-imported XStringSet object that satisfies this condition.
In either case, the names of the OTUs need to match the taxa_names
of the other components of your data exactly.
If this is not the case, for example if the data file is in FASTA format but
contains additional information after the OTU name in each sequence header,
then some additional parsing is necessary,
which you can either perform separately before calling this function
or describe explicitly in a custom function provided in the next argument, refseqFunction.
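The header-parsing case described above could be handled with a small wrapper along these lines. This is a minimal sketch, assuming the Biostrings package is installed, that the custom function is called with the file path as its first argument, and that each FASTA header starts with the OTU name followed by whitespace. The function name read_trimmed_fasta and the toy sequences are hypothetical.

```r
library("Biostrings")

# Hypothetical reader: import DNA sequences, then keep only the text
# before the first whitespace in each header as the sequence name
read_trimmed_fasta <- function(filepath, ...) {
  seqs <- readDNAStringSet(filepath, ...)
  names(seqs) <- sub("\\s.*$", "", names(seqs))
  seqs
}

# Toy FASTA file with extra information after each OTU name
fasta_file <- tempfile(fileext = ".fasta")
writeLines(c(">OTU1 extra description", "ACGTACGT",
             ">OTU2 another description", "TTGGCCAA"),
           fasta_file)

refseq <- read_trimmed_fasta(fasta_file)
names(refseq)
```

A function like this could then be supplied as refseqFunction, or run beforehand and its result passed directly as refseqfilename.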
Note that the XStringSet class can represent any
arbitrary sequence, including user-defined subclasses, but is most often
used to represent RNA, DNA, or amino acid sequences.
The only constraint is that this special list of sequences
has exactly one named element for each OTU in the dataset.

refseqFunction
Default is readDNAStringSet,
which expects to read a FASTA-formatted DNA sequence file.
If your reference sequences for each OTU are amino acid, RNA, or something else,
then you will need to specify a different function here.
This is the function used to read the file connection provided as
the previous argument, refseqfilename.
This argument is ignored if refseqfilename is already an XStringSet object.

refseqArgs
Default is NULL.
Additional arguments to refseqFunction.
See XStringSet-io for details about
additional arguments to the standard read functions in the Biostrings package.

parseFunction
Default is parse_taxonomy_default.
There are many variations on taxonomic nomenclature, and on the naming
conventions used to store that information in various taxonomic
databases and phylogenetic assignment algorithms. A popular database,
GreenGenes, has its own parsing function, parse_taxonomy_greengenes,
and more can be contributed or posted as code snippets as needed.
Parsing functions can be custom-defined by a user immediately prior to the call to
import_biom, and this is a suggested first step to take
when troubleshooting taxonomy-related errors during file import.

parallel
Default is FALSE. Corresponds to the .parallel
parameter in plyr-package functions. If TRUE, apply
parsing functions in parallel, using the parallel backend provided by
foreach and its supporting backend packages. One caveat:
plyr parallelization currently works most cleanly with multicore-like
backends (Mac OS X, Unix?), and may throw warnings for SNOW-like backends.
See the example below for code invoking a multicore-style backend within
the doParallel package. Finally, for many datasets a parallel import should not be necessary,
because a serial import will be just as fast and the import is often
performed only once, after which the data should be saved as an RData file
using the save function.
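The import-once-then-save workflow suggested above might look like the following sketch. It assumes phyloseq is installed and uses the example BIOM file shipped with the package. The object and file names (physeq, rdata_file) are hypothetical.

```r
library("phyloseq")

# Import the example BIOM file once
biom_file <- system.file("extdata", "rich_dense_otu_table.biom", package = "phyloseq")
physeq <- import_biom(biom_file, parseFunction = parse_taxonomy_greengenes)

# Save the imported object to an RData file (a temporary location here)
rdata_file <- file.path(tempdir(), "physeq_import.RData")
save(physeq, file = rdata_file)

# In a later session, restore the object instead of re-importing
rm(physeq)
load(rdata_file)  # recreates the object named `physeq`
```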

version
Default is 1.0.
Not yet implemented. Parsing of the biom-format is done mostly
by the biom package, now available on CRAN.

...
Additional arguments passed to read_tree.

Value
A phyloseq-class object.

See also
import

Examples
# An included example of a rich dense biom file
rich_dense_biom <- system.file("extdata", "rich_dense_otu_table.biom", package="phyloseq")
import_biom(rich_dense_biom, parseFunction=parse_taxonomy_greengenes)
# An included example of a rich sparse biom file
rich_sparse_biom <- system.file("extdata", "rich_sparse_otu_table.biom", package="phyloseq")
import_biom(rich_sparse_biom, parseFunction=parse_taxonomy_greengenes)
# Example code for importing a large file with a parallel backend
# library("doParallel")
# registerDoParallel(cores=6)
# import_biom("my/file/path/file.biom", parseFunction=parse_taxonomy_greengenes, parallel=TRUE)
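The user-defined parsing functions mentioned for the parseFunction argument can be sketched as follows. This is only an illustration, assuming (by analogy with parse_taxonomy_default) that the function receives a character vector of taxonomic labels for a single OTU and returns a named character vector. The function name parse_taxonomy_strip_prefix, the seven rank names, and the "k__"-style prefixes are hypothetical choices.

```r
# Hypothetical parser: trim whitespace, strip prefixes such as "k__" or "p__",
# name the entries by rank, and drop ranks that are left empty.
# Limitation of this sketch: vectors longer than seven entries are not handled.
parse_taxonomy_strip_prefix <- function(char.vec) {
  rank_names <- c("Kingdom", "Phylum", "Class", "Order",
                  "Family", "Genus", "Species")
  cleaned <- sub("^[a-z]__", "", trimws(char.vec))
  names(cleaned) <- rank_names[seq_along(cleaned)]
  cleaned[nzchar(cleaned)]
}

# Try the parser on one OTU's labels before using it in an import
parsed <- parse_taxonomy_strip_prefix(c("k__Bacteria", " p__Firmicutes", "c__"))
parsed

# It could then be passed to the importer, for example:
# import_biom(rich_dense_biom, parseFunction = parse_taxonomy_strip_prefix)
```

Testing the parser on a single vector first makes it easier to separate taxonomy-parsing problems from other import errors.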