".clust" file or its apparent sequence naming convention.

import_RDP_cluster(RDP_cluster_file)
RDP_cluster_file: the name of the ".clust" file produced by the complete
linkage clustering step of the RDP pipeline.

Returns an otu_table object parsed from the ".clust" file.

http://pyro.cme.msu.edu/index.jsp
The cluster file itself contains the names of all sequences in the input
alignment. If the upstream barcode and alignment processing steps were also
done with the RDP pipeline, then the sequence names follow a predictable naming
convention. Each sequence is named by its sample and sequence ID, with an
underscore ("_") as the delimiter:

"sampleName_sequenceIDnumber"
This import function makes two assumptions. The sequence names in the cluster
file must follow this convention, and the sample names must not contain any "_".
It is unlikely to work if either assumption is violated. It should work if you
used the upstream steps of the RDP pipeline to process your raw (barcoded,
untrimmed) fasta/fastq data.
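A minimal Python sketch of the naming assumption above follows. This is not phyloseq's R code, and the helper name `split_rdp_name` is hypothetical. It splits each name at the first "_", which is one reasonable way to apply the convention. It also shows how a sample name that itself contains "_" is silently mis-parsed.

```python
def split_rdp_name(seq_name: str) -> tuple:
    """Split an RDP-style sequence name into (sample, sequence ID).

    Hypothetical helper. Assumes the convention "sampleName_sequenceIDnumber"
    and that the sample name contains no "_", so it splits at the first "_".
    """
    sample, sep, seq_id = seq_name.partition("_")
    if not sep:
        raise ValueError(f"no '_' delimiter in sequence name: {seq_name!r}")
    return sample, seq_id


# Parses correctly when the sample name has no underscore.
print(split_rdp_name("soilA_000123"))   # ('soilA', '000123')

# Mis-parses when the sample name contains an underscore:
# sample "soil_A" is read as sample "soil" with ID "A_000123".
print(split_rdp_name("soil_A_000123"))  # ('soil', 'A_000123')
```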
This function first loops through the ".clust" file and collects all of the
sample names that appear. It then loops through each OTU ("cluster", i.e. each
row of the cluster file) and sums the number of sequences (reads) from each
sample. The resulting OTU-by-sample abundance table is coerced to an otu_table
object and returned.
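The two-pass tabulation described above can be sketched in Python. This is an illustration, not the R implementation behind import_RDP_cluster, and the function name `clusters_to_abundance` is hypothetical. The sketch assumes the ".clust" rows have already been parsed into a dict that maps each cluster (OTU) ID to its member sequence names, because the on-disk row layout is not documented here. The final coercion to an otu_table object is not shown.

```python
from collections import Counter


def clusters_to_abundance(clusters):
    """Build an OTU-by-sample read-count table from parsed clusters.

    Hypothetical helper. `clusters` maps an OTU ID to the list of sequence
    names in that cluster (assumed already parsed from the ".clust" file).
    The sample name is taken as the text before the first "_".
    """
    # Pass 1: collect every sample name that appears in any cluster.
    samples = sorted(
        {name.partition("_")[0] for members in clusters.values() for name in members}
    )

    # Pass 2: for each OTU, count the reads contributed by each sample.
    table = {}
    for otu_id, members in clusters.items():
        counts = Counter(name.partition("_")[0] for name in members)
        # Counter returns 0 for samples absent from this OTU.
        table[otu_id] = [counts[s] for s in samples]
    return samples, table


clusters = {"OTU1": ["s1_001", "s1_002", "s2_001"], "OTU2": ["s2_002"]}
samples, table = clusters_to_abundance(clusters)
print(samples)  # ['s1', 's2']
print(table)    # {'OTU1': [2, 1], 'OTU2': [0, 1]}
```

Sorting the sample names gives a deterministic column order. Whether the R function orders samples the same way is not stated here.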