See phyloseq_to_deseq2 for a recommended alternative to rarefying that is
directly supported in the phyloseq package.
This function uses the R sample function to
resample from the abundance values in the otu_table
component of the first argument, physeq.
Often one of the major goals of this procedure is to achieve parity in
total number of counts between samples, as an alternative to other formal
normalization procedures, which is why a single value for
sample.size is expected.
This kind of resampling can be performed with and without replacement,
with replacement being the more computationally-efficient, default setting.
See the replace
parameter documentation for more details.
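
The resampling described above can be illustrated with a minimal base-R sketch for a single sample. The helper name rarefy_counts and the count values are hypothetical and are not part of phyloseq. The sketch only assumes that subsampling is done with sample-style draws (here sample.int) and is not meant to reproduce the package's internal implementation.

```r
# Hypothetical helper (not part of phyloseq): subsample one sample's
# OTU counts down to a fixed depth, with or without replacement.
rarefy_counts <- function(counts, depth, replace = TRUE) {
  stopifnot(depth >= 0, all(counts >= 0))
  n_otus <- length(counts)
  if (replace) {
    # Draw OTU indices with probability proportional to observed abundance.
    draws <- sample.int(n_otus, size = depth, replace = TRUE, prob = counts)
  } else {
    # Expand the counts into one entry per read, then draw reads
    # without replacement, so no OTU can exceed its original count.
    stopifnot(depth <= sum(counts))
    reads <- rep.int(seq_len(n_otus), times = counts)
    draws <- reads[sample.int(length(reads), size = depth)]
  }
  # Count how many draws landed on each OTU.
  tabulate(draws, nbins = n_otus)
}

set.seed(711)                      # seed chosen arbitrarily for this sketch
counts <- c(50, 30, 0, 20)         # made-up counts for four OTUs
with_rep    <- rarefy_counts(counts, depth = 40, replace = TRUE)
without_rep <- rarefy_counts(counts, depth = 40, replace = FALSE)

sum(with_rep)                      # 40: total equals the requested depth
sum(without_rep)                   # 40
all(without_rep <= counts)         # TRUE: original counts are an upper bound
```

In this sketch the without-replacement branch has to materialize one entry per read before drawing, which suggests why that mode can cost more time and memory than drawing with replacement.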
We recommend that you explicitly select a random number generator seed
before invoking this function, or, alternatively, that you
explicitly provide a single positive integer argument as rngseed.

rarefy_even_depth(physeq, sample.size = min(sample_sums(physeq)),
  rngseed = FALSE, replace = TRUE, trimOTUs = TRUE, verbose = TRUE)
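
physeq: A phyloseq-class object that you want to trim/filter.

sample.size: A single value giving the number of reads to retain in each sample;
it is also the value returned for each sample by sample_sums on the output.

rngseed: A single integer value passed to set.seed, which is used to fix a seed for reproducible
random number generation (in this case, reproducible random subsampling).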
The default in the usage signature is FALSE.
If set to FALSE, then no fiddling with the RNG seed is performed,
and it is up to the user to appropriately call set.seed
beforehand to achieve reproducible results.

replace: Logical. Whether to sample with replacement (TRUE) or without replacement (FALSE).
The default is with replacement (replace=TRUE).
Two implications to consider are that
(1) sampling with replacement is faster and more memory efficient
as currently implemented; and
(2) sampling with replacement means that there is a chance that the
number of reads for a given OTU in a given sample could be larger
than the original count value, as opposed to sampling without replacement
where the original count value is the maximum possible.
Prior to phyloseq package version number 1.5.20,
this parameter did not exist and sampling with replacement was the only
random subsampling implemented in the rarefy_even_depth function.
Note that this default behavior was selected for computational efficiency,
but differs from analogous functions in related packages
(e.g. subsampling in QIIME).

trimOTUs: logical(1).
Whether to trim OTUs
from the dataset that are no longer observed in any sample
(have a count of zero in every sample).
The number of OTUs trimmed, if any, is printed to
standard out as a reminder.

verbose: Logical. Default is TRUE.
If TRUE, extra non-warning, non-error messages are printed
to standard out, describing steps in the rarefying process,
the OTUs and samples removed, etc. This can be useful the
first few times the function is executed, but can be set
to FALSE as needed once behavior has been verified
as expected.

The function returns an object of class phyloseq.
Only the otu_table component is modified.

The name used here, rarefy, has also been used recently
to describe this process and, to our knowledge, was not previously used in ecology.
Make sure to use set.seed for exactly-reproducible results
of the random subsampling.
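
The effect of fixing the seed can be sketched in base R without phyloseq. The counts, the depth, and the helper name subsample_once below are made up for illustration. With phyloseq itself, the analogous step would be calling set.seed before rarefy_even_depth or passing an integer as rngseed, as described above.

```r
# Made-up counts for four OTUs in one sample (illustrative only).
counts <- c(40, 25, 0, 35)
depth  <- 20

# Hypothetical helper: draw one random subsample with replacement.
subsample_once <- function() {
  draws <- sample.int(length(counts), size = depth,
                      replace = TRUE, prob = counts)
  tabulate(draws, nbins = length(counts))
}

set.seed(711)              # fix the seed before the first draw
first <- subsample_once()

set.seed(711)              # reset to the same seed before the second draw
second <- subsample_once()

identical(first, second)   # TRUE: same seed, same subsample
```

Recording the seed alongside the results is what makes a later rerun comparable.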

See also: sample

# Test with esophagus dataset
library(phyloseq)
data("esophagus")
# Subsample to the minimum library size, with and without replacement
esorepT = rarefy_even_depth(esophagus, replace=TRUE)
esorepF = rarefy_even_depth(esophagus, replace=FALSE)
# Compare library sizes before and after rarefying
sample_sums(esophagus)
sample_sums(esorepT)
sample_sums(esorepF)
## Run manually: too slow!
# data("GlobalPatterns")
# GPrepT = rarefy_even_depth(GlobalPatterns, 1E5, replace=TRUE)
## Actually just this one is slow
# system.time(GPrepF <- rarefy_even_depth(GlobalPatterns, 1E5, replace=FALSE))
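
The trimming step described for trimOTUs (dropping OTUs whose count is zero in every sample after subsampling) can be sketched in base R on a plain matrix. The matrix values and OTU/sample names are made up, and this is only an assumption about what such a step looks like, not phyloseq's own code.

```r
# Made-up rarefied count matrix: OTUs in rows, samples in columns,
# every sample already at an even depth of 10 reads.
rarefied <- matrix(c(3,  0, 7,
                     0,  0, 0,
                     7, 10, 3),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("OTU1", "OTU2", "OTU3"),
                                   c("S1", "S2", "S3")))

# Keep only OTUs that are still observed in at least one sample.
keep    <- rowSums(rarefied) > 0
trimmed <- rarefied[keep, , drop = FALSE]

# Report how many OTUs were dropped, as a reminder to the user.
cat(sum(!keep), "OTU(s) removed: no longer observed in any sample\n")

rownames(trimmed)   # "OTU1" "OTU3"
colSums(trimmed)    # each sample still sums to 10
```

Trimming all-zero rows does not change any sample total, so the even depth is preserved.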