Display taxa abundances as a heatmap.
taxa_heatmap(
biom,
rank = -1,
taxa = 6,
tracks = NULL,
grid = "bilbao",
other = FALSE,
unc = "singly",
lineage = FALSE,
label = TRUE,
label_size = NULL,
rescale = "none",
trees = TRUE,
clust = "complete",
dist = "euclidean",
asp = 1,
tree_height = 10,
track_height = 10,
legend = "right",
title = TRUE,
xlab.angle = "auto",
...
)
A ggplot2
plot. The computed data points and ggplot
command are available as $data
and $code
,
respectively.
An rbiom object, such as from as_rbiom()
.
Any value accepted by as_rbiom()
can also be given here.
What rank(s) of taxa to display. E.g. "Phylum"
,
"Genus"
, ".otu"
, etc. An integer vector can also be
given, where 1
is the highest rank, 2
is the second
highest, -1
is the lowest rank, -2
is the second
lowest, and 0
is the OTU "rank". Run biom$ranks
to
see all options for a given rbiom object. Default: -1
.
Which taxa to display. An integer value will show the top n
most abundant taxa. A value 0 <= n < 1 will show any taxa with that
mean abundance or greater (e.g. 0.1
implies >= 10%). A
character vector of taxa names will show only those named taxa.
Default: 6
.
A character vector of metadata fields to display as tracks
at the top of the plot. Or, a list as expected by the tracks
argument of plot_heatmap()
. Default: NULL
Color palette name, or a list as expected plot_heatmap()
.
Default: "bilbao"
Sum all non-itemized taxa into an "Other" taxa. When
FALSE
, only returns taxa matched by the taxa
argument. Specifying TRUE
adds "Other" to the returned set.
A string can also be given to imply TRUE
, but with that
value as the name to use instead of "Other".
Default: FALSE
How to handle unclassified, uncultured, and similarly ambiguous taxa names. Options are:
"singly"
- Replaces them with the OTU name.
"grouped"
- Replaces them with a higher rank's name.
"drop"
- Excludes them from the result.
"asis"
- To not check/modify any taxa names.
Abbreviations are allowed. Default: "singly"
Include all ranks in the name of the taxa. For instance,
setting to TRUE
will produce
Bacteria; Actinobacteria; Coriobacteriia; Coriobacteriales
.
Otherwise the taxa name will simply be Coriobacteriales
. You
want to set this to TRUE when unc = "asis"
and you have taxa
names (such as Incertae_Sedis) that map to multiple higher
level ranks. Default: FALSE
Label the matrix rows and columns. You can supply a list
or logical vector of length two to control row labels and column
labels separately, for example
label = c(rows = TRUE, cols = FALSE)
, or simply
label = c(TRUE, FALSE)
. Other valid options are "rows"
,
"cols"
, "both"
, "bottom"
, "right"
,
and "none"
.
Default: TRUE
The font size to use for the row and column labels. You
can supply a numeric vector of length two to control row label sizes
and column label sizes separately, for example
c(rows = 20, cols = 8)
, or simply c(20, 8)
.
Default: NULL
, which computes:
pmax(8, pmin(20, 100 / dim(mtx)))
Rescale rows or columns to all have a common min/max.
Options: "none"
, "rows"
, or "cols"
.
Default: "none"
Draw a dendrogram for rows (left) and columns (top). You can
supply a list or logical vector of length two to control the row tree
and column tree separately, for example
trees = c(rows = TRUE, cols = FALSE)
,
or simply trees = c(TRUE, FALSE)
.
Other valid options are "rows"
, "cols"
, "both"
,
"left"
, "top"
, and "none"
.
Default: TRUE
Clustering algorithm for reordering the rows and columns by
similarity. You can supply a list or character vector of length two to
control the row and column clustering separately, for example
clust = c(rows = "complete", cols = NA)
, or simply
clust = c("complete", NA)
. Options are:
FALSE
or NA
- Disable reordering.
hclust
class objectE.g. from stats::hclust()
.
"ward.D"
,
"ward.D2"
, "single"
, "complete"
,
"average"
, "mcquitty"
, "median"
, or
"centroid"
.
Default: "complete"
Distance algorithm to use when reordering the rows and columns
by similarity. You can supply a list or character vector of length
two to control the row and column clustering separately, for example
dist = c(rows = "euclidean", cols = "maximum")
, or simply
dist = c("euclidean", "maximum")
. Options are:
dist
class objectE.g. from stats::dist()
or bdiv_distmat()
.
"euclidean"
,
"maximum"
, "manhattan"
, "canberra"
,
"binary"
, or "minkowski"
.
Default: "euclidean"
Aspect ratio (height/width) for entire grid.
Default: 1
(square)
The height of the dendrogram or annotation
tracks as a percentage of the overall grid size. Use a numeric vector
of length two to assign c(top, left)
independently.
Default: 10
(10% of the grid's height)
Where to place the legend. Options are: "right"
or
"bottom"
. Default: "right"
Plot title. Set to TRUE
for a default title, NULL
for
no title, or any character string. Default: TRUE
Angle of the labels at the bottom of the plot.
Options are "auto"
, '0'
, '30'
, and '90'
.
Default: "auto"
.
Additional arguments to pass on to ggplot2::theme().
Metadata can be displayed as colored tracks above the heatmap. Common use cases are provided below, with more thorough documentation available at https://cmmr.github.io/rbiom .
## Categorical ----------------------------
tracks = "Body Site"
tracks = list('Body Site' = "bright")
tracks = list('Body Site' = c('Stool' = "blue", 'Saliva' = "green"))## Numeric --------------------------------
tracks = "Age"
tracks = list('Age' = "reds")
## Multiple Tracks ------------------------
tracks = c("Body Site", "Age")
tracks = list('Body Site' = "bright", 'Age' = "reds")
tracks = list(
'Body Site' = c('Stool' = "blue", 'Saliva' = "green"),
'Age' = list('colors' = "reds") )
The following entries in the track definitions are understood:
colors
- A pre-defined palette name or custom set of colors to map to.
range
- The c(min,max) to use for scale values.
label
- Label for this track. Defaults to the name of this list element.
side
- Options are "top"
(default) or "left"
.
na.color
- The color to use for NA
values.
bins
- Bin a gradient into this many bins/steps.
guide
- A list of arguments for guide_colorbar() or guide_legend().
All built-in color palettes are colorblind-friendly.
Categorical palette names: "okabe"
, "carto"
, "r4"
,
"polychrome"
, "tol"
, "bright"
, "light"
,
"muted"
, "vibrant"
, "tableau"
, "classic"
,
"alphabet"
, "tableau20"
, "kelly"
, and "fishy"
.
Numeric palette names: "reds"
, "oranges"
, "greens"
,
"purples"
, "grays"
, "acton"
, "bamako"
,
"batlow"
, "bilbao"
, "buda"
, "davos"
,
"devon"
, "grayC"
, "hawaii"
, "imola"
,
"lajolla"
, "lapaz"
, "nuuk"
, "oslo"
,
"tokyo"
, "turku"
, "bam"
, "berlin"
,
"broc"
, "cork"
, "lisbon"
, "roma"
,
"tofino"
, "vanimo"
, and "vik"
.
Other taxa_abundance:
sample_sums()
,
taxa_boxplot()
,
taxa_clusters()
,
taxa_corrplot()
,
taxa_stacked()
,
taxa_stats()
,
taxa_sums()
,
taxa_table()
Other visualization:
adiv_boxplot()
,
adiv_corrplot()
,
bdiv_boxplot()
,
bdiv_corrplot()
,
bdiv_heatmap()
,
bdiv_ord_plot()
,
plot_heatmap()
,
rare_corrplot()
,
rare_multiplot()
,
rare_stacked()
,
stats_boxplot()
,
stats_corrplot()
,
taxa_boxplot()
,
taxa_corrplot()
,
taxa_stacked()
library(rbiom)
# Keep and rarefy the 10 most deeply sequenced samples.
hmp10 <- rarefy(hmp50, n = 10)
taxa_heatmap(hmp10, rank = "Phylum", tracks = "Body Site")
taxa_heatmap(hmp10, rank = "Genus", tracks = c("sex", "bo"))
taxa_heatmap(hmp10, rank = "Phylum", tracks = list(
'Sex' = list(colors = c(m = "#0000FF", f = "violetred")),
'Body Site' = list(colors = "muted", label = "Source") ))
Run the code above in your browser using DataLab