Bar charts for categorical data with statistical details included in the plot as a subtitle.
ggbarstats(
  data,
  x,
  y,
  counts = NULL,
  type = "parametric",
  paired = FALSE,
  results.subtitle = TRUE,
  label = "percentage",
  label.args = list(alpha = 1, fill = "white"),
  sample.size.label.args = list(size = 4),
  digits = 2L,
  proportion.test = results.subtitle,
  digits.perc = 0L,
  bf.message = TRUE,
  ratio = NULL,
  conf.level = 0.95,
  sampling.plan = "indepMulti",
  fixed.margin = "rows",
  prior.concentration = 1,
  title = NULL,
  subtitle = NULL,
  caption = NULL,
  legend.title = NULL,
  xlab = NULL,
  ylab = NULL,
  ggtheme = ggstatsplot::theme_ggstatsplot(),
  package = "RColorBrewer",
  palette = "Dark2",
  ggplot.component = NULL,
  ...
)
data: A data frame (or a tibble) from which the specified variables are to be taken. Other data types (e.g., matrix, table, array, etc.) will not be accepted. Additionally, grouped data frames from {dplyr} should be ungrouped before they are entered as data.
x: The variable to use as the rows in the contingency table. Note that empty factor levels in the variable will be dropped.
y: The variable to use as the columns in the contingency table. Note that empty factor levels in the variable will be dropped. Default is NULL. If NULL, a one-sample proportion test (a goodness-of-fit test) will be run for the x variable. Otherwise, an appropriate association test will be run. This argument cannot be NULL for the ggbarstats function.
counts: The variable in data containing counts, or NULL if each row represents a single observation.
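The two data layouts that counts distinguishes can be sketched in base R. This is only an illustration of the idea, not of how ggbarstats handles its input internally: the built-in Titanic table is converted to a data frame with a Freq column of counts, and the same contingency table is then rebuilt from an expanded copy with one row per observation. The object names are made up for this example.

```r
# Tabulated layout: one row per combination of levels, with counts in `Freq`
titanic_counts <- as.data.frame(Titanic)

# Contingency table built from the counts column
tab_from_counts <- xtabs(Freq ~ Survived + Sex, data = titanic_counts)

# Long layout: repeat each row `Freq` times so that one row is one observation
row_index <- rep(seq_len(nrow(titanic_counts)), titanic_counts$Freq)
titanic_long <- titanic_counts[row_index, ]

# Contingency table built by counting rows
tab_from_rows <- table(Survived = titanic_long$Survived, Sex = titanic_long$Sex)

# TRUE when both layouts describe the same table
all(tab_from_counts == tab_from_rows)
```

With the tabulated layout, a call would pass counts = Freq; with the long layout, counts stays NULL.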
type: A character specifying the type of statistical approach: "parametric", "nonparametric", "robust", or "bayes". You can specify just the initial letter.
paired: Logical indicating whether the data came from a within-subjects or repeated measures design study (Default: FALSE).
results.subtitle: Decides whether the results of statistical tests are to be displayed as a subtitle (Default: TRUE). If set to FALSE, only the plot will be returned.
label: Character that decides what information is displayed on the label in each bar segment. Possible options are "percentage" (default), "counts", and "both".
label.args: Additional aesthetic arguments that will be passed to ggplot2::geom_label().
sample.size.label.args: Additional aesthetic arguments that will be passed to ggplot2::geom_text().
digits: Number of digits for rounding or significant figures. May also be "signif" to return significant figures or "scientific" to return scientific notation. Control the number of digits by adding the value as a suffix, e.g. digits = "scientific4" to have scientific notation with 4 decimal places, or digits = "signif5" for 5 significant figures (see also signif()).
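The difference between decimal places, significant figures, and scientific notation can be seen with base R alone. The snippet below is a rough analogue of the three modes described above, not the formatter that the package uses internally; the input value and variable names are arbitrary.

```r
x <- 0.000123456

# Plain rounding to 2 decimal places loses all information for small values
rounded <- round(x, 2)

# 5 significant figures, comparable in spirit to digits = "signif5"
sig5 <- signif(x, 5)

# Scientific notation with 4 decimal places, comparable in spirit to
# digits = "scientific4"
sci4 <- formatC(x, format = "e", digits = 4)

rounded
sig5
sci4
```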
proportion.test: Decides whether a proportion test for the x variable is to be carried out for each level of y. Defaults to results.subtitle. In ggbarstats, only p-values from this test will be displayed.
digits.perc: Numeric that decides the number of decimal places for percentage labels (Default: 0L).
bf.message: Logical that decides whether to display the Bayes Factor in favor of the null hypothesis. This argument is relevant only for the parametric test (Default: TRUE).
ratio: A vector of proportions: the expected proportions for the proportion test (should sum to 1). Default is NULL, which means the null is equal theoretical proportions across the levels of the nominal variable. E.g., ratio = c(0.5, 0.5) for two levels, ratio = c(0.25, 0.25, 0.25, 0.25) for four levels, etc.
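To make the role of expected proportions concrete, here is a minimal base R sketch using the p argument of stats::chisq.test(), which plays an analogous role in a goodness-of-fit test. It is an assumption of this example that the analogy carries over to ratio; the proportions chosen are arbitrary.

```r
# Observed counts for the three levels of `cyl` in the built-in mtcars data
observed <- table(mtcars$cyl)

# Default null: equal proportions across the levels
fit_equal <- chisq.test(observed)

# Custom null: one expected proportion per level, summing to 1
expected_ratio <- c(0.3, 0.2, 0.5)
fit_custom <- chisq.test(observed, p = expected_ratio)

# Expected counts under each null hypothesis
fit_equal$expected
fit_custom$expected
```

The vector must have one entry per level, in the order of the factor levels.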
conf.level: Scalar between 0 and 1 (default: 95% confidence/credible intervals, 0.95). If NULL, no confidence intervals will be computed.
sampling.plan: Character describing the sampling plan. Possible options are "indepMulti" (independent multinomial; default), "poisson", "jointMulti" (joint multinomial), and "hypergeom" (hypergeometric). For more, see ?BayesFactor::contingencyTableBF().
fixed.margin: For the independent multinomial sampling plan, which margin is fixed ("rows" or "cols"). Defaults to "rows".
prior.concentration: Specifies the prior concentration parameter, set to 1 by default. It indexes the expected deviation from the null hypothesis under the alternative, and corresponds to Gunel and Dickey's (1974) "a" parameter.
title: The text for the plot title.
subtitle: The text for the plot subtitle. Will work only if results.subtitle = FALSE.
caption: The text for the plot caption. This argument is relevant only if bf.message = FALSE.
legend.title: Title text for the legend.
xlab: Label for the x axis variable. If NULL (default), the variable name for x will be used.
ylab: Label for the y axis variable. If NULL (default), the variable name for y will be used.
ggtheme: A {ggplot2} theme. Default value is ggstatsplot::theme_ggstatsplot(). Any of the {ggplot2} themes (e.g., theme_bw()), or themes from extension packages are allowed (e.g., ggthemes::theme_fivethirtyeight(), hrbrthemes::theme_ipsum_ps(), etc.). But note that sometimes these themes will remove some of the details that {ggstatsplot} plots typically contain. For example, if relevant, ggbetweenstats() shows details about the multiple comparison test as a label on the secondary Y-axis. Some themes (e.g. ggthemes::theme_fivethirtyeight()) will remove the secondary Y-axis and thus the details as well.
package, palette: Name of the package from which the given palette is to be extracted. The available palettes and packages can be checked by running View(paletteer::palettes_d_names).
ggplot.component: A ggplot component to be added to the plot prepared by {ggstatsplot}. This argument is primarily helpful for grouped_ variants of all primary functions. Default is NULL. The argument should be entered as a {ggplot2} function or a list of {ggplot2} functions.
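A short sketch of how the appearance arguments described above might be combined in a single call. It assumes that {ggstatsplot} and {ggplot2} are installed (the guard skips the example otherwise); the particular theme, label choice, and legend position are arbitrary illustrations rather than recommended settings.

```r
# Assumption: {ggstatsplot} and {ggplot2} are installed; otherwise nothing runs
if (requireNamespace("ggstatsplot", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  set.seed(123)

  p_custom <- ggstatsplot::ggbarstats(
    data = mtcars,
    x = vs,
    y = cyl,
    label = "both",                  # show counts and percentages
    ggtheme = ggplot2::theme_bw(),   # swap the default theme
    # extra {ggplot2} layers are supplied as a function call or a list of them
    ggplot.component = list(ggplot2::theme(legend.position = "bottom"))
  )

  # print(p_custom) draws the plot
}
```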
...: Currently ignored.
graphical element | geom used | argument for further modification |
bars | ggplot2::geom_bar() | NA |
descriptive labels | ggplot2::geom_label() | label.args |
sample size labels | ggplot2::geom_text() | sample.size.label.args |
The tables below provide a summary of:
the statistical test carried out for inferential statistics
the type of effect size estimate and a measure of uncertainty for this estimate
the functions used internally to compute these details
Two-way table
Hypothesis testing
Type | Design | Test | Function used |
Parametric/Non-parametric | Unpaired | Pearson's chi-squared test | stats::chisq.test() |
Bayesian | Unpaired | Bayesian Pearson's chi-squared test | BayesFactor::contingencyTableBF() |
Parametric/Non-parametric | Paired | McNemar's chi-squared test | stats::mcnemar.test() |
Bayesian | Paired | No | No |
Effect size estimation
Type | Design | Effect size | CI available? | Function used |
Parametric/Non-parametric | Unpaired | Cramer's V | Yes | effectsize::cramers_v() |
Bayesian | Unpaired | Cramer's V | Yes | effectsize::cramers_v() |
Parametric/Non-parametric | Paired | Cohen's g | Yes | effectsize::cohens_g() |
Bayesian | Paired | No | No | No |
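The two frequentist tests named in the tables above can be run directly in base R. This is a minimal sketch that uses the default settings of stats::chisq.test() and stats::mcnemar.test(); the settings that ggbarstats passes internally (for example, whether a continuity correction is applied) are not documented here and may differ. The paired counts are hypothetical and exist only for illustration.

```r
# Unpaired design: Pearson's chi-squared test on a two-way table
titanic_counts <- as.data.frame(Titanic)
tab_unpaired <- xtabs(Freq ~ Survived + Sex, data = titanic_counts)
res_chisq <- chisq.test(tab_unpaired)

# Paired design: McNemar's test needs a square table of before/after outcomes.
# Hypothetical counts, filled column by column.
tab_paired <- matrix(
  c(40, 12, 25, 23),
  nrow = 2,
  dimnames = list(before = c("yes", "no"), after = c("yes", "no"))
)
res_mcnemar <- mcnemar.test(tab_paired)

res_chisq$p.value
res_mcnemar$p.value
```

Only the off-diagonal cells of the paired table (12 and 25 here) contribute to McNemar's statistic, because they count the subjects whose response changed.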
One-way table
Hypothesis testing
Type | Test | Function used |
Parametric/Non-parametric | Goodness of fit chi-squared test | stats::chisq.test() |
Bayesian | Bayesian Goodness of fit chi-squared test | (custom) |
Effect size estimation
Type | Effect size | CI available? | Function used |
Parametric/Non-parametric | Pearson's C | Yes | effectsize::pearsons_c() |
Bayesian | No | No | No |
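For the one-way case, the goodness-of-fit statistic and Pearson's C can be computed by hand in base R. The sketch below uses the textbook definition C = sqrt(chi^2 / (chi^2 + n)); it is an assumption that this matches the point estimate reported by effectsize::pearsons_c(), and no confidence interval is computed here.

```r
# Observed counts for the three levels of `cyl` in the built-in mtcars data
observed <- table(mtcars$cyl)

# Goodness-of-fit test against equal proportions
gof <- chisq.test(observed)

chi_sq <- unname(gof$statistic)  # chi-squared statistic
n <- sum(observed)               # total sample size

# Textbook Pearson's contingency coefficient
pearsons_c <- sqrt(chi_sq / (chi_sq + n))

chi_sq
pearsons_c
```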
For details, see: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggpiestats.html
See also: grouped_ggbarstats, ggpiestats, grouped_ggpiestats
# load the package
library(ggstatsplot)

# for reproducibility
set.seed(123)

# creating a plot
p <- ggbarstats(mtcars, x = vs, y = cyl)

# looking at the plot
p

# extracting details from statistical tests
extract_stats(p)