Grouped scatterplots from ggplot2 combined with marginal
histograms/boxplots/density plots, with statistical details added as a
subtitle.
grouped_ggscatterstats(
  data,
  x,
  y,
  grouping.var,
  label.var = NULL,
  label.expression = NULL,
  title.prefix = NULL,
  output = "plot",
  ...,
  plotgrid.args = list(),
  title.text = NULL,
  title.args = list(size = 16, fontface = "bold"),
  caption.text = NULL,
  caption.args = list(size = 10),
  sub.text = NULL,
  sub.args = list(size = 12)
)
data
For use with formula: a data frame containing all the data.
x
The column in data containing the explanatory variable to be plotted on the
x-axis. Can be entered either as a character string (e.g., "x") or as a bare
expression (e.g., x).
y
The column in data containing the response (outcome) variable to be plotted on
the y-axis. Can be entered either as a character string (e.g., "y") or as a
bare expression (e.g., y).
grouping.var
A single grouping variable (can be entered either as a bare name x or as a
string "x").
label.var
Variable to use for point labels. Can be entered either as a character string
(e.g., "var1") or as a bare expression (e.g., var1).
label.expression
An expression evaluating to a logical vector that determines the subset of
data points to label. This argument can be entered either as a character
string (e.g., "y < 4 & z < 20") or as a bare expression (e.g., y < 4 & z < 20).
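As a rough illustration of what such an expression evaluates to, the base-R sketch below applies a comparable condition to the built-in mtcars data. The dataset and thresholds are chosen here only for illustration and are not package defaults. The result is a logical vector with one element per row, and only the rows where it is TRUE would be labelled.

```r
# Illustration only: mtcars and the thresholds are assumptions for this sketch
keep <- with(mtcars, mpg > 25 & wt < 2)

# the condition yields one logical value per row of the data
is.logical(keep)
length(keep) == nrow(mtcars)

# names of the rows that satisfy the condition (the points that would be labelled)
rownames(mtcars)[keep]
```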
title.prefix
Character string specifying the prefix text for the fixed plot title (name of
each factor level) (Default: NULL). If NULL, the variable name entered for
grouping.var will be used.
output
Character that describes what is to be returned: can be "plot" (default),
"subtitle", or "caption". Setting this to "subtitle" will return the expression
containing statistical results. If you have set results.subtitle = FALSE, then
this will return NULL. Setting this to "caption" will return the expression
containing details about Bayes Factor analysis, but this is valid only when
type = "parametric" and bf.message = TRUE; otherwise this will return NULL. For
functions ggpiestats and ggbarstats, setting output = "proptest" will return a
dataframe containing results from proportion tests.
...
Arguments passed on to ggscatterstats
point.label.args
A list of additional aesthetic arguments to be passed to
ggrepel::geom_label_repel geom used to display the labels.
smooth.line.args
A list of additional aesthetic arguments to be passed to ggplot2::geom_smooth
geom used to display the regression line.
point.args
A list of additional aesthetic arguments to be passed to ggplot2::geom_point
geom used to display the raw data points.
marginal
Decides whether ggExtra::ggMarginal() plots will be displayed; the default is
TRUE.
point.width.jitter
Degree of jitter in x and y direction, respectively. Defaults to 0 (0%) of the
resolution of the data. Note that the jitter should not be specified in the
point.args because this information will be passed to two different geoms: one
displaying the points and the other displaying the labels for these points.
point.height.jitter
Degree of jitter in x and y direction, respectively. Defaults to 0 (0%) of the
resolution of the data. Note that the jitter should not be specified in the
point.args because this information will be passed to two different geoms: one
displaying the points and the other displaying the labels for these points.
marginal.type
Type of marginal distribution to be plotted on the axes ("histogram",
"boxplot", "density", "violin", "densigram").
marginal.size
Integer describing the relative size of the marginal plots compared to the
main plot. A size of 5 means that the main plot is 5x wider and 5x taller than
the marginal plots.
xfill
Character describing color fill for x and y axes marginal distributions
(default: "#009E73" (for x) and "#D55E00" (for y)). The same colors will also
be used for the lines denoting centrality parameters if the
centrality.parameter argument is set to TRUE. Note that the defaults are
colorblind-friendly.
yfill
Character describing color fill for x and y axes marginal distributions
(default: "#009E73" (for x) and "#D55E00" (for y)). The same colors will also
be used for the lines denoting centrality parameters if the
centrality.parameter argument is set to TRUE. Note that the defaults are
colorblind-friendly.
centrality.parameter
Decides which measure of central tendency ("mean" or "median") is to be
displayed as vertical (for x) and horizontal (for y) lines. Note that mean
values correspond to the arithmetic mean and not the geometric mean.
vline.args
A list of additional aesthetic arguments to be passed to ggplot2::geom_vline
and ggplot2::geom_hline geoms used to display the centrality parameter labels
on vertical and horizontal lines.
hline.args
A list of additional aesthetic arguments to be passed to ggplot2::geom_vline
and ggplot2::geom_hline geoms used to display the centrality parameter labels
on vertical and horizontal lines.
type
Type of association between paired samples required ("parametric": Pearson's
product moment correlation coefficient, "nonparametric": Spearman's rho,
"robust": percentage bend correlation coefficient, or "bayes": Bayes Factor
for Pearson's r). Corresponding abbreviations are also accepted: "p" (for
parametric/Pearson's), "np" (nonparametric/Spearman), "r" (robust), and "bf"
(for Bayes factor), respectively.
conf.level
Scalar between 0 and 1. If unspecified, the defaults return 95% lower and
upper confidence intervals (0.95).
bf.prior
A number between 0.5 and 2 (default 0.707), the prior width to use in
calculating Bayes factors.
nboot
Number of bootstrap samples for computing confidence interval for the effect
size (Default: 100).
beta
Bending constant (Default: 0.1). For more, see ?WRS2::pbcor.
k
Number of digits after decimal point (should be an integer) (Default: k = 2).
formula
Formula to use in smoothing function, e.g. y ~ x, y ~ poly(x, 2), y ~ log(x).
NULL by default, in which case method = NULL implies formula = y ~ x when
there are fewer than 1,000 observations and formula = y ~ s(x, bs = "cs")
otherwise.
method
Smoothing method (function) to use, accepts either NULL or a character vector,
e.g. "lm", "glm", "gam", "loess", or a function, e.g. MASS::rlm, mgcv::gam,
stats::lm, or stats::loess. "auto" is also accepted for backwards
compatibility. It is equivalent to NULL.
For method = NULL the smoothing method is chosen based on the size of the
largest group (across all panels). stats::loess() is used for fewer than 1,000
observations; otherwise mgcv::gam() is used with
formula = y ~ s(x, bs = "cs") and method = "REML". Somewhat anecdotally, loess
gives a better appearance, but is \(O(N^{2})\) in memory, so does not work for
larger datasets.
If you have fewer than 1,000 observations but want to use the same gam() model
that method = NULL would use, then set
method = "gam", formula = y ~ s(x, bs = "cs").
method.args
List of additional arguments passed on to the modelling function defined by
method.
ggtheme
A function, ggplot2 theme name. Default value is ggplot2::theme_bw(). Any of
the ggplot2 themes, or themes from extension packages are allowed (e.g.,
ggthemes::theme_fivethirtyeight(), hrbrthemes::theme_ipsum_ps(), etc.).
ggstatsplot.layer
Logical that decides whether theme_ggstatsplot theme elements are to be
displayed along with the selected ggtheme (Default: TRUE). theme_ggstatsplot
is an opinionated theme layer that overrides some aspects of the selected
ggtheme.
bf.message
Logical that decides whether to display Bayes Factor in favor of the null
hypothesis. This argument is relevant only for the parametric test
(Default: TRUE).
results.subtitle
Decides whether the results of statistical tests are to be displayed as a
subtitle (Default: TRUE). If set to FALSE, only the plot will be returned.
xlab
Labels for x and y axis variables. If NULL (default), variable names for x and
y will be used.
ylab
Labels for x and y axis variables. If NULL (default), variable names for x and
y will be used.
subtitle
The text for the plot subtitle. Will work only if results.subtitle = FALSE.
caption
The text for the plot caption.
ggplot.component
A ggplot component to be added to the plot prepared by ggstatsplot. This
argument is primarily helpful for the grouped_ variant of the current
function. Default is NULL. The argument should be entered as a function.
messages
Decides whether messages, references, notes, and warnings are to be displayed
(Default: TRUE).
margins
Along which margins to show the plots. One of: [both, x, y].
xparams
List of extra parameters to use only for the marginal plot along the x axis.
yparams
List of extra parameters to use only for the marginal plot along the y axis.
centrality.label.args
A list of additional aesthetic arguments to be passed to the geom_label used
to display the label corresponding to the centrality parameter and test value.
plotgrid.args
A list of additional arguments to cowplot::plot_grid.
title.text
String or plotmath expression to be drawn as title for the combined plot.
title.args
A list of additional arguments provided to title, caption and sub,
respectively.
caption.text
String or plotmath expression to be drawn as the caption for the combined plot.
caption.args
A list of additional arguments provided to title, caption and sub,
respectively.
sub.text
The label with which the combined plot should be annotated. Can be a plotmath
expression.
sub.args
A list of additional arguments provided to title, caption and sub,
respectively.
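The combined-plot arguments can be tried together as in the sketch below. It assumes the installed ggstatsplot matches the signature documented above, so the call is guarded by a check on the function's formal arguments and is skipped otherwise. The title and caption strings and the two-row layout are illustrative choices only.

```r
# Guarded sketch: the call runs only if ggstatsplot and dplyr are installed
# and the installed version still exposes the arguments documented above.
needed <- c("plotgrid.args", "title.text", "caption.text")
has_documented_api <-
  requireNamespace("ggstatsplot", quietly = TRUE) &&
  requireNamespace("dplyr", quietly = TRUE) &&
  all(needed %in% names(formals(ggstatsplot::grouped_ggscatterstats)))

if (has_documented_api) {
  set.seed(123)
  p <- ggstatsplot::grouped_ggscatterstats(
    data = dplyr::filter(ggplot2::mpg, cyl != 5),
    x = displ,
    y = hwy,
    grouping.var = cyl,
    plotgrid.args = list(nrow = 2), # forwarded to cowplot::plot_grid
    title.text = "Engine displacement and highway mileage", # illustrative text
    title.args = list(size = 16, fontface = "bold"),
    caption.text = "Data: ggplot2::mpg", # illustrative text
    caption.args = list(size = 10)
  )
}
```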
https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggscatterstats.html
# NOT RUN {
# to ensure reproducibility
set.seed(123)
library(ggstatsplot)
# basic function call
ggstatsplot::grouped_ggscatterstats(
  data = dplyr::filter(movies_long, genre == "Comedy" | genre == "Drama"),
  x = length,
  y = rating,
  method = "lm",
  formula = y ~ x + I(x^3),
  grouping.var = genre
)
# using labeling
# (also show how to modify basic plot from within function call)
grouped_ggscatterstats(
  data = dplyr::filter(ggplot2::mpg, cyl != 5),
  x = displ,
  y = hwy,
  grouping.var = cyl,
  title.prefix = "Cylinder count",
  type = "robust",
  label.var = manufacturer,
  label.expression = hwy > 25 & displ > 2.5,
  ggplot.component = ggplot2::scale_y_continuous(sec.axis = ggplot2::dup_axis()),
  messages = FALSE
)
# labeling without expression
ggstatsplot::grouped_ggscatterstats(
  data = dplyr::filter(
    .data = movies_long,
    rating == 7,
    genre %in% c("Drama", "Comedy")
  ),
  x = budget,
  y = length,
  grouping.var = genre,
  bf.message = FALSE,
  label.var = "title",
  marginal = FALSE,
  title.prefix = "Genre",
  caption.text = "All movies have IMDB rating equal to 7."
)
# }
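Because method and formula are handed on to ggplot2::geom_smooth, their effect can be previewed with ggplot2 alone. The sketch below assumes ggplot2 and mgcv are installed and uses ggplot2::mpg purely as sample data. It requests the same GAM smoother that method = NULL would select for datasets with 1,000 or more observations.

```r
# Sketch under stated assumptions: ggplot2 and mgcv available; mpg is sample data
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("mgcv", quietly = TRUE)) {
  p_smooth <- ggplot2::ggplot(ggplot2::mpg, ggplot2::aes(x = displ, y = hwy)) +
    ggplot2::geom_point() +
    # same smoother specification as described for method = NULL on large data
    ggplot2::geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))
  # print(p_smooth) would draw the plot; the smoother is fitted when it is drawn
}
```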