This function computes and plots (1) Fisher \(z'\) confidence intervals for Pearson product-moment correlation coefficients (a) without non-normality adjustment, (1b) adjusted via sample joint moments method or (1c) adjusted via approximate distribution method (Bishara et al., 2018), (2) Spearman's rank-order correlation coefficients with (2a) Fieller et al. (1957) standard error, (2b) Bonett and Wright (2000) standard error, or (2c) rank-based inverse normal transformation, (3) Kendall's Tau-b, and (4) Kendall-Stuart's Tau-c correlation coefficients with Fieller et al. (1957) standard error, optionally by a grouping and/or split variable. The function also supports five types of bootstrap confidence intervals (e.g., bias-corrected (BC) percentile bootstrap or bias-corrected and accelerated (BCa) bootstrap confidence intervals) and plots the bootstrap samples with histograms and density curves. By default, the function computes Pearson product-moment correlation coefficients adjusted via approximate distribution method.
ci.cor(data, ...,
method = c("pearson", "spearman", "kendall-b", "kendall-c"),
adjust = c("none", "joint", "approx"),
se = c("fisher", "fieller", "bonett", "rin"),
sample = TRUE, seed = NULL, maxtol = 1e-05, nudge = 0.001,
boot = c("none", "norm", "basic", "perc", "bc", "bca"), R = 1000,
fisher = TRUE, alternative = c("two.sided", "less", "greater"),
conf.level = 0.95, group = NULL, split = NULL, na.omit = FALSE, digits = 2,
as.na = NULL, plot = c("none", "ci", "boot"), point.size = 2.5,
point.shape = 19, errorbar.width = 0.3, dodge.width = 0.5, hist = TRUE,
binwidth = NULL, bins = NULL, hist.alpha = 0.4, fill = "gray85", density = TRUE,
density.col = "#0072B2", density.linewidth = 0.5, density.linetype = "solid",
point = TRUE, point.col = "#CC79A7", point.linewidth = 0.6,
point.linetype = "solid", ci = TRUE, ci.col = "black",
ci.linewidth = 0.6, ci.linetype = "dashed", line = TRUE, intercept = 0,
linetype = "solid", line.col = "gray65", xlab = NULL, ylab = NULL,
xlim = NULL, ylim = NULL, xbreaks = ggplot2::waiver(), ybreaks = ggplot2::waiver(),
axis.title.size = 11, axis.text.size = 10, strip.text.size = 11, title = NULL,
subtitle = NULL, group.col = NULL, plot.margin = NA, legend.title = "",
legend.position = c("right", "top", "left", "bottom", "none"),
legend.box.margin = c(-10, 0, 0, 0), facet.ncol = NULL, facet.nrow = NULL,
facet.scales = "free_y", filename = NULL, width = NA, height = NA,
units = c("in", "cm", "mm", "px"), dpi = 600, write = NULL,
append = TRUE, check = TRUE, output = TRUE)
Returns an object of class misty.object
, which is a list with following
entries:
call
function call
type
type of analysis
data
list with the input specified in ...
, data
, group
, and split
args
specification of function arguments
boot
data frame with bootstrap replicates of the correlation coefficient when bootstrapping was requested
plot
ggplot2 object for plotting the results
result
result table
a data frame with numeric variables, i.e.,
factors and character variables are excluded from
data
before conducting the analysis.
an expression indicating the variable names in data
e.g., ci.cor(x1, x2, data = dat)
. Note that the
operators .
, +
, -
, ~
,
:
, ::
, and !
can also be used
to select variables, see 'Details' in the
df.subset
function.
a character string indicating which correlation
coefficient is to be computed, i.e., "pearson"
for Pearson product-moment correlation coefficient
(default), "spearman"
for Spearman's rank-order
correlation coefficient, "kendall-b"
for Kendall's
Tau-b correlation coefficient, "kendall-c"
for
Kendall-Stuart's Tau-c correlation coefficient. Note
that confidence intervals are only computed given
at least 4 pairs of observations.
a character string specifying the non-normality
adjustment method, i.e., "none"
for the Fisher
\(z'\) confidence interval for the Pearson
product-moment correlation coefficient without
non-normality adjustment, "joint"
for the
confidence interval with non-normality adjustment via
sample joint moments, and "approx"
(default)
for the confidence interval with non-normality adjustment
via approximate distribution by skewness and kurtosis.
Note that this argument only applies to the Pearson
product-moment correlation coefficient, i.e.,
method = "pearson"
a character string specifying the method for computing
the standard error of the correlation coefficient,
i.e., "fisher"
for the Fisher \(z'\) confidence
interval, "fieller"
(default) for the confidence
interval for Spearman's rank-order correlation
coefficient based on approximate standard error by
Fieller et al. (1957), "bonett"
for the confidence
interval based on approximate standard error by Bonett
and Wright (2000), and "rin"
for the confidence
interval for Spearman's rank-order correlation coefficient
based on rank-based inverse normal (RIN) transformation.
Note that this argument only applies to Spearman's
rank-order correlation coefficient, i.e.,
method = "spearman"
.
logical: if TRUE
(default), the univariate
sample skewness and kurtosis is used when applying
the approximate distribution method and reported in
the result table, while the population skewness and
kurtosis is used when sample = FALSE
.
a numeric value specifying the seed of the pseudo-random
number generator when generating a random set of
starting parameter value when the parameters led to
a sum of squares greater than the maximum tolerance
after optimization when applying the approximate
distribution method (adjust = approx
) when
computing the confidence interval for the Pearson
product-moment correlation coefficient, or the seeds
of the pseudo-random numbers used when conducting
bootstrapping.
a numeric value indicating the tolerance for total
squared error when applying the approximate distribution
method (adjust = approx
).
a numeric value indicating the nudge proportion of
their original values by which sample skewness, kurtosis,
and r are nudged towards 0 when applying the approximate
distribution method (adjust = approx
).
are only computed given at least 10 pairs of observations.
a character string specifying the type of bootstrap
confidence intervals (CI), i.e., "none"
(default)
for not conducting bootstrapping, "norm"
for
the bias-corrected normal approximation bootstrap CI,
"basic"
for the basic bootstrap CI, "perc"
,
for the percentile bootstrap CI "bc"
(default)
for the bias-corrected (BC) percentile bootstrap CI
(without acceleration), and "bca"
for the
bias-corrected and accelerated (BCa) bootstrap CI.
a numeric value indicating the number of bootstrap replicates (default is 1000).
logical: if TRUE
(default), Fisher \(z\)
transformation is applied before computing the
confidence intervals to reverse-transformed the limits
of the interval using the inverse of the Fisher
\(z\) transformation. Note that this argument applies
only when boot
is "norm"
or "basic"
.
a character string specifying the alternative hypothesis,
must be one of "two.sided"
(default),
"greater"
or "less"
.
a numeric value between 0 and 1 indicating the confidence level of the interval.
either a character string indicating the variable name
of the grouping variable in ...
or data
,
or a vector representing the grouping variable. The
grouping variable is excluded from the data
frame specified in data
.
either a character string indicating the variable name
of the split variable in ...
or data
,
or a vector representing the split variable. The split
variable is excluded from the data frame
specified in data
.
logical: if TRUE
, incomplete cases are removed
before conducting the analysis (i.e., listwise deletion);
if FALSE
(default), pairwise deletion is used.
an integer value indicating the number of decimal places to be used.
a numeric vector indicating user-defined missing values,
i.e. these values are converted to NA
before
conducting the analysis.
a character string indicating the type of the plot
to display, i.e., "none"
(default) for not
displaying any plots, "ci"
for displaying
confidence intervals for the correlation coefficient,
"boot"
for displaying bootstrap samples with
histograms and density curves when the argument
"boot"
is other than "none"
.
a numeric value indicating the size
argument
in the geom_point
function for controlling the
size of points when plotting confidence intervals
(plot = "ci"
).
a numeric value between 0 and 25 or a character string
as plotting symbol indicating the shape
argument
in the geom_point
function for controlling the
symbols of points. when plotting confidence intervals
(plot = "ci"
).
a numeric value indicating the width
argument
in the geom_errorbar
function for controlling
the width of the whiskers in the geom_errorbar
function when plotting confidence intervals
(plot = "ci"
).
a numeric value indicating the width
argument
controlling the width of the geom
elements to
be dodged when specifying a grouping variable using
the argument group
when plotting confidence
intervals (plot = "ci"
).
logical: if TRUE
(default), histograms are
drawn when plotting bootstrap samples (plot = "boot"
).
a numeric value or a function for specifying the
binwidth
argument in the geom_histogram
function for controlling the width of the bins when
plotting bootstrap samples (plot = "boot"
).
a numeric value for specifying the bins
argument
in the geom_histogram
function for controlling
the number of bins when plotting bootstrap samples
(plot = "boot"
).
a numeric value between 0 and 1 for specifying the
alpha
argument in the geom_histogram
function for controlling the opacity of the bars
when plotting bootstrap samples (plot = "boot"
).
a character string specifying the fill
argument
in the geom_histogram
function controlling the
fill aesthetic when plotting bootstrap samples
(plot = "boot"
). Note that this argument applied
only when no grouping variable was specified
group = NULL
.
logical: if TRUE
(default), density curves are
drawn when plotting bootstrap samples
(plot = "boot"
).
a character string specifying the color
argument
in the geom_density
function controlling the
color of the density curves when plotting bootstrap samples
(plot = "boot"
). Note that this argument applied
only when no grouping variable was specified
group = NULL
.
a numeric value specifying the linewidth
argument in the geom_density
function controlling
the line width of the density curves when plotting
bootstrap samples (plot = "boot"
).
a numeric value or character string specifying the
linetype
argument in the geom_density
function controlling the line type of the density
curves when plotting bootstrap samples
(plot = "boot"
).
logical: if TRUE
(default), vertical lines
representing the point estimate of the correlation
coefficients are drawn when plotting bootstrap samples
(plot = "boot"
).
a character string specifying the color
argument
in the geom_vline
function for controlling the
color of the vertical line displaying the correlation
coefficient when plotting bootstrap samples
(plot = "boot"
). Note that this argument applied
only when no grouping variable was specified
group = NULL
.
a numeric value specifying the linewdith
argument
in the geom_vline
function for controlling the
line width of the vertical line displaying the
correlation coefficient when plotting bootstrap samples
(plot = "boot"
).
a numeric value or character string specifying the
linetype
argument in the geom_vline
function controlling the line type of the vertical
line displaying the correlation coefficient when
plotting bootstrap samples (plot = "boot"
).
logical: if TRUE
(default), vertical lines
representing the bootstrap confidence intervals of
the correlation coefficient are drawn when plotting
bootstrap samples (plot = "boot"
).
character string specifying the color
argument
in the geom_vline
function for controlling the
color of the vertical line displaying bootstrap
confidence intervals when plotting bootstrap samples
(plot = "boot"
). Note that this argument applied
only when no grouping variable was specified
group = NULL
.
a numeric value specifying the linewdith
argument
in the geom_vline
function for controlling the
line width of the vertical line displaying bootstrap
confidence intervals when plotting bootstrap samples
(plot = "boot"
).
a numeric value or character string specifying the
linetype
argument in the geom_vline
function controlling the line type of the vertical
line displaying bootstrap confidence intervals when
plotting bootstrap samples (plot = "boot"
).
logical: if TRUE
(default), a horizontal line
is drawn when plot = "ci"
or a vertical line
is drawn when plot = "boot"
a numeric value indicating the yintercept
or
xintercept
argument in the geom_hline
or geom_vline
function controlling the position
of the horizontal or vertical line when plot = "ci"
and line = TRUE
or when plot = "boot"
and line = TRUE
. By default, the horizontal or
vertical line is drawn at 0.
a character string indicating the linetype
argument in the geom_hline
or geom_vline
function controlling the line type of the horizontal
or vertical line (default is linetype = "dashed"
).
a character string indicating the color
argument
in the geom_hline
or geom_vline
function
for controlling the color of the horizontal or vertical
line.
a character string indicating the name
argument
in the scale_x_continuous
function for labeling
the x-axis. The default setting is xlab = NULL
when plot = "ci"
and xlab = "Correlation Coefficient"
when plot = "boot"
.
a character string indicating the name
argument
in the scale_y_continuous
function for labeling
the y-axis. The default setting is ylab = "Correlation Coefficient"
when plot = "ci"
and ylab = "Probability Density, f(x)"
when plot = "boot"
.
a numeric vector with two elements indicating the
limits
argument in the scale_x_continuous
function for controlling the scale range of the x-axis.
The default setting is xlim = NULL
when plot = "ci"
and xlim = c(-1, 1)
when plot = "boot"
.
a numeric vector with two elements indicating the
limits
argument in the scale_y_continuous
function for controlling the scale range of the y-axis.
The default setting is ylim = c(-1, 1)
when
plot = "ci"
and xlim = NULL
when
plot = "boot"
.
a numeric vector indicating the breaks
argument
in the scale_x_continuous
function for controlling
the x-axis breaks.
a numeric vector indicating the breaks
argument
in the scale_y_continuous
function for controlling
the y-axis breaks.
a numeric value indicating the size
argument
in the element_text
function for specifying the
function controlling the font size of the axis title,
i.e., theme(axis.title = element_text(size = axis.text.size))
.
a numeric value indicating the size
argument
in the element_text
function for specifying the
function controlling the font size of the axis text,
i.e., theme(axis.text = element_text(size = axis.text.size))
.
a numeric value indicating the size
argument
in the element_text
function for specifying the
function controlling the font size of the strip text,
i.e., theme(strip.text = element_text(size = strip.text.size))
.
a character string indicating the title
argument
in the labs
function for the subtitle of the plot.
a character string indicating the subtite
argument
in the labs
function for the subtitle of the plot.
a character vector indicating the color
argument
in the scale_color_manual
and scale_fill_manual
functions when specifying a grouping variable using
the argument group
.
a numeric vector with four elements indicating the
plot.margin
argument in the theme
function
controlling the plot margins . The default setting
is c(5.5, 5.5, 5.5, 5.5)
, but switches
to c(5.5, 5.5, -2.5, 5.5)
when specifying a
grouping variable using the argument group
.
a character string indicating the color
argument
in the labs
function for specifying the legend
title when specifying a grouping variable using the
argument group
.
a character string indicating the legend.position
in the theme
argument for controlling the
position of the legend function when specifying a
grouping variable using the argument group
.
By default, the legend is placed at the bottom the
plot.
a numeric vector with four elements indicating the
legend.box.margin
argument in the theme
function for controlling the margins around the full
legend area when specifying a grouping variable using
the argument group
.
a numeric value indicating the ncol
argument
in the facet_wrap
function for controlling
the number of columns when specifying a split variable
using the argument split
.
a numeric value indicating the nrow
argument
in the facet_wrap
function for controlling the
number of rows when specifying a split variable using
the argument split
.
a character string indicating the scales
argument
in the facet_wrap
function for controlling the
scales shared across facets, i.e., "fixed"
,
"free_x"
, "free_y"
(default), or
"free"
when specifying a split variable using
the argument split
.
a character string indicating the filename
argument including the file extension in the ggsave
function. Note that one of ".eps"
, ".ps"
,
".tex"
, ".pdf"
(default),
".jpeg"
, ".tiff"
, ".png"
,
".bmp"
, ".svg"
or ".wmf"
needs
to be specified as file extension in the filename
argument. Note that plots can only be saved when
plot = "ci"
or plot = "boot"
.
a numeric value indicating the width
argument
(default is the size of the current graphics device)
in the ggsave
function.
a numeric value indicating the height
argument
(default is the size of the current graphics device)
in the ggsave
function.
a character string indicating the units
argument
(default is in
) in the ggsave
function.
a numeric value indicating the dpi
argument
(default is 600
) in the ggsave
function.
a character string naming a file for writing the output
into either a text file with file extension ".txt"
(e.g., "Output.txt"
) or Excel file with file
extension ".xlsx"
(e.g., "Output.xlsx"
).
If the file name does not contain any file extension,
an Excel file will be written.
logical: if TRUE
(default), output will be
appended to an existing text file with extension
.txt
specified in write
, if FALSE
existing text file will be overwritten.
logical: if TRUE
(default), argument specification
is checked.
logical: if TRUE
(default), output is shown
on the console.
Takuya Yanagida takuya.yanagida@univie.ac.at
The Fisher
\(z'\) confidence interval method for the Pearson product-moment correlation
coefficient is based on the assumption that \(X\) and \(Y\) have a bivariate
normal distribution in the population. Non-normality resulting from either
high kurtosis or high absolute skewness can distort the Fisher \(z'\)
confidence interval that produces a coverage rate that does not equal the one
intended. The distortion is largest when population correlation is large and
both variables \(X\) and \(Y\) were non-normal (Bishara et al., 2017).
Note that increasing sample size improves coverage only when the population
correlation is zero, while increasing the sample size worsens coverage with a
non-zero population correlation (Bishara & Hittner, 2017). The ci.cor
function computes the Fisher \(z'\) confidence interval without non-normality
adjustment (adjust = "none"
), with non-normality adjustment via sample
joint moments (adjust = "joint"
), or with non-normality adjustment via
approximate distribution (adjust = "approx"
):
Fisher \(z'\) confidence interval method uses the \(r\)-to\(z'\) transformation for the correlation coefficient \(r\):
$$z' = 0.5\cdot \ln\left(\frac{1 + r}{1 - r}\right)$$
The sampling distribution of \(z\) is approximately normal with a standard error of approximately
$$\sigma_z' = \sqrt{\frac{1}{(n - 3)}}$$
The two-sided 95% confidence interval is defined as
$$z' \pm 1.96\cdot\sigma_{z'}$$
These confidence interval bounds are transformed back to the scale of \(r\):
$$r = \frac{exp(2z') - 1}{exp(2z') + 1}$$
The resulting confidence interval of the correlation coefficient is an approximation and is only accurate when \(X\) and \(Y\) have a bivariate normal distribution in the population or when the population correlation is zero.
The Joint Moments Method multiplies the asymptotic variance of \(z'\) by \(\tau_f^2\) (Hawkins, 1989):
$$\tau_f^2 = \frac{(\mu_{40} + 2\mu_{22} + \mu_{04})\rho^2 - 4(\mu_{31} + \mu_{13})\rho + 4\mu_{22})}{4(1 - \rho^2)^2}$$
where \(\mu_{jk}\) represents a population joint moment defined as
$$\mu_{jk} = E[X^jY^k]$$
where \(X\) and \(Y\) are assumed to be standardized (\(\mu_{10} = \mu_{01} = 0\), \(\mu_{20} = \mu_{02} = 1\)). The standard error of \(z\)' can then be approximated as \(\tilde{\sigma}_{z'}\):
$$\tilde{\sigma}_{z'} = \tau_f\sqrt{\frac{1}{n - 3}}$$
The corresponding sample moments, \(m_{jk}\) can be used to estimate \(\tau_f^2\):
$$\hat{\mu}_{jk} = m_{jk} = \frac{1}{n}\sum_{i=1}^{n}(x_i^jy_i^k)$$
However, the higher-order sample joint moments may be unstable estimators of their population counterparts unless the sample size is extremely large. Thus, this estimate of \(\tau_f^2\) may be inaccurate, leading to inaccurate confidence intervals.
The Approximate Distribution Method estimates an approximate
distribution that the sample appears to be drawn from to analytically
solve for \(\tau_f^2\) based on that distribution's parameters. The
ci.cor
function uses a third-order polynomial family allowing
estimation of distribution parameters using marginal skewness and kurtosis
that are estimated using the marginal sample skewness and kurtosis
statistics (Bishara et al., 2018).
Bishara et al. (2018) conducted two Monte Carlo simulations that showed that the approximate distribution method was effective in dealing with violations of the bivariate normality assumption for a wide range of sample sizes, while the joint moments method was effective mainly when the sample size was extremely large, in the thousands. However, the third-order polynomial family used for the approximate distribution method cannot deal with absolute skewness above 4.4 or kurtosis above 43.4. Note that the approximate distribution method is accurate even when the bivariate normality assumption is satisfied, while the sample joint moments method sometimes fails to achieve the intended coverage even when the bivariate normality was satisfied.
The confidence
interval for Spearman's rank-order correlation coefficient is based on the
Fisher's \(z\) method (se = "fisher"
), Fieller et al. (1957)
approximate standard error (se = "fieller"
, default), Bonett and Wright
(2000) approximate standard error (se = "bonett"
) or rank-based inverse
normal (RIN) transformation (se = "rin"
) :
Fisher's \(z\) Standard Error $$\sqrt{\frac{1}{(n - 3)}}$$
Fieller et al. (1957) Approximate Standard Error
$$\sqrt{\frac{1.06}{(n - 3)}}$$
Note that this approximation for the standard error is recommended for \(n > 10\) and \(|rs| < 0.8\).
Bonett and Wright (2000) Approximate Standard Error
$$\sqrt{\frac{1 + \frac{\hat{\theta}^2}{2}}{(n - 3)}}$$
where \(\hat{\theta}\) is the point estimate of the Spearman's rank-order correlation coefficient. Note that this approximation for the standard error is recommended for \(|\tau| \le 0.9\).
Rin Transformation involves three steps. First, the variable is converted to ranks. Second, the ranks are converted to a 0-to-1 scale using a linear function. Third, this distribution is transformed via the inverse of the normal cumulative distribution function (i.e., via probit transformation). The result is an approximately normal distribution regardless of the original shape of the data, so long as ties are infrequent and \(n\) is not too small.
The confidence interval for Kendall's Tau-b and Tau-c correlation coefficient is based on the approximate standard error by Fieller et al. (1957):
$$\sigma_z' = \sqrt{\frac{0.437}{(n - 4)}}$$
Note that this approximation for the standard error is recommended for \(n > 10\) and \(|\tau| < 0.8\).
The ci.cor
function supports
bootstrap confidence intervals (CI) for the correlation coefficient by changing
the default setting boot = "none"
to request one of five different types
of bootstrap CI (see Efron & Tibshirani, 1993; Davidson & Hinkley, 1997):
"norm"
: The bias-corrected normal approximation bootstrap CI
relies on the normal distribution based on the standard deviation of the
bootstrap samples \(\hat{\mathit{SE}}^*\). The function corrects for the
bootstrap bias, i.e., difference between the bootstrap estimate \(\hat{\theta}^*\)
and the sample statistic \(\hat{\theta}\) centering the interval at \(2\hat{\theta} - \hat{\theta}^*\).
The BC normal CI of intended coverage of \(1 - 2(\alpha/2)\) is given by
$$Normal: (\hat{\theta}_{low}, \hat{\theta}_{upp} = \hat{\theta} - (\hat{\theta}^* - \hat{\theta}) + z^{\alpha/2} \cdot \hat{\mathit{SE}}^*, \hat{\theta} - (\hat{\theta}^* - \hat{\theta}) + z^{1 - \alpha/2} \cdot \hat{\mathit{SE}}^*$$
where \(z^{\alpha/2}\) and \(z^{1 - \alpha/2}\) denotes the \(\alpha\) and the \(1 - \alpha\) quantile from the standard normal distribution.
"basic"
: The basic bootstrap (aka reverse bootstrap percentile)
CI is based on the distribution of \(\hat{\delta} = \hat{\theta} - \theta\)
which is approximated with the bootstrap distribution of
\(\hat{\delta}^* = \hat{\theta}^* - \hat{\theta}\).
$$Basic: (\hat{\theta}_{low}, \hat{\theta}_{upp} = \hat{\theta} - \hat{\delta}^{*1 - (\alpha/2)}, \hat{\theta} - \hat{\delta}^{*\alpha/2} = 2\hat{\theta} - \hat{\theta}^{*(1 - \alpha/2)} , 2\hat{\theta} - \hat{\theta}^{*(\alpha/2)})$$
"perc"
: The percentile bootstrap CI is computed by ordering the
bootstrap estimates \(\hat{\theta}^*_1, \ldots, \hat{\theta}^*_B\) to determine
the \((100(\alpha)/2)\)th and \((100(1 - \alpha)/2)\)th empirical percentile
with intended coverage of \(1 - 2(\alpha/2)\):
$$Percentile: (\hat{\theta}_{low}, \hat{\theta}_{upp} = \hat{\theta}^{*(1 - \alpha/2)}, \hat{\theta}^{*(\alpha/2)})$$
"bc"
(default): The bias-corrected (BC) percentile bootstrap CI corrects
the percentile bootstrap CI for median bias of \(\hat{\theta^*}\), i.e., the
discrepancy between the median of \(\hat{\theta}^*\) and \(\hat{\theta}\)
in normal units. The bias correction \(\hat{z}_0\) is obtained from the
proportion of bootstrap replications less than the sample estimate \(\hat{\theta}\):
$$\hat{z}_0 = \Phi^{-1}\left(\frac{\#{\hat{\theta}^*_b < \hat{\theta}}}{B}\right)$$
where \(\Phi^{-1}(.)\) represents the inverse function of the standard normal cumulative distribution function and \(B\) is the number of bootstrap replications. The BC percentile CI of intended coverage of \(1 - 2(\alpha/2)\) is given by
$$BC: (\hat{\theta}_{low}, \hat{\theta}_{upp} = \hat{\theta}^{*(\alpha_1)}, \hat{\theta}^{*(\alpha_2)})$$
where
$$\alpha_1 = \Phi(2\hat{z}_0 + z^{\alpha/2})$$
$$\alpha_2 = \Phi(2\hat{z}_0 + z^{1 - \alpha/2})$$
where \(\Phi(.)\) represents the standard normal cumulative distribution function and \(z^{\alpha/2}\) is the \(100(\alpha/2)\) percentile of a standard normal distribution.
"bca"
: The bias-corrected and accelerated (BCa) bootstrap CI
corrects the percentile bootstrap CI for median bias \(\hat{z}_0\) and
for acceleration or skewness \(\hat{a}\), i.e., the rate of change of the
standard error of \(\hat{\theta}\) with respect to the true parameter value
\(\theta\) on a normalized scale. The standard normal approximation
\(\hat{\theta} \sim N(\theta, \mathit{SE}^2)\) assumes that the standard error of
\(\hat{\theta}\) is the same for all \(\theta\). The acceleration constant
\(\hat{a}\) corrects for this unrealistic assumption and can be computed by
using jackknife resampling:
$$\hat{a} = \frac{\sum_{i=1}^{n}(\hat{\theta}_{(.)} - \hat{\theta}_{(i)})^3}{6\{\sum_{i=1}^{n}(\theta_{(.)} - \hat{\theta}_{(i)})^2\}^{3/2}}$$
where \(\hat{\theta}_{(i)}\) is the sample estimate with the \(i\)th observation deleted and \(\hat{\theta}_{(.)} = \sum_{i=1}^{n}\frac{\hat{\theta}_{(i)}}{n}\). Note that the function uses infinitesimal jackknife instead of regular leave-one-out jackknife that down-weights each observation by an infinitesimal amount of \(\frac{0.001}{n}\) instead of removing observations. The BCa percentile CI of intended coverage of \(1 - 2(\alpha/2)\) is given by
$$BCa: (\hat{\theta}_{low}, \hat{\theta}_{upp} = \hat{\theta}^{*(\alpha_1)}, \hat{\theta}^{*(\alpha_2)})$$
where
$$\alpha_1 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z^{\alpha/2}}{1 - \hat{a}(\hat{z}_0 + z^{\alpha/2})}\right)$$
$$\alpha_2 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z^{1 - \alpha/2}}{1 - \hat{a}(\hat{z}_0 + z^{1 - \alpha/2})}\right)$$
Note that Fisher transformation is applied before computing the confidence
intervals to reverse-transform the limits of the interval using the inverse
of the Fisher \(z\) transformation (fisher = TRUE
) when specifying
"norm"
or "basic"
for the argument boot
. In addition,
interpolation on the normal quantile scale is applied for "basic"
,
"perc"
, "bc"
, and "bca"
when a non-integer order
statistic is required (see equation 5.8 in Davison & Hinkley, 1997).
Baguley, T. (2024). CI for Spearman's rho. seriousstats. https://rpubs.com/seriousstats/616206
Bonett, D. G., & Wright, T. A. (2000). Sample size requirements for estimating Pearson, Kendall and Spearman correlations. Psychometrika, 65, 23-28. https://doi.org/10.1007/BF02294183
Bishara, A. J., & Hittner, J. B. (2012). Testing the significance of a correlation with nonnormal data: Comparison of Pearson, Spearman, transformation, and resampling approaches. Psychological Methods, 17(3), 399–417. https://doi.org/10.1037/a0028087
Bishara, A. J., & Hittner, J.B. (2017). Confidence intervals for correlations when data are not normal. Behavior Research Methods, 49, 294-309. https://doi.org/10.3758/s13428-016-0702-8
Bishara, A. J., Li, J., & Nash, T. (2018). Asymptotic confidence intervals for the Pearson correlation via skewness and kurtosis. British Journal of Mathematical and Statistical Psychology, 71(1), 167–185. https://doi.org/10.1111/bmsp.12113
Brown, M. B., & Benedetti, J. K. (1977). Sampling behavior of tests for correlation in two-way contingency tables. Journal of the American Statistical Association, 72 (358), 309-315. https://doi.org/10.1080/01621459.1977.10480995
Canty, A., & Ripley, B. (2024). boot: Bootstrap R (S-Plus) Functions. R package version 1.3-31.
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge University Press.
Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. Chapman & Hall.
Fieller, E. C., Hartley, H. O., & Pearson, E. S. (1957). Tests for rank correlation coefficients: I. Biometrika, 44, 470-481. https://doi.org/10.2307/2332878
Fisher, R. A. (1921). On the “Probable Error” of a Coefficient of Correlation Deduced from a Small Sample. Metron, 1, 3-32.
Hawkins, D. L. (1989). Using U statistics to derive the asymptotic distribution of Fisher’s Z statistic. American Statistician, 43, 235–237. https://doi.org/10.2307/2685369
Hollander, M., Wolfe, D. A., & Chicken, E. (2015). Nonparametric statistical methods. Wiley.
cor.matrix
, ci.mean
, ci.mean.diff
,
ci.prop
, ci.var
, ci.sd
#----------------------------------------------------------------------------
# Pearson product-moment correlation coefficient
# Example 1a: Approximate distribution method
ci.cor(mtcars, mpg, drat, qsec)
# Alternative specification without using the '...' argument
ci.cor(mtcars[, c("mpg", "drat", "qsec")])
# Example 1b: Joint moments method
ci.cor(mtcars, mpg, drat, qsec, adjust = "joint")
#----------------------------------------------------------------------------
# Spearman's rank-order correlation coefficient
# Example 2a: Fieller et al. (1957) approximate standard error
ci.cor(mtcars, mpg, drat, qsec, method = "spearman")
# Example 2b: Bonett and Wright (2000) approximate standard error
ci.cor(mtcars, mpg, drat, qsec, method = "spearman", se = "bonett")
# Example 2c: Rank-based inverse normal (RIN) transformation
ci.cor(mtcars, mpg, drat, qsec, method = "spearman", se = "rin")
#----------------------------------------------------------------------------
# Kendall's Tau
# Example 3a: Kendall's Tau-b
ci.cor(mtcars, mpg, drat, qsec, method = "kendall-b")
# Example 3b: Kendall's Tau-c
ci.cor(mtcars, mpg, drat, qsec, method = "kendall-c")
if (FALSE) {
#----------------------------------------------------------------------------
# Bootstrap Confidence Interval (CI)
# Example 4a: Bias-corrected (BC) percentile bootstrap CI
ci.cor(mtcars, mpg, drat, qsec, boot = "bc")
# Example 4b: Bias-corrected and accelerated (BCa) bootstrap CI,
# 5000 bootstrap replications, set seed of the pseudo-random number generator
ci.cor(mtcars, mpg, drat, qsec, boot = "bca", R = 5000, seed = 123)
#----------------------------------------------------------------------------
# Grouping and Split Variable
# Example 5a: Grouping variable
ci.cor(mtcars, mpg, drat, qsec, group = "vs")
# Alternative specification without using the argument '...'
ci.cor(mtcars[, c("mpg", "drat", "qsec")], group = mtcars$vs)
# Example 5b: Split variable
ci.cor(mtcars, mpg, drat, qsec, split = "am")
# Alternative specification without using the argument '...'
ci.cor(mtcars[, c("mpg", "drat", "qsec")], split = mtcars$am)
# Example 5c: Grouping and split variable
ci.cor(mtcars, mpg, drat, qsec, group = "vs", split = "am")
# Alternative specification without using the argument '...'
ci.cor(mtcars[, c("mpg", "drat", "qsec")], group = mtcars$vs, split = mtcars$am)
#----------------------------------------------------------------------------
# Write Output
# Example 6a: Text file
ci.cor(mtcars, mpg, drat, qsec, write = "CI_Cor_Text.txt")
# Example 6b: Excel file
ci.cor(mtcars, mpg, drat, qsec, write = "CI_Cor_Excel.xlsx")
#----------------------------------------------------------------------------
# Plot Confidence Intervals
# Example 7a: Pearson product-moment correlation coefficient
ci.cor(mtcars, mpg, drat, qsec, plot = "ci")
# Example 7b: Grouping variable
ci.cor(mtcars, mpg, drat, qsec, group = "vs", plot = "ci")
# Example 7c: Split variable
ci.cor(mtcars, mpg, drat, qsec, split = "am", plot = "ci")
# Example 7d: Save plot as PDF file
ci.cor(mtcars, mpg, drat, qsec, plot = "ci", saveplot = "CI_Cor.pdf",
width = 8, height = 6)
# Example 7e: Save plot as PNG file
ci.cor(mtcars, mpg, drat, qsec, plot = "ci", saveplot = "CI_Cor.png",
width = 8, height = 6)
#----------------------------------------------------------------------------
# Plot Bootstrap Samples
# Example 8a: Pearson product-moment correlation coefficient
ci.cor(mtcars, mpg, drat, qsec, boot = "bc", plot = "boot")
# Example 8b: Grouping variable
ci.cor(mtcars, mpg, drat, qsec, group = "vs", boot = "bc", plot = "boot")
# Example 8c: Split variable
ci.cor(mtcars, mpg, drat, qsec, split = "am", boot = "bc", plot = "boot")
# Example 8d: Save plot as PDF file
ci.cor(mpg, drat, qsec, data = mtcars, boot = "bc", plot = "boot",
saveplot = "CI_Cor_Boot.pdf", width = 14, height = 9)
# Example 8e: Save plot as PNG file
ci.cor(mtcars, mpg, drat, qsec, boot = "bc", plot = "boot",
saveplot = "CI_Cor_Boot.png", width = 14, height = 9)}
Run the code above in your browser using DataLab