Learn R Programming

esvis

R Package for effect size visualization and estimation.

This package is designed to help you very quickly estimate and visualize distributional differences by categorical factors (e.g., the effect of treatment by gender and income category). Emphasis is placed on evaluating distributional differences across the entirety of the scale, rather than only by measures of central tendency (e.g., means).

Installation

Install directly from CRAN with

install.packages("esvis")

Or the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("datalorax/esvis")

Plotting methods

There are three primary data visualizations: (a) binned effect size plots, (b) probability-probability plots, and (c) empirical cumulative distribution functions. All plots use the ggplot2 package and are fully manipulable after creation using standard ggplot commands (e.g., changing the theme, labels, etc.). These plots were all produced by first running library(ggplot2); theme_set(theme_minimal()) to produce the plots with the minimal theme, but no theme structure is imposed on any of the plots.

Binned ES Plot

At present, the binned effect size plot can only be produced with Cohen’s d, although future development will allow the user to select the type of effect size. The binned effect size plot splits the distribution into quantiles specified by the user (defaults to lower, middle, and upper thirds), calculates the mean difference between groups within each quantile bin, and produces an effect size for each bin by dividing by the overall pooled standard deviation (i.e., not by quantile). For example

library(esvis)
binned_plot(benchmarks, math ~ ell)
#> Warning: `cols` is now required.
#> Please use `cols = c(data, q)`

Note that in this plot one can clearly see that the magnitude of the differences between the groups depends upon scale location, as evidence by the reversal of the effect (negative to positive) for the Non-ELL (non-English Language Learners) group. We could also change the reference group, change the level of quantile binning, and evaluate the effect within other factors. For example, we can look by season eligibility for free or reduced price lunch, with quantiles binning, and non-ELL students as the reference group with

binned_plot(benchmarks, 
            math ~ ell + frl + season, 
            ref_group = "Non-ELL",
            qtile_groups = 5)
#> Warning: `cols` is now required.
#> Please use `cols = c(data, q)`

The ref_group argument can also supplied as a formula.

PP Plots

Probability-probability plot can be produced with a call to pp_plot and an equivalent argument structure. In this case, we’re visualizing the difference in reading achievement by race/ethnicity by season.

pp_plot(benchmarks, reading ~ ethnicity + season)

Essentially, the empirical cummulative distribution function (ECDF) for the reference group (by default, the highest performing group) is mapped against the ECDF for each corresponding group. The magnitude of the achievement gap is then displayed by the distance from the diagonal reference line, representing, essentially, the ECDF for the reference group.

By default, the area under the curve is shaded, which itself is an effect-size like measure, but this is also manipulable.

ECDF Plot

Finally, the ecdf_plot function essentially dresses up the base plot.ecdf function, but also adds some nice referencing features through additional, optional arguments. Below, I have included the optional hor_ref = TRUE argument such that horizontal reference lines appear, relative to the cuts provided.

ecdf_plot(benchmarks, math ~ season, 
    cuts = c(190, 200, 215))

These are the curves that go into the PP-Plot, but occasionally can be useful on their own.

Estimation Methods

Compute effect sizes for all possible pairwise comparisons.

coh_d(benchmarks, math ~ season + frl)
#> `mutate_if()` ignored the following grouping variables:
#> Column `season`
#> # A tibble: 30 x 6
#>    season_ref frl_ref season_foc frl_foc      coh_d     coh_se
#>    <chr>      <chr>   <chr>      <chr>        <dbl>      <dbl>
#>  1 Fall       FRL     Fall       Non-FRL  0.7443868 0.07055679
#>  2 Fall       FRL     Spring     FRL      1.321191  0.04957348
#>  3 Fall       FRL     Spring     Non-FRL  2.008066  0.07873488
#>  4 Fall       FRL     Winter     FRL      0.6246112 0.04716189
#>  5 Fall       FRL     Winter     Non-FRL  1.300031  0.07326622
#>  6 Fall       Non-FRL Fall       FRL     -0.7443868 0.07055679
#>  7 Fall       Non-FRL Spring     FRL      0.5498306 0.06939873
#>  8 Fall       Non-FRL Spring     Non-FRL  1.140492  0.09189070
#>  9 Fall       Non-FRL Winter     FRL     -0.1269229 0.06934576
#> 10 Fall       Non-FRL Winter     Non-FRL  0.5009081 0.08716735
#> # … with 20 more rows

Or specify a reference group. In this case, I’ve used the formula-based interface, but a string vector specifying the specific reference group could also be supplied.

coh_d(benchmarks, 
      math ~ season + frl, 
      ref_group = ~Fall + `Non-FRL`)
#> `mutate_if()` ignored the following grouping variables:
#> Column `season`
#> # A tibble: 5 x 6
#>   season_ref frl_ref season_foc frl_foc      coh_d     coh_se
#>   <chr>      <chr>   <chr>      <chr>        <dbl>      <dbl>
#> 1 Fall       Non-FRL Fall       FRL     -0.7443868 0.07055679
#> 2 Fall       Non-FRL Spring     FRL      0.5498306 0.06939873
#> 3 Fall       Non-FRL Spring     Non-FRL  1.140492  0.09189070
#> 4 Fall       Non-FRL Winter     FRL     -0.1269229 0.06934576
#> 5 Fall       Non-FRL Winter     Non-FRL  0.5009081 0.08716735

Notice that the reference to Non-FRL is wrapped in back-ticks, which should be used anytime there are spaces or other non-standard characters.

Other effect sizes are estimated equivalently. For example, compute V (Ho, 2009) can be estimated with

v(benchmarks, 
  math ~ season + frl, 
  ref_group = ~Fall + `Non-FRL`)
#> # A tibble: 5 x 5
#> # Groups:   frl, season [1]
#>   frl_ref season_ref frl_foc season_foc          v
#>   <chr>   <chr>      <chr>   <chr>           <dbl>
#> 1 Non-FRL Fall       Non-FRL Winter      0.5070737
#> 2 Non-FRL Fall       FRL     Spring      0.5454666
#> 3 Non-FRL Fall       FRL     Winter     -0.1117226
#> 4 Non-FRL Fall       Non-FRL Spring      1.139235 
#> 5 Non-FRL Fall       FRL     Fall       -0.7051069

or AUC with

auc(benchmarks, 
    math ~ season + frl, 
    ref_group = ~Fall + `Non-FRL`)
#> # A tibble: 5 x 5
#> # Groups:   frl, season [1]
#>   frl_ref season_ref frl_foc season_foc       auc
#>   <chr>   <chr>      <chr>   <chr>          <dbl>
#> 1 Non-FRL Fall       Non-FRL Winter     0.6400361
#> 2 Non-FRL Fall       FRL     Spring     0.6501417
#> 3 Non-FRL Fall       FRL     Winter     0.4685164
#> 4 Non-FRL Fall       Non-FRL Spring     0.7897519
#> 5 Non-FRL Fall       FRL     Fall       0.3090356

Copy Link

Version

Install

install.packages('esvis')

Monthly Downloads

233

Version

0.3.1

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Daniel Anderson

Last Published

April 30th, 2020

Functions in esvis (0.3.1)

binned_plot

Quantile-binned effect size plot
hedg

Hedge's g
hedg_g

Compute Hedges' g This function calculates effect sizes in terms of Hedges' g, also called the corrected (for sample size) effect size. See coh_d for the uncorrected version. Also see Lakens (2013) for a discussion on different types of effect sizes and their interpretation. Note that missing data are removed from the calculations of the means and standard deviations.
pac_compare

Compute the difference in the proportion above a specific cut location
v

Calculate the V effect size statistic
tpac_compare

Compare Transformed Proportion Above the Cut
pac

Compute the proportion above a specific cut location
paired_ecdf

Pairs empirical cummulative distribution functions for all groups supplied by the formula.
pp_plot

Produces the paired probability plot for two groups
tpac

Transformed proportion above the cut
seda

Portion of the Stanford Educational Data Archive (SEDA).
star

Data from the Tennessee class size experiment
psd

Pooled Standard Deviation
auc

Compute the Area Under the pp_plot Curve Calculates the area under the pp curve. The area under the curve is also a useful effect-size like statistic, representing the probability that a randomly selected individual from the x distribution will have a higher value than a randomly selected individual from the y distribution.
benchmarks

Synthetic benchmark screening data
binned_es

Calculate binned effect sizes
descrip_stats

Report descriptive stats for all possible pairings on the rhs of the formula.
esvis-package

esvis: Visualization and Estimation of Effect Sizes
ecdf_fun

Computes the empirical cummulative distribution function for all groups supplied by the formula.
ecdf_plot

Empirical Cumulative Distribution Plot
coh

Cohen's d
coh_d

Compute Cohen's d