distributional

The distributional package allows distributions to be used in a vectorised context. It provides methods that are minimal wrappers around the standard d, p, q, and r distribution functions, applied to each distribution in the vector. Additional distributional statistics can also be computed, including mean(), median(), variance(), and intervals with hilo().

The distributional nature of a model’s predictions is often understated, with the default output of prediction methods usually consisting only of point predictions. Some R packages (such as forecast) further emphasise uncertainty by producing point forecasts and intervals by default, but the user’s ability to interact with them is limited. This package vectorises distributions and provides methods for working with them, making distributions compatible with the prediction outputs of modelling functions. These vectorised distributions can be illustrated with ggplot2 using the ggdist package, providing further opportunity to visualise the uncertainty of predictions and to teach distributional theory.

Installation

You can install the released version of distributional from CRAN with:

install.packages("distributional")

The development version can be installed from GitHub with:

# install.packages("remotes")
remotes::install_github("mitchelloharawild/distributional")

Examples

Distributions are created using dist_*() functions. A list of included distribution shapes can be found here: https://pkg.mitchelloharawild.com/distributional/reference/

library(distributional)
my_dist <- c(dist_normal(mu = 0, sigma = 1), dist_student_t(df = 10))
my_dist
#> <distribution[2]>
#> [1] N(0, 1)     t(10, 0, 1)

The standard four distribution functions in R are usable via these generics:

density(my_dist, 0) # c(dnorm(0, mean = 0, sd = 1), dt(0, df = 10))
#> [1] 0.3989423 0.3891084
cdf(my_dist, 5) # c(pnorm(5, mean = 0, sd = 1), pt(5, df = 10))
#> [1] 0.9999997 0.9997313
quantile(my_dist, 0.1) # c(qnorm(0.1, mean = 0, sd = 1), qt(0.1, df = 10))
#> [1] -1.281552 -1.372184
generate(my_dist, 10) # list(rnorm(10, mean = 0, sd = 1), rt(10, df = 10))
#> [[1]]
#>  [1]  1.262954285 -0.326233361  1.329799263  1.272429321  0.414641434
#>  [6] -1.539950042 -0.928567035 -0.294720447 -0.005767173  2.404653389
#> 
#> [[2]]
#>  [1]  0.99165484 -1.36999677 -0.40943004 -0.85261144 -1.37728388  0.81020460
#>  [7] -1.82965813 -0.06142032 -1.33933588 -0.28491414
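
The correspondence noted in the comments above can also be checked programmatically. The following is a minimal sketch, assuming the package is installed; the variable names (same_density, same_cdf, same_quantile, draws) are only illustrative.

```r
library(distributional)

my_dist <- c(dist_normal(mu = 0, sigma = 1), dist_student_t(df = 10))

# Each generic should agree with the matching base R d/p/q function,
# evaluated once per distribution in the vector.
same_density  <- all.equal(density(my_dist, 0),    c(dnorm(0),   dt(0, df = 10)))
same_cdf      <- all.equal(cdf(my_dist, 5),        c(pnorm(5),   pt(5, df = 10)))
same_quantile <- all.equal(quantile(my_dist, 0.1), c(qnorm(0.1), qt(0.1, df = 10)))

# generate() returns a list with one numeric vector of draws per distribution.
# Setting a seed first makes the draws reproducible.
set.seed(1)
draws <- generate(my_dist, 10)
lengths(draws)
```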

You can also compute intervals using hilo(). The interval size is given as a percentage, so a 95% interval is requested with 95 rather than 0.95:

hilo(my_dist, 95)
#> <hilo[2]>
#> [1] [-1.959964, 1.959964]95 [-2.228139, 2.228139]95
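
The bounds of an interval can be pulled out for further computation. This sketch assumes the hilo vector exposes lower and upper fields through `$`, as commonly used with forecast intervals; the names int95, lower, upper and width are illustrative. The size argument is on the percent scale.

```r
library(distributional)

my_dist <- c(dist_normal(mu = 0, sigma = 1), dist_student_t(df = 10))

# A 95% interval for each distribution (size is a percentage).
int95 <- hilo(my_dist, 95)

# Extract the bounds as plain numeric vectors, one element per distribution.
lower <- int95$lower
upper <- int95$upper

# Interval widths; the heavier-tailed t distribution gives the wider interval.
width <- upper - lower
width
```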

Some distributions also support other methods, such as mathematical operations and summary measures. When a distribution does not directly support a mathematical operation, a transformed distribution is created instead.

my_dist
#> <distribution[2]>
#> [1] N(0, 1)     t(10, 0, 1)
my_dist*3 + 2
#> <distribution[2]>
#> [1] N(2, 9)        t(t(10, 0, 1))
mean(my_dist)
#> [1] 0 0
variance(my_dist)
#> [1] 1.00 1.25
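
Whether an operation yields a known family or a transformed distribution, the usual generics still apply to the result. As a rough check, for an increasing linear transformation the quantiles of the result should equal the transformed quantiles of the original. The names shifted, q90 and expected are illustrative.

```r
library(distributional)

my_dist <- c(dist_normal(mu = 0, sigma = 1), dist_student_t(df = 10))

# Scale by 3 and shift by 2; the second element becomes a transformed t.
shifted <- my_dist * 3 + 2

# Quantiles of the transformed vector...
q90 <- quantile(shifted, 0.9)

# ...compared with applying the same transformation to the original quantiles.
expected <- 3 * quantile(my_dist, 0.9) + 2

all.equal(q90, expected)
```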

You can also visualise the distribution(s) using the ggdist package.

library(ggdist)
library(ggplot2)

df <- data.frame(
  name = c("Gamma(2,1)", "Normal(5,1)", "Mixture"),
  dist = c(dist_gamma(2,1), dist_normal(5,1),
           dist_mixture(dist_gamma(2,1), dist_normal(5, 1), weights = c(0.4, 0.6)))
)

ggplot(df, aes(y = factor(name, levels = rev(name)))) +
  stat_dist_halfeye(aes(dist = dist)) + 
  labs(title = "Density function for a mixture of distributions", y = NULL, x = NULL)

Related work

There are several packages which unify interfaces for distributions in R:

  • stats provides functions to work with possibly multiple distributions (the code comments in the Examples section show the equivalent stats calls).
  • distributions3 represents singular distributions using S3, with particularly nice documentation. distributional makes use of some code and documentation from distributions3.
  • distr represents singular distributions using S4.
  • distr6 represents singular distributions using R6.
  • Many more are listed in the CRAN task view.

This package differs from the above libraries by storing the distributions in a vectorised format. It does this using vctrs, so it should play nicely with the tidyverse (try putting distributions into a tibble!).
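
As a small illustration of the tibble suggestion, the sketch below stores a distribution vector as a column and derives summary columns from it. It assumes the tibble package is installed; the table name preds and its column names are hypothetical.

```r
library(distributional)
library(tibble)

# One row per (hypothetical) model, with its predictive distribution.
preds <- tibble(
  model = c("normal", "student_t"),
  dist  = c(dist_normal(mu = 0, sigma = 1), dist_student_t(df = 10))
)

# Summary measures are computed element-wise over the distribution column.
preds$mean    <- mean(preds$dist)
preds$upper95 <- hilo(preds$dist, 95)$upper

preds
```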

Version: 0.5.0
License: GPL-3
Last published: September 17th, 2024
Monthly downloads: 62,193

Functions in distributional (0.5.0)

  • dist_gamma: The Gamma distribution
  • dist_gk: The g-and-k Distribution
  • dist_geometric: The Geometric Distribution
  • dist_gumbel: The Gumbel distribution
  • dist_gh: The generalised g-and-h Distribution
  • dist_exponential: The Exponential Distribution
  • dist_f: The F Distribution
  • dist_inflated: Inflate a value of a probability distribution
  • dist_pareto: The Pareto distribution
  • dist_inverse_exponential: The Inverse Exponential distribution
  • dist_normal: The Normal distribution
  • dist_hypergeometric: The Hypergeometric distribution
  • dist_gpd: The Generalized Pareto Distribution
  • dist_uniform: The Uniform distribution
  • dist_poisson_inverse_gaussian: The Poisson-Inverse Gaussian distribution
  • median.distribution: Median of a probability distribution
  • dist_cauchy: The Cauchy distribution
  • dist_lognormal: The log-normal distribution
  • dist_inverse_gamma: The Inverse Gamma distribution
  • new_dist: Create a new distribution
  • dist_missing: Missing distribution
  • likelihood: The (log) likelihood of a sample matching a distribution
  • dist_sample: Sampling distribution
  • mean.distribution: Mean of a probability distribution
  • new_support_region: Create a new support region vector
  • dist_truncated: Truncate a distribution
  • dist_transformed: Modify a distribution with a transformation
  • hdr: Compute highest density regions
  • dist_weibull: The Weibull distribution
  • dist_inverse_gaussian: The Inverse Gaussian distribution
  • dist_multivariate_normal: The multivariate normal distribution
  • parameters: Extract the parameters of a distribution
  • dist_negative_binomial: The Negative Binomial distribution
  • hdr.distribution: Highest density regions of probability distributions
  • generate.distribution: Randomly sample values from a distribution
  • family.distribution: Extract the name of the distribution family
  • skewness: Skewness of a probability distribution
  • is_distribution: Test if the object is a distribution
  • is_hdr: Is the object a hdr
  • dist_percentile: Percentile distribution
  • dist_chisq: The (non-central) Chi-Squared Distribution
  • dist_degenerate: The degenerate distribution
  • dist_logarithmic: The Logarithmic distribution
  • dist_logistic: The Logistic distribution
  • quantile.distribution: Distribution Quantiles
  • reexports: Objects exported from other packages
  • support: Region of support of a distribution
  • variance: Variance
  • dist_poisson: The Poisson Distribution
  • dist_wrap: Create a distribution from p/d/q/r style functions
  • dist_mixture: Create a mixture of distributions
  • hilo.distribution: Probability intervals of a probability distribution
  • new_hdr: Construct hdr intervals
  • hilo: Compute intervals
  • new_hilo: Construct hilo intervals
  • distributional-package: distributional: Vectorised Probability Distributions
  • dist_multinomial: The Multinomial distribution
  • dist_studentized_range: The Studentized Range distribution
  • is_hilo: Is the object a hilo
  • dist_student_t: The (non-central) location-scale Student t Distribution
  • variance.distribution: Variance of a probability distribution
  • kurtosis: Kurtosis of a probability distribution
  • covariance: Covariance
  • dist_bernoulli: The Bernoulli distribution
  • dist_categorical: The Categorical distribution
  • cdf: The cumulative distribution function
  • dist_binomial: The Binomial distribution
  • dist_burr: The Burr distribution
  • dist_beta: The Beta distribution
  • covariance.distribution: Covariance of a probability distribution
  • density.distribution: The probability density/mass function
  • dist_gev: The Generalized Extreme Value Distribution