
performance (version 0.9.1)

check_distribution: Classify the distribution of a model-family using machine learning

Description

Choosing the right distributional family for a regression model is essential for obtaining accurate estimates and standard errors. This function may help to check a model's distributional family and to see whether the family should probably be reconsidered. Because it is difficult to predict the correct model family exactly, consider this function somewhat experimental.

Usage

check_distribution(model)

Arguments

model

Typically, a model (that should respond to residuals()). May also be a numeric vector.
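
As a minimal sketch of the numeric-vector case, the snippet below passes simulated values directly to check_distribution(). The sample size, seed and variable name x are illustrative assumptions, and the call only runs if the performance and randomForest packages are installed.

```r
# Simulated data: 200 draws from a standard normal distribution.
# The seed, the size and the name `x` are arbitrary choices for illustration.
set.seed(123)
x <- rnorm(200)

# Classify the distribution of the raw vector, but only if the
# required packages are available on this machine.
if (requireNamespace("performance", quietly = TRUE) &&
  requireNamespace("randomForest", quietly = TRUE)) {
  print(performance::check_distribution(x))
}
```

Because the classifier is a trained random forest, the predicted distribution for a simulated sample is not guaranteed to match the distribution it was drawn from.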

Details

This function uses an internal random forest model to classify the distribution of a model family. Currently, the following distributions are trained (i.e. the result of check_distribution() may be one of the following): "bernoulli", "beta", "beta-binomial", "binomial", "chi", "exponential", "F", "gamma", "lognormal", "normal", "negative binomial", "negative binomial (zero-inflated)", "pareto", "poisson", "poisson (zero-inflated)", "uniform" and "weibull".

Note that certain distributions are similar in shape, skewness, etc. Thus, the predicted distribution may not perfectly represent the distributional family of the underlying fitted model or of the response values.
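
One way to get a feel for this is to fit a model whose true family is known and compare it with the prediction. The sketch below simulates a Poisson response and fits a glm(); the data frame d, the coefficients and the seed are assumptions made for illustration, and the check only runs if the performance and randomForest packages are installed.

```r
# Simulate count data from a known Poisson process (illustrative values).
set.seed(42)
d <- data.frame(x = rnorm(300))
d$y <- rpois(300, lambda = exp(0.5 + 0.3 * d$x))

# Fit a Poisson regression using base R (stats::glm).
m <- glm(y ~ x, family = poisson(), data = d)

# Ask check_distribution() which family it would predict for this model.
if (requireNamespace("performance", quietly = TRUE) &&
  requireNamespace("randomForest", quietly = TRUE)) {
  print(performance::check_distribution(m))
}
```

If a related count distribution, such as negative binomial, is predicted instead of Poisson, that reflects the similarity described above rather than a definitive verdict on the model.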

There is a plot() method that shows the probabilities of all predicted distributions, but only those with a probability greater than zero.

Examples

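# Run only if the modelling and plotting packages are all installed
if (require("lme4") && require("parameters") &&
  require("see") && require("patchwork") && require("randomForest")) {
  library(performance)
  data(sleepstudy)

  # Linear mixed model with random intercepts and slopes per subject
  model <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
  check_distribution(model) # print the predicted distributions
  plot(check_distribution(model)) # plot the predicted probabilities
}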
