poweRlaw (version 0.80.0)

get_bootstrap_sims: Estimating the lower bound (xmin)

Description

When fitting heavy-tailed distributions, it is sometimes necessary to estimate the lower threshold, xmin. The lower bound is estimated by minimising the Kolmogorov-Smirnov statistic (as described in Clauset, Shalizi, Newman (2009)).

get_KS_statistic

Calculates the KS statistic for a particular value of xmin.

estimate_xmin

Estimates the optimal lower cutoff using a goodness-of-fit based approach. This function may issue warnings when fitting lognormal, Poisson or exponential distributions. The warnings occur for large values of xmin: essentially, we are discarding the bulk of the distribution and cannot calculate the tails accurately enough.

bootstrap

Estimates the uncertainty in the xmin and parameter values via bootstrapping.

bootstrap_p

Performs a bootstrapping hypothesis test to determine whether a suggested (typically power law) distribution is plausible. This is only available for distributions that have dist_rand methods available.
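
The functions described above are typically used together. The following is a minimal sketch of that workflow, assuming a synthetic discrete power-law sample drawn with rpldis (the sample size, xmin and alpha are illustrative choices, not values from this page) and a deliberately small no_of_sims so that it runs quickly:

```r
library(poweRlaw)

# Synthetic discrete power-law sample (illustrative data only)
set.seed(1)
x <- rpldis(300, xmin = 2, alpha = 2.5)
m <- displ$new(x)

# Estimate xmin and the scaling parameter, then store both on the object
est <- estimate_xmin(m)
m$setXmin(est)

# Distance between the empirical and fitted CDFs at the current xmin
d <- get_distance_statistic(m)

# Uncertainty in xmin and the parameter (few simulations, to keep this quick)
bs <- bootstrap(m, no_of_sims = 5, threads = 1, seed = 1)

# Bootstrap hypothesis test; bs_p$p holds the estimated p-value
bs_p <- bootstrap_p(m, no_of_sims = 5, threads = 1, seed = 1)
```

In practice no_of_sims would be far larger than 5; with so few simulations the p-value and the spread of the bootstrap estimates are not meaningful.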

Usage

get_bootstrap_sims(m, no_of_sims, seed, threads = 1)

bootstrap( m, xmins = NULL, pars = NULL, xmax = 1e+05, no_of_sims = 100, threads = 1, seed = NULL, distance = "ks" )

get_bootstrap_p_sims(m, no_of_sims, seed, threads = 1)

bootstrap_p( m, xmins = NULL, pars = NULL, xmax = 1e+05, no_of_sims = 100, threads = 1, seed = NULL, distance = "ks" )

get_distance_statistic(m, xmax = 1e+05, distance = "ks")

estimate_xmin(m, xmins = NULL, pars = NULL, xmax = 1e+05, distance = "ks")

Arguments

m

A reference class object that contains the data.

no_of_sims

number of bootstrap simulations. When no_of_sims is large, this can take a while to run.

seed

default NULL. An integer to be supplied to set.seed, or NULL to not set reproducible seeds. This argument is passed to clusterSetRNGStream.

threads

number of concurrent threads used during the bootstrap.

xmins

default 1e5. A vector of possible values of xmin to explore. When a single value is passed, this represents the maximum value to search, i.e. by default we search from (1, 1e5). See details for further information.

pars

default NULL. A vector or matrix (number of columns equal to the number of parameters) of parameters to optimise over. Otherwise, for each value of xmin, the MLE will be used, i.e. estimate_pars(m). For small samples, the MLE may be biased.

xmax

default 1e5. The maximum x value calculated when working out the CDF. See details for further information.

distance

A string containing the distance measure (or measures) to calculate. Possible values are ks or reweight. See details for further information.
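
As an illustration of the pars argument, the sketch below compares the default behaviour (the MLE at each candidate xmin) with a search restricted to a user-supplied grid of parameter values. The data and the grid seq(1.5, 3.5, by = 0.1) are assumptions chosen for the example:

```r
library(poweRlaw)

# Synthetic discrete power-law sample (illustrative data only)
set.seed(1)
x <- rpldis(300, xmin = 2, alpha = 2.5)
m <- displ$new(x)

# Default: the MLE of the parameter is computed for each candidate xmin
est_mle <- estimate_xmin(m)

# Alternative: only parameter values on this grid are considered
est_grid <- estimate_xmin(m, pars = seq(1.5, 3.5, by = 0.1))

# Compare the two fits
c(mle = est_mle$pars, grid = est_grid$pars)
```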

Details

When estimating xmin for discrete distributions, the comparison between the empirical CDF of the data and the fitted distribution CDF runs over the search space from xmin to max(x), where x is the data set. This can be computationally expensive, particularly when bootstrapping, because random numbers generated from a power-law distribution have a long tail.

To speed up computations for discrete distributions, it is sensible to set an upper bound via xmax and/or to give explicit values to search via xmins.
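
A minimal sketch of restricting the search, assuming synthetic data and illustrative bounds (candidate xmin values 2 to 10 and xmax = 1000 are arbitrary choices for this example):

```r
library(poweRlaw)

# Synthetic discrete power-law sample (illustrative data only)
set.seed(1)
x <- rpldis(300, xmin = 2, alpha = 2.5)
m <- displ$new(x)

# Only consider xmin in 2..10, and evaluate the CDFs no further than 1000
est <- estimate_xmin(m, xmins = 2:10, xmax = 1000)
est$xmin
```

The same xmins and xmax arguments are accepted by bootstrap and bootstrap_p, where the saving is repeated for every simulated data set.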

Occasionally, bootstrapping generates degenerate cases, for example when all values in the simulated data set are less than xmin. In this case, the estimated distance measure will be Inf and the parameter values NA.

Other distance measures can also be calculated. The default is the Kolmogorov-Smirnov (KS) statistic, which is equation 3.9 in the CSN paper (Clauset, Shalizi, Newman (2009)). The other measure currently available is reweight, which is equation 3.11.
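
For orientation, the KS statistic in the CSN paper is the maximum absolute difference between the empirical and fitted CDFs over x >= xmin, and the reweighted variant divides that difference by sqrt(P(x) (1 - P(x))), where P is the fitted CDF, so that the tails carry more weight. A short sketch of switching between the two measures, using synthetic data chosen for illustration:

```r
library(poweRlaw)

# Synthetic discrete power-law sample (illustrative data only)
set.seed(1)
x <- rpldis(300, xmin = 2, alpha = 2.5)
m <- displ$new(x)

# Default distance measure: Kolmogorov-Smirnov
est_ks <- estimate_xmin(m, distance = "ks")

# Reweighted measure
est_rw <- estimate_xmin(m, distance = "reweight")

# The two measures need not select the same xmin
c(ks = est_ks$xmin, reweight = est_rw$xmin)
```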

Examples

###################################################
# Load the data set and create distribution object#
###################################################
library(poweRlaw)
x = 1:10
m = displ$new(x)

###################################################
# Estimate xmin and pars                          #
###################################################
est = estimate_xmin(m)
m$setXmin(est)

###################################################
# Bootstrap examples                              #
###################################################
if (FALSE) {
bootstrap(m, no_of_sims=1, threads=1)
bootstrap_p(m, no_of_sims=1, threads=1)
}