cutter: Distribution of the fitted distribution without cut.

Description

If observations is a data.frame, it can have 4 columns: A column for the measurements; A column for the lower detection limit; A column for the upper detection limit; A column for the truncated of censored nature of the data. The names of the different columns are in the observations.colname, lower_detection_limit.colname, upper_detection_limit.colname and cut_method.colname. If lower_detection_limit.colname is NULL or if the column does not exist, the data are supposed to not be left-cut and if upper_detection_limit.colname is NULL or if the column does not exist, the data are supposed to not be right-cut. If observations is a vector, then the parameters lower_detection_limit and/or upper_detection_limit must be given. Then cut_method must be also provided. In abservations, -Inf must be used to indicate a value below the lower detection limit and +Inf must be used for a value above the upper detection limit. Be careful: NA is used to represent a missing data and not a value below of above the detection limit. If lower_detection_limit, upper_detection_limit or cut_method are only one value, they are supposed to be used for all the observations. Definitions for censored or truncated distribution vary, and the two terms are sometimes used interchangeably. Let the following data set: 1 1.25 2 4 5

Censoring: some observations will be censored, meaning that we only know that they are below (or above) some bound. This can for instance occur if we measure the concentration of a chemical in a water sample. If the concentration is too low, the laboratory equipment cannot detect the presence of the chemical. It may still be present though, so we only know that the concentration is below the laboratory's detection limit.

If the detection limit is 1.5, so that observations that fall below this limit is censored, our example data set would become: <1.5 <1.5 2 4 5; that is, we don't know the actual values of the first two observations, but only that they are smaller than 1.5.

Truncation: the process generating the data is such that it only is possible to observe outcomes above (or below) the truncation limit. This can for instance occur if measurements are taken using a detector which only is activated if the signals it detects are above a certain limit. There may be lots of weak incoming signals, but we can never tell using this detector.

If the truncation limit is 1.5, our example data set would become: 2 4 5; and we would not know that there in fact were two signals which were not recorded. If n.iter is NULL, no Bayesian MCMC is performed but credible interval will not be available.

Usage

cutter(
  observations = stop("Observations must be provided"),
  observations.colname = "Observations",
  lower_detection_limit.colname = "LDL",
  upper_detection_limit.colname = "UDL",
  cut_method.colname = "Cut",
  par = NULL,
  lower_detection_limit = NULL,
  upper_detection_limit = NULL,
  cut_method = "censored",
  distribution = "gamma",
  n.mixture = 1,
  n.iter = 5000,
  debug = FALSE,
  progress.bar = TRUE,
  session = NULL
)

Arguments

observations

The observations; see description

observations.colname

If observations is a data.frame, the name of column with observations

lower_detection_limit.colname

If observations is a data.frame, the name of column with lower detection limit

upper_detection_limit.colname

If observations is a data.frame, the name of column with upper detection limit

cut_method.colname

If observations is a data.frame, the name of column with cut method, being "censored" or "truncated"

par

Initial values for parameters of distribution

lower_detection_limit

Value for lower detection limit

upper_detection_limit

Value for upper detection limit

cut_method

Value for cut method, being "censored" or "truncated"

distribution

Can be gamma, normal, weibull, lognormal, or generalized.gamma

n.mixture

Number of distributions

n.iter

Number of iteration for Bayesian MCMC and to estimate the goodness-of-fit

debug

If TRUE, show some information

progress.bar

If TRUE, show a progress bar for MCMC

session

The session of a shiny process

Value

The parameters of distribution of values below or above the detection limit.

Details

cutter returns the fitted distribution without cut

Examples

Run this code

# NOT RUN {
library(HelpersMG)
# _______________________________________________________________
# right censored distribution with gamma distribution
# _______________________________________________________________
# Detection limit
DL <- 100
# Generate 100 random data from a gamma distribution
obc <- rgamma(100, scale=20, shape=2)
# remove the data below the detection limit
obc[obc>DL] <- +Inf
# search for the parameters the best fit these censored data
result <- cutter(observations=obc, upper_detection_limit=DL, 
                           cut_method="censored")
result
plot(result, xlim=c(0, 150), breaks=seq(from=0, to=150, by=10), col.mcmc=NULL)
plot(result, xlim=c(0, 150), breaks=seq(from=0, to=150, by=10))
# _______________________________________________________________
# The same data seen as truncated data with gamma distribution
# _______________________________________________________________
obc <- obc[is.finite(obc)]
# search for the parameters the best fit these truncated data
result <- cutter(observations=obc, upper_detection_limit=DL, 
                           cut_method="truncated")
result
plot(result, xlim=c(0, 150), breaks=seq(from=0, to=150, by=10))
# _______________________________________________________________
# left censored distribution with gamma distribution
# _______________________________________________________________
# Detection limit
DL <- 10
# Generate 100 random data from a gamma distribution
obc <- rgamma(100, scale=20, shape=2)
# remove the data below the detection limit
obc[obc<DL] <- -Inf
# search for the parameters the best fit these truncated data
result <- cutter(observations=obc, lower_detection_limit=DL, 
                          cut_method="censored")
result
plot(result, xlim=c(0, 200), breaks=seq(from=0, to=300, by=10))
# _______________________________________________________________
# left censored distribution with mixture of gamma distribution
# _______________________________________________________________
#' # Detection limit
library(HelpersMG)
# Generate 200 random data from a gamma distribution
set.seed(1234)
obc <- c(rgamma(100, scale=10, shape=5), rgamma(100, scale=20, shape=10))
LDL <- 20
l <- seq(from=0, to=LDL, length.out=1001)
p <- pgamma(l, scale=10, shape=5)*0.5+pgamma(l, scale=20, shape=10)
deltal <- l[2]-l[1]
expected_LDL <- sum((l[-1]-deltal/2)*(p[-1]-p[-length(p)]))/sum((p[-1]-p[-length(p)]))
# remove the data below the detection limit
obc[obc<LDL] <- -Inf

UDL <- 300
l <- seq(from=UDL, to=1000, length.out=1001)
p <- pgamma(l, scale=10, shape=5)*0.5+pgamma(l, scale=20, shape=10)
deltal <- l[2]-l[1]
expected_UDL <- sum((l[-1]-deltal/2)*(p[-1]-p[-length(p)]))/sum((p[-1]-p[-length(p)]))
obc[obc>UDL] <- +Inf

# search for the parameters the best fit these truncated data
result1_gamma <- cutter(observations=obc, lower_detection_limit=LDL, 
                          upper_detection_limit = UDL, 
                          distribution="gamma", 
                          cut_method="censored", n.iter=5000, debug=0)
result1_normal <- cutter(observations=obc, lower_detection_limit=LDL, 
                          upper_detection_limit = UDL, 
                          distribution="normal", 
                          cut_method="censored", n.iter=5000)
result1_lognormal <- cutter(observations=obc, lower_detection_limit=LDL, 
                          upper_detection_limit = UDL, 
                          distribution="lognormal", 
                          cut_method="censored", n.iter=5000)
result1_Weibull <- cutter(observations=obc, lower_detection_limit=LDL, 
                          upper_detection_limit = UDL,  
                          distribution="Weibull", 
                          cut_method="censored", n.iter=5000)
result1_generalized.gamma <- cutter(observations=obc, lower_detection_limit=LDL, 
                          upper_detection_limit = UDL, 
                          distribution="generalized.gamma", 
                          cut_method="censored", n.iter=5000)
result2_gamma <- cutter(observations=obc, lower_detection_limit=LDL, 
                          upper_detection_limit = UDL, 
                          distribution="gamma", 
                          n.mixture=2, 
                          cut_method="censored", n.iter=5000)
result2_normal <- cutter(observations=obc, lower_detection_limit=LDL, 
                          upper_detection_limit = UDL, 
                          distribution="normal", 
                          n.mixture=2, 
                          cut_method="censored", n.iter=5000)
result2_lognormal <- cutter(observations=obc, lower_detection_limit=LDL, 
                          upper_detection_limit = UDL, 
                          distribution="lognormal", 
                          n.mixture=2, 
                          cut_method="censored", n.iter=5000)
result2_Weibull <- cutter(observations=obc, lower_detection_limit=LDL, 
                          upper_detection_limit = UDL, 
                          distribution="Weibull", 
                          n.mixture=2, 
                          cut_method="censored", n.iter=5000)
result2_generalized.gamma <- cutter(observations=obc, lower_detection_limit=LDL, 
                          upper_detection_limit = UDL, 
                          distribution="generalized.gamma", 
                          n.mixture=2, 
                          cut_method="censored", n.iter=5000)
                          
compare_AIC(nomixture.gamma=result1_gamma, 
             nomixture.normal=result1_normal, 
             nomixture.lognormal=result1_lognormal, 
             nomixture.Weibull=result1_Weibull, 
             nomixture.generalized.gamma=result1_generalized.gamma, 
             mixture.gamma=result2_gamma, 
             mixture.normal=result2_normal, 
             mixture.lognormal=result2_lognormal, 
             mixture.Weibull=result2_Weibull, 
             mixture.generalized.gamma=result2_generalized.gamma)
             
plot(result2_gamma, xlim=c(0, 600), breaks=seq(from=0, to=600, by=10))
plot(result2_generalized.gamma, xlim=c(0, 600), breaks=seq(from=0, to=600, by=10))

# _______________________________________________________________
# left and right censored distribution
# _______________________________________________________________
# Generate 100 random data from a gamma distribution
obc <- rgamma(100, scale=20, shape=2)
# Detection limit
LDL <- 10
# remove the data below the detection limit
obc[obc<LDL] <- -Inf
# Detection limit
UDL <- 100
# remove the data below the detection limit
obc[obc>UDL] <- +Inf
# search for the parameters the best fit these censored data
result <- cutter(observations=obc, lower_detection_limit=LDL, 
                           upper_detection_limit=UDL, 
                          cut_method="censored")
result
plot(result, xlim=c(0, 150), col.DL=c("black", "grey"), 
                             col.unobserved=c("green", "blue"), 
     breaks=seq(from=0, to=150, by=10))
# _______________________________________________________________
# Example with two values for lower detection limits
# corresponding at two different methods of detection for example
# with gamma distribution
# _______________________________________________________________
obc <- rgamma(50, scale=20, shape=2)
# Detection limit for sample 1 to 50
LDL1 <- 10
# remove the data below the detection limit
obc[obc<LDL1] <- -Inf
obc2 <- rgamma(50, scale=20, shape=2)
# Detection limit for sample 1 to 50
LDL2 <- 20
# remove the data below the detection limit
obc2[obc2<LDL2] <- -Inf
obc <- c(obc, obc2)
# search for the parameters the best fit these censored data
result <- cutter(observations=obc, 
                           lower_detection_limit=c(rep(LDL1, 50), rep(LDL2, 50)), 
                          cut_method="censored")
result
# It is difficult to choose the best set of colors
plot(result, xlim=c(0, 150), col.dist="red", 
     col.unobserved=c(rgb(red=1, green=0, blue=0, alpha=0.1), 
                      rgb(red=1, green=0, blue=0, alpha=0.2)), 
     col.DL=c(rgb(red=0, green=0, blue=1, alpha=0.5), 
                      rgb(red=0, green=0, blue=1, alpha=0.9)), 
     breaks=seq(from=0, to=200, by=10))
     
# ________________________________________________________________________
# left censored distribution comparison of normal, lognormal, 
# weibull, generalized gamma, and gamma without Bayesian MCMC
# Comparison with Akaike Information Criterion
# ________________________________________________________________________
# Detection limit
DL <- 10
# Generate 100 random data from a gamma distribution
obc <- rgamma(100, scale=20, shape=2)
# remove the data below the detection limit
obc[obc<DL] <- -Inf

result_gamma <- cutter(observations=obc, lower_detection_limit=DL, 
                          cut_method="censored", distribution="gamma", 
                          n.iter=NULL)
plot(result_gamma)

result_lognormal <- cutter(observations=obc, lower_detection_limit=DL, 
                          cut_method="censored", distribution="lognormal", 
                          n.iter=NULL)
plot(result_lognormal)

result_weibull <- cutter(observations=obc, lower_detection_limit=DL, 
                          cut_method="censored", distribution="weibull", 
                          n.iter=NULL)
plot(result_weibull)

result_normal <- cutter(observations=obc, lower_detection_limit=DL, 
                          cut_method="censored", distribution="normal", 
                          n.iter=NULL)
plot(result_normal)

result_generalized.gamma <- cutter(observations=obc, lower_detection_limit=DL, 
                          cut_method="censored", distribution="generalized.gamma", 
                          n.iter=NULL)
plot(result_generalized.gamma)

compare_AIC(gamma=result_gamma, 
            lognormal=result_lognormal, 
            normal=result_normal, 
            Weibull=result_weibull, 
            Generalized.gamma=result_generalized.gamma)

# ______________________________________________________________________________
# left censored distribution comparison of normal, lognormal, 
# weibull, generalized gamma, and gamma
# ______________________________________________________________________________
# Detection limit
DL <- 10
# Generate 100 random data from a gamma distribution
obc <- rgamma(100, scale=20, shape=2)
# remove the data below the detection limit
obc[obc<DL] <- -Inf
# search for the parameters the best fit these truncated data
result_gamma <- cutter(observations=obc, lower_detection_limit=DL, 
                          cut_method="censored", distribution="gamma")
result_gamma
plot(result_gamma, xlim=c(0, 250), breaks=seq(from=0, to=250, by=10))

result_lognormal <- cutter(observations=obc, lower_detection_limit=DL, 
                          cut_method="censored", distribution="lognormal")
result_lognormal
plot(result_lognormal, xlim=c(0, 250), breaks=seq(from=0, to=250, by=10))

result_weibull <- cutter(observations=obc, lower_detection_limit=DL, 
                          cut_method="censored", distribution="weibull")
result_weibull
plot(result_weibull, xlim=c(0, 250), breaks=seq(from=0, to=250, by=10))

result_normal <- cutter(observations=obc, lower_detection_limit=DL, 
                          cut_method="censored", distribution="normal")
result_normal
plot(result_normal, xlim=c(0, 250), breaks=seq(from=0, to=250, by=10))

result_generalized.gamma <- cutter(observations=obc, lower_detection_limit=DL, 
                          cut_method="censored", distribution="generalized.gamma")
result_generalized.gamma
plot(result_generalized.gamma, xlim=c(0, 250), breaks=seq(from=0, to=250, by=10))

# ___________________________________________________________________
# Test for similarity in gamma left censored distribution between two
# datasets
# ___________________________________________________________________
obc1 <- rgamma(100, scale=20, shape=2)
# Detection limit for sample 1 to 50
LDL <- 10
# remove the data below the detection limit
obc1[obc1<LDL] <- -Inf
obc2 <- rgamma(100, scale=10, shape=2)
# remove the data below the detection limit
obc2[obc2<LDL] <- -Inf
# search for the parameters the best fit these censored data
result1 <- cutter(observations=obc1, 
                  distribution="gamma", 
                  lower_detection_limit=LDL, 
                  cut_method="censored", n.iter=NULL)
logLik(result1)
plot(result1, xlim=c(0, 200), 
     breaks=seq(from=0, to=200, by=10))
result2 <- cutter(observations=obc2, 
                  distribution="gamma", 
                  lower_detection_limit=LDL, 
                  cut_method="censored", n.iter=NULL)
logLik(result2)
plot(result2, xlim=c(0, 200), 
     breaks=seq(from=0, to=200, by=10))
result_totl <- cutter(observations=c(obc1, obc2), 
                      distribution="gamma", 
                      lower_detection_limit=LDL, 
                      cut_method="censored", n.iter=NULL)
logLik(result_totl)
plot(result_totl, xlim=c(0, 200), 
     breaks=seq(from=0, to=200, by=10))
     
compare_AIC(Separate=list(result1, result2), 
            Common=result_totl, factor.value=1)
compare_BIC(Separate=list(result1, result2), 
            Common=result_totl, factor.value=1)           
# }

Run the code above in your browser using DataLab