EnvStats-package: Package for Environmental Statistics, Including US EPA Guidance

Description

A comprehensive R package for environmental statistics and the successor to the S-PLUS module EnvironmentalStats for S-PLUS (first released in April, 1997). EnvStats provides a set of powerful functions for graphical and statistical analyses of environmental data, with a focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. It includes major environmental statistical methods found in the literature and regulatory guidance documents, and extensive help that explains what these methods do, how to use them, and where to find them in the literature. It also includes numerous built-in data sets from regulatory guidance documents and environmental statistics literature, and scripts reproducing analyses presented in the user's manual (Millard, 2013). For a complete list of functions and datasets, you can do any of the following:

See the help file Functions By Category for a listing of functions by category.
If you are in the on-line help, scroll to the bottom of this help page and click on the Index link.
Type library(help="EnvStats") at the command prompt.

Note: The names of all EnvStats functions start with a lowercase letter, and the names of all EnvStats datasets and data objects start with an uppercase letter. You can type newsEnvStats() at the R command prompt for the latest news for the EnvStats package.
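As a minimal sketch of the listing options above, the snippet below collects the names the package exports and the data sets it ships, assuming EnvStats is installed. The variable names are illustrative only, and the requireNamespace() guard simply skips the lookup when the package is absent.

```r
# Illustrative variable names; assumes EnvStats is installed.
exported_names <- character(0)
dataset_names  <- character(0)

if (requireNamespace("EnvStats", quietly = TRUE)) {
  # Names of all objects exported by the package namespace
  exported_names <- sort(getNamespaceExports("EnvStats"))

  # Names of the data sets listed in the package's data index
  dataset_names <- data(package = "EnvStats")$results[, "Item"]
}

head(exported_names)
head(dataset_names)
```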

Details

Package:   EnvStats
Type:      Package
Version:   2.1.0
Date:      2016-04-18
License:   GPL (>=3)
LazyLoad:  yes

A companion file EnvStats-manual.pdf containing a listing of all the current help files is located on the R CRAN web site at http://cran.r-project.org/web/packages/EnvStats/EnvStats.pdf and also in the doc subdirectory of the directory where the EnvStats package was installed. For example, if you installed R under Windows, this file might be located in the directory C:\Program Files\R-*.**.*\library\EnvStats\doc, where *.**.* denotes the version of R you are using (e.g., 3.2.5), or in the directory C:\Users\Name\Documents\R\win-library\*.**.*\EnvStats\doc, where Name denotes your user name on the Windows operating system.

EnvStats comes with companion scripts, located in the scripts subdirectory of the directory where the package was installed. One set of scripts lets you reproduce the examples in the User's Manual (currently still in preparation). There are also scripts that let you reproduce examples from US EPA guidance documents.

See the References section below for documentation for the predecessor to EnvStats, EnvironmentalStats for S-PLUS for Windows.

Features of EnvStats include:

New functions for computing summary statistics and creating summary plots to compare the distributions of groups side-by-side.
New probability distributions have been added to the ones already available in R, including the extreme value distribution and the zero-modified lognormal (delta) distribution. You can compute quantities associated with these probability distributions (probability density functions, cumulative distribution functions, and quantiles), and generate random numbers from these distributions.
Plot probability distributions so you can see how they change with the value of the distribution parameter(s).
Estimate distribution parameters and distribution quantiles, and compute confidence intervals for commonly used probability distributions, including special methods for the lognormal and gamma distributions.
Perform and plot the results of goodness-of-fit tests:
- Observed and Fitted Distributions
- Quantile-Quantile Plots
- Results of Shapiro-Wilk test, Kolmogorov-Smirnov test, etc.
Includes a new generalized goodness-of-fit test for any continuous distribution.
Functions for assessing optimal Box-Cox data transformations.
Compute parametric and non-parametric prediction intervals, simultaneous prediction intervals, and tolerance intervals.
New functions for hypothesis tests, including:
- Nonparametric estimation and tests for seasonal trend
- Fisher's one-sample randomization (permutation) test for location
- Quantile test to detect a shift in the tail of one population relative to another
- Two-sample linear rank tests
- Test for serial correlation based on von Neumann rank test
Perform calibration based on a machine signal to determine decision and detection limits and report estimated concentrations along with confidence intervals.
Easily perform power and sample size computations and create companion plots for sampling designs based on confidence intervals, hypothesis tests, prediction intervals, and tolerance intervals.
Handle singly and multiply censored (less-than-detection-limit) data:
- Empirical CDF and Quantile-Quantile Plots
- Parameter/Quantile Estimation and Confidence Intervals
- Prediction and Tolerance Intervals
- Goodness-of-Fit Tests
- Optimal Box-Cox Transformations
- Two-Sample Rank Tests
Functions for performing Monte Carlo simulation and probabilistic risk assessment.
Reproduce specific examples in EPA guidance documents by using built-in data sets from these documents and running companion scripts.
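The doc and scripts subdirectories described under Details can be located from within R rather than by browsing the file system. The following is a minimal sketch using base R's system.file(), which returns an empty string when the package or directory is not found. The variable names are illustrative only.

```r
# Locate the installed doc and scripts directories ("" if not found).
doc_dir     <- system.file("doc", package = "EnvStats")
scripts_dir <- system.file("scripts", package = "EnvStats")

# List the companion scripts, if the scripts directory exists
script_files <- if (nzchar(scripts_dir)) {
  list.files(scripts_dir, recursive = TRUE)
} else {
  character(0)
}

doc_dir
script_files
```

A script found this way could then be opened in an editor or run with source(), using the full path built with file.path(scripts_dir, ...).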

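To make the zero-modified lognormal (delta) distribution in the feature list more concrete, here is a base R sketch of its cumulative distribution function and a random number generator. It assumes the usual definition of the distribution: a point mass at zero with probability p.zero, and a lognormal distribution otherwise. The helper names pzmlnorm_sketch and rzmlnorm_sketch are hypothetical and are not the EnvStats implementations.

```r
# Hypothetical helpers written in base R; not the EnvStats functions.

# CDF: jump of size p.zero at 0, then a rescaled lognormal CDF
pzmlnorm_sketch <- function(q, meanlog = 0, sdlog = 1, p.zero = 0.5) {
  ifelse(q < 0, 0, p.zero + (1 - p.zero) * plnorm(q, meanlog, sdlog))
}

# Random numbers: zero with probability p.zero, lognormal otherwise
rzmlnorm_sketch <- function(n, meanlog = 0, sdlog = 1, p.zero = 0.5) {
  is_zero <- runif(n) < p.zero
  ifelse(is_zero, 0, rlnorm(n, meanlog, sdlog))
}

set.seed(47)
x <- rzmlnorm_sketch(1000, meanlog = 0, sdlog = 1, p.zero = 0.3)

# Observed proportion of zeros, to compare with p.zero
mean(x == 0)

# CDF evaluated at zero equals the point mass at zero
pzmlnorm_sketch(0, p.zero = 0.3)
```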
References

Millard, S.P. (2013). EnvStats: An R Package for Environmental Statistics. Springer, New York.

Millard, S.P. (2002). EnvironmentalStats for S-PLUS: User's Manual for Version 2.0. Second Edition. Springer-Verlag, New York.

Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.

Examples

  # Look at plots and summary statistics for the TcCB data given in 
  # USEPA (1994b) (the data are stored in EPA.94b.tccb.df). 
  # Arbitrarily set the one censored observation to the censoring level. 
  # Group by the variable Area.

  EPA.94b.tccb.df
  #    TcCB.orig   TcCB Censored      Area
  #1        0.22   0.22    FALSE Reference
  #2        0.23   0.23    FALSE Reference
  #...
  #46       1.20   1.20    FALSE Reference
  #47       1.33   1.33    FALSE Reference
  #48      <0.09   0.09     TRUE   Cleanup
  #49       0.09   0.09    FALSE   Cleanup
  #...
  #123     51.97  51.97    FALSE   Cleanup
  #124    168.64 168.64    FALSE   Cleanup


  # First plot the data
  #--------------------
  dev.new()
  stripChart(TcCB ~ Area, data = EPA.94b.tccb.df, 
    xlab = "Area", ylab = "TcCB (ppb)")
  mtext("TcCB Concentrations by Area", line = 3, cex = 1.25, font = 2)

  dev.new()
  stripChart(log10(TcCB) ~ Area, data = EPA.94b.tccb.df, 
    p.value = TRUE, 
    xlab = "Area", ylab = expression(paste(log[10], "[ TcCB (ppb) ]")))
  mtext(expression(paste(log[10], "(TcCB) Concentrations by Area")), 
    line = 3, cex = 1.25, font = 2)

  #--------------------------------------------------------------------

  # Now compute summary statistics
  #-------------------------------
  
  sum(EPA.94b.tccb.df$Censored) 
  #[1] 1 

  with(EPA.94b.tccb.df, TcCB[Censored])
  #0.09 

  # The summary statistics below treat the one censored value 
  # as if it were equal to the detection limit.

  summaryFull(TcCB ~ Area, data = EPA.94b.tccb.df)
  #                             Cleanup  Reference
  #N                             77       47      
  #Mean                           3.915    0.5985 
  #Median                         0.43     0.54   
  #10% Trimmed Mean               0.6846   0.5728 
  #Geometric Mean                 0.5784   0.5382 
  #Skew                           7.717    0.9019 
  #Kurtosis                      62.67     0.132  
  #Min                            0.09     0.22   
  #Max                          168.6      1.33   
  #Range                        168.5      1.11   
  #1st Quartile                   0.23     0.39   
  #3rd Quartile                   1.1      0.75   
  #Standard Deviation            20.02     0.2836 
  #Geometric Standard Deviation   3.898    1.597  
  #Interquartile Range            0.87     0.36   
  #Median Absolute Deviation      0.3558   0.2669 
  #Coefficient of Variation       5.112    0.4739

  summaryStats(TcCB ~ Area, data = EPA.94b.tccb.df, digits = 1)
  #           N Mean   SD Median Min   Max
  #Cleanup   77  3.9 20.0    0.4 0.1 168.6
  #Reference 47  0.6  0.3    0.5 0.2   1.3

  #----------------------------------------------------------------

  # Compute Shapiro-Wilk Goodness-of-Fit statistic for the 
  # Reference Area TcCB data assuming a lognormal distribution
  #-----------------------------------------------------------
  
  sw.list <- gofTest(TcCB ~ 1, data = EPA.94b.tccb.df, 
    subset = Area == "Reference", dist = "lnorm")
  sw.list

  # Results of Goodness-of-Fit Test
  # -------------------------------
  #
  # Test Method:                     Shapiro-Wilk GOF
  #
  # Hypothesized Distribution:       Lognormal
  #
  # Estimated Parameter(s):          meanlog = -0.6195712
  #                                  sdlog   =  0.4679530
  #
  # Estimation Method:               mvue
  #
  # Data:                            TcCB
  #
  # Subset With:                     Area == "Reference"
  #
  # Data Source:                     EPA.94b.tccb.df
  #
  # Sample Size:                     47
  #
  # Test Statistic:                  W = 0.978638
  #
  # Test Statistic Parameter:        n = 47
  #
  # P-value:                         0.5371935
  #
  # Alternative Hypothesis:          True cdf does not equal the
  #                                  Lognormal Distribution.

  #----------

  # Plot results of GOF test
  dev.new()
  plot(sw.list)

  #----------------------------------------------------------------

  # Based on the Reference Area data, estimate 90th percentile 
  # and compute a 95% confidence limit for the 90th percentile 
  # assuming a lognormal distribution.
  #------------------------------------------------------------

  with(EPA.94b.tccb.df, 
    eqlnorm(TcCB[Area == "Reference"], p = 0.9, ci = TRUE))

  # Results of Distribution Parameter Estimation
  # --------------------------------------------
  #
  # Assumed Distribution:            Lognormal
  #
  # Estimated Parameter(s):          meanlog = -0.6195712
  #                                  sdlog   =  0.4679530
  #
  # Estimation Method:               mvue
  #
  # Estimated Quantile(s):           90'th %ile = 0.9803307
  #
  # Quantile Estimation Method:      qmle
  #
  # Data:                            TcCB[Area == "Reference"]
  #
  # Sample Size:                     47
  #
  # Confidence Interval for:         90'th %ile
  #
  # Confidence Interval Method:      Exact
  #
  # Confidence Interval Type:        two-sided
  #
  # Confidence Level:                95%
  #
  # Confidence Interval:             LCL = 0.8358791
  #                                  UCL = 1.2154977
  #----------

  # Cleanup
  rm(sw.list)
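The estimated 90th percentile in the output above can be cross-checked with base R alone. This sketch assumes that the "qmle" quantile estimate is obtained by plugging the reported meanlog and sdlog estimates into the lognormal quantile function; that reading is an assumption, not something stated in the output. The variable names are illustrative.

```r
# Parameter estimates copied from the eqlnorm() output above
meanlog_hat <- -0.6195712
sdlog_hat   <-  0.4679530

# Plug-in 90th percentile via the lognormal quantile function
q90 <- qlnorm(0.9, meanlog = meanlog_hat, sdlog = sdlog_hat)

# Same quantity written out: exp(meanlog + z_0.9 * sdlog)
q90_manual <- exp(meanlog_hat + qnorm(0.9) * sdlog_hat)

# Compare both values with the 90'th %ile reported in the output above
q90
q90_manual
```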
