summaryFull: Full Complement of Summary Statistics

Description

summaryFull is a generic function used to produce a full complement of summary statistics. The function invokes particular methods which depend on the class of the first argument. The summary statistics include: sample size, number of missing values, mean, median, trimmed mean, geometric mean, skew, kurtosis, min, max, range, 1st quartile, 3rd quartile, standard deviation, geometric standard deviation, interquartile range, median absolute deviation, and coefficient of variation.

Usage

summaryFull(object, ...)

## S3 method for class 'formula':
summaryFull(object, data = NULL, subset, 
  na.action = na.pass, ...)

## S3 method for class 'default':
summaryFull(object, group = NULL, 
    combine.groups = FALSE, drop.unused.levels = TRUE, 
    rm.group.na = TRUE, stats = NULL, trim = 0.1, 
    sd.method = "sqrt.unbiased", geo.sd.method = "sqrt.unbiased", 
    skew.list = list(), kurtosis.list = list(), 
    cv.list = list(), digits = max(3, getOption("digits") - 3), 
    digit.type = "signif", stats.in.rows = TRUE, 
    drop0trailing = TRUE, data.name = deparse(substitute(object)), 
    ...)

## S3 method for class 'data.frame':
summaryFull(object, ...)

## S3 method for class 'matrix':
summaryFull(object, ...)

## S3 method for class 'list':
summaryFull(object, ...)

Arguments

object

an object for which summary statistics are desired. In the default method, the argument object must be a numeric vector, a data frame, a matrix, or a list. When object is a data frame, all columns must be numeric.

data

when object is a formula, data specifies an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data

subset

when object is a formula, subset specifies an optional vector specifying 
  a subset of observations to be used.

na.action

when object is a formula, na.action specifies a function which indicates 
  what should happen when the data contain NAs. The default is na.pass.

group

when object is a numeric vector, group is a factor or character vector 
  indicating which group each observation belongs to.  When object is a matrix or data frame
  this argument is ignored and the columns define

combine.groups

logical scalar indicating whether to show summary statistics for all groups combined.  
  The default value is FALSE.

drop.unused.levels

when drop.unused.levels=TRUE, groups with no observations are dropped.

rm.group.na

logical scalar indicating whether to remove missing values from the group argument.  By 
  default rm.group.na=TRUE.

stats

character vector indicating which statistics to compute.  Possible elements of the character 
  vector include:  "all" (indicating to include all summary statistics), 
  "for.non.pos" (only compute statistics that are meaningfu

trim

fraction (between 0 and 0.5 inclusive) of values to be trimmed from each end of the ordered data 
  to compute the trimmed mean.  The default value is trim=0.1.  
  If trim=0.5, this yields the median.

sd.method

character string specifying what method to use to compute the sample standard deviation.  
  The possible values are "sqrt.unbiased" (the square root of the unbiased estimate of variance; 
  the default), or "moments" (the metho

geo.sd.method

character string specifying what method to use to compute the sample standard deviation of the 
  log-transformed observations prior to exponentiating this quantity.  The possible values are 
  "sqrt.unbiased" (the square root of the unbiase

skew.list

list of arguments to supply to the skewness function.  See the help file for 
  skewness for more information.  The default value is skew.list=list(

kurtosis.list

list of arguments to supply to the kurtosis function.  See the help file for 
  kurtosis for more information.  The default value is 
kurtosis.list=

cv.list

list of arguments to supply to the cv function.  See the help file for cv 
  for more information.  The default value is cv.list=list(), which results in

digits

integer indicating the number of digits to use for the summary statistics.  
  When digit.type="signif", digits indicates the number of significant 
  digits.  When digit.type="round", digits indicates

digit.type

character string indicating whether the digits argument refers to significant digits 
  (digit.type="signif", the default), or how many decimal places to round to 
  (digit.type="round").

stats.in.rows

logical scalar indicating whether to show the summary statistics in the rows or columns of the 
  output.  The default is stats.in.rows=TRUE.

drop0trailing

logical scalar indicating whether to drop trailing 0's when printing the summary statistics.  
  The value of this argument is added as an attribute to the returned list and is used by the 
  print.summar

data.name

character string indicating the name of the data used for the summary statistics.

...

additional arguments affecting the summary statistics produced.
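
As a rough illustration of a few of the arguments above, the following base R sketch shows what trim, digit.type, and sd.method control. It does not call summaryFull itself. The vector x and the variable names are made up for illustration, and the "moments" formula assumes that option refers to the method-of-moments estimator (dividing by n rather than n - 1).

```r
# Hypothetical data, used only for illustration
x <- c(2.6, 4.1, 5.0, 6.2, 6.3, 7.8, 9.3, 11.5, 15.4, 40.0)

# trim: fraction of observations dropped from each end before averaging
trimmed.10 <- mean(x, trim = 0.1)   # drops the smallest and the largest value
trimmed.50 <- mean(x, trim = 0.5)   # trimming half from each end gives the median

# digit.type: significant digits versus decimal places
val <- 0.0353912
as.signif <- signif(val, digits = 4)  # 4 significant digits
as.round  <- round(val, digits = 4)   # 4 decimal places

# sd.method: square root of the unbiased variance estimate (what sd() computes)
# versus a method-of-moments version (ASSUMED here to divide by n)
n <- length(x)
sd.unbiased <- sd(x)
sd.moments  <- sqrt(sum((x - mean(x))^2) / n)

print(c(trimmed.10 = trimmed.10, trimmed.50 = trimmed.50, median = median(x)))
print(c(signif = as.signif, round = as.round))
print(c(sqrt.unbiased = sd.unbiased, moments = sd.moments))
```

With only ten observations and one large value, the 10% trimmed mean sits well below the ordinary mean, which is the kind of difference the trim argument is meant to expose.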

Value

an object of class "summaryStats" (see summaryStats.object).  
  Objects of class "summaryStats" are numeric matrices that contain the 
  summary statistics produced by a call to summaryStats or summaryFull.  
  These objects have a special printing method that by default removes 
  trailing zeros for sample size entries and prints blanks for statistics that are 
  normally displayed as NA (see print.summaryStats).

Details

The function summaryFull returns summary statistics that are useful to describe various 
  characteristics of one or more variables.  It is an extended version of the built-in R function 
  summary specifically for non-factor numeric data.  The table below shows what 
  statistics are computed and what functions are called by summaryFull to compute these statistics.

  The object returned by summaryFull is useful for printing or reporting purposes.  You may also 
  use the functions that summaryFull calls (see table below) to compute summary statistics to 
  be used by other functions.

  See the help files for the functions listed in the table below for more information on these 
  summary statistics.

  Summary Statistic              Function Used
  Mean                           mean
  Median                         median
  Trimmed Mean                   mean with trim argument
  Geometric Mean                 geoMean
  Skew                           skewness
  Kurtosis                       kurtosis
  Min                            min
  Max                            max
  Range                          range and diff
  1st Quartile                   quantile
  3rd Quartile                   quantile
  Standard Deviation             sd
  Geometric Standard Deviation   geoSD
  Interquartile Range            iqr
  Median Absolute Deviation      mad
  Coefficient of Variation       cv
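
To make the table concrete, here is a minimal sketch that computes most of the listed statistics with base R only. The functions geoMean, geoSD, iqr, and cv are not part of base R, so the sketch substitutes simple stand-ins (exp of the mean or sd of the logs, the base function IQR, and sd divided by mean) that are assumed to match their default behavior for positive data. Skewness and kurtosis are left out because their values depend on the estimation method chosen. The data vector is hypothetical.

```r
# Hypothetical positive data, used only for illustration
x <- c(2.6, 4.1, 5.0, 6.2, 6.3, 7.8, 9.3, 11.5, 15.4, 21.0)

# Quartiles using R's default quantile type
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)

out <- c(
  "N"                            = length(x),
  "Mean"                         = mean(x),
  "Median"                       = median(x),
  "10% Trimmed Mean"             = mean(x, trim = 0.1),
  "Geometric Mean"               = exp(mean(log(x))),  # stand-in for geoMean
  "Min"                          = min(x),
  "Max"                          = max(x),
  "Range"                        = diff(range(x)),
  "1st Quartile"                 = q[1],
  "3rd Quartile"                 = q[2],
  "Standard Deviation"           = sd(x),
  "Geometric Standard Deviation" = exp(sd(log(x))),    # stand-in for geoSD
  "Interquartile Range"          = IQR(x),             # stand-in for iqr
  "Median Absolute Deviation"    = mad(x),
  "Coefficient of Variation"     = sd(x) / mean(x)     # stand-in for cv
)

# Show the results as a one-column matrix, statistics in rows
print(signif(cbind(x = out), 4))
```

Computing the pieces separately like this is mainly useful when a single statistic needs to be passed on to other code rather than printed in a report.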

References

Berthouex, P.M., and L.C. Brown. (2002). 
  Statistics for Environmental Engineers, Second Edition. 
  Lewis Publishers, Boca Raton, FL.

  Gilbert, R.O. (1987). Statistical Methods for Environmental 
  Pollution Monitoring.  Van Nostrand Reinhold, NY.

  Helsel, D.R., and R.M. Hirsch. (1992). 
  Statistical Methods in Water Resources Research. 
  Elsevier, New York, NY.

  Leidel, N.A., K.A. Busch, and J.R. Lynch. (1977). Occupational Exposure Sampling Strategy Manual. 
  U.S. Department of Health, Education, and Welfare, Public Health Service, Center for Disease Control, 
  National Institute for Occupational Safety and Health, Cincinnati, Ohio 45226, January, 1977, pp.102-103.

  Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. 
  CRC Press, Boca Raton, FL.

  Ott, W.R. (1995). Environmental Statistics and Data Analysis. 
  Lewis Publishers, Boca Raton, FL.

  Zar, J.H. (2010). Biostatistical Analysis, Fifth Edition. 
  Prentice-Hall, Upper Saddle River, NJ.

See Also

summary, summaryStats.

Examples

  # Generate 20 observations from a lognormal distribution with 
  # parameters mean=10 and cv=1, and compute the summary statistics.  
  # (Note: the call to set.seed simply allows you to reproduce this 
  # example.)

  set.seed(250) 

  dat <- rlnormAlt(20, mean=10, cv=1) 

  summary(dat) 
  # Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  #2.608   4.995   6.235   7.490   9.295  15.440

  summaryFull(dat) 
  #                             dat     
  #N                            20      
  #Mean                          7.49   
  #Median                        6.235  
  #10% Trimmed Mean              7.125  
  #Geometric Mean                6.674  
  #Skew                          0.9877 
  #Kurtosis                     -0.03539
  #Min                           2.608  
  #Max                          15.44   
  #Range                        12.83   
  #1st Quartile                  4.995  
  #3rd Quartile                  9.295  
  #Standard Deviation            3.803  
  #Geometric Standard Deviation  1.634  
  #Interquartile Range           4.3    
  #Median Absolute Deviation     2.607  
  #Coefficient of Variation      0.5078 

  #----------

  # Compare summary statistics for normal and lognormal data:
  log.dat <- log(dat) 

  summaryFull(list(dat = dat, log.dat = log.dat))
  #                             dat      log.dat
  #N                            20       20     
  #Mean                          7.49     1.898 
  #Median                        6.235    1.83  
  #10% Trimmed Mean              7.125    1.902 
  #Geometric Mean                6.674    1.835 
  #Skew                          0.9877   0.1319
  #Kurtosis                     -0.03539 -0.4288
  #Min                           2.608    0.9587
  #Max                          15.44     2.737 
  #Range                        12.83     1.778 
  #1st Quartile                  4.995    1.607 
  #3rd Quartile                  9.295    2.227 
  #Standard Deviation            3.803    0.4913
  #Geometric Standard Deviation  1.634    1.315 
  #Interquartile Range           4.3      0.62  
  #Median Absolute Deviation     2.607    0.4915
  #Coefficient of Variation      0.5078   0.2588

  # Clean up
  rm(dat, log.dat)

  #--------------------------------------------------------------------

  # Compute summary statistics for 10 observations from a normal 
  # distribution with parameters mean=0 and sd=1.  Note that the 
  # geometric mean and geometric standard deviation are not computed 
  # since some of the observations are non-positive.

  set.seed(287) 

  dat <- rnorm(10) 

  summaryFull(dat) 
  #                          dat     
  #N                         10      
  #Mean                       0.07406
  #Median                     0.1095 
  #10% Trimmed Mean           0.1051 
  #Skew                      -0.1646 
  #Kurtosis                  -0.7135 
  #Min                       -1.549  
  #Max                        1.449  
  #Range                      2.998  
  #1st Quartile              -0.5834 
  #3rd Quartile               0.6966 
  #Standard Deviation         0.9412 
  #Interquartile Range        1.28   
  #Median Absolute Deviation  1.05

  # Clean up
  rm(dat)

  #--------------------------------------------------------------------

  # Compute summary statistics for the TcCB data given in USEPA (1994b) 
  # (the data are stored in EPA.94b.tccb.df).  Arbitrarily set the one 
  # censored observation to the censoring level. Group by the variable 
  # Area.

  summaryFull(TcCB ~ Area, data = EPA.94b.tccb.df)
  #                             Cleanup  Reference
  #N                             77       47      
  #Mean                           3.915    0.5985 
  #Median                         0.43     0.54   
  #10% Trimmed Mean               0.6846   0.5728 
  #Geometric Mean                 0.5784   0.5382 
  #Skew                           7.717    0.9019 
  #Kurtosis                      62.67     0.132  
  #Min                            0.09     0.22   
  #Max                          168.6      1.33   
  #Range                        168.5      1.11   
  #1st Quartile                   0.23     0.39   
  #3rd Quartile                   1.1      0.75   
  #Standard Deviation            20.02     0.2836 
  #Geometric Standard Deviation   3.898    1.597  
  #Interquartile Range            0.87     0.36   
  #Median Absolute Deviation      0.3558   0.2669 
  #Coefficient of Variation       5.112    0.4739