FSA (version 0.8.20)

Summarize: Summary statistics for a numeric variable.

Description

Summary statistics for a single numeric variable, possibly separated by the levels of a factor variable or variables. This function is very similar to summary for a numeric variable.

Usage

Summarize(object, ...)

# S3 method for default
Summarize(object, digits = getOption("digits"), na.rm = TRUE, exclude = NULL,
  nvalid = c("different", "always", "never"),
  percZero = c("different", "always", "never"), ...)

# S3 method for formula
Summarize(object, data = NULL, digits = getOption("digits"), na.rm = TRUE,
  exclude = NULL, nvalid = c("different", "always", "never"),
  percZero = c("different", "always", "never"), ...)

Arguments

object

A vector of numeric data, or a formula (see Details) when the formula method is used.

...

Not implemented.

digits

A single numeric that indicates the number of decimals to which the numeric summaries are rounded.

na.rm

A logical that indicates whether numeric missing values (NA) should be removed (=TRUE, default) or not.

exclude

A string that contains the level that should be excluded from a factor variable.

nvalid

A string that indicates how the “validn” result will be handled. If "always" then “validn” will always be shown and if "never" then “validn” will never be shown. However, if "different" (DEFAULT), then “validn” will only be shown if it differs from “n” (or if at least one group differs from “n” when summarized by multiple groups).

percZero

A string that indicates how the “percZero” result will be handled. If "always" then “percZero” will always be shown and if "never" then “percZero” will never be shown. However, if "different" (DEFAULT), then “percZero” will only be shown if it is greater than zero (or if at least one group is greater than zero when summarized by multiple groups).

data

A data.frame that contains the variables in formula.

Value

A named vector or data frame (when a quantitative variable is separated by one or two factor variables) of summary statistics for numeric data.

Details

This function is primarily used with formulas of the following types (where quant and factor generically represent quantitative/numeric and factor variables, respectively):

Formula                  Description of Summary
~quant                   Numerical summaries (see below) of quant.
quant~factor             Summaries of quant separated by levels in factor.
quant~factor1*factor2    Summaries of quant separated by the combined levels in factor1 and factor2.
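
The grouping implied by each formula type can be illustrated without FSA. The following is a minimal base-R sketch, not the package's implementation: it uses the built-in ToothGrowth data and computes only the mean, whereas Summarize reports a full set of statistics for each group. The object names (tg, overall, by_supp, by_both) are chosen here for illustration only.

```r
# Base-R sketch of the grouping implied by each formula type.
# ToothGrowth ships with R: len is numeric, supp is a two-level factor.
tg <- ToothGrowth
tg$dose <- factor(tg$dose)   # treat dose as a factor so it can define groups

# ~quant: one summary for the whole variable
overall <- mean(tg$len)

# quant~factor: one summary per level of the factor
by_supp <- tapply(tg$len, tg$supp, mean)

# quant~factor1*factor2: one summary per combination of levels
by_both <- tapply(tg$len, interaction(tg$supp, tg$dose, sep = ":"), mean)

by_supp
by_both
```

With one factor the result has one entry per level; with two factors it has one entry per combination of levels, which is the layout the two-factor formula produces as rows of a data frame.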

Numerical summaries include all results from summary (min, Q1, mean, median, Q3, and max) and the sample size, valid sample size (sample size minus number of NAs), and standard deviation (i.e., sd). NA values are removed from the calculations with na.rm=TRUE (the DEFAULT). The number of digits in the returned results is controlled with digits=.
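
To make the list of statistics concrete, here is a rough base-R sketch that computes the same quantities for a single numeric vector. The helper summarize_sketch is hypothetical and is not part of FSA. It always drops NA values (mirroring the na.rm=TRUE default), and it assumes that percZero is the percentage of valid (non-NA) values equal to zero, which this page does not state explicitly.

```r
# Hypothetical helper (not part of FSA): base-R sketch of the summaries
# described above for a single numeric vector.
summarize_sketch <- function(x, digits = getOption("digits")) {
  n <- length(x)             # sample size, NAs included
  v <- x[!is.na(x)]          # valid (non-missing) values only
  nvalid <- length(v)        # sample size minus number of NAs
  # quantile() defaults to type 7, the same definition summary() uses
  q <- stats::quantile(v, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  # Assumption: percentage of valid values that are exactly zero
  perc_zero <- sum(v == 0) / nvalid * 100
  res <- c(n = n, nvalid = nvalid,
           mean = mean(v), sd = stats::sd(v),
           min = q[1], Q1 = q[2], median = q[3], Q3 = q[4], max = q[5],
           percZero = perc_zero)
  round(res, digits)
}

y <- c(0, 0, NA, 2, 4, 6)
summarize_sketch(y, digits = 3)
```

Here n counts all six values while nvalid counts the five that are not NA, which is the situation in which nvalid="different" would display the valid sample size.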

See Also

See summary for related one dimensional functionality. See tapply, summaryBy in doBy, describe in psych, describe in prettyR, and basicStats in fBasics for similar “by” functionality.

Examples

# NOT RUN {
## Create a data.frame of "data"
n <- 102
d <- data.frame(y=c(0,0,NA,NA,NA,runif(n-5)),
                w=sample(7:9,n,replace=TRUE),
                v=sample(0:2,n,replace=TRUE),
                g1=factor(sample(c("A","B","C",NA),n,replace=TRUE)),
                g2=factor(sample(c("male","female","UNKNOWN"),n,replace=TRUE)),
                g3=sample(c("a","b","c","d"),n,replace=TRUE),
                stringsAsFactors=FALSE)

# typical output of summary() for a numeric variable
summary(d$y)   

# the same variable summarized with Summarize()
Summarize(d$y,digits=3)
Summarize(~y,data=d,digits=3)
Summarize(y~1,data=d,digits=3)

# note that nvalid is not shown if there are no NAs and
#   percZero is not shown if there are no zeros
Summarize(~w,data=d,digits=3)
Summarize(~v,data=d,digits=3)

# note that the nvalid and percZero results can be forced to be shown
Summarize(~w,data=d,digits=3,nvalid="always",percZero="always")

## Numeric vector by levels of a factor variable
Summarize(y~g1,data=d,digits=3)
Summarize(y~g2,data=d,digits=3)
Summarize(y~g2,data=d,digits=3,exclude="UNKNOWN")

## Numeric vector by levels of two factor variables
Summarize(y~g1+g2,data=d,digits=3)
Summarize(y~g1+g2,data=d,digits=3,exclude="UNKNOWN")

## What happens if RHS of formula is not a factor
Summarize(y~w,data=d,digits=3)

## Summarizing multiple variables in a data.frame (must reduce to numerics)
lapply(as.list(d[,1:3]),Summarize,digits=4)

# }