summaryP: Multi-way Summary of Proportions

Description

summaryP produces a tall and thin data frame containing numerators (freq) and denominators (denom) after stratifying the data by a series of variables. A special capability to group a series of related yes/no variables is included through the use of the ynbind function, for which the user specifies a final argument label used to label the panel created for that group of related variables.
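The tall-and-thin layout can be pictured with a small base R sketch. It is not how summaryP is implemented; it only illustrates what one freq/denom row per stratum and category level looks like, under the assumption that a denominator is the number of non-missing values in a stratum. The data frame d0 and the helper count_props are hypothetical names introduced for this illustration.

```r
# Hypothetical toy data: one categorical variable and one stratifier
d0 <- data.frame(
  sex   = c("Female", "Male", "Male", "Female", NA, "Male"),
  treat = c("A", "A", "A", "B", "B", "B")
)

# Assumed helper (not part of Hmisc): one row per stratum x category level
count_props <- function(x, by, varname) {
  ok  <- !is.na(x)                  # assumption: denominators ignore NAs
  tab <- table(by[ok], x[ok])       # strata in rows, levels in columns
  out <- as.data.frame(tab, stringsAsFactors = FALSE)
  names(out) <- c("stratum", "val", "freq")
  out$denom <- as.vector(rowSums(tab)[out$stratum])  # per-stratum total
  out$var   <- varname
  out[, c("stratum", "var", "val", "freq", "denom")]
}

tall <- count_props(d0$sex, d0$treat, "sex")
tall$prop <- tall$freq / tall$denom   # proportion = numerator / denominator
print(tall)
```

Keeping freq and denom as separate columns, rather than storing only the proportion, is what lets the plotting methods print numerators and denominators next to each point.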

If options(grType='plotly') is not in effect, the plot method for summaryP displays proportions as a multi-panel dot chart using the lattice package's dotplot function with a special panel function. Numerators and denominators of proportions are also included as text, in the same colors as used by an optional groups variable. The formula argument used in the dotplot call is constructed automatically, but the user can easily reorder the variables by specifying formula, with elements named val (category levels), var (classification variable name), freq (calculated result) plus the overall cross-classification variables excluding groups.

If options(grType='plotly') is in effect, the plot method makes an entirely different display using Hmisc::dotchartpl with plotly. If marginVal is specified and the data have been run through addMarginal, a stratification variable causes more finely stratified estimates to be shown slightly below the lines, with smaller and translucent symbols. The marginal summaries are shown as the main estimates and the user can turn off display of the stratified estimates, or view their details with hover text.

The ggplot method for summaryP does not draw numerators and denominators but the chart is more compact than using the plot method with base graphics because ggplot2 does not repeat category names the same way as lattice does. Variable names that are too long to fit in panel strips are renamed (1), (2), etc. and an attribute "fnvar" is added to the result; this attribute is a character string defining the abbreviations, useful in a figure caption. The ggplot2 object has labels for points plotted, used by plotly::ggplotly as hover text (see example).

The latex method produces one or more LaTeX tabulars containing a table representation of the result, with optional side-by-side display if groups is specified. Multiple tabulars result from the presence of non-group stratification factors.

Usage

summaryP(formula, data = NULL, subset = NULL,
         na.action = na.retain, sort=TRUE,
         asna = c("unknown", "unspecified"), ...)
# S3 method for summaryP
plot(x, formula=NULL, groups=NULL,
         marginVal=NULL, marginLabel=marginVal,
         refgroup=NULL, exclude1=TRUE,  xlim = c(-.05, 1.05),
         text.at=NULL, cex.values = 0.5,
         key = list(columns = length(groupslevels), x = 0.75,
                    y = -0.04, cex = 0.9,
                    col = trellis.par.get('superpose.symbol')$col,
                    corner=c(0,1)),
         outerlabels=TRUE, autoarrange=TRUE,
         col=colorspace::rainbow_hcl, ...)
# S3 method for summaryP
ggplot(data, mapping, groups=NULL, exclude1=TRUE,
           xlim=c(0, 1), col=NULL, shape=NULL, size=function(n) n ^ (1/4),
           sizerange=NULL, abblen=5, autoarrange=TRUE, addlayer=NULL,
           ..., environment)
# S3 method for summaryP
latex(object, groups=NULL, exclude1=TRUE, file='', round=3,
                           size=NULL, append=TRUE, ...)

Arguments

formula

a formula with the variables for whose levels proportions are computed on the left hand side, and major classification variables on the right. The formula needs to include any variable later used as groups, as the data summarization does not distinguish between superpositioning and paneling. For the plot method, formula can provide an override of the default formula for dotplot().

data

an optional data frame. For ggplot.summaryP data is the result of summaryP.

subset

an optional subsetting expression or vector

na.action

function specifying how to handle NAs. The default is to keep all NAs in the analysis frame.

sort

set to FALSE to not sort category levels in descending order of global proportions

asna

character vector specifying level names to consider the same as NA. Set asna=NULL to not consider any.
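The effect of asna can be sketched in base R. This is an illustration only: the race vector is made up, and comparing lower-cased values is an assumption of this sketch rather than a documented rule of summaryP.

```r
asna <- c("unknown", "unspecified")   # the documented default
race <- c("White", "unknown", "Asian", "Unspecified", NA)  # made-up values

# Assumption for this sketch: values are compared after lower-casing
race_clean <- race
race_clean[tolower(race_clean) %in% asna] <- NA

# Levels listed in asna are now counted together with the true NAs
table(race_clean, useNA = "ifany")
```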

x

an object produced by summaryP

groups

a character string containing the name of a superpositioning variable for obtaining further stratification within a horizontal line in the dot chart.

marginVal

if options(grType='plotly') is in effect and the data given to summaryP were run through addMarginal, specifies the category name that represents marginal summaries (usually "All").

marginLabel

specifies a different character string to use than the value of marginVal. For example, if marginal proportions were computed over all regions, one may specify marginVal="All", marginLabel="All Regions". marginLabel is only used for formatting graphical output.

refgroup

used when making a plotly chart with a two-level groups variable. Specifying refgroup causes the half-width confidence interval for the difference in two proportions to be shown, and the actual confidence limits and the difference to be added to hover text. For example, with groups levels A and B, refgroup='A' yields B-A differences. See dotchartpl for more details.

exclude1

By default, ggplot, plot, and latex methods for summaryP remove redundant entries from tables for variables with only two levels. For example, if you print the proportion of females, you don't need to print the proportion of males. To override this, set exclude1=FALSE.

xlim

x-axis limits. Default is c(0, 1) for the ggplot method and c(-.05, 1.05) for the plot method.

text.at

specify to leave unused space to the right of each panel to prevent numerators and denominators from touching data points. text.at is the upper limit for scaling panels' x-axes but tick marks are only labeled up to max(xlim).

cex.values

character size to use for plotting numerators and denominators

key

a list to pass to the auto.key argument of dotplot. To place a key above the entire chart use auto.key=list(columns=2) for example.

outerlabels

by default if there are two conditioning variables besides groups, the latticeExtra package's useOuterStrips function is used to put strip labels in the margins, usually resulting in a much prettier chart. Set to FALSE to prevent usage of useOuterStrips.

autoarrange

If TRUE, the formula is re-arranged so that if there are two conditioning (paneling) variables, the variable with the most levels is taken as the vertical condition.

col

a vector of colors to use to override defaults in ggplot. When options(grType='plotly'), see dotchartpl.

shape

a vector of plotting symbols to override ggplot defaults

mapping, environment

not used; needed because of rules for generics

size

for ggplot, a function that transforms denominators into metrics used for the size aesthetic. Default is the fourth root function so that the area of symbols is proportional to the square root of sample size. Specify NULL to not vary point sizes. size=sqrt is a reasonable alternative. Set size to an integer to categorize the denominators into size quantile groups using cut2. Unless size is an integer, the legend for sizes uses the minimum and maximum denominators and 6-tiles using quantile(..., type=1) so that actually occurring sample sizes are used as labels. size is overridden to NULL if the range in denominators is less than 10 or the ratio of the maximum to the minimum is less than 1.2. For latex, size is an optional font size such as "small"
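The fourth-root default can be checked numerically with base R. The sketch below uses made-up denominators and assumes that symbol area scales with the square of the size value. The probabilities passed to quantile are one reading of "minimum, maximum and 6-tiles" and are not taken from the package source.

```r
size_fun <- function(n) n ^ (1/4)   # the documented default for size

denoms <- c(16, 81, 256, 625, 1296, 10000)  # made-up denominators
sz <- size_fun(denoms)

# Area grows with the square of the linear size,
# so area is proportional to sqrt(n)
area <- sz^2
all.equal(area, sqrt(denoms))

# type=1 quantiles are always observed values, so legend labels
# correspond to sample sizes that actually occur in the data
legend_n <- quantile(denoms, probs = (0:6) / 6, type = 1)
all(legend_n %in% denoms)
```

With size=sqrt the area would instead be proportional to n itself, which makes large strata dominate the display more strongly.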

sizerange

a 2-vector specifying the range argument to the ggplot2 scale_size_... function, which is the range of sizes allowed for the points according to the denominator. The default is sizerange=c(.7, 3.25) but the lower limit is increased according to the ratio of maximum to minimum sample sizes.

abblen

labels of variables having only one level and having their name longer than abblen characters are abbreviated and documented in fnvar (described elsewhere here). The default abblen=5 is good for labels plotted vertically. If labels are rotated using theme a better value would be 12.

...

used only for plotly graphics and these arguments are passed to dotchartpl

object

an object produced by summaryP

file

file name, defaults to writing to console

round

number of digits to the right of the decimal place for proportions

append

set to FALSE to start output over

addlayer

a ggplot layer to add to the plot object

Value

summaryP produces a data frame of class "summaryP". The plot method produces a lattice object of class "trellis". The latex method produces an object of class "latex" with an additional attribute ngrouplevels specifying the number of levels of any groups variable and an attribute nstrata specifying the number of strata.

Examples


# NOT RUN {
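library(Hmisc)    # summaryP, ynbind, pBlock, upData, addMarginal
library(ggplot2)  # supplies the ggplot generic used below
n <- 100
f <- function(na=FALSE) {
  x <- sample(c('N', 'Y'), n, TRUE)
  if(na) x[runif(n) < .1] <- NA   # set roughly 10% of values to NA
  x
}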
set.seed(1)
d <- data.frame(x1=f(), x2=f(), x3=f(), x4=f(), x5=f(), x6=f(), x7=f(TRUE),
                age=rnorm(n, 50, 10),
                race=sample(c('Asian', 'Black/AA', 'White'), n, TRUE),
                sex=sample(c('Female', 'Male'), n, TRUE),
                treat=sample(c('A', 'B'), n, TRUE),
                region=sample(c('North America','Europe'), n, TRUE))
d <- upData(d, labels=c(x1='MI', x2='Stroke', x3='AKI', x4='Migraines',
                 x5='Pregnant', x6='Other event', x7='MD withdrawal',
                 race='Race', sex='Sex'))
dasna <- subset(d, region=='North America')
with(dasna, table(race, treat))
s <- summaryP(race + sex + ynbind(x1, x2, x3, x4, x5, x6, x7, label='Exclusions') ~
              region + treat, data=d)
# add exclude1=FALSE below to include female category
plot(s, groups='treat')
ggplot(s, groups='treat')

plot(s, val ~ freq | region * var, groups='treat', outerlabels=FALSE)
# Much better looking if omit outerlabels=FALSE; see output at
# http://biostat.mc.vanderbilt.edu/HmiscNew#summaryP
# See more examples under bpplotM

## For plotly interactive graphic that does not handle variable size
## panels well:
## require(plotly)
## g <- ggplot(s, groups='treat')
## ggplotly(g, tooltip='text')

## For nice plotly interactive graphic:
## options(grType='plotly')
## s <- summaryP(race + sex + ynbind(x1, x2, x3, x4, x5, x6, x7,
##                                   label='Exclusions') ~
##               treat, data=subset(d, region=='Europe'))
##
## plot(s, groups='treat', refgroup='A')  # refgroup='A' does B-A differences


# Make a chart where there is a block of variables that
# are only analyzed for males.  Keep redundant sex in block for demo.
# Leave extra space for numerators, denominators
sb <- summaryP(race + sex +
               pBlock(race, sex, label='Race: Males', subset=sex=='Male') ~
               region, data=d)
plot(sb, text.at=1.3)
plot(sb, groups='region', layout=c(1,3), key=list(space='top'),
     text.at=1.15)
ggplot(sb, groups='region')
# }
# NOT RUN {
plot(s, groups='treat')
# plot(s, groups='treat', outerlabels=FALSE) for standard lattice output
plot(s, groups='region', key=list(columns=2, space='bottom'))
colorFacet(ggplot(s))

plot(summaryP(race + sex ~ region, data=d), exclude1=FALSE, col='green')

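# Make your own plot using data frame created by summaryP
# (dotplot and panel.dotplot are from lattice, useOuterStrips from latticeExtra)
library(lattice)
library(latticeExtra)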
useOuterStrips(dotplot(val ~ freq | region * var, groups=treat, data=s,
        xlim=c(0,1), scales=list(y='free', rot=0), xlab='Fraction',
        panel=function(x, y, subscripts, ...) {
          denom <- s$denom[subscripts]
          x <- x / denom
          panel.dotplot(x=x, y=y, subscripts=subscripts, ...) }))

# Show marginal summary for all regions combined
s <- summaryP(race + sex ~ region, data=addMarginal(d, region))
plot(s, groups='region', key=list(space='top'), layout=c(1,2))

# Show marginal summaries for both race and sex
s <- summaryP(ynbind(x1, x2, x3, x4, label='Exclusions', sort=FALSE) ~
              race + sex, data=addMarginal(d, race, sex))
plot(s, val ~ freq | sex*race)
# }
