Gmisc (version 3.0.3)

getDescriptionStatsBy: Creating description statistics

Description

A function that returns a description statistic that can be used for creating a publication "table 1" when you want it by groups. The function identifies if the variable is a continuous, binary or a factored variable. The format is inspired by NEJM, Lancet & BMJ.

Usage

getDescriptionStatsBy(
  x,
  ...,
  by,
  digits = 1,
  digits.nonzero = NA,
  html = TRUE,
  numbers_first = TRUE,
  statistics = FALSE,
  statistics.sig_lim = 10^-4,
  statistics.two_dec_lim = 10^-2,
  statistics.suppress_warnings = TRUE,
  useNA = c("ifany", "no", "always"),
  useNA.digits = digits,
  continuous_fn = describeMean,
  prop_fn = describeProp,
  factor_fn = describeFactors,
  show_all_values = FALSE,
  hrzl_prop = FALSE,
  add_total_col,
  total_col_show_perc = TRUE,
  use_units = FALSE,
  units_column_name = "Units",
  default_ref = NULL,
  NEJMstyle = FALSE,
  percentage_sign = TRUE,
  header_count = NULL,
  missing_value = "-",
  names_of_missing = NULL
)

# S3 method for Gmisc_getDescriptionStatsBy
htmlTable(x, ...)

# S3 method for Gmisc_getDescriptionStatsBy
print(x, ...)

# S3 method for Gmisc_getDescriptionStatsBy
knit_print(x, ...)

# S3 method for Gmisc_getDescriptionStatsBy
length(x)

Value

Returns a matrix if a single value was provided, otherwise a list of matrices with the class "Gmisc_getDescriptionStatsBy".

Arguments

x

If a data.frame, it is used as the data source for the variables in the ... parameter. If it is a single variable, it is the core value that you want the statistics for. In the print method this is the output of this function.

...

The variables that you want your statistics for. In the print method all these parameters are passed on as htmlTable::htmlTable() arguments.

by

The variable that you want to split into different columns

digits

The number of decimals used

digits.nonzero

The number of decimals used for values that are close to zero

html

If HTML compatible output should be used. If FALSE it outputs LaTeX formatting

numbers_first

Whether the count should be presented first, followed by the percentage, or the other way around. The second value is enclosed in parentheses ().

statistics

Add statistics: Fisher's exact test for proportions and the Wilcoxon test for continuous variables. See the Customizing statistics section below for more options.

statistics.sig_lim

The limit below which a p-value is reported with a < sign, e.g. a p-value of 0.0000312 is shown as < 0.0001 with the default setting.

statistics.two_dec_lim

The limit for showing two decimals. For example, the p-value may be 0.056 and we may want to keep both decimals to emphasize its proximity to the all-mighty 0.05 threshold; setting this to 10^-2 does that. A value of 0.0056 is then rounded to 0.006, which makes intuitive sense because 0.0056 is well below 0.05 and its exact distance from 0.05 is of little interest. Disclaimer: The 0.05 limit is really silly and debated. Unfortunately it remains a standard, and this package tries to adapt to current standards in order to limit publication-associated issues.
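The interplay of the two limits can be illustrated with a small base R sketch. The helper format_p_sketch() is hypothetical and not part of Gmisc. It assumes that values above two_dec_lim keep two significant digits and values below it keep one, which is only one reading of the description above.

```r
# Hypothetical helper, not part of Gmisc: mimics the thresholds described above.
format_p_sketch <- function(p, sig_lim = 10^-4, two_dec_lim = 10^-2) {
  if (p < sig_lim) {
    # Below the significance limit only the limit itself is reported
    return(paste("<", format(sig_lim, scientific = FALSE)))
  }
  # Assumption: two significant digits above two_dec_lim, one below it
  n_digits <- if (p > two_dec_lim) 2 else 1
  format(signif(p, n_digits), scientific = FALSE)
}

p_values <- c(0.0000312, 0.0056, 0.056)
formatted <- vapply(p_values, format_p_sketch, character(1))
print(formatted)
```

The three inputs are the p-values mentioned in the argument descriptions, so the printed strings can be compared directly with the text.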

statistics.suppress_warnings

Hide warnings from the statistics function.

useNA

This indicates if missing values should be added as a separate row below all others. See table() for the useNA options. Note: defaults to "ifany" and not "no" as table() does.
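Since the option names are borrowed from base R's table(), a quick way to see what each one means is to try them there. This sketch uses only base R and made-up data; it shows the counting behaviour of table(), not the layout of the Gmisc output.

```r
# Base R only: how table() treats missing values under each useNA option.
# The smoking data are made up for illustration.
smoking <- factor(c("yes", "no", "no", NA))
no_missing <- factor(c("yes", "no", "no"))

counts_no <- table(smoking, useNA = "no")             # NA is dropped
counts_ifany <- table(smoking, useNA = "ifany")       # NA counted because one is present
counts_always <- table(no_missing, useNA = "always")  # NA category shown even with zero count

print(counts_no)
print(counts_ifany)
print(counts_always)
```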

useNA.digits

The number of digits to use for the missing percentage, defaults to the overall digits.

continuous_fn

The method to describe continuous variables. The default is describeMean.

prop_fn

The method used to describe proportions, see describeProp.

factor_fn

The method used to describe factors, see describeFactors.

show_all_values

Show all values in proportions. For factors with only two values it is most sane to show only one option, as the other one is just the complement of the first, i.e. we want to convey a proportion. Take sex as an example: if you know the percentage of one sex, you automatically know the other, since it is 100 % minus that percentage. To choose which one is shown, set the default_ref parameter.

hrzl_prop

This defaults to FALSE, which means that the proportions are to be interpreted vertically, i.e. within each group. If we want the data to be horizontal, i.e. the total should be shown and then how it is distributed across the different groups, set this to TRUE.
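The difference between the two readings can be sketched with base R's prop.table(). This assumes the by groups form the columns, as in the function's output; it illustrates the idea rather than the package's internal computation.

```r
# Base R sketch: two ways to read proportions in a gear-by-transmission table
counts <- table(gear = mtcars$gear, am = mtcars$am)

# Vertical (hrzl_prop = FALSE): share of each gear within a transmission group,
# so every column sums to 1
vertical <- prop.table(counts, margin = 2)

# Horizontal (hrzl_prop = TRUE): share of each transmission group within a gear,
# so every row sums to 1
horizontal <- prop.table(counts, margin = 1)

print(round(100 * vertical, 1))
print(round(100 * horizontal, 1))
```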

add_total_col

This adds a total column to the resulting table. You can also specify if you want the total column "first" or "last" in the column order.

total_col_show_perc

This is TRUE by default. Set it to FALSE to suppress the percentages in the total column, as these can sometimes be confusing.

use_units

If the Hmisc package's units() function has been employed, it may be interesting to have a column at the far right that indicates the unit of measurement. If this column is specified, the total column will appear before the units (if specified as last). You can also set the value to "name" and the units will be added to the name in parentheses, e.g. Age (years).

units_column_name

The name of the units column. Used if use_units = TRUE

default_ref

The default reference when dealing with proportions. When using `dplyr` syntax (`tidyselect`) you can specify a named vector/list for each column name.

NEJMstyle

Adds - no (%) at the end to proportions

percentage_sign

If you want to suppress the percentage sign you can set this variable to FALSE. You can also choose something other than the default % by setting this variable.

header_count

Set to TRUE if you want to add a header count, e.g. Smoking; No. 25 observations, where there is a new line after the factor name. If you want a different text for the second line you can supply a sprintf format string, e.g. "No. %s patients".
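As a reminder of how such a format string is filled in, here is a base R sketch. The count of 25 is made up, and how the function obtains and passes the count internally is not shown here.

```r
# How a sprintf template such as "No. %s patients" is filled in.
# The count 25 is made up for illustration.
header_template <- "No. %s patients"
n_in_group <- 25
header_line <- sprintf(header_template, n_in_group)
print(header_line)
```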

missing_value

Value that is substituted for empty cells. Defaults to "-"

names_of_missing

Optional character vector containing the names of returned statistics, in case all returned values for a given by level are missing. Defaults to NULL

Customizing statistics

You can specify which function to use for the statistics by providing a function that takes two arguments, x and by, and returns a p-value. A few functions are already prepared for this: getPvalAnova, getPvalChiSq, getPvalFisher, getPvalKruskal and getPvalWilcox. The default functions are getPvalFisher and getPvalWilcox (unless the by argument has more than three unique levels, in which case it defaults to getPvalAnova).

If you want the function to select functions depending on the type of input you can provide a list with the names 'continuous', 'proportion', 'factor' and the function will choose accordingly. If you fail to define a certain category it will default to the above.

You can also use a custom function that returns a string with the attribute 'colname' set that will be appended to the results instead of the p-value column.
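A minimal sketch of such a custom function, assuming a Welch t-test is wanted for continuous variables instead of the default. The name pval_ttest_sketch is hypothetical and not part of Gmisc; the call to getDescriptionStatsBy() only runs if the package is installed.

```r
# Hypothetical custom test, not part of Gmisc: Welch's t-test.
# Takes the values x and the grouping by, and returns a single p-value.
pval_ttest_sketch <- function(x, by) {
  stats::t.test(x ~ by)$p.value
}

# The function can be checked on its own before handing it over
p_value <- pval_ttest_sketch(mtcars$mpg, factor(mtcars$am))
print(p_value)

# Supplied for continuous variables only; the other types keep their defaults
if (requireNamespace("Gmisc", quietly = TRUE)) {
  mpg_by_am <- Gmisc::getDescriptionStatsBy(
    x = mtcars$mpg,
    by = factor(mtcars$am, levels = 0:1, labels = c("Automatic", "Manual")),
    statistics = list(continuous = pval_ttest_sketch)
  )
}
```

Using the named-list form keeps the defaults for proportions and factors, so only the continuous test is swapped out.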

See Also

Other descriptive functions: describeFactors(), describeMean(), describeMedian(), describeProp(), getPvalWilcox()

Examples

library(magrittr)
library(dplyr)
library(htmlTable)

data(mtcars)
mtcars %<>%
  mutate(am = factor(am, levels = 0:1, labels = c("Automatic", "Manual")),
         vs = factor(vs, levels = 0:1, labels = c("V-shaped", "straight")),
         drat_prop = drat > median(drat),
         drat_prop = factor(drat_prop,
                            levels = c(FALSE, TRUE),
                            labels = c("Low ratio", "High ratio")),
         carb_prop = carb > 2,
         carb_prop = factor(carb_prop,
                            levels = c(FALSE, TRUE),
                            labels = c("≤ 2", "> 2")),
         across(c(gear, carb, cyl), factor))

# A simple bare-bone example
mtcars %>%
  getDescriptionStatsBy(`Miles per gallon` = mpg,
                        Weight = wt,
                        `Carburetors ≤ 2` = carb_prop,
                        by = am) %>%
  htmlTable(caption  = "Basic continuous stats from the mtcars dataset")
invisible(readline(prompt = "Press [enter] to continue"))

# For labeling & units we use set_column_labels/set_column_unit that use
# the Hmisc package annotation functions
mtcars %<>%
  set_column_labels(am = "Transmission",
                    mpg = "Gas",
                    wt = "Weight",
                    gear = "Gears",
                    disp = "Displacement",
                    vs = "Engine type",
                    drat_prop = "Rear axle ratio",
                    carb_prop = "Carburetors") %>%
  set_column_units(mpg = "Miles/(US) gallon",
                   wt = "10<sup>3</sup> lbs",
                   disp = "cu.in.")

mtcars %>%
  getDescriptionStatsBy(mpg,
                        wt,
                        `Gear†` = gear,
                        drat_prop,
                        carb_prop,
                        vs,
                        by = am,
                        header_count = TRUE,
                        use_units = TRUE,
                        show_all_values = TRUE)  %>%
  addHtmlTableStyle(pos.caption = "bottom") %>%
  htmlTable(caption  = "Stats from the mtcars dataset",
            tfoot = "† Number of forward gears")
invisible(readline(prompt = "Press [enter] to continue"))

# Using the default_ref parameter we can choose which level of a
# two-level factor is shown
mtcars %>%
  getDescriptionStatsBy(mpg,
                        wt,
                        `Gear†` = gear,
                        drat_prop,
                        carb_prop,
                        vs,
                        by = am,
                        header_count = TRUE,
                        use_units = TRUE,
                        default_ref = c(drat_prop = "Low ratio",
                                        carb_prop = "> 2"))  %>%
  addHtmlTableStyle(pos.caption = "bottom") %>%
  htmlTable(caption  = "Stats from the mtcars dataset",
            tfoot = "† Number of forward gears")
invisible(readline(prompt = "Press [enter] to continue"))

# We can also use lists
tll <- list()
tll[["Gear (3 to 5)"]] <- getDescriptionStatsBy(mtcars$gear, mtcars$am)
tll <- c(tll,
         list(getDescriptionStatsBy(mtcars$disp, mtcars$am)))

mergeDesc(tll,
          htmlTable_args = list(caption  = "Factored variables")) %>%
  htmlTable::addHtmlTableStyle(css.rgroup = "")
invisible(readline(prompt = "Press [enter] to continue"))

tl_no_units <- list()
tl_no_units[["Gas (miles/gallon)"]] <-
  getDescriptionStatsBy(mtcars$mpg, mtcars$am,
                        header_count = TRUE)
tl_no_units[["Weight (10<sup>3</sup> lbs)"]] <-
  getDescriptionStatsBy(mtcars$wt, mtcars$am,
                        header_count = TRUE)
mergeDesc(tl_no_units,
          tll) %>%
  htmlTable::addHtmlTableStyle(css.rgroup = "")
invisible(readline(prompt = "Press [enter] to continue"))

# Other settings
mtcars$mpg[sample(1:NROW(mtcars), size = 5)] <- NA
getDescriptionStatsBy(mtcars$mpg,
                      mtcars$am,
                      statistics = TRUE)
invisible(readline(prompt = "Press [enter] to continue"))

# Do the horizontal version
getDescriptionStatsBy(mtcars$gear,
                      mtcars$am,
                      statistics = TRUE,
                      hrzl_prop = TRUE)
invisible(readline(prompt = "Press [enter] to continue"))

mtcars$wt_with_missing <- mtcars$wt
mtcars$wt_with_missing[sample(1:NROW(mtcars), size = 8)] <- NA
getDescriptionStatsBy(mtcars$wt_with_missing, mtcars$am, statistics = TRUE,
                      hrzl_prop = TRUE, total_col_show_perc = FALSE)
invisible(readline(prompt = "Press [enter] to continue"))

if (FALSE) {
  ## There is also a LaTeX wrapper
  tll <- list(
    getDescriptionStatsBy(mtcars$gear, mtcars$am),
    getDescriptionStatsBy(mtcars$cyl, mtcars$am))

  latex(mergeDesc(tll),
        caption  = "Factored variables",
        file = "")
}