Summary of a data frame consisting of: variable names, labels if any, factor levels, frequencies and/or numerical summary statistics, and valid/missing observation counts.
dfSummary(x, round.digits = st_options("round.digits"),
          varnumbers = st_options("dfSummary.varnumbers"),
          labels.col = st_options("dfSummary.labels.col"),
          valid.col = st_options("dfSummary.valid.col"),
          na.col = st_options("dfSummary.na.col"),
          graph.col = st_options("dfSummary.graph.col"),
          graph.magnif = st_options("dfSummary.graph.magnif"),
          style = st_options("dfSummary.style"),
          plain.ascii = st_options("plain.ascii"), justify = "left",
          col.widths = NA, headings = st_options("headings"),
          display.labels = st_options("display.labels"),
          max.distinct.values = 10, trim.strings = FALSE,
          max.string.width = 25, split.cells = 40, split.tables = Inf,
          tmp.img.dir = st_options("tmp.img.dir"),
          silent = st_options("dfSummary.silent"), ...)
x: A data frame.
round.digits: Number of significant digits to display. Defaults to 2 and can be set globally; see st_options.
varnumbers: Logical. Should the first column contain variable numbers? Defaults to TRUE. Can be set globally; see st_options, option “dfSummary.varnumbers”.
labels.col: Logical. If TRUE, variable labels (as defined with rapportools, Hmisc or summarytools' label functions) will be displayed. TRUE by default, but the labels column is only shown if at least one column has a defined label. This option can also be set globally; see st_options, option “dfSummary.labels.col”.
valid.col: Logical. Include a column indicating the count and proportion of valid (non-missing) values. TRUE by default, but can be set globally; see st_options, option “dfSummary.valid.col”.
na.col: Logical. Include a column indicating the count and proportion of missing (NA) values. TRUE by default, but can be set globally; see st_options, option “dfSummary.na.col”.
graph.col: Logical. Display the barplots / histograms column in HTML reports. TRUE by default, but can be set globally; see st_options, option “dfSummary.graph.col”.
graph.magnif: Numeric. Magnification factor, useful if the graphs show up too large (then use a value < 1) or too small (use a value > 1). Must be positive. Defaults to 1. Can be set globally; see st_options, option “dfSummary.graph.magnif”.
style: Style to be used by pander when rendering the output table. Defaults to “multiline”. The only other valid option is “grid”. Style “simple” is not supported for this particular function, and “rmarkdown” will fall back to “multiline”.
plain.ascii: Logical. pander argument; when TRUE, no markup characters will be used (useful when printing to console). Defaults to TRUE. Set to FALSE when rendering markdown. To change the default value globally, see st_options.
justify: String indicating alignment of columns; one of “l” (left), “c” (center), or “r” (right). Defaults to “l”.
col.widths: Numeric or character. Vector of column widths. If numeric, values are assumed to be numbers of pixels. Otherwise, any CSS-supported units can be used. NA by default, meaning widths are calculated automatically.
headings: Logical. Set to FALSE to omit headings. To change this default value globally, see st_options.
display.labels: Logical. Should the data frame label be displayed in the title section? Default is TRUE. To change this default value globally, see st_options.
max.distinct.values: The maximum number of values to display frequencies for. If a variable has more distinct values than this number, the remaining frequencies will be reported as a whole, along with the number of additional distinct values. Defaults to 10.
trim.strings: Logical; for character variables, should leading and trailing white space be removed? Defaults to FALSE. See the Details section.
max.string.width: Limits the number of characters to display in the frequency tables. Defaults to 25.
split.cells: A numeric argument passed to pander. It is the number of characters allowed on a line before the cell is split. Defaults to 40.
split.tables: pander argument which determines the maximum width of a table. Keeping the default value (Inf) is recommended.
tmp.img.dir: Character. Directory used to store temporary images when rendering dfSummary() with `method = "pander"`, `plain.ascii = FALSE` and `style = "grid"`. See Details.
silent: Logical. Hide console messages. FALSE by default. To change this value globally, see st_options.
...: Additional arguments passed to pander.
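Several of the arguments above take their defaults from st_options. As a minimal sketch (assuming the summarytools package is installed and attached), session-wide defaults can be set once and then overridden per call; the option names used here are the ones quoted in the argument descriptions:

```r
library(summarytools)

# Set session-wide defaults for dfSummary(); option names are those
# quoted in the argument descriptions above.
st_options(dfSummary.varnumbers   = FALSE,
           dfSummary.valid.col    = FALSE,
           dfSummary.graph.magnif = 0.75)

data("tobacco")

# Uses the global defaults set above
dfSummary(tobacco)

# An explicit argument overrides the global default for this call only
dfSummary(tobacco, varnumbers = TRUE)
```

Setting options globally is mostly a convenience when many summaries in one script or report should share the same layout.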
A data frame with additional class summarytools containing as many rows as there are columns in x, with attributes used by the print method. Columns in the output data frame are:
Number indicating the order in which column appears in the data frame.
Name of the variable, along with its class(es).
Label of the variable (if applicable).
For factors, a list of their values, limited by the max.distinct.values parameter. For character variables, the most common values (in descending frequency order), also limited by max.distinct.values. For numerical variables, common univariate statistics (mean, std. deviation, min, med, max, IQR and CV).
For factors and character variables, the frequencies and proportions of the values listed in the previous column. For numerical vectors, number of distinct values, or frequency of distinct values if their number is not greater than max.distinct.values.
An ASCII histogram for numerical variables, and an ASCII barplot for factors and character variables.
Number and proportion of valid values.
Number and proportion of missing (NA and NaN) values.
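Because the return value is itself a data frame, it can be stored and inspected before printing. The following is a small sketch, assuming summarytools is installed; the variable name dfs is only illustrative:

```r
library(summarytools)
data("tobacco")

# Store the summary instead of printing it right away
dfs <- dfSummary(tobacco)

# The result carries the additional class "summarytools"
inherits(dfs, "summarytools")

# One row per column of the input data frame
nrow(dfs) == ncol(tobacco)

# Column names of the summary table
names(dfs)
```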
The default plain.ascii = TRUE option is there to make results appear cleaner in the console. When rendering with rmarkdown, set this option to FALSE.
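One way to apply this in a report, sketched here under the assumption that the code sits in an early chunk of an R Markdown document, is to change the global default once rather than passing plain.ascii to every call:

```r
library(summarytools)

# Switch the session default so that subsequent calls emit markup;
# "plain.ascii" is the option name shown in the usage section.
st_options(plain.ascii = FALSE)

# Query the current value to confirm the change
st_options("plain.ascii")
```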
When trim.strings is set to TRUE, trimming is done before calculating frequencies, so those will be impacted accordingly.
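The effect of trimming on frequencies can be illustrated with base R alone. This is only a sketch of the idea, not the package's internal code; the vector x is made up for the example:

```r
# A character vector where the same value appears with stray white space
x <- c("apple", " apple", "apple ", "pear")

# Without trimming, each white-space variant is a distinct value
freq_raw <- table(x)
length(freq_raw)

# Trimming first merges the variants, which changes the frequencies
freq_trimmed <- table(trimws(x))
freq_trimmed
```

Passing a data frame containing such a column to dfSummary() with trim.strings = TRUE should merge the variants in the same way.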
Specifying tmp.img.dir allows producing results consistent with Pandoc styling while also showing png graphs. Because Pandoc determines column widths from the length of cell contents, even when that content is merely a link to an image, the standard R temporary directory cannot be used to store the images; a shorter path is needed. On Mac OS and Linux, “/tmp” is a sensible choice, since this directory is cleaned up automatically on a regular basis. On Windows, however, there is no such convenient directory, and the user will have to choose a directory and clean up the temporary images manually after the document has been rendered. Providing a relative path such as “img” is recommended. The maximum length for this parameter is set to 5 characters. It can be set globally using st_options; for example: st_options(tmp.img.dir = ".").
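A possible workflow for the Windows case described above is sketched below. It assumes summarytools is installed and that the code runs from a directory where a short relative folder can be created; the name img_dir is hypothetical, and creating the folder beforehand is a precaution rather than a documented requirement:

```r
library(summarytools)
data("tobacco")

# Short relative directory for the temporary png graphs (3 characters,
# within the 5-character limit mentioned above)
img_dir <- "img"
dir.create(img_dir, showWarnings = FALSE)

# Markup output in grid style, with graphs stored in img_dir
print(dfSummary(tobacco, plain.ascii = FALSE, style = "grid",
                tmp.img.dir = img_dir))

# Once the document has been rendered, the images can be removed manually:
# unlink(img_dir, recursive = TRUE)
```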
data("tobacco")
dfSummary(tobacco)
# Exclude some columns
dfSummary(tobacco, varnumbers = FALSE, valid.col = FALSE)
# Limit number of categories to be displayed for factors / categorical data
dfSummary(tobacco, max.distinct.values = 5, style = "grid")
# Show in Viewer or browser (view: no capital V!)
view(dfSummary(iris))
# Rmarkdown-ready
dfSummary(tobacco, style = "grid", plain.ascii = FALSE,
          varnumbers = FALSE, valid.col = FALSE, tmp.img.dir = "./img")