Generate an HTML report that scours the input table data. Before calling up an agent to validate the data, it's a good idea to understand the data with some level of precision. Make this the initial step of a well-balanced data quality reporting workflow. The reporting output contains several sections to make everything more digestible, and these are:
Table dimensions, duplicate row counts, column types, and reproducibility information
A summary for each table variable and further statistics and summaries depending on the variable type
A matrix plot that shows interactions between variables
A set of correlation matrix plots for numerical variables
A summary figure that shows the degree of missingness across variables
A table that provides the head and tail rows of the dataset
The output HTML report will appear in the RStudio Viewer and can also be integrated in R Markdown HTML output. If you need the output HTML as a string, you can get it with as.character() (e.g., scan_data(tbl = mtcars) %>% as.character()). The resulting HTML string is a complete HTML document with Bootstrap and jQuery embedded within.
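As a minimal sketch of the string conversion described above, the following writes the report to a standalone HTML file. The helper name save_scan_html() is hypothetical (it is not part of pointblank), and the sketch assumes that pointblank is installed and that the "OVS" subset of sections is enough for your purposes.

```r
# Sketch: save a Table Scan report to disk as a standalone HTML file.
# `save_scan_html()` is a hypothetical helper, not a pointblank function.
save_scan_html <- function(tbl, path, sections = "OVS") {
  report <- pointblank::scan_data(tbl = tbl, sections = sections)
  html <- as.character(report)  # complete HTML document as a string
  writeLines(html, con = path, useBytes = TRUE)
  invisible(path)
}

# Only run the example when pointblank is available
if (requireNamespace("pointblank", quietly = TRUE)) {
  out_file <- save_scan_html(mtcars, tempfile(fileext = ".html"))
  file.exists(out_file)
}
```

Because the string is described as a complete HTML document, the written file should open directly in a browser without extra assets.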
scan_data(tbl, sections = "OVICMS", navbar = TRUE, lang = NULL, locale = NULL)
tbl: The input table. This can be a data frame, a tibble, a tbl_dbi object, or a tbl_spark object.
sections: The sections to include in the finalized Table Scan report. A string with key characters representing section names is required here. The default string is "OVICMS", wherein each letter stands for the following sections in their default order: "O": "overview"; "V": "variables"; "I": "interactions"; "C": "correlations"; "M": "missing"; and "S": "sample". This string can comprise fewer characters, and the order can be changed to suit the desired layout of the report. For tbl_dbi and tbl_spark objects supplied to tbl, the "interactions" and "correlations" sections are currently excluded.
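To make the key-character scheme concrete, here is a small sketch that expands a sections string into the section names listed above. The function decode_sections() is a hypothetical illustration written in base R; it is not part of pointblank and does not claim to mirror how scan_data() parses the string internally.

```r
# Hypothetical helper (not part of pointblank): expand a `sections` key
# string into the section names given in the argument description.
decode_sections <- function(sections = "OVICMS") {
  key <- c(
    O = "overview", V = "variables", I = "interactions",
    C = "correlations", M = "missing", S = "sample"
  )
  keys_in <- strsplit(sections, "")[[1]]
  unknown <- setdiff(keys_in, names(key))
  if (length(unknown) > 0) {
    stop("Unknown section key(s): ", paste(unknown, collapse = ", "))
  }
  # Preserve the order in which the keys were supplied
  unname(key[keys_in])
}

decode_sections("OVICMS")  # all six sections, default order
decode_sections("SMO")     # a reordered subset

# Assumes pointblank is installed: a report without the
# "interactions" and "correlations" sections
if (requireNamespace("pointblank", quietly = TRUE)) {
  scan_small <- pointblank::scan_data(tbl = mtcars, sections = "OVMS")
}
```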
navbar: Should there be a navigation bar anchored to the top of the report page? By default this is TRUE.
lang: The language to use for label text in the report. By default, NULL will create English ("en") text. Other options include French ("fr"), German ("de"), Italian ("it"), Spanish ("es"), Portuguese ("pt"), Chinese ("zh"), and Russian ("ru").
locale: An optional locale ID to use for formatting values in the report according to the locale's rules. Examples include "en_US" for English (United States) and "fr_FR" for French (France); more simply, this can be a language identifier without a country designation, like "es" for Spanish (Spain, same as "es_ES").
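A short sketch of the two localization arguments used together, assuming pointblank is installed. The language code and locale ID are taken from the descriptions above; the choice of mtcars and of the "OVS" sections is only for illustration.

```r
# Assumes pointblank is installed; skipped otherwise
if (requireNamespace("pointblank", quietly = TRUE)) {
  # French label text, with values formatted under the "fr_FR" locale
  scan_fr <- pointblank::scan_data(
    tbl = mtcars,
    sections = "OVS",
    lang = "fr",
    locale = "fr_FR"
  )
}
```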
Function ID: 1-1
Other Planning and Prep: action_levels(), create_agent(), create_informant(), db_tbl(), file_tbl(), tbl_get(), tbl_source(), tbl_store(), validate_rmd()
library(pointblank)

if (interactive()) {

  # Get an HTML document that describes all of
  # the data in the `dplyr::storms` dataset
  tbl_scan <- scan_data(tbl = dplyr::storms)
}