- data
The dataset to be checked. This dataset should be of class `data.frame`, `tibble` or `matrix`. If it is of class `matrix`, it will be converted to a `data.frame`.
- output
Output format. Options are `"pdf"`, `"word"` (.docx) and `"html"`. If `NULL` (the default), the output format depends on two sequential checks. First, if a LaTeX installation is available, pdf output is chosen. Second, if no LaTeX installation is found and the operating system is Windows, word output is used. If neither of these checks is positive, html output is used.
- render
Should the output file be rendered (defaults to `TRUE`), i.e. should a pdf/word/html document be generated and saved to disk?
- useVar
Variables to describe in the report. If `NULL` (the default), all variables in `data` are included. If a vector of variable names is supplied, only the variables in `data` that are also in `useVar` are included in the data report.
- ordering
Choose the ordering of the variables in the variable presentation. The options are `"asIs"` (ordering as in the dataset) and `"alphabetical"` (alphabetical order).
- onlyProblematic
A logical. If `TRUE`, only the variables flagged as problematic in the check step will be included in the variable list.
- labelled_as
A string specifying how to handle `labelled` and `haven_labelled` vectors. Currently `"factor"` (the default) is the only possibility. This means that `labelled` or `haven_labelled` variables that appear factor-like (by having a non-`NULL` `labels` attribute) will be treated as factors, while other `labelled` or `haven_labelled` variables will be treated as whatever base variable class they inherit from.
- mode
Vector of tasks to perform among the three categories `"summarize"`, `"visualize"` and `"check"`. The default, `c("summarize", "visualize", "check")`, implies that all three steps are performed. The steps selected in `mode` will be performed for each variable in `data`, and their results are presented in the second part of the output data report. The `"summarize"` step is responsible for creating the summary table, the `"visualize"` step for creating the plot, and the `"check"` step for performing checks on the variable and printing the results if any problems are found.
- smartNum
If `TRUE` (the default), numeric and integer variables with fewer than 5 unique values are treated as factor variables in the checking, visualization and summary steps, and a message notifying the reader of this is printed in the data summary.
- preChecks
Vector of function names for check functions used in the pre-check stage. The pre-check stage consists of variable checks that should be performed before the summary/visualization/checking step. If any of these checks find problems, the variable will not be summarized, visualized or checked.
- file
The filename of the output rmarkdown (.Rmd) file. If set to `NULL` (the default), the filename will be the name of `data` prefixed with "dataMaid_", if this qualifies as a valid file name (e.g. it contains no special characters). Otherwise, `makeDataReport()` tries to create a valid filename by substituting illegal characters. Note that a valid file is of type .Rmd, hence all filenames should have a ".Rmd" suffix.
- replace
If `FALSE` (the default), an error is thrown if one of the files that are about to be created (the .Rmd overview file and possibly also a .html, .pdf or .docx file) already exists. If `TRUE`, no checks are performed, and files on disk might thus be overwritten.
- vol
Extra text string or numeric that is appended to the end of the output file name(s). For example, if the dataset is called "myData", no `file` argument is supplied and `vol = 2`, the output file will be called "dataMaid_myData2.Rmd".
- standAlone
A logical. If `TRUE`, the document begins with a markdown YAML preamble such that it can be rendered as a stand-alone rmarkdown file, e.g. by calling `render`. If `FALSE`, this preamble is removed. Moreover, regardless of the value of the `render` argument, the document will then not be rendered, as it has no preamble.
- twoCol
A logical. Should the results from the summarize and visualize steps be presented in two columns? Defaults to `TRUE`.
- quiet
A logical or the string `"silent"`. If `TRUE` (the default), only a few messages are printed to the screen as `makeDataReport` runs. If `FALSE`, no messages are suppressed. The third option, `"silent"`, renders the function completely silent, such that only fatal errors are printed.
- openResult
A logical. If `TRUE` (the default), the last file produced by `makeDataReport` is automatically opened at the end of the function run. This means that if `render = TRUE`, the rendered pdf, word or html file is opened, while if `render = FALSE`, the .Rmd file is opened.
- summaries
A list of summaries to use on each supported variable type. We recommend using `setSummaries` for creating this list and refer to the documentation of this function for more details.
- visuals
A list of visual functions to use on each supported variable type. We recommend using `setVisuals` for creating this list and refer to the documentation of this function for more details.
- checks
A list of checks to use on each supported variable type. We recommend using `setChecks` for creating this list and refer to the documentation of this function for more details.
- listChecks
A logical. Controls whether the checks used for each possible variable type are summarized in the output. Defaults to `TRUE`.
- maxProbVals
A positive integer or `Inf`. Maximum number of unique values printed from check functions. In the case of `Inf`, all problematic values are printed. Defaults to `10`.
- maxDecimals
A positive integer or `Inf`. Number of decimals used when printing numerical values in the data summary and in problematic values from the data checks. If `Inf`, no rounding is performed.
- addSummaryTable
A logical. If `TRUE` (the default), a summary table of the variable checks is added between the Data Cleaning Summary and the Variable List. Only one of `addSummaryTable` and `addCodebookTable` can be `TRUE`.
- codebook
A logical. Defaults to `FALSE`. If `TRUE`, the document is tweaked to better represent a codebook.
- reportTitle
A text string. If supplied, this will be the printed title of the report. If left unspecified, the title will be the name of the supplied dataset.
- treatXasY
A list that indicates how non-standard variable classes should be treated. This parameter allows you to include variables that are not of class `factor`, `character`, `labelled`, `haven_labelled`, `numeric`, `integer`, `logical` or `Date` (or a class that inherits from any of these classes). The names of the list are the new classes, and the entries are the names of the classes they should be treated as. If `makeDataReport()` should, e.g., treat variables of class `raw` as characters and variables of class `complex` as numeric, you should use `treatXasY = list(raw = "character", complex = "numeric")`.
- includeVariableList
A logical indicating whether the results of the summarize/visualize/check steps should be added to the report. Defaults to `TRUE`. Note that setting it to `FALSE` does not currently speed up computations; it just means that the information is not printed in the report.
- ...
Other arguments that are passed on to the precheck, checking, summary and visualization functions.
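When `output` is `NULL`, the format is chosen by a short decision procedure. The base-R sketch below reproduces that logic. `chooseOutputFormat()` is a hypothetical helper and is not part of dataMaid. Detecting "a LaTeX installation" by looking for `pdflatex` on the search path with `Sys.which()` is an assumption about how such a check could be done.

```r
# Hypothetical helper mirroring the documented default for output = NULL.
# NOT dataMaid's implementation: LaTeX detection via Sys.which("pdflatex")
# is an assumption made for this sketch.
chooseOutputFormat <- function(latexAvailable = nzchar(Sys.which("pdflatex")),
                               isWindows = .Platform$OS.type == "windows") {
  if (latexAvailable) {
    "pdf"    # first check: a LaTeX installation was found
  } else if (isWindows) {
    "word"   # second check: no LaTeX, but the OS is Windows
  } else {
    "html"   # neither check was positive
  }
}

chooseOutputFormat()  # result depends on the machine it runs on
```

If you pass `output` explicitly, for example `output = "html"`, this detection is skipped entirely.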
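The combined effect of `useVar` and `ordering` on which variables appear, and in what order, can be illustrated with a hypothetical helper, `selectVariables()`, which is not part of dataMaid. It keeps the variables of `data` that also appear in `useVar`. It then optionally sorts them. Silently ignoring names in `useVar` that do not occur in `data` is an assumption of this sketch.

```r
# Hypothetical sketch of the documented useVar/ordering behaviour
# (not dataMaid code).
selectVariables <- function(data, useVar = NULL,
                            ordering = c("asIs", "alphabetical")) {
  ordering <- match.arg(ordering)
  vars <- names(data)
  # Keep only variables that are both in data and in useVar;
  # names in useVar that are missing from data are dropped (assumption).
  if (!is.null(useVar)) vars <- vars[vars %in% useVar]
  # "alphabetical" sorts the names; "asIs" keeps the dataset order.
  if (ordering == "alphabetical") vars <- sort(vars)
  vars
}
```

With the built-in `mtcars` data, `selectVariables(mtcars, c("wt", "cyl", "am"))` keeps the dataset order `"cyl" "wt" "am"`. Adding `ordering = "alphabetical"` returns `"am" "cyl" "wt"` instead.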
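Under `labelled_as = "factor"`, a `labelled` or `haven_labelled` vector counts as factor-like when its `labels` attribute is non-`NULL`. The base-R sketch below expresses that test. `isFactorLike()` is a hypothetical name. The example vectors are built with `structure()` instead of the haven package, so the snippet runs without extra dependencies.

```r
# Hypothetical sketch of the documented "factor-like" test for
# labelled / haven_labelled vectors (not dataMaid's own code).
isFactorLike <- function(x) {
  inherits(x, c("labelled", "haven_labelled")) && !is.null(attr(x, "labels"))
}

# A labelled vector carrying value labels: treated as a factor.
withLabels <- structure(c(1, 2, 1), labels = c(no = 1, yes = 2),
                        class = "haven_labelled")
# A labelled vector without value labels: treated as its base class.
noLabels <- structure(c(1, 2, 1), class = "haven_labelled")
```

Here `isFactorLike(withLabels)` is `TRUE` and `isFactorLike(noLabels)` is `FALSE`.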
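The `smartNum` rule can be stated as a one-line predicate. In the base-R sketch below, `treatAsFactor()` is a hypothetical helper, not a dataMaid function. Ignoring missing values when counting unique values is an assumption, since the text above does not say how `NA` is handled.

```r
# Hypothetical sketch of the documented smartNum rule: a numeric or integer
# variable with fewer than 5 unique values is handled as a factor.
# NAs are excluded from the unique-value count (an assumption).
treatAsFactor <- function(x, smartNum = TRUE) {
  smartNum && is.numeric(x) && length(unique(x[!is.na(x)])) < 5
}

# Which columns of the built-in mtcars data would be affected?
names(Filter(treatAsFactor, mtcars))
```

For `mtcars`, this flags `cyl`, `vs`, `am` and `gear`. `carb` has six distinct values and therefore stays numeric.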
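The default `.Rmd` name combines the "dataMaid_" prefix, the dataset name and `vol`. A minimal sketch of this construction follows. `defaultReportFile()` is hypothetical. The set of characters treated as illegal (anything other than letters, digits, ".", "_" and "-") and the use of "_" as the replacement are assumptions, not dataMaid's actual rules.

```r
# Hypothetical sketch of building the default report file name
# (not dataMaid code; the illegal-character rule is an assumption).
defaultReportFile <- function(dataName, vol = "") {
  base <- paste0("dataMaid_", dataName, vol)
  # Substitute characters assumed to be illegal in file names.
  base <- gsub("[^A-Za-z0-9._-]", "_", base)
  # Reports are always rmarkdown files, hence the .Rmd suffix.
  paste0(base, ".Rmd")
}
```

`defaultReportFile("myData", vol = 2)` gives `"dataMaid_myData2.Rmd"`, which matches the example given for `vol`. Under the assumed rule, `defaultReportFile("my data")` becomes `"dataMaid_my_data.Rmd"`.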
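`maxProbVals` and `maxDecimals` together limit how problematic values are printed. One way to sketch this is shown below. `formatProblemValues()` is hypothetical. The order of operations (round, then deduplicate, then truncate) is an assumption made for this sketch, not documented dataMaid behaviour. `Inf` disables the corresponding limit, as described above.

```r
# Hypothetical sketch of limiting printed problematic values
# (not dataMaid code; the order of operations is an assumption).
formatProblemValues <- function(values, maxProbVals = 10, maxDecimals) {
  # Round numeric values unless maxDecimals is Inf (no rounding).
  if (is.numeric(values) && is.finite(maxDecimals)) {
    values <- round(values, maxDecimals)
  }
  values <- unique(values)
  # Keep at most maxProbVals values unless maxProbVals is Inf.
  if (is.finite(maxProbVals)) values <- head(values, maxProbVals)
  values
}
```

For example, `formatProblemValues(c(1.2345, 2.5, 1.2345, 3), maxProbVals = 2, maxDecimals = 1)` returns `1.2 2.5`.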