skimr provides extensions to a variety of functions from R's stats package
to simplify creating summaries of data. All functions are vectorized over the
first argument. Additional arguments should be set in the sfl() call that
defines the appropriate skimmers for a data type. You can use these, along with
other vectorized R functions, to create custom sets of summary functions for
a given data type.
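As a rough illustration of setting additional arguments through sfl(), the sketch below builds a custom skimmer for numeric columns. It assumes skimr version 2 or later is installed; the name my_skim and the use of the built-in iris data set are illustrative choices, not part of this page.

```r
library(skimr)

# Custom summary functions for numeric columns. The formula syntax passes
# the extra argument n_bins to inline_hist(); n_unique needs no extras.
my_skim <- skim_with(
  numeric = sfl(
    hist = ~ inline_hist(., n_bins = 5),
    n_unique = n_unique
  ),
  append = FALSE  # replace the default numeric skimmers rather than add to them
)

# my_skim is an illustrative name for the returned skimming function.
result <- my_skim(iris)
print(result)
```

Wrapping the call in a formula keeps the function vectorized over its first argument while fixing the remaining arguments in one place.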
n_missing(x)
n_complete(x)
complete_rate(x)
n_whitespace(x)
sorted_count(x)
top_counts(x, max_char = 3, max_levels = 4)
inline_hist(x, n_bins = 8)
n_empty(x)
min_char(x)
max_char(x)
n_unique(x)
ts_start(x)
ts_end(x)
inline_linegraph(x, length.out = 16)
list_lengths_min(x)
list_lengths_median(x)
list_lengths_max(x)
list_min_length(x)
list_max_length(x)
x: A vector.
max_char: In top_counts, the maximum number of characters displayed for each level name.
max_levels: The maximum number of levels to be displayed.
n_bins: In inline_hist, the number of histogram bars.
length.out: In inline_linegraph, the length of the character time series.
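The optional arguments above can be tried directly on small inputs. This is a minimal sketch assuming skimr is installed; the vectors fruit, values, and series are made-up sample data.

```r
library(skimr)

# Made-up sample data for illustration.
fruit <- factor(c("apple", "apple", "apple", "banana", "banana", "cherry"))
values <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 9)
series <- ts(c(3, 5, 4, 8, 6, 9, 7, 10))

# Limit the summary to the two most frequent levels.
fruit_summary <- top_counts(fruit, max_char = 5, max_levels = 2)

# Draw the histogram with four bars instead of the default eight.
hist_4 <- inline_hist(values, n_bins = 4)

# Sample eight points from the series for the line graph.
graph <- inline_linegraph(series, length.out = 8)

print(fruit_summary)
print(hist_4)
print(graph)
```

Each call returns a single character value, which is what lets these functions sit in one cell of a summary table.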
n_missing(): Calculate the number of NA and NULL (i.e. missing) values.
n_complete(): Calculate the number of values that are neither NA nor NULL
(i.e. not missing).
complete_rate(): Calculate the proportion of complete values; complete values
are those that are not missing.
n_whitespace(): Calculate the number of rows containing only whitespace
values, using the \s+ regex.
sorted_count(): Create a contingency table and arrange its levels in
descending order. In case of ties, the ordering of results is alphabetical
and depends upon the locale. NA is treated as an ordinary value for sorting.
top_counts(): Compute and collapse a contingency table into a single
character scalar. Wraps sorted_count().
inline_hist(): Generate inline histogram for numeric variables. The
character length of the histogram is controlled by the formatting options
for character vectors.
n_empty(): Calculate the number of blank values in a character vector.
A "blank" is equal to "".
min_char(): Calculate the minimum number of characters within a
character vector.
max_char(): Calculate the maximum number of characters within a
character vector.
n_unique(): Calculate the number of unique elements, excluding NA.
ts_start(): Get the start for a time series without the frequency.
ts_end(): Get the finish for a time series without the frequency.
inline_linegraph(): Generate inline line graph for time series variables. The
character length of the line graph is controlled by the formatting options
for character vectors.
Based on the function in the pillar package.
list_lengths_min(): Get the length of the shortest list in a vector of lists.
list_lengths_median(): Get the median length of the lists.
list_lengths_max(): Get the maximum length of the lists.
list_min_length(): Get the length of the shortest list in a vector of lists.
list_max_length(): Get the length of the longest list in a vector of lists.
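The descriptions above can be checked on small inputs. The following is a sketch assuming skimr is installed; all input vectors are made-up sample data, and the commented values are what the descriptions above imply for these inputs.

```r
library(skimr)

# Missing-value helpers on a numeric vector with two NA values.
x <- c(1, NA, 3, NA, 5)
n_missing(x)      # 2
n_complete(x)     # 3
complete_rate(x)  # 0.6

# Character helpers: one empty string, one whitespace-only string, one NA.
chars <- c("a", "", "  ", "abc", NA)
n_empty(chars)       # 1
n_whitespace(chars)  # 1
min_char(chars)      # 0
max_char(chars)      # 3
n_unique(chars)      # 4, because NA is not counted

# List helpers: element lengths are 3, 10, and 1.
lists <- list(1:3, 1:10, "a")
list_lengths_min(lists)     # 1
list_lengths_median(lists)  # 3
list_lengths_max(lists)     # 10

# Time series helpers on an annual series starting in 2020.
series <- ts(c(5, 7, 6, 9), start = 2020)
ts_start(series)  # 2020
ts_end(series)    # 2023
```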
get_skimmers() for customizing the functions called by skim().
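To see which summary functions skim() would apply to a given column, get_skimmers() can be called on a sample vector. This is a minimal sketch assuming skimr version 2 or later, where the returned object is a list with funs and skim_type elements; the name default_numeric is illustrative.

```r
library(skimr)

# Look up the default skimmers that apply to a numeric column.
default_numeric <- get_skimmers(c(1.5, 2.5))

# The skim type the functions are registered under.
print(default_numeric$skim_type)

# Names of the summary functions in the default set.
print(names(default_numeric$funs))
```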