cursory
The goal of cursory is to make it easier to summarize data and look at
your variables. It builds off dplyr and
purrr. It is also compatible with
dbplyr and remote data.
Installation
You can install the released version of cursory from CRAN with:
install.packages("cursory")And the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("halpo/cursory")Example
This is a basic example which shows you how to solve a common problem:
library(dplyr)
library(cursory)
data(iris)
## basic summary statistics for each variable in a data frame.
cursory_all(group_by(iris, Species), lst(mean, median)) %>% ungroup() | Variable | Species | mean | median |
|---|---|---|---|
| Sepal.Length | setosa | 5.006 | 5.00 |
| Sepal.Length | versicolor | 5.936 | 5.90 |
| Sepal.Length | virginica | 6.588 | 6.50 |
| Sepal.Width | setosa | 3.428 | 3.40 |
| Sepal.Width | versicolor | 2.770 | 2.80 |
| Sepal.Width | virginica | 2.974 | 3.00 |
| Petal.Length | setosa | 1.462 | 1.50 |
| Petal.Length | versicolor | 4.260 | 4.35 |
| Petal.Length | virginica | 5.552 | 5.55 |
| Petal.Width | setosa | 0.246 | 0.20 |
| Petal.Width | versicolor | 1.326 | 1.30 |
| Petal.Width | virginica | 2.026 | 2.00 |
## summary statistics for only numeric variables.
cursory_if(iris, is.numeric, lst(Mean = mean, 'Std. Deviation' = sd))| Variable | Mean | Std. Deviation |
|---|---|---|
| Sepal.Length | 5.843333 | 0.8280661 |
| Sepal.Width | 3.057333 | 0.4358663 |
| Petal.Length | 3.758000 | 1.7652982 |
| Petal.Width | 1.199333 | 0.7622377 |
## summary statistics for specific variables.
cursory_at(iris, vars(ends_with("Length")), var)| Variable | var |
|---|---|
| Sepal.Length | 0.6856935 |
| Petal.Length | 3.1162779 |
table_1
The cursory package also provides a table_1 function that allows for
describing variables of a dataset for different subsets automatically.
This is useful in creating the very common demographics “table 1”.
table_1(iris, Species)| Variable | Level | (All) | setosa | versicolor | virginica |
|---|---|---|---|---|---|
| Sepal.Length | Min | 4.300 | 4.300 | 4.900 | 4.900 |
| Median | 5.800 | 5.000 | 5.900 | 6.500 | |
| Mean | 5.843 | 5.006 | 5.936 | 6.588 | |
| Max | 7.900 | 5.800 | 7.000 | 7.900 | |
| SD | 0.828 | 0.352 | 0.516 | 0.636 | |
| Sepal.Width | Min | 2.000 | 2.300 | 2.000 | 2.200 |
| Median | 3.000 | 3.400 | 2.800 | 3.000 | |
| Mean | 3.057 | 3.428 | 2.770 | 2.974 | |
| Max | 4.400 | 4.400 | 3.400 | 3.800 | |
| SD | 0.436 | 0.379 | 0.314 | 0.322 | |
| Petal.Length | Min | 1.000 | 1.000 | 3.000 | 4.500 |
| Median | 4.300 | 1.500 | 4.300 | 5.500 | |
| Mean | 3.758 | 1.462 | 4.260 | 5.552 | |
| Max | 6.900 | 1.900 | 5.100 | 6.900 | |
| SD | 1.765 | 0.174 | 0.470 | 0.552 | |
| Petal.Width | Min | 0.100 | 0.100 | 1.000 | 1.400 |
| Median | 1.300 | 0.200 | 1.300 | 2.000 | |
| Mean | 1.199 | 0.246 | 1.326 | 2.026 | |
| Max | 2.500 | 0.600 | 1.800 | 2.500 | |
| SD | 0.762 | 0.105 | 0.198 | 0.275 |
The table_1() function also tags the Variable column as a dontrepeat
class column which make repeating values in columns not appear when
formatted, so that tables are easier to read.