analyze: Generate Rows Analyzing Variables Across Columns

Description

Adding analyzed variables to our table layout defines the primary tabulation to be performed. We do this by adding calls to analyze and/or analyze_colvars to our layout pipeline. As with adding further splitting, the tabulation will occur at the current/next level of nesting by default.

Usage

analyze(
  lyt,
  vars,
  afun = simple_analysis,
  var_labels = vars,
  table_names = vars,
  format = NULL,
  nested = TRUE,
  inclNAs = FALSE,
  extra_args = list(),
  show_labels = c("default", "visible", "hidden"),
  indent_mod = 0L
)

Arguments

lyt

layout object pre-data used for tabulation

vars

character vector. Multiple variable names.

afun

function. Analysis function; must take x or df as its first parameter. Can optionally take other parameters, which will be populated by the tabulation framework. See Details.

var_labels

character. Variable labels for 1 or more variables

table_names

character. Names for the tables representing each atomic analysis. Defaults to vars.

format

FormatSpec. Format associated with this split. Formats can be declared via strings ("xx.x") or functions. In cases such as analyze calls, they can be character vectors or lists of functions.

nested

boolean. Should this layout instruction be nested within the existing layout structure (TRUE, the default), or added as a new top-level split, defining a new subtable directly under root (FALSE)?

inclNAs

boolean. Should observations with NA in the vars variable(s) be included when performing this analysis? Defaults to FALSE.

extra_args

list. Extra arguments to be passed to the tabulation function. Element position in the list corresponds to the children of this split. Named elements in the child-specific lists are ignored if they do not match a formal argument of the tabulation function.

show_labels

character(1). Should the variable labels corresponding to the variable(s) in vars be visible in the resulting table?

indent_mod

numeric. Modifier for the default indent position for the structure created by this function (subtable, content table, or row) and all of that structure's children. Defaults to 0, which corresponds to the unmodified default behavior.
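
The effect of nested can be sketched as follows. This is a minimal illustration, assuming the default shown in Usage (nested = TRUE) places the analysis under the existing row splits, while nested = FALSE starts a new top-level subtable. The use of iris and mean here is only for demonstration.

```r
library(rtables)

lyt <- basic_table() %>%
  split_rows_by("Species") %>%
  # Default nested = TRUE: analyzed within each Species row group
  analyze("Sepal.Length", afun = mean) %>%
  # nested = FALSE: analyzed once, directly under the table root
  analyze("Sepal.Width", afun = mean, nested = FALSE)

tbl <- build_table(lyt, iris)
tbl
```

The first analysis appears once per Species group, whereas the second is computed over all observations and sits alongside the Species groups rather than inside them.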

Value

A PreDataTableLayouts object suitable for passing to further layouting functions, and to build_table.

.spl_context Details

The .spl_context data.frame gives information about the subsets of data corresponding to the splits within which the current analyze action is nested. Taken together, these correspond to the path to the resulting (set of) rows that the analysis function is creating, although the information is in a slightly different form. Each split (which corresponds to a group of rows in the resulting table) is represented via the following columns:

split: The name of the split (often the variable being split in the simple case)
value: The string representation of the value at that split
full_parent_df: a data.frame containing the full data (i.e. across all columns) corresponding to the path defined by the combination of split and value of this row and all rows above this row
all_cols_n: the number of observations corresponding to this row grouping (union of all columns)
(row-split and analyze contexts only) <1 column for each column in the table structure>: These list columns (named the same as names(col_exprs(tab))) contain logical vectors corresponding to the subset of this row's full_parent_df corresponding to that column
cur_col_subset: List column containing logical vectors indicating the subset of that row's full_parent_df for the column currently being created by the analysis function
cur_col_n: integer column containing the observation counts for that split

Note: Within analysis functions that accept .spl_context, the all_cols_n and cur_col_n columns of the data.frame will contain the 'true' observation counts corresponding to the row-group and row-group x column subsets of the data. These numbers will not, and currently cannot, reflect alternate column observation counts provided by the alt_counts_df, col_counts or col_total arguments to build_table.
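
As a rough sketch of how an analysis function might read .spl_context, the example below builds its row label from the innermost enclosing split. The function name mean_with_context is hypothetical, and the sketch assumes the last row of .spl_context describes the innermost split, as implied by the path description above.

```r
library(rtables)

# Hypothetical analysis function: .spl_context is supplied by the tabulation
# machinery because it appears in the formals
mean_with_context <- function(x, .spl_context) {
  # Assumption: the last row corresponds to the innermost enclosing split
  innermost <- .spl_context[nrow(.spl_context), ]
  lbl <- paste0("Mean (", innermost$split, " = ", innermost$value, ")")
  # Build a single named row holding the formatted mean
  in_rows(.list = setNames(list(rcell(mean(x), format = "xx.xx")), lbl))
}

lyt <- basic_table() %>%
  split_rows_by("Species") %>%
  analyze("Sepal.Length", afun = mean_with_context)

tbl <- build_table(lyt, iris)
tbl
```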

Details

When non-NULL, format is used to specify formats for all generated rows, and can be a character vector, a function, or a list of functions. It will be recycled to the number of rows once this is known during the tabulation process, but will be overridden by formats specified within rcell calls in afun.
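
The interaction between format and rcell-level formats can be sketched as below. This is an illustrative example only; the function name mean_sd_rows and the chosen format strings are assumptions, not part of the original documentation.

```r
library(rtables)

# Hypothetical analysis function returning two rows
mean_sd_rows <- function(x) {
  in_rows(
    # No format given here, so the format passed to analyze() applies
    "Mean" = rcell(mean(x)),
    # A format given in rcell() takes precedence over the analyze() format
    "SD" = rcell(sd(x), format = "xx.xxx")
  )
}

lyt <- basic_table() %>%
  analyze("Sepal.Length", afun = mean_sd_rows, format = "xx.x")

tbl <- build_table(lyt, iris)
tbl
```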

The analysis function (afun) should take as its first parameter either x or df. Which of these the function accepts changes the behavior when tabulation is performed.

If afun's first parameter is x, it will receive the corresponding subset vector of data from the relevant column (the variable named in vars) of the raw data being used to build the table.
If afun's first parameter is df, it will receive the corresponding subset data.frame (i.e. all columns) of the raw data being tabulated.
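
The difference between the two calling conventions can be illustrated with a short sketch. The function names mean_x and ratio_df are hypothetical, and iris is used purely as convenient example data.

```r
library(rtables)

# First parameter x: receives the vector of the analyzed variable per cell
mean_x <- function(x) {
  in_rows("Mean" = rcell(mean(x), format = "xx.xx"))
}

# First parameter df: receives the full data.frame subset per cell, so
# other columns are available in addition to the analyzed variable
ratio_df <- function(df) {
  in_rows(
    "Mean length / width" =
      rcell(mean(df$Sepal.Length / df$Sepal.Width), format = "xx.xx")
  )
}

tbl_x <- basic_table() %>%
  split_cols_by("Species") %>%
  analyze("Sepal.Length", afun = mean_x) %>%
  build_table(iris)

tbl_df <- basic_table() %>%
  split_cols_by("Species") %>%
  analyze("Sepal.Length", afun = ratio_df) %>%
  build_table(iris)

tbl_x
tbl_df
```

Taking df is useful when a statistic depends on more than one column. Taking x keeps the function reusable across any single variable.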

In addition to differentiation on the first argument, the analysis function can optionally accept a number of other parameters which, if and only if present in the formals, will be passed to the function by the tabulation machinery. These are as follows:

.N_col: column-wise N (column count) for the full column being tabulated within
.N_total: overall N (all observation count, defined as sum of column counts) for the tabulation
.N_row: row-wise N (row group count) for the group of observations being analyzed (i.e. with no column-based subsetting)
.df_row: data.frame for observations in the row group being analyzed (i.e. with no column-based subsetting)
.var: variable that is analyzed
.ref_group: data.frame or vector of subset corresponding to the ref_group column including subsetting defined by row-splitting. Optional and only required/meaningful if a ref_group column has been defined
.ref_full: data.frame or vector of subset corresponding to the ref_group column without subsetting defined by row-splitting. Optional and only required/meaningful if a ref_group column has been defined
.in_ref_col: boolean indicating whether the calculation is being done for cells within the reference column
.spl_context: data.frame, each row gives information about a previous/'ancestor' split state. See the .spl_context Details section
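
A minimal sketch of an analysis function that opts in to one of these parameters is shown below. The function n_above and its threshold parameter are hypothetical; threshold is supplied through extra_args, while .N_col is filled in by the tabulation machinery because it appears in the formals.

```r
library(rtables)

# Hypothetical analysis function: count and proportion of values above a
# threshold, using the column count .N_col as the denominator
n_above <- function(x, .N_col, threshold = 5) {
  n <- sum(x > threshold)
  in_rows(
    "n (%) above threshold" = rcell(c(n, n / .N_col), format = "xx (xx.x%)")
  )
}

lyt <- basic_table() %>%
  split_cols_by("Species") %>%
  # threshold matches a formal argument of n_above, so it is passed through
  analyze("Sepal.Length", afun = n_above, extra_args = list(threshold = 6))

tbl <- build_table(lyt, iris)
tbl
```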

Examples

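library(rtables)

# Summarize AGE within each ARM column; list_wrap_x() wraps summary() so
# that each element of its result becomes a row
l <- basic_table() %>%
    split_cols_by("ARM") %>%
    analyze("AGE", afun = list_wrap_x(summary), format = "xx.xx")
l
build_table(l, DM)

# Analyze every numeric iris variable with a custom analysis function that
# returns a named list of cells, one per row
l <- basic_table() %>%
    split_cols_by("Species") %>%
    analyze(head(names(iris), -1), afun = function(x) {
        list(
            "mean / sd" = rcell(c(mean(x), sd(x)), format = "xx.xx (xx.xx)"),
            "range" = rcell(diff(range(x)), format = "xx.xx")
        )
    })
l
build_table(l, iris)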
