SmartEDA (version 0.3.10)

ExpCustomStat: Customized summary statistics

Description

Table of descriptive statistics. The output is a matrix object containing descriptive information on all input variables for each level, or combination of levels, of the categorical/group variable. While running the analysis, the user can also filter the data, either for individual variables or across the whole data set.

Usage

ExpCustomStat(
  data,
  Cvar = NULL,
  Nvar = NULL,
  stat = NULL,
  gpby = TRUE,
  filt = NULL,
  dcast = FALSE,
  value = NULL
)

Value

Summary statistics as a data frame. Usage of this function is detailed in the user guide vignettes.

Arguments

data

data frame or matrix

Cvar

qualitative variables on which to stratify / subgroup or run categorical summaries

Nvar

quantitative variables on which to run summary statistics

stat

descriptive statistics. Specify which summary statistics are required. All base statistical functions are included, such as 'mean', 'median', 'max', 'min', 'sum', 'IQR', 'sd' and 'var', as well as quantiles such as P0.1, P0.2, etc. Two further statistics are available: 'PS' is the percentage of shares and 'Prop' is the column percentage.
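
As a rough illustration of the two extra statistics, the base R sketch below computes a column percentage of counts and a percentage share of a sum by group on `mtcars`. The formulas are an assumption about what 'Prop' and 'PS' mean (group count over total count, and group sum over overall sum); they are not taken from the SmartEDA source.

```r
# Assumed definitions (not taken from the SmartEDA source):
#   Prop = group row count / total row count * 100
#   PS   = group sum of the variable / overall sum * 100
counts <- table(mtcars$gear)                       # rows per gear level
prop   <- round(100 * counts / sum(counts), 2)     # column percentage
sums   <- tapply(mtcars$disp, mtcars$gear, sum)    # sum of disp per gear
ps     <- round(100 * sums / sum(sums), 2)         # percentage share of disp

shares <- data.frame(gear  = names(sums),
                     Count = as.vector(counts),
                     Prop  = as.vector(prop),
                     PS    = as.vector(ps))
print(shares)
```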

gpby

default value is TRUE. Group-level summaries are created from the list of categorical variables. If a summary is required for each categorical variable separately, set this option to FALSE.
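
The difference between the two settings can be pictured with base R. This is only a sketch using `aggregate()`, not the package's implementation: the first call summarises by the combination of levels, the second summarises each categorical variable on its own.

```r
# Combined grouping (analogous to gpby = TRUE): one row per observed
# combination of vs and am.
combined <- aggregate(mpg ~ vs + am, data = mtcars, FUN = mean)

# Separate grouping (analogous to gpby = FALSE): one summary per
# categorical variable, each computed independently of the others.
separate <- lapply(c("vs", "am"), function(cv) {
  aggregate(mtcars$mpg, by = setNames(list(mtcars[[cv]]), cv), FUN = mean)
})

print(combined)   # one table covering the vs x am combinations
print(separate)   # a list of two tables, one for vs and one for am
```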

filt

filter the data while running the summary statistics. A filter can be applied across the whole data set or to individual variables. If there are multiple filters, separate the conditions with '^'. Example: with Nvar = c("X1","X2","X3","X4"), suppose we need to filter on X1>900 for the X1 variable, X2==10 for the X2 variable, Gender!='Male' for the X3 variable, and keep all data for X4. Then filt should be filt = c("X1>900^X2==10^Gender!='Male'^all") or c("X1>900^X2==10^Gender!='Male'^ "). To keep all data for some of the variables listed in Nvar, put ^all^ or ^ ^ (a single space) in the corresponding position of filt.
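
To make the positional matching concrete, here is a minimal base R sketch of how a '^'-separated filter string could be split and applied one condition per variable. `apply_filters` is a hypothetical helper written for this illustration, not a SmartEDA function, and it assumes each condition selects the rows to keep.

```r
# Hypothetical helper: split a '^'-separated filter string and apply the
# i-th condition to the i-th numeric variable before summarising it.
# Assumption: a condition selects the rows that are kept.
apply_filters <- function(data, nvar, filt, fun = mean) {
  conds <- strsplit(filt, "^", fixed = TRUE)[[1]]
  stopifnot(length(conds) == length(nvar))
  out <- vapply(seq_along(nvar), function(i) {
    cond <- trimws(conds[i])
    # "all" or a blank condition keeps every row for that variable
    rows <- if (cond %in% c("", "all")) {
      data
    } else {
      data[eval(parse(text = cond), envir = data), , drop = FALSE]
    }
    fun(rows[[nvar[i]]])
  }, numeric(1))
  setNames(out, nvar)
}

# disp is summarised where am == 1, mpg on all rows, hp where gear != 3
res <- apply_filters(mtcars, nvar = c("disp", "mpg", "hp"),
                     filt = "am==1^ ^gear!=3")
print(res)
```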

dcast

if TRUE, reshape the summary using the fast dcast from data.table

value

If dcast is TRUE, the name of the variable whose values should be spread across the columns
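
The kind of long-to-wide reshaping that a dcast performs can be sketched in base R as follows. This uses `reshape()` rather than data.table, so it shows the general idea only and is not the package's own code path.

```r
# Long format: one row per observed (gear, am) combination
long <- aggregate(mpg ~ gear + am, data = mtcars, FUN = mean)

# Wide format: one row per gear, one mpg column per level of am.
# Combinations that do not occur in the data become NA.
wide <- reshape(long, idvar = "gear", timevar = "am", direction = "wide")

print(long)
print(wide)
```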

Details

Filter unique value from all the numeric variables

Case1: Exclude specific values or outlier codes such as '999', '9999' or '888' from each selected variable.

E.g.: dat = data.frame(x = c(23,24,34,999,12,12,23,999,45), y = c(1,3,4,999,0,999,0,8,0))

Exclude 999:

x = c(23,24,34,12,12,23,45)

y = c(1,3,4,0,0,8,0)
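
The exclusion in Case1 can be reproduced with a few lines of base R. This sketch uses the example vectors, with `y` written as nine values so that both columns have the same length (an adjustment made here so that `data.frame()` accepts them); it does not call SmartEDA.

```r
dat <- data.frame(x = c(23, 24, 34, 999, 12, 12, 23, 999, 45),
                  y = c(1, 3, 4, 999, 0, 999, 0, 8, 0))

sentinel <- 999   # code to drop before summarising

# Drop the sentinel from each column separately, so each variable keeps
# all of its own valid values.
cleaned <- lapply(dat, function(v) v[v != sentinel])

# Summaries computed on the cleaned values only
means <- vapply(cleaned, mean, numeric(1))

print(cleaned)
print(means)
```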

Case2: Summarise the data with selected descriptive statistics, such as 'mean' and 'median', or 'sum' and 'variance'.

Case3: Aggregate the data with different statistics using a group-by statement.

Case4: Reshape the summary statistics.

The complete functionality of `ExpCustomStat` function is detailed in vignette help page with example code.

Examples

library(SmartEDA)

## Count, sum and percentage of shares ('PS') for the disp and mpg
## variables, grouped by vs, am and gear
ExpCustomStat(mtcars, Cvar = c("vs", "am", "gear"), Nvar = c("disp", "mpg"),
              stat = c("Count", "sum", "PS"), gpby = TRUE, filt = NULL)

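## Count, sum and variance of disp and mpg by gear, with the filter "am==1"
ExpCustomStat(mtcars, Cvar = c("gear"), Nvar = c("disp", "mpg"),
              stat = c("Count", "sum", "var"), gpby = TRUE, filt = "am==1")

## Count, sum, mean and median of disp and mpg by gear, with the same filter
ExpCustomStat(mtcars, Cvar = c("gear"), Nvar = c("disp", "mpg"),
              stat = c("Count", "sum", "mean", "median"), gpby = TRUE, filt = "am==1")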

## Count and five-number summary (min, quartiles, max) for the disp and mpg
## variables by gear
ExpCustomStat(mtcars, Cvar = c("gear"), Nvar = c("disp", "mpg"),
              stat = c("Count", "min", "p0.25", "median", "p0.75", "max"), gpby = TRUE)
