analyzer (version 1.0.1)

plottr: Creates plots for the variables in a data.frame

Description

plottr creates plots for all the variables in a data.frame, or for a single vector. The output is a list with one plot per variable; each plot has class 'analyzerPlot'.

Usage

plottr(
  tb,
  yvar = NULL,
  xclasses = NULL,
  yclass = NULL,
  printall = F,
  callasfactor = 1,
  FUN1 = Cx,
  FUN2 = Qx,
  FUN3 = CxCy,
  FUN4 = QxCy,
  FUN5 = CxQy,
  FUN6 = QxQy,
  ...
)

Arguments

tb

a data.frame or a vector. If the yvar argument is also passed, this must be a data.frame that includes the response variable (yvar).

yvar

a string giving the name of the response (dependent) variable. Can be NULL if there is no response variable. The variable must be present in tb.

xclasses

a vector of length ncol(tb) giving the class of each column, in the same order as the columns of tb. Each entry must be either 'factor' or 'numeric'. Can be NULL, in which case the function assigns a class to each column. When tb is a vector, this can be a vector of length 1.

yclass

class of the response variable. Can be NULL, but must be given when yvar is not NULL. The value can be 'factor' or 'numeric'.

printall

(logical) Whether the plots should be displayed. If FALSE, the function only returns the list of plots silently.

callasfactor

minimum number of unique values needed for x to be treated as numeric. See Details for more information.

FUN1

a user-defined function for plotting one variable when the variable is Continuous. See Details for how to define these functions.

FUN2

same as FUN1, but for a Categorical variable

FUN3

a user-defined function for plotting two variables when both the independent variable (x) and the dependent variable (y) are Continuous

FUN4

same as FUN3, but when independent variable (x) is Categorical and dependent variable (y) is Continuous

FUN5

same as FUN3, but when independent variable (x) is Continuous and dependent variable (y) is Categorical

FUN6

same as FUN3, but when both the independent variable (x) and dependent variable (y) are Categorical

...

extra arguments passed to functions FUN1-FUN6

Value

A list of plots for all the variables. Each plot will have the class analyzerPlot and can be displayed using plot(). If printall = TRUE, then all plots will also be displayed.

Details

plottr helps in understanding data through multiple visualizations. It works for a data.frame with multiple variables, for a single x variable, or for a combination of predictor x and response y variables. Different types of plots are generated automatically based on the classes of x and y.

Please note the following points:

Defining the class of variables: If yvar is not NULL, then yclass must be passed ('factor' for a classification-type problem, 'numeric' for regression). xclasses stores the class of every variable in the data.frame, in the same order as the columns. Note: if yvar is not NULL, tb must be a data.frame with at least 2 columns (including yvar). In that case xclasses must also include the class of yvar, even though it is also passed through yclass. xclasses can also be NULL, in which case the function assigns a class based on the contents: a factor or character variable gets 'factor'; a numeric variable with fewer unique values than the callasfactor parameter gets 'factor'; any other variable gets 'numeric'.
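
The class-assignment rule described above can be sketched in base R. This is a minimal illustration of the documented rule, not the package's actual code; the helper names guess_class and guess_classes are hypothetical.

```r
# Hypothetical helper (not part of analyzer): mirrors the documented rule
# used when xclasses is NULL.
guess_class <- function(x, callasfactor = 1) {
  # factor/character variables are always treated as 'factor'
  if (is.factor(x) || is.character(x)) {
    return("factor")
  }
  # numeric variables with fewer unique values than callasfactor
  # are also treated as 'factor'
  if (is.numeric(x) && length(unique(x)) < callasfactor) {
    return("factor")
  }
  "numeric"
}

# Apply the rule to every column, keeping the column order of tb
guess_classes <- function(tb, callasfactor = 1) {
  vapply(tb, guess_class, character(1),
         callasfactor = callasfactor, USE.NAMES = FALSE)
}

# mpg has many distinct values; cyl has 3 and am has 2
guess_classes(mtcars[, c("mpg", "cyl", "am")], callasfactor = 5)
# "numeric" "factor" "factor"
```

With the default callasfactor = 1, this rule never turns a non-empty numeric column into a 'factor', so a larger value is needed if low-cardinality numeric columns such as cyl should be treated as categorical.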

Defining custom functions for the plots using FUN1, FUN2, FUN3 ... FUN6:

Custom plots can be made by passing user-defined functions through these arguments. The following rules must be observed when defining such functions:

  • the returned plot must be of class 'grob', 'gtable' or 'ggplot'. Since these outputs are passed to arrangeGrob, make sure the plots are accepted by the arrangeGrob function. See the code of CxCy for a sample.

  • not all 6 functions need to be passed. Pass only those functions whose plots need to be changed.

  • FUN1 and FUN2 must have 3 parameters: dat (a data.frame holding the data; even if there is only one column, it must be passed as a one-column data.frame), xname (the name of the column in dat) and ... In addition to these three, any number of additional parameters can be added. See the source code of Cx for a sample.

  • FUN3, FUN4, FUN5 and FUN6 must have 4 parameters: dat (a data.frame holding the data; it must have two columns, one for the independent and one for the dependent variable), xname (the name of the independent variable in dat), yname (the name of the dependent variable in dat) and ... In addition to these four, any number of additional parameters can be added. See the source code of CxCy for a sample.

  • ... must be added as an argument in all the functions.

To get a better idea, see the code of the functions CxCy and Cx.
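
As a rough sketch of functions that follow the rules above, the two definitions below use ggplot2 and return 'ggplot' objects. The names my_cx and my_cxcy and the bins parameter are hypothetical and not part of the package; the commented call at the end assumes the analyzer package is installed and has not been verified here.

```r
library(ggplot2)

# Univariate plot for a continuous variable (a candidate for FUN1).
# Required parameters: dat, xname and ...; `bins` is an extra,
# hypothetical parameter added for illustration.
my_cx <- function(dat, xname, ..., bins = 20) {
  ggplot(dat, aes(x = .data[[xname]])) +
    geom_histogram(bins = bins) +
    labs(x = xname, title = paste("Histogram of", xname))
}

# Bivariate plot for continuous x and continuous y (a candidate for FUN3).
# Required parameters: dat, xname, yname and ...
my_cxcy <- function(dat, xname, yname, ...) {
  ggplot(dat, aes(x = .data[[xname]], y = .data[[yname]])) +
    geom_point() +
    labs(x = xname, y = yname)
}

# The functions can be tried on their own before handing them to plottr
p1 <- my_cx(mtcars["mpg"], "mpg")
p2 <- my_cxcy(mtcars[c("disp", "mpg")], "disp", "mpg")

# Assumed usage with plottr:
# p <- plottr(mtcars, yvar = "mpg", yclass = "numeric",
#             FUN1 = my_cx, FUN3 = my_cxcy)
```

Note that mtcars["mpg"] keeps the data.frame structure, which matches the requirement that dat is a data.frame even when it has a single column.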

Default plots: If y is NULL, a histogram with density is generated for numeric x. A boxplot is also shown in the same histogram using color and vertical lines. For factor x, a pie chart showing the distribution is generated. These are the univariate plots, which can be modified using the FUN1 and FUN2 arguments.

If y is not NULL, then additional plots are added which can be modified by using the FUN3, FUN4, FUN5, FUN6 arguments:

  • factor x, factor y: Crosstab with heatmap (modified by using FUN6)

  • factor x, numeric y: histogram and boxplot of y for different values of x (modified by using FUN4)

  • numeric x, factor y: histogram and boxplot of x for different values of y (modified by using FUN5)

  • numeric x, numeric y: Scatter plot of x and y with rug plot included (modified by using FUN3)
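
The mapping between variable classes and the FUN arguments described above can be summarised as a small lookup. This only illustrates the documented mapping, not the package's internal dispatch; the helper name which_fun is hypothetical.

```r
# Hypothetical helper (not part of analyzer): returns the name of the
# plottr argument that controls the plot for a given class combination.
which_fun <- function(xclass, yclass = NULL) {
  if (is.null(yclass)) {
    # Univariate case: FUN1 for numeric x, FUN2 for factor x
    return(switch(xclass, numeric = "FUN1", factor = "FUN2"))
  }
  key <- paste(xclass, yclass, sep = "/")
  switch(key,
    "numeric/numeric" = "FUN3",  # default CxCy
    "factor/numeric"  = "FUN4",  # default QxCy
    "numeric/factor"  = "FUN5",  # default CxQy
    "factor/factor"   = "FUN6"   # default QxQy
  )
}

which_fun("numeric")            # "FUN1"
which_fun("factor", "numeric")  # "FUN4"
```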

Examples

# NOT RUN {
# simple use for one variable
p <- plottr(mtcars$mpg)
# To display the plot
plot(p$x)

# With complete dataframe and assuming 'mpg' as a dependent variable
p <- plottr(mtcars, yvar = "mpg", yclass = "numeric")
plot(p$disp)

# }