Abbreviation: dn
Plots a normal density curve and/or a general density curve superimposed over a histogram, all estimated from the data. Also reports the Shapiro-Wilk normality test and summary statistics.
If the provided object to analyze is a set of multiple variables, including an entire data frame, then each numeric variable in the data frame is analyzed and the results written to the current graphics device or to a pdf file in the current working directory. The name of each output pdf file that contains a density plot and its path are specified in the output.
When output is assigned into an object, such as d in d <- dn(Y), the pieces of output can be accessed for later analysis. A primary such analysis is knitr for dynamic report generation from an R markdown document in which R output is embedded in documents, facilitated by the Rmd option. See value below.
Density(x, data=d, rows=NULL,
n_cat=getOption("n_cat"), Rmd=NULL, bw=NULL, type=c("general", "normal", "both"),
histogram=TRUE, bin_start=NULL, bin_width=NULL,
color_nrm="gray20", color_gen="gray20",
fill_nrm=NULL, fill_gen=NULL,
rotate_x=0, rotate_y=0, offset=0.5,
x.pt=NULL, xlab=NULL, main=NULL, sub=NULL, y_axis=FALSE,
x.min=NULL, x.max=NULL,
rug=FALSE, color_rug="black", size_rug=0.5,
eval_df=NULL, digits_d=NULL, quiet=getOption("quiet"),
width=4.5, height=4.5, pdf_file=NULL,
fun_call=NULL, …)
dn(…)
x: Variable(s) to analyze. Can be a single numerical variable, either within a data frame or as a vector in the user's workspace, or multiple variables in a data frame such as designated with the c function, or an entire data frame. If not specified, then defaults to all numerical variables in the specified data frame, d by default.
data: Optional data frame that contains the variable(s) of interest, default is d.
rows: A logical expression that specifies a subset of rows of the data frame to analyze.
n_cat: For the analysis of multiple variables, such as a data frame, specifies the largest number of unique values of a variable of a numeric data type for which the variable will be analyzed as categorical. Default is 0.
Rmd: File name for the file of R markdown to be written, if specified. The file type is .Rmd, which automatically opens in RStudio, but it is a simple text file that can be edited with any text editor, including RStudio.
bw: Bandwidth of kernel estimation. Initial value is "nrd0" but, unless specified, it may be iterated upward to create a smoother curve.
type: Type of density curve plotted. By default, the general density is plotted, though the normal density or both densities can be requested.
histogram: If TRUE, overlay the density plot over a histogram.
bin_start: Optional specified starting value of the bins.
bin_width: Optional specified bin width, which can be specified with or without a bin_start value.
color_nrm: Color of the normal curve.
color_gen: Color of the general density curve.
fill_nrm: Fill color for the estimated normal curve, with a partially transparent blue as the default, and transparent for the gray theme.
fill_gen: Fill color for the estimated general density curve, with a partially transparent light red as the default, and a light transparent gray for the gray theme.
rotate_x: Degrees that the x-axis values are rotated, usually to accommodate longer values, typically used in conjunction with offset.
rotate_y: Degrees that the y-axis values are rotated.
offset: The amount of spacing between the axis values and the axis. Default is 0.5. Larger values such as 1.0 are used to create space for the label when longer axis value names are rotated.
x.pt: Value of the point on the x-axis around which to draw a unit interval, illustrating the corresponding area under the general density curve. Only applies when requesting type="general".
xlab: Label for the x-axis. Defaults to the variable name unless variable labels are present, then defaults to also include the corresponding variable label. Can style with the lessR style function.
main: Label for the title of the graph. Can set size with main_cex and color with main_color from the lessR style function.
sub: Sub-title of graph, below xlab.
y_axis: Specifies if the y-axis, the density axis, should be included.
x.min: Smallest value of the variable x plotted on the x-axis.
x.max: Largest value of the variable x plotted on the x-axis.
rug: If TRUE, add a rug plot, a direct display of density in the form of a narrow band beneath the density curve.
color_rug: Color of the rug ticks.
size_rug: Line width of the rug ticks.
eval_df: Determines whether to check for the existing data frame and specified variables. The default is TRUE unless the shiny package is loaded, in which case it is set to FALSE so that Shiny will run. Needs to be set to FALSE if using the pipe %>% notation.
digits_d: Number of significant digits for each of the displayed summary statistics.
quiet: If set to TRUE, no text output. Can change the system default with the style function.
width: Width of the plot window in inches, defaults to 4.5.
height: Height of the plot window in inches, defaults to 4.5.
pdf_file: Name of the pdf file to which pdf graphics are directed.
fun_call: Function call. Used with knitr to pass the function call when obtained from the abbreviated function call dn.
…: Other parameter values for graphics as defined and processed by plot, including xlim, ylim, lwd, lab_cex, color_main, color_lab, sub, color_sub, and color_ticks to specify the color of the ticks used to label the axis values. Values are also passed to density for the general density calculations, where the bandwidth can be set with the standard bw.
The output can optionally be saved into an R object, otherwise it simply appears in the console. Redesigned in lessR version 3.3 to provide two different types of components: the pieces of readable output, and a variety of statistics. The readable output consists of character strings such as tables amenable for reading. The statistics are numerical values amenable for further analysis. The motivation of these types of output is to facilitate R markdown documents, as the name of each piece, preceded by the name of the saved object and a $, can be inserted into the R Markdown document (see examples).
READABLE OUTPUT
out_title: Title of output
out_stats: Statistics
out_file: Name and location of optional R markdown file
STATISTICS
bw: Bandwidth parameter
n: Number of data values analyzed
n.miss: Number of missing data values
W: W statistic for Shapiro-Wilk normality test
pvalue: p-value for W statistic
Although not typically needed, if the output is assigned to an object named, for example, h, then the contents of the object can be viewed directly with the unclass function, here as unclass(h).
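As a rough illustration of how the statistics components might be used, the sketch below builds a stand-in list with the same component names (bw, n, n.miss, W, pvalue) from base R functions only. It does not call lessR: the object out, its class name stand_in, and the way each value is computed are assumptions made for illustration. With an actual saved object, the same $ access and unclass call would apply.

```r
# Stand-in for a saved result such as d <- dn(Y), built with base R only.
# Component names follow the STATISTICS list; how lessR computes each value
# internally is an assumption here.
set.seed(123)
Y <- c(rnorm(49), NA)            # one missing value for illustration
Y_ok <- Y[!is.na(Y)]

sw <- shapiro.test(Y_ok)         # Shapiro-Wilk normality test

out <- structure(
  list(
    bw     = density(Y_ok)$bw,   # bandwidth from the default "nrd0" rule
    n      = length(Y_ok),       # number of data values analyzed
    n.miss = sum(is.na(Y)),      # number of missing data values
    W      = unname(sw$statistic),
    pvalue = sw$p.value
  ),
  class = "stand_in"             # hypothetical class name
)

# Access individual pieces by name, as would be done inline in R Markdown
out$W
out$pvalue

# View the full contents as a plain list
str(unclass(out))
```

In an R Markdown document, a piece such as out$W would typically be placed in an inline R expression so that the value is inserted into the running text.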
OVERVIEW
Results are based on the standard R functions dnorm and density for estimating densities from data, as well as the hist function for calculating a histogram. Colors are provided by default and can also be specified.
The default histogram can be modified with the bin_start and bin_width options. Use the Histogram function in this package for more control over the parameters of the histogram.
The limits for the axes are automatically calculated so as to provide sufficient space for the density curves and histogram, and should generally not require user intervention. Also, the curves are centered over the plot window so that the resulting density curves are symmetric even if the underlying histogram is not. The estimated normal curve is based on the corresponding sample mean and standard deviation.
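The building blocks named above can be combined by hand. The following is a minimal base R sketch, not the lessR implementation: it draws a histogram on the density scale, overlays a kernel density estimate from density, and adds a normal curve from dnorm using the sample mean and standard deviation. The variable names and the axis-limit calculation are illustrative assumptions, and any bandwidth iteration or centering that lessR performs is not reproduced.

```r
set.seed(1)
Y <- rnorm(50)

# Compute the pieces first so the axis limits can cover all of them
h <- hist(Y, plot = FALSE)                 # histogram bins and densities
dens <- density(Y, bw = "nrd0")            # general (kernel) density estimate
x_grid <- seq(min(dens$x), max(dens$x), length.out = 200)
norm_y <- dnorm(x_grid, mean = mean(Y), sd = sd(Y))  # estimated normal curve

# Limits wide enough for the histogram and both curves
y_top <- max(h$density, dens$y, norm_y)
x_rng <- range(h$breaks, dens$x)

# Histogram on the density scale, then the two curves on top
plot(h, freq = FALSE, xlim = x_rng, ylim = c(0, y_top),
     main = "", xlab = "Y", ylab = "Density")
lines(dens, col = "gray20")                # general density
lines(x_grid, norm_y, col = "gray20", lty = 2)  # normal density, dashed
```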
If x.pt is specified, then type is set to "general" and y_axis is set to TRUE.
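To see what the area over a unit interval represents, the sketch below approximates it in base R. It assumes the interval is centered on the chosen point, running from half a unit below to half a unit above it. The names x_pt, f and grid are hypothetical, and the trapezoid rule used here is only one way to approximate the area; it is not taken from the lessR source.

```r
set.seed(1)
Y <- rnorm(200)
dens <- density(Y)

# Interpolate the estimated density so it can be evaluated at any x;
# outside the estimated range the density is treated as 0
f <- approxfun(dens$x, dens$y, yleft = 0, yright = 0)

x_pt <- 1                                   # hypothetical point of interest
grid <- seq(x_pt - 0.5, x_pt + 0.5, length.out = 1001)

# Trapezoid rule over the unit interval around x_pt
heights <- f(grid)
area <- sum((heights[-1] + heights[-length(heights)]) / 2 * diff(grid))
area   # approximate probability of a value within half a unit of x_pt
```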
DATA
The data may either be a vector from the global environment, the user's workspace, as illustrated in the examples below, or one or more variables in a data frame, or a complete data frame. The default input data frame is d. Can specify the source data frame name with the data option. If multiple variables are specified, only the numerical variables in the list of variables are analyzed. The variables in the data frame are referenced directly by their names, that is, no need to invoke the standard R mechanisms of the d$name notation, the with function or the attach function. If the name of the vector in the global environment and of a variable in the input data frame are the same, the vector is analyzed.
The rows parameter subsets rows (cases) of the input data frame according to a logical expression. Use the standard R operators for logical statements as described in Logic such as & for and, | for or and ! for not, and use the standard R relational operators as described in Comparison such as == for logical equality, != for not equals, and > for greater than.
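The kind of logical expression that rows accepts can be tried out in base R first. The sketch below uses the warpbreaks data set that ships with R and evaluates the expression with the with function to select the matching rows. The commented lessR call at the end is a hypothetical usage based on the argument descriptions above and is not run here.

```r
# warpbreaks ships with R in the datasets package
# Keep rows where wool is "A" and tension is not "L"
keep <- with(warpbreaks, wool == "A" & tension != "L")
sub <- warpbreaks[keep, ]
nrow(sub)

# Density estimate for the numeric variable on the selected rows only
dens <- density(sub$breaks)

# Hypothetical equivalent with lessR, per the rows argument (not run):
# Density(breaks, data = warpbreaks, rows = (wool == "A" & tension != "L"))
```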
COLOR THEME
Individual colors in the plot can be manipulated with options such as color_bars for the color of the histogram bars. A color theme for all the colors can be chosen for a specific plot with the colors option with the lessR function style. The default color theme is blue, but a gray scale is available with "gray", and other themes are available as explained in style, such as "red" and "green". Use the option style(sub_theme="black") for a black background and partial transparency of plotted colors.
VARIABLE LABELS
If variable labels exist, then the corresponding variable label is by default listed as the label for the horizontal axis and on the text output. For more information, see Read.
PDF OUTPUT
To obtain pdf output, use the pdf_file option, perhaps with the optional width and height options. These files are written to the default working directory, which can be explicitly specified with the R setwd function.
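For comparison, writing a density plot to a pdf file with base R graphics looks roughly as follows. This is a sketch of the general mechanism, not of what lessR does internally. The file name density_Y.pdf is hypothetical, and a temporary directory is used here instead of the working directory so that the example leaves no files behind.

```r
set.seed(1)
Y <- rnorm(50)

# Hypothetical output file; tempdir() avoids cluttering the working directory
out_file <- file.path(tempdir(), "density_Y.pdf")

# Open a pdf device with the same 4.5 x 4.5 inch default size, plot, then close
pdf(out_file, width = 4.5, height = 4.5)
plot(density(Y), main = "", xlab = "Y")
dev.off()

file.exists(out_file)
```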
ONLY VARIABLES ARE REFERENCED
The referenced variable in a lessR function can only be a variable name (or list of variable names). This referenced variable must exist in either the referenced data frame, such as the default d, or in the user's workspace, more formally called the global environment. That is, expressions cannot be directly evaluated. For example:
> Density(rnorm(50)) # does NOT work
Instead, do the following:
> Y <- rnorm(50) # create vector Y in user workspace
> Density(Y) # directly reference Y
# NOT RUN {
# make sure default style is active
style()
# create data frame, d, to mimic reading data with Read function
# d contains both numeric and non-numeric data
d <- data.frame(rnorm(50), rnorm(50), rnorm(50), rep(c("A","B"),25))
names(d) <- c("X","Y","Z","C")
# normal curve and general density curves superimposed over histogram
# all defaults
Histogram(Y, density=TRUE, type="both")
# specify (non-transparent) colors for the curves,
# to make transparent, need alpha option for the rgb function
Histogram(Y, density=TRUE, color_nrm="darkgreen", color_gen="plum")
# rug with custom color and width of ticks
Histogram(Y, density=TRUE, color_rug="steelblue", size_rug=1)
# display only the general estimated density
# so do not display the estimated normal curve
# specify the bandwidth for the general density curve,
# use the standard bw option for the density function
Histogram(Y, density=TRUE, bw=.6)
# display only the general estimated density and a corresponding
# interval of unit width around x.pt
Histogram(Y, density=TRUE, x.pt=2)
# generate R markdown file to be "knit" such as in RStudio
#dn(Y, Rmd="myout")
# variable of interest is in a data frame which is not the default d
# access the breaks variable in the R provided warpbreaks data set
# although data not attached, access the variable directly by its name
Histogram(breaks, density=TRUE, data=warpbreaks)
# densities for all specified numeric variables in a list of variables
# e.g., use the combine or c function to specify a list of variables
Histogram(c(X,Y), density=TRUE)
# }