- x
a vector of numeric data, or a data frame (for jitter2 or ecdfpM)
- object
a data frame or list (even with unequal number of observations per
variable, as long as group is not specified)
- side
axis side to use (1=bottom (default for histSpike), 2=left,
3=top (default for scat1d), 4=right)
- frac
fraction of smaller of vertical and horizontal axes for tick mark
lengths. Can be negative to move tick marks outside of plot. For
histSpike, this is the relative y-direction length to be used for the
largest frequency. When scat1d calls histSpike, it multiplies its frac
argument by 2.5. For histSpikeg, frac is a function of f, the vector of
all frequencies. The default function scales tick marks so that they
are between 0.01 and 0.03 of the y range, linearly scaled in the square
root of the frequency less one.
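As a rough illustration of the histSpikeg scaling described above, the following base R sketch defines one function that satisfies the stated properties (values between 0.01 and 0.03, linear in the square root of the frequency less one). The name frac_sketch is hypothetical, and the package's actual default may be written differently.

```r
# Hypothetical function matching the description: 0.01 at frequency 1,
# 0.03 at the largest frequency, linear in sqrt(f - 1) in between.
frac_sketch <- function(f) 0.01 + 0.02 * sqrt(f - 1) / sqrt(max(f, 2) - 1)

f <- c(1, 5, 10)          # illustrative bin frequencies
tick_frac <- frac_sketch(f)
print(tick_frac)
```

A function of this shape could be supplied as the frac argument when a different range of tick lengths is wanted.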
- jitfrac
fraction of axis for jittering. If
\(\code{jitfrac} \le 0\), no
jittering is done. If preserve=TRUE, the amount of
jittering is independent of jitfrac.
- tfrac
Fraction of tick mark to actually draw. If \(\code{tfrac}<1\),
a random fraction tfrac of the line segment is drawn at
each point. This is useful for very large samples or ones with some
very dense points. The default value is 1 if the number of
non-missing observations n is less than 125, and
\(\max{(.1, 125/n)}\) otherwise.
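The default rule for tfrac stated above can be written out as a small base R helper. The function name default_tfrac is hypothetical and only restates the rule. It is not taken from the package source.

```r
# Hypothetical helper restating the documented default for tfrac:
# 1 when n < 125, otherwise max(0.1, 125 / n).
default_tfrac <- function(n) if (n < 125) 1 else max(0.1, 125 / n)

small  <- default_tfrac(100)    # fewer than 125 observations
medium <- default_tfrac(500)    # 125 / 500
large  <- default_tfrac(5000)   # 125 / 5000 falls below the 0.1 floor
print(c(small, medium, large))
```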
- eps
fraction of axis for determining overlapping points in x. For
preserve=TRUE the default is 0 and original unique values are
retained; bigger values of eps tend to bias observations from dense
to sparse regions, but ranks are still preserved.
- lwd
line width for tick marks, passed to segments
- col
color for tick marks, passed to segments
- y
specify a vector the same length as x to draw tick marks
along a curve instead of by one of the axes. The y values
are often predicted values from a model. The side argument
is ignored when y is given. If the curve is already
represented as a table look-up, you may specify it using the
curve argument instead. y may be a scalar to use a
constant vertical placement.
- curve
a list containing elements x and y for which linear
interpolation is used to derive y values corresponding to
values of x. This results in tick marks being drawn along
the curve. For histSpike, interpolated y values are
derived for bin midpoints.
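The linear interpolation that curve implies can be pictured with base R's approx. This is only a sketch of the idea with made-up data. Whether the package uses approx internally is an assumption.

```r
# Illustrative curve stored as a table look-up (made-up values).
curve <- list(x = c(0, 5, 10), y = c(0, 10, 0))

# Data values at which tick marks would be placed.
x <- c(1, 2.5, 7.5)

# Linearly interpolate the curve's y values at the data's x values.
y_on_curve <- approx(curve$x, curve$y, xout = x)$y
print(y_on_curve)
```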
- minimal
for histSpike set minimal=TRUE to draw a
minimalist spike histogram with no y-axis. This works best when
producing graphics images that are short, e.g., have a height of
two inches. add is forced to be FALSE in this case
so that a standalone graph is produced. Only base graphics are
used.
- bottom.align
set to TRUE to have the bottoms of tick marks (for
side=1 or side=3) aligned at the y-coordinate. The
default behavior is to center the tick marks. For
datadensity.data.frame, bottom.align defaults to
TRUE if nint>1. In other words, if you are only
labeling the first and last axis tick mark, the scat1d tick
marks are centered on the variable's axis.
- preserve
set to TRUE to invoke jitter2
- fill
maximum fraction of the axis filled by jittered values. If d
are duplicated values between a lower value l and upper value
u, then d will be spread within
\(\pm \code{fill}*\min{(u-d,d-l)}/2\).
- limit
specifies a limit for maximum shift in jittered values. Duplicate
values will be spread within
\(\pm\code{fill}*\min{(u-d,d-l)}/2\). The
default TRUE restricts jittering to the smallest
\(\min{(u-d,d-l)}/2\) observed and results
in an equal amount of jittering for all d. Setting to
FALSE allows for locally different amounts of jittering, using
maximum space available.
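The arithmetic behind fill and limit can be traced by hand. The base R sketch below computes the spread half-widths from the formula above for made-up data in which the duplicated values are interior, so that both neighbours l and u exist. It is not jitter2's implementation, and the value of fill is chosen only for illustration.

```r
x <- c(0, 1, 1, 5, 8, 8, 12)   # made-up data; 1 and 8 are duplicated
fill <- 1/3                    # illustrative value

ux <- sort(unique(x))                                # 0 1 5 8 12
counts <- as.vector(table(factor(x, levels = ux)))   # 1 2 1 2 1
i <- which(counts > 1)                               # positions of duplicated values
d <- ux[i]        # duplicated values
l <- ux[i - 1]    # nearest lower neighbours
u <- ux[i + 1]    # nearest upper neighbours

half_gap <- pmin(u - d, d - l) / 2                   # min(u-d, d-l)/2 per duplicate
spread_local  <- fill * half_gap                     # analogue of limit=FALSE
spread_global <- fill * rep(min(half_gap), length(d))  # analogue of limit=TRUE
print(rbind(spread_local, spread_global))
```

With these data, the locally determined spread is wider around 8 than around 1, while the limited version uses the smaller spread for both.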
- nhistSpike
If the number of observations exceeds or equals nhistSpike,
scat1d will automatically call histSpike to draw the
data density, to prevent the graphics file from being too large.
- type
used by or passed to histSpike. Set to "count" to
display frequency counts rather than relative frequencies, or
"density" to display a kernel density estimate computed using
the density function.
- grid
set to TRUE if the R grid package is in effect for
the current plot
- nint
number of intervals to divide each continuous variable's axis for
datadensity. For histSpike, it is the number of
equal-width intervals into which to bin x, and if instead
nint is a character string (e.g., nint="all"), the
frequency tabulation is done with no binning. In other words,
frequencies for all unique values of x are derived and
plotted. For histSpikeg, if x has no more than
nint unique values, all observed values are used; otherwise
the data are rounded before tabulation so that there are no more
than nint intervals. For histSpike, nint is
ignored if bins is given.
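The difference between binning into nint equal-width intervals and tabulating every unique value can be seen in a small base R sketch. The break points here span the observed range of made-up data. The exact bin edges histSpike uses may differ, so treat this as an illustration of the idea only.

```r
x <- c(1, 1, 2, 3, 3, 3, 8, 10)   # made-up data
nint <- 3

# Equal-width binning: nint intervals across the range of x.
breaks <- seq(min(x), max(x), length.out = nint + 1)   # 1 4 7 10
binned <- table(cut(x, breaks, include.lowest = TRUE))

# No binning: one frequency per unique value of x.
all_values <- table(x)

print(binned)
print(all_values)
```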
- bins
for histSpike specifies the actual cutpoints to use
for binning x. The default is to use nint in
conjunction with xlim.
- ...
optional arguments passed to scat1d from datadensity
or to histSpike from scat1d. For histSpikep, they
are passed to the lines list to add_trace. For
ecdfpM, these arguments are passed to add_lines.
- presorted
set to TRUE to prevent sorting for determining the order
\(l<d<u\). This is useful if an existing
meaningful local order would be destroyed by sorting, as in
\(\sin{(\pi*\code{sort}(\code{round}(\code{runif}(1000,0,10),1)))}\).
- group
an optional stratification variable, which is converted to a
factor vector if it is not one already
- which
set which="continuous" to only plot continuous variables, or
which="categorical" to only plot categorical, character, or
discrete numeric ones. By default, all types of variables are
depicted.
- method.cat
set method.cat="freq" to depict frequencies of categorical
variables with digits representing the cell frequencies, with size
proportional to the square root of the frequency. By default,
vertical bars are used.
- col.group
colors representing the group strata. The vector of colors
is recycled to match the number of levels of group.
- n.unique
number of unique values a numeric variable must have before it is
considered to be a continuous variable
- show.na
set to FALSE to suppress drawing the number of NAs to
the right of each axis
- naxes
number of axes to draw on each page before starting a new plot. You
can set naxes larger than the number of variables in the data
frame if you want to compress the plot vertically.
- q
a vector of quantiles to display. By default, quantiles are not
shown.
- extra
a two-vector specifying the fraction of the x
range to add on the left and the fraction to add on the right
- cex.axis
character size for drawing labels for axis tick marks
- cex.var
character size for variable names and frequency of NAs
- lmgp
spacing between numeric axis labels and axis (see par for mgp)
- tck
see tck under par
- ranges
a list containing ranges for some or all of the numeric variables.
If ranges is not given or if a certain variable is not found
in the list, the empirical range, modified by pretty, is
used. Example:
ranges=list(age=c(10,100), pressure=c(50,150)).
- labels
a vector of labels to use in labeling the axes for
datadensity.data.frame. Default is to use the names of the
variables in the input data frame. Note: margin widths computed for
setting aside names of variables use the names, and not these
labels.
- minf
For histSpike, if minf is specified low bin
frequencies are set to a minimum value of minf times the
maximum bin frequency, so that rare data points will remain visible.
A good choice of minf is 0.075.
datadensity.data.frame passes minf=0.075 to
scat1d to pass to histSpike. Note that specifying
minf will cause the shape of the histogram to be distorted
somewhat.
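The effect of minf on bin frequencies amounts to a floor at minf times the largest frequency. The base R sketch below applies that floor to made-up frequencies. It only restates the description and is not taken from histSpike's source.

```r
freq <- c(1, 2, 40, 100)   # made-up bin frequencies
minf <- 0.075              # the value suggested in the text

# Raise low frequencies to at least minf times the maximum frequency.
floored <- pmax(freq, minf * max(freq))
print(floored)
```

The two rare bins are raised to 7.5 while the larger bins are unchanged, which is the mild distortion of shape the text warns about.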
- mult.width
multiplier for the smoothing window width computed by
histSpike when type="density"
- xlim
a 2-vector specifying the outer limits of x for binning (and
plotting, if add=FALSE and nint is a number). For
histSpikeg, observations outside the xlim range are ignored.
- ylim
y-axis range for plotting (if add=FALSE). Often needed for
histSpikeg to help scale the tick mark line segments.
- xlab
x-axis label (add=FALSE or for ecdfpM); default is the
name of the input argument, or for ecdfpM it comes from the
label and units attributes of the analysis
variable. For ecdfpM, xlab may be a vector if there
is more than one analysis variable.
- ylab
y-axis label (add=FALSE or for ecdfpM)
- add
set to TRUE to add the spike-histogram to an existing plot,
to show marginal data densities
- formula
a formula of the form y ~ x1 or y ~ x1 + ... where
y is the name of the y-axis variable being plotted
with ggplot, x1 is the name of the x-axis
variable, and optional ... are variables used by
ggplot to produce multiple curves on a panel and/or facets.
- predictions
the data frame being plotted by ggplot, containing x
and y coordinates of curves. If omitted, spike histograms
are drawn at the bottom (default) or top of the plot according to
side.
- data
for histSpikeg is a mandatory data frame containing raw data whose
frequency distribution is to be summarized, using variables in
formula.
- plotly
an existing plotly object. If not NULL,
histSpikeg uses plotly instead of ggplot.
- lowess
set to TRUE to have histSpikeg add a geom_line
layer to the ggplot2 graphic, containing
lowess() nonparametric smoothers. This causes the
returned value of histSpikeg to be a list with two
components: "hist" and "lowess" each containing
a layer. Fortunately, ggplot2 plots both layers
automatically. If the dependent variable is binary,
iter=0 is passed to lowess so that outlier
detection is turned off; otherwise iter=3 is passed.
- span
passed to lowess as the f argument
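How span and the binary-response rule translate into a call to base R's lowess can be sketched as follows. The data are simulated, and the check for a binary response is a hypothetical stand-in for whatever test histSpikeg performs internally.

```r
set.seed(1)
x <- runif(200)
y <- rbinom(200, 1, plogis(2 * x - 1))   # simulated binary response

# Hypothetical check for a binary dependent variable.
is_binary <- length(unique(y)) == 2

# iter=0 turns off lowess's robustness iterations for binary y;
# otherwise the usual iter=3 is used.
iter <- if (is_binary) 0 else 3
span <- 2/3                              # illustrative smoothing span

sm <- lowess(x, y, f = span, iter = iter)
str(sm)
```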
- histcol
color of line segments (tick marks) for
histSpikeg. Default is black. Set to any color or to
"default" to use the prevailing colors for the
graphic.
- showlegend
set to FALSE to have the added plotly
traces not have entries in the plot legend
- what
set to "1-F" to plot 1 minus the ECDF instead of the
ECDF, "f" to plot cumulative frequency, or "1-f" to
plot the inverse cumulative frequency
- height,width
passed to plot_ly
- colors
a vector of colors to pass to add_lines
- nrows,ncols
passed to plotly::subplot