For any number of cross-classification variables, bystats returns a matrix with the sample size, the number of missing y, and fun(non-missing y), with the cross-classifications designated by rows. It uses Harrell's modification of the interaction function to produce cross-classifications. The default fun is mean, and if y is binary, the mean is labeled as Fraction. There is a print method as well as a latex method for objects created by bystats.
bystats2 handles the special case in which there are 2 classification variables, and places the first one in rows and the second in columns. The print method for bystats2 uses the print.char.matrix function to organize statistics for cells into boxes.
bystats(y, ..., fun, nmiss, subset)
# S3 method for bystats
print(x, ...)
# S3 method for bystats
latex(object, title, caption, rowlabel, ...)
bystats2(y, v, h, fun, nmiss, subset)
# S3 method for bystats2
print(x, abbreviate.dimnames=FALSE,
prefix.width=max(nchar(dimnames(x)[[1]])), ...)
# S3 method for bystats2
latex(object, title, caption, rowlabel, ...)
For bystats, a matrix with row names equal to the classification labels and column names N, Missing, funlab, where funlab is determined from fun. A row is added to the end with the summary statistics computed on all observations combined. The class of this matrix is bystats.
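To make the shape of this matrix concrete, here is a minimal base R sketch that builds a similar table by hand. It is not the Hmisc implementation: it uses the stock interaction function rather than Harrell's modification, the toy vectors y, sex, and race and the helper summarize are hypothetical, and it assumes N counts the non-missing values of y.

```r
# Hypothetical toy data; none of these names come from Hmisc.
y    <- c(1, 0, NA, 1, 1, 0, NA, 1)
sex  <- c("f", "f", "f", "m", "m", "m", "f", "m")
race <- c("a", "b", "a", "a", "b", "b", "b", "a")

# Cross-classify with base interaction() (bystats uses a modified version).
cell <- interaction(sex, race, drop = TRUE)

# Assumption: N is the number of non-missing y values in the cell.
summarize <- function(v) {
  c(N = sum(!is.na(v)), Missing = sum(is.na(v)), Mean = mean(v, na.rm = TRUE))
}

# One row per cross-classification, then a final row for all observations.
by_cell <- t(sapply(split(y, cell), summarize))
result  <- rbind(by_cell, ALL = summarize(y))
print(result)
```

The last row plays the role of the summary row that bystats appends for all observations combined.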
For bystats2, a 3-dimensional array with the last dimension corresponding to the statistics being computed. The class of the array is bystats2.
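The layout of such an array can be sketched in base R as follows. This is only an illustration of the rows by columns by statistics structure, not the bystats2 code itself: the toy data and the summarize helper are hypothetical, and the sketch does not attempt to reproduce any marginal summaries the real function may add.

```r
# Hypothetical toy data: v is the vertical (row) variable, h the horizontal one.
y <- c(1, 0, NA, 1, 1, 0, NA, 1)
v <- factor(c("f", "f", "f", "m", "m", "m", "f", "m"))
h <- factor(c("a", "b", "a", "a", "b", "b", "b", "a"))

# Assumption: N is the number of non-missing y values in the cell.
summarize <- function(u) {
  c(N = sum(!is.na(u)), Missing = sum(is.na(u)), Mean = mean(u, na.rm = TRUE))
}

# Rows index v, columns index h, and the last dimension indexes the statistics.
out <- array(NA_real_,
             dim = c(nlevels(v), nlevels(h), 3),
             dimnames = list(levels(v), levels(h), c("N", "Missing", "Mean")))
for (i in levels(v)) {
  for (j in levels(h)) {
    out[i, j, ] <- summarize(y[v == i & h == j])
  }
}

print(dim(out))
print(out[, , "Mean"])
```

Indexing the last dimension by name, as in out[, , "Mean"], pulls out one statistic as an ordinary two-way table.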
y: a binary, logical, or continuous variable or a matrix or data frame of such variables. If y is a data frame it is converted to a matrix. If y is a data frame or matrix, computations are done on subsets of the rows of y, and you should specify fun so as to be able to operate on the matrix. For matrix y, any column with a missing value causes the entire row to be considered missing, and the row is not passed to fun.
...: For bystats, one or more classification variables separated by commas. For print.bystats, options passed to print.default such as digits. For latex.bystats and latex.bystats2, options passed to latex.default such as digits. If you pass cdec to latex.default, keep in mind that the first one or two positions (depending on nmiss) should have zeros since these correspond to frequency counts.
v: vertical variable for bystats2. Will be converted to factor.
h: horizontal variable for bystats2. Will be converted to factor.
fun: a function to compute on the non-missing y for a given subset. You must specify fun= in front of the function name or definition. fun may return a single number or a vector or matrix of any length. Matrix results are rolled out into a vector, with names preserved. When y is a matrix, a common fun is function(y) apply(y, 2, ff) where ff is the name of a function which operates on one column of y.
nmiss: A column containing a count of missing values is included if nmiss=TRUE or if there is at least one missing value.
subset: a vector of subscripts or logical values indicating the subset of data to analyze
abbreviate.dimnames: set to TRUE to abbreviate dimnames in output
prefix.width: see print.char.matrix
title: title to pass to latex.default. Default is the first word of the character string version of the first calling argument.
caption: caption to pass to latex.default. Default is the heading attribute from the object produced by bystats.
rowlabel: rowlabel to pass to latex.default. Default is the byvarnames attribute from the object produced by bystats. For bystats2 the default is "".
x: an object created by bystats or bystats2
object: an object created by bystats or bystats2
latex produces a .tex file.
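The handling of a matrix y described in the arguments (rows with any missing value are dropped, then fun is applied to the remaining rows of each subset) can be sketched in base R like this. It is a hand-rolled illustration under stated assumptions, not Hmisc's code: the data, the grouping vector group, and the variable names are hypothetical.

```r
# Hypothetical two-column response and a single classification variable.
y <- cbind(pressure    = c(120, 130, NA, 110, 140, 125),
           cholesterol = c(200, NA, 180, 190, 210, 205))
group <- c("a", "a", "a", "b", "b", "b")

# A row with a missing value in any column is treated as missing overall.
complete <- !apply(is.na(y), 1, any)

# The common pattern from the fun argument: apply a one-column function
# (here median) to every column of the matrix subset.
fun <- function(y) apply(y, 2, median)

# Apply fun to the complete rows of each group; drop = FALSE keeps a
# one-row subset as a matrix so apply() still works.
stats <- t(sapply(split(which(complete), group[complete]),
                  function(rows) fun(y[rows, , drop = FALSE])))
print(stats)
```

In this toy data only the first row of group a is complete, so its medians are just that row's values; this is why a single missing column can change the effective sample size for every statistic in the row.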
Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com
interaction, cut, cut2, latex, print.char.matrix, translate
if (FALSE) {
  bystats(sex==2, county, city)
  bystats(death, race)
  bystats(death, cut2(age,g=5), race)
  bystats(cholesterol, cut2(age,g=4), sex, fun=median)
  bystats(cholesterol, sex, fun=quantile)
  bystats(cholesterol, sex, fun=function(x)c(Mean=mean(x),Median=median(x)))
  latex(bystats(death,race,nmiss=FALSE,subset=sex=="female"), digits=2)

  f <- function(y) c(Hazard=sum(y[,2])/sum(y[,1]))
  # f() gets the hazard estimate for right-censored data from exponential dist.
  bystats(cbind(d.time, death), race, sex, fun=f)

  bystats(cbind(pressure, cholesterol), age.decile,
          fun=function(y) c(Median.pressure   =median(y[,1]),
                            Median.cholesterol=median(y[,2])))

  y <- cbind(pressure, cholesterol)
  bystats(y, age.decile,
          fun=function(y) apply(y, 2, median))   # same result as last one
  bystats(y, age.decile, fun=function(y) apply(y, 2, quantile, c(.25,.75)))
  # The last one computes separately the 0.25 and 0.75 quantiles of 2 vars.

  latex(bystats2(death, race, sex, fun=table))
}