Usage
"trend.ebam"(data, cl, catt = TRUE, approx = TRUE, n.interval = NULL, df.dens = NULL, knots.mode = NULL, type.nclass = "wand", B = 100, B.more = 0.1, B.max = 50000, n.subset = 10, fast = FALSE, df.ratio = 3, rand = NA, ...)
"trend.ebam"(data, cl, catt = TRUE, approx = TRUE, n.interval = NULL, df.dens = NULL, knots.mode = NULL, type.nclass = "wand", ...)
Arguments
data
either a numeric matrix or data frame, or a list. If a matrix or data frame, then each row
must correspond to a variable (e.g., a SNP), and each column to a sample (i.e., an observation).
The values in the matrix or data frame are interpreted as the scores for the different levels
of the variables.
If the number of observations is huge, it is better to specify data as a list consisting
of matrices, where each matrix represents one group and summarizes
how many observations in this group show which level at which variable. The row and column names
of all matrices must be identical and in the same order. The column names must be interpretable
as numeric scores for the different levels of the variables. These matrices can, e.g.,
be generated using the function rowTables from the package scrime. (It is recommended
to use this function, as trend.stat was written to work with the output of rowTables.)
For details on how to specify this list, see the examples section on this man page, and the help for
rowChisqMultiClass in the package scrime.
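The list format described above can be sketched in base R. This is only an illustration of the expected shape (one count matrix per group, identical row and column names, column names readable as scores); the helper count_levels and the toy data are hypothetical and are not part of siggenes or scrime, where rowTables is the recommended way to build these tables.

```r
# Toy genotype matrix: 10 SNPs (rows) x 20 samples (columns), scores 1, 2, 3.
set.seed(1)
snps <- matrix(sample(1:3, 10 * 20, replace = TRUE), nrow = 10,
               dimnames = list(paste0("SNP", 1:10), NULL))
cl <- rep(1:2, each = 10)  # class score of each sample

# Hypothetical helper: for each row, count how often each level occurs.
count_levels <- function(x, levels = 1:3) {
  tab <- t(apply(x, 1, function(row) table(factor(row, levels = levels))))
  colnames(tab) <- levels  # column names must be interpretable as scores
  tab
}

# One count matrix per group, all with the same row and column names.
data_list <- lapply(split(seq_len(ncol(snps)), cl),
                    function(idx) count_levels(snps[, idx, drop = FALSE]))
str(data_list)
```

Each matrix has one row per SNP and one column per level, and every row sums to the number of samples in that group.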
cl
a numeric vector of length ncol(data) indicating to which classes
the samples in the matrix or data frame data belong. The values in cl must be interpretable
as scores for the different classes. Must be specified if data is a matrix or a data frame,
whereas cl can, but need not, be specified if data is a list. If specified in the latter case,
length(cl) must equal length(data), i.e., one score for each of the matrices, and thus for each of
the groups. If not specified, cl will be set to the integers between 1 and $c$, where $c$
is the number of classes/matrices.
catt
should the Cochran-Armitage trend statistic be computed in the two-class case? If FALSE,
the trend statistic described on page 87 of Agresti (2002) is computed instead, which differs by the factor
$(n - 1) / n$ from the Cochran-Armitage trend statistic.
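The relation between the two statistics can be checked numerically. The sketch below assumes the common correlation form of both statistics, $n r^2$ for the Cochran-Armitage statistic and $(n - 1) r^2$ for the statistic in Agresti (2002), where $r$ is the Pearson correlation between the variable scores and the class scores. The data are made up, and this is not the code used inside trend.ebam.

```r
# Hypothetical toy data: scores of one SNP for 12 samples and their class scores.
x  <- c(1, 1, 2, 1, 2, 3, 2, 3, 3, 2, 3, 3)
cl <- rep(1:2, each = 6)

n <- length(x)
r <- cor(x, cl)            # Pearson correlation between SNP and class scores

catt    <- n * r^2         # Cochran-Armitage trend statistic (assumed form)
agresti <- (n - 1) * r^2   # trend statistic of Agresti (2002), p. 87

# The ratio of the two statistics is (n - 1) / n.
c(catt = catt, agresti = agresti, ratio = agresti / catt)
```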
approx
should the null distribution be approximated by the $\chi^2$-distribution
with one degree of freedom? If FALSE, a permutation method is used to estimate the null distribution.
If data is a list, approx must currently be TRUE.
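To make the two options concrete, the following sketch contrasts a $\chi^2$ approximation with a plain label-permutation null for a few simulated SNPs. It is a simplified illustration only: trend_stat is a hypothetical helper using the correlation form $n r^2$ of the trend statistic, and the actual EBAM analysis goes further by estimating densities from the observed and permuted values.

```r
set.seed(42)
n_snps <- 5
n <- 40
x  <- matrix(sample(1:3, n_snps * n, replace = TRUE), nrow = n_snps)
cl <- rep(1:2, each = n / 2)

# Hypothetical helper: trend statistic n * r^2 for every row of x.
trend_stat <- function(x, cl) {
  r <- apply(x, 1, cor, y = cl)
  length(cl) * r^2
}

obs <- trend_stat(x, cl)

# approx = TRUE analogue: chi-square distribution with one degree of freedom.
p_chisq <- pchisq(obs, df = 1, lower.tail = FALSE)

# approx = FALSE analogue: permute the class labels B times.
B <- 200
perm <- replicate(B, trend_stat(x, sample(cl)))  # n_snps x B matrix
p_perm <- rowMeans(perm >= obs)

round(cbind(obs, p_chisq, p_perm), 3)
```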
n.interval
the number of intervals used in the logistic regression with
repeated observations for estimating the ratio $f_0/f$ (if approx = FALSE), or in the Poisson regression used to estimate
the density of the observed $z$-values (if approx = TRUE).
If NULL, n.interval is set to 139 if approx = FALSE,
and estimated by the method specified by type.nclass if approx = TRUE.
df.dens
integer specifying the degrees of freedom of the natural cubic
spline used in the Poisson regression to estimate the density of the observed
$z$-values. Ignored if approx = FALSE.
If NULL, df.dens is set to 3 if the degrees of freedom
of the approximated null distribution, i.e., the $\chi^2$-distribution,
are less than or equal to 2, and otherwise df.dens is set to 5.
knots.mode
if TRUE, the df.dens - 1 knots are centered around the
mode and not the median of the density when fitting the Poisson regression model.
Ignored if approx = FALSE.
If not specified, knots.mode is set to TRUE
if the degrees of freedom of the approximated null distribution, i.e.,
the $\chi^2$-distribution, are larger than or equal to 3, and otherwise
knots.mode is set to FALSE. For details on this density estimation, see denspr.
type.nclass
character string specifying the procedure used to compute the
number of cells of the histogram. Ignored if approx = FALSE or n.interval is specified.
Can be either "wand" (default), "scott", or "FD". For details, see denspr.
B
the number of permutations used in the estimation of the null distribution,
and hence, in the computation of the expected $z$-values.
B.more
a numeric value. If the number of all possible permutations is smaller
than or equal to (1 + B.more) * B, full permutation will be done.
Otherwise, B permutations are used.
B.max
a numeric value. If the number of all possible permutations is smaller
than or equal to B.max, B randomly selected permutations will be used
in the computation of the null distribution. Otherwise, B random draws
of the group labels are used.
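The interplay of B, B.more, and B.max can be summarized as a small decision rule. The function below is a hypothetical sketch of that rule as described here, not code from siggenes; counting the possible permutations of two equally sized classes with a binomial coefficient is likewise an assumption made only for this illustration.

```r
# Hypothetical helper mirroring the documented rule; defaults follow the Usage section.
choose_permutation_scheme <- function(n_possible, B = 100, B.more = 0.1,
                                      B.max = 50000) {
  if (n_possible <= (1 + B.more) * B) {
    "full permutation"
  } else if (n_possible <= B.max) {
    "B permutations selected from all possible permutations"
  } else {
    "B random draws of the group labels"
  }
}

# Assumed count of distinct label assignments for two classes of equal size.
small  <- choose(6, 3)    # 20
medium <- choose(12, 6)   # 924
large  <- choose(40, 20)  # far larger than B.max

sapply(c(small = small, medium = medium, large = large),
       choose_permutation_scheme)
```

With the default settings, full permutation is therefore only used when at most 110 distinct permutations exist.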
n.subset
a numeric value indicating into how many subsets the B
permutations are divided when computing the permuted $z$-values. Please note
that the meaning of n.subset differs between the SAM and the EBAM functions.
fast
if FALSE, the exact number of permuted test scores that are
more extreme than a particular observed test score is computed for each of
the variables/SNPs. If TRUE, a crude estimate of this number is used.
df.ratio
integer specifying the degrees of freedom of the natural cubic
spline used in the logistic regression with repeated observations. Ignored
if approx = TRUE.
rand
numeric value. If specified, i.e., not NA, the random number generator
will be set to a reproducible state.