Discrete mixture of a central t-distribution and a specified number of noncentral t-distributions. This is a slightly modified version of the tMixture code in the OCplus package.
discTMix(tstat, n1 = 10, n2 = n1, nq, p0, p1, D, delta, paired = FALSE,
tbreak, ext = TRUE, threshold.delta=0.75, ...)
tstat: the vector of genewise t-statistics.
n1: the number of samples in the first group.
n2: the number of samples in the second group.
nq: the number of components in the mixture that is fitted.
p0: a starting value for the proportion of non-differentially expressed genes.
p1: a vector of starting values for the proportions of genes that are differentially expressed with effect size D.
D: a vector of starting values for the effect sizes of the differentially expressed genes, corresponding to the proportions p1.
delta: a vector of starting values for the effect sizes of the differentially expressed genes, expressed as noncentrality parameters; this is just a different way of specifying D, though if both are given, delta takes priority.
paired: a logical value indicating whether the t-statistics are two-sample or paired.
tbreak: either the number of equally spaced bins for tabulating tstat, or the explicit break points for the bins, very much like the argument breaks to function cut; the default value is the square root of the number of genes.
ext: a logical value indicating whether to extend the bins, i.e. to set the lowest bin limit to -infinity and the largest bin limit to infinity.
threshold.delta: mixture components with an estimated absolute noncentrality parameter delta below this value are considered too small for independent estimation; these components and their corresponding p1 are pooled with the null component and p0, see Details.
...: additional arguments that are passed to optim to control the optimization.
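Since D and delta describe the same effect on two scales, a small conversion sketch may help when choosing starting values. It assumes the textbook relationship between a standardized effect size and the noncentrality parameter of a t-statistic; the helper name effect_to_ncp is hypothetical and not part of the package, and the scaling used internally by discTMix may differ.

```r
# Hypothetical helper, not part of the package: convert an effect size D
# (in standard deviation units) into a t noncentrality parameter.
# Assumption: textbook scaling; discTMix itself may scale differently.
effect_to_ncp <- function(D, n1, n2 = n1, paired = FALSE) {
  if (paired) {
    # n1 pairs, D in units of the standard deviation of the paired differences
    D * sqrt(n1)
  } else {
    # two independent groups sharing a common standard deviation
    D * sqrt(n1 * n2 / (n1 + n2))
  }
}

# Two-sample design with n1 = n2 = 10 and effects of one standard deviation
effect_to_ncp(c(-1, 1), n1 = 10)
```

Under this assumption, the same D maps to a larger noncentrality parameter as the group sizes grow, which is why starting values given as delta depend on the design.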
A list with class discTMix, with the following components:
the estimated proportion of non-differentially expressed genes, after collapsing components with estimated noncentrality parameters below threshold.delta.
the estimated proportion before collapsing the components.
the estimated proportions of differentially expressed genes corresponding to the effect sizes, relating to p0.raw.
effect sizes of the differentially expressed genes in multiples of the gene-by-gene standard deviation.
effect sizes of the differentially expressed genes expressed as the noncentrality parameter of the corresponding noncentral t-distribution.
the AIC value for the maximum likelihood fit.
the output from optim, giving details about the optimization process.
A list of tstat and df.
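Taken together, the estimated proportions and noncentrality parameters describe a mixture density for the t-statistics. The following is a minimal sketch of that density using base R's dt; the function name mix_density and all parameter values are hypothetical, and the code is not taken from the package.

```r
# Hypothetical sketch: density of a mixture of one central and several
# noncentral t-distributions with common degrees of freedom df.
mix_density <- function(t, df, p0, p1, delta) {
  dens <- p0 * dt(t, df = df)                      # central (null) component
  for (k in seq_along(p1)) {
    # noncentral component k with noncentrality parameter delta[k]
    dens <- dens + p1[k] * dt(t, df = df, ncp = delta[k])
  }
  dens
}

# Made-up values: 80% null genes, 10% each shifted down and up
mix_density(c(-2, 0, 2), df = 18, p0 = 0.8, p1 = c(0.1, 0.1), delta = c(-2, 2))
```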
The only parameter that must be specified is nq; if nothing else is given, the proportions are distributed equally between p0 and the p1, and the noncentrality parameters are set up symmetrically around zero, e.g. nq=5 leads to equal proportions of 0.2 and noncentrality parameters -2, -1, 1, and 2. If any of p1, D, or delta is specified, nq is redundant and will be ignored (with a warning). discTMix will in general make a valiant effort to deduce valid starting values from any combination of nq, p0, p1, D, and delta specified by the user, and will complain if that is not possible.
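The default starting values described above can be sketched as follows. The helper default_start is hypothetical and only mirrors the nq=5 example; it is restricted to odd nq as an assumption, since the text does not say how an even number of components is handled, and it is not the package's internal code.

```r
# Hypothetical helper, not the package's internal code: default starting
# values for an odd number of components nq, following the nq = 5 example.
default_start <- function(nq) {
  stopifnot(nq >= 3, nq %% 2 == 1)   # assumption: odd nq only
  k <- (nq - 1) / 2                  # noncentral components on each side of zero
  list(
    p0    = 1 / nq,                  # starting proportion of the null component
    p1    = rep(1 / nq, nq - 1),     # one proportion per noncentral component
    delta = c(-(k:1), 1:k)           # symmetric around zero, zero itself excluded
  )
}

str(default_start(5))
```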
The fitting problem that this function tries to solve is badly conditioned, and the solution will in general depend on the precise set of starting values. Multiple runs from different starting values are usually a good idea. We have found, however, that the model seems fairly robust to misspecification of the number of components, at least when estimating p0. When too many components are specified, some of the nominally noncentral t-distributions describing the behaviour of differentially expressed genes are fitted with noncentrality parameters very close to zero, and the true p0 gets spread out between the nominal p0 and the almost-central components. Adding up these different contributions usually gives a similar solution to re-fitting the model with fewer components. The cutoff for the size of noncentrality parameters that can be estimated realistically is specified via threshold.delta, whose default value is based on a small simulation study reported in Pawitan et al. (2005); see Examples. (Note that the AIC can also be helpful in determining the number of components.)
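The pooling of almost-central components into the null proportion can be illustrated with a few lines of R. This is a sketch only: pool_small is a hypothetical helper, and the raw estimates below are made-up numbers, not output from a real fit.

```r
# Hypothetical sketch of the pooling step: proportions of components whose
# absolute noncentrality parameter is below threshold.delta are added to
# the raw null proportion.
pool_small <- function(p0_raw, p1, delta, threshold.delta = 0.75) {
  small <- abs(delta) < threshold.delta   # components too small to estimate
  p0_raw + sum(p1[small])
}

# Made-up raw estimates from an over-specified five-component fit
p0_raw <- 0.60
p1     <- c(0.10, 0.15, 0.05, 0.10)
delta  <- c(-2.5, -0.3, 0.4, 3.0)

pool_small(p0_raw, p1, delta)   # 0.8
```

Here the two components with noncentrality parameters -0.3 and 0.4 fall below the default threshold of 0.75, so their proportions are counted towards the null component.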
Pawitan Y, Krishna Murthy KR, Michiels S, Ploner A (2005). Bias in the estimation of false discovery rate in microarray studies. Bioinformatics.