nparncpt: Nonparametric estimation of noncentrality parameters

Description

The functions use Gaussian basis functions to estimate the noncentrality parameters (ncp) from a large number of t-statistics.

Usage

nparncpt(tstat, df, ...)
nparncpt.sqp(tstat, df, penalty=3L, lambdas=10^seq(-1,5,by=1), starts,
             IC=c('BIC','CAIC','HQIC','AIC'), K=100,
             bounds=quantile(tstat,c(.01,.99)),
             solver=c('solve.QP','lsei','ipop','LowRankQP'),
             plotit=FALSE, verbose=FALSE, approx.hess=TRUE, ... )

Arguments

tstat

Numeric vector of t-statistics

df

Numeric vector of degrees of freedom

penalty

An integer scalar from 1 through 5, giving the order of the derivative of the estimated ncp density function. The integral of the square of this derivative is the penalty applied to the log likelihood function. A character value among c('1st.deriv','2nd.deriv','3rd.deriv','4th.deriv','5th.deriv') is also accepted but deprecated.

lambdas

Numeric vector of candidate values for the smoothness tuning parameter lambda. The value that minimizes NIC will be chosen.

starts

Optional numeric vector of starting values. If missing, parncpt will be called with zeromean set to FALSE to get an initial estimate of pi0, and the starting values (theta) will be set equal to each other so that they sum to 1-pi0. Note that these starting values are used for the largest lambda only. For smaller lambdas, the estimates from larger lambdas will be used as starting values (i.e., warm start).

IC

Character; one of AIC, BIC, CAIC, HQIC, specifying the factor by which the ENP is multiplied in computing the Information Criterion (IC).

K

The number of basis Gaussian density functions.

bounds

A numeric vector of length 2, giving the approximate bounds where most of the probability of ncp lies.

solver

Character. The name of the function for solving quadratic programming problems. Note that ipop and kernlab are not very reliable. solve.QP is faster but lsei is more stable.

plotit

logical; indicating whether plot.nparncpt should be called after estimation. Plotting is always recommended before accepting the results.

verbose

logical; if TRUE, extensive messages will be printed.

approx.hess

either logical or a number between 0 and 1. This helps reduce the time spent evaluating the hessian matrix. If it is set to TRUE, for the kth Gaussian basis function and the gth tstat, the marginal t-statistic density evaluated at this tstat will be set to zero if it is below the average of all K*length(tstat) such values. If it is set to FALSE or 0, then none of the densities will be treated as zero, no matter how small they are. If it is set to a number between 0 and 1, values below this quantile will be treated as zero. Note that this approximation only affects the computation of the hessian matrix, which does not need to be exact in an optimization routine. Hence, a reasonable sparseness speeds up computation of the hessian matrix but might increase the number of iterations needed to converge. Setting this to TRUE seems a reasonable trade-off between the two effects and usually saves computing time.

…

other parameters passed to dtn.mix, usually the approximation argument.
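As a rough illustration of the arguments above, the following base R sketch reproduces the documented defaults for lambdas and bounds, builds equal starting values that sum to 1-pi0, and applies the approx.hess thresholding rule to a density matrix. The simulated tstat, the value pi0.init, the equally spaced basis means, the helper sparsify, and the use of dnorm as a stand-in for the marginal t-statistic density are all assumptions made for illustration; this is not the package's internal code.

```r
set.seed(1)
# Hypothetical t-statistics standing in for real data
tstat <- rt(2000, df = 8)

# Documented defaults from the Usage section
lambdas <- 10^seq(-1, 5, by = 1)                  # smoothness grid
bounds  <- unname(quantile(tstat, c(.01, .99)))   # approximate range of the ncp
K       <- 100                                    # number of basis functions

# Starting values: all equal, summing to 1 - pi0.
# pi0.init is a made-up value; nparncpt.sqp would obtain it from parncpt.
pi0.init <- 0.8
starts   <- rep((1 - pi0.init) / K, K)

# ASSUMPTION: equally spaced basis means, and dnorm as a stand-in for the
# marginal t-statistic density that the package actually uses.
mus  <- seq(bounds[1], bounds[2], length.out = K)
dens <- outer(tstat, mus, function(t, m) dnorm(t, mean = m, sd = 1))

# Hypothetical helper mimicking the approx.hess rule described above
sparsify <- function(d, approx.hess = TRUE) {
  if (isTRUE(approx.hess)) {
    cutoff <- mean(d)                    # TRUE: threshold at the overall average
  } else if (identical(approx.hess, FALSE) || approx.hess == 0) {
    return(d)                            # FALSE or 0: nothing is zeroed
  } else {
    cutoff <- quantile(d, approx.hess)   # number in (0, 1): threshold at that quantile
  }
  d[d < cutoff] <- 0
  d
}

dens.sparse <- sparsify(dens, TRUE)
mean(dens.sparse == 0)   # fraction of entries treated as zero
```

A sparser matrix makes each hessian evaluation cheaper, which is the trade-off the approx.hess description refers to.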

Value

A list with class attribute c("nparncpt", "ncpest")

pi0

estimated proportion of true nulls

mu.ncp

mean of ncp

sd.ncp

SD of ncp

logLik

an object of class logLik. The associated df is the estimated effective number of parameters (enp). Note that the log likelihood is the penalized log likelihood. See also logLik.ncpest and AIC.

enp

estimated ENP

par

estimated parameters theta

lambda

the lambda that minimizes NIC

gradiant

analytic gradient at the estimate

hessian

analytic hessian at the estimate

beta

estimated mixing proportions for the NCP distribution

IC

the information criterion specified by the user

all.mus

mean of each basis Gaussian density

all.sigs

SD of each basis Gaussian density

data

a list of tstat and df

i.final

the index of lambdas that minimizes NIC

all.pi0s

estimated pi0 for each lambda

all.enps

ENP for each lambda

all.thetas

parameter estimates for each lambda

all.nics

Network information criterion (NIC) for each lambda

all.nic.sd

SD of NIC for each lambda

all.lambdas

the lambdas argument itself

nobs

the number of test statistics
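To make the relationship among some of the returned components concrete, here is a small base R sketch with made-up numbers. The values of all.lambdas, all.nics and all.thetas below are hypothetical rather than output of nparncpt, the one-column-per-lambda layout of all.thetas is assumed, and theta.hat is an illustrative name. Following the description of starts, the sketch assumes that the elements of theta sum to 1-pi0.

```r
# Hypothetical values for a grid of 3 lambdas and K = 4 basis functions
all.lambdas <- c(0.1, 1, 10)
all.nics    <- c(5210.4, 5198.7, 5203.1)
all.thetas  <- cbind(c(.05, .10, .05, .02),
                     c(.04, .08, .06, .02),
                     c(.03, .07, .07, .03))   # one column per lambda (assumed layout)

i.final   <- which.min(all.nics)        # index of the lambda that minimizes NIC
lambda    <- all.lambdas[i.final]       # the selected lambda
theta.hat <- all.thetas[, i.final]      # parameter estimates at that lambda
pi0       <- 1 - sum(theta.hat)         # ASSUMPTION: theta sums to 1 - pi0
all.pi0s  <- 1 - colSums(all.thetas)    # same relation for every lambda
```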


Details

nparncpt is a wrapper for nparncpt.sqp, the latter of which uses a sequential quadratic programming algorithm to find the mixing proportions of the basis Gaussian density functions.
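The idea of describing the ncp distribution through mixing proportions over Gaussian basis densities can be sketched in base R as follows. The basis means, the common SD, the proportions beta and the function ncp.density are hypothetical choices for illustration. The names mu.ncp and sd.ncp simply mirror the Value list; whether the package computes them in exactly this way is an assumption.

```r
# Hypothetical basis: K Gaussian densities with equally spaced means
K    <- 5
mus  <- seq(-4, 4, length.out = K)
sigs <- rep(1, K)
beta <- c(.1, .2, .4, .2, .1)   # mixing proportions, summing to 1

# Mixture density of the ncp built from the basis functions
ncp.density <- function(delta) {
  vapply(delta,
         function(d) sum(beta * dnorm(d, mean = mus, sd = sigs)),
         numeric(1))
}

# Mean and SD of the mixture from the component moments
mu.ncp <- sum(beta * mus)
sd.ncp <- sqrt(sum(beta * (sigs^2 + mus^2)) - mu.ncp^2)

# Sanity check: a proper density integrates to (approximately) 1
total.mass <- integrate(ncp.density, -Inf, Inf)$value
```

In the actual estimator only the mixing proportions are optimized, which is why the problem can be handled by quadratic programming steps.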

References

Qu L, Nettleton D, Dekkers JCM. (2012) Improved Estimation of the Noncentrality Parameter Distribution from a Large Number of t-statistics, with Applications to False Discovery Rate Estimation in Microarray Data Analysis. Biometrics, 68, 1178-1187.

Examples


# NOT RUN {
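library(pi0)  # assumed package providing nparncpt, parncpt, sparncpt and simulatedTstat
data(simulatedTstat)
# Nonparametric fit; the outer parentheses print the result
(npfit <- nparncpt(tstat=simulatedTstat, df=8))
# Parametric fits without and with the zero-mean restriction
(pfit <- parncpt(tstat=simulatedTstat, df=8, zeromean=FALSE)); plot(pfit)
(pfit0 <- parncpt(tstat=simulatedTstat, df=8, zeromean=TRUE)); plot(pfit0)
# Combine the nonparametric and parametric fits
(spfit <- sparncpt(npfit, pfit)); plot(spfit)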
# }
