el.trun.test: Empirical likelihood ratio for mean with left truncated data

Description

This program uses the EM algorithm to compute the maximized (with respect to $p_i$) empirical log likelihood function for left truncated data under the mean constraint: $$ \sum p_i f(x_i) = \int f(t) dF(t) = \mu ~. $$ Here $p_i = \Delta F(x_i)$ is the probability mass placed at $x_i$, and $\mu$ is a given constant. It also returns the constrained $p_i$ and the unconstrained $p_i$, that is, the Lynden-Bell estimator.

The log likelihood being maximized is $$ \sum_{i=1}^n \log \frac{\Delta F(x_i)}{1-F(y_i)} .$$
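The unconstrained maximizer of this likelihood and the likelihood value itself can be sketched in base R. The helpers `lynden_bell_jumps()` and `trunc_loglik()` below are hypothetical names written for illustration; they are not part of the package and need not match its internal code. The sketch assumes strict truncation (x > y), takes the risk set at time t to be the observations with y < t <= x, and assumes the risk set never empties before the largest x.

```r
# Illustrative sketch only: a plain product-limit (Lynden-Bell) computation,
# not the package's implementation.
lynden_bell_jumps <- function(y, x) {
  stopifnot(length(y) == length(x), all(x > y))
  tx <- sort(unique(x))                               # distinct observed times
  d  <- sapply(tx, function(t) sum(x == t))           # events at each time
  r  <- sapply(tx, function(t) sum(y < t & x >= t))   # risk set sizes
  surv <- cumprod(1 - d / r)                          # S(t) just after each time
  jump <- c(1, surv[-length(surv)]) - surv            # S(t-) - S(t)
  list(times = tx, jumps = jump)
}

# Log likelihood sum_i log( Delta F(x_i) / (1 - F(y_i)) ) for given jumps.
trunc_loglik <- function(y, x, times, jumps) {
  p_x  <- jumps[match(x, times)]                          # Delta F(x_i)
  tail <- sapply(y, function(yi) sum(jumps[times > yi]))  # 1 - F(y_i)
  sum(log(p_x / tail))
}

# Data from the Examples section of this page.
vet <- c(30, 384, 4, 54, 13, 123, 97, 153, 59, 117, 16, 151, 22, 56, 21, 18,
         139, 20, 31, 52, 287, 18, 51, 122, 27, 54, 7, 63, 392, 10)
vetstart <- c(0,60,0,0,0,33,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)

lb <- lynden_bell_jumps(vetstart, vet)
loglik0 <- trunc_loglik(vetstart, vet, lb$times, lb$jumps)
```

When no observation is truncated (all y below the smallest x), this sketch reduces to the ordinary empirical distribution, with jump 1/n at each distinct x.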

Usage

el.trun.test(y,x,fun=function(t){t},mu,maxit=20,error=1e-9)

Value

A list with the following components:

"-2LLR": -2 times the log empirical likelihood ratio, comparing the likelihood maximized under the constraint with the unconstrained maximum.
NPMLE: jumps of NPMLE of CDF at ordered x.
NPMLEmu: same jumps but for constrained NPMLE.
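Because the first component name starts with a minus sign, it has to be extracted with `[[ ]]` or backticks. The sketch below shows this, together with a p-value computed under the assumption that the statistic is calibrated against a chi-square distribution with one degree of freedom, as is usual for empirical likelihood ratio tests with a single mean constraint. The helper `llr_pvalue()` is a hypothetical name, and the list `res_demo` is a stand-in with a placeholder value, not real output of `el.trun.test`.

```r
# Hypothetical helper: turn the "-2LLR" component into a p-value,
# assuming a chi-square(1) reference distribution.
llr_pvalue <- function(res) {
  stat <- res[["-2LLR"]]          # equivalently: res$`-2LLR`
  pchisq(stat, df = 1, lower.tail = FALSE)
}

# Stand-in list for illustration only; the value is a placeholder chosen
# to be the 95% chi-square(1) quantile, not a result from real data.
res_demo <- list("-2LLR" = qchisq(0.95, df = 1))
p_demo <- llr_pvalue(res_demo)
```

With a real fit, `res_demo` would be replaced by the list returned from `el.trun.test(y, x, mu = ...)`.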

Arguments

y: a vector containing the left truncation times.
x: a vector containing the survival times. Truncation means that x>y for every observed pair.
fun: a continuous (weight) function used to calculate the mean as in $H_0$. fun(t) must be able to take a vector input t. Defaults to the identity function $f(t)=t$.
mu: a real number used in the constraint, mean value of $f(X)$.
error: an optional positive real number specifying the tolerance of iteration error. This is the bound of the $L_1$ norm of the difference of two successive weights.
maxit: an optional integer, used to control maximum number of iterations.
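As an illustration of the `fun` argument, the sketch below defines two vectorized weight functions. The names `f_mean` and `f_prob` and the cut-off 100 are arbitrary choices made for this example. With an indicator weight, the mean constraint becomes a constraint on the probability $F(100)$.

```r
# Observed times from the Examples section of this page.
vet <- c(30, 384, 4, 54, 13, 123, 97, 153, 59, 117, 16, 151, 22, 56, 21, 18,
         139, 20, 31, 52, 287, 18, 51, 122, 27, 54, 7, 63, 392, 10)

# Identity weight (the default): the constraint is on the mean of X.
f_mean <- function(t) t

# Indicator weight: the constraint becomes F(100) = mu.
# Comparison operators are vectorized, so this accepts a vector t.
f_prob <- function(t) as.numeric(t <= 100)

# A version written as `if (t <= 100) 1 else 0` would NOT be suitable,
# because `if` handles only a single logical value.

w_mean <- f_mean(vet)   # one weight per observation
w_prob <- f_prob(vet)   # 0/1 weights, one per observation
```

A call such as `el.trun.test(y, x, fun = f_prob, mu = 0.7)` would then test $H_0: F(100) = 0.7$; the value 0.7 is again only an example.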

Author

Mai Zhou

Details

This implementation is written entirely in R and contains several for-loops. A faster version would move the for-loop part to C, but the current code seems fast enough and is easier to port to Splus.

When the given constant $\mu$ is too far away from the NPMLE, no distribution satisfies the constraint. In this case the computation will stop. The -2 log empirical likelihood ratio should be infinite.

The constant mu must be inside $( \min f(x_i) , \max f(x_i) )$ for the computation to continue. The NPMLE value is always feasible, so when the computation stops, try moving mu closer to the NPMLE value $$ \sum_{i=1}^n p_i^0 f(x_i) , $$ where $p_i^0$ are the jumps of the NPMLE of the CDF, or use a different fun.
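The range condition on mu can be checked before calling the function. The helper `is_feasible_mu()` below is a hypothetical name used for this sketch. It checks only the open-interval condition stated above, which is necessary for the computation to continue; it does not guarantee that the EM iteration converges.

```r
# Observed times from the Examples section of this page.
vet <- c(30, 384, 4, 54, 13, 123, 97, 153, 59, 117, 16, 151, 22, 56, 21, 18,
         139, 20, 31, 52, 287, 18, 51, 122, 27, 54, 7, 63, 392, 10)

# Hypothetical pre-check: is mu strictly inside (min f(x_i), max f(x_i))?
is_feasible_mu <- function(x, mu, fun = function(t) t) {
  fx <- fun(x)
  mu > min(fx) && mu < max(fx)
}

ok_80  <- is_feasible_mu(vet, 80)    # TRUE: 80 lies inside (4, 392)
ok_400 <- is_feasible_mu(vet, 400)   # FALSE: 400 exceeds max(vet) = 392
```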

References

Zhou, M. (2005). Empirical likelihood ratio with arbitrary censored/truncated data by EM algorithm. Journal of Computational and Graphical Statistics, 14, 643-656.

Li, G. (1995). Nonparametric likelihood ratio estimation of probabilities for truncated data. Journal of the American Statistical Association, 90, 997-1003.

Turnbull, B. W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society, Series B, 38, 290-295.

Examples

## example with tied observations
vet <- c(30, 384, 4, 54, 13, 123, 97, 153, 59, 117, 16, 151, 22, 56, 21, 18,
         139, 20, 31, 52, 287, 18, 51, 122, 27, 54, 7, 63, 392, 10)
vetstart <- c(0,60,0,0,0,33,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
el.trun.test(vetstart, vet, mu=80, maxit=15)
