It turns out that the log-likelihood for a conditional logistic
regression model is identical to the log-likelihood of a Cox model
with a particular data structure. Proving this is a nice homework
exercise for a PhD statistics class; not too hard, but the fact that
it is true is surprising. When a well-tested Cox model routine is
available, many packages use this 'trick' rather than writing a new
software routine from scratch, and this is what the clogit routine does.
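The equivalence can be checked numerically. The sketch below is written in Python rather than R, and its helper names are hypothetical. It assumes a single stratum, one covariate, and an unconditional logistic model logit(p_i) = alpha + beta*x_i with a stratum-specific intercept alpha. It computes, by brute force, the probability that exactly the observed subjects are the cases given how many cases the stratum has, and compares it with the exact partial likelihood formula for tied event times. The intercept cancels, so the two should agree for any alpha.

```python
import itertools
import math


def clogit_prob_bruteforce(x, cases, beta, alpha):
    """P(exactly `cases` are the cases | the stratum has len(cases) cases)
    under logit(p_i) = alpha + beta * x_i (hypothetical helper)."""
    n, d = len(x), len(cases)
    p = [1.0 / (1.0 + math.exp(-(alpha + beta * xi))) for xi in x]

    def prob_of_set(s):
        # Joint Bernoulli probability that exactly the subjects in s are cases
        out = 1.0
        for i in range(n):
            out *= p[i] if i in s else (1.0 - p[i])
        return out

    num = prob_of_set(set(cases))
    den = sum(prob_of_set(set(s)) for s in itertools.combinations(range(n), d))
    return num / den


def cox_exact_partial_lik(x, cases, beta):
    """Exact partial likelihood for one stratum where all events are tied."""
    r = [math.exp(beta * xi) for xi in x]          # risk scores
    num = math.prod(r[i] for i in cases)
    den = sum(math.prod(r[i] for i in s)
              for s in itertools.combinations(range(len(x)), len(cases)))
    return num / den
```

For example, with x = [0.5, -1.2, 2.0, 0.3, 1.1], cases [0, 2] and beta = 0.7, the brute-force conditional probability matches the Cox expression whether alpha is -3, 0 or 2.5, because each subset probability factors as a common term times the product of exp(alpha + beta*x_i) over the subset.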
In detail, a stratified Cox model with each case/control group
assigned to its own stratum, time set to a constant, status coded
1 = case and 0 = control, and the exact partial likelihood used for
ties has the same likelihood formula as a conditional logistic
regression. The clogit routine creates the necessary dummy time
variable (all values equal to 1) and the strata, then calls coxph.
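A sketch of that data rearrangement, assuming each case/control set arrives as a pair of covariate lists (one for the cases, one for the controls). The names `CoxRow` and `to_cox_layout` are hypothetical and are not part of the survival package. Every row gets the same dummy time of 1, status 1 for a case and 0 for a control, and a stratum index equal to its set number.

```python
from dataclasses import dataclass


@dataclass
class CoxRow:
    time: float    # dummy time: the same constant (1) for every subject
    status: int    # 1 = case, 0 = control
    stratum: int   # one stratum per case/control set
    x: float       # covariate value


def to_cox_layout(sets):
    """Hypothetical helper: `sets` is a list of (case_xs, control_xs) pairs."""
    rows = []
    for k, (case_xs, control_xs) in enumerate(sets):
        for xi in case_xs:
            rows.append(CoxRow(time=1.0, status=1, stratum=k, x=xi))
        for xi in control_xs:
            rows.append(CoxRow(time=1.0, status=0, stratum=k, x=xi))
    return rows
```

Because every subject in a stratum shares the same time, all cases in a stratum are tied, which is why the tie-handling method matters in what follows.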
The computation of the exact partial likelihood can be very slow,
however. If a particular stratum has, say, 10 events out of 20 subjects,
we have to add up a denominator that involves all possible ways of
choosing 10 out of 20, which is 20!/(10! 10!) = 184756 terms. Gail et
al. describe a fast recursion method which partly ameliorates
this; it was incorporated into version 2.36-11 of the survival
package. The computation remains infeasible for very large groups of
ties, say 100 ties out of 500 subjects, and may even lead to integer
overflow for the subscripts; in this latter case the routine will
refuse to undertake the task. The Efron approximation is normally a
sufficiently accurate substitute.
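The sketch below contrasts the brute-force denominator with a recursion over subjects. It assumes risk scores r_i = exp(x_i * beta) for a single stratum; the denominator is then the sum, over all subsets of size d, of the product of their risk scores. The recursion B(m, n) = B(m, n-1) + r_n * B(m-1, n-1), with B(0, n) = 1, is the standard one for such sums and is offered only as an illustration of the idea behind a fast recursion of this kind, not as a transcription of the survival package code. Function names are hypothetical.

```python
import itertools
import math


def denom_bruteforce(r, d):
    # Sum over all math.comb(len(r), d) subsets of size d of the product of scores
    return sum(math.prod(s) for s in itertools.combinations(r, d))


def denom_recursive(r, d):
    """B(m, n) = B(m, n-1) + r[n-1] * B(m-1, n-1); B(0, n) = 1 and
    B(m, n) = 0 for m > n. O(len(r) * d) work and O(d) memory."""
    B = [1.0] + [0.0] * d          # B[m] holds B(m, n) for the current n
    for rn in r:
        # Update m from high to low so B[m - 1] still refers to the previous n
        for m in range(d, 0, -1):
            B[m] += rn * B[m - 1]
    return B[d]
```

The brute-force version visits math.comb(n, d) subsets: 184756 for 10 out of 20, and an integer with more than 100 digits for 100 out of 500. The recursion needs on the order of n*d multiplications instead. Plain floating-point products can still overflow or underflow for large groups; this sketch makes no attempt to rescale them.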
Most of the time, however, conditional logistic modeling is applied
to data with 1 case + k controls per set, in which case all of the
approximations for ties lead to exactly the same result.
The 'approximate' option maps to the Breslow approximation for the
Cox model, for historical reasons.
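To see why the choice does not matter for 1:k matched data, the sketch below computes the denominator of one stratum's contribution under the Breslow, Efron, and exact (all subsets) formulas. It assumes every subject shares one event time and has linear predictor eta_i = x_i * beta; the function name is hypothetical. The numerator, exp of the sum of eta over the cases, is the same for all three, so only the denominators need comparing.

```python
import itertools
import math


def tie_denominators(eta, status):
    """Hypothetical helper: (breslow, efron, exact) denominators for one
    stratum in which all subjects share a single event time."""
    r = [math.exp(e) for e in eta]                  # risk scores
    d = sum(status)                                 # number of tied cases
    S = sum(r)                                      # sum over the whole risk set
    SD = sum(ri for ri, s in zip(r, status) if s)   # sum over the tied cases only
    breslow = S ** d
    efron = math.prod(S - (j / d) * SD for j in range(d))
    exact = sum(math.prod(c) for c in itertools.combinations(r, d))
    return breslow, efron, exact
```

With a single case (d = 1) each denominator reduces to S, the sum of the risk scores, so all three likelihoods coincide. With two cases they separate; for d = 2 one can check that Breslow > Efron > exact whenever all risk scores are positive.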
It is not clear how case weights should be handled. For instance, if
there are two deaths in a stratum, one with weight=1 and one with
weight=2, should the likelihood calculation consider all subsets of
size 2 or all subsets of size 3?
Consequently, case weights are ignored by the routine.