It turns out that the log-likelihood for a conditional logistic
regression model is identical to the log-likelihood of a Cox model
with a particular data structure. Proving this is a nice homework
exercise for a PhD statistics class: it is not too hard, but the
result itself is surprising.
When a well-tested Cox model routine is available, many packages use
this 'trick' rather than writing a new software routine from
scratch, and this is what the clogit routine does.
In detail, a stratified Cox model with each case/control group
assigned to its own stratum, time set to a constant,
status coded as 1=case and 0=control,
and the exact partial likelihood has the same likelihood formula
as a conditional logistic regression. The clogit routine creates
the necessary dummy time variable (all values equal to 1) and the
strata, then calls coxph.
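
As an illustration of the likelihood being described, here is a minimal Python sketch of the contribution of a single matched set, computed by brute-force enumeration. It is not the code used by clogit or coxph; the function name `stratum_loglik` and its arguments `eta` (the linear predictors) and `status` are hypothetical names chosen for this example.

```python
from itertools import combinations
from math import exp, log


def stratum_loglik(eta, status):
    """Conditional log-likelihood contribution of one matched set.

    eta    : linear predictor for each subject in the set (hypothetical input)
    status : 1 = case, 0 = control, in the same order as eta
    """
    d = sum(status)  # number of cases in the set
    # Numerator: sum of the linear predictors of the observed cases.
    numerator = sum(e for e, s in zip(eta, status) if s == 1)
    # Denominator: one term for every way of choosing d subjects from the set.
    denominator = sum(exp(sum(subset)) for subset in combinations(eta, d))
    return numerator - log(denominator)


# One case and one control with equal linear predictors: -log(2), about -0.693
print(stratum_loglik([0.0, 0.0], [1, 0]))
```

Because every subject in the set shares the same dummy time, the risk set for the exact partial likelihood is the whole set, so the same expression serves for both models.
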
The computation of the exact partial likelihood can be very slow,
however. If a particular stratum had, say, 10 events out of 20
subjects, we have to add up a denominator that involves all possible
ways of choosing 10 out of 20, which is 20!/(10! 10!) = 184756 terms.
Gail et al. describe a fast recursion method which partly ameliorates
this; it was incorporated into version 2.36-11 of the survival
package. The computation remains infeasible for very large groups of
ties, say 100 ties out of 500 subjects, and may even lead to integer
overflow for the subscripts; in the latter case the routine will
refuse to undertake the task. The Efron approximation is normally a
sufficiently accurate substitute.
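
The size of the denominator, and the kind of saving a recursion can give, can be sketched in a few lines of Python. This is an illustration only and not the survival package's implementation: the function names are hypothetical, `risk` stands for the per-subject values exp(linear predictor), and the recursion shown is the standard one for sums over all subsets of a fixed size, B(k, n) = B(k, n-1) + r_n B(k-1, n-1), assumed here to be in the spirit of the method cited above.

```python
from itertools import combinations
from math import comb, prod


def denominator_bruteforce(risk, d):
    """Sum, over every size-d subset, of the product of the risk scores."""
    return sum(prod(subset) for subset in combinations(risk, d))


def denominator_recursive(risk, d):
    """Same quantity using about len(risk) * d multiply-adds."""
    # b[k] holds the sum over all size-k subsets of the subjects seen so far.
    b = [1.0] + [0.0] * d
    for r in risk:
        # Update from high k to low k so each subject is used at most once.
        for k in range(d, 0, -1):
            b[k] += r * b[k - 1]
    return b[d]


# Number of terms in the brute-force sum for 10 events among 20 subjects.
print(comb(20, 10))  # 184756
```

With all risk scores equal to 1 the denominator is just the number of subsets, so `denominator_recursive([1.0] * 20, 10)` gives 184756.0 after 200 updates rather than 184756 separate products.
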
Most of the time conditional logistic modeling
is applied to data with 1 case + k controls per set, in
which case all of the approximations for ties lead to exactly the
same result.
The 'approximate' option maps to the
Breslow approximation for the Cox model, for historical reasons.
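
The agreement of the methods for sets with a single case can be checked numerically. The Python sketch below is an illustration and not the package's code: `tie_logliks` is a hypothetical helper that evaluates the textbook Breslow, Efron, and exact expressions for one stratum in which all subjects share the same time.

```python
from itertools import combinations
from math import exp, log, prod


def tie_logliks(eta, status):
    """Return (breslow, efron, exact) log-likelihood terms for one stratum.

    eta    : linear predictor for each subject (hypothetical input)
    status : 1 = case, 0 = control, in the same order as eta
    """
    risk = [exp(e) for e in eta]
    case_risk = [r for r, s in zip(risk, status) if s == 1]
    d = len(case_risk)  # number of tied events
    numerator = sum(e for e, s in zip(eta, status) if s == 1)
    total = sum(risk)      # risk score total for the whole set
    tied = sum(case_risk)  # risk score total for the cases only

    # Breslow: every tied event uses the full risk-set total.
    breslow = numerator - d * log(total)
    # Efron: the j-th tied event removes a fraction j/d of the cases' total.
    efron = numerator - sum(log(total - (j / d) * tied) for j in range(d))
    # Exact: sum over all size-d subsets of the set.
    exact = numerator - log(sum(prod(s) for s in combinations(risk, d)))
    return breslow, efron, exact


# One case and three controls: the three values coincide.
print(tie_logliks([0.4, -0.1, 0.0, 0.7], [1, 0, 0, 0]))
# Two cases: the three values differ.
print(tie_logliks([0.4, -0.1, 0.0, 0.7], [1, 1, 0, 0]))
```

With d = 1 the Efron correction term is zero and the exact sum runs over single subjects, so all three expressions reduce to the linear predictor of the case minus the log of the total risk score.
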
Case weights are not allowed when the exact option is used, as the
likelihood is not defined for fractional weights.
Even with integer case weights it is not clear how they should be
handled. For instance, if
there are two deaths in a stratum, one with weight=1 and one with
weight=2, should the likelihood calculation consider all subsets of
size 2 or all subsets of size 3?
Consequently, case weights are ignored by the routine when the exact
option is used.