The function findInterval
finds the index of one vector x
in
another, vec
, where the latter must be non-decreasing. Where
this is trivial, equivalent to apply( outer(x, vec, ">="), 1, sum)
,
as a matter of fact, the internal algorithm uses interval search
ensuring \(O(n \log N)\) complexity where
n <- length(x)
(and N <- length(vec)
). For (almost)
sorted x
, it will be even faster, basically \(O(n)\).
This is the same computation as for the empirical distribution
function, and indeed, findInterval(t, sort(X))
is
identical to \(n F_n(t; X_1,\dots,X_n)\) where \(F_n\) is the empirical distribution
function of \(X_1,\dots,X_n\).
When rightmost.closed = TRUE
, the result for x[j] = vec[N]
(\( = \max vec\)), is N - 1
as for all other
values in the last interval.
left.open = TRUE
is occasionally useful, e.g., for survival data.
For (anti-)symmetry reasons, it is equivalent to using
“mirrored” data, i.e., the following is always true:
identical(
findInterval( x, v, left.open= TRUE, ...) ,
N - findInterval(-x, -v[N:1], left.open=FALSE, ...) )
where N <- length(vec)
as above.