The data are \((\bold{X}_1, Y_1), \dots, (\bold{X}_n, Y_n)\) where \(\bold{X}_i\) is \(d\)-dimensional and \(Y_i\) is a
scalar response. PRIM finds modal (and/or anti-modal) regions in the
conditional expectation \(m(\bold{x}) = \bold{E} (Y | \bold{x}).\)
In general, \(Y_i\) can be real-valued; see vignette("prim").
Here, we focus on the special case of binary \(Y_i\). Let
\(Y_i = 1\) when \(\bold{X}_i \sim F^+\), and \(Y_i = -1\) when
\(\bold{X}_i \sim F^-\), where \(F^+\) and \(F^-\) are different
distribution functions. In this set-up, PRIM finds the
regions where \(F^+\) and \(F^-\) are most different.
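To see why, here is a short derivation under added assumptions: suppose \(F^+\) and \(F^-\) have densities \(f^+\) and \(f^-\), and write \(\pi^+ = P(Y = 1)\) and \(\pi^- = P(Y = -1)\) for the class proportions. This notation is introduced here for illustration only. By Bayes' rule,
\[
m(\bold{x}) = P(Y = 1 | \bold{x}) - P(Y = -1 | \bold{x})
= \frac{\pi^+ f^+(\bold{x}) - \pi^- f^-(\bold{x})}{\pi^+ f^+(\bold{x}) + \pi^- f^-(\bold{x})}.
\]
So \(m(\bold{x})\) lies between -1 and 1, is close to 1 where \(f^+\) dominates, and is close to -1 where \(f^-\) dominates.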
The tuning parameters peel.alpha and paste.alpha control
the `patience' of PRIM: smaller values mean more patience, larger
values less. The peeling steps remove data from a box until
either the box mean is smaller than threshold or the box mass
is less than mass.min. Pasting is optional, and is used to correct any
possible over-peeling. The default values for peel.alpha,
paste.alpha and mass.min are taken from Friedman &
Fisher (1999).
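The peeling loop described above can be sketched as follows. This is a simplified illustration in Python, not the package's R implementation. The function names (box_mean, peel_once, peel) are hypothetical, only the mass.min stopping rule is shown (the threshold check and the pasting step are omitted), and each peel trims a fraction peel_alpha of the points currently in the box from one end of one coordinate, in the spirit of Friedman & Fisher (1999).

```python
import math


def box_mean(ys):
    """Mean response of the points currently in the box."""
    return sum(ys) / len(ys)


def peel_once(xs, ys, alpha):
    """Trial-trim a fraction alpha (0 < alpha < 1) of points from the low or
    high end of each coordinate. Return (mean, kept_indices) for the trim
    with the largest remaining mean, or None if no trim leaves any points."""
    n, d = len(xs), len(xs[0])
    k = max(1, math.floor(alpha * n))  # number of points removed by one peel
    best = None
    for j in range(d):
        order = sorted(range(n), key=lambda i: xs[i][j])
        for keep in (order[k:], order[:n - k]):  # drop low end, then high end
            if keep:
                mean = box_mean([ys[i] for i in keep])
                if best is None or mean > best[0]:
                    best = (mean, keep)
    return best


def peel(xs, ys, peel_alpha, mass_min):
    """Peel repeatedly; stop before the box mass would drop below mass_min."""
    n_total = len(xs)
    idx = list(range(n_total))  # indices of the points inside the current box
    while True:
        result = peel_once([xs[i] for i in idx], [ys[i] for i in idx], peel_alpha)
        if result is None:
            break
        _, keep = result
        if len(keep) / n_total < mass_min:
            break
        idx = [idx[i] for i in keep]  # map back to original indices
    return idx


# Toy data: y = 1 in the upper-right corner of the unit square, y = -1 elsewhere.
xs = [(i / 19, j / 19) for i in range(20) for j in range(20)]
ys = [1 if (a > 0.6 and b > 0.6) else -1 for a, b in xs]
box = peel(xs, ys, peel_alpha=0.05, mass_min=0.05)  # illustrative values only
print(len(box) / len(xs), box_mean([ys[i] for i in box]))
```

A smaller peel_alpha removes fewer points per step, so the box shrinks more slowly. This is the sense in which smaller values are more patient.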
The type of PRIM estimate is controlled by threshold and threshold.type:
threshold.type=1: search for {\(m(\bold{x}) \geq\) threshold}.
threshold.type=-1: search for {\(m(\bold{x}) \leq\) threshold}.
threshold.type=0: search for both {\(m(\bold{x}) \geq\) threshold[1]} and {\(m(\bold{x}) \leq\) threshold[2]}.
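As a reading aid, the three cases can be written as a small predicate on a box mean. This is a hypothetical Python helper (box_selected is not part of the prim package). It assumes that for threshold.type=0 the threshold is a pair whose first element is the upper cut-off and whose second element is the lower cut-off, as in the list above.

```python
def box_selected(box_mean, threshold, threshold_type):
    """Return True if a box with this mean falls in the region searched for."""
    if threshold_type == 1:
        return box_mean >= threshold
    if threshold_type == -1:
        return box_mean <= threshold
    if threshold_type == 0:
        # R's threshold[1] and threshold[2] are Python's threshold[0] and threshold[1].
        upper, lower = threshold
        return box_mean >= upper or box_mean <= lower
    raise ValueError("threshold_type must be 1, -1 or 0")


# With binary responses coded as 1 and -1, box means lie between -1 and 1.
print(box_selected(0.7, 0.5, 1))
print(box_selected(-0.7, (0.5, -0.5), 0))
```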
There are two ways of using PRIM. One is prim.box with
pre-specified threshold(s). This is appropriate when the threshold(s)
are known to produce good estimates.
On the other hand, if the user doesn't provide threshold values, then
prim.box computes box sequences which cover the data
range. These can then be pruned at a later stage. prim.hdr
allows the user to specify many different threshold values in an
efficient manner, without having to recompute the entire PRIM box
sequence. prim.combine can be used to join the regions computed
from prim.hdr. See the examples below.