Tuning parameters for lmrob
, the MM-type regression
estimator and the associated S-, M- and D-estimators. Using
setting="KS2011"
sets the defaults as suggested by
Koller and Stahel (2011) and analogously for "KS2014"
.
The .M*.default
function
s and
.M*.defaults
list
s contain default tuning
parameters for all the predefined \(\psi\) functions, see also
Mpsi
, etc.
lmrob.control(setting, seed = NULL, nResample = 500,
tuning.chi = NULL, bb = 0.5, tuning.psi = NULL,
max.it = 50, groups = 5, n.group = 400,
k.fast.s = 1, best.r.s = 2,
k.max = 200, maxit.scale = 200, k.m_s = 20,
refine.tol = 1e-7, rel.tol = 1e-7, scale.tol = 1e-10, solve.tol = 1e-7,
trace.lev = 0,
mts = 1000, subsampling = c("nonsingular", "simple"),
compute.rd = FALSE, method = "MM", psi = "bisquare",
numpoints = 10, cov = NULL,
split.type = c("f", "fi", "fii"), fast.s.large.n = 2000,
eps.outlier = function(nobs) 0.1 / nobs,
eps.x = function(maxx) .Machine$double.eps^(.75)*maxx,
compute.outlier.stats = method,
warn.limit.reject = 0.5,
warn.limit.meanrw = 0.5, ...).Mchi.tuning.defaults
.Mchi.tuning.default(psi)
.Mpsi.tuning.defaults
.Mpsi.tuning.default(psi)
.Mchi.tuning.default(psi)
and .Mpsi.tuning.default(psi)
return a short numeric
vector of tuning constants which
are defaults for the corresponding psi-function, see the Details.
They are based on the named list
s
.Mchi.tuning.defaults
and .Mpsi.tuning.defaults
,
respectively.
lmrob.control()
returns a named list
with over
twenty components, corresponding to the arguments, where
tuning.psi
and tuning.chi
are typically computed, as
.Mpsi.tuning.default(psi)
or .Mchi.tuning.default(psi)
,
respectively.
a string specifying alternative default values. Leave
empty for the defaults or use "KS2011"
or "KS2014"
for the defaults suggested by Koller and Stahel (2011, 2017).
See Details.
NULL
or an integer vector compatible with
.Random.seed
: the seed to be used for random
re-sampling used in obtaining candidates for the initial
S-estimator. The current value of .Random.seed
will be
preserved if seed
is set, i.e. non-NULL
;
otherwise, as by default, .Random.seed
will be used and
modified as usual from calls to runif()
etc.
number of re-sampling candidates to be used to find the initial S-estimator. Currently defaults to 500 which works well in most situations (see references).
tuning constant vector for the S-estimator. If
NULL
, as by default, sensible defaults are set (depending on
psi
) to yield a 50% breakdown estimator. See Details.
expected value under the normal model of the
“chi” (rather \(\rho (rho)\)) function with tuning
constant equal to tuning.chi
. This is used to compute the
S-estimator.
tuning constant vector for the redescending
M-estimator. If NULL
, as by default, this is set (depending
on psi
) to yield an estimator with asymptotic efficiency of
95% for normal errors. See Details.
integer specifying the maximum number of IRWLS iterations.
(for the fast-S algorithm): Number of random subsets to use when the data set is large.
(for the fast-S algorithm): Size of each of the
groups
above. Note that this must be at least \(p\).
(for the fast-S algorithm): Number of local improvement steps (“I-steps”) for each re-sampling candidate.
(for the M-S algorithm): specifies after how many unsucessful refinement steps the algorithm stops.
(for the fast-S algorithm): Number of of best candidates to be iterated further (i.e., “refined”); is denoted \(t\) in Salibian-Barrera & Yohai(2006).
(for the fast-S algorithm): maximal number of refinement steps for the “fully” iterated best candidates.
integer specifying the maximum number of C level
find_scale()
iterations.
(for the fast-S algorithm): relative convergence tolerance for the fully iterated best candidates.
(for the RWLS iterations of the MM algorithm): relative convergence tolerance for the parameter vector.
(for the scale estimation iterations of the S algorithm): relative
convergence tolerance for the scale
\(\sigma(.)\).
(for the S algorithm): relative
tolerance for inversion. Hence, this corresponds to
solve.default()
's tol
.
integer indicating if the progress of the MM-algorithm
should be traced (increasingly); default trace.lev = 0
does
no tracing.
maximum number of samples to try in subsampling algorithm.
type of subsampling to be used, a string:
"simple"
for simple subsampling (default prior to version 0.9),
"nonsingular"
for nonsingular subsampling. See also
lmrob.S
.
logical indicating if robust distances (based on
the MCD robust covariance estimator covMcd
) are to be
computed for the robust diagnostic plots. This may take some
time to finish, particularly for large data sets, and can lead to
singularity problems when there are factor
explanatory
variables (with many levels, or levels with “few”
observations). Hence, is FALSE
by default.
string specifying the estimator-chain. MM
is interpreted as SM
. See Details of
lmrob
for a description of the possible values.
string specifying the type \(\psi\)-function
used. See Details of lmrob
. Defaults to
"bisquare"
for S and MM-estimates, otherwise "lqq"
.
number of points used in Gauss quadrature.
function or string with function name to be used to
calculate covariance matrix estimate. The default is
if(method %in% c('SM', 'MM')) ".vcov.avar1" else ".vcov.w"
.
See Details of lmrob
.
determines how categorical and continuous variables
are split. See splitFrame
.
minimum number of observations required to switch from ordinary “fast S” algorithm to an efficient “large n” strategy.
limit on the robustness weight below which an observation
is considered to be an outlier.
Either a numeric(1) or a function that takes the number of observations as
an argument. Used in summary.lmrob
and
outlierStats
.
limit on the absolute value of the elements of the design matrix below which an element is considered zero. Either a numeric(1) or a function that takes the maximum absolute value in the design matrix as an argument.
vector of character
strings, each valid to be used as method
argument. Used to
specify for which estimators outlier statistics (and warnings)
should be produced. Set to empty string if none are required.
limit of ratio
\(\#\mbox{rejected} / \#\mbox{obs in level}\)
above (\(\geq\)) which a warning is produced.
Set to NULL
to disable warning.
limit of the mean robustness per factor level
below which (\(\leq\)) a warning is produced.
Set to NULL
to disable warning.
further arguments to be added as list
components to the result, e.g., those to be used in .vcov.w()
.
Matias Salibian-Barrera, Martin Maechler and Manuel Koller
The option setting="KS2011"
alters the default
arguments. They are changed to method = "SMDM"
, psi = "lqq"
,
max.it = 500
, k.max = 2000
, cov = ".vcov.w"
.
The defaults of all the remaining arguments are not changed.
The option setting="KS2014"
builds upon setting="KS2011"
.
More arguments are changed to best.r.s = 20, k.fast.s = 2,
nResample = 1000
. This setting should produce more stable estimates
for designs with factor
s.
By default, and in .Mpsi.tuning.default()
and .Mchi.tuning.default()
,
tuning.chi
and tuning.psi
are set to yield an
MM-estimate with breakdown point \(0.5\) and efficiency of 95% at
the normal.
If numeric tuning.chi
or tuning.psi
are specified, say
cc
, for psi = "ggw"
or "lqq"
,
.psi.const(cc, psi)
is used, see its help page.
To get the defaults, e.g., .Mpsi.tuning.default(psi)
is
equivalent to but more efficient than the formerly widely used
lmrob.control(psi = psi)$tuning.psi
.
These defaults are:
psi | tuning.chi | tuning.psi |
bisquare | 1.54764 | 4.685061 |
welsh | 0.5773502 | 2.11 |
ggw | c(-0.5, 1.5, NA, 0.5) | c(-0.5, 1.5, 0.95, NA) |
lqq | c(-0.5, 1.5, NA, 0.5) | c(-0.5, 1.5, 0.95, NA) |
optimal | 0.4047 | 1.060158 |
hampel | c(1.5, 3.5, 8)*0.2119163 | c(1.5, 3.5, 8)*0.9014 |
The values for the tuning constant for the ggw
and lqq
psi functions are specified differently here by a vector with four
elements: minimal slope, b (controlling the bend at the maximum of the curve),
efficiency, breakdown point.
Use NA
for an unspecified value of either efficiency or
breakdown point, see examples in the tables (above and below).
For these table examples, the respective “inner constants” are
stored precomputed, see .psi.lqq.findc
for more.
The constants for the "hampel"
psi function are chosen to have a
redescending slope of \(-1/3\). Constants for a slope of \(-1/2\)
would be
psi | tuning.chi | tuning.psi |
"hampel" | c(2, 4, 8) * 0.1981319 | c(2, 4, 8) * 0.690794 |
Alternative coefficients for an efficiency of 85% at the normal are given in the table below.
psi | tuning.psi |
bisquare | 3.443689 |
welsh | 1.456 |
ggw , lqq | c(-0.5, 1.5, 0.85, NA) |
optimal | 0.8684 |
hampel (-1/3) | c(1.5, 3.5, 8)* 0.5704545 |
hampel (-1/2) | c( 2, 4, 8) * 0.4769578 |
Koller, M. and Stahel, W.A. (2011) Sharpening Wald-type inference in robust regression for small samples. Computational Statistics & Data Analysis 55(8), 2504--2515.
Koller, M. and Stahel, W.A. (2017)
Nonsingular subsampling for regression S~estimators with categorical predictors,
Computational Statistics 32(2): 631--646.
tools:::Rd_expr_doi("10.1007/s00180-016-0679-x").
Referred as "KS2014"
everywhere in robustbase; A shorter first
version, Koller (2012) has been available from https://arxiv.org/abs/1208.5595.
Mpsi
, etc, for the (fast!) psi function computations;
lmrob
, also for references and examples.
## Show the default settings:
str(lmrob.control())
## Artificial data for a simple "robust t test":
set.seed(17)
y <- y0 <- rnorm(200)
y[sample(200,20)] <- 100*rnorm(20)
gr <- as.factor(rbinom(200, 1, prob = 1/8))
lmrob(y0 ~ 0+gr)
## Use Koller & Stahel(2011)'s recommendation but a larger 'max.it':
str(ctrl <- lmrob.control("KS2011", max.it = 1000))
str(.Mpsi.tuning.defaults)
stopifnot(identical(.Mpsi.tuning.defaults,
sapply(names(.Mpsi.tuning.defaults),
.Mpsi.tuning.default)))
## Containing (names!) all our (pre-defined) redescenders:
str(.Mchi.tuning.defaults)
## Difference between settings:
C11 <- lmrob.control("KS2011")
C14 <- lmrob.control("KS2014")
str(C14)
## Apart from `setting` itself, they only differ in three places:
diffC <- names(which(!mapply(identical, C11,C14, ignore.environment=TRUE)))
cbind(KS11 = unlist(C11[diffC[-1]]),
KS14 = unlist(C14[diffC[-1]]))
## KS11 KS14
## nResample 500 1000
## best.r.s 2 20
## k.fast.s 1 2
Run the code above in your browser using DataLab