This function computes 2SGS as described in Agostinelli et al. (2015) and Leung and Zamar (2016). The procedure has two major steps:
In Step I, the method filters (i.e., flags and removes) cell-wise outliers using the Gervini-Yohai univariate filter (Agostinelli et al., 2015),
the univariate-bivariate filter (Leung et al., 2016), or the univariate-bivariate-plus-DDC filter (Leung et al., 2016; Rousseeuw and Van den Bossche, 2016).
The filtering step can be called on its own by using the function gy.filt or DDC.
In Step II, the method applies GSE or GRE (GSE with a Rocke-type loss function), both of which are specifically designed to deal with
incomplete multivariate data with case-wise outliers, to the filtered data coming from Step I. The second step can be called on its own
by using the function GSE.
The 2SGS method is intended for continuous variables, and requires that the number of observations
n be larger than 5 times the number of variables p for desirable performance (see the rejoinder in Agostinelli et al., 2015).
In our numerical studies, for n too small relative to p, 2SGS may experience a lack of convergence, especially for filtered data
sets with a proportion of complete observations less than
1/2 + (p+1)/n. To overcome this problem, partial imputation prior to estimation is proposed
(see the rejoinder in Agostinelli et al., 2015). The procedure is rather ad hoc, but initial numerical experiments
show that partial imputation may work. Further research on this topic is still needed.
Partial imputation is not used unless specified.
In general, we warn users to use 2SGS with caution for data sets with n smaller than 5 times p.
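
The two rules of thumb above (n larger than 5 times p, and a proportion of complete observations of at least 1/2 + (p+1)/n after filtering) can be checked with simple arithmetic before estimation. The following is a minimal Python sketch of that check only. It is not part of this package, the function name tsgs_sample_size_check and its ratio argument are hypothetical, and filtered cells are assumed to be represented as None.

```python
def tsgs_sample_size_check(rows, ratio=5.0):
    """Check the two rules of thumb on a filtered data set.

    rows  : list of equal-length lists, one per observation;
            None marks a cell removed by the filter (assumed encoding).
    ratio : required multiple of p that n should exceed (5 in the text).
    """
    n = len(rows)       # number of observations
    p = len(rows[0])    # number of variables

    # An observation is complete when none of its cells was filtered out.
    n_complete = sum(1 for row in rows if all(v is not None for v in row))
    prop_complete = n_complete / n

    # Threshold below which lack of convergence was observed more often.
    threshold = 0.5 + (p + 1) / n

    return {
        "n": n,
        "p": p,
        "enough_observations": n > ratio * p,
        "prop_complete": prop_complete,
        "threshold": threshold,
        "low_complete_proportion": prop_complete < threshold,
    }


# Hypothetical filtered data: 12 observations on 2 variables,
# 4 of which lost one cell in the filtering step.
filtered = [[1.0, 2.0]] * 8 + [[None, 2.0]] * 4
report = tsgs_sample_size_check(filtered)
```

Here n = 12 exceeds 5p = 10, but only 8 of 12 observations are complete, which is below the threshold 1/2 + 3/12 = 0.75, so this data set falls in the range where partial imputation might be considered.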
The application to the chemical data set analyzed in Agostinelli et al. (2015) can be found in geochem.
The tools that were used to generate contaminated data in the simulation study in Agostinelli et al. (2015) can be found in generate.cellcontam and generate.casecontam.