(Robustly) sequence groups of candidate predictors and their respective lagged values according to their predictive content and find the optimal model along the sequence. Note that lagged values of the response are included as a predictor group as well.
tslars(x, ...)
# S3 method for formula
tslars(formula, data, ...)
# S3 method for default
tslars(
  x,
  y,
  h = 1,
  pMax = 3,
  sMax = NA,
  fit = TRUE,
  s = c(0, sMax),
  crit = "BIC",
  ncores = 1,
  cl = NULL,
  model = TRUE,
  ...
)
rtslars(x, ...)
# S3 method for formula
rtslars(formula, data, ...)
# S3 method for default
rtslars(
  x,
  y,
  h = 1,
  pMax = 3,
  sMax = NA,
  centerFun = median,
  scaleFun = mad,
  regFun = lmrob,
  regArgs = list(),
  combine = c("min", "euclidean", "mahalanobis"),
  winsorize = FALSE,
  const = 2,
  prob = 0.95,
  fit = TRUE,
  s = c(0, sMax),
  crit = "BIC",
  ncores = 1,
  cl = NULL,
  seed = NULL,
  model = TRUE,
  ...
)
If fit is FALSE, an integer matrix in which each column contains the indices of the sequenced predictor series for the corresponding lag length.
Otherwise, an object of class "tslars" with the following components:
pFit
a list containing the fits for the respective lag lengths (see tslarsP).
pOpt
an integer giving the optimal number of lags.
pMax
the maximum number of lags considered.
x
the matrix of candidate predictor series (if model is TRUE).
y
the response series (if model is TRUE).
call
the matched function call.
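As a hedged illustration of the components listed above, the following sketch simulates a small data set and calls tslars() through its default method. The data-generating process and the seed are made up for this example, and the call is only executed if the robustHD package (assumed here to be the package providing tslars()) is installed.

```r
# Simulated data: 5 candidate predictor series, 100 observations.
# The response depends on the previous values of x1 and x2
# (a hypothetical data-generating process chosen for illustration).
set.seed(20240101)
n <- 100
x <- matrix(rnorm(n * 5), nrow = n,
            dimnames = list(NULL, paste0("x", 1:5)))
y <- c(rnorm(1),
       0.8 * x[-n, "x1"] + 0.5 * x[-n, "x2"] + rnorm(n - 1, sd = 0.5))

fit <- NULL
if (requireNamespace("robustHD", quietly = TRUE)) {
  # sequence at most 3 predictor groups for each lag length 1, ..., pMax
  fit <- robustHD::tslars(x, y, h = 1, pMax = 3, sMax = 3)
  print(fit$pOpt)          # optimal number of lags
  print(fit$pMax)          # maximum number of lags considered
  print(length(fit$pFit))  # number of fits stored in pFit
}
```

Setting fit = FALSE in the same call would skip the submodel fits and return only the matrix of sequenced predictor indices.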
x
a numeric matrix or data frame containing the candidate predictor series.
...
additional arguments to be passed down.
formula
a formula describing the full model.
data
an optional data frame, list or environment (or object coercible to a data frame by as.data.frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which tslars or rtslars is called.
y
a numeric vector containing the response series.
h
an integer giving the forecast horizon (defaults to 1).
pMax
an integer giving the maximum number of lags in the model (defaults to 3).
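To make the roles of the forecast horizon and the number of lags concrete, here is a minimal base R sketch of how one series can be turned into a group of lagged predictors. The helper lag_block() is hypothetical and the alignment (forecast origin t, lags t, t-1, ..., t-p+1, target at t+h) is an assumption for illustration; it is not taken from the package internals.

```r
# Hypothetical helper: build the block of p lagged values of a series z
# for forecast origins t = p, ..., n - h.
lag_block <- function(z, p, h) {
  n <- length(z)
  origins <- p:(n - h)
  # column j holds z[t - j + 1], i.e. lags 0, ..., p - 1 relative to origin t
  sapply(seq_len(p), function(j) z[origins - j + 1])
}

z <- 1:10
p <- 3
h <- 1
B <- lag_block(z, p = p, h = h)
# target values aligned with the rows of B: z[t + h] for each origin t
target <- z[(p + h):length(z)]

B[1, ]     # first origin t = 3 uses z[3], z[2], z[1]
target[1]  # and is paired with z[4]
```

Each candidate series, and the response itself, would contribute one such block, which is why predictors enter the sequence as groups rather than as single variables.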
sMax
an integer giving the number of predictor series to be sequenced. If it is NA (the default), predictor groups are sequenced as long as there are twice as many observations as predictor variables.
fit
a logical indicating whether to fit submodels along the sequence (TRUE, the default) or to simply return the sequence (FALSE).
s
an integer vector of length two giving the first and last step along the sequence for which to compute submodels. The default is to start with a model containing only an intercept (step 0) and iteratively add all series along the sequence (step sMax). If the second element is NA, predictor groups are added to the model as long as there are twice as many observations as predictor variables. If only one value is supplied, it is recycled.
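The rule "twice as many observations as predictor variables" and the recycling of a single value can be sketched with simple arithmetic. Both helpers below are hypothetical, and the sketch assumes that every group contributes the same number of variables and that n_obs counts the usable observations; the package may count these differently.

```r
# Hypothetical helper: largest number of groups such that
# n_obs >= 2 * (number of groups * variables per group).
max_groups <- function(n_obs, vars_per_group) {
  floor(n_obs / (2 * vars_per_group))
}

# Hypothetical helper: recycle s to length two and replace a missing
# last step by the bound from max_groups().
resolve_s <- function(s, n_obs, vars_per_group) {
  s <- rep_len(s, 2)
  if (is.na(s[2])) s[2] <- max_groups(n_obs, vars_per_group)
  s
}

max_groups(n_obs = 100, vars_per_group = 3)  # 16 groups, i.e. 48 variables
resolve_s(c(0, NA), n_obs = 100, vars_per_group = 3)
resolve_s(2, n_obs = 100, vars_per_group = 3)
```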
crit
a character string specifying the optimality criterion to be used for selecting the final model. Currently, only "BIC" for the Bayes information criterion is implemented.
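The idea of choosing a model along a sequence by BIC can be illustrated in base R. This is a sketch using ordinary least squares via lm() and an assumed, fixed ordering of four hypothetical predictors; it shows the selection principle only and is not the package's implementation.

```r
# Simulated data in which only x1 and x2 matter (hypothetical example).
set.seed(42)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 2 * d$x1 + 1.5 * d$x2 + rnorm(n)

# Assumed sequence of predictors, e.g. as produced by a sequencing step.
sequence <- c("x1", "x2", "x3", "x4")
steps <- 0:length(sequence)

# Step 0 is the intercept-only model; step s adds the first s predictors.
bic <- sapply(steps, function(s) {
  terms <- if (s == 0) "1" else sequence[seq_len(s)]
  BIC(lm(reformulate(terms, response = "y"), data = d))
})

# The selected step is the one with the smallest BIC.
best_step <- steps[which.min(bic)]
best_step
```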
ncores
a positive integer giving the number of processor cores to be used for parallel computing (the default is 1 for no parallelization). If this is set to NA, all available processor cores are used. For each lag length, parallel computing for obtaining the data cleaning weights and for fitting models along the sequence is implemented on the R level using package parallel. Otherwise, parallel computing for some of the more computer-intensive computations in the sequencing step is implemented on the C++ level via OpenMP (https://www.openmp.org/).
cl
a parallel cluster for parallel computing as generated by makeCluster. This is preferred over ncores for tasks that are parallelized on the R level, in which case ncores is only used for tasks that are parallelized on the C++ level.
model
a logical indicating whether the model data should be included in the returned object.
centerFun
a function to compute a robust estimate for the center (defaults to median).
scaleFun
a function to compute a robust estimate for the scale (defaults to mad).
regFun
a function to compute robust linear regressions that can be interpreted as weighted least squares (defaults to lmrob).
regArgs
a list of arguments to be passed to regFun.
combine
a character string specifying how to combine the data cleaning weights from the robust regressions with each predictor group. Possible values are "min" for taking the minimum weight for each observation, "euclidean" for weights based on Euclidean distances of the multivariate set of standardized residuals (i.e., multivariate winsorization of the standardized residuals assuming independence), or "mahalanobis" for weights based on Mahalanobis distances of the multivariate set of standardized residuals (i.e., multivariate winsorization of the standardized residuals).
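A small base R sketch may help to contrast the "min" and "euclidean" options. The weights and standardized residuals below are made-up numbers, and the shrinkage form min(1, cutoff / distance) with a chi-squared cutoff is an assumption about how distance-based winsorization weights are typically computed, not a transcription of the package code.

```r
# Hypothetical weights from three robust regressions
# (rows = observations, columns = predictor groups).
w <- cbind(g1 = c(1, 0.9, 0.2),
           g2 = c(1, 0.5, 0.8),
           g3 = c(0.7, 1, 1))

# "min": take the smallest weight of each observation across groups.
w_min <- apply(w, 1, min)

# Hypothetical standardized residuals from the same three regressions.
r <- cbind(g1 = c(0.1, -0.5, 4),
           g2 = c(0.2, 2.5, 0.3),
           g3 = c(-1, 0.1, 3))

# "euclidean" (assumed form): observations whose residual vector lies
# beyond a chi-squared cutoff are downweighted towards the cutoff.
prob <- 0.95
cutoff <- sqrt(qchisq(prob, df = ncol(r)))
d <- sqrt(rowSums(r^2))
w_euclid <- pmin(1, cutoff / d)

w_min
w_euclid
```

The "mahalanobis" option would follow the same pattern with Mahalanobis instead of Euclidean distances, which additionally requires an estimate of the correlation structure of the residuals.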
winsorize
a logical indicating whether to clean the data by multivariate winsorization.
const
numeric; tuning constant for multivariate winsorization to be used in the initial correlation estimates based on adjusted univariate winsorization (defaults to 2).
prob
numeric; probability for the quantile of the \(\chi^{2}\) distribution to be used in multivariate winsorization (defaults to 0.95).
seed
optional initial seed for the random number generator (see .Random.seed), which is useful because many robust regression functions (including lmrob) involve randomness. On parallel R worker processes, random number streams are used and the seed is set via clusterSetRNGStream.
Andreas Alfons, based on code by Sarah Gelper
Alfons, A., Croux, C. and Gelper, S. (2016) Robust groupwise least angle regression. Computational Statistics & Data Analysis, 93, 421–435. doi:10.1016/j.csda.2015.02.007
coef, fitted, plot, predict, residuals, tslarsP, lmrob