mscmt performs the Multivariate Synthetic Control Method Using Time
Series.
mscmt(
  data,
  treatment.identifier = NULL,
  controls.identifier = NULL,
  times.dep = NULL,
  times.pred = NULL,
  agg.fns = NULL,
  placebo = FALSE,
  placebo.with.treated = FALSE,
  univariate = FALSE,
  univariate.with.dependent = FALSE,
  check.global = TRUE,
  inner.optim = "wnnlsOpt",
  inner.opar = list(),
  outer.optim = "DEoptC",
  outer.par = list(),
  outer.opar = list(),
  std.v = c("sum", "mean", "min", "max"),
  alpha = NULL,
  beta = NULL,
  gamma = NULL,
  return.ts = TRUE,
  single.v = FALSE,
  verbose = TRUE,
  debug = FALSE,
  seed = NULL,
  cl = NULL,
  times.pred.training = NULL,
  times.dep.validation = NULL,
  v.special = integer(),
  cv.alpha = 0,
  spec.search.treated = FALSE,
  spec.search.placebos = FALSE
)
An object of class "mscmt", which is essentially a list containing the
results of the estimation and, if applicable, the placebo study.
The most important list elements are
the weight vector w for the control units,
a matrix v with weight vectors for the predictors in its columns,
scalars loss.v and rmspe with the dependent loss and its square root,
a vector loss.w with the predictor losses corresponding to the various weight vectors in the columns of v,
a matrix predictor.table containing aggregated statistics of predictor values (similar to list element tab.pred of function synth.tab of package 'Synth'),
a list of multivariate time series combined containing, for each dependent and predictor variable, a multivariate time series with elements treated for the actual values of the treated unit, synth for the synthesized values, and gaps for the differences.
Placebo studies produce a list containing individual results for each
unit (as treated unit), starting with the original treated unit, as well
as a list element named placebo with aggregated results for each
dependent and predictor variable.
If times.pred.training and times.dep.validation are not NULL, a
cross-validation is done and a list of elements cv with the results of
the cross-validation period and main with the results of the main period
is returned.
Typically, a list of matrices with rows corresponding to times
and columns corresponding to units for all relevant features (dependent as
well as predictor variables, identified by the list elements' names).
This might be the result of converting from a data.frame by using
function listFromLong.
For convenience, data may alternatively be the result of function
dataprep of package 'Synth'. In this case, the parameters
treatment.identifier, controls.identifier, times.dep, times.pred, and
agg.fns are ignored, as these input parameters are generated
automatically from data. The parameters univariate, alpha, beta, and
gamma are ignored by fixing them to their defaults.
Using results of dataprep is experimental, because the automatic
generation of input parameters may fail due to lack of information
contained in results of dataprep.
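The list format described above for data can be sketched in base R. This is only a toy illustration: the variable names gdp and invest, the unit names, the values, and the helper make_matrix are made up and are not part of the package.

```r
# Toy panel: 3 years, 4 units, two variables (all values are made up).
years <- as.character(2001:2003)
units <- c("A", "B", "C", "D")

# Hypothetical helper: builds a times-by-units matrix with proper dimnames.
make_matrix <- function(values) {
  matrix(values, nrow = length(years), ncol = length(units),
         dimnames = list(years, units))
}

# One matrix per variable; list names identify the variables.
data <- list(
  gdp    = make_matrix(seq(10, 21, length.out = 12)),
  invest = make_matrix(seq(0.20, 0.31, length.out = 12))
)

rownames(data$gdp)     # times
colnames(data$invest)  # units
```

Row names carry the times and column names carry the units, which is what the identifiers and the time matrices described further down are matched against.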
A character scalar containing the name of the treated unit.
Must be contained in the column names of the matrices in data.
A character vector containing the names of at least two control units.
Entries must be contained in the column names of the matrices in data.
A matrix with two rows (containing start times in the first and end times in the second row) and one column for each dependent variable, where the column names must exactly match the names of the corresponding dependent variables. A sequence of dates with the given start and end times of
annual dates, if the format of start/end time is "dddd", e.g. "2016",
quarterly dates, if the format of start/end time is "ddddQd", e.g. "2016Q1",
monthly dates, if the format of start/end time is "dddd?dd" with "?" different from "W" (see below), e.g. "2016/03" or "2016-10",
weekly dates, if the format of start/end time is "ddddWdd", e.g. "2016W23",
daily dates, if the format of start/end time is "dddd-dd-dd", e.g. "2016-08-18",
will be constructed; these dates are looked for in the row names of
the respective matrices in data. In applications with
cross-validation, times.dep belongs to the main period.
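As a minimal sketch of the matrix layout described above, the following base R code builds times.dep for a single dependent variable with annual dates. The variable name gdp, the row names, and the helper annual_seq are hypothetical; the helper only mimics the idea of expanding start and end into a sequence and is not the package's internal routine.

```r
# One column per dependent variable; row 1 = start time, row 2 = end time.
times.dep <- cbind("gdp" = c("2001", "2003"))

# Hypothetical helper: expands an annual start/end pair into a date sequence.
annual_seq <- function(start, end) {
  as.character(seq(as.integer(start), as.integer(end)))
}

wanted <- annual_seq(times.dep[1, "gdp"], times.dep[2, "gdp"])

# The constructed dates must be found among the row names of the data matrix.
row_names <- as.character(2000:2005)   # made-up row names
all(wanted %in% row_names)
```

For other frequencies only the strings change, for example c("2001Q1", "2003Q4") for quarterly data.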
A matrix with two rows (containing start times in the first and end times in the second row) and one column for each predictor variable, where the column names must exactly match the names of the corresponding predictor variables. A sequence of dates with the given start and end times of
annual dates, if the format of start/end time is "dddd", e.g. "2016",
quarterly dates, if the format of start/end time is "ddddQd", e.g. "2016Q1",
monthly dates, if the format of start/end time is "dddd?dd" with "?" different from "W" (see below), e.g. "2016/03" or "2016-10",
weekly dates, if the format of start/end time is "ddddWdd", e.g. "2016W23",
daily dates, if the format of start/end time is "dddd-dd-dd", e.g. "2016-08-18",
will be constructed; these dates are looked for in the row names of
the respective matrices in data. In applications with
cross-validation, times.pred belongs to the main period.
Either NULL (default) or a character vector containing
one name of an aggregation function for each predictor variable (i.e., each
column of times.pred). The character string "id" may be used as a
"no-op" aggregation. Each aggregation function must accept a numeric vector
and return either a numeric scalar ("classical" MSCM) or a numeric vector
(leading to MSCMT if length of vector is at least two).
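The difference between a scalar-valued and a vector-valued aggregation can be illustrated in base R. In this sketch the identity function is a hypothetical stand-in for the "id" no-op, and the lookup table fns is only for illustration; how the package resolves the names internally is not shown here.

```r
x <- c(2, 4, 6, 8)   # predictor values over four periods (made up)

# Illustrative lookup table; the identity function stands in for "id".
fns <- list(mean = mean, id = function(x) x)

agg.fns <- c("mean", "id")
results <- lapply(agg.fns, function(f) fns[[f]](x))
names(results) <- agg.fns

lengths(results)   # "mean" yields one value, "id" keeps all four
```

An aggregation that returns a single number collapses the time dimension, whereas a vector-valued one keeps the predictor as a time series.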
A logical scalar. If TRUE, a placebo study is
performed where, apart from the treated unit, each control unit is considered
as treated unit in separate optimizations. Defaults to FALSE.
Depending on the number of control units and the complexity of the problem,
placebo studies may take a long time to finish.
A logical scalar. If TRUE, the treated
unit is included as control unit (for other treated units in placebo
studies). Defaults to FALSE.
A logical scalar. If TRUE, a series of univariate
SCMT optimizations is done (instead of one MSCMT optimization) even if
there is more than one dependent variable. Defaults to FALSE.
A logical scalar. If TRUE (and if univariate is also TRUE), all
dependent variables (contained in the column names of times.dep) apart
from the current (real) dependent variable are included as predictors
in the series of univariate SCMT optimizations. Defaults to FALSE.
A logical scalar. If TRUE (default), a check for
the feasibility of the unrestricted outer optimum (where actually no
restrictions are imposed by the predictor variables) is made before
starting the actual optimization procedure.
A character scalar containing the name of the optimization
method for the inner optimization. Defaults to "wnnlsOpt", which
(currently) is the only supported implementation, because it outperforms
all other inner optimizers we are aware of.
"ipopOpt", which uses ipop, and "LowRankQPOpt", which uses LowRankQP as
inner optimizer, have experimental support for benchmark purposes.
A list containing further parameters for the inner
optimizer. Defaults to the empty list. (For "wnnlsOpt", there are no
meaningful further parameters.)
A character vector containing the name(s) of the
optimization method(s) for the outer optimization. Defaults to
"DEoptC", which (currently) is the recommended global optimizer.
The optimizers currently supported can be found in the documentation of
parameter outer.opar, where the default control parameters for
the various optimizers are listed.
If outer.optim has length greater
than 1, one optimization is invoked for each outer optimizer (and,
potentially, each random seed, see below), and the best result is used.
A list containing further parameters for the outer optimization procedure. Defaults to the empty list. Entries in this list may override the following hard-coded general defaults:
lb=1e-8, corresponding to the lower bound for the ratio of predictor weights,
opt.separate=TRUE, corresponding to an improved outer optimization where each predictor is treated as the (potentially) most important predictor (i.e. with maximal weight) in separate optimizations (one for each predictor), see [1].
A list (or a list of lists, if outer.optim has
length greater than 1) containing further parameters for the outer
optimizer(s). Defaults to the empty list. Entries in this list may override
the following hard-coded defaults for the individual optimizers, which
are quite modest concerning the computing time.
dim is a variable holding the problem dimension,
typically the number of predictors minus one.
Optimizer | Package | Default parameters |
DEoptC | MSCMT | nG=500, nP=20*dim, waitgen=100, minimpr=1e-14, F=0.5, CR=0.9 |
cma_es | cmaes | maxit=2500 |
crs | nloptr | maxeval=2.5e4, xtol_rel=1e-14, population=20*dim, algorithm="NLOPT_GN_CRS2_LM" |
DEopt | NMOF | nG=100, nP=20*dim |
DEoptim | DEoptim | nP=20*dim |
ga | GA | maxiter=50, monitor=FALSE, popSize=20*dim |
genoud | rgenoud | print.level=0, max.generations=70, solution.tolerance=1e-12, pop.size=20*dim, wait.generations=dim, boundary.enforcement=2, gradient.check=FALSE, MemoryMatrix=FALSE |
GenSA | GenSA | max.call=1e7, max.time=25/dim, trace.mat=FALSE |
isres | nloptr | maxeval=2e4, xtol_rel=1e-14, population=20*dim, algorithm="NLOPT_GN_ISRES" |
malschains | Rmalschains | popsize=20*dim, maxEvals=25000 |
nlminbOpt | MSCMT/stats | nrandom=30 |
optimOpt | MSCMT/stats | nrandom=25 |
PSopt | NMOF | nG=100, nP=20*dim |
psoptim | pso | maxit=700 |
soma | soma | nMigrations=100 |
If outer.opar is a list of lists, its names must correspond to (a
subset of) the outer optimizers chosen in outer.optim.
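The list-of-lists form of outer.opar can be sketched as follows. The optimizer names and parameter names are taken from the table of defaults; the override values themselves are arbitrary and chosen only for illustration.

```r
# Two outer optimizers; one optimization is run for each of them.
outer.optim <- c("DEoptC", "genoud")

# Per-optimizer overrides, keyed by optimizer name (values are arbitrary).
outer.opar <- list(
  DEoptC = list(nG = 1000),
  genoud = list(max.generations = 100)
)

# Names must be a subset of the chosen outer optimizers.
stopifnot(all(names(outer.opar) %in% outer.optim))
```

Parameters that are not listed keep their hard-coded defaults, so only the entries to be changed need to appear.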
A character scalar containing one of the function names
"sum", "mean", "min", or "max" for the standardization of the predictor
weights (weights are divided by std.v(weights) before reporting).
Defaults to "sum", partial matching allowed.
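The standardization rule stated above (weights divided by std.v(weights)) can be reproduced in base R. The raw weights below are made up, and match.arg is used here only to illustrate partial matching; it is an assumption that the package resolves the name in a comparable way.

```r
v_raw <- c(0.5, 1.5, 2.0)   # made-up, unstandardized predictor weights

# "sum" standardization: the reported weights add up to 1.
v_sum <- v_raw / sum(v_raw)

# Partial matching: "me" uniquely identifies "mean".
std.v <- match.arg("me", c("sum", "mean", "min", "max"))

# "mean" standardization: the reported weights average to 1.
v_mean <- v_raw / match.fun(std.v)(v_raw)
```

All four choices rescale the same weight vector by a positive constant, so only the reporting scale changes, not the ratios between the weights.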
A numerical vector with weights for the dependent variables
in an MSCMT optimization or NULL (default). If not NULL,
the length of alpha must agree with the number of dependent
variables, NULL is equivalent to weight 1 for all dependent
variables.
Either NULL (default), a numerical vector, or a list.
If beta is a numerical vector or a list, its length must agree
with the number of dependent variables.
If beta is a numerical vector, the ith dependent variable is discounted
with discount factor beta[i] (the observations of the dependent variables
must thus be in chronological order!).
If beta is a list, the components of beta must be
numerical vectors with lengths corresponding to the numbers of observations
for the individual dependent variables. These observations are then
multiplied with the corresponding component of beta.
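The list form of beta described above amounts to an element-wise multiplication, which the following base R sketch illustrates with made-up numbers. The last two lines show one plausible reading of the scalar discount factor, namely geometric weights with the most recent observation weighted 1. That reading is an assumption and is not confirmed by this page; see [1] for the actual definition.

```r
y <- c(10, 12, 15, 11)   # four observations in chronological order (made up)

# List form: one weight per observation of the dependent variable "gdp".
beta_list <- list(gdp = c(0.25, 0.5, 0.75, 1))
y_weighted <- y * beta_list$gdp

# ASSUMPTION: a scalar discount factor expands to geometric weights,
# with older observations discounted more strongly.
beta_i <- 0.9
geom_weights <- beta_i^((length(y) - 1):0)
```

The list form gives full control over the individual weights, at the cost of having to match the number of observations exactly.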
Either NULL (default), a numerical vector, or a list.
If gamma is a numerical vector or a list, its length must agree
with the number of predictor variables.
If gamma is a numerical vector, the output of agg.fns[i] applied to the
ith predictor variable is discounted with discount factor gamma[i]
(the output of agg.fns[i] must therefore be in chronological order!).
If gamma is a list, the components of gamma must be
numerical vectors with lengths corresponding to the lengths of the output of
agg.fns for the individual predictor variables. The output of
agg.fns is then multiplied with the corresponding component of gamma.
A logical scalar. If TRUE (default), most results are
converted to time series.
A logical scalar. If FALSE (default), a selection
of feasible (optimal!) predictor weight vectors is generated. If TRUE,
the one optimal weight vector which has maximal order statistics is generated
to facilitate cross validation studies.
A logical scalar. If TRUE (default), output is verbose.
A logical scalar. If TRUE, output is very verbose.
Defaults to FALSE.
A numerical vector or NULL. If not NULL, the
random number generator is initialized with the elements of seed via
set.seed(seed) (see Random) before
calling the optimizer, performing repeated optimizations (and staying with
the best) if seed has length greater than 1. Defaults to NULL.
If not NULL, the seeds int.seed (default: 53058) and
unif.seed (default: 812821) for genoud are
also initialized to the corresponding element of seed, but this can
be overridden with the list elements int.seed and unif.seed
of (the corresponding element of) outer.opar.
NULL (default) or an object of class cluster
obtained by makeCluster of package parallel.
Repeated estimations (see outer.optim and seed) and
placebo studies will make use of the cluster cl (if not NULL).
A matrix with two rows (containing start times in
the first and end times in the second row) and one column for each predictor
variable, where the column names must exactly match the names of the
corresponding predictor variables (or NULL by default).
If not NULL, times.pred.training defines training periods
for cross-validation applications. For the format of the start and end times,
see the documentation of parameter times.pred.
A matrix with two rows (containing start times in
the first and end times in the second row) and one column for each dependent
variable, where the column names must exactly match the names of the
corresponding dependent variables (or NULL by default).
If not NULL, times.dep.validation defines validation period(s)
for cross-validation applications. For the format of the start and end times,
see the documentation of parameter times.dep.
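The four time matrices used in a cross-validation setup share the same layout, which the following base R sketch illustrates. The variable names and all periods are made up and are only meant to show how the column names have to line up; they are not a recommendation for how to split the periods (see [2] for that).

```r
# Main period (made-up years).
times.dep  <- cbind("gdp" = c("1991", "2000"))
times.pred <- cbind("invest" = c("1996", "2000"),
                    "school" = c("1996", "2000"))

# Training period for the predictors, validation period for the dependent
# variable (again made-up years).
times.pred.training  <- cbind("invest" = c("1991", "1995"),
                              "school" = c("1991", "1995"))
times.dep.validation <- cbind("gdp" = c("1996", "2000"))

# Column names must match those of the main-period matrices.
stopifnot(identical(colnames(times.pred.training), colnames(times.pred)),
          identical(colnames(times.dep.validation), colnames(times.dep)))
```

Keeping the column order identical across the training and main-period matrices avoids mismatches between predictors.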
An integer vector containing indices of important predictors with special treatment (see below). Defaults to the empty set.
A numeric scalar containing the minimal proportion (of the
maximal feasible weight) for the weights of the predictors selected by
v.special. Defaults to 0.
A logical scalar. If TRUE, a specification
search (for the optimal set of included predictors) is done for the treated unit. Defaults to FALSE.
A logical scalar. If TRUE, a specification
search (for the optimal set of included predictors) is done for the control units in the placebo study. Defaults to FALSE.
mscmt combines, if necessary, the preparation of the raw data (which
is expected to be in "list" format, possibly after conversion from a
data.frame with function listFromLong) and the call to the appropriate
MSCMT optimization procedures (depending on the input parameters).
For details on the input parameters alpha, beta, and gamma, see [1].
For details on cross-validation, see [2].
[1] FastReliableMSCMT
[2] CVMSCMT
if (FALSE) {
## for examples, see the package vignettes:
browseVignettes(package="MSCMT")
}